Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EMQX is not able to connect to Postgres since version 5.0.14 #9907

Closed
rubengees opened this issue Feb 3, 2023 · 18 comments · Fixed by #9978
Closed

EMQX is not able to connect to Postgres since version 5.0.14 #9907

rubengees opened this issue Feb 3, 2023 · 18 comments · Fixed by #9978

Comments

@rubengees
Copy link

What happened?

In version 5.0.13 EMQX was able to connect to Postgres with the following configuration (k8s):

- name: EMQX_AUTHENTICATION__1__MECHANISM
  value: password_based
- name: EMQX_AUTHENTICATION__1__BACKEND
  value: postgresql
- name: EMQX_AUTHENTICATION__1__ENABLE
  value: "true"
- name: EMQX_AUTHENTICATION__1__DATABASE
  value: emqx
- name: EMQX_AUTHENTICATION__1__POOL_SIZE
  value: "2"
- name: EMQX_AUTHENTICATION__1__USERNAME
  value: user
- name: EMQX_AUTHENTICATION__1__PASSWORD
  valueFrom:
    secretKeyRef:
      name: secret
      key: password
- name: EMQX_AUTHENTICATION__1__PASSWORD_HASH_ALGORITHM__NAME
  value: bcrypt
- name: EMQX_AUTHENTICATION__1__SERVER
  value: '"some-complicated-hostname-1234-0.b.db.ondigitalocean.com:25060"'
- name: EMQX_AUTHENTICATION__1__QUERY
  value: "SELECT password, is_superuser FROM mqtt_user where username = ${username} LIMIT 1"
- name: EMQX_AUTHENTICATION__1__SSL__ENABLE
  value: "true"
- name: EMQX_AUTHENTICATION__1__SSL__CACERTFILE
  value: /etc/ca-certificate-postgresql.crt

As of version 5.0.14 this does not work anymore. Same for the latest version 5.0.16. The logs are:

emqx 2023-02-03T12:24:02.199598+00:00 [error] REPORT_CB/2 CRASH: [{mfa,["supervisor",":","start_children","/2"]},{line,392},{label,{supervisor,start_error}},{report,[{
supervisor,{<0.3204.0>,ecpool_pool_sup}},{errorContext,start_error},{reason,{shutdown,{failed_to_start_child,{worker,1},invalid_authorization_specification}}},{offende
r,[{pid,undefined},{id,worker_sup},{mfargs,{ecpool_worker_sup,start_link,['emqx_connector_pgsql:9:10',emqx_connector_pgsql,[{host,"some-complicated-hostname-1234-0.b.db.ondig
italocean.com"},{port,25060},{username,<<"user">>},{password,#Fun<emqx_secret.0.22610105>},{database,<<"emqx">>},{auto_reconnect,2},{pool_size,3
},{ssl,required},{ssl_opts,[{cacertfile,"/etc/ca-certificate-postgresql.crt"},{verify,verify_none},{versions,['tlsv1.3','tlsv1.2','tlsv1.1',tlsv1]},{ciphers,["TLS_AES_
256_GCM_SHA384","TLS_AES_128_GCM_SHA256","TLS_CHACHA20_POLY1305_SHA256","TLS_AES_128_CCM_SHA256","TLS_AES_128_CCM_8_SHA256","ECDHE-ECDSA-AES256-GCM-SHA384","ECDHE-RSA-
AES256-GCM-SHA384","ECDHE-ECDSA-AES256-SHA384","ECDHE-RSA-AES256-SHA384","ECDH-ECDSA-AES256-GCM-SHA384","ECDH-RSA-AES256-GCM-SHA384","ECDH-ECDSA-AES256-SHA384","ECDH-R
SA-AES256-SHA384","DHE-DSS-AES256-GCM-SHA384","DHE-DSS-AES256-SHA256","AES256-GCM-SHA384","AES256-SHA256","ECDHE-ECDSA-AES128-GCM-SHA256","ECDHE-RSA-AES128-GCM-SHA256"
,"ECDHE-ECDSA-AES128-SHA256","ECDHE-RSA-AES128-SHA256","ECDH-ECDSA-AES128-GCM-SHA256","ECDH-RSA-AES128-GCM-SHA256","ECDH-ECDSA-AES128-SHA256","ECDH-RSA-AES128-SHA256",
"DHE-DSS-AES128-GCM-SHA256","DHE-DSS-AES128-SHA256","AES128-GCM-SHA256","AES128-SHA256","ECDHE-ECDSA-AES256-SHA","ECDHE-RSA-AES256-SHA","DHE-DSS-AES256-SHA","ECDH-ECDS
A-AES256-SHA","ECDH-RSA-AES256-SHA","ECDHE-ECDSA-AES128-SHA","ECDHE-RSA-AES128-SHA","DHE-DSS-AES128-SHA","ECDH-ECDSA-AES128-SHA","ECDH-RSA-AES128-SHA","RSA-PSK-AES256-
GCM-SHA384","RSA-PSK-AES256-CBC-SHA384","RSA-PSK-AES128-GCM-SHA256","RSA-PSK-AES128-CBC-SHA256","RSA-PSK-AES256-CBC-SHA","RSA-PSK-AES128-CBC-SHA"]},{reuse_sessions,tru
e},{depth,10},{secure_renegotiate,true}]}]]}},{restart_type,transient},{significant,false},{shutdown,infinity},{child_type,supervisor}]}]}]; Reason: {error,function_cl
ause,[{supervisor,format_log_multi,[[{mfa,["supervisor",":","start_children","/2"]},{line,392},{label,{supervisor,start_error}},{report,[{supervisor,{<0.3204.0>,ecpool
_pool_sup}},{errorContext,start_error},{reason,{shutdown,{failed_to_start_child,{worker,1},invalid_authorization_specification}}},{offender,[{pid,undefined},{id,worker
_sup},{mfargs,{ecpool_worker_sup,start_link,['emqx_connector_pgsql:9:10',emqx_connector_pgsql,[{host,"some-complicated-hostname-1234-0.b.db.ondigitalocean.com"},{port,25060},
{username,<<"user">>},{password,#Fun<emqx_secret.0.22610105>},{database,<<"emqx">>},{auto_reconnect,2},{pool_size,3},{ssl,required},{ssl_opts,[{
cacertfile,"/etc/ca-certificate-postgresql.crt"},{verify,verify_none},{versions,['tlsv1.3','tlsv1.2','tlsv1.1',tlsv1]},{ciphers,["TLS_AES_256_GCM_SHA384","TLS_AES_128_
GCM_SHA256","TLS_CHACHA20_POLY1305_SHA256","TLS_AES_128_CCM_SHA256","TLS_AES_128_CCM_8_SHA256","ECDHE-ECDSA-AES256-GCM-SHA384","ECDHE-RSA-AES256-GCM-SHA384","ECDHE-ECD
SA-AES256-SHA384","ECDHE-RSA-AES256-SHA384","ECDH-ECDSA-AES256-GCM-SHA384","ECDH-RSA-AES256-GCM-SHA384","ECDH-ECDSA-AES256-SHA384","ECDH-RSA-AES256-SHA384","DHE-DSS-AE
S256-GCM-SHA384","DHE-DSS-AES256-SHA256","AES256-GCM-SHA384","AES256-SHA256","ECDHE-ECDSA-AES128-GCM-SHA256","ECDHE-RSA-AES128-GCM-SHA256","ECDHE-ECDSA-AES128-SHA256",
"ECDHE-RSA-AES128-SHA256","ECDH-ECDSA-AES128-GCM-SHA256","ECDH-RSA-AES128-GCM-SHA256","ECDH-ECDSA-AES128-SHA256","ECDH-RSA-AES128-SHA256","DHE-DSS-AES128-GCM-SHA256","
DHE-DSS-AES128-SHA256","AES128-GCM-SHA256","AES128-SHA256","ECDHE-ECDSA-AES256-SHA","ECDHE-RSA-AES256-SHA","DHE-DSS-AES256-SHA","ECDH-ECDSA-AES256-SHA","ECDH-RSA-AES25
6-SHA","ECDHE-ECDSA-AES128-SHA","ECDHE-RSA-AES128-SHA","DHE-DSS-AES128-SHA","ECDH-ECDSA-AES128-SHA","ECDH-RSA-AES128-SHA","RSA-PSK-AES256-GCM-SHA384","RSA-PSK-AES256-C
BC-SHA384","RSA-PSK-AES128-GCM-SHA256","RSA-PSK-AES128-CBC-SHA256","RSA-PSK-AES256-CBC-SHA","RSA-PSK-AES128-CBC-SHA"]},{reuse_sessions,true},{depth,10},{secure_renegot
BC-SHA384","RSA-PSK-AES128-GCM-SHA256","RSA-PSK-AES128-CBC-SHA256","RSA-PSK-AES256-CBC-SHA","RSA-PSK-AES128-CBC-SHA"]},{reuse_sessions,true},{depth,10},{secure_renegot
iate,true}]}]]}},{restart_type,transient},{significant,false},{shutdown,infinity},{child_type,supervisor}]}]}],#{chars_limit => unlimited,depth => 100,encoding => utf8
,single_line => true}],[{file,"supervisor.erl"},{line,1657}]},{supervisor,format_log,2,[{file,"supervisor.erl"},{line,1619}]}]}
emqx 2023-02-03T12:24:02.203864+00:00 [error] msg: start_ecpool_error, mfa: emqx_plugin_libs_pool:start_pool/3, line: 48, pool_name: 'emqx_connector_pgsql:9:10', reason: invalid_authorization_specification
emqx 2023-02-03T12:43:03.134463+00:00 [warning] msg: start_resource_failed, mfa: emqx_resource_manager:start_resource/2, line: 494, id: <<"emqx_connector_pgsql:9">>, reason: {start_pool_failed,'emqx_connector_pgsql:9:10',invalid_authorization_specification}

What did you expect to happen?

EMQX connects to Postgres.

How can we reproduce it (as minimally and precisely as possible)?

No response

Anything else we need to know?

This may be related to #9614. I tried to configure the host in both these ways:

- name: EMQX_AUTHENTICATION__1__SERVER
  value: '"some-complicated-hostname-1234-0.b.db.ondigitalocean.com:25060"'
- name: EMQX_AUTHENTICATION__1__SERVER
  value: "some-complicated-hostname-1234-0.b.db.ondigitalocean.com:25060"

None worked or changed the logs.

EMQX version

$ ./bin/emqx_ctl broker
sysdescr  : EMQX
version   : 5.0.16
datetime  : 2023-02-03T12:44:45.431454132+00:00
uptime    : 20 minutes, 48 seconds

OS version

$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
$ uname -a
Linux emqx-core-2 5.10.0-0.bpo.15-amd64 #1 SMP Debian 5.10.120-1~bpo10+1 (2022-06-13) x86_64 GNU/Linux

Log files

See above.

@rubengees rubengees added the BUG label Feb 3, 2023
@tmirun
Copy link

tmirun commented Feb 6, 2023

same here with PostgreSQL version 15.1

@tigercl
Copy link
Member

tigercl commented Feb 8, 2023

@rubengees Sorry for keep you waiting.

According to the log, this should not be caused by a server configuration error. You can see the invalid_authorization_specification error in the log.

It may be that your username and password are configured incorrectly, or you have enabled TLS for PostgreSQL, or you may not have opened the corresponding port.

@timesy
Copy link

timesy commented Feb 8, 2023

I have the same problem on 5.0.14. A downgrade to 5.0.13 made it possible to connect again with the same configuration as before.

@tigercl
Copy link
Member

tigercl commented Feb 8, 2023

I have the same problem on 5.0.14. A downgrade to 5.0.13 made it possible to connect again with the same configuration as before.

Also is there an invalid_authorization_specification error in the logs?

@timesy
Copy link

timesy commented Feb 8, 2023

Yes, I have the following error:

2023-02-08T06:14:54.746613+00:00 [error] line: 48, mfa: emqx_plugin_libs_pool:start_pool/3, msg: start_ecpool_error, pool_name: 'emqx_connector_pgsql:5:6', reason: invalid_authorization_specification 2023-02-08T06:14:54.746743+00:00 [error] id: <<"emqx_connector_pgsql:5">>, line: 495, mfa: emqx_resource_manager:start_resource/2, msg: start_resource_failed, reason: {start_pool_failed,'emqx_connector_pgsql:5:6',invalid_authorization_specification}

@thalesmg
Copy link
Contributor

Hi @rubengees and @timesy !

I'm trying to reproduce the problem locally, but I could not make it
happen when using correct configurations. One way I found to
reproduce the issue locally was to manually change the pg_hba.conf
file to use the ident auth method, something like:

host    all all 0.0.0.0/0 ident
host    all all ::/0      ident

Using this config and correct credentials indeed yielded the same
error code:

2023-02-14T17:47:57.196365+00:00 [error] REPORT_CB/2 CRASH: [{mfa,["supervisor",":","start_children","/2"]},{line,392},{label,{supervisor,start_error}},{report,[{supervisor,{<0.2803.0>,ecpool_pool_sup}},{errorContext,start_error},{reason,{shutdown,{failed_to_start_child,{worker,1},invalid_authorization_specification}}},{offender,[{pid,undefined},{id,worker_sup},{mfargs,{ecpool_worker_sup,start_link,['emqx_authn_pgsql:1:2',emqx_connector_pgsql,

So, could you share your pg_hba.conf? And also check if changes
have been made to that, concurrently with the EMQX upgrade?

@thalesmg
Copy link
Contributor

Also, maybe check if the auth method in your pg_hba.conf should perhaps be password for the CIDR that matches the EMQX host inside your k8s cluster that is connecting to the remote Postgres server.

@timesy
Copy link

timesy commented Feb 14, 2023

Thanks for taking the time @thalesmg ! As my database is hosted on Azure I can't (to my knowledge) get the pg_hba.conf file.

However I had a working installation, did the upgrade where it didn't work and then a downgrade shortly after where it was working again. I had this happen on two different instances with two different Postgres databases. I am using the emqx-operator, if this changes anything?

@thalesmg
Copy link
Contributor

@timesy thanks for the response, I'll have to investigate further, as I don't have access to an Azure instance. 🤔

Meanwhile, could you please check if the node IP from which EMQX is running in your k8s cluster can currently access the remote database (i.e.: the firewall port configurations are correct at the k8s and Azure sides)? One possible reason for error 28000 in Postgres is related to firewall configuration. (refs: 1 and 2).

@timesy
Copy link

timesy commented Feb 14, 2023

I am honestly not sure what the best way is to test it from inside the pod as no psql is installed and I have no root access in the container, any pointers?

@rubengees
Copy link
Author

rubengees commented Feb 14, 2023

@thalesmg Thanks for answering! Same case for me, I use a managed DigitalOcean Postgres (14) database which I have no admin access to. My pods can access the database as can all other pods in the k8s cluster (from which some successfully do). Let me know If I can provide any useful information.

@thalesmg
Copy link
Contributor

I am honestly not sure what the best way is to test it from inside the pod as no psql is installed and I have no root access in the container, any pointers?

If you have permissions to deploy pods to the same nodes in which EMQX is running, one option would be to run postgres pods on those same nodes and just try to connect using psql the same credentials to the database. Another quick thing to test would be a simple port check. If you have permission to access the remote console in the EMQX pod, you could try:

case gen_tcp:connect("my.db.host", 5432, []) of
    {ok, P} ->
        gen_tcp:close(P),
        ok;
    Error ->
        Error
end.

If this returns ok, then the port is accessible from the EMQX node.

@thalesmg
Copy link
Contributor

I believe I found the issue between the versions. It seems that, when one chooses to use SSL connections for Postgres in EMQX, the SSL configuration is slightly different in 5.0.16, which yields the different behavior.

We will work on a fix for that.

Meanwhile, would you like to try a patch in your server to verify you can connect? I've made a quick fix to test the hypothesis above and it worked locally for me, after reproducing your issue.

emqx_connector_pgsql.beam.zip

To use the patch, just unzip the .beam file from the above attached zip and move it to the patches directory under EMQX's data dir (for your installation it's probably /var/lib/emqx/) and either restart the pod or run emqx eval 'c:lm().' inside your EMQX pod. This doc reference may help with those steps.

@rubengees
Copy link
Author

rubengees commented Feb 15, 2023

If still relevant: I ran the snippet you sent with my credentials via emqx remote_console on the running pod in both version 5.0.13 and 5.0.16. Both return ok.

I tried the patch and it appears to be working! I mounted it into /opt/emqx/data/patches/emqx_connector_pgsql.beam and upon restart the pods are now able to connect to postgres.

@thalesmg thalesmg self-assigned this Feb 15, 2023
@timesy
Copy link

timesy commented Feb 15, 2023

If still relevant: I ran the snippet you sent with my credentials via emqx remote_console on the running pod in both version 5.0.13 and 5.0.16. Both return ok.

I tried the patch and it appears to be working! I mounted it into /opt/emqx/data/patches/emqx_connector_pgsql.beam and upon restart the pods are now able to connect to postgres.

Same for me, thanks for the quick fix!

thalesmg added a commit to thalesmg/emqx that referenced this issue Feb 15, 2023
…vior (5.0)

Fixes emqx#9907

At v5.0.14, we changed the `ssl` option for the Postgres connector
from `true` to `required`, but there was another transformation in
`conn_opts/2` that led to an incorrect configuration.  This change
ended up preventing users from connecting to Postgres with their
previous configurations after upgrading EMQX.
@thalesmg
Copy link
Contributor

I've made another patched .beam that should be more correct after the initial quick fix, if you are willing to give it a try as well 😸 :

emqx_connector_pgsql.beam.zip

(worked locally for me in the same scenario as before)

@timesy
Copy link

timesy commented Feb 16, 2023

Just tested it on my k8s cluster and this patch works as well!

@thalesmg
Copy link
Contributor

Thank you @timesy and @rubengees for reporting and testing the patches. 🍻

This fix should be included in the next release (v5.0.18).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants