Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FATAL: empty password returned by client when auth_type = pam #285

Closed
achix opened this issue Mar 19, 2018 · 23 comments

Comments

@achix
Copy link

@achix achix commented Mar 19, 2018

Hello,
this is with the latest pgbouncer (under bhyve) 1.8.1, configured to work with PAM, against PostgreSQL 10.3 (FreeBSD), . It works for some day(s) till I start getting :

client shell

pgbouncer@ubuntubig:~$ psql -h localhost -p 6432 -U amantzio dynacom  
psql: ERROR:  pgbouncer cannot connect to server

pgbouncer log

2018-03-19 13:18:08.075 24938 LOG Stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us wait time 0 us
2018-03-19 13:19:08.080 24938 LOG Stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us wait time 0 us
2018-03-19 13:19:35.385 24938 WARNING lookup failed: smadev.internal.net: result=0
2018-03-19 13:19:35.385 24938 LOG S-0x2217ff0: dynacom/amantzio@(bad-af):0 closing because: server dns lookup failed (age=0)
2018-03-19 13:19:37.639 24938 LOG C-0x2213030: dynacom/amantzio@172.16.0.1:57659 closing because: query_wait_timeout (age=1204)
2018-03-19 13:19:37.639 24938 WARNING C-0x2213030: dynacom/amantzio@172.16.0.1:57659 Pooler Error: query_wait_timeout
2018-03-19 13:19:58.751 24938 LOG C-0x2213030: dynacom/amantzio@172.16.0.1:25019 login attempt: db=dynacom user=amantzio tls=no
2018-03-19 13:19:58.751 24938 LOG C-0x2213030: dynacom/amantzio@172.16.0.1:25019 closing because: pgbouncer cannot connect to server (age=0)
2018-03-19 13:19:58.751 24938 WARNING C-0x2213030: dynacom/amantzio@172.16.0.1:25019 Pooler Error: pgbouncer cannot connect to server
2018-03-19 13:19:58.753 24938 WARNING lookup failed: smadev.internal.net: result=0
2018-03-19 13:19:58.753 24938 LOG S-0x2217ff0: dynacom/amantzio@(bad-af):0 closing because: server dns lookup failed (age=0)
2018-03-19 13:20:08.083 24938 LOG Stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us wait time 0 us
2018-03-19 13:20:17.943 24938 LOG C-0x2213030: dynacom/amantzio@172.16.0.1:25021 login attempt: db=dynacom user=amantzio tls=no
2018-03-19 13:20:17.943 24938 LOG C-0x2213030: dynacom/amantzio@172.16.0.1:25021 closing because: pgbouncer cannot connect to server (age=0)
2018-03-19 13:20:17.943 24938 WARNING C-0x2213030: dynacom/amantzio@172.16.0.1:25021 Pooler Error: pgbouncer cannot connect to server
2018-03-19 13:20:17.946 24938 WARNING lookup failed: smadev.internal.net: result=0
2018-03-19 13:20:17.946 24938 LOG S-0x2217ff0: dynacom/amantzio@(bad-af):0 closing because: server dns lookup failed (age=0)
2018-03-19 13:20:19.330 24938 LOG C-0x2213030: dynacom/amantzio@172.16.0.1:25022 login attempt: db=dynacom user=amantzio tls=no
2018-03-19 13:20:19.330 24938 LOG C-0x2213030: dynacom/amantzio@172.16.0.1:25022 closing because: pgbouncer cannot connect to server (age=0)
2018-03-19 13:20:19.330 24938 WARNING C-0x2213030: dynacom/amantzio@172.16.0.1:25022 Pooler Error: pgbouncer cannot connect to server
2018-03-19 13:20:20.086 24938 LOG C-0x2213030: dynacom/amantzio@172.16.0.1:25023 login attempt: db=dynacom user=amantzio tls=no
2018-03-19 13:20:20.086 24938 LOG C-0x2213030: dynacom/amantzio@172.16.0.1:25023 closing because: pgbouncer cannot connect to server (age=0)
2018-03-19 13:20:20.087 24938 WARNING C-0x2213030: dynacom/amantzio@172.16.0.1:25023 Pooler Error: pgbouncer cannot connect to server
2018-03-19 13:20:20.691 24938 LOG C-0x2213030: dynacom/amantzio@172.16.0.1:25024 login attempt: db=dynacom user=amantzio tls=no
2018-03-19 13:20:20.691 24938 LOG C-0x2213030: dynacom/amantzio@172.16.0.1:25024 closing because: pgbouncer cannot connect to server (age=0)
2018-03-19 13:20:20.691 24938 WARNING C-0x2213030: dynacom/amantzio@172.16.0.1:25024 Pooler Error: pgbouncer cannot connect to server
2018-03-19 13:20:20.698 24938 LOG C-0x2213030: dynacom/amantzio@172.16.0.1:25025 login attempt: db=dynacom user=amantzio tls=no
2018-03-19 13:20:20.698 24938 LOG C-0x2213030: dynacom/amantzio@172.16.0.1:25025 closing because: pgbouncer cannot connect to server (age=0)
2018-03-19 13:20:20.698 24938 WARNING C-0x2213030: dynacom/amantzio@172.16.0.1:25025 Pooler Error: pgbouncer cannot connect to server
2018-03-19 13:20:23.110 24938 LOG C-0x2213030: dynacom/amantzio@172.16.0.1:25026 login attempt: db=dynacom user=amantzio tls=no
2018-03-19 13:20:23.110 24938 LOG C-0x2213030: dynacom/amantzio@172.16.0.1:25026 closing because: pgbouncer cannot connect to server (age=0)
2018-03-19 13:20:23.110 24938 WARNING C-0x2213030: dynacom/amantzio@172.16.0.1:25026 Pooler Error: pgbouncer cannot connect to server
2018-03-19 13:20:23.740 24938 LOG C-0x2213030: dynacom/amantzio@172.16.0.1:25027 login attempt: db=dynacom user=amantzio tls=no
2018-03-19 13:20:23.741 24938 LOG C-0x2213030: dynacom/amantzio@172.16.0.1:25027 closing because: pgbouncer cannot connect to server (age=0)
2018-03-19 13:20:23.741 24938 WARNING C-0x2213030: dynacom/amantzio@172.16.0.1:25027 Pooler Error: pgbouncer cannot connect to server
2018-03-19 13:20:23.743 24938 LOG C-0x2213030: dynacom/amantzio@172.16.0.1:25028 login attempt: db=dynacom user=amantzio tls=no
2018-03-19 13:20:23.743 24938 LOG C-0x2213030: dynacom/amantzio@172.16.0.1:25028 closing because: pgbouncer cannot connect to server (age=0)
2018-03-19 13:20:23.743 24938 WARNING C-0x2213030: dynacom/amantzio@172.16.0.1:25028 Pooler Error: pgbouncer cannot connect to server
2018-03-19 13:20:25.324 24938 LOG C-0x2213030: dynacom/amantzio@172.16.0.1:25029 login attempt: db=dynacom user=amantzio tls=no
2018-03-19 13:20:25.324 24938 LOG C-0x2213030: dynacom/amantzio@172.16.0.1:25029 closing because: pgbouncer cannot connect to server (age=0)
2018-03-19 13:20:25.324 24938 WARNING C-0x2213030: dynacom/amantzio@172.16.0.1:25029 Pooler Error: pgbouncer cannot connect to server
2018-03-19 13:20:26.033 24938 LOG C-0x2213030: dynacom/amantzio@172.16.0.1:25030 login attempt: db=dynacom user=amantzio tls=no
2018-03-19 13:20:26.033 24938 LOG C-0x2213030: dynacom/amantzio@172.16.0.1:25030 closing because: pgbouncer cannot connect to server (age=0)
2018-03-19 13:20:26.033 24938 WARNING C-0x2213030: dynacom/amantzio@172.16.0.1:25030 Pooler Error: pgbouncer cannot connect to server
2018-03-19 13:20:26.035 24938 LOG C-0x2213030: dynacom/amantzio@172.16.0.1:25031 login attempt: db=dynacom user=amantzio tls=no
2018-03-19 13:20:26.035 24938 LOG C-0x2213030: dynacom/amantzio@172.16.0.1:25031 closing because: pgbouncer cannot connect to server (age=0)
2018-03-19 13:20:26.035 24938 WARNING C-0x2213030: dynacom/amantzio@172.16.0.1:25031 Pooler Error: pgbouncer cannot connect to server
2018-03-19 13:20:38.509 24938 LOG C-0x2213030: pgbouncer/pgbouncer@[::1]:53830 login attempt: db=pgbouncer user=pgbouncer tls=no
2018-03-19 13:20:44.459 24938 LOG C-0x2213030: pgbouncer/pgbouncer@[::1]:53830 closing because: client close request (age=5)
2018-03-19 13:20:49.384 24938 LOG C-0x2213030: dynacom/amantzio@[::1]:53832 login attempt: db=dynacom user=amantzio tls=no
2018-03-19 13:20:49.384 24938 LOG C-0x2213030: dynacom/amantzio@[::1]:53832 closing because: pgbouncer cannot connect to server (age=0)
2018-03-19 13:20:49.384 24938 WARNING C-0x2213030: dynacom/amantzio@[::1]:53832 Pooler Error: pgbouncer cannot connect to server
2018-03-19 13:20:49.385 24938 LOG S-0x2217ff0: dynacom/amantzio@10.9.200.131:5432 new connection to server (from 172.16.0.3:55038)
2018-03-19 13:20:49.393 24938 ERROR S: login failed: FATAL: empty password returned by client
2018-03-19 13:20:49.393 24938 LOG S-0x2217ff0: dynacom/amantzio@10.9.200.131:5432 closing because: login failed (age=0)

postgresql log

172.16.0.3(54936) [1419] 5aaf9a6e.58b 2018-03-19 13:09:35.055 EET SMA  amantzio@dynacom line:6 HINT:  Use the escape string syntax for backslashes, e.g., E'\\'.
172.16.0.3(54936) [1419] 5aaf9a6e.58b 2018-03-19 13:10:35.471 EET SMA  amantzio@dynacom line:7 LOG:  disconnection: session time: 0:01:00.589 user=amantzio database=dynacom host=172.16.0.3 port=54936
172.16.0.3(55038) [2030] 5aaf9d10.7ee 2018-03-19 13:20:48.873 EET [unknown] [unknown]@[unknown] line:1 LOG:  connection received: host=172.16.0.3 port=55038
172.16.0.3(55038) [2030] 5aaf9d10.7ee 2018-03-19 13:20:48.875 EET [unknown] amantzio@dynacom line:2 FATAL:  empty password returned by client

If I restart pgbouncer, it works again (till it fails in the same fashion again).
Then I edited hosts and specified the ip in order to avoid the DNS lookup failure mentioned in the pgbouncer log at 2018-03-19 13:19:35.385, and I'll report back.

@ghost

This comment has been minimized.

Copy link

@ghost ghost commented Mar 28, 2018

@achix Have you managed to fix this issue?

@achix

This comment has been minimized.

Copy link
Author

@achix achix commented Apr 3, 2018

The problem reappeared today, doing some heavy stuff with lots of connections.
One thing I didn't mention is that I made some patches to both pgsql and pgbouncer :

% diff -u /usr/local/src/postgresql-10.3/src/backend/utils/misc/guc.c.orig /usr/local/src/postgresql-10.3/src/backend/utils/misc/guc.c
--- /usr/local/src/postgresql-10.3/src/backend/utils/misc/guc.c.orig    2018-02-27 00:10:47.000000000 +0200
+++ /usr/local/src/postgresql-10.3/src/backend/utils/misc/guc.c 2018-03-15 13:43:45.870975000 +0200
@@ -3266,7 +3266,7 @@
                {"search_path", PGC_USERSET, CLIENT_CONN_STATEMENT,
                        gettext_noop("Sets the schema search order for names that are not schema-qualified."),
                        NULL,
-                       GUC_LIST_INPUT | GUC_LIST_QUOTE
+                       GUC_LIST_INPUT | GUC_REPORT
                },
                &namespace_search_path,
                "\"$user\", public",
postgres@smadev:~/database% 
pgbouncer@ubuntubig:~$ diff -u pgbouncer-1.8.1/src/varcache.c.orig pgbouncer-1.8.1/src/varcache.c
--- pgbouncer-1.8.1/src/varcache.c.orig 2018-03-15 15:44:30.176395800 +0200
+++ pgbouncer-1.8.1/src/varcache.c      2018-03-15 15:44:04.521229924 +0200
@@ -35,6 +35,7 @@
  {"TimeZone",                    VTimeZone },
  {"standard_conforming_strings", VStdStr },
  {"application_name",            VAppName },
+ {"search_path",                 VSearchPath },
  {NULL},
 };
pgbouncer@ubuntubig:~$ diff -u pgbouncer-1.8.1/include/varcache.h.orig pgbouncer-1.8.1/include/varcache.h
--- pgbouncer-1.8.1/include/varcache.h.orig     2018-03-15 15:45:19.131905549 +0200
+++ pgbouncer-1.8.1/include/varcache.h  2018-03-15 15:45:46.182373422 +0200
@@ -5,6 +5,7 @@
        VTimeZone,
        VStdStr,
        VAppName,
+       VSearchPath,
        NumVars
 };

I also attach my pgbouncer conf file .
pgbouncer.ini-transaction-pam-ldap-ok.txt

@achix

This comment has been minimized.

Copy link
Author

@achix achix commented May 29, 2018

The problem is still there. When this happens then I cannot connect to pgbouncer via psql. It just does not prompt for password. But trying with a different user works fine.
Can anyone point me to some direction (debugging , logging) anything that would help solve the problem?

Could my patch described above be a possible cause for this? I'll with a stock installation. I'll try after reverting my patch and also by adding with extra verbosity.

@achix

This comment has been minimized.

Copy link
Author

@achix achix commented May 29, 2018

Here are the logs for the connection attempt via psql which seems to work and the one who doesn't. User does not really matter, in other moments it is the other way around

psql -h localhost -p 6432 -U amantzio dynacom
Password for user amantzio:

gives in the log

2018-05-29 22:24:19.679 3631 LOG Stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us wait time 0 us
2018-05-29 22:25:19.680 3631 LOG Stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us wait time 0 us
2018-05-29 22:25:26.862 3631 NOISE safe_accept(8) = 10 (::1/51766)
2018-05-29 22:25:26.862 3631 NOISE new fd from accept=10
2018-05-29 22:25:26.862 3631 NOISE resync: done=0, parse=0, recv=0
2018-05-29 22:25:26.862 3631 NOISE safe_recv(10, 4096) = 8
2018-05-29 22:25:26.862 3631 NOISE C-0x1f59660: (nodb)/(nouser)@[::1]:51766 pkt='!' len=8
2018-05-29 22:25:26.862 3631 NOISE C-0x1f59660: (nodb)/(nouser)@[::1]:51766 C: req SSL
2018-05-29 22:25:26.862 3631 NOISE C-0x1f59660: (nodb)/(nouser)@[::1]:51766 P: nak
2018-05-29 22:25:26.863 3631 NOISE safe_send(10, 1) = 1
2018-05-29 22:25:26.863 3631 NOISE resync: done=8, parse=8, recv=8
2018-05-29 22:25:26.863 3631 DEBUG C-0x1f59660: (nodb)/(nouser)@[::1]:51766 P: got connection: ::1:51766 -> ::1:6432
2018-05-29 22:25:26.863 3631 NOISE safe_accept(8) = Resource temporarily unavailable
2018-05-29 22:25:26.865 3631 NOISE resync: done=0, parse=0, recv=0
2018-05-29 22:25:26.865 3631 NOISE safe_recv(10, 4096) = 88
2018-05-29 22:25:26.865 3631 NOISE C-0x1f59660: (nodb)/(nouser)@[::1]:51766 pkt='!' len=88
2018-05-29 22:25:26.865 3631 DEBUG C-0x1f59660: (nodb)/(nouser)@[::1]:51766 got var: user=amantzio
2018-05-29 22:25:26.865 3631 DEBUG C-0x1f59660: (nodb)/(nouser)@[::1]:51766 got var: database=dynacom
2018-05-29 22:25:26.865 3631 DEBUG C-0x1f59660: (nodb)/(nouser)@[::1]:51766 using application_name: psql
2018-05-29 22:25:26.865 3631 DEBUG C-0x1f59660: (nodb)/(nouser)@[::1]:51766 got var: client_encoding=SQL_ASCII
2018-05-29 22:25:26.865 3631 LOG C-0x1f59660: dynacom/amantzio@[::1]:51766 login attempt: db=dynacom user=amantzio tls=no
2018-05-29 22:25:26.865 3631 NOISE safe_send(10, 9) = 9
2018-05-29 22:25:26.865 3631 NOISE resync: done=88, parse=88, recv=88
2018-05-29 22:25:26.865 3631 NOISE resync: done=0, parse=0, recv=0
2018-05-29 22:25:26.866 3631 NOISE safe_recv(10, 4096) = 0
2018-05-29 22:25:26.866 3631 LOG C-0x1f59660: dynacom/amantzio@[::1]:51766 closing because: client unexpected eof (age=0)
2018-05-29 22:25:26.866 3631 NOISE safe_close(10) = 0

while ,
psql -h localhost -p 6432 -U aalexiadis dynacom
psql: ERROR: pgbouncer cannot connect to server

does not even prompt for password and log gives

2018-05-29 22:27:28.543 3631 NOISE safe_accept(8) = 10 (::1/52780)
2018-05-29 22:27:28.543 3631 NOISE new fd from accept=10
2018-05-29 22:27:28.543 3631 NOISE resync: done=0, parse=0, recv=0
2018-05-29 22:27:28.543 3631 NOISE safe_recv(10, 4096) = 8
2018-05-29 22:27:28.543 3631 NOISE C-0x1f59660: (nodb)/(nouser)@[::1]:52780 pkt='!' len=8
2018-05-29 22:27:28.543 3631 NOISE C-0x1f59660: (nodb)/(nouser)@[::1]:52780 C: req SSL
2018-05-29 22:27:28.543 3631 NOISE C-0x1f59660: (nodb)/(nouser)@[::1]:52780 P: nak
2018-05-29 22:27:28.543 3631 NOISE safe_send(10, 1) = 1
2018-05-29 22:27:28.543 3631 NOISE resync: done=8, parse=8, recv=8
2018-05-29 22:27:28.543 3631 DEBUG C-0x1f59660: (nodb)/(nouser)@[::1]:52780 P: got connection: ::1:52780 -> ::1:6432
2018-05-29 22:27:28.543 3631 NOISE safe_accept(8) = Resource temporarily unavailable
2018-05-29 22:27:28.544 3631 NOISE resync: done=0, parse=0, recv=0
2018-05-29 22:27:28.544 3631 NOISE safe_recv(10, 4096) = 90
2018-05-29 22:27:28.544 3631 NOISE C-0x1f59660: (nodb)/(nouser)@[::1]:52780 pkt='!' len=90
2018-05-29 22:27:28.544 3631 DEBUG C-0x1f59660: (nodb)/(nouser)@[::1]:52780 got var: user=aalexiadis
2018-05-29 22:27:28.544 3631 DEBUG C-0x1f59660: (nodb)/(nouser)@[::1]:52780 got var: database=dynacom
2018-05-29 22:27:28.544 3631 DEBUG C-0x1f59660: (nodb)/(nouser)@[::1]:52780 using application_name: psql
2018-05-29 22:27:28.544 3631 DEBUG C-0x1f59660: (nodb)/(nouser)@[::1]:52780 got var: client_encoding=SQL_ASCII
2018-05-29 22:27:28.544 3631 LOG C-0x1f59660: dynacom/aalexiadis@[::1]:52780 login attempt: db=dynacom user=aalexiadis tls=no
2018-05-29 22:27:28.544 3631 LOG C-0x1f59660: dynacom/aalexiadis@[::1]:52780 closing because: pgbouncer cannot connect to server (age=0)
2018-05-29 22:27:28.544 3631 WARNING C-0x1f59660: dynacom/aalexiadis@[::1]:52780 Pooler Error: pgbouncer cannot connect to server
2018-05-29 22:27:28.544 3631 NOISE safe_send(10, 56) = 56
2018-05-29 22:27:28.544 3631 NOISE safe_close(10) = 0
2018-05-29 22:27:28.544 3631 NOISE S-0x1f52b10: dynacom/aalexiadis@(bad-af):0 inet socket: smadev.internal.net
2018-05-29 22:27:28.544 3631 NOISE S-0x1f52b10: dynacom/aalexiadis@(bad-af):0 dns socket: smadev.internal.net
2018-05-29 22:27:28.544 3631 NOISE dns: ttl over: smadev.internal.net
2018-05-29 22:27:28.544 3631 NOISE dns: ares_gethostbyname(smadev.internal.net)
2018-05-29 22:27:28.544 3631 NOISE dns: xares_host_cb(smadev.internal.net)=Successful completion
2018-05-29 22:27:28.544 3631 NOISE DNS: smadev.internal.net[0] = 10.9.200.131:0 [STREAM]
2018-05-29 22:27:28.544 3631 NOISE dns: deliver_info(smadev.internal.net) addr=10.9.200.131:0
2018-05-29 22:27:28.544 3631 DEBUG S-0x1f52b10: dynacom/aalexiadis@(bad-af):0 dns_callback: inet4: 10.9.200.131:5432
2018-05-29 22:27:28.544 3631 DEBUG S-0x1f52b10: dynacom/aalexiadis@10.9.200.131:5432 launching new connection to server
2018-05-29 22:27:28.544 3631 NOISE connect(10, 10.9.200.131:5432) = Operation now in progress
2018-05-29 22:27:28.544 3631 DEBUG S-0x1f52b10: dynacom/aalexiadis@10.9.200.131:5432 S: connect ok
2018-05-29 22:27:28.544 3631 LOG S-0x1f52b10: dynacom/aalexiadis@10.9.200.131:5432 new connection to server (from 172.16.0.3:17398)
2018-05-29 22:27:28.544 3631 NOISE S-0x1f52b10: dynacom/aalexiadis@10.9.200.131:5432 P: startup
2018-05-29 22:27:28.544 3631 NOISE safe_send(10, 42) = 42
2018-05-29 22:27:28.546 3631 NOISE resync: done=0, parse=0, recv=0
2018-05-29 22:27:28.546 3631 NOISE safe_recv(10, 4096) = 9
2018-05-29 22:27:28.546 3631 NOISE S-0x1f52b10: dynacom/aalexiadis@10.9.200.131:5432 S: pkt 'R', len=9
2018-05-29 22:27:28.546 3631 DEBUG S-0x1f52b10: dynacom/aalexiadis@10.9.200.131:5432 calling login_answer
2018-05-29 22:27:28.546 3631 DEBUG S-0x1f52b10: dynacom/aalexiadis@10.9.200.131:5432 S: req cleartext password
2018-05-29 22:27:28.546 3631 DEBUG S-0x1f52b10: dynacom/aalexiadis@10.9.200.131:5432 P: send clear password
2018-05-29 22:27:28.546 3631 NOISE safe_send(10, 6) = 6
2018-05-29 22:27:28.546 3631 NOISE resync: done=9, parse=9, recv=9
2018-05-29 22:27:28.546 3631 NOISE resync: done=0, parse=0, recv=0
2018-05-29 22:27:28.546 3631 NOISE safe_recv(10, 4096) = 97
2018-05-29 22:27:28.546 3631 NOISE S-0x1f52b10: dynacom/aalexiadis@10.9.200.131:5432 S: pkt 'E', len=97
2018-05-29 22:27:28.546 3631 ERROR S: login failed: FATAL: empty password returned by client
2018-05-29 22:27:28.546 3631 LOG S-0x1f52b10: dynacom/aalexiadis@10.9.200.131:5432 closing because: login failed (age=0)
2018-05-29 22:27:28.546 3631 NOISE safe_send(10, 5) = 5
2018-05-29 22:27:28.546 3631 NOISE safe_close(10) = 0

Apparently some state in the 2nd case is seriously broken. If i restart pgbouncer it resumes normal operation.
I'll post the scripts that can reproduce this problem shortly.

@svb007

This comment has been minimized.

Copy link

@svb007 svb007 commented May 30, 2018

You may be able to benefit from my patch in #276 called cannot_connect_to_server.patch.txt . It is a workaround that basically causes pgbouncer to re-prompt for the password after "server_login_retry" seconds, and not go into this endless loop where it obviously has lost the password, and cannot simply do a reconnect. Basically pgbouncer is disconnecting clients quickly with "Cannot connect to server", and in the background it tries to re-establish a connection to the database, but it seems that something has caused pgbouncer at this stage to lose the password for this user. But instead of it just simply prompting for a new password, it indefinitely disconnects with "cannot connect to server" until you restart pgbouncer.

(EDIT) As in my case, you might have "broken" pgbouncer slightly with your patch

(PS - just to clarify, I'm not a pgbouncer developer, just a user)

@achix

This comment has been minimized.

Copy link
Author

@achix achix commented May 30, 2018

Hello,
the problem is there with the stock code. But OTOH it is documented : https://pgbouncer.github.io/config.html that playing around with query_wait_timeout (as in my case) might cause unexpected results.
Might be a combination of query_wait_timeout being too low and client_login_timeout much higher. I'll do more tests. I'll take a look at your patch as well, thanks for sharing :)

@achix

This comment has been minimized.

Copy link
Author

@achix achix commented May 30, 2018

I get the problem consistently with values like

query_wait_timeout = 0.1
client_login_timeout = 10

whereas the problem does not happen either with high values for query_wait_timeout or lower values for client_login_timeout.

@fvannee

This comment has been minimized.

Copy link

@fvannee fvannee commented May 31, 2018

I am having exactly the same issue when putting pgbouncer under heavy load. Also with PAM authentication on PG10/pgbouncer 1.8.1
I do not have any modifications to the source code though. Settings are default except for
max_client_conn = 200 default_pool_size = 5
When trying to connect in parallel with several processes to pgbouncer and run a query:

2018-05-31 11:29:37.952 54162 LOG C-0x1b71240: rosmerta_vof/florisvannee@10.231.40.70:39796 login attempt: db=rosmerta_vof user=florisvannee tls=TLSv1.2/ECDHE-RSA-AES128-GCM-SHA256/ECDH=prime256v1
2018-05-31 11:29:37.953 54162 LOG C-0x1b713d8: rosmerta_vof/florisvannee@10.231.40.70:39798 login attempt: db=rosmerta_vof user=florisvannee tls=TLSv1.2/ECDHE-RSA-AES128-GCM-SHA256/ECDH=prime256v1
2018-05-31 11:29:37.964 54162 LOG C-0x1b71570: rosmerta_vof/florisvannee@10.231.40.70:39800 login attempt: db=rosmerta_vof user=florisvannee tls=TLSv1.2/ECDHE-RSA-AES128-GCM-SHA256/ECDH=prime256v1
2018-05-31 11:29:37.968 54162 LOG C-0x1b71708: rosmerta_vof/florisvannee@10.231.40.70:39802 login attempt: db=rosmerta_vof user=florisvannee tls=TLSv1.2/ECDHE-RSA-AES128-GCM-SHA256/ECDH=prime256v1
2018-05-31 11:29:38.224 54162 LOG S-0x1b6bd90: rosmerta_vof/florisvannee@unix:5432 new connection to server
2018-05-31 11:29:38.248 54162 LOG S-0x1b6bf28: rosmerta_vof/florisvannee@unix:5432 new connection to server
2018-05-31 11:29:38.276 54162 LOG S-0x1b6c0c0: rosmerta_vof/florisvannee@unix:5432 new connection to server
2018-05-31 11:29:38.310 54162 LOG S-0x1b6c258: rosmerta_vof/florisvannee@unix:5432 new connection to server
2018-05-31 11:29:38.836 54162 LOG C-0x1b71708: rosmerta_vof/florisvannee@10.231.40.70:39802 closing because: client close request (age=0)
2018-05-31 11:29:38.836 54162 LOG S-0x1b6bd90: rosmerta_vof/florisvannee@unix:5432 closing because: unclean server (age=0)
2018-05-31 11:29:38.880 54162 LOG C-0x1b713d8: rosmerta_vof/florisvannee@10.231.40.70:39798 closing because: client close request (age=0)
2018-05-31 11:29:38.880 54162 LOG S-0x1b6bf28: rosmerta_vof/florisvannee@unix:5432 closing because: unclean server (age=0)
2018-05-31 11:29:38.894 54162 LOG C-0x1b71570: rosmerta_vof/florisvannee@10.231.40.70:39800 closing because: client close request (age=0)
2018-05-31 11:29:38.894 54162 LOG S-0x1b6c0c0: rosmerta_vof/florisvannee@unix:5432 closing because: unclean server (age=0)
2018-05-31 11:29:38.942 54162 LOG C-0x1b71240: rosmerta_vof/florisvannee@10.231.40.70:39796 closing because: client close request (age=1)
2018-05-31 11:29:38.942 54162 LOG S-0x1b6c258: rosmerta_vof/florisvannee@unix:5432 closing because: unclean server (age=0)
2018-05-31 11:29:38.944 54162 LOG C-0x1b71570: rosmerta_vof/florisvannee@10.231.40.70:39804 login attempt: db=rosmerta_vof user=florisvannee tls=TLSv1.2/ECDHE-RSA-AES128-GCM-SHA256/ECDH=prime256v1
2018-05-31 11:29:38.979 54162 LOG S-0x1b6c258: rosmerta_vof/florisvannee@unix:5432 new connection to server
2018-05-31 11:29:38.986 54162 LOG C-0x1b71240: rosmerta_vof/florisvannee@10.231.40.70:39806 login attempt: db=rosmerta_vof user=florisvannee tls=TLSv1.2/ECDHE-RSA-AES128-GCM-SHA256/ECDH=prime256v1
2018-05-31 11:29:39.005 54162 LOG C-0x1b713d8: rosmerta_vof/florisvannee@10.231.40.70:39808 login attempt: db=rosmerta_vof user=florisvannee tls=TLSv1.2/ECDHE-RSA-AES128-GCM-SHA256/ECDH=prime256v1
2018-05-31 11:29:39.013 54162 LOG S-0x1b6c0c0: rosmerta_vof/florisvannee@unix:5432 new connection to server
2018-05-31 11:29:39.034 54162 LOG C-0x1b71708: rosmerta_vof/florisvannee@10.231.40.70:39810 login attempt: db=rosmerta_vof user=florisvannee tls=TLSv1.2/ECDHE-RSA-AES128-GCM-SHA256/ECDH=prime256v1
2018-05-31 11:29:39.048 54162 LOG S-0x1b6bf28: rosmerta_vof/florisvannee@unix:5432 new connection to server
2018-05-31 11:29:39.080 54162 LOG S-0x1b6bd90: rosmerta_vof/florisvannee@unix:5432 new connection to server
2018-05-31 11:29:39.633 54162 LOG C-0x1b71240: rosmerta_vof/florisvannee@10.231.40.70:39806 closing because: client close request (age=0)
2018-05-31 11:29:39.633 54162 LOG S-0x1b6c0c0: rosmerta_vof/florisvannee@unix:5432 closing because: unclean server (age=0)
2018-05-31 11:29:39.640 54162 LOG C-0x1b71570: rosmerta_vof/florisvannee@10.231.40.70:39804 closing because: client close request (age=0)
2018-05-31 11:29:39.640 54162 LOG S-0x1b6c258: rosmerta_vof/florisvannee@unix:5432 closing because: unclean server (age=0)
2018-05-31 11:29:39.671 54162 LOG C-0x1b713d8: rosmerta_vof/florisvannee@10.231.40.70:39808 closing because: client close request (age=0)
2018-05-31 11:29:39.671 54162 LOG S-0x1b6bf28: rosmerta_vof/florisvannee@unix:5432 closing because: unclean server (age=0)
2018-05-31 11:29:39.692 54162 LOG C-0x1b71708: rosmerta_vof/florisvannee@10.231.40.70:39810 closing because: client close request (age=0)
2018-05-31 11:29:39.692 54162 LOG S-0x1b6bd90: rosmerta_vof/florisvannee@unix:5432 closing because: unclean server (age=0)
2018-05-31 11:29:39.749 54162 LOG C-0x1b71708: rosmerta_vof/florisvannee@10.231.40.70:39812 login attempt: db=rosmerta_vof user=florisvannee tls=TLSv1.2/ECDHE-RSA-AES128-GCM-SHA256/ECDH=prime256v1
2018-05-31 11:29:39.755 54162 LOG C-0x1b713d8: rosmerta_vof/florisvannee@10.231.40.70:39814 login attempt: db=rosmerta_vof user=florisvannee tls=TLSv1.2/ECDHE-RSA-AES128-GCM-SHA256/ECDH=prime256v1
2018-05-31 11:29:39.783 54162 LOG S-0x1b6bd90: rosmerta_vof/florisvannee@unix:5432 new connection to server
2018-05-31 11:29:39.802 54162 LOG C-0x1b71570: rosmerta_vof/florisvannee@10.231.40.70:39816 login attempt: db=rosmerta_vof user=florisvannee tls=TLSv1.2/ECDHE-RSA-AES128-GCM-SHA256/ECDH=prime256v1
2018-05-31 11:29:39.811 54162 LOG C-0x1b71240: rosmerta_vof/florisvannee@10.231.40.70:39818 login attempt: db=rosmerta_vof user=florisvannee tls=TLSv1.2/ECDHE-RSA-AES128-GCM-SHA256/ECDH=prime256v1
2018-05-31 11:29:39.812 54162 LOG S-0x1b6bf28: rosmerta_vof/florisvannee@unix:5432 new connection to server
2018-05-31 11:29:39.814 54162 ERROR S: login failed: FATAL: empty password returned by client
2018-05-31 11:29:39.814 54162 LOG S-0x1b6bf28: rosmerta_vof/florisvannee@unix:5432 closing because: login failed (age=0)
2018-05-31 11:29:40.397 54162 LOG C-0x1b71708: rosmerta_vof/florisvannee@10.231.40.70:39812 closing because: client close request (age=0)
2018-05-31 11:29:40.397 54162 LOG S-0x1b6bd90: rosmerta_vof/florisvannee@unix:5432 closing because: unclean server (age=0)
2018-05-31 11:29:40.480 54162 LOG C-0x1b71708: rosmerta_vof/florisvannee@10.231.40.70:39820 login attempt: db=rosmerta_vof user=florisvannee tls=TLSv1.2/ECDHE-RSA-AES128-GCM-SHA256/ECDH=prime256v1
2018-05-31 11:29:40.480 54162 LOG C-0x1b71708: rosmerta_vof/florisvannee@10.231.40.70:39820 closing because: pgbouncer cannot connect to server (age=0)
2018-05-31 11:29:40.480 54162 WARNING C-0x1b71708: rosmerta_vof/florisvannee@10.231.40.70:39820 Pooler Error: pgbouncer cannot connect to server
2018-05-31 11:29:40.482 54162 LOG C-0x1b71708: rosmerta_vof/florisvannee@10.231.40.70:39822 login attempt: db=rosmerta_vof user=florisvannee tls=no
2018-05-31 11:29:40.482 54162 LOG C-0x1b71708: rosmerta_vof/florisvannee@10.231.40.70:39822 closing because: pgbouncer cannot connect to server (age=0)
2018-05-31 11:29:40.482 54162 WARNING C-0x1b71708: rosmerta_vof/florisvannee@10.231.40.70:39822 Pooler Error: pgbouncer cannot connect to server
2018-05-31 11:29:40.697 54162 LOG C-0x1b71708: rosmerta_vof/florisvannee@10.231.40.70:39824 login attempt: db=rosmerta_vof user=florisvannee tls=TLSv1.2/ECDHE-RSA-AES128-GCM-SHA256/ECDH=prime256v1

@achix

This comment has been minimized.

Copy link
Author

@achix achix commented May 31, 2018

@svb007 I shortly tested your patch and it seems to withstand the stress tests. Do you have any performance numbers in relation to various server_login_retry setting? In theory the bigger server_login_retry is the closer we get to the stock code behavior (more rejections). The smaller it is more chances for actually served connections. But, I think I witnessed very high load with swapping/thrashing with default server_login_retry. Reducing it the system got more responsive. I'll have to verify this. I'll try to study the pgbouncer code a little bit and perhaps come back with some tests and numbers regarding load, throughput.

@svb007

This comment has been minimized.

Copy link

@svb007 svb007 commented May 31, 2018

In my case, I kept it at the default of 15 seconds, because when there is a real outage, I want clients to be disconnected quickly, but to prompt for the password again at least every 15 seconds. In my environment that is acceptable, because all we want is for services to recover after an outage, without needing to restart pgbouncer. If you set it very low, just make sure you test a real outage scenario to make sure pgbouncer doesn't get overloaded with connection requests. TBH, I think if pgbouncer can detect the error "empty password returned by client" (and others like it) while doing launch_new_connection(pool) (objects.c line 602) , it should immediately prompt for the password again. That should give the quickest recovery possible.

@achix

This comment has been minimized.

Copy link
Author

@achix achix commented Jun 4, 2018

The problem can be reproduced :

  1. Just launch it with gdb
  2. set a breakpoint at objects.c:595
  3. connect to the database via psql
  4. hit cont (or run) giving passwords etc.. until you get a valid psql prompt.
  5. Then issue any valid SQL command (e.g. \d ).
  6. it should stop at the breakpoint again. - make sure to wait there for a time interval longer than query_wait_timeout
  7. after this time elapses hit cont (or run). This will fail with :

=> \d ERROR: query_wait_timeout server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed.
8) this will cause the specific database/user pool to go in a broken mode, corrected only with restarting pgbouncer.

@achix

This comment has been minimized.

Copy link
Author

@achix achix commented Jun 4, 2018

A much simpler version of the above without gdb :

  1. Fresh start pgbouncer, connect via psql (port 6432) to the database, giving credentials, etc, you want to get valid psql prompt
  2. from another term just stop postgresql server.
  3. return to psql and give a command ( \d ) . This will fail with :
    ERROR: query_wait_timeout server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed.
  4. From the other term start postgresql
  5. The database/pool is in broken state, keeps sending empty password to the server

It seems to be a PAM problem. In order for the above scenarios to show the problem there has to be : auth_type = pam.

@achix

This comment has been minimized.

Copy link
Author

@achix achix commented Jun 5, 2018

An update to this. This is not empty password. This is password with the first character 0-fied. The rest chars of password are intact. In PAM-related code something is zeroing this first char. (and this is not disable_users in loader.c as some might assume). Still looking into it. If anyone is listening and could help track down the code that does this would be great.
Basically how this should work is that some later event (release_server) would clear pool->last_connect_failed making it possible for subsequent calls to check_fast_fail to succeed.

@svb007

This comment has been minimized.

Copy link

@svb007 svb007 commented Jun 6, 2018

This patch seems to fix your specific issue :
pam_password.patch.txt

The reason seems to be in client.c line 483 where set_pool is called without a password, which results in PAM mode, in the password being totally erased:

    /* find pool */
    return set_pool(client, dbname, username, "", false);
@achix

This comment has been minimized.

Copy link
Author

@achix achix commented Jun 6, 2018

That seems to work, but is it safe? isn't this code there (passwd="") for a reason? Maybe it has to do with postponing PAM password handling at a later phase.

@svb007

This comment has been minimized.

Copy link

@svb007 svb007 commented Jun 6, 2018

Not sure, and I also tested now, if you manage to change the password while the server is down, the old behavior will be back, unless you also have the cannot_connect_to_server.patch.txt in place.
What I do know, is that every other AUTH type in the set_pool function does not overwrite the password. It seems like the correct fix to me.

@fvannee

This comment has been minimized.

Copy link

@fvannee fvannee commented Jun 6, 2018

Patch seems to work great! Thanks!

@achix

This comment has been minimized.

Copy link
Author

@achix achix commented Jun 6, 2018

It seems that both patches are needed, but then the last patch never does any job because of cf_server_login_retry. If cf_server_login_retry elapses then the first patch will do the job anyway.

@achix

This comment has been minimized.

Copy link
Author

@achix achix commented Jun 7, 2018

As @svb007 pointed out, it should be wrong to cache PAM passwords indefinitely. Here is a slight variation of the first patch (which does the right thing for PAM) but only for the case of PAM, which means the other auth types remain unaffected.
PAM_empty_passwd_patch.txt
It seems to solve the problem without sacrificing any of the rest of functionalities.

@achix

This comment has been minimized.

Copy link
Author

@achix achix commented Jul 2, 2018

Just an update. We went live. After one week (today), and after we faced some problems (non fatal) I applied the last patch of @svb007 (pam_password.patch.txt) as well. Tested it, it doesn't cache passwords in PAM mode. Looks good after one day. (fingers crossed)
THANX!

@achix achix changed the title FATAL: empty password returned by client FATAL: empty password returned by client when auth_type = pam Jan 20, 2019
@achix

This comment has been minimized.

Copy link
Author

@achix achix commented Jan 20, 2019

Hello people, its been almost a year, any reason those two patches not applied yet to master?

@achix

This comment has been minimized.

Copy link
Author

@achix achix commented Sep 10, 2019

Great news!
@svb007 it seems we got good news on this!

@petere petere closed this in b30b49f Sep 10, 2019
@svb007

This comment has been minimized.

Copy link

@svb007 svb007 commented Sep 10, 2019

@achix that certainly is great news. Thanks!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.