
Pgbouncer has 0 active and idle server connections when max_client_conn is reached #1070

Avinodh opened this issue May 14, 2024 · 2 comments

Avinodh commented May 14, 2024

I have a service using the following Pgbouncer configs:

[databases]
    <db_client> = host=<rds-db-endpoint> port=5432 dbname=<db> pool_size=200 pool_mode=transaction
    
[pgbouncer]
listen_addr=0.0.0.0
pidfile=/home/pgbouncer/pid
ignore_startup_parameters=extra_float_digits
unix_socket_dir=/tmp
auth_type=md5
auth_file=<auth_file>
pool_mode=transaction
server_round_robin=1
server_check_query=select 1
dns_max_ttl=2
dns_nxdomain_ttl=2
default_pool_size=70
stats_users=stats
admin_users=admin
log_connections=1
log_disconnections=1
listen_port=5432
max_client_conn=600
min_pool_size=0
pkt_buf=4096
reserve_pool_size=0
reserve_pool_timeout=0
server_lifetime=300
server_reset_query=
tcp_keepalive=1
tcp_keepidle=30
tcp_keepintvl=10
tcp_keepcnt=3
tcp_user_timeout=60000

There are 9 instances of this Pgbouncer running behind a load balancer. Further elaborating on some of the configs (a sketch for confirming the effective values on a running instance follows this list):

  1. max_client_conn = 600 (total client connections across the fleet = 600 * 9 = 5400)
  2. pool_size = 200 (total server connections from pgbouncer to DB = 200 * 9 = 1800)
  3. server_lifetime = 300
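
For completeness, these effective per-instance values can be confirmed on each pgbouncer's admin console; a sketch, assuming the console is reachable with the admin user from the config above (host is a placeholder):

    -- connect with: psql -h <pgbouncer-host> -p 5432 -U admin pgbouncer
    SHOW DATABASES;  -- per-database pool_size, pool_mode, current_connections
    SHOW CONFIG;     -- effective max_client_conn, default_pool_size, etc.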

I noticed that during a period of increased traffic in which we saturate client connections, i.e., hit max_client_conn on all the pgbouncer instances:

. . .
LOG C-0xaaaad346fd60: (nodb)/(nouser)@ <redacted> closing because: no more connections allowed (max_client_conn) (age=0s)
WARNING C-0xaaaad346fd60: (nodb)/(nouser)@ <redacted> pooler error: no more connections allowed (max_client_conn)
. . .

we also see that the pgbouncer_pools_sv_active and pgbouncer_pools_sv_idle metrics exported by pgbouncer-exporter drop to 0 across all pgbouncer hosts.
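
If I understand the exporter correctly, these gauges mirror the sv_active/sv_idle columns of pgbouncer's SHOW POOLS output, so the same drop should be visible directly on the admin console (same assumptions as the sketch above):

    SHOW POOLS;    -- cl_active, cl_waiting, sv_active, sv_idle, sv_used, sv_tested, sv_login, maxwait
    SHOW SERVERS;  -- one row per server connection, including its current state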

On the DB (AWS RDS), this manifests as significantly increased CPU (30% -> 90%) and a drop from ~30 steady-state DB connections to ~10.

This leaves client connections backed up, with pgbouncer seemingly unable to hand out new server connections. When I churned the pgbouncer instances, i.e., terminated them and brought up new ones, the issue was mitigated: I saw idle + active server connections increase, and the waiting clients drained.

I am looking for advice/ideas on what could potentially cause pgbouncer to get into this state. Specifically:

  1. In what situations can both sv_active and sv_idle metrics drop to 0?
  2. What would make pgbouncer unable to establish server connections even though the DB is reachable and healthy?
  3. What other metrics/connection states can I track which might explain why there were 0 active and idle server connections?

Thanks!


Avinodh commented May 17, 2024

@JelteF - In case you have observed this behavior before.


Avinodh commented May 28, 2024

@JelteF - I saw this previous issue #1054 related to a large number of incoming connections stalling pgbouncer when using PAM authentication. In my case, however, we are just using auth_file-based authentication. Could this still be an issue in this mode?
