
Pgbouncer has 0 active and idle server connections when max_client_conn is reached #1070

Avinodh opened this issue May 14, 2024 · 2 comments

Avinodh commented May 14, 2024

I have a service using the following Pgbouncer configs:

[databases]
    <db_client> = host=<rds-db-endpoint> port=5432 dbname=<db> pool_size=200 pool_mode=transaction
    
[pgbouncer]
listen_addr=0.0.0.0
pidfile=/home/pgbouncer/pid
ignore_startup_parameters=extra_float_digits
unix_socket_dir=/tmp
auth_type=md5
auth_file=<auth_file>
pool_mode=transaction
server_round_robin=1
server_check_query=select 1
dns_max_ttl=2
dns_nxdomain_ttl=2
default_pool_size=70
stats_users=stats
admin_users=admin
log_connections=1
log_disconnections=1
listen_port=5432
max_client_conn=600
min_pool_size=0
pkt_buf=4096
reserve_pool_size=0
reserve_pool_timeout=0
server_lifetime=300
server_reset_query=
tcp_keepalive=1
tcp_keepidle=30
tcp_keepintvl=10
tcp_keepcnt=3
tcp_user_timeout=60000

There are 9 instances of this Pgbouncer running behind a load balancer. Further elaborating on some of the configs (a sketch for confirming the effective values on a running instance follows this list):

  1. max_client_conn = 600 (total client connections across the fleet = 600 * 9 = 5400)
  2. pool_size = 200 (total server connections from pgbouncer to DB = 200 * 9 = 1800)
  3. server_lifetime = 300
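
For completeness, these effective per-instance values can be confirmed on each pgbouncer's admin console; a sketch, assuming the console is reachable with the admin user from the config above (host is a placeholder):

    -- connect with: psql -h <pgbouncer-host> -p 5432 -U admin pgbouncer
    SHOW DATABASES;  -- per-database pool_size, pool_mode, current_connections
    SHOW CONFIG;     -- effective max_client_conn, default_pool_size, etc.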

I noticed that during a period of increased traffic in which we saturate client connections, i.e., hit max_client_conn on all the pgbouncer instances:

. . .
LOG C-0xaaaad346fd60: (nodb)/(nouser)@ <redacted> closing because: no more connections allowed (max_client_conn) (age=0s)
WARNING C-0xaaaad346fd60: (nodb)/(nouser)@ <redacted> pooler error: no more connections allowed (max_client_conn)
. . .

we also see that the pgbouncer_pools_sv_active and pgbouncer_pools_sv_idle metrics exported by pgbouncer-exporter drop to 0 across all pgbouncer hosts.
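
If I understand the exporter correctly, these gauges mirror the sv_active/sv_idle columns of pgbouncer's SHOW POOLS output, so the same drop should be visible directly on the admin console (same assumptions as the sketch above):

    SHOW POOLS;    -- cl_active, cl_waiting, sv_active, sv_idle, sv_used, sv_tested, sv_login, maxwait
    SHOW SERVERS;  -- one row per server connection, including its current state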

On the DB (AWS RDS), this manifests as significantly increased CPU (30% -> 90%) and a drop from ~30 steady-state DB connections to ~10.

This leaves client connections backed up, with pgbouncer seemingly unable to hand out new server connections. When I churned the pgbouncer instances, i.e., terminated them and brought up new ones, the issue was mitigated: I saw idle + active server connections increase, and the waiting clients drained.

I am looking for advice/ideas on what could potentially cause pgbouncer to get into this state. Specifically:

  1. In what situations can both sv_active and sv_idle metrics drop to 0?
  2. What would make pgbouncer unable to establish server connections even though the DB is reachable and healthy?
  3. What other metrics/connection states can I track which might explain why there were 0 active and idle server connections?

Thanks!


Avinodh commented May 17, 2024

@JelteF - In case you have observed this behavior before.


Avinodh commented May 28, 2024

@JelteF - I saw this previous issue #1054 related to a large number of incoming connections stalling pgbouncer when using PAM authentication. In my case, however, we are just using auth_file-based authentication. Could this still be an issue in this mode?
