Large number of incoming connections blocks pgBouncer completely #1054

Open
qris opened this issue Apr 24, 2024 · 4 comments

Comments

@qris

qris commented Apr 24, 2024

When the PAM authentication queue is full, pgBouncer sleeps and stops servicing the event loop:

usleep(PAM_QUEUE_WAIT_SLEEP_MCS);

If we can't run the event loop instead, then I think the slog_debug above should be an slog_warning to let the admin know that something is up.
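One way to avoid sleeping when the queue is full is to make the enqueue step non-blocking and let the caller retry from the event loop. The following is a minimal single-threaded sketch of that idea, not PgBouncer's actual code: `struct auth_queue`, `queue_try_push`, `queue_try_pop`, and `QUEUE_SIZE` are hypothetical names, and a real queue shared with a worker thread would also need synchronization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not PgBouncer's actual types or names. */
#define QUEUE_SIZE 20

struct auth_queue {
    int items[QUEUE_SIZE];  /* request ids waiting for a worker */
    size_t head;            /* index of the oldest queued item */
    size_t count;           /* number of queued items */
};

/*
 * Try to enqueue without blocking. Returns false when the queue is full,
 * so the caller can park the request and return to the event loop
 * instead of sleeping.
 */
bool queue_try_push(struct auth_queue *q, int request_id)
{
    assert(q != NULL);
    if (q->count == QUEUE_SIZE)
        return false;
    q->items[(q->head + q->count) % QUEUE_SIZE] = request_id;
    q->count++;
    return true;
}

/* Remove the oldest item. Returns false when the queue is empty. */
bool queue_try_pop(struct auth_queue *q, int *request_id)
{
    assert(q != NULL && request_id != NULL);
    if (q->count == 0)
        return false;
    *request_id = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_SIZE;
    q->count--;
    return true;
}
```

With this shape, a full queue becomes a return value the caller can react to (for example by retrying on a later event loop iteration) rather than a pause of the whole process.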

@eulerto
Member

eulerto commented Apr 24, 2024

You are not the first to complain about it. This message can help users detect that their authentication service cannot keep up with the storm of authentication requests. On the other hand, this message will be printed on every new authentication request until the queue length drops below PAM_REQUEST_QUEUE_SIZE (20). I'm probably worrying too much, because 100 ms (PAM_QUEUE_WAIT_SLEEP_MCS) pauses seem sufficient to alleviate the pressure on the authentication service.

What is your authentication service? Are you observing long pauses because of the current behavior?
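The concern about the warning being printed on every new request could be addressed by rate-limiting it. Below is a small sketch of that approach under stated assumptions: `struct warn_limiter`, `warn_allowed`, and the one-second `WARN_INTERVAL_USEC` are hypothetical and are not taken from PgBouncer or proposed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interval; the thread does not propose a specific value. */
#define WARN_INTERVAL_USEC 1000000

struct warn_limiter {
    uint64_t last_warn_usec;  /* time of the last emitted warning */
    bool warned_once;         /* false until the first warning */
};

/* Returns true if a warning should be emitted at time now_usec. */
bool warn_allowed(struct warn_limiter *lim, uint64_t now_usec)
{
    assert(lim != NULL);
    if (lim->warned_once &&
        now_usec - lim->last_warn_usec < WARN_INTERVAL_USEC)
        return false;  /* too soon since the previous warning */
    lim->warned_once = true;
    lim->last_warn_usec = now_usec;
    return true;
}
```

The caller would check `warn_allowed()` before logging, so a sustained burst of requests against a full queue produces at most one warning per interval.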

@qris
Author

qris commented Apr 24, 2024

Thanks, we’re using LDAP to an AD server. I’m not really sure how to benchmark that part (PAM requests), but you’re probably right that they should be fast. However, I’m finding that I can overload it with 40+ new connections, so I have increased that queue size locally.

@qris
Author

qris commented Apr 25, 2024

If I send 100 connections directly to Postgres, they queue and are accepted in turn, and it takes about 9 seconds for the last one to succeed (so perhaps 90 ms per connection including the auth round trip). If I send 100 connections to pgBouncer, nothing at all happens for about 12 seconds (and new connections are also blocked for ~12 seconds), and then they all succeed over the next ~3 seconds.

@JelteF
Member

JelteF commented Apr 29, 2024

Sleeping on PgBouncer's main thread should really be a no-go. Afaict a flood of auth requests can easily cause queries on already established connections not to go through this way.

The whole locking logic of this piece of code doesn't make much sense to me either. request->status is being assigned and read from different threads without any locks or atomic operations. I'm honestly surprised it's working as well as it is. I expect these missing locks are the cause of the slowness issues you're seeing. The main thread is probably reading outdated values in pam_poll because pam_auth_worker is writing to request->status without a lock.
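To illustrate the kind of synchronization being described, here is a minimal sketch using C11 atomics with release/acquire ordering. It is not PgBouncer's code: `struct auth_request`, `enum auth_status`, `auth_worker_finish`, and `auth_poll` are hypothetical stand-ins, and whether the missing synchronization actually explains the observed delay is not established in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical states and struct; not PgBouncer's actual definitions. */
enum auth_status { AUTH_PENDING, AUTH_OK, AUTH_FAILED };

struct auth_request {
    int result_code;    /* plain field, written by the worker only */
    atomic_int status;  /* shared between worker and main thread */
};

void auth_request_init(struct auth_request *req)
{
    assert(req != NULL);
    req->result_code = 0;
    atomic_init(&req->status, AUTH_PENDING);
}

/* Worker thread: write the result first, then publish the status. */
void auth_worker_finish(struct auth_request *req, int result_code)
{
    req->result_code = result_code;
    /* Release: the result_code write is visible before the new status. */
    atomic_store_explicit(&req->status,
                          result_code == 0 ? AUTH_OK : AUTH_FAILED,
                          memory_order_release);
}

/* Main thread: non-blocking check, safe to call from an event loop. */
enum auth_status auth_poll(struct auth_request *req, int *result_code)
{
    /* Acquire: pairs with the release store in auth_worker_finish(). */
    int status = atomic_load_explicit(&req->status, memory_order_acquire);
    if (status != AUTH_PENDING && result_code != NULL)
        *result_code = req->result_code;
    return (enum auth_status)status;
}
```

The release store paired with the acquire load guarantees that once the main thread observes a non-pending status, it also observes the result the worker wrote beforehand. A mutex around both accesses would give the same guarantee at the cost of possible blocking.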
