
Server connect rate-limit / preallocating reserved connections (hot spares)? #478

Open
jeffdoering opened this issue Apr 17, 2020 · 2 comments


@jeffdoering

We are seeing application-induced login storms against our PG database: applications with aggressive timeouts drop their connections to the bouncer when queries temporarily get slower (for various reasons). That part is more or less fine, although not ideal.

Unfortunately, the dropped client connections lead the bouncer to shut down server connections, logging closing because: client unexpected eof (age=3413s). Establishing new connections is expensive on PG, which is why the bouncer is so valuable to us. In this case the situation spirals: clients drop connections due to slowness, the spike of new connections to the DB drives up its CPU, slowing things further, and the bouncer's pool of server connections is essentially torn down.

As the connection pool shrinks, clients hit hard timeout failures rather than killing further connections, so the pool starts to grow again, but it struggles to open new connections successfully because the high PG CPU load makes everything slow, and the cycle repeats.

Ultimately the original slow-down is the root problem, but the login storms appear to significantly amplify the impact and cause the system to thrash.

We are looking at improving the client logic to not drop connections so aggressively, since that causes the loss of the backend connection. However, that may be unavoidable in various cases.

We are interested in whether there are configuration patterns that accomplish something like a rate-limit on new connections to PG (letting clients time out waiting for connections if needed), and ideally a mechanism to slowly pre-allocate hot-spare connections that can replace dropped ones. That way, as long as the initial slowdown and connection loss are small enough, they can be immediately mitigated by draining the pool of hot spares, which is then slowly replenished.

I don't think pgbouncer provides this pattern now, but I am interested in options for mitigating it, assuming we will have transient slowdowns and want to protect the overall infrastructure.

@eulerto
Member

eulerto commented May 30, 2020

It seems you should tune the server_idle_timeout parameter to retain connections for a longer period. Another idea is to set min_pool_size so that you always have at least that number of open connections.
You did not provide details about your setup, but if you have set reserve_pool_size and all regular pool connections are in use, it will take some time to open the connections from this "reserve pool".
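Putting those suggestions together, a minimal pgbouncer.ini sketch might look like the following. The parameter names are real pgbouncer settings; the database entry, host, and all numeric values are placeholder assumptions to be tuned for the actual workload:

```ini
[databases]
; placeholder database entry
mydb = host=10.0.0.5 dbname=mydb

[pgbouncer]
; keep idle server connections around much longer so a burst of
; client disconnects does not tear the whole pool down
server_idle_timeout = 3600

; always keep at least this many server connections open per pool,
; which approximates a floor of "hot spare" connections
min_pool_size = 20

; extra connections allowed once the normal pool is exhausted,
; used only after clients have waited reserve_pool_timeout seconds
reserve_pool_size = 5
reserve_pool_timeout = 3
```

Note that min_pool_size keeps connections warm but does not rate-limit how fast new ones are opened after a storm tears them down.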

@markokr
Contributor

markokr commented May 30, 2020

The mistake here seems to be the clients simply closing slow connections. They should instead keep the connections open and send a cancel request. That signals to the Postgres backends that the client has lost interest in the result and that they should stop.

If, whenever the database gets slow, clients start launching additional backends running the same query, without waiting for the result of the first one and without canceling it, there cannot be a good outcome.
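For reference, the cancel request markokr describes is a small, fixed-format message in the PostgreSQL wire protocol: the client opens a *new* connection and sends a 16-byte CancelRequest packet containing a magic code plus the target backend's process ID and secret key (both received in the BackendKeyData message at startup). Most drivers expose this directly (e.g. psycopg2's connection.cancel()), so you rarely build it by hand; the sketch below just shows the packet layout, with placeholder pid/key values:

```python
import struct

# Magic CancelRequest code from the PostgreSQL frontend/backend protocol:
# the value (1234 << 16) | 5678, chosen to never collide with a protocol
# version number.
CANCEL_REQUEST_CODE = 80877102

def build_cancel_request(backend_pid: int, secret_key: int) -> bytes:
    """Build the 16-byte CancelRequest packet: Int32 length (16),
    Int32 cancel code, Int32 pid, Int32 key, in network byte order."""
    return struct.pack("!iiii", 16, CANCEL_REQUEST_CODE, backend_pid, secret_key)

# placeholder pid/secret; real values come from BackendKeyData
packet = build_cancel_request(backend_pid=4242, secret_key=123456789)
print(len(packet))  # 16
```

Because the cancel goes over a separate connection, the client can keep its original pooled connection open while the server abandons the slow query, which is exactly what avoids the teardown/reconnect storm described in this issue.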
