Dead connection between pgbouncer and the server #138
Have you tried setting the `keepalives_count` connection parameter, or tuning the connection keepalive behaviour in general? (see https://www.postgresql.org/docs/9.5/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS )
Also, pgbouncer supports its own keepalive configuration in the ini file:
Yes, we've tried setting both system-wide and pgbouncer-specific TCP keepalive settings, with no luck. Do you have a suggested setup? I'm just wondering if we made a mistake. Thank you,
I know that the Linux kernel rejects timeouts that are too short. But the successful setup I've run is:

; 4m idle + 1m check

You can try to lower them, but not below the 1-minute range. After the first failures, pgbouncer's fast-fail should kick in, and later clients should get rejected faster.
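As a rough worked example (a sketch, not pgbouncer code): on Linux, an idle connection whose peer has silently vanished is declared dead after roughly `keepidle + keepintvl * keepcnt` seconds. pgbouncer's `tcp_keepidle`, `tcp_keepintvl` and `tcp_keepcnt` settings map onto the kernel's `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT`. The interval and count below are hypothetical, chosen only so that the probing phase adds up to the "1m check" in the comment above.

```python
def keepalive_detection_seconds(keepidle: int, keepintvl: int, keepcnt: int) -> int:
    """Approximate time before an idle, silently dead TCP peer is detected.

    The kernel waits `keepidle` seconds of idleness, then sends up to
    `keepcnt` unanswered probes spaced `keepintvl` seconds apart.
    Caveat: probes are only sent while the connection has no
    unacknowledged data in flight; otherwise retransmission rules
    (net.ipv4.tcp_retries2) decide when the connection is dropped.
    """
    return keepidle + keepintvl * keepcnt


# Hypothetical values: 4 minutes idle, then 4 probes every 15 s (1 minute of checks).
print(keepalive_detection_seconds(keepidle=240, keepintvl=15, keepcnt=4))  # 300
```

The caveat in the docstring matters here: if a query was already sent when the network started dropping packets, the data is unacknowledged, so keepalive probes never start.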
@pracucci, did you manage to solve this issue? My team is hitting the same problem. We can get around it by changing Linux’s
@sdemontfort Tuning the Linux TCP stack via sysctl - on the
Thanks for the response, @pracucci 😃 I have read up about a socket-level @petere, is this a change that you'd be willing to include in the source?
@sdemontfort If I'm not missing anything,
I don't think it's about the connection dropping while the client is transmitting. I think it sets the maximum amount of time that sent data can go unacknowledged before the connection is forcibly closed:
https://patchwork.ozlabs.org/patch/62889/

I believe the case would be more like:
However, I'm happy to be proven wrong if I'm missing something 😄
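To make the semantics discussed above concrete, here is a minimal Python sketch (not pgbouncer code) that sets `TCP_USER_TIMEOUT` on a socket. The helper name and the 30-second value are hypothetical. As tcp(7) describes it, the value is the maximum time in milliseconds that transmitted data may remain unacknowledged before the kernel forcibly closes the connection, and 0 means "use the system default". The option is Linux-only, so the sketch returns `False` where the constant is missing.

```python
import socket


def set_tcp_user_timeout(sock: socket.socket, timeout_ms: int) -> bool:
    """Set TCP_USER_TIMEOUT on `sock`; return False where it is unsupported.

    Hypothetical helper. The kernel closes the connection once sent data
    has stayed unacknowledged for `timeout_ms` milliseconds.
    """
    option = getattr(socket, "TCP_USER_TIMEOUT", None)  # only defined on Linux
    if option is None:
        return False
    sock.setsockopt(socket.IPPROTO_TCP, option, timeout_ms)
    return True


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    if set_tcp_user_timeout(s, 30_000):  # hypothetical 30 s limit
        # Read the option back to confirm the kernel accepted it.
        print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT))
```

Unlike `tcp_retries2`, this is a per-socket setting, so a pooler could apply it to its server connections without affecting other applications on the host.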
@pracucci, you may be happy to know that I managed to solve the problem without changing Linux OS-level settings. I've forked this repo and introduced the socket-level option. Here's my relevant PgBouncer config using the forked repo:
The numbers come from this amazingly helpful blog post: https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
I'll likely create a PR in the coming days to propose this change for PgBouncer, as I think it's important.
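For illustration only, here is a Python sketch of applying keepalive and a user timeout together on one server-side socket. It is not the fork's code, and the values are hypothetical, not the ones from the config referenced above. The assumption baked in is a rule of thumb that, as I understand it, the Cloudflare post linked above arrives at: set `TCP_USER_TIMEOUT` to roughly `TCP_KEEPIDLE + TCP_KEEPINTVL * TCP_KEEPCNT`, so that the keepalive window and the unacknowledged-data window agree. Per tcp(7), when both are in use, `TCP_USER_TIMEOUT` also overrides keepalive in deciding when to close a connection.

```python
import socket


def configure_server_socket(
    sock: socket.socket, keepidle: int, keepintvl: int, keepcnt: int
) -> dict:
    """Enable keepalive and derive TCP_USER_TIMEOUT from it (hypothetical helper).

    Assumption: user timeout (ms) = (keepidle + keepintvl * keepcnt) * 1000.
    Returns the options that were actually applied on this platform.
    """
    applied = {}
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied["SO_KEEPALIVE"] = 1
    for name, value in (
        ("TCP_KEEPIDLE", keepidle),      # seconds of idleness before probing
        ("TCP_KEEPINTVL", keepintvl),    # seconds between probes
        ("TCP_KEEPCNT", keepcnt),        # unanswered probes before giving up
        ("TCP_USER_TIMEOUT", (keepidle + keepintvl * keepcnt) * 1000),  # ms
    ):
        option = getattr(socket, name, None)  # some constants are platform-specific
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
            applied[name] = value
    return applied


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    # Hypothetical values: 4 minutes idle, then 4 probes 15 s apart.
    print(configure_server_socket(s, keepidle=240, keepintvl=15, keepcnt=4))
```

With these numbers both mechanisms give up after about 300 seconds, whether the connection was idle or had a query in flight when the network went silent.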
Excellent news, @sdemontfort. It would be great to have the PR merged into pgbouncer 🤞
We have pgbouncer configured with two databases: master and slave. Today we had a network issue between the pgbouncer instance and the slave server instance (all network packets were dropped), and as a result all slave connections got stuck.
We do have TCP keepalive configured, but since `tcp_retries2` is left at its default value (`15`), it takes a very long time before stuck connections get dropped.

We're looking for a solution that doesn't involve setting `query_timeout` (since we do have some very long queries) or lowering `tcp_retries2` system-wide (since that would affect all applications running on the same server).

I'm wondering if patching pgbouncer to add support for `TCP_USER_TIMEOUT` could be a solution. Do you have any suggestions?

Thank you,
Marco
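To put "a very long time" in numbers, the sketch below models the retransmission timeout that the Linux kernel documentation describes for `net.ipv4.tcp_retries2`. It assumes, as that documentation does, a starting RTO of 200 ms that doubles after each retransmission and is capped at 120 s. The result is a hypothetical lower bound; the real timeout depends on the measured RTT. The function name is illustrative.

```python
def retransmission_timeout_seconds(
    retries: int, rto_min: float = 0.2, rto_max: float = 120.0
) -> float:
    """Hypothetical time Linux keeps retransmitting unacknowledged data.

    Sums `retries + 1` retransmission timeouts: the RTO starts at
    `rto_min`, doubles each time, and never exceeds `rto_max`.
    """
    total = 0.0
    rto = rto_min
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, rto_max)  # exponential backoff, capped
    return round(total, 1)


print(retransmission_timeout_seconds(15))  # 924.6  (default tcp_retries2)
print(retransmission_timeout_seconds(8))   # 102.2
```

So with the default of 15, a connection with a query in flight can hang for over 15 minutes before the kernel gives up, which is the window a per-socket `TCP_USER_TIMEOUT` would shorten.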