Set last_connect_time for pool only when it really failed to connect #127
and rename `last_connect_time` to `last_connect_failed_time` to make its purpose clearer.
Over the last couple of months we have been observing very strange issues with pgbouncer. From time to time it started failing and producing the following log line:
`launch_new_connection: last failed, wait`
and clients were disconnected with the message "pgbouncer cannot connect to server".

We run pgbouncer with server_login_retry = 15 seconds (the default setting). That means that after 15 seconds it should try to establish a connection to the backend and, on success, reset the `last_connect_failed` flag. But this never happened (the flag was never reset). We enabled debug logs (verbose = 2) and started investigating.
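To make the retry rule concrete, here is a minimal C sketch of the gate as we understand it from the behaviour above. The struct and the functions `may_launch`, `on_connect_failed` and `on_login_ok` are hypothetical simplifications, not pgbouncer's actual code; the timestamp field uses the name proposed in this PR. A launch is refused while the failure flag is set and fewer than `server_login_retry` seconds have passed since the recorded failure, and only a completed login clears the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical, simplified per-pool state (not pgbouncer's real struct). */
struct pool_state {
    bool   last_connect_failed;      /* set when a server connect fails */
    time_t last_connect_failed_time; /* intended: when that failure happened */
};

/* Returns true if a new server connection may be launched at `now`.
 * While the failure flag is set, launches are refused until
 * `server_login_retry` seconds have elapsed since the recorded failure. */
static bool may_launch(const struct pool_state *pool, time_t now,
                       int server_login_retry)
{
    assert(pool != NULL);
    if (!pool->last_connect_failed)
        return true;
    /* A refusal here corresponds to "launch_new_connection: last failed, wait". */
    return difftime(now, pool->last_connect_failed_time) >= server_login_retry;
}

/* A failed connect records the failure and its time. */
static void on_connect_failed(struct pool_state *pool, time_t now)
{
    pool->last_connect_failed = true;
    pool->last_connect_failed_time = now;
}

/* Only a completed LOGIN phase clears the flag. */
static void on_login_ok(struct pool_state *pool)
{
    pool->last_connect_failed = false;
}
```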
Every time, the problems started with:

```
WARNING sbuf_connect failed: Resource temporarily unavailable
```

(EAGAIN, i.e. a full backlog on the unix domain socket). Later there were a lot of "launch_new_connection: last failed, wait" messages, and after 15 seconds:

```
2016-04-06 13:43:29.221 75936 DEBUG S-0x1734db0: db_name/user_name@unix:5432 launching new connection to server
2016-04-06 13:43:29.221 75936 DEBUG S-0x1734db0: db_name/user_name@unix:5432 S: connect ok
2016-04-06 13:43:29.221 75936 DEBUG S-0x1734db0: db_name/user_name@unix:5432 use it for pending cancel req
```
That means the connection was opened successfully, but it was used for forwarding a pending "cancel request" and closed right after. As a consequence, the LOGIN phase was never completed and the `last_connect_failed` flag was not reset. BUT! The `last_connect_time` value had been updated much earlier, before pgbouncer even tried to establish the connection with postgres. So again we got "launch_new_connection: last failed, wait" for the next 15 seconds.
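The sequence above can be replayed with a small, hypothetical C model (simplified, not pgbouncer's code) using integer timestamps and a 15-second retry. `stamp_on_launch = true` models the old behaviour, where the timestamp is written before connecting; `false` models this PR's proposal, where it is written only when the connect really fails. The `USED_FOR_CANCEL` outcome stands for a connection that connects fine but is diverted to a pending cancel request, so LOGIN never completes and the flag stays set.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical, simplified per-pool state (not pgbouncer's real struct). */
struct pool_state {
    bool   last_connect_failed;
    time_t last_connect_time;
};

enum outcome { LOGIN_OK, CONNECT_FAILED, USED_FOR_CANCEL };

/* Retry gate: refuse while the flag is set and `retry` seconds
 * have not yet passed since the stored timestamp. */
static bool may_launch(const struct pool_state *pool, time_t now, int retry)
{
    assert(pool != NULL);
    return !pool->last_connect_failed ||
           difftime(now, pool->last_connect_time) >= retry;
}

/* Attempt a launch at `now` that ends with `result`.
 * Returns false if the retry gate refused the attempt. */
static bool launch(struct pool_state *pool, time_t now, int retry,
                   enum outcome result, bool stamp_on_launch)
{
    if (!may_launch(pool, now, retry))
        return false;
    if (stamp_on_launch)
        pool->last_connect_time = now;   /* old: stamped before connecting */
    switch (result) {
    case CONNECT_FAILED:
        pool->last_connect_failed = true;
        pool->last_connect_time = now;   /* both modes: stamp a real failure */
        break;
    case LOGIN_OK:
        pool->last_connect_failed = false;
        break;
    case USED_FOR_CANCEL:
        /* connect ok, but the socket only forwards a cancel request and is
         * closed: LOGIN never completes, so the flag is not reset */
        break;
    }
    return true;
}
```

In the old mode, the cancel-request connection at t=15 pushes the timestamp forward, so a launch at t=20 is refused for another 15 seconds even though the server accepted the connection. In the proposed mode the timestamp still points at the real failure (t=0), so the launch at t=20 goes through and a successful login clears the flag.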
Obviously we have some issues with our setup:

1. `pool_size` is too small, so pgbouncer opens new connections too often.
2. `net.core.somaxconn = 128` is too small and definitely must be increased (this should help mitigate the backlog problems).

Sure, we have already fixed 1 and 2, but in my opinion pgbouncer should handle this case (when there are pending cancel requests) correctly, and this pull request is intended to fix it.