@aeneasr just to let you know that we managed to fix the issue in all of our development components and verified that everything is working fine after the upgrade to pgx/v5. We're now using infinite connection lifetimes, since Go handles reconnections and protects us from cluster changes (upgrades, maintenance, ...).
aeneasr added a commit to gobuffalo/pop that referenced this issue on Mar 13, 2023:
Version 5 of pgx finally resolves some long-awaited issues when PostgreSQL shuts down.
The upgrade is a drop-in replacement with the only change being the pgconn import path.
See ory/hydra#3432
Signed-off-by: aeneasr <3372410+aeneasr@users.noreply.github.com>
Preflight checklist
Describe the bug
We found some connectivity issues with our database connections. When one node of our cluster became unavailable (because of a node upgrade or for any other reason), some SQL statements failed with errors like these:
write: broken pipe
ERROR: server is shutting down (SQLSTATE 57P01)
This happened because the connection still pointed to a "dying" cluster node, so it was no longer valid.
We temporarily fixed the issue by setting ConnMaxLifetime to 2 minutes. This keeps connection lifetimes short enough to avoid holding connections to unavailable nodes. The patch solved the issue for a while, but it led to another problem: high latency on our requests caused by too-frequent reconnections.

We then thought of another solution: let the connections live forever and apply some kind of retry mechanism. When a connection fails because it is no longer valid, remove it from the connection pool and retry the query with a new connection.
Looking at golang's documentation (http://go-database-sql.org/errors.html), we saw that Go itself should be handling these retries:

"You don't need to implement any logic to retry failed statements when this happens. As part of the connection pooling in database/sql, handling failed connections is built-in. If you execute a query or other statement and the underlying connection has a failure, Go will reopen a new connection (or just get another from the connection pool) and retry, up to 10 times."
We also saw in golang's repo that the number of retries was even reduced from 10 to 2, which makes sense... why should you retry many times on an already dead connection? golang/go@c468f94
So we did some research in the pgx repo and found a related known issue, which is fixed in v5 of the driver: jackc/pgx#672 (comment)
We upgraded our workloads to the new pgx v5 driver without too much effort and verified that the retry behavior works well. That's why we propose upgrading the pgx driver in this repo:
hydra/driver/registry_sql.go
Line 12 in 5585539
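Per the commit message referenced above, the upgrade is a drop-in replacement where only the pgconn import path changes. A sketch of the import-level diff, assuming the common setup where pgconn was consumed as its own module under v4 and moved under the pgx module path in v5:

```go
import (
	// Before (pgx v4): pgconn was a separate module.
	//   "github.com/jackc/pgconn"
	//   _ "github.com/jackc/pgx/v4/stdlib"

	// After (pgx v5): pgconn lives under the pgx module path.
	"github.com/jackc/pgx/v5/pgconn"
	_ "github.com/jackc/pgx/v5/stdlib" // registers the "pgx" database/sql driver
)
```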
Reproducing the bug
Performing the same steps with a pgx v5 driver, it does not fail.

Relevant log output
No response
Relevant configuration
No response
Version
1.11.8 (but applies to latest also)
On which operating system are you observing this issue?
None
In which environment are you deploying?
None
Additional Context
No response