@aeneasr just to let you know that we managed to fix the issue in all of our development components and verified that everything is working fine after the upgrade to pgx/v5. We're now using infinite connection lifetimes, since Go handles reconnections and protects us from cluster changes (upgrades, maintenance, ...).
aeneasr added a commit to gobuffalo/pop that referenced this issue on Mar 13, 2023:
Version 5 of pgx finally resolves some long-awaited issues when PostgreSQL shuts down.
The upgrade is a drop-in replacement with the only change being the pgconn import path.
See ory/hydra#3432
Signed-off-by: aeneasr <3372410+aeneasr@users.noreply.github.com>
Preflight checklist
Describe the bug
We found some connectivity issues with our database connections. When one node of our cluster became unavailable (because of a node upgrade or for any other reason), some SQL statements failed with errors like these:
write: broken pipe
ERROR: server is shutting down (SQLSTATE 57P01)
This happened because the connection still pointed to a "dying" cluster node, so it was no longer valid.
We temporarily fixed the issue by setting ConnMaxLifetime to 2 minutes. This keeps connection lifetimes short enough to avoid holding connections to unavailable nodes. The patch solved the issue for a while, but it led to another problem: high latency on our requests caused by too-frequent reconnections.

We then thought of another solution: let the connections live forever and apply some kind of retry mechanism. When a connection fails because it is no longer valid, remove it from the connection pool and retry the query with a new connection.
Looking at golang's documentation (http://go-database-sql.org/errors.html), we saw that Go itself should be handling these retries:

"You don't need to implement any logic to retry failed statements when this happens. As part of the connection pooling in database/sql, handling failed connections is built-in. If you execute a query or other statement and the underlying connection has a failure, Go will reopen a new connection (or just get another from the connection pool) and retry, up to 10 times."
We also saw in golang's repo that the number of retries was even reduced from 10 to 2, which makes sense... why should you retry many times on an already dead connection? golang/go@c468f94
So we did some research in the pgx repo and found a related known issue, which is fixed in v5 of the driver: jackc/pgx#672 (comment)
We upgraded our workloads to the new pgx v5 driver without too much effort and verified that the retry behavior works well. That's why we propose upgrading the pgx driver in this repo:
hydra/driver/registry_sql.go
Line 12 in 5585539
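Per the commit message referenced above, the upgrade is a drop-in replacement where only the pgconn import path changes. A sketch of the import-level diff, assuming the common setup where pgconn was consumed as its own module under v4 and moved under the pgx module path in v5:

```go
import (
	// Before (pgx v4): pgconn was a separate module.
	//   "github.com/jackc/pgconn"
	//   _ "github.com/jackc/pgx/v4/stdlib"

	// After (pgx v5): pgconn lives under the pgx module path.
	"github.com/jackc/pgx/v5/pgconn"
	_ "github.com/jackc/pgx/v5/stdlib" // registers the "pgx" database/sql driver
)
```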
Reproducing the bug
Performing the same steps with a pgx v5 driver, it does not fail.

Relevant log output
No response
Relevant configuration
No response
Version
1.11.8 (but applies to latest also)
On which operating system are you observing this issue?
None
In which environment are you deploying?
None
Additional Context
No response