Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.Sign up
GitHub is where the world builds software
Millions of developers and companies build, ship, and maintain their software on GitHub — the largest and most advanced development platform in the world.
database/sql: retry logic is unsafe #11978
What version of Go are you using (go version)?
What operating system and processor architecture are you using?
What did you do?
Execute a query with database/sql over an unreliable connection. For example:
What did you expect to see?
An error message that means the query failed and n is unchanged, or no error meaning the query executed exactly one time and n has been incremented by 1.
What did you see instead?
The query may have actually executed multiple times and thereby n may have been incremented multiple times.
Reproducing the error
It was necessary to build some infrastructure to reproduce this error. I wrote cavein, a TCP tunnel server that purposely breaks connections to aid in reproduction. That, combined with go_database_sql_retry_bug, a test application that executes many updates over the unreliable connections provided by cavein can reliably reproduce queries being executed extra times.
Automatic retry should be removed from Exec and Query. It is impossible at the library layer to know if it is safe to retry a query.
The driver documentation sayeth the following:
If this test case results in duplicate operations, it's the driver that's at fault, not database/sql's. Not a bug.
Under what conditions could an Exec or Query correctly return ErrBadConn?
Once the query command is written to the socket, the driver has to consider that the query could have been executed. The only way it can safely assert that the query did not execute is if it receives a query failure response from the database server -- but if it receives a query failure message then the connection would still be usable -- so ErrBadConn would not be appropriate.
So it really seems the only situation where ErrBadConn is the correct response is on an error prior to socket write.
I suppose the driver can be blamed, but it feels like retry on Query and Exec is a foot-gun waiting to happen -- for minimal convenience improvement. This is not a theoretical assertion either, both drivers I tested, pq and my own pgx, trigger this undesirable behaviour.
Right; you need to return it to get a bad connection to be discarded from the pool. So the order of events for e.g. Exec should be:
Note that even without this issue, you already have to have the "bad" marker on a connection implemented because of #11264, so implementing it like this should not result in any unreasonable extra work.
Fine, maybe, but how would you discard bad connections from the pool in whatever it is you're suggesting?
Last time this came up on
Okay, I wish it didn't work this way, but I understand the now.
The bad connection would have been caused by a previous query. The connection is marked as bad then. I would suggest checking for connection health on releasing the connection back to the pool instead of on next attempted use. This is what pgx's native connection pool does (when not using database/sql). Though I suppose that would require changing the database/sql driver interface, so that is not possible for Go 1.x.
I've re-examined my test app and pgx, and determined I was incorrect in my assertion that it was vulnerable to this issue. However, pq definitely is. Perhaps this test app could be useful in persuading pq's maintainers.