Database restarts are not handled gracefully #672
The trick is that this requires a read before every query to see whether an error has arrived. Unfortunately, Go does not provide a good way to do a non-blocking read; apparently it can be done with sufficient low-level fiddling.

Another option would be a read with a very short read deadline. But that is impractical: a deadline long enough to read a possible error would also be long enough to add significant overhead to every query.

The last option is to continuously read in a goroutine and send the messages through a channel. But that had significant overhead, at least when I looked into it several years ago. It also adds a fair amount of complexity; the copy protocol requires something like this, and the code is really messy (https://github.com/jackc/pgconn/blob/c9abb86f21f0b89b909e9d112829e21daf3c06d8/pgconn.go#L1039). Perhaps that code could be generalized and optionally used by those who want this eager-reading functionality.

Beyond that, this type of logic should go in the connection pool rather than the individual connections. Otherwise, session state could silently be lost on an automatic reconnect.
Thanks @jackc for your detailed feedback. We'll have a good think about it. Best regards.
Just for kicks I hacked together something that continuously reads in a goroutine and sends the messages through a channel. Unfortunately, performance is still unacceptable: a trivial select took 22% longer with that change. Also, after reading your original message again, I realized I may have misread it before.
You mean the lib/pq Go library, not the libpq C library. That actually makes more sense, because AFAIK C libpq is connection oriented, and it would be very surprising for it to transparently establish a different connection. That also confirms my assertion that this logic should go in the connection pool. A database/sql driver can signal to the pool that the connection is bad by returning driver.ErrBadConn. After a quick glance at lib/pq, it appears that it automatically maps any fatal error to driver.ErrBadConn.

Yup. It's error prone alright. It can cause your queries to silently be executed twice. I just reported the bug to them in lib/pq#939.

So, as far as what could be done to handle this correctly: there isn't a one-size-fits-all solution that I can see. The "perfect" solution is to try to read before sending any queries, but that is almost certainly unacceptable from a performance standpoint. I suppose you could check out one connection and dedicate it to doing nothing but reading from the database. If it ever failed, that would mean a network failure or database restart had occurred; in that case you would want to close your database connection pool and create a new one. (Doing that in a non-racy manner would entail a few gymnastics...)
Hi @jackc. Thanks for the detailed analysis; we had arrived at some of those conclusions already. It's really frustrating, sadly. Automatic retry logic for immutable queries would be nice: the application could tell the library whether a query is safe to retry. At a deeper level, it's a pity that Go does not have a non-blocking read to check the connection status, because the operating system is aware that the connection is closed, yet Go can't use that information protectively. In fact, writes still succeed and then the read fails, which makes it impossible to tell whether the query was executed or not.
So, for now, what is the preferred way to handle a database restart? A background ping (https://golang.org/pkg/database/sql/#DB.PingContext in the case of database/sql), or something different?
As sad as this sounds, there doesn't seem to be a perfect solution at the moment. lib/pq handles it, but at the risk of double execution.
The eager read that @jackc described would probably be the most bulletproof way to do this, and would also cover network disconnections that the operating system knows about thanks to TCP keepalives etc. It is not trivial to implement, though.
This is the simplest approach. Any busy connection is going to encounter an error -- no way around that. So all we can easily do is handle the idle connections. The other approach is this: …
Those gymnastics would likely involve …
@jackc, I did have another idea. I'm not sure if it is feasible, but I'll run it past you anyway. The concept is that when a connection is released back to idle, a goroutine is launched with a blocking read on that connection until it is needed again. That way, the goroutine can handle any unsolicited messages from the server, and even an EOF from the OS. Then, when the connection is needed again, the blocking read is cancelled and the connection is handed back to the application. In my head this doesn't sound too invasive, or is it too out of the box? Regards, and thanks again for your contributions.
It's a reasonable approach; there's already a method along those lines. I also want to test some more real-world performance cases. The ~20% loss was with a tiny select on a Unix socket with an uncancellable context, but the actual overhead was measured in microseconds. It may be that in a more production-like environment, connecting over TCP with TLS and a cancellable context, the overhead is insignificant.
Thanks @jackc. I'll have a look to see how trivial it is. Regarding background reading, what you say makes sense, and we look forward to hearing about your attempt at it. As you mentioned earlier, the application will always need to be able to gracefully handle failure when talking to the database; there's nothing the library can do to totally absolve the application developer from handling errors. However, in an HA production system we try to minimise errors and downtime as much as possible. Most failures occur due to network components having issues (rerouting, restarts, etc.) and are typically very temporary, so especially under low load, such extra care from the library could hide the impact in some cases.
Can we have some handler that fires when the connection is lost? Then the user app could handle it by reconnecting, or get a new connection as a fallback.
The problem is there is no way to know the connection has been lost without reading from the underlying net.Conn.
For my use case it is OK to get an error on the next query, but without a callback about the lost connection I need to write a wrapper that checks the error code and reconnects, in every instance that uses this connection. If I had a callback, I could swap the sql.DB pointer under a mutex and continue running other queries.
Would it be possible to use this here? It seems the idea would introduce compatibility concerns with non-Unix environments.
We need to peek at data on the socket, but as far as I know, Go does not provide the ability to peek at socket data.
IMHO your suggestion of an idle-only background goroutine is preferable to an always-on background reader, simply because value can only be added when the failure is detected while idle; when the connection is actively in use, the error should be propagated to application code anyway.
I read this blog post from GitHub on how they solved this for MySQL: https://github.blog/2020-05-20-three-bugs-in-the-go-mysql-driver/. Can that approach be used here? It's the second issue that was fixed.
It might, though it would not be a trivial change. The biggest issue is how many layers this concept has to cross. At the moment, only the connection process knows whether TCP or Unix domain sockets are being used and whether TLS is in use; once connection establishment is complete, everything goes through the net.Conn interface. This change would potentially cross all these layers and require a lot of additional information sharing and coupling between them. Doable, and might even be the right approach -- but definitely messy.
After switching from lib/pq to pgx/v4/stdlib, this is the only drawback: no automatic reconnect. In my development process, the database is often recreated. To do that, all open connections must be terminated so the database can be dropped and created. The first query that is then executed by my Go process fails with an error.
I got this error today while calling Ping.
I don't see a problem if this behavior is implemented only in the database/sql layer.
In my opinion, if any kind of error is thrown on ping, the connection should be treated as broken and discarded.
@jackc What do you think about my suggestion? It'd really be helpful to me if ping discarded broken connections.
What exactly would change? The only error I can see occurring from ping would be a network failure or context cancellation -- both of which are already fatal. |
As stated in a comment in database/sql/driver/driver.go, returning driver.ErrBadConn from Ping causes database/sql to remove the connection from the pool.
Returning driver.ErrBadConn would make database/sql discard the connection and retry on a fresh one. I don't know if there is any drawback to this approach; I'm not used to the way Go treats database connections.
@Hellysonrp I would have expected the database/sql connection pool to already handle this correctly as is. But it was a simple enough change to explicitly close a connection on ping failure and report the error as driver.ErrBadConn.
Implemented in pgxpool.Pool and database/sql. #672
Well, I finally took another shot at solving this. pgxpool and database/sql now automatically check whether a connection is still usable.
🎩 👏 |
When the database is rebooted gracefully, it sends an ErrorResponse (E) message meaning FATAL: terminating connection due to administrator command (SQLSTATE 57P01).
lib/pq discards such a connection upfront, and the next execution attempt creates a new connection.
Pgx returns an error on the next execution, which is a wasted error, because typically the database has already finished rebooting, or a hot replica is already available.
How can we make pgx aware of such connection problems upfront, so that we don't waste a usage attempt on an error that wasn't necessary?