-
Notifications
You must be signed in to change notification settings - Fork 795
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add keep-alive option #38
Conversation
Vendored pgx with jackc/pgx#38 included
I did a test of this functionality by testing a connection to a PostgreSQL server on a different computer, then pulling the network cable from that computer. I got some surprising results. Here's the test program I used: https://github.com/jackc/pgx_notify_lost_connection Without your patch, it hung, apparently forever (I let it go 15 minutes). With your patch with a KeepAlive of 5 seconds, it hung for 10 minutes, and then noticed the connection was dead. I found this post that explains what is happening: http://felixge.de/2014/08/26/tcp-keepalive-with-golang.html I'm not sure I want to use the library the post references -- it doesn't support Windows and I try to avoid external dependencies. I suppose timing out after a platform specific amount of time is better than never timing out at all. But it feels wrong to expose control of keep-alive as a number that can't accurately reflect what the true timeout value is. I wonder if it would be a good idea to not expose any configuration and just to turn on keep alive on with a timeout of 1 minute (or 5 or 10...). The libpq C client library enables keep alives by default so presumably it is safe (http://www.postgresql.org/docs/9.3/static/libpq-connect.html). In this way we would not set any expectations of how quickly dead connections will be noticed, but we can be sure that they will be noticed eventually (this is also on my mind because it occurred to me while working on this that a dropped connection at the wrong time could cause Query or Exec to lockup as well). |
I was able to get fairly quick detection of a lost connection with a for-select that attempted a nop ( |
It probably was the nop that was detecting the lost connections, because that is causing a write on the connection. That would detect the broken connection regardless of keep-alive. Where the keep-alive would help is a The problem with just using keep-alive is the actual time it takes from broken connection to detection is highly variable based on OS and OS configuration (and can easily be many minutes until lost connection detection). So to get consistent and shorter latencty detection of lost connection in a This makes me think we have two issues here. One is the specific case of For For the general case of the lost connection, I think we do need to have keep-alive on. Here's a possible scenario: A user sends a select statement that will return 100 rows. After receiving 50 of them the connection silently is lost. pgx could currently hang forever waiting to read. The keep-alive time doesn't matter a great deal because this is just a fail-safe to ensure we eventually notice the lost connection. For a real lost or slow connection detection we would need to use I'm going to merge this pull request and then change keep-alive to always on. I created issue #40 for |
done by eschewing net.Dial shorthand and creating a new dialer and setting the keepalive option