Add keep-alive option #38

flowchartsman · 2014-09-24T22:39:51Z

done by eschewing net.Dial shorthand and creating a new dialer and setting the keepalive option

…ive option

Vendored pgx with jackc/pgx#38 included

jackc · 2014-09-26T22:57:02Z

I did a test of this functionality by testing a connection to a PostgreSQL server on a different computer, then pulling the network cable from that computer. I got some surprising results.

Here's the test program I used: https://github.com/jackc/pgx_notify_lost_connection

Without your patch, it hung, apparently forever (I let it go 15 minutes). With your patch with a KeepAlive of 5 seconds, it hung for 10 minutes, and then noticed the connection was dead.

I found this post that explains what is happening: http://felixge.de/2014/08/26/tcp-keepalive-with-golang.html

I'm not sure I want to use the library the post references -- it doesn't support Windows and I try to avoid external dependencies.

I suppose timing out after a platform specific amount of time is better than never timing out at all. But it feels wrong to expose control of keep-alive as a number that can't accurately reflect what the true timeout value is. I wonder if it would be a good idea to not expose any configuration and just to turn on keep alive on with a timeout of 1 minute (or 5 or 10...). The libpq C client library enables keep alives by default so presumably it is safe (http://www.postgresql.org/docs/9.3/static/libpq-connect.html). In this way we would not set any expectations of how quickly dead connections will be noticed, but we can be sure that they will be noticed eventually (this is also on my mind because it occurred to me while working on this that a dropped connection at the wrong time could cause Query or Exec to lockup as well).

flowchartsman · 2014-09-27T15:47:16Z

I was able to get fairly quick detection of a lost connection with a for-select that attempted a nop ( --; ) every two minutes and otherwise sat in a 30 second WaitForNotification loop. I think perhaps that turning keep-alives on by default is good, but I'm unsure what the best hard-coded value would be in that case. Thoughts? It also occurs to me that the null query functionality I added earlier today to support the aforementioned NOP could help detect dropped connections as well.

jackc · 2014-09-27T17:04:11Z

It probably was the nop that was detecting the lost connections, because that is causing a write on the connection. That would detect the broken connection regardless of keep-alive. Where the keep-alive would help is a WaitForNotification loop that did not include the nop (because that only reads and never tries to send on the connection).

The problem with just using keep-alive is the actual time it takes from broken connection to detection is highly variable based on OS and OS configuration (and can easily be many minutes until lost connection detection). So to get consistent and shorter latencty detection of lost connection in a WaitForNotification loop it does seem necessary to poll in some manner.

This makes me think we have two issues here. One is the specific case of WaitForNotification detecting a lost connection, and two is the general case of a lost connection.

For WaitForNotification it bothers me to require users of pgx to include the nop request in their WaitForNotification loop. I think an internal nop (which may not have to be an Exec -- there may be a lower level nop message we can send -- Flush might do it) would be a good idea.

For the general case of the lost connection, I think we do need to have keep-alive on. Here's a possible scenario: A user sends a select statement that will return 100 rows. After receiving 50 of them the connection silently is lost. pgx could currently hang forever waiting to read. The keep-alive time doesn't matter a great deal because this is just a fail-safe to ensure we eventually notice the lost connection. For a real lost or slow connection detection we would need to use Set[Read|Write]Deadline.

I'm going to merge this pull request and then change keep-alive to always on. I created issue #40 for WaitForNotification detecting lost connections quicker to continue the discussion there.

Add keep-alive option

Handle IPv6 in connection URLs

flowchartsman added 2 commits September 24, 2014 18:38

Add keep-alive option by creating a dialer first, then setting KeepAl…

b46ee0a

…ive option

name consistency

33fb73b

jackc added a commit to jackc/pgx_notify_lost_connection that referenced this pull request Sep 26, 2014

Initial version

aeb66c8

Vendored pgx with jackc/pgx#38 included

jackc mentioned this pull request Sep 27, 2014

WaitForNotification should detect lost connections #40

Closed

jackc added a commit that referenced this pull request Sep 27, 2014

Merge pull request #38 from anaxagoras/keepalive

a724d7b

Add keep-alive option

jackc merged commit a724d7b into jackc:master Sep 27, 2014

flowchartsman deleted the keepalive branch September 28, 2014 15:13

jackc added a commit that referenced this pull request Dec 4, 2021

Merge pull request #38 from lukedirtwalker/fixIPv6

18d2604

Handle IPv6 in connection URLs

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add keep-alive option #38

Add keep-alive option #38

flowchartsman commented Sep 24, 2014

jackc commented Sep 26, 2014

flowchartsman commented Sep 27, 2014

jackc commented Sep 27, 2014

Add keep-alive option #38

Add keep-alive option #38

Conversation

flowchartsman commented Sep 24, 2014

jackc commented Sep 26, 2014

flowchartsman commented Sep 27, 2014

jackc commented Sep 27, 2014