net/http: missing HTTP KeepAlive timeout configuration #13998

Closed
momchil-sap opened this issue Jan 18, 2016 · 7 comments

@momchil-sap
commented Jan 18, 2016

Hi,

Currently it seems that it's only possible to enable or disable HTTP KeepAlive. The following code shows how to disable it:

client = &http.Client{
        Transport: &http.Transport{
                DisableKeepAlives: true,
        },
}

There does not seem to be configuration which allows you to specify a KeepAlive timeout, which would abandon idle connections that are inactive for longer than the timeout.

This is important because clients operating behind a firewall / proxy, or targeting a server running behind a firewall / load balancer, might have their TCP connection closed after some period (e.g. 5 minutes) without being informed. When this happens, Go fails the next HTTP call with read: connection reset by peer.

I faced such a scenario and checked Ruby to see how it would handle this. It has multiple protections in place so that it does not error with read: connection reset by peer.

It is important to note that I am asking for an HTTP KeepAlive timeout configuration. There is already a TCP KeepAlive timeout configuration available in Go, which works just fine but has a different purpose: to health-check a TCP connection by sending regular ACK packets.

@ianlancetaylor changed the title from "net/http: Missing HTTP KeepAlive Timeout configuration" to "net/http: missing HTTP KeepAlive timeout configuration" Jan 18, 2016

@ianlancetaylor ianlancetaylor added this to the Go1.7 milestone Jan 18, 2016

@bradfitz

Member

commented Jan 18, 2016

I don't understand how this could happen. Go doesn't wait until you send your next request (e.g. 5.5 minutes later, past the firewall's deadline) to do a read on the socket. Instead, Go is always in a read, waiting for the peer to hang up on you. If it hangs up in the meantime (say, at 5 minutes, per your example), that connection is removed from the pool and is not reused at the 5.5 minute mark.

Is this a hypothetical bug report or something you're actually seeing?

If you're actually seeing it, please provide code and a packet capture.

@momchil-sap

Author

commented Jan 19, 2016

This is an actual bug that I see. It happens because the connection is terminated by a component in between and a FIN packet is never sent back to Go.

If you don't have TCP KeepAlive enabled, Go has no way of knowing that the connection is down. It finds out only when it next tries to send an HTTP request.

11:48:18.759501 IP source.60116 > target.https: Flags [S], seq 2465554534, win 29200, options [mss 1460,sackOK,TS val 561255716 ecr 0,nop,wscale 7], length 0
11:48:18.760473 IP target.https > source.60116: Flags [S.], seq 1751945730, ack 2465554535, win 28960, options [mss 1460,sackOK,TS val 45413825 ecr 561255716,nop,wscale 7], length 0
11:48:18.760506 IP source.60116 > target.https: Flags [.], ack 1, win 229, options [nop,nop,TS val 561255717 ecr 45413825], length 0
11:48:18.761261 IP source.60116 > target.https: Flags [P.], seq 1:174, ack 1, win 229, options [nop,nop,TS val 561255717 ecr 45413825], length 173
11:48:18.774406 IP target.https > source.60116: Flags [P.], seq 1:4727, ack 174, win 235, options [nop,nop,TS val 45413829 ecr 561255717], length 4726
11:48:18.774439 IP source.60116 > target.https: Flags [.], ack 4727, win 302, options [nop,nop,TS val 561255720 ecr 45413829], length 0
11:48:18.779828 IP source.60116 > target.https: Flags [P.], seq 174:249, ack 4727, win 302, options [nop,nop,TS val 561255722 ecr 45413829], length 75
11:48:18.779943 IP source.60116 > target.https: Flags [P.], seq 249:255, ack 4727, win 302, options [nop,nop,TS val 561255722 ecr 45413829], length 6
11:48:18.779991 IP source.60116 > target.https: Flags [P.], seq 255:300, ack 4727, win 302, options [nop,nop,TS val 561255722 ecr 45413829], length 45
11:48:18.781131 IP target.https > source.60116: Flags [.], ack 300, win 235, options [nop,nop,TS val 45413830 ecr 561255722], length 0
11:48:18.781192 IP target.https > source.60116: Flags [P.], seq 4727:4778, ack 300, win 235, options [nop,nop,TS val 45413830 ecr 561255722], length 51
11:48:18.781547 IP source.60116 > target.https: Flags [P.], seq 300:453, ack 4778, win 302, options [nop,nop,TS val 561255722 ecr 45413830], length 153
11:48:18.789486 IP target.https > source.60116: Flags [P.], seq 4778:5865, ack 453, win 243, options [nop,nop,TS val 45413832 ecr 561255722], length 1087
11:48:18.827759 IP source.60116 > target.https: Flags [.], ack 5865, win 325, options [nop,nop,TS val 561255734 ecr 45413832], length 0
11:55:23.791656 IP source.60116 > target.https: Flags [P.], seq 453:606, ack 5865, win 325, options [nop,nop,TS val 561361974 ecr 45413832], length 153
11:55:23.792331 IP target.https > source.60116: Flags [R.], seq 5865, ack 606, win 0, length 0

(I have replaced the IP addresses with source and target)

You can see that the last 2 records occur around 7 minutes and 5 seconds later and consist of a [P.] request from Go, which receives a [R.] response.

If I were to enable TCP Keep Alive, this would not be reproducible as there would be multiple [.] requests to keep the connection alive.

07:34:26.355755 IP source.60201 > target.443: Flags [.], ack 5865, win 325, options [nop,nop,TS val 687047616 ecr 171198210], length 0
07:34:26.356825 IP target.443 > source.60201: Flags [.], ack 453, win 243, options [nop,nop,TS val 171205724 ecr 687040112], length 0

(These are examples from today, that's why there is a timestamp difference)

Still, enabling it requires a conscious and educated decision when creating the http.Transport, and one might not want a continuous stream of polling packets. I would expect a reasonable HTTP KeepAlive timeout to be enabled by default, similar to Ruby. Currently, the connection is just left there indefinitely. There must be a good reason why Ruby has such a timeout set by default.

@bradfitz

Member

commented Jan 19, 2016

We also have such a timeout set by default, if you use any of the default methods to fetch URLs (http.Get, http.Post, http.DefaultClient, etc). That default is here:

var DefaultTransport RoundTripper = &Transport{
        Proxy: ProxyFromEnvironment,
        Dial: (&net.Dialer{
                Timeout:   30 * time.Second,
                KeepAlive: 30 * time.Second,
        }).Dial,
        TLSHandshakeTimeout: 10 * time.Second,
}

It's true that if you go lower-level and wire stuff up yourself, you get what you ask for.

I suppose a question is whether we should have a DefaultDialer too, and document that custom Transports with a nil Dial func end up using DefaultDial.

(But another question is why your network component drops connections without sending a FIN, but I understand that things do weird things.)

Related: at least in Go 1.6, idempotent requests are retried on a new connection, which mitigates this partially.

@momchil-sap

Author

commented Jan 20, 2016

@bradfitz These are some very good remarks and ideas. Having a DefaultDial sounds good.

@danp

Contributor

commented May 2, 2016

@bradfitz

Member

commented May 2, 2016

Yup, thanks.

@momchil-sap

Author

commented May 3, 2016

Thanks!

@golang golang locked and limited conversation to collaborators May 3, 2017
