net/http: permanently broken connection with error "read: connection reset by peer" when response body is not closed #36700
You were probably affected by #24138.
Ugh. Thanks. This is about (not) having timeouts on the client side, right? It's unfortunate that it affects brand-new requests and that it permanently breaks all new requests. Could we add a (when-broken-only?) connection reuse timeout without breaking the compatibility promise? It seems like connection reuse decisions are entirely up to net/http. Any suggestions for workarounds? It's tricky since I am actually doing long-polling requests, where I actively want to keep connections open for hours, so I don't want to set aggressive timeouts. Would it help to call
A different culprit than the lack of a timeout might be the lack of a resume-after-suspend notification. That would let us ping the server on resume and reconnect if necessary. This is also discussed in a comment on #36141. Maybe unrelated, but the CL for #23459 (net.Dial TCP keepalive on by default) didn't touch net/http. Does http.Client not use net.Dial, or is this disabled for http.Client by a different CL?
Given that we don't know whether 1.13 was affected, I suspect this should not be milestoned to 1.14 at this point.
@josharian Did you enable HTTP/2 on the server? If so, I'm curious what happens with HTTP/1.1.
I used net/http out of the box. I will run for a bit with HTTP/2 disabled and see what happens.
In 1.13 (and presumably earlier; not sure), setting a short keepalive helped in my case (a plain TCP/TLS context, not HTTP). Eventually I plan to disable keepalive and use a server pulse to detect a dead link. From net/http/transport.go:
@networkimprov, that seems unrelated.
I finally tracked this down. It turns out that there was a code path in which I didn't close resp.Body. Timeouts didn't help. Although this was user error (sorry, and thanks!), I do wonder whether there's a way to detect this, since it was not easy to find. One option is a vet check that, similar to the lostcancel check, verifies that resp.Body.Close gets called on all exits from a function. Another option would be some kind of runtime check in the http package, although I don't know net/http well enough to know whether that is feasible.
Correction: timeouts were also necessary (in addition to properly closing resp.Body).
Closing as user error.
@josharian, could you provide some more detail about how this resulted from a missing The documentation for the
It does not say anything about what happens for HTTP/2 connections, so at the very least there seems to be a documentation issue here.
This also seems like something we could detect and report, using a finalizer-based approach similar to the one described in #24739: if the
Yeah, I don't think forgetting to close resp.Body should cause anything worse than a TCP connection leak. It certainly shouldn't affect unrelated HTTP fetches.
I'm honestly not sure. I tried to reproduce it using multiple simpler setups and failed. There was clearly something else going on as well that closing the Body helped with. I went looking to try to understand it better, but I've already spent over a full day on this, and I'm not really inclined to spend more time digging. I do think finalizer-based detection of unreachable, unclosed Bodies would be a good idea, since that is definitely a bug, regardless of its precise consequences. I suspect it'd be better to log instead of panicking: net/http already logs when it notices an ineffectual WriteHeader call, so there's some precedent for it.
Not closing the response body presumably caused flakiness in the registry replacer when it tries to read a response body [0]; see [1] for why this can happen. As this seems to be a common mistake in our codebase, this change activates a linter for it.
[0] https://issues.redhat.com/browse/DPTP-1692
[1] golang/go#36700
Possibly related: I just saw this error message in our CI system. I couldn't reproduce it, but I can't see why closing idle connections should cause this kind of error:
What version of Go are you using (go version)?
Go 1.14 beta 1
Does this issue reproduce with the latest release?
Not sure; I can't compile my program with 1.13.
What operating system and processor architecture are you using (go env)?
What did you do?
Made a series of HTTP requests from a Go client to a Go server. The requests were vanilla HTTP requests, using http.Get.
I lost connectivity at some point. When I regained connectivity, all subsequent HTTP requests failed with the error "read: connection reset by peer". I waited quite a long time, and it never recovered.
It has happened a couple of times, but it doesn't reproduce reliably (which is unsurprising, since me closing my laptop lid is not exactly a precision affair).
This looks very similar to #34978, although I don't know when exactly the connectivity failed.
Tentatively marking as Go 1.14.
cc @bradfitz @fraenkel @tombergan