Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: Client returns errors on POST if keep-alive connection closes at unfortunate time #22158

Closed
horgh opened this issue Oct 5, 2017 · 10 comments
Milestone

Comments

@horgh
Copy link
Contributor

@horgh horgh commented Oct 5, 2017

What version of Go are you using (go version)?

go version go1.9.1 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/vagrant/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build936671241=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

What did you do?

I performed an HTTP POST using an *http.Client that had keepalives enabled and there was an idle connection available for re-use.

Specifically, I encountered this by using an *http.Client with an http.Transport set to have an IdleConnTimeout higher than the server's. Even a very small duration above can cause it. From my testing, I believe the connection gets taken for re-use just as the server closes it.

This program demonstrates the problem: https://play.golang.org/p/GTj6FEFWk-

Note if we change the request to be a GET, we don't see a problem.

What did you expect to see?

The request succeed. In particular, I would expect the *http.Client, being a connection pool and supporting connection re-use, would not raise an error when it finds a dead connection, but use a different connection. This assumes there was confidence the request indeed did not go through.

What did you see instead?

The sample program repeatedly makes HTTP requests. Sometimes they succeed, sometimes they fail with:

Post http://localhost:9999/test: EOF

Or:

Post http://localhost:9999/test: read tcp 127.0.0.1:36589->127.0.0.1:9999: read: connection reset by peer

For example:

$ ./test-connection-reuse 
2017/10/05 20:22:39 request succeeded
2017/10/05 20:22:40 request succeeded
2017/10/05 20:22:41 request succeeded
2017/10/05 20:22:42 request failed: error performing request: Post http://localhost:9999/test: EOF
2017/10/05 20:22:43 request succeeded
2017/10/05 20:22:44 request succeeded
2017/10/05 20:22:45 request failed: error performing request: Post http://localhost:9999/test: EOF
2017/10/05 20:22:46 request succeeded
2017/10/05 20:22:47 request succeeded
2017/10/05 20:22:48 request failed: error performing request: Post http://localhost:9999/test: read tcp 127.0.0.1:36563->127.0.0.1:9999: read: connection reset by peer
2017/10/05 20:22:49 request succeeded
2017/10/05 20:22:50 request failed: error performing request: Post http://localhost:9999/test: read tcp 127.0.0.1:36589->127.0.0.1:9999: read: connection reset by peer
2017/10/05 20:22:51 request succeeded

Further information

I searched for issues before reporting this. I found #8946 which I believe to be about this exact issue, except its example was an HTTP GET request. I believe the fix for #8946 was done for #4677 in 5dd372b.

From reading #4677, I believe the reason HTTP GET works is we retry in that case. Retrying an HTTP POST is unsafe depending on whether the request reached the server. But in the situation I'm describing, it didn't (although I don't know if we could reliably tell). See here in transport.go (comments closely below this line appear to be talking about this).

I noticed in #4677 that some of the basis for the behaviour was based on what Chromium does. I found its relevant code that handles what I think is going on here, and I don't see it talking about particular HTTP verbs (though of course I may not be looking in the right spot).

For my application, I've worked around this by ensuring the client's IdleConnTimeout is lower than the server's equivalent. I also added a retry at that level. I think a retry alone would not be sufficient, as presumably the retry could hit this very condition again.

It may be that nothing can be done. In that case, it might be good to document that this is expected, as it surprised me a little.

Thank you!

@ianlancetaylor ianlancetaylor added this to the Go1.10 milestone Oct 5, 2017
@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Oct 5, 2017

@tombergan

This comment has been minimized.

Copy link
Contributor

@tombergan tombergan commented Oct 6, 2017

Thanks for the playground code. The relevant RFC is here:
https://tools.ietf.org/html/rfc7230#section-6.3.1

You've already pointed out the main challenge, which is determining if the POST had reached the server. This is impossible to know precisely. It looks like Chrome assumes the POST had not reached the server if the connection was reused and gets a RST before any response bytes are received. We could do the same thing.

@tombergan tombergan self-assigned this Oct 6, 2017
@horgh

This comment has been minimized.

Copy link
Contributor Author

@horgh horgh commented Oct 7, 2017

Interesting, thank you for the RFC reference!

The heuristic from Chrome seems fairly reasonable. Presumably if it has been in use there the risk is less.

I also found #15723 and #18241 which sound similar in that they are about non-idempotent requests. From the latter I had the idea to try setting GetBody on the request in the sample program. The behaviour is the same.

@rsc

This comment has been minimized.

Copy link
Contributor

@rsc rsc commented Nov 22, 2017

I think it's too late to change to the Chrome behavior for Go 1.10, but we should probably do it for Go 1.11. Leaving NeedsInvestigation but @tombergan please feel free to make a decision and switch to NeedsFix.

@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017
@odeke-em

This comment has been minimized.

Copy link
Member

@odeke-em odeke-em commented Jul 7, 2018

Kindly pinging you @horgh @tombergan @bradfitz. Should we move this to Go 1.12 or might we be able to do something here?

@bradfitz bradfitz changed the title net/http: http.Client returns errors when re-using connections if the connection closes at the wrong time net/http: Client returns errors on POST if keep-alive connection closes at unfortunate time Jul 9, 2018
@bradfitz

This comment has been minimized.

Copy link
Contributor

@bradfitz bradfitz commented Jul 9, 2018

I'll move it to Go 1.12.

@bradfitz bradfitz modified the milestones: Go1.11, Go1.12 Jul 9, 2018
@bradfitz

This comment has been minimized.

Copy link
Contributor

@bradfitz bradfitz commented Dec 4, 2018

In Go 1.12 we now have the fix for #19943 (comment) ....

Change https://golang.org/cl/147457 mentions this issue: net/http: make Transport respect {X-,}Idempotency-Key header

That should be sufficient for this bug.

See the docs on the Transport type at https://tip.golang.org/pkg/net/http/#Transport

@bradfitz bradfitz closed this Dec 4, 2018
@detailyang

This comment has been minimized.

Copy link

@detailyang detailyang commented Jan 4, 2019

more TCP layer detail, if some guys want to know the reason:)

See the server side send the FIN packet

image

@themez

This comment has been minimized.

Copy link

@themez themez commented Oct 14, 2019

Chromium checked whether socket reused before sending retry request, I wonder why it is necessary for an ECONNRESET error, if the server is not handling on the port, you will get an ECONNREFUSED error instead, so maybe we can retry every time running into an ECONNRESET error. Or ECONNREFUSED is just Linux implementing detail? I didn't find any RFC distinguish it with ECONNRESET, they are both based on the TCP RST flag. Correct me if I'm wrong.

@themez

This comment has been minimized.

Copy link

@themez themez commented Oct 18, 2019

Got explain from chromium dev

Servers can timeout idle connections at their leisure and restart at their leisure. As a result, even if network conditions are perfect, reusing a stale socket can get you ERR_CONNECTION_RESET - so if we ever want to be able to reuse sockets, we need to retry on resets. A server returning ERR_CONNECTION_RESET on a fresh socket, however, means that there is some sort of server (or middlebox) issue. And yes, we can get a reset at that point (buggy SSL implementations seem to be big fans of resetting sockets during SSL negotiation when they're sad, for instance).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
8 participants
You can’t perform that action at this time.