New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: Client returns errors on POST if keep-alive connection closes at unfortunate time #22158

Closed
horgh opened this Issue Oct 5, 2017 · 8 comments

Comments

Projects
None yet
7 participants
@horgh
Copy link
Contributor

horgh commented Oct 5, 2017

What version of Go are you using (go version)?

go version go1.9.1 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/vagrant/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build936671241=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

What did you do?

I performed an HTTP POST using an *http.Client that had keepalives enabled and there was an idle connection available for re-use.

Specifically, I encountered this by using an *http.Client with an http.Transport set to have an IdleConnTimeout higher than the server's. Even a very small duration above can cause it. From my testing, I believe the connection gets taken for re-use just as the server closes it.

This program demonstrates the problem: https://play.golang.org/p/GTj6FEFWk-

Note if we change the request to be a GET, we don't see a problem.

What did you expect to see?

The request succeed. In particular, I would expect the *http.Client, being a connection pool and supporting connection re-use, would not raise an error when it finds a dead connection, but use a different connection. This assumes there was confidence the request indeed did not go through.

What did you see instead?

The sample program repeatedly makes HTTP requests. Sometimes they succeed, sometimes they fail with:

Post http://localhost:9999/test: EOF

Or:

Post http://localhost:9999/test: read tcp 127.0.0.1:36589->127.0.0.1:9999: read: connection reset by peer

For example:

$ ./test-connection-reuse 
2017/10/05 20:22:39 request succeeded
2017/10/05 20:22:40 request succeeded
2017/10/05 20:22:41 request succeeded
2017/10/05 20:22:42 request failed: error performing request: Post http://localhost:9999/test: EOF
2017/10/05 20:22:43 request succeeded
2017/10/05 20:22:44 request succeeded
2017/10/05 20:22:45 request failed: error performing request: Post http://localhost:9999/test: EOF
2017/10/05 20:22:46 request succeeded
2017/10/05 20:22:47 request succeeded
2017/10/05 20:22:48 request failed: error performing request: Post http://localhost:9999/test: read tcp 127.0.0.1:36563->127.0.0.1:9999: read: connection reset by peer
2017/10/05 20:22:49 request succeeded
2017/10/05 20:22:50 request failed: error performing request: Post http://localhost:9999/test: read tcp 127.0.0.1:36589->127.0.0.1:9999: read: connection reset by peer
2017/10/05 20:22:51 request succeeded

Further information

I searched for issues before reporting this. I found #8946 which I believe to be about this exact issue, except its example was an HTTP GET request. I believe the fix for #8946 was done for #4677 in 5dd372b.

From reading #4677, I believe the reason HTTP GET works is we retry in that case. Retrying an HTTP POST is unsafe depending on whether the request reached the server. But in the situation I'm describing, it didn't (although I don't know if we could reliably tell). See here in transport.go (comments closely below this line appear to be talking about this).

I noticed in #4677 that some of the basis for the behaviour was based on what Chromium does. I found its relevant code that handles what I think is going on here, and I don't see it talking about particular HTTP verbs (though of course I may not be looking in the right spot).

For my application, I've worked around this by ensuring the client's IdleConnTimeout is lower than the server's equivalent. I also added a retry at that level. I think a retry alone would not be sufficient, as presumably the retry could hit this very condition again.

It may be that nothing can be done. In that case, it might be good to document that this is expected, as it surprised me a little.

Thank you!

@ianlancetaylor ianlancetaylor added this to the Go1.10 milestone Oct 5, 2017

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

ianlancetaylor commented Oct 5, 2017

@tombergan

This comment has been minimized.

Copy link
Contributor

tombergan commented Oct 6, 2017

Thanks for the playground code. The relevant RFC is here:
https://tools.ietf.org/html/rfc7230#section-6.3.1

You've already pointed out the main challenge, which is determining if the POST had reached the server. This is impossible to know precisely. It looks like Chrome assumes the POST had not reached the server if the connection was reused and gets a RST before any response bytes are received. We could do the same thing.

@tombergan tombergan self-assigned this Oct 6, 2017

@horgh

This comment has been minimized.

Copy link
Contributor

horgh commented Oct 7, 2017

Interesting, thank you for the RFC reference!

The heuristic from Chrome seems fairly reasonable. Presumably if it has been in use there the risk is less.

I also found #15723 and #18241 which sound similar in that they are about non-idempotent requests. From the latter I had the idea to try setting GetBody on the request in the sample program. The behaviour is the same.

@rsc

This comment has been minimized.

Copy link
Contributor

rsc commented Nov 22, 2017

I think it's too late to change to the Chrome behavior for Go 1.10, but we should probably do it for Go 1.11. Leaving NeedsInvestigation but @tombergan please feel free to make a decision and switch to NeedsFix.

@rsc rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017

@odeke-em

This comment has been minimized.

Copy link
Member

odeke-em commented Jul 7, 2018

Kindly pinging you @horgh @tombergan @bradfitz. Should we move this to Go 1.12 or might we be able to do something here?

@bradfitz bradfitz changed the title net/http: http.Client returns errors when re-using connections if the connection closes at the wrong time net/http: Client returns errors on POST if keep-alive connection closes at unfortunate time Jul 9, 2018

@bradfitz

This comment has been minimized.

Copy link
Member

bradfitz commented Jul 9, 2018

I'll move it to Go 1.12.

@bradfitz bradfitz modified the milestones: Go1.11, Go1.12 Jul 9, 2018

@bradfitz

This comment has been minimized.

Copy link
Member

bradfitz commented Dec 4, 2018

In Go 1.12 we now have the fix for #19943 (comment) ....

Change https://golang.org/cl/147457 mentions this issue: net/http: make Transport respect {X-,}Idempotency-Key header

That should be sufficient for this bug.

See the docs on the Transport type at https://tip.golang.org/pkg/net/http/#Transport

@bradfitz bradfitz closed this Dec 4, 2018

@detailyang

This comment has been minimized.

Copy link

detailyang commented Jan 4, 2019

more TCP layer detail, if some guys want to know the reason:)

See the server side send the FIN packet

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment