net/http: Client does not retry idempotent requests on transport failure #4677

Closed
gopherbot opened this issue Jan 18, 2013 · 27 comments

@gopherbot commented Jan 18, 2013

by patrick.allen.higgins:

RFC 2616 section 8.1.4 says: "Client software SHOULD reopen the transport connection
and retransmit the aborted sequence of requests without user interaction so long as the
request sequence is idempotent"

http.Client.Get() does not do this, and does not document that callers are responsible
for doing so.
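
In the meantime, the usual fix is a small caller-side retry loop. A minimal sketch, assuming a bounded attempt count and a fixed backoff chosen purely for illustration:

package main

import (
	"fmt"
	"net/http"
	"time"
)

// getWithRetry retries an idempotent GET when the transport fails, in the
// spirit of RFC 2616 section 8.1.4. The attempt count and backoff are
// arbitrary illustrative values.
func getWithRetry(client *http.Client, url string, attempts int) (*http.Response, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		resp, err := client.Get(url)
		if err == nil {
			return resp, nil
		}
		lastErr = err
		time.Sleep(100 * time.Millisecond) // brief pause before retrying
	}
	return nil, lastErr
}

func main() {
	resp, err := getWithRetry(http.DefaultClient, "http://example.com/", 3)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
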
@bradfitz commented Jan 22, 2013

Comment 1:

If only server authors consistently made their GET handlers idempotent.
I don't think this is safe to do by default. It could be a per-Client opt-in, or we could
just document it.

Status changed to Accepted.
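
One hedged sketch of the "per-Client opt-in" idea is a wrapping RoundTripper that retries bodyless GET and HEAD requests once on a transport error; the type and the single-retry policy are illustrative assumptions, not an API proposed in this thread:

package retryclient

import "net/http"

// retryTransport is a hypothetical opt-in wrapper: a Client built with it
// retries GET and HEAD requests once when the underlying RoundTrip fails.
type retryTransport struct {
	base http.RoundTripper
}

func (t *retryTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	resp, err := t.base.RoundTrip(req)
	if err != nil && req.Body == nil && (req.Method == "GET" || req.Method == "HEAD") {
		// Retry once; only for bodyless requests whose handlers
		// really are idempotent.
		resp, err = t.base.RoundTrip(req)
	}
	return resp, err
}

// NewClient returns an *http.Client that opts in to the retry behaviour.
func NewClient() *http.Client {
	return &http.Client{Transport: &retryTransport{base: http.DefaultTransport}}
}

A caller opts in simply by constructing its Client with NewClient instead of using http.DefaultClient.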

@rsc commented Jan 22, 2013

Comment 2:

What does Chrome do?

@bradfitz commented Jan 22, 2013

Comment 3:

See also issue #3514 (a race between the client reusing a connection and the server shutting
that connection down). Fixing this would largely fix that too, if it were safe to do.
I don't know what Chrome does.

@bradfitz commented Jan 22, 2013

Comment 4:

[+willchan for Chrome question]

@evmar commented Jan 22, 2013

Comment 5:

Relevant Chrome code, I think:
"1264 // This method determines whether it is safe to resend the request after an
1265 // IO error.  It can only be called in response to request header or body
1266 // write errors or response header read errors.  It should not be used in
1267 // other cases, such as a Connect error."
http://git.chromium.org/gitweb/?p=chromium.git;a=blob;f=net/http/http_network_transaction.cc;h=caa8f11909ff40e5cee9dd6be3814233220d36f1;hb=HEAD#l1264

@bradfitz commented Jan 22, 2013

Comment 6:

And:
bool HttpNetworkTransaction::ShouldResendRequest(int error) const {
  bool connection_is_proven = stream_->IsConnectionReused();
  bool has_received_headers = GetResponseHeaders() != NULL;

  // NOTE: we resend a request only if we reused a keep-alive connection.
  // This automatically prevents an infinite resend loop because we'll run
  // out of the cached keep-alive connections eventually.
  if (connection_is_proven && !has_received_headers)
    return true;
  return false;
}
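
In Go terms the check amounts to something like the following; this is a paraphrase of the Chromium logic above, not code from the Go tree, and the parameter names are made up:

package retryclient

// shouldResendRequest mirrors Chromium's heuristic: resend only when the
// failed connection was a reused ("proven") keep-alive connection and no
// response headers had been received. Restricting retries to reused
// connections also bounds them naturally, since the pool of cached
// keep-alive connections is finite.
func shouldResendRequest(connWasReused, gotResponseHeaders bool) bool {
	return connWasReused && !gotResponseHeaders
}
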
@gopherbot commented Jan 23, 2013

Comment 7 by willchan@chromium.org:

* If we get an RST at the TCP level when we try to reuse a persistent connection, we
retry (middleboxes often time out idle connections and send an RST if you try to use
one anyway). On desktop you can be wasteful and use TCP keepalives to mitigate this
too; on mobile that wakes up the radio and may cost money, so you shouldn't do that
and should just deal with the errors. We close persistent connections after a fixed
period to minimize how often we hit these errors. On mobile, we defer the socket
closes (wait for the radio to wake up for other HTTP requests, then close all
timed-out sockets).
* If you pipeline requests and get a transport error, we pray that HEADs and GETs are
actually idempotent and retry them.
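
For comparison, later Go releases grew a knob analogous to the fixed-period close described above: Transport.IdleConnTimeout (Go 1.7+) drops pooled connections before a typical middlebox would. A sketch, with the timeout value chosen arbitrarily:

package retryclient

import (
	"net/http"
	"time"
)

// client drops idle keep-alive connections after 30 seconds, shrinking the
// window in which we try to reuse a connection that the far end (or a
// middlebox) has already torn down.
var client = &http.Client{
	Transport: &http.Transport{
		IdleConnTimeout: 30 * time.Second,
	},
}
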
@rsc commented Jan 30, 2013

Comment 8:

Labels changed: added priority-later, removed priority-triage.

@rsc commented Mar 12, 2013

Comment 9:

Labels changed: added go1.1maybe, removed go1.1.

@robpike commented May 18, 2013

Comment 10:

Labels changed: added go1.2maybe, removed go1.1maybe.

@rsc commented Jul 30, 2013

Comment 12:

Labels changed: added feature.

@robpike commented Aug 30, 2013

Comment 13:

Not for 1.2.

Labels changed: removed go1.2maybe.

@rogpeppe commented Oct 16, 2013

Comment 14:

FWIW, we just came across an issue that triggers this problem reliably.
It happens when GOMAXPROCS=1 and something does a lot of work (in our case,
reading a bunch of data from disk and gzipping it) and then issues an HTTP
request. The connection times out while the code is spinning, but the http
transport code doesn't get scheduled to see the EOF until the code stops
spinning, so we see the EOF almost exactly as the request is issued, and the
request fails because the client doesn't realise the EOF is out of band.
The problem is exacerbated because the window for the race is larger than it
needs to be: the request is marked as in flight before it has been transferred
to the writing goroutine, instead of the count being incremented just before
conn.Write is called.
Here's a program that demonstrates the issue reliably for me:
   http://play.golang.org/p/bVf9wsCJSx
The second GET request fails with "can't write HTTP request on broken connection" or
"EOF".
It succeeds when run with GOMAXPROCS>0.
For the time being we can work around this by avoiding connection reuse on all our
http client connections, but that is less than ideal.

@rogpeppe commented Oct 16, 2013

Comment 15:

> It succeeds when run with GOMAXPROCS>0.
GOMAXPROCS>1, of course!
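
The workaround mentioned in Comment 14, avoiding connection reuse entirely, can be expressed per transport. A minimal sketch; it trades extra dials (and TLS handshakes) for immunity to the stale keep-alive race:

package retryclient

import "net/http"

// noReuseClient never reuses connections, so a request can never land on a
// keep-alive connection that the server has already started closing.
var noReuseClient = &http.Client{
	Transport: &http.Transport{DisableKeepAlives: true},
}
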
@rsc commented Nov 27, 2013

Comment 16:

Labels changed: added go1.3maybe.

@rsc commented Nov 27, 2013

Comment 17:

Labels changed: removed feature.

@rsc commented Dec 4, 2013

Comment 18:

Labels changed: added release-none, removed go1.3maybe.

@rsc commented Dec 4, 2013

Comment 19:

Labels changed: added repo-main.

@bradfitz commented May 30, 2014

Comment 20:

Issue #8122 has been merged into this issue.

@bradfitz commented Nov 19, 2014

Comment 21:

Issue #9122 has been merged into this issue.

@gopherbot commented Nov 27, 2014

Comment 22 by james.defelice:

It would be great to see this fixed in the upcoming 1.4 release.

@bradfitz commented Nov 28, 2014

Comment 23:

That won't happen. Go 1.4 was frozen 3 months ago. We have a release cycle that's 3
months of work, then 3 months of stabilization. This would need to be done between
December 1st-ish and March 1st-ish.
bgentry added a commit to bgentry/go that referenced this issue Jan 22, 2015
If we try to reuse a connection that the server is in the process of
closing, we may end up successfully writing out our request (or a
portion of our request) only to find a connection error when we try to
read from (or finish writing to) the socket. This manifests as an EOF
returned from the Transport's RoundTrip.

This is a test for one of the issues described in issue golang#4677.
bgentry added a commit to bgentry/go that referenced this issue Jan 23, 2015
If we try to reuse a connection that the server is in the process of
closing, we may end up successfully writing out our request (or a
portion of our request) only to find a connection error when we try to
read from (or finish writing to) the socket. This manifests as an EOF
returned from the Transport's RoundTrip.

The issue, among others, is described in golang#4677.

This change follows some of the Chromium guidelines for retrying
idempotent requests only when the connection has already been used
successfully and no header data has yet been received for the response.

Change-Id: I1ca630b944f0ed7ec1d3d46056a50fb959481a16

@bgentry commented Jan 23, 2015

I've made an attempt to resolve one specific, reproducible example of this issue: https://go-review.googlesource.com/3210

Following the comments in this thread from Chromium's network stack, I'm only retrying under the following circumstances:

  • Request is idempotent (currently just GET or HEAD)
  • Connection has already been used successfully and is being reused
  • No data has yet been received for the response headers

The comments so far suggest that this kind of change is something you're open to. Hopefully my approach is reasonable :)
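
For the first bullet, note that the conservative GET/HEAD set is narrower than the set of methods RFC 2616 calls idempotent (which also includes PUT, DELETE, OPTIONS, and TRACE). A sketch of just the method check; the function name is an illustrative assumption:

package retryclient

// isRetriableMethod reports whether a method is treated as safe to retry.
// Only GET and HEAD are included, even though the spec also calls PUT,
// DELETE, OPTIONS, and TRACE idempotent, because real-world handlers often
// violate the spec (as noted earlier in the thread).
func isRetriableMethod(method string) bool {
	switch method {
	case "GET", "HEAD":
		return true
	}
	return false
}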

@bradfitz commented Apr 6, 2015

@gopherbot commented Apr 25, 2015

CL https://golang.org/cl/3210 mentions this issue.

bradfitz added a commit that referenced this issue Dec 1, 2015
If we try to reuse a connection that the server is in the process of
closing, we may end up successfully writing out our request (or a
portion of our request) only to find a connection error when we try to
read from (or finish writing to) the socket. This manifests as an EOF
returned from the Transport's RoundTrip.

The issue, among others, is described in #4677.

This change follows some of the Chromium guidelines for retrying
idempotent requests only when the connection has already been used
successfully and no header data has yet been received for the response.

As part of this change, an unexported error was defined for
errMissingHost, which was previously defined inline. errMissingHost is
the only non-network error returned from a Request's Write() method.

Additionally, this change would break TestLinuxSendfile because its test
server explicitly triggers the type of scenario this change is meant to retry
on. Because that test server stops accepting conns on the test listener
before the retry, the test would time out. To fix this, the test was
altered to use a non-idempotent request method (POST).

Change-Id: I1ca630b944f0ed7ec1d3d46056a50fb959481a16
Reviewed-on: https://go-review.googlesource.com/3210
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>

@glasser commented Feb 15, 2016

Should this issue be closed now that 5dd372b is merged? Or does that only cover some of the cases of this issue?

@bradfitz commented Feb 16, 2016

@glasser, yes, probably. I can't remember anything else this was open for.
