
net/http: context cancellation race condition between http.Server, httputil.ReverseProxy and http.Transport #20617

Closed
rgeronimi opened this issue Jun 8, 2017 · 20 comments

@rgeronimi

commented Jun 8, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.8.3 linux/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/rgeronimi/sabrina"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build401940531=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

What did you do?

I deployed on port 80 a standard http.Server S, plugged into a standard httputil.ReverseProxy P, reverse-proxying traffic through a standard http.Transport T to another standard Go web server S2 in a separate Unix process. P is configured to use keep-alive.

What did you expect to see?

When used with any browser, S should behave normally, displaying the full website of S2 correctly.

What did you see instead?

It worked with Safari, Chrome, and Firefox. But with torBrowser (vanilla macOS version 7.0, nothing customized), some page resources were randomly broken with a "502 Bad Gateway" message and empty content (and the Content-Length header says 0). Headers and torBrowser logs show nothing special beyond the 502 error.

I also tried with all other browsers (outside of torBrowser) but over the Tor network, and everything works. So the issue is triggered by the torBrowser itself and not by the Tor network.

Only the first (and sometimes second) resource is correctly downloaded, and then all the following ones fail with a 502. This is inconsistent, suggesting a race condition.

Suspecting a keep-alive connection management issue somewhere in the chain, I checked the chronology, and it perfectly matches a scenario where a connection C is always deemed "broken" by http.Transport after the first resource request Req1 is answered, compromising all the following resource requests Req2, Req3, ... that were pipelined on that same connection C.
In the rare cases where 2 resources (not just 1) were correctly received by torBrowser, it was because there were 2 parallel connections to pipeline the requests on.

I checked the httputil.ReverseProxy source code and saw that it sends back 502 errors when the request context is canceled, which matches the torBrowser status code received.

So I made the hypothesis that, for some reason:
(1) torBrowser systematically has a specific behavior that pushes the http.Server S to declare the request Req1 "canceled" slightly before T considers its own clone TReq1 of Req1 completed
(2) Consequently, T propagates that cancellation to all the following requests TReq2, TReq3, ... that it had pipelined on the same connection C (established between T and S2)
(3) Which forces P to generate the 502 errors on each of their original requests Req2, Req3, ...

To test this hypothesis, I implemented this workaround in the httputil.ReverseProxy.Director to remove the influence of contexts on the reverse proxy P and its transport T:

func director(req *http.Request) {
	quietReq := req.WithContext(context.Background())
	*req = *quietReq
	...other code...
}

This workaround worked perfectly: the bug symptoms disappeared, and torBrowser shows the website perfectly.

The part missing is the exact explanation for (1). Tests show that the breakages are inconsistent: sometimes all requests go through correctly. So it looks like a subtle race condition between http.Server, httputil.ReverseProxy, and http.Transport.

@rgeronimi rgeronimi changed the title net/http: inappropriate context cancellation within httputil.ReversePropxy or http.Transport net/http: inappropriate context cancellation within httputil.ReverseProxy or http.Transport Jun 8, 2017

@rgeronimi rgeronimi changed the title net/http: inappropriate context cancellation within httputil.ReverseProxy or http.Transport net/http: inappropriate context cancellation within http.Server or http.Transport Jun 8, 2017

@rgeronimi rgeronimi changed the title net/http: inappropriate context cancellation within http.Server or http.Transport net/http: inappropriate context cancellation between httputil.ReverseProxy and http.Roundtrip Jun 9, 2017

@rgeronimi rgeronimi changed the title net/http: inappropriate context cancellation between httputil.ReverseProxy and http.Roundtrip net/http: context cancellation race condition between http.Server, httputil.ReverseProxy and http.Roundtrip Jun 9, 2017

@rgeronimi rgeronimi changed the title net/http: context cancellation race condition between http.Server, httputil.ReverseProxy and http.Roundtrip net/http: context cancellation race condition between http.Server, httputil.ReverseProxy and http.Transport Jun 9, 2017

@rgeronimi

Author

commented Jun 9, 2017

To pinpoint the origin of the bug more precisely and confirm the theory a second time, I instead created a clone library of httputil.ReverseProxy, commenting out the cancel() call at line 146 of reverseproxy.go:

	if cn, ok := rw.(http.CloseNotifier); ok {
		var cancel context.CancelFunc
		ctx, cancel = context.WithCancel(ctx)
		defer cancel()
		notifyChan := cn.CloseNotify()
		go func() {
			select {
			case <-notifyChan:
				// cancel() // TODO RACE: commented out <<<<<<<<<<<<<<<<
			case <-ctx.Done():
			}
		}()
	}

The race condition disappeared, and the site works normally with torBrowser. So the race condition is indeed in the closing order between, on one side, the completion of the external request (from the external client) and, on the other side, the completion of the internal request (from the reverse proxy + transport):
in some cases the external request is deemed completed slightly before the internal request is, triggering a full cancellation of all the other internal requests that were pipelined on the same internal connection.

That race condition is invisible to most browsers but becomes visible with torBrowser, maybe because it has very unusual delay patterns (round trips can take up to several seconds) in the overall request/response cycles.
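Step (2) of the hypothesis — a cancellation arriving mid-flight aborting a Transport request — can be illustrated in isolation. The following is a self-contained sketch under test-only assumptions: the slow httptest backend stands in for S2, the artificial timings stand in for Tor's long round trips, and the explicit cancel() stands in for the CloseNotify-triggered cancellation in the proxy.

```go
package main

import (
	"context"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// demoCancelPropagation issues a request against a slow backend and
// cancels the request's context while it is in flight; the returned
// error is non-nil because the round trip is aborted.
func demoCancelPropagation() error {
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(200 * time.Millisecond) // slow response, like a Tor round trip
		io.WriteString(w, "ok")
	}))
	defer backend.Close()

	ctx, cancel := context.WithCancel(context.Background())
	req, _ := http.NewRequest("GET", backend.URL, nil)
	req = req.WithContext(ctx)

	// Cancel while the request is in flight, as the CloseNotify
	// goroutine in ReverseProxy would.
	go func() {
		time.Sleep(50 * time.Millisecond)
		cancel()
	}()

	client := &http.Client{}
	resp, err := client.Do(req)
	if resp != nil {
		resp.Body.Close()
	}
	return err
}
```

In the reported scenario, the same cancellation additionally kills every other request pipelined on connection C, which is what the single-request demo above cannot show.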

@willmo


commented Jun 11, 2017

Well the documentation for CloseNotify says:

        // If the protocol is HTTP/1.1 and CloseNotify is called while
        // processing an idempotent request (such a GET) while
        // HTTP/1.1 pipelining is in use, the arrival of a subsequent
        // pipelined request may cause a value to be sent on the
        // returned channel. In practice HTTP/1.1 pipelining is not
        // enabled in browsers and not seen often in the wild. If this
        // is a problem, use HTTP/2 or only use CloseNotify on methods
        // such as POST.

And from googling, it looks like Tor Browser enables pipelining by default, and ReverseProxy doesn't check the method when using CloseNotify. So I think ReverseProxy is incompatible with HTTP/1.1 pipelining, and therefore with Tor Browser (unless you disable pipelining in Tor Browser or make it use HTTP/2). But you could try to work around this in the Director function as you have.

@bradfitz

Member

commented Jun 11, 2017

Tor Browser enables pipelining by default? Ouch. Everybody else who's tried that has ended up disabling it. I thought we'd all concluded it was a failed experiment.

Go does support HTTP/1.1 pipelining in general, with the exception of the context cancellation caveat noted above. We considered that acceptable at the time because ~nobody uses pipelining in the wild.

I'm super curious to read the thread where Tor decided to do that, if anybody has a link.

@bradfitz bradfitz added this to the Unplanned milestone Jun 11, 2017

@bradfitz bradfitz added the Thinking label Jun 11, 2017

@bradfitz

Member

commented Jun 11, 2017

I'll keep this bug open, but I can't think of a good fix. Ideas welcome.


@bradfitz

Member

commented Jun 12, 2017

What are the negative side effects of resolving the CloseNotify caveat?

The drug hasn't even been invented yet, so it's a bit early to be talking about negative side effects.

If you have a proposed fix, I'm all ears.

@bradfitz

Member

commented Jun 12, 2017

Oh, are you referring to your httputil.ReverseProxy change? That's the wrong place to make the change. That only affects the httputil package, but not anything else that uses net/http.(*Request).Context(). If you only cared about ReverseProxy, that might work.

@rgeronimi

Author

commented Jun 13, 2017

Oh, sorry, I didn't mean that; of course the httputil change was only a hack to confirm my pipelining-issue theory.

I was thinking more about adding to CloseNotifier (or to a separate HalfCloseNotifier interface) this method:
HalfCloseNotify() <-chan bool
which returns a channel that receives at most 2 values: at most one false when the request stream is closed, and at most one true when the response stream is closed.

An alternative is to address only the response side with a ResponseCloseNotifier:
ResponseCloseNotify() <-chan struct{}

But that looks as complex, and maybe less powerful later.

Or maybe there are other ways to provide this distinction between request streams and response streams.

Then httputil.ReverseProxy, and maybe other http API clients, could use HalfCloseNotify (or one of its alternatives) together with request idempotency to properly distinguish between request and response closings.

But I am still new to this pipelining issue and, more importantly, to the exact set of API contracts satisfied today by the http lib, so I may have missed something.
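The proposed interface can be sketched concretely. Everything below is hypothetical API — HalfCloseNotifier is NOT part of net/http, and toyWriter is a stand-in that only demonstrates the intended channel semantics (at most one false for the request stream, at most one true for the response stream):

```go
package main

// HalfCloseNotifier is the hypothetical interface proposed in the
// comment above; it does not exist in net/http.
type HalfCloseNotifier interface {
	HalfCloseNotify() <-chan bool
}

// toyWriter is a stand-in implementation showing the intended
// signaling: false = request stream closed, true = response stream
// closed, each sent at most once.
type toyWriter struct {
	ch chan bool
}

func (w *toyWriter) HalfCloseNotify() <-chan bool { return w.ch }

// requestClosed signals that the request stream was closed.
func (w *toyWriter) requestClosed() { w.ch <- false }

// responseClosed signals that the response stream was closed.
func (w *toyWriter) responseClosed() { w.ch <- true }
```

A proxy could then cancel on a true value while ignoring (or handling differently) a false one, which is exactly the distinction the current CloseNotify contract cannot express.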

@rgeronimi

Author

commented Jun 13, 2017

My answer above was poor, as I didn't check where exactly the CloseNotifier interface is implemented.
I see that it's implemented by ResponseWriter.

In that case my first alternative (adding a HalfCloseNotify method) looks very abstract: it crosses the conceptual boundary of what the response writer should know and tell about, by introducing a concept of "half-close" that wasn't triggered by the response stream itself but by specific request-stream events. CloseNotifier already crossed that boundary (by including request closings in its notification scope), but this would only reinforce it and make it more explicit.

The second alternative has a very confusing name: ResponseCloseNotifier for a ResponseWriter method. But we could maybe call it FullCloseNotifier.

A third alternative is to extend the spec of the CloseNotify() channel's behavior in the following backward-compatible way:
when specifically the response stream (and not the request stream) of the http round trip is closed, not only does the channel receive true, but it is also closed afterward.

As the doc says nothing about the closure of the channel, only about at most 1 value being received, I guess this alternative satisfies the Go 1 compatibility constraint.

That way reverseproxy can distinguish between request- and response-stream closings.
Although it's cheaper in terms of API surface and heap allocations, I find this third alternative slightly arcane (in a negative way).
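The consumer side of this third alternative can be modeled in isolation (a toy, not the net/http implementation): after receiving the usual at-most-one value, a second non-blocking receive tells the consumer whether the channel was additionally closed, i.e. whether the response stream itself was closed.

```go
package main

// responseSideClosed models the consumer side of the extended
// CloseNotify contract sketched above: the channel still delivers at
// most one true, but on a response-stream close it is additionally
// closed afterward, which the consumer detects with a second,
// non-blocking receive.
func responseSideClosed(notify <-chan bool) bool {
	<-notify // the usual at-most-one value
	select {
	case _, open := <-notify:
		return !open // closed channel: the response stream was closed
	default:
		return false // still open: a request-side notification
	}
}
```

This keeps the API surface unchanged at the cost of the slightly arcane two-step receive noted above.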

@bradfitz

Member

commented Jun 13, 2017

If this is fixed, the solution should not involve new API. It'd be nice to just close the Request.Context at the right time. It would need to distinguish the background-reader byte read to see whether it's a real byte (e.g. 'G' for "GET") or io.EOF etc. for the conn going away.

@rgeronimi

Author

commented Jun 13, 2017

Ah, I didn't see the "may" in the CloseNotifier doc about the pipelining special case. I thought it was a "does" and thus that we were tied by the Go 1 backward-compatibility contract.
In that case, yeah, that should be API-free.

@rgeronimi

Author

commented Jun 13, 2017

Ah, I just saw the connReader.closeNotifyFromPipelinedRequest method.
I see there are indeed already quite a few doc comments on that topic :)
As written in the docs, we don't need to go as far as parallelizing such requests; processing them sequentially would be largely sufficient for the specifics of torBrowser.

But couldn't we have that method, instead of calling connReader.closeNotify, wait on some kind of signal that the current request has finished processing, and just return? I don't know the inner workings of your signaling infrastructure there; do you have something, e.g. a sync.Cond under mu mutex supervision, that you could wait on? A bit like Close vs Shutdown, where one waits for others to complete while the other aggressively shuts things down.

@bradfitz

Member

commented Jun 13, 2017

Sorry, I'm working on getting the Go 1.9beta1 release out. I won't have time to work on this bug for a number of months. (This definitely won't be fixed in Go 1.9)

@rgeronimi

Author

commented Jun 13, 2017

That's OK for my use case. Good luck with the release!

@willmo


commented Jun 15, 2017

Probably worth noting that Firefox (which Tor Browser is based on) removed support for HTTP/1.1 pipelining in Firefox 54 (released today).

@rgeronimi

Author

commented Oct 2, 2017

I can't reproduce the bug with the latest version of torBrowser (7.0.6). That might be due to random network fluctuations, or to them having written off pipelining, but I couldn't find any announcement about that. Closing the ticket, as far as what I could reproduce goes.

@rgeronimi rgeronimi closed this Oct 2, 2017

@juliens

Contributor

commented Feb 13, 2018

Do you think it's possible to reopen this issue?

I see the same problem while using httputil.ReverseProxy for an apt mirror.

It'd be nice to just close the Request.Context at the right time. It would need to distinguish the background-reader byte read to see whether it's a real byte (e.g. 'G' for "GET") or io.EOF etc. for the conn going away.

@bradfitz I would be really happy to help on this, but I'm not sure I understand what you mean here.

@davecheney

Contributor

commented Feb 14, 2018

@juliens please open a new issue and include all the information requested by the issue template. A runnable code sample to demonstrate the issue would be extremely beneficial.

@bradfitz

Member

commented Jul 17, 2018

Fixed by 00ebfcd
