Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http: Re-connect with upgraded HTTP2 connection fails to send Request.body #32441

Open
brianfife opened this issue Jun 5, 2019 · 10 comments
Open

Comments

@brianfife
Copy link

@brianfife brianfife commented Jun 5, 2019

What version of Go are you using (go version)?

$ go version
go version go1.12.5 darwin/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/bfife/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/bfife/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/x7/yphtc2sj4ml2lrflpwdblvrxsmcq9d/T/go-build801671760=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

This is a follow up of issues #25009 #28206

We are sending requests as follows:

proto.transport = &http.Transport{}
proto.client = &http.Client{Transport: proto.transport}
req, err := http.NewRequest(request.Method, url, bytes.NewBuffer(request.Body))
resp, err := proto.client.Do(req)

What did you expect to see?

Good behaviour

With the above client configuration, I was able to determine the following code was sending out the request using the RoundTrip function noted here:

https://github.com/golang/go/blob/release-branch.go1.12/src/net/http/transport.go#L427

	if t.useRegisteredProtocol(req) {
		altProto, _ := t.altProto.Load().(map[string]RoundTripper)
		if altRT := altProto[scheme]; altRT != nil {
			if resp, err := altRT.RoundTrip(req); err != ErrSkipAltProtocol {
				return resp, err
			}
		}
	}

In our case, for the first and following requests t.useRegisteredProtocol(req) returns true, so altRT.RoundTrip(req) tries to send via h2_bundle's RoundTrip. As no connection exists at this point in time, this will fail due to error "http2: no cached connection was available", bubbling up ErrSkipAltProtocol "net/http: skip alternate protocol", and the code continues on. n.b. subsequent requests will be sent via altRT.RoundTrip(req) if there are cached connections available, however a GoAway received from the server will trigger the same behaviour of advancing to the next section of code.

https://github.com/golang/go/blob/release-branch.go1.12/src/net/http/transport.go#L447

	for {
		select {
		case <-ctx.Done():
			req.closeBody()
			return nil, ctx.Err()
		default:
		}

		// treq gets modified by roundTrip, so we need to recreate for each retry.
		treq := &transportRequest{Request: req, trace: trace}
		cm, err := t.connectMethodForRequest(treq)
		if err != nil {
			req.closeBody()
			return nil, err
		}

		// Get the cached or newly-created connection to either the
		// host (for http or https), the http proxy, or the http proxy
		// pre-CONNECTed to https server. In any case, we'll be ready
		// to send it requests.
		pconn, err := t.getConn(treq, cm)
		if err != nil {
			t.setReqCanceler(req, nil)
			req.closeBody()
			return nil, err
		}

		var resp *Response
		if pconn.alt != nil {
			// HTTP/2 path.
			t.decHostConnCount(cm.key()) // don't count cached http2 conns toward conns per host
			t.setReqCanceler(req, nil)   // not cancelable with CancelRequest
			resp, err = pconn.alt.RoundTrip(req)
		} else {
			resp, err = pconn.roundTrip(treq)
		}
		if err == nil {
			return resp, nil
		}
...

In the above for loop, for the first request without any connections available, a new transport connection is created, and pconn.alt.RoundTrip(req) or pconn.RoundTrip(req) is then used to submit and fulfill the Request, and this likely completes successfully. So far, so good.

As noted above, subsequent requests from our client continue to be sent from altRT.RoundTrip(req), at which point they are returned, without reaching the for loop.

Bad behaviour

After some time, our load balancer may respond with a GoAway response after a full request is transmitted, including data from the Request.Body, as seen with the following debug:

2019/06/04 02:22:26.750150 http2: Framer 0xc000470a80: wrote HEADERS flags=END_HEADERS stream=17 len=12
2019/06/04 02:22:26.750321 http2: Framer 0xc000470a80: wrote DATA stream=17 len=125 data="STRINGOFDATA"
2019/06/04 02:22:26.750386 http2: Framer 0xc000470a80: wrote DATA flags=END_STREAM stream=17 len=0 data=""
2019/06/04 02:22:26.753144 http2: Framer 0xc000470a80: read GOAWAY len=8 LastStreamID=15 ErrCode=NO_ERROR Debug=""
2019/06/04 02:22:26.753217 http2: Transport readFrame error on conn 0xc0001a3e00: (*errors.errorString) EOF
2019/06/04 02:22:26.753291 http2: Transport failed to get client conn for server:443: http2: no cached connection was available

This results in another transport being created in the for loop shown above, however the Request.Body is not being reset before the request is sent on the newly created transport connection.

The rewind code at https://github.com/golang/go/blob/release-branch.go1.12/src/net/http/transport.go#L497 doesn't get executed if pconn.alt.RoundTrip(req) or pconn.RoundTrip(req) are successful in sending a request after a connection is re-established. From my limited testing, I think the only time the rewind is reached is when TLSNextProto is manually set as it is in TestTransportBodyAltRewind, because this skips setting up the altProto map in transport.go/onceSetNextProtoDefaults(), as altRT is nil.

It looks like moving the rewind code from L497 to L455 would satisfy the case where an existing connection receives a GoAway frame and the Body had been sent to the server (as described above), however it would also execute in the case of the initial request where a transport does not exist and the request hasn't been sent (which makes that execution not necessary).

I believe this test case demonstrates the issue described above:

func TestTransportBodyGoAwayReceived(t *testing.T) {
        var err error
        var resp *Response
        req, _ := NewRequest("POST", "https://example.org/", bytes.NewBufferString("request"))
        tr := &Transport{DisableKeepAlives: true }
        tr.RegisterProtocol("https", httpsProto{})
        c := &Client{Transport: tr}
        resp, err = c.Do(req)
        if err != nil {
                t.Error(err)
        }
        if resp != nil {
                defer resp.Body.Close()
                ioutil.ReadAll(resp.Body)
        }
}

type httpsProto struct{}

func (httpsProto) RoundTrip(req *Request) (*Response, error) {
        defer req.Body.Close()
        ioutil.ReadAll(req.Body)
        return nil, ErrSkipAltProtocol
}

output

../bin/go test -v -run=TestTransportBodyGoAwayReceived net/http
=== RUN   TestTransportBodyGoAway
2019/06/04 21:36:24 Error enabling Transport HTTP/2 support: protocol https already registered
--- FAIL: TestTransportBodyGoAway (0.42s)
    transport_internal_test.go:274: Post https://example.org/: http: ContentLength=7 with Body length 0
FAIL
FAIL	net/http	0.495s

Moving the rewind code to the top of the for loop as described above results in success:

./bin/go test -v -run=TestTransportBodyGoAwayReceived net/http
=== RUN   TestTransportBodyGoAway
2019/06/04 21:35:54 Error enabling Transport HTTP/2 support: protocol https already registered
--- PASS: TestTransportBodyGoAway (0.56s)
PASS
ok  	net/http	0.642s
@gopherbot gopherbot added this to the Unreleased milestone Jun 5, 2019
@brianfife

This comment has been minimized.

Copy link
Author

@brianfife brianfife commented Jun 10, 2019

As I am not in a position to sign the CLA and submit a fix, it would be good to get some maintainers to have a look at this issue.

@dmitshur

This comment has been minimized.

Copy link
Member

@dmitshur dmitshur commented Jun 10, 2019

@brianfife

This comment has been minimized.

Copy link
Author

@brianfife brianfife commented Jul 8, 2019

Sorry to bump, but is anyone able to have a look at this ? @bradfitz @FiloSottile

@dcarney

This comment has been minimized.

Copy link

@dcarney dcarney commented Jul 19, 2019

Thanks for opening this issue @brianfife. Also, thanks for linking to #25009 and #28206, as they appear to be related (even though they were marked as closed as of Go 1.12). I'm on 1.12.7 and I'm seeing this issue with clients connecting to the API I work on.

You did a good job of showing the issue in the actual transport code, but I figured I'd add some of my client debug logs to the issue for additional context.

As you described, I always see this issue manifest after the server sends a GOAWAY frame after a request body has been sent. The internal retry logic in http2shouldRetryRequest in h2_bundle.go calls .GetBody() on the request in order to "recreate" the body to send on a new TCP connection, but the bytes never make it onto the wire:

...
2019/07/18 12:51:41 http2: Framer 0xc00027a000: read GOAWAY len=8 LastStreamID=1999 ErrCode=NO_ERROR Debug=""
2019/07/18 12:51:41 http2: Transport received GOAWAY len=8 LastStreamID=1999 ErrCode=NO_ERROR Debug=""
2019/07/18 12:51:41 http2: Transport failed to get client conn for api.foobar.com:443: http2: no cached connection was available
...
2019/07/18 12:51:41 http2: Transport creating client conn 0xc000508900 to 12.34.56.78:443
2019/07/18 12:51:41 http2: Transport encoding header "content-type" = "application/x-www-form-urlencoded"
2019/07/18 12:51:41 http2: Transport encoding header "content-length" = "51"
2019/07/18 12:51:41 http2: Transport encoding header "accept-encoding" = "gzip"
2019/07/18 12:51:41 http2: Framer 0xc000409500: wrote HEADERS flags=END_HEADERS stream=1 len=135
2019/07/18 12:51:41 http2: Framer 0xc000409500: wrote DATA flags=END_STREAM stream=1 len=0 data=""

In the above, we can clearly see that the request contained a 51 byte, form-encoded body but the H/2 framer never sent it, setting the END_STREAM flag after sending 0 bytes of data.

We typically see this happen when the client is using a moderate amount of concurrency, since the chances are increased that the periodic GOAWAY frames from the server will arrive after a request body has been sent.

Interestingly, the Go bug manifests when the transport is left to choose H/2 on it's own via the connection upgrade codepath (aka TLSNextProto).

So, either of these options will result in a client that does not repro this bug:

// Force H/2 explicitly
client := &http.Client{
    Transport: &http2.Transport{},
}

// OR Force HTTP/1.1 explicitly
client := http.Client{
    Transport: &http.Transport{
        TLSNextProto: make(map[string]func(string, *tls.Conn) http.RoundTripper),
    },
}

// OR, run the repro code with GODEBUG="http2client=0" to disable H/2
brandur-stripe pushed a commit to stripe/stripe-go that referenced this issue Jul 24, 2019
See #866, but once in a while when using HTTP/2, stripe-go sends an
empty request body which then results in a 400 back from the server.

We tracked the issue down to this one open on the Go language's HTTP/2
implementation:

golang/go#32441

For now, the best thing we can do is disable HTTP/2 by default for new
integrations. When the bug is fixed, we can turn it back on again.
@dcarney

This comment has been minimized.

Copy link

@dcarney dcarney commented Aug 20, 2019

To echo @brianfife, I'm also sorry to bump this, but I'm wondering if there's any movement or thoughts on how to move forward with this particular issue? @bradfitz @FiloSottile

@ernado

This comment has been minimized.

Copy link
Contributor

@ernado ernado commented Sep 26, 2019

Correct, body is not re-winded after altRT.RoundTrip here:

if t.useRegisteredProtocol(req) {
altProto, _ := t.altProto.Load().(map[string]RoundTripper)
if altRT := altProto[scheme]; altRT != nil {
if resp, err := altRT.RoundTrip(req); err != ErrSkipAltProtocol {
return resp, err
}
}
}

I've missed that during #25009 fix because the trigger condition is less frequent and was not covered by my go-only test-case.
This condition triggers when http2Transport.RoundTripOpt is failing to get a connection during retry (e.g. on attempt to retry after GOAWAY when all other connections in pool are terminated), which results to http2noCachedConnError that is transformed into ErrSkipAltProtocol in http2noDialH2RoundTripper.RoundTrip.

If we change to something like that, everything should be OK:

	if t.useRegisteredProtocol(req) {
		altProto, _ := t.altProto.Load().(map[string]RoundTripper)
		if altRT := altProto[scheme]; altRT != nil {
			if resp, err := altRT.RoundTrip(req); err != ErrSkipAltProtocol {
				return resp, err
			} else if req.GetBody != nil {
				// Rewind the body after ErrSkipAltProtocol, because it can
				// be already used.
				newReq := *req
				var err error
				newReq.Body, err = req.GetBody()
				if err != nil {
					return nil, err
				}
				req = &newReq
			}
		}
	}

I've tested this change with my repro and got zero "400 Bad Request" errors, but I'm not sure that it is correct solution, probably we can refactor something to make this rewind more explicit.

/cc @bradfitz

@bobby-stripe

This comment has been minimized.

Copy link

@bobby-stripe bobby-stripe commented Nov 19, 2019

the above looks reasonable to me, @ernado do you want to open a PR?

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Dec 8, 2019

Change https://golang.org/cl/210123 mentions this issue: net/http: rewind request body on ErrSkipAltProtocol

darkdarkdragon added a commit to livepeer/go-livepeer that referenced this issue Feb 12, 2020
orchestrator and transcoder.
Mitigates issue in the golang's implementation of HTTP2
golang/go#32441
@darkdarkdragon darkdarkdragon mentioned this issue Feb 12, 2020
0 of 3 tasks complete
darkdarkdragon added a commit to livepeer/go-livepeer that referenced this issue Feb 12, 2020
orchestrator and transcoder.
Mitigates issue in the golang's implementation of HTTP2
golang/go#32441
@brianfife

This comment has been minimized.

Copy link
Author

@brianfife brianfife commented Mar 2, 2020

@ernado @bcmills any updates on this?

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Mar 2, 2020

@brianfife, I left a comment on @ernado's CL 210123 indicating what I think is the way forward: namely, we need to add an optional method that a RoundTripper can implement in order to decline a request without reading its Body.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants
You can’t perform that action at this time.