New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: misbehaved streams can cause connections to exhaust flow control #28204

Open
jared2501 opened this Issue Oct 15, 2018 · 7 comments

Comments

Projects
None yet
5 participants
@jared2501

jared2501 commented Oct 15, 2018

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go1.11 darwin/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/jnewman/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/jnewman/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/_t/hg9f_j4x1q743pqh00kdv6m00000gn/T/go-build803039253=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do? / What did you expect to see? / What did you see instead?

Use httputil.ReverseProxy with an http2.Transport. I'm still trying to find a minimal reproducible example.

Basically, the http2.Transport is closing an HTTP/2 stream and then sending a number of DATA frames on the connection. The http2.Server is detecting that the stream has been half-closed by the remote and is following RFC7540 and responding with an ErrCodeStreamClosed (see http2/server.go).

However, since the http2.Server does not send a window update the remote http2.Transport does not add back the sent DATA frames to its flow control. This causes a re-used http2.Transport to eventually expend all connection flow control and halt sending.

A proposed fix that I've applied to my fork is to make the following patch

diff --git a/http2/server.go b/http2/server.go
index 56859d1..7a39456 100644
--- a/http2/server.go
+++ b/http2/server.go
@@ -1586,6 +1586,8 @@ func (sc *serverConn) processData(f *DataFrame) error {
        // "open" or "half-closed (local)" state, the recipient MUST respond with a
        // stream error (Section 5.4.2) of type STREAM_CLOSED.
        if state == stateClosed {
+               sc.inflow.take(int32(f.Length))
+               sc.sendWindowUpdate(nil, int(f.Length))
                return streamError(id, ErrCodeStreamClosed)
        }
        if st == nil || state != stateOpen || st.gotTrailerHeader || st.resetQueued {

Although, one could argue that http2.Transport/httputil.Reverse proxy should be fixed to make sure that data frames aren't sent after a close.

@jared2501

This comment has been minimized.

jared2501 commented Oct 15, 2018

I think this might be related to 8f38f28#diff-d863507a61be206d112f6e00e6d812a2 since it appears that the http.Server marks the stream as stateClosed if the handler panics, even though the remote side hasn't sent a RESET_STREAM frame. Therefore, the stream should be in "half-closed (local)" and so should send back WINDOW_UPDATE frames but it's not.

@FiloSottile

This comment has been minimized.

Member

FiloSottile commented Oct 15, 2018

/cc @bradfitz

@bradfitz

This comment has been minimized.

Member

bradfitz commented Oct 17, 2018

Can you capture the logs (from setting GODEBUG=http2debug=2) while this happens?

@jared2501

This comment has been minimized.

jared2501 commented Oct 17, 2018

@prashantv

This comment has been minimized.

Contributor

prashantv commented Dec 8, 2018

We ran into the same issue, and I've put together a small repro using the gRPC client, and a gRPC server hosted via the x/net HTTP/2 server:
https://github.com/prashantv/grpc-http2-repro

This example doesn't contain use the reverse proxy, and the issue is exactly as @jared2501 pointed out -- when the server sends RST_STREAM with stream closed, it's not updating flow control, and so the client stops sending data.

When this test is run, the client reports a "deadline exceeded" (since it's waiting to send data, but never gets to it since the window is full):

--- FAIL: TestGRPCLargePayload (1.02s)
        require.go:794:
                        Error Trace:    repro_test.go:58
                                                                repro_test.go:59
                        Error:          Received unexpected error:
                                        rpc error: code = DeadlineExceeded desc = context deadline exceeded
                        Test:           TestGRPCLargePayload
                        Messages:       Expect following requests to succeed

I have the GODEBUG=http2debug=2 logs:
https://gist.github.com/prashantv/e181206b39b030ffea44b05f16596ac1

The client doesn't stop sending frames, and so the server keeps responding with:
RST_STREAM stream=1 len=4 ErrCode=STREAM_CLOSED. However, there's no corresponding WINDOW_UPDATE (which it did before the stream was closed).

This seems like a regression introduced in golang/net@039a425 -- before that change, the code would send a window update.

I also have the "fix" (update flow-control) on a branch called "fix", where the test mostly passes. However, around 1-2% of the time, it still fails with the same error -- I think there might be other paths where the server sends RST_STREAM without taking flow control into account.

@jared2501

This comment has been minimized.

jared2501 commented Dec 8, 2018

Hey @prashantv - glad to see you can repro my issue! I looked over your example and it's possible you're also being hit by #28634. Although, even with patches for this issue and #28634, I was still seeing some flow control issues when using grpc with a context with a timeout...

@fraenkel

This comment has been minimized.

Contributor

fraenkel commented Dec 11, 2018

@prashantv You are correct. I did introduce a bug, but there is another one there too. Once we receive a data frame that does not cause a connection error, we must account for the data. The take() logic should be before all the additional state checks regardless of the path taken.

I believe there may also be a similar bug on the client side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment