
x/net/http2: transport race in closing stream while processing DATA frame, kills connection. #47076

Open
amandachow opened this issue Jul 7, 2021 · 3 comments

Comments

@amandachow commented Jul 7, 2021

What version of Go are you using (go version)?

$ go version
go1.16.5 darwin/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/amandachow/Library/Caches/go-build"
GOENV="/Users/amandachow/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/amandachow/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/amandachow/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.16.5"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/_f/1tl8g2w95wx34b7zp53y0kqc0000gp/T/go-build310981905=/tmp/go-build -gno-record-gcc-switches -fno-common"

What's the issue?

I use httputil.ReverseProxy with http2.Transport. Downstream clients were hitting what looked like connection flakes: either having their streams closed with RST_STREAM, or receiving an error response from our ErrorHandler. After some investigation, I found a race condition: when a stream is closed on the http2.Server side, the http2.Transport side can close the stream while a DATA frame is still being processed, which causes the entire TCP client connection to error and close. It is the same issue described here: https://go.googlesource.com/net/+/6c4ac8bdbf06a105c4baf3dcda28bd9b0fb15588
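
Roughly, our setup looks like the following (the backend URL and listen address here are placeholders, not our real configuration):

package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"

	"golang.org/x/net/http2"
)

func main() {
	// Placeholder backend; ours is an internal HTTPS service.
	target, err := url.Parse("https://backend.example.com")
	if err != nil {
		log.Fatal(err)
	}

	proxy := httputil.NewSingleHostReverseProxy(target)
	// Proxy upstream requests over HTTP/2.
	proxy.Transport = &http2.Transport{}
	proxy.ErrorHandler = func(w http.ResponseWriter, r *http.Request, err error) {
		// This is where downstream clients see the error when the
		// shared client connection gets torn down by the race.
		http.Error(w, err.Error(), http.StatusBadGateway)
	}

	log.Fatal(http.ListenAndServe(":8080", proxy))
}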

The stream gets closed between the clientConn unlock and the bufPipe write in processData. The subsequent write to the closed bufPipe fails, and as a result the entire clientConn is closed, erroring out all other open streams on that connection.
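
To make the window concrete, the pre-fix flow in (*clientConnReadLoop).processData looks roughly like this (abridged and annotated, not verbatim source):

	// Abridged from go1.16.5 x/net/http2 transport.go:
	cc.mu.Lock()
	// ... stream- and connection-level flow control accounting,
	// window updates, etc., all done under cc.mu ...
	cc.mu.Unlock()

	// <-- race window: another goroutine can acquire cc.mu here and
	//     close the stream (and with it cs.bufPipe), e.g. because the
	//     server side closed the stream.

	if len(data) > 0 && !didReset {
		if _, err := cs.bufPipe.Write(data); err != nil {
			// The write fails on the closed pipe; the error then
			// tears down the whole clientConn, not just this stream.
			rl.endStreamError(cs, err)
			return err
		}
	}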

I tried holding the lock across the bufPipe write. I made the following change, deployed it to production, and the errors stopped:

                // Check connection-level flow control.
                cc.mu.Lock()
+               defer cc.mu.Unlock()
                if cs.inflow.available() >= int32(f.Length) {
                        cs.inflow.take(int32(f.Length))
                } else {
-                       cc.mu.Unlock()
                        return ConnectionError(ErrCodeFlowControl)
                }
                // Return any padded flow control now, since we won't
@@ -2246,11 +2246,13 @@ func (rl *clientConnReadLoop) processData(f *DataFrame) error {
                        cc.bw.Flush()
                        cc.wmu.Unlock()
                }
-               cc.mu.Unlock()
                if len(data) > 0 && !didReset {
                        if _, err := cs.bufPipe.Write(data); err != nil {
                                rl.endStreamError(cs, err)
                                return err
                        }
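
If it helps with reproduction: the race only fires under concurrency, so a harness like this against the proxy (assuming the placeholder setup above on :8080, with the backend resetting streams mid-body) shows the blast radius, since every in-flight request shares the one upstream HTTP/2 connection:

package main

import (
	"io"
	"log"
	"net/http"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 200; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			resp, err := http.Get("http://localhost:8080/")
			if err != nil {
				log.Println("request error:", err)
				return
			}
			defer resp.Body.Close()
			// When the race kills the upstream clientConn, many of
			// these reads fail at once, not just the reset stream.
			if _, err := io.Copy(io.Discard, resp.Body); err != nil {
				log.Println("body error:", err)
			}
		}()
	}
	wg.Wait()
}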
@mknyszek changed the title from "x/net/http2: Transport race in closing stream while processing DATA frame, kills connection." to "x/net/http2: transport race in closing stream while processing DATA frame, kills connection." on Jul 7, 2021
@mknyszek (Contributor) commented Jul 7, 2021

@jared2501 commented Aug 14, 2021

@neild quick ping on this issue! We're still seeing this in our PROD environments.

@amandachow (Author) commented Sep 2, 2021

Hey, friendly ping on this. Any other details you need, @neild?
