x/net/http2: a canceled request context can deadlock other requests on the same connection #52853

Open
milesbxf opened this issue May 11, 2022 · 1 comment
@milesbxf milesbxf commented May 11, 2022

What version of Go are you using (go version)?

$ go version
go version go1.17.6 darwin/amd64

Does this issue reproduce with the latest release?

Yes, this issue reproduces with golang.org/x/net v0.0.0-20220425223048-2871e0cb64e4 (see the reproduction below).

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="auto"
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/milesbryant/Library/Caches/go-build"
GOENV="/Users/milesbryant/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/milesbryant/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/milesbryant"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go@1.16/1.16.13/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go@1.16/1.16.13/libexec/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.17.6"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

What did you do?

We enabled HTTP/2 for an internal proxy written in Go, and noticed that after 10 minutes, every connection into the proxy seemed to be hanging. A goroutine dump revealed that many goroutines were blocked on acquiring the ClientConn mutex (e.g. see https://github.com/golang/net/blob/2871e0cb64e47e47ba7466fad6e11562caf99563/http2/transport.go#L790).

One goroutine, however, had acquired the mutex and was blocked in abortStreamLocked trying to close the request body: https://github.com/golang/net/blob/2871e0cb64e47e47ba7466fad6e11562caf99563/http2/transport.go#L379. In our case this seemed to be indefinitely blocked trying to read from a network socket. We haven't figured out why yet, but that's a separate issue.

I've knocked up a very crude and simple reproduction here - I'll try and come back and tidy this up a bit later: https://github.com/milesbxf/go-http2-request-close-deadlock/blob/main/deadlock_test.go

For this to occur:

  • One request/stream has its context cancelled, which causes the stream to be aborted (abortStream)
  • The same request/stream also blocks on requestBody.Close
  • Other requests on the same connection try to acquire the mutex
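
To illustrate the three conditions above, here is a minimal sketch of the locking pattern in isolation. It does not use the real http2 code: `conn`, `blockingBody`, `abort`, `newRequest` and `otherRequestStalls` are all hypothetical names, and the single `sync.Mutex` only stands in for the ClientConn mutex. It shows that a Close call made while the mutex is held stalls any other goroutine that needs the same mutex.

```go
package main

import (
	"fmt"
	"io"
	"sync"
	"time"
)

// blockingBody is a hypothetical request body whose Close blocks until
// release is closed, standing in for a body stuck on a network read.
type blockingBody struct {
	entered chan struct{} // closed once Close has been entered
	release chan struct{} // closed to let Close return
}

func (b *blockingBody) Close() error {
	close(b.entered)
	<-b.release
	return nil
}

// conn is a simplified stand-in for a shared client connection
// guarded by one mutex. It is not the real http2.ClientConn.
type conn struct{ mu sync.Mutex }

// abort closes the body while holding the connection mutex.
func (c *conn) abort(body io.Closer) {
	c.mu.Lock()
	defer c.mu.Unlock()
	body.Close()
}

// newRequest models another request that needs the mutex briefly.
func (c *conn) newRequest() {
	c.mu.Lock()
	c.mu.Unlock()
}

// otherRequestStalls reports whether a second request is still blocked
// on the mutex after waitMillis milliseconds.
func otherRequestStalls(waitMillis int) bool {
	c := &conn{}
	body := &blockingBody{
		entered: make(chan struct{}),
		release: make(chan struct{}),
	}
	defer close(body.release) // let the blocked goroutines finish

	go c.abort(body)
	<-body.entered // abort now holds the mutex and is inside Close

	done := make(chan struct{})
	go func() {
		c.newRequest()
		close(done)
	}()

	select {
	case <-done:
		return false
	case <-time.After(time.Duration(waitMillis) * time.Millisecond):
		return true
	}
}

func main() {
	fmt.Println("second request stalled:", otherRequestStalls(100))
}
```

In this model the second request only proceeds once the blocked Close returns, which matches the goroutine dump described above.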

Is this actually expected/intended behaviour? I.e. do we require that closing the request body is always non-blocking? If so, I guess we could mitigate by increasing the concurrent connection count, but it feels like we could still hit the deadlock across multiple connections.
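
If Close is indeed expected to be non-blocking, one possible caller-side workaround is to wrap the request body so that Close returns immediately. The sketch below is only an assumption about how that could look, not something from x/net: `asyncCloseBody`, `stuckBody` and `closeReturnsPromptly` are hypothetical names. It discards the underlying Close error and leaves a goroutine behind for as long as the underlying Close stays blocked.

```go
package main

import (
	"fmt"
	"io"
	"sync"
	"time"
)

// asyncCloseBody is a hypothetical wrapper around a request body.
// Its Close starts the underlying Close in a separate goroutine and
// returns immediately, so the caller never blocks on it.
type asyncCloseBody struct {
	io.ReadCloser
	once sync.Once
}

func (b *asyncCloseBody) Close() error {
	// Run the underlying Close at most once; its error is discarded.
	b.once.Do(func() { go b.ReadCloser.Close() })
	return nil
}

// stuckBody is a hypothetical body whose Close blocks until release
// is closed.
type stuckBody struct{ release chan struct{} }

func (s *stuckBody) Read(p []byte) (int, error) { return 0, io.EOF }

func (s *stuckBody) Close() error {
	<-s.release
	return nil
}

// closeReturnsPromptly reports whether Close on the wrapped body
// returns within limitMillis milliseconds even though the underlying
// Close is still blocked.
func closeReturnsPromptly(limitMillis int) bool {
	stuck := &stuckBody{release: make(chan struct{})}
	defer close(stuck.release) // let the background Close finish

	body := &asyncCloseBody{ReadCloser: stuck}
	done := make(chan struct{})
	go func() {
		body.Close()
		close(done)
	}()

	select {
	case <-done:
		return true
	case <-time.After(time.Duration(limitMillis) * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println("wrapped Close returned promptly:", closeReturnsPromptly(100))
}
```

A wrapper like this would be assigned to the request's Body before sending it. It only hides the blocking Close from the transport; it does not address why the underlying Close blocks in the first place.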

What did you expect to see?

Other requests should be able to make progress whilst the first request is blocked.

What did you see instead?

All requests on the same client connection were blocked.

Other things

I'm very happy to attempt a fix, but I'm a bit of a novice in this part of the codebase!

@heschi heschi added the NeedsFix label May 11, 2022
@heschi heschi added this to the Go1.19 milestone May 11, 2022

@heschi heschi commented May 11, 2022

cc @neild
