Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: pool deadlock #32388

Open
stevenh opened this issue Jun 2, 2019 · 7 comments

Comments

Projects
None yet
5 participants
@stevenh
Copy link
Contributor

commented Jun 2, 2019

What version of Go are you using (go version)?

go version go1.12.4 freebsd/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
GOARCH="amd64"
GOBIN="/data/go/bin"
GOCACHE="/root/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="freebsd"
GOOS="freebsd"
GOPATH="/data/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/freebsd_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build232080061=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Use the net/http client to make requests to an http2 supporting server.

What did you expect to see?

Requests should continue to succeed if a single connection becomes blocked in a write.

What did you see instead?

A single blocked connection prevented new http2 connections from being established due to the http2 pool requiring the connection lock when testing to see if an existing connection can handle new requests. This results in the entire application hanging indefinitely.

The following trace was created with an older go version 1.9 from our production app, but I have proved it's still possible to reproduce in the latest version as detailed above, as the locking between connections and pool haven't been changed.

-- goroutine blocked in writeHeaders -> Flush
1 @ 0x43bddc 0x436eea 0x4364e7 0x4a201e 0x4a210d 0x4a3a04 0x577812 0x589dbd 0x5d1887 0x5d1da0 0x5d2cbb 0x662cea 0x6ad1d4 0x52bdce 0x6678dc 0x66680b 0x663215 0x662dda 0x6476b9 0x690a17 0x63b819 0x63b4ad 0x63cb4d 0xb1d000 0xe641df 0xe6213e 0xe8245c 0xe7f42e 0xe7ad28 0xe61e87 0xf598ed 0xf59637
#    0x4364e6    internal/poll.runtime_pollWait+0x56                                                    /usr/local/go/src/runtime/netpoll.go:173
#    0x4a201d    internal/poll.(*pollDesc).wait+0xad                                                    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85
#    0x4a210c    internal/poll.(*pollDesc).waitWrite+0x3c                                                /usr/local/go/src/internal/poll/fd_poll_runtime.go:94
#    0x4a3a03    internal/poll.(*FD).Write+0x243                                                        /usr/local/go/src/internal/poll/fd_unix.go:227
#    0x577811    net.(*netFD).Write+0x51                                                            /usr/local/go/src/net/fd_unix.go:220
#    0x589dbc    net.(*conn).Write+0x6c                                                            /usr/local/go/src/net/net.go:188
#    0x5d1886    crypto/tls.(*Conn).write+0x1a6                                                        /usr/local/go/src/crypto/tls/conn.go:832
#    0x5d1d9f    crypto/tls.(*Conn).writeRecordLocked+0x35f                                                /usr/local/go/src/crypto/tls/conn.go:909
#    0x5d2cba    crypto/tls.(*Conn).Write+0x1ba                                                        /usr/local/go/src/crypto/tls/conn.go:1068
#    0x662ce9    net/http.http2stickyErrWriter.Write+0x79                                                /usr/local/go/src/net/http/h2_bundle.go:6803
#    0x6ad1d3    net/http.(*http2stickyErrWriter).Write+0x73                                                <autogenerated>:1
#    0x52bdcd    bufio.(*Writer).Flush+0x7d                                                        /usr/local/go/src/bufio/bufio.go:567
#    0x6678db    net/http.(*http2ClientConn).writeHeaders+0x1eb                                                /usr/local/go/src/net/http/h2_bundle.go:7436
#    0x66680a    net/http.(*http2ClientConn).RoundTrip+0x46a                                                /usr/local/go/src/net/http/h2_bundle.go:7289
#    0x663214    net/http.(*http2Transport).RoundTripOpt+0x144                                                /usr/local/go/src/net/http/h2_bundle.go:6858
#    0x662dd9    net/http.(*http2Transport).RoundTrip+0x39                                                /usr/local/go/src/net/http/h2_bundle.go:6820
#    0x6476b8    net/http.http2noDialH2RoundTripper.RoundTrip+0x38                                            /usr/local/go/src/net/http/h2_bundle.go:990
#    0x690a16    net/http.(*Transport).RoundTrip+0xd46                                                    /usr/local/go/src/net/http/transport.go:371
#    0x63b818    net/http.send+0x1a8                                                            /usr/local/go/src/net/http/client.go:249
#    0x63b4ac    net/http.(*Client).send+0xfc                                                        /usr/local/go/src/net/http/client.go:173
...

-- goroutine waiting for connection lock, held above, while holding the pool lock
1 @ 0x43bddc 0x43bece 0x44d2c4 0x44cfdd 0x48115e 0x66546f 0x6457d8 0x646dce 0x6631c7 0x662dda 0x6476b9 0x690a17 0x63b819 0x63b4ad 0x63cb4d 0xb1d000 0xe641df 0xe6213e 0xe8245c 0xe7f42e 0xe7ad28 0xe61e87 0xf598ed 0xf59637 0x46b1f1
#    0x44cfdc    sync.runtime_SemacquireMutex+0x3c                                                    /usr/local/go/src/runtime/sema.go:71
#    0x48115d    sync.(*Mutex).Lock+0xed                                                            /usr/local/go/src/sync/mutex.go:134
#    0x66546e    net/http.(*http2ClientConn).CanTakeNewRequest+0x3e                                            /usr/local/go/src/net/http/h2_bundle.go:7083
#    0x6457d7    net/http.(*http2clientConnPool).getClientConn+0xe7                                            /usr/local/go/src/net/http/h2_bundle.go:739
#    0x646dcd    net/http.http2noDialClientConnPool.GetClientConn+0x4d                                            /usr/local/go/src/net/http/h2_bundle.go:927
#    0x6631c6    net/http.(*http2Transport).RoundTripOpt+0xf6                                                /usr/local/go/src/net/http/h2_bundle.go:6852
#    0x662dd9    net/http.(*http2Transport).RoundTrip+0x39                                                /usr/local/go/src/net/http/h2_bundle.go:6820
#    0x6476b8    net/http.http2noDialH2RoundTripper.RoundTrip+0x38                                            /usr/local/go/src/net/http/h2_bundle.go:990
#    0x690a16    net/http.(*Transport).RoundTrip+0xd46
...

-- goroutines blocked waiting for the pool lock held above
13033 @ 0x43bddc 0x43bece 0x44d2c4 0x44cfdd 0x48115e 0x645754 0x646dce 0x6631c7 0x662dda 0x6476b9 0x690a17 0x63b819 0x63b4ad 0x63cb4d 0xb1d000 0xe641df 0xe6213e 0xe8245c 0xe7f42e 0xe7ad28 0xe61e87 0xf598ed 0xf59637 0x46b1f1
#   0x44cfdc    sync.runtime_SemacquireMutex+0x3c                                                   /usr/local/go/src/runtime/sema.go:71
#   0x48115d    sync.(*Mutex).Lock+0xed                                                         /usr/local/go/src/sync/mutex.go:134
#   0x645753    net/http.(*http2clientConnPool).getClientConn+0x63                                          /usr/local/go/src/net/http/h2_bundle.go:737
#   0x646dcd    net/http.http2noDialClientConnPool.GetClientConn+0x4d                                           /usr/local/go/src/net/http/h2_bundle.go:927
#   0x6631c6    net/http.(*http2Transport).RoundTripOpt+0xf6                                                /usr/local/go/src/net/http/h2_bundle.go:6852
#   0x662dd9    net/http.(*http2Transport).RoundTrip+0x39                                               /usr/local/go/src/net/http/h2_bundle.go:6820
#   0x6476b8    net/http.http2noDialH2RoundTripper.RoundTrip+0x38                                           /usr/local/go/src/net/http/h2_bundle.go:990
#   0x690a16    net/http.(*Transport).RoundTrip+0xd46                                                   /usr/local/go/src/net/http/transport.go:371
#   0x63b818    net/http.send+0x1a8                                                         /usr/local/go/src/net/http/client.go:249
#   0x63b4ac    net/http.(*Client).send+0xfc                                                        /usr/local/go/src/net/http/client.go:173
#   0x63cb4c    net/http.(*Client).Do+0x28c                                                     /usr/local/go/src/net/http/client.go:602

This is similar to #23559 however that bug is specifically about certain areas of the http2 connections not paying attention to timeouts whereas this bug is caused by the interaction of individual connection locks and the pool lock.

@gopherbot

This comment has been minimized.

Copy link

commented Jun 2, 2019

Change https://golang.org/cl/179938 mentions this issue: http2: Prevent pool blocking when testing a conn for a new request

@bcmills bcmills added this to the Unreleased milestone Jun 3, 2019

@bcmills

This comment has been minimized.

Copy link
Member

commented Jun 3, 2019

@bradfitz

This comment has been minimized.

Copy link
Member

commented Jun 3, 2019

Whelp....

        // TODO(bradfitz): this Flush could potentially block (as
        // could the WriteHeaders call(s) above), which means they
        // wouldn't respond to Request.Cancel being readable. That's
        // rare, but this should probably be in a goroutine.
        cc.bw.Flush()
        return cc.werr
}

I haven't reviewed CL 179938 but it seems a bit long/invasive. I wonder why the mutex needs to be held during that write.

Rather than change everything so CanTakeRequest can work without taking the lock, can we instead not hold the lock during the write and not modify CanTakeRequest?

@stevenh

This comment has been minimized.

Copy link
Contributor Author

commented Jun 3, 2019

I tried that but because cc.bw.Write(...) modifies cc.werr it requires the lock.

It may be that adding a lock just around the update of werr would be possible but given the wide scope of the current lock, it would hard to prove it's not needed elsewhere too.

@fraenkel

This comment has been minimized.

Copy link
Contributor

commented Jun 8, 2019

There seems to be a mix of ways the locks are acquired but it is always mu then wmu. I can see value in acquiring wmu under mu and then releasing if we are trying to guarantee some ordering.
I didn't submit any CL because I am lacking a testcase that shows the issue. When I try the testcase in the submitted CL, it fails the same way using the current code as well as when I move the lock.
If I just glance at the testcase, I don't see how this is testing the right behavior. The server is closed after the first connection and we are only allowed 1 dial, so the second request fails with too many dials.

@stevenh

This comment has been minimized.

Copy link
Contributor Author

commented Jun 8, 2019

The test only talks to one endpoint so the http/2 code will try to make these requests over the same TCP connection; it's designed to validate that the first request doesn't block the second request due to a stalled write; in this situation, it should just create a new connection but as the CanTakeRequest blocks it can't do that.

You are correct in that mu is always acquired before wmu if holding both as documented here:
https://github.com/golang/net/blob/master/http2/transport.go#L240

@gopherbot

This comment has been minimized.

Copy link

commented Jun 9, 2019

Change https://golang.org/cl/181457 mentions this issue: http2: reduce lock contention when writing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.