x/net/http2: Blocked Write on single connection causes all future calls to block indefinitely

### What version of Go are you using (`go version`)?

<pre>
$ go version
go version go1.12.7 linux/amd64
</pre>

### Does this issue reproduce with the latest release?
Yes.

### What operating system and processor architecture are you using (`go env`)?

<details><summary><code>go env</code> Output</summary><br><pre>
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/prashant/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/prashant/go"
GOPROXY=""
GORACE=""
GOROOT="/home/prashant/.gimme/versions/go1.12.7.linux.amd64"
GOTMPDIR=""
GOTOOLDIR="/home/prashant/.gimme/versions/go1.12.7.linux.amd64/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build705930436=/tmp/go-build -gno-record-gcc-switches"

</pre></details>

### What did you do?

Used the HTTP/2 library to make calls as a client. One of the backends stalled and stopped reading from the TCP connection (although it was still running and the TCP connection was active).

This caused the TCP write buffer on the client to fill up.

### What did you expect to see?

I expected that all future calls to the blocked server would timeout eventually, and that calls to any other server would not be affected.

### What did you see instead?

All calls on the transport are blocked forever, including calls to working backends. Stack traces should the calls are blocked on a mutex, so any timeouts are ignored.

### Repro

Repro: https://github.com/prashantv/go-http2-stuck-repro/

The repro sets up 2 "backends":
`stuckURL`: Accepts a TCP connection and does not read from the connection, to simulate a stuck backend
`normal1URL`: Echoes the request body back to the caller.


1. A call is made to `stuckURL` with a payload that never completes. This causes the TCP write buffer to fill up, and the `Write` to block, while holding the `*ClientConn.wmu` (write lock on the connection). We sleep a second to make sure the TCP buffer fills up, and the write is blocked.

<details><summary>Stack trace</summary><pre>
goroutine 44 [IO wait]:
internal/poll.runtime_pollWait(0x7f74ab342b98, 0x77, 0xffffffffffffffff)
	/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/runtime/netpoll.go:182 +0x56
internal/poll.(*pollDesc).wait(0xc000144598, 0x77, 0x4000, 0x4800, 0xffffffffffffffff)
	/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/internal/poll/fd_poll_runtime.go:87 +0x9b
internal/poll.(*pollDesc).waitWrite(...)
	/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/internal/poll/fd_poll_runtime.go:96
internal/poll.(*FD).Write(0xc000144580, 0xc0001f2000, 0x4009, 0x4800, 0x0, 0x0, 0x0)
	/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/internal/poll/fd_unix.go:276 +0x26e
net.(*netFD).Write(0xc000144580, 0xc0001f2000, 0x4009, 0x4800, 0x3, 0x3, 0xc0001aadaa)
	/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/fd_unix.go:220 +0x4f
net.(*conn).Write(0xc00013c0a0, 0xc0001f2000, 0x4009, 0x4800, 0x0, 0x0, 0x0)
	/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/net.go:189 +0x69
github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.stickyErrWriter.Write(0x93f320, 0xc00013c0a0, 0xc000195058, 0xc0001f2000, 0x4009, 0x4800, 0x4009, 0x0, 0x0)
	/home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/transport.go:372 +0x68
bufio.(*Writer).Write(0xc000130480, 0xc0001f2000, 0x4009, 0x4800, 0xc000034070, 0xc000034000, 0x0)
	/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/bufio/bufio.go:622 +0x145
github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*Framer).endWrite(0xc0001ea000, 0x0, 0x0)
	/home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/frame.go:366 +0xab
github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*Framer).WriteDataPadded(0xc0001ea000, 0x3, 0xc0001ee000, 0x4000, 0x4000, 0x0, 0x0, 0x0, 0x0, 0x0)
	/home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/frame.go:685 +0x25d
github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*Framer).WriteData(...)
	/home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/frame.go:643
github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*clientStream).writeRequestBody(0xc000154640, 0x7f74ab343070, 0xc000138600, 0x7f74ab343090, 0xc000138600, 0x0, 0x0)
	/home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/transport.go:1270 +0x337
github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*Transport).getBodyWriterState.func1()
	/home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/transport.go:2462 +0xc2
created by github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.bodyWriterState.scheduleBodyWrite
	/home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/transport.go:2509 +0xa3


</pre></details>

2. Another call to `stuckURL` is made, which is blocked in `roundTrip` trying to get the `*ClientConn.wmu` (since (1) holds this lock), and with the `*ClientConn.mu` lock held.

<details><summary>Stack trace</summary><pre>

goroutine 46 [semacquire]:
sync.runtime_SemacquireMutex(0xc000195054, 0x40de00)
	/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/runtime/sema.go:71 +0x3d
sync.(*Mutex).Lock(0xc000195050)
	/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/sync/mutex.go:134 +0x109
github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*ClientConn).roundTrip(0xc000194f00, 0xc000204100, 0x0, 0x0, 0x0, 0x0)
	/home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/transport.go:1024 +0x467
github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*Transport).RoundTripOpt(0xc00013a4e0, 0xc000204100, 0xc000118000, 0xc0001164b0, 0xc00013e03c, 0xc00013e040)
	/home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/transport.go:447 +0x172
github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*Transport).RoundTrip(0xc00013a4e0, 0xc000204100, 0xc00013a4e0, 0xbf490e456743e889, 0xe34199ec9)
	/home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/transport.go:408 +0x3a
net/http.send(0xc000204000, 0x93f060, 0xc00013a4e0, 0xbf490e456743e889, 0xe34199ec9, 0xc6c560, 0xc00011c050, 0xbf490e456743e889, 0x1, 0x0)
	/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:250 +0x461
net/http.(*Client).send(0xc000134db0, 0xc000204000, 0xbf490e456743e889, 0xe34199ec9, 0xc6c560, 0xc00011c050, 0x0, 0x1, 0xc000020a80)
	/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:174 +0xfb
net/http.(*Client).do(0xc000134db0, 0xc000204000, 0x0, 0x0, 0x0)
	/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:641 +0x279
net/http.(*Client).Do(...)
	/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:509
net/http.(*Client).Get(0xc000134db0, 0xc000136180, 0x16, 0x8b330c, 0xf, 0x93f020)
	/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:398 +0x9e
created by github.com/prashantv/go-http2-stuck-repro.TestSendLargeUnreadPayload
	/home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/repro_test.go:46 +0x402
</pre></details>

3. Another call to `stuckURL` grabs the connection pool lock, `*clientConnPool.mu`, then iterates over all connections to that address to check the `idleState()` which tries to grab `*ClientConn.mu`. This blocks since (2) is already holding this lock.

<details><summary>Stack trace</summary><pre>
goroutine 45 [semacquire]:
sync.runtime_SemacquireMutex(0xc000194f54, 0xc00013e000)
	/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/runtime/sema.go:71 +0x3d
sync.(*Mutex).Lock(0xc000194f50)
	/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/sync/mutex.go:134 +0x109
github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*ClientConn).idleState(0xc000194f00, 0xc000130000)
	/home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/transport.go:718 +0x41
github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*clientConnPool).getClientConn(0xc000134fc0, 0xc000204300, 0xc00013e080, 0xf, 0xc00013a501, 0xc0001a5a18, 0x7c1030, 0xc00013a4e0)
	/home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/client_conn_pool.go:89 +0xeb
github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*clientConnPool).GetClientConn(0xc000134fc0, 0xc000204300, 0xc00013e080, 0xf, 0xc00013e080, 0xf, 0x2)
	/home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/client_conn_pool.go:47 +0x4e
github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*Transport).RoundTripOpt(0xc00013a4e0, 0xc000204300, 0xc000118100, 0xc0001165d0, 0xc00013e06c, 0xc00013e070)
	/home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/transport.go:440 +0x105
github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*Transport).RoundTrip(0xc00013a4e0, 0xc000204300, 0xc00013a4e0, 0xbf490e456749078a, 0xe341ebd66)
	/home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/transport.go:408 +0x3a
net/http.send(0xc000204200, 0x93f060, 0xc00013a4e0, 0xbf490e456749078a, 0xe341ebd66, 0xc6c560, 0xc00011c080, 0xbf490e456749078a, 0x1, 0x0)
	/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:250 +0x461
net/http.(*Client).send(0xc000134db0, 0xc000204200, 0xbf490e456749078a, 0xe341ebd66, 0xc6c560, 0xc00011c080, 0x0, 0x1, 0xc000020a80)
	/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:174 +0xfb
net/http.(*Client).do(0xc000134db0, 0xc000204200, 0x0, 0x0, 0x0)
	/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:641 +0x279
net/http.(*Client).Do(...)
	/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:509
net/http.(*Client).Get(0xc000134db0, 0xc000136180, 0x16, 0x8b330c, 0xf, 0x93f020)
	/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:398 +0x9e
created by github.com/prashantv/go-http2-stuck-repro.TestSendLargeUnreadPayload
	/home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/repro_test.go:46 +0x402
</pre></details>

4. Now any calls made using the same `*clientConnPool` (e.g., any call through the same client/transport) block on trying to get `*clientConnPool.mu`.

<details><summary>Stack trace</summary><pre>
3 @ 0x430b8f 0x440d79 0x440d4f 0x440aed 0x4663d9 0x78e6c6 0x78e62e 0x7a8b85 0x7a86ba 0x6af511 0x6aef0b 0x6b0519 0x6b004e 0x6b003b 0x45d481
#	0x440aec	sync.runtime_SemacquireMutex+0x3c										/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/runtime/sema.go:71
#	0x4663d8	sync.(*Mutex).Lock+0x108											/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/sync/mutex.go:134
#	0x78e6c5	github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*clientConnPool).getClientConn+0x65	/home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/client_conn_pool.go:87
#	0x78e62d	github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*clientConnPool).GetClientConn+0x4d	/home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/client_conn_pool.go:47
#	0x7a8b84	github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*Transport).RoundTripOpt+0x104		/home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/transport.go:440
#	0x7a86b9	github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2.(*Transport).RoundTrip+0x39		/home/prashant/go/src/github.com/prashantv/go-http2-stuck-repro/vendor/golang.org/x/net/http2/transport.go:408
#	0x6af510	net/http.send+0x460												/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:250
#	0x6aef0a	net/http.(*Client).send+0xfa											/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:174
#	0x6b0518	net/http.(*Client).do+0x278											/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:641
#	0x6b004d	net/http.(*Client).Do+0x9d											/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:509
#	0x6b003a	net/http.(*Client).Get+0x8a											/home/prashant/.gimme/versions/go1.12.7.linux.amd64/src/net/http/client.go:398
</pre></details>


I've reduced the size of TCP read/write buffers to trigger the issue more quickly.

The repro includs a flag for controlling steps 2/3. Setting the `--num-stuck-calls` to 2 or higher will reproduce the issue (the echo calls will fail), while 0 or 1 will cause the test to pass.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

x/net/http2: Blocked Write on single connection causes all future calls to block indefinitely #33425

What version of Go are you using (`go version`)?

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (`go env`)?

What did you do?

What did you expect to see?

What did you see instead?

Repro

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

x/net/http2: Blocked Write on single connection causes all future calls to block indefinitely #33425

Description

What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

What did you do?

What did you expect to see?

What did you see instead?

Repro

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

What version of Go are you using (`go version`)?

What operating system and processor architecture are you using (`go env`)?