Description
There is a regression in go 1.17.6 that causes a long-lived goroutine to remain blocked after http.client/setRequestCancel()
when using a non-http
Transport custom type.
This regression was introduced by https://go-review.googlesource.com/c/go/+/361919
We've analyzed the cause of the issue.
If all the following conditions are met:
- there's no
.Cancel
channel defined on the Request - there's a non-zero
Timeout
on the Client, set to X seconds - the client's
Transport
field is set to an object that has a different type than those recognized inknownRoundTripperImpl()
(eitherhttp.Transport
,http.http2Transport
orhttp2noDialH2RoundTripper
) - the request completes
Then the goroutine spawned at the end of setRequestCancel()
remains running for X seconds even if the request completes in a short time.
Prior to go 1.17.6, this goroutine would terminate shortly after the body was read successfully.
The problem was introduced by this change, which removed the .stop()
call that would clean up the waiting goroutine.
This is a problem because it's common for the client Timeout to be set to a large amount of time (say, 30 seconds), and then issue thousands of requests to a local HTTP server. With this regression, we see thousands of goroutines remaining running, creating lag and bloated goroutine dumps.
Note also that this problem does not occur when using http
's own Transport
type, because then we get knownTransport == true
and setRequestCancel
returns a regular deadline-based context. This is probably why go's own test suite did not detect the regression?
What version of Go are you using (go version
)?
go version go1.17.6
Does this issue reproduce with the latest release?
yes
What operating system and processor architecture are you using (go env
)?
Issue reproduces on all platforms
What did you do?
- created http.Client with custom
Transport
(custom go type) - set non-zero Timeout (say,30 seconds)
- sent request to some server with this client
- observed remaining goroutines after request completes, remaining blocked for the duration set in Timeout
(This is part of CockroachDB's unit tests. Internal reference: cockroachdb/cockroach#74657)
What did you expect to see?
No remaining goroutine.
What did you see instead?
Leaked goroutine: goroutine 9374 [select]:
net/http.setRequestCancel.func4(0x0, 0xc00706f920, 0xc001c80af0, 0xc005640bcc, 0xc006a39320)
/usr/local/go/src/net/http/client.go:398 +0xb0
created by net/http.setRequestCancel
/usr/local/go/src/net/http/client.go:397 +0x497