Skip to content

net/http: go 1.17.6's http2-related cancellation change causes long-lived goroutine with custom Transport types #50652

Closed
@knz

Description

@knz

There is a regression in go 1.17.6 that causes a long-lived goroutine to remain blocked after http.client/setRequestCancel() when using a non-http Transport custom type.
This regression was introduced by https://go-review.googlesource.com/c/go/+/361919

We've analyzed the cause of the issue.
If all the following conditions are met:

  • there's no .Cancel channel defined on the Request
  • there's a non-zero Timeout on the Client, set to X seconds
  • the client's Transport field is set to an object that has a different type than those recognized in knownRoundTripperImpl() (either http.Transport, http.http2Transport or http2noDialH2RoundTripper)
  • the request completes

Then the goroutine spawned at the end of setRequestCancel() remains running for X seconds even if the request completes in a short time.

Prior to go 1.17.6, this goroutine would terminate shortly after the body was read successfully.
The problem was introduced by this change, which removed the .stop() call that would clean up the waiting goroutine.

This is a problem because it's common for the client Timeout to be set to a large amount of time (say, 30 seconds), and then issue thousands of requests to a local HTTP server. With this regression, we see thousands of goroutines remaining running, creating lag and bloated goroutine dumps.

Note also that this problem does not occur when using http's own Transport type, because then we get knownTransport == true and setRequestCancel returns a regular deadline-based context. This is probably why go's own test suite did not detect the regression?

What version of Go are you using (go version)?

go version go1.17.6

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

Issue reproduces on all platforms

What did you do?

  1. created http.Client with custom Transport (custom go type)
  2. set non-zero Timeout (say,30 seconds)
  3. sent request to some server with this client
  4. observed remaining goroutines after request completes, remaining blocked for the duration set in Timeout

(This is part of CockroachDB's unit tests. Internal reference: cockroachdb/cockroach#74657)

What did you expect to see?

No remaining goroutine.

What did you see instead?

 Leaked goroutine: goroutine 9374 [select]:
        net/http.setRequestCancel.func4(0x0, 0xc00706f920, 0xc001c80af0, 0xc005640bcc, 0xc006a39320)
                /usr/local/go/src/net/http/client.go:398 +0xb0
        created by net/http.setRequestCancel
                /usr/local/go/src/net/http/client.go:397 +0x497

Metadata

Metadata

Assignees

No one assigned

    Labels

    FrozenDueToAgeNeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions