Skip to content

net/http: Client.Do() in HTTP/2 mode returns unexpected nil error after client timeout #65375

Closed
@bobvawter

Description

@bobvawter

Go version

go1.20.13 darwin/arm64, go1.21.6 darwin/arm64

Output of go env in your module/workspace:

GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/Users/bob/Library/Caches/go-build"
GOENV="/Users/bob/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/bob/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/bob/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/Users/bob/go/current"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/Users/bob/go/current/pkg/tool/darwin_arm64"
GOVCS=""
GOVERSION="go1.20.13"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/bob/cdc-sink/go.mod"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/2x/kgchp7tx4938vcs04rk3nyz40000gq/T/go-build1903809151=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

It appears that if http.Client.Do() encounters a client-initiated timeout, it can sometimes returns a nil error and a server response if the http.Request object has been reused. This only appears to happen if the client is speaking HTTP/2.

Here's a minimum reproduction case: https://gist.github.com/bobvawter/c5fb99bbc8de533bb53881fe78b8db44

In the linked repro, there's an HTTP/2 enabled server that shows a pathologically slow endpoint. This endpoint returns 200/OK if it is able to completely consume the request body, or 400/Bad Request if the body can't be read after a delay. The http.Client is configured with a timeout that will fire most of the time. The client loop retries the request if there is an error, since we can inspect http.NewRequestWithContext() to see that its implementation of GetBody makes copies of a bytes.Reader.

After about 30 seconds of running the test on my M1 laptop, the repro will demonstrate that a timed-out request to the server can cause Client.Do() to return a nil error and that the HTTP 400 status code is visible to the client. We often, but not always, see the expected "context deadline exceeded" error.

The final log output that the repro creates before halting will look like:

repro_400_test.go:141: request declared a Content-Length of 6228922 but only wrote 0 bytes
repro_400_test.go:124: BAD STATUS 400

The message from line 141 is an error generated by reading the http.Request.Body. There are cases too, where the body is some non-zero, but partial, amount of the content length. Since only the client-configured timeout should be introducing errors, it's odd that we see the resulting HTTP 400 sent by the handler.

This repro proceeds as expected if:

  • HTTP/2 is turned off in the client.
  • The request object is not reused.

What did you see happen?

The repro case linked above gets to the block identified as ISSUE, where client.Do() has returned a nil error and we see the 400 status code returned by the trivial http handler appear, the BAD STATUS message.

What did you expect to see?

I would expect that if the client timeout fires or the client is unable to transmit the entire payload to the server, that http.Client.Do() should return a non-nil error.

I would also expect the behaviors to be broadly similar between HTTP/1.1 and HTTP/2 protocols.

Metadata

Metadata

Assignees

No one assigned

    Labels

    FrozenDueToAgeNeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions