Description
Go version
go1.20.13 darwin/arm64, go1.21.6 darwin/arm64
Output of go env
in your module/workspace:
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/Users/bob/Library/Caches/go-build"
GOENV="/Users/bob/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/bob/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/bob/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/Users/bob/go/current"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/Users/bob/go/current/pkg/tool/darwin_arm64"
GOVCS=""
GOVERSION="go1.20.13"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/bob/cdc-sink/go.mod"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/2x/kgchp7tx4938vcs04rk3nyz40000gq/T/go-build1903809151=/tmp/go-build -gno-record-gcc-switches -fno-common"
What did you do?
It appears that if http.Client.Do()
encounters a client-initiated timeout, it can sometimes returns a nil error and a server response if the http.Request
object has been reused. This only appears to happen if the client is speaking HTTP/2.
Here's a minimum reproduction case: https://gist.github.com/bobvawter/c5fb99bbc8de533bb53881fe78b8db44
In the linked repro, there's an HTTP/2 enabled server that shows a pathologically slow endpoint. This endpoint returns 200/OK if it is able to completely consume the request body, or 400/Bad Request if the body can't be read after a delay. The http.Client
is configured with a timeout that will fire most of the time. The client loop retries the request if there is an error, since we can inspect http.NewRequestWithContext()
to see that its implementation of GetBody
makes copies of a bytes.Reader
.
After about 30 seconds of running the test on my M1 laptop, the repro will demonstrate that a timed-out request to the server can cause Client.Do()
to return a nil error and that the HTTP 400 status code is visible to the client. We often, but not always, see the expected "context deadline exceeded" error.
The final log output that the repro creates before halting will look like:
repro_400_test.go:141: request declared a Content-Length of 6228922 but only wrote 0 bytes
repro_400_test.go:124: BAD STATUS 400
The message from line 141 is an error generated by reading the http.Request.Body
. There are cases too, where the body is some non-zero, but partial, amount of the content length. Since only the client-configured timeout should be introducing errors, it's odd that we see the resulting HTTP 400 sent by the handler.
This repro proceeds as expected if:
- HTTP/2 is turned off in the client.
- The request object is not reused.
What did you see happen?
The repro case linked above gets to the block identified as ISSUE
, where client.Do()
has returned a nil
error and we see the 400 status code returned by the trivial http handler appear, the BAD STATUS
message.
What did you expect to see?
I would expect that if the client timeout fires or the client is unable to transmit the entire payload to the server, that http.Client.Do()
should return a non-nil error.
I would also expect the behaviors to be broadly similar between HTTP/1.1 and HTTP/2 protocols.