Skip to content

net/http: dial should not be canceled if an idle connection is reused #66442

Closed
@bancek

Description

@bancek

Go version

go1.22.1 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/root/.cache/go-build'
GOENV='/root/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22.1'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build3581231785=/tmp/go-build -gno-record-gcc-switches'

What did you do?

When the HTTP transport processes a new request it first tries to get an idle connection (if keep alive is not disabled). If there are no idle connection it will dial a new connection. If an idle connection becomes ready while dialing the new connection, the idle connection will be reused and the new connection will be put to idle queue (if there is space). If the request's context is canceled, both the request's and the new connection's context will be canceled.

This causes problems when canceling a context after the response has already been read and closed but the new connection dial has not yet completed (e.g. TLS handshake not yet done). In this case the new connection will be closed. This may cause a lot of new connections to be created and closed (MaxConnsPerHost is ignored). If this repeats a lot, too many TCP connections are stuck in CLOSE_WAIT state and new connections cannot be created.

This only affects HTTP/1 (HTTP and HTTPS).

Example:

  • Requests are created with context cancellation and canceled when the response is read and closed
  • R1 dials C1 and uses C1
  • R2 dials C2 but while waiting for the C2's TLS handshake, C1 becomes idle and is reused - R2 uses C1
  • R2 finishes, its response body is closed and R2's context is canceled
  • C2 is still handling TLS handshake and R2's context cancellation causes C2 to be closed

Example code:

package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

func main() {
	server := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {}))
	client := server.Client()
	transport := client.Transport.(*http.Transport)
	transport.MaxIdleConnsPerHost = 2

	requests := 100
	// concurrency must be > transport.MaxIdleConnsPerHost
	concurrency := 10

	makeReq := func() {
		ctx := context.Background()
		ctx, cancel := context.WithCancel(ctx)
		defer cancel()

		req, _ := http.NewRequestWithContext(ctx, "GET", server.URL, nil)
		res, err := transport.RoundTrip(req)
		if err != nil {
			fmt.Println(err)
			return
		}
		io.Copy(io.Discard, res.Body)
		res.Body.Close()
	}

	reqChan := make(chan struct{}, requests)
	for i := 0; i < requests; i++ {
		reqChan <- struct{}{}
	}
	close(reqChan)

	var wg sync.WaitGroup

	wg.Add(concurrency)

	for i := 0; i < concurrency; i++ {
		go func() {
			for range reqChan {
				makeReq()
			}
			wg.Done()
		}()
	}

	wg.Wait()
}

What did you see happen?

$ go run .
read tcp 127.0.0.1:55248->127.0.0.1:54313: read: connection reset by peer
read tcp 127.0.0.1:55246->127.0.0.1:54313: read: connection reset by peer
2024/03/21 11:23:07 http: TLS handshake error from 127.0.0.1:55857: EOF
2024/03/21 11:23:07 http: TLS handshake error from 127.0.0.1:55848: EOF
2024/03/21 11:23:07 http: TLS handshake error from 127.0.0.1:55852: EOF
read tcp 127.0.0.1:55250->127.0.0.1:54313: read: connection reset by peer
2024/03/21 11:23:07 http: TLS handshake error from 127.0.0.1:55850: EOF

With higher concurrency even setting transport.MaxConnsPerHost does not help. The setting fixes request errors but still creates too many TCP connections.

transport.MaxIdleConnsPerHost = 100
transport.MaxConnsPerHost = 100
concurrency := 100

What did you expect to see?

$ go run .

Metadata

Metadata

Assignees

No one assigned

    Labels

    FrozenDueToAgeNeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions