net/http: when reusing http.DefaultTransport and doing queries in parallel, all queries sometimes start to time out #69570

Closed as not planned
@freak12techno

Description

Go version

go version go1.22.7 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN='/home/monitoring/go/bin'
GOCACHE='/home/monitoring/.cache/go-build'
GOENV='/home/monitoring/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/monitoring/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/monitoring/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/lib/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/lib/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22.7'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/monitoring/go-test/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build4029050489=/tmp/go-build -gno-record-gcc-switches'

What did you do?

I have several tools written in Go that use net/http to run queries in parallel. Recently something changed (I am not yet sure what exactly), and all of the queries started to time out.

All of my tools were using http.DefaultTransport, either explicitly or implicitly.

After investigating, I found that replacing http.DefaultTransport with its .Clone() made all the problems go away. So I assume something is mutating http.DefaultTransport or otherwise making it unusable.

This also seems to happen on only one of my servers; I was not able to reproduce it on my local laptop, and I am not sure what the difference between the two is.

I used this script to reproduce the behaviour, and it reproduced successfully on my server:

package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"sync"
)

func main() {
	urls := []string{
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/slashing/v1beta1/signing_infos/cosmosvalcons1d8h57qw3d2upngacprs4v2gdztq2k3q3ztrqkm",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/interchain_security/ccv/provider/consumer_validators/pion-1",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/staking/v1beta1/params",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/interchain_security/ccv/provider/consumer_chains_per_validator/cosmosvalcons1d8h57qw3d2upngacprs4v2gdztq2k3q3ztrqkm",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/interchain_security/ccv/provider/validator_consumer_addr?chain_id=pion-1&provider_address=cosmosvalcons1d8h57qw3d2upngacprs4v2gdztq2k3q3ztrqkm",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/staking/v1beta1/validators/cosmosvaloper1zyqledl5wr5rr8l4sx7lkuf9x4qrudmmm72xxp/delegations/cosmos1zyqledl5wr5rr8l4sx7lkuf9x4qrudmm727n2j",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/distribution/v1beta1/validators/cosmosvaloper1zyqledl5wr5rr8l4sx7lkuf9x4qrudmmm72xxp/commission",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/bank/v1beta1/supply?pagination.limit=10000&pagination.offset=0",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/staking/v1beta1/validators?pagination.count_total=true&pagination.limit=1000",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/distribution/v1beta1/delegators/cosmos1zyqledl5wr5rr8l4sx7lkuf9x4qrudmm727n2j/rewards/cosmosvaloper1zyqledl5wr5rr8l4sx7lkuf9x4qrudmmm72xxp",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/interchain_security/ccv/provider/consumer_chains",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/staking/v1beta1/validators/cosmosvaloper1zyqledl5wr5rr8l4sx7lkuf9x4qrudmmm72xxp/delegations?pagination.count_total=true&pagination.limit=1",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/slashing/v1beta1/params",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/interchain_security/ccv/provider/consumer_commission_rate/pion-1/cosmosvalcons1d8h57qw3d2upngacprs4v2gdztq2k3q3ztrqkm",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/mint/v1beta1/inflation",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/staking/v1beta1/validators/cosmosvaloper1zyqledl5wr5rr8l4sx7lkuf9x4qrudmmm72xxp/unbonding_delegations?pagination.count_total=true&pagination.limit=1",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/base/tendermint/v1beta1/node_info",
		"https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/bank/v1beta1/balances/cosmos1zyqledl5wr5rr8l4sx7lkuf9x4qrudmm727n2j",
	}

	var wg sync.WaitGroup

	for {
		for _, url := range urls {
			wg.Add(1)
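			go func(url string) {
				defer wg.Done()

				// A new Client per request, but every Client shares
				// http.DefaultTransport and therefore one connection pool.
				client := &http.Client{
					Timeout:   10 * 1000000000, // 10 seconds, expressed in nanoseconds
					Transport: http.DefaultTransport,
				}

				req, err := http.NewRequest(http.MethodGet, url, nil)
				if err != nil {
					fmt.Printf("query failed: %s\n", err)
					return
				}

				fmt.Printf("doing query\n")
				res, err := client.Do(req)
				if err != nil {
					fmt.Printf("query failed: %s\n", err)
					return
				}
				defer res.Body.Close()

				// Report read errors instead of silently discarding them.
				body, err := ioutil.ReadAll(res.Body)
				if err != nil {
					fmt.Printf("query failed: %s\n", err)
					return
				}

				fmt.Printf("query done: %s\n", body)
			}(url)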
		}

		wg.Wait()
	}
}

What did you see happen?

After running it for some time, I started seeing the output below repeatedly (later it starts working correctly again, then stops, and the cycle repeats):

doing query
doing query
doing query
doing query
doing query
doing query
doing query
doing query
doing query
doing query
doing query
doing query
doing query
doing query
doing query
doing query
doing query
doing query
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/staking/v1beta1/validators?pagination.count_total=true&pagination.limit=1000": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/staking/v1beta1/validators/cosmosvaloper1zyqledl5wr5rr8l4sx7lkuf9x4qrudmmm72xxp/delegations/cosmos1zyqledl5wr5rr8l4sx7lkuf9x4qrudmm727n2j": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/interchain_security/ccv/provider/consumer_chains_per_validator/cosmosvalcons1d8h57qw3d2upngacprs4v2gdztq2k3q3ztrqkm": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/interchain_security/ccv/provider/consumer_validators/pion-1": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/staking/v1beta1/validators/cosmosvaloper1zyqledl5wr5rr8l4sx7lkuf9x4qrudmmm72xxp/unbonding_delegations?pagination.count_total=true&pagination.limit=1": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/staking/v1beta1/params": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/mint/v1beta1/inflation": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/bank/v1beta1/balances/cosmos1zyqledl5wr5rr8l4sx7lkuf9x4qrudmm727n2j": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/bank/v1beta1/supply?pagination.limit=10000&pagination.offset=0": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/distribution/v1beta1/validators/cosmosvaloper1zyqledl5wr5rr8l4sx7lkuf9x4qrudmmm72xxp/commission": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/interchain_security/ccv/provider/validator_consumer_addr?chain_id=pion-1&provider_address=cosmosvalcons1d8h57qw3d2upngacprs4v2gdztq2k3q3ztrqkm": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/base/tendermint/v1beta1/node_info": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/slashing/v1beta1/signing_infos/cosmosvalcons1d8h57qw3d2upngacprs4v2gdztq2k3q3ztrqkm": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/interchain_security/ccv/provider/consumer_commission_rate/pion-1/cosmosvalcons1d8h57qw3d2upngacprs4v2gdztq2k3q3ztrqkm": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/staking/v1beta1/validators/cosmosvaloper1zyqledl5wr5rr8l4sx7lkuf9x4qrudmmm72xxp/delegations?pagination.count_total=true&pagination.limit=1": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/slashing/v1beta1/params": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/interchain_security/ccv/provider/consumer_chains": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
query failed: Get "https://rest.provider-sentry-01.rs-testnet.polypore.xyz/cosmos/distribution/v1beta1/delegators/cosmos1zyqledl5wr5rr8l4sx7lkuf9x4qrudmm727n2j/rewards/cosmosvaloper1zyqledl5wr5rr8l4sx7lkuf9x4qrudmmm72xxp": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

This seems to happen for all (or almost all) hosts queried by a client that uses this http.DefaultTransport.

What did you expect to see?

All the queries should work and not time out.
