Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: internal error: connCount underflow #61474

Open
zzkcode opened this issue Jul 20, 2023 · 9 comments
Open

net/http: internal error: connCount underflow #61474

zzkcode opened this issue Jul 20, 2023 · 9 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@zzkcode
Copy link
Contributor

zzkcode commented Jul 20, 2023

What version of Go are you using (go version)?

$ go version
go version go1.18.3 linux/amd64

Does this issue reproduce with the latest release?

Haven't tried it yet. Will try it when we have a producer.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/root/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/root/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.18.3"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1421070104=/tmp/go-build -gno-record-gcc-switches"

What did you do?

We are piloting limited MaxConnsPerHost on several PRD instances, and running about 1~2 days separately. And one of them crashed with: panic: net/http: internal error: connCount underflow.

What did you expect to see?

No crash.

What did you see instead?

panic: net/http: internal error: connCount underflow.

Below are the details by examining the core file, and since there are some locals and args that seem to be optimized out or something, I try jump between frames to print out the details.

bt:

(dlv) bt
 0  0x000000000046e601 in runtime.raise
    at /usr/local/go/src/runtime/sys_linux_amd64.s:168
 1  0x000000000044fc25 in runtime.dieFromSignal
    at /usr/local/go/src/runtime/signal_unix.go:852
 2  0x00000000004505f6 in runtime.sigfwdgo
    at /usr/local/go/src/runtime/signal_unix.go:1066
 3  0x000000000044e947 in runtime.sigtrampgo
    at /usr/local/go/src/runtime/signal_unix.go:430
 4  0x000000000046f44e in runtime.sigtrampgo
    at <autogenerated>:1
 5  0x000000000046e8fd in runtime.sigtramp
    at /usr/local/go/src/runtime/sys_linux_amd64.s:361
 6  0x00007fa1b18df8e0 in ???
    at ?:-1
 7  0x0000000000439c49 in runtime.crash
    at /usr/local/go/src/runtime/signal_unix.go:944
 8  0x0000000000439c49 in runtime.fatalpanic
    at /usr/local/go/src/runtime/panic.go:1092
 9  0x0000000000439417 in runtime.gopanic
    at /usr/local/go/src/runtime/panic.go:941
10  0x0000000000869d52 in net/http.(*Transport).decConnsPerHost
    at /usr/local/go/src/net/http/transport.go:1475
11  0x000000000086472f in net/http.(*Transport).roundTrip
    at /usr/local/go/src/net/http/transport.go:604
12  0x000000000084c339 in net/http.(*Transport).RoundTrip
    at /usr/local/go/src/net/http/roundtrip.go:17
13  0x0000000000808d58 in net/http.send
    at /usr/local/go/src/net/http/client.go:252
14  0x00000000008085fb in net/http.(*Client).send
    at /usr/local/go/src/net/http/client.go:176
15  0x000000000080aa35 in net/http.(*Client).do
    at /usr/local/go/src/net/http/client.go:725
16  0x000000000101a6c9 in net/http.(*Client).Do
    at /usr/local/go/src/net/http/client.go:593
(dlv) frame 11
(dlv) p pconn.cacheKey
net/http.connectMethodKey {
        proxy: "http://proxy-server:3128",
        scheme: "https",
        addr: "real-endpoint-which-caused-crash:443",
        onlyH1: false,}

(dlv) frame 10
(dlv) p t.connsPerHost
map[net/http.connectMethodKey]int [
        // ignore unrelated connectMethods
        {proxy: "http://proxy-server:3128", scheme: "https", addr: "real-endpoint-which-caused-crash:443", onlyH1: false}: 1, 
]

It seems a little weird here, panic on net/http/transport.go:1475, but by printing t.connsPerHost, it shows that it should be 1?

	t.connsPerHostMu.Lock()
	defer t.connsPerHostMu.Unlock()
	n := t.connsPerHost[key]
	if n == 0 {
		// Shouldn't happen, but if it does, the counting is buggy and could
		// easily lead to a silent deadlock, so report the problem loudly.
		panic("net/http: internal error: connCount underflow")
	}

Note: n and key seem optimized out here? print them return as: unreadable could not find loclist entry..., and GDB show a large number:

(gdb) p n
$2 = 825625303360

Will try to reproduce it off the Production Environment, and then with -race and the latest Go Version. I have noticed there are some issues but closed already: #34941, #38172. Any comments and suggestions would be appreciated.

@mauri870
Copy link
Member

Running with -race should shine some light. If you manage to put together a reproducible example please share.

@bcmills
Copy link
Member

bcmills commented Jul 20, 2023

go version go1.18.3 linux/amd64

Per the release policy Go 1.18 has been unsupported since 2023-02-01 (when Go 1.20 was released), and there have been quite a few fixes in the net/http package since 1.18 was released.

Does this reproduce with a supported (1.20.6 or 1.19.11) or upcoming (1.21rc3) release?

@bcmills bcmills added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Jul 20, 2023
@bcmills
Copy link
Member

bcmills commented Jul 20, 2023

(CC @neild)

@zzkcode
Copy link
Contributor Author

zzkcode commented Jul 20, 2023

@bcmills Thanks. We encountered this problem while we pilot MaxConnsPerHost on Production Environment, and we are still using go1.18.3 on Production. Will try to reproduce this problem first, and see if it could reproduce on the lastest Go.

What seems weird to me is the info that I show on the What did you see instead?, which appears like should not crash at all. Is it possible that while it panic on frame 10, and runs defer t.connsPerHostMu.Unlock(), then another goroutine happens to acquire the lock and add it to 1?

@seankhliao seankhliao added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Jul 22, 2023
@bcmills
Copy link
Member

bcmills commented Aug 7, 2023

See previously #38172. @zzkcode, have you tried running a copy of your program built with the -race option? If so, does it report any races?

@bcmills bcmills added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Aug 7, 2023
@zzkcode
Copy link
Contributor Author

zzkcode commented Aug 9, 2023

Hi all, here are some latest progress:

See previously #38172. @zzkcode, have you tried running a copy of your program built with the -race option? If so, does it report any races?

@bcmills Thanks, we haven't tried -race yet since it will downgrade performance, but will do once have a reproducer.

go version go1.18.3 linux/amd64

Per the release policy Go 1.18 has been unsupported since 2023-02-01 (when Go 1.20 was released), and there have been quite a few fixes in the net/http package since 1.18 was released.

Does this reproduce with a supported (1.20.6 or 1.19.11) or upcoming (1.21rc3) release?

@bcmills We upgrade go from 1.18.3 to 1.20.6, and this issue still exists.

panic: net/http: internal error: connCount underflow

goroutine 33385383 [running]:
panic({0x1289e40, 0x1852220})
        /usr/local/go/src/runtime/panic.go:987 +0x3bb fp=0xc03fb9f4b0 sp=0xc03fb9f3f0 pc=0x43a67b
net/http.(*Transport).decConnsPerHost(0xc00033c280, {{0xc0799dc080, 0x40}, {0xc078329880, 0x5}, {0xc0991c60c0, 0x21}, 0x0})
        /usr/local/go/src/net/http/transport.go:1482 +0x40f fp=0xc03fb9f550 sp=0xc03fb9f4b0 pc=0x8957cf
net/http.(*Transport).roundTrip(0xc00033c280, 0xc037407d00)
        /usr/local/go/src/net/http/transport.go:613 +0x8cb fp=0xc03fb9f778 sp=0xc03fb9f550 pc=0x89062b
net/http.(*Transport).RoundTrip(0x0?, 0x1855e60?)
        /usr/local/go/src/net/http/roundtrip.go:17 +0x19 fp=0xc03fb9f798 sp=0xc03fb9f778 pc=0x878119
net/http.send(0xc037407d00, {0x1855e60, 0xc00033c280}, {0x8?, 0x14aa2e0?, 0x0?})
        /usr/local/go/src/net/http/client.go:252 +0x5f7 fp=0xc03fb9f990 sp=0xc03fb9f798 pc=0x833c57
net/http.(*Client).send(0xc001a78510, 0xc037407d00, {0xc002a53ba5?, 0x15?, 0x0?})
        /usr/local/go/src/net/http/client.go:176 +0x9b fp=0xc03fb9fa08 sp=0xc03fb9f990 pc=0x8334db
net/http.(*Client).do(0xc001a78510, 0xc037407d00)
        /usr/local/go/src/net/http/client.go:716 +0x8fb fp=0xc03fb9fc10 sp=0xc03fb9fa08 pc=0x8357db
net/http.(*Client).Do(...)
        /usr/local/go/src/net/http/client.go:582
(dlv) bt
 0  0x00000000004729a1 in runtime.raise
    at /usr/local/go/src/runtime/sys_linux_amd64.s:154
 1  0x0000000000451e05 in runtime.dieFromSignal
    at /usr/local/go/src/runtime/signal_unix.go:883
 2  0x0000000000452486 in runtime.sigfwdgo
    at /usr/local/go/src/runtime/signal_unix.go:1096
 3  0x0000000000450aa7 in runtime.sigtrampgo
    at /usr/local/go/src/runtime/signal_unix.go:432
 4  0x0000000000472ca6 in runtime.sigtramp
    at /usr/local/go/src/runtime/sys_linux_amd64.s:354
 5  0x00007f77d95ad8e0 in ???
    at ?:-1
 6  0x000000000043af89 in runtime.crash
    at /usr/local/go/src/runtime/signal_unix.go:975
 7  0x000000000043af89 in runtime.fatalpanic
    at /usr/local/go/src/runtime/panic.go:1172
 8  0x000000000043a67b in runtime.gopanic
    at /usr/local/go/src/runtime/panic.go:987
 9  0x00000000008957cf in net/http.(*Transport).decConnsPerHost
    at /usr/local/go/src/net/http/transport.go:1482
10  0x000000000089062b in net/http.(*Transport).roundTrip
    at /usr/local/go/src/net/http/transport.go:613
11  0x0000000000878119 in net/http.(*Transport).RoundTrip
    at /usr/local/go/src/net/http/roundtrip.go:17
12  0x0000000000833c57 in net/http.send
    at /usr/local/go/src/net/http/client.go:252
13  0x00000000008334db in net/http.(*Client).send
    at /usr/local/go/src/net/http/client.go:176
14  0x00000000008357db in net/http.(*Client).do
    at /usr/local/go/src/net/http/client.go:716
15  0x0000000001054053 in net/http.(*Client).Do
    at /usr/local/go/src/net/http/client.go:582

(dlv) frame 10

(dlv) p err
error(net/http.http2noCachedConnError) {}

(dlv) p pconn.cacheKey
net/http.connectMethodKey {
        proxy: "http://proxy-host:1234",
        scheme: "https",
        addr: "real-endpoint:443",
        onlyH1: false,}

By printing out the t.connsPerHost, we don't see the same pconn.cacheKey in it.

And please let me know if more details about it I could provide.

@zzkcode
Copy link
Contributor Author

zzkcode commented Aug 10, 2023

Hi @bcmills @mauri870, we ran with -race, and there is nothing related to this. Only panic and Crash Traceback.

panic: net/http: internal error: connCount underflow

goroutine 41240143 [running]:
panic({0x1bd7b00, 0x2c8b0b0})
	/usr/local/go/src/runtime/panic.go:987 +0x3bb fp=0xc005fb4df8 sp=0xc005fb4d38 pc=0x44b0bb
net/http.(*Transport).decConnsPerHost(0xc00049a3c0, {{0xc04f629a80, 0x40}, {0xc03ab6d280, 0x5}, {0xc03f468240, 0x24}, 0x0})
	/usr/local/go/src/net/http/transport.go:1482 +0x818 fp=0xc005fb4f90 sp=0xc005fb4df8 pc=0xb4af78
net/http.(*Transport).roundTrip(0xc00049a3c0, 0xc03dad5800)
	/usr/local/go/src/net/http/transport.go:613 +0xe65 fp=0xc005fb52d0 sp=0xc005fb4f90 pc=0xb423c5
net/http.(*Transport).RoundTrip(0xc03dad5800?, 0x2c8ed00?)
	/usr/local/go/src/net/http/roundtrip.go:17 +0x37 fp=0xc005fb5318 sp=0xc005fb52d0 pc=0xb1a057
net/http.send(0xc03dad5800, {0x2c8ed00, 0xc00049a3c0}, {0x8?, 0xc001d81400?, 0x0?})
	/usr/local/go/src/net/http/client.go:252 +0x943 fp=0xc005fb55f0 sp=0xc005fb5318 pc=0xaab2a3
net/http.(*Client).send(0xc00122f650, 0xc03dad5800, {0x0?, 0x487077?, 0x0?})
	/usr/local/go/src/net/http/client.go:176 +0x165 fp=0xc005fb56a8 sp=0xc005fb55f0 pc=0xaaa6a5
net/http.(*Client).do(0xc00122f650, 0xc03dad5800)
	/usr/local/go/src/net/http/client.go:716 +0x10df fp=0xc005fb5a10 sp=0xc005fb56a8 pc=0xaae2bf
net/http.(*Client).Do(...)
	/usr/local/go/src/net/http/client.go:582

...

@mauri870
Copy link
Member

@zzkcode Thanks for taking the time to upgrade the Go version as well as running with -race. Let's wait till someone can look into this. Also check updates on #61794

@zzkcode
Copy link
Contributor Author

zzkcode commented Aug 10, 2023

@mauri870 Thanks, the lastest crash which ran with -race we found there are thounsands of reqs from same host that cause crash. Didn't check this previous times, and will try to make a simple reproducer. FYI @bcmills

We send reqs through a foward-proxy, and part of the http.Transport Configuration:

t := &http.Transport {
    DisableKeepAlives:     false,
    DisableCompression:    false,
    ForceAttemptHTTP2:     true,
    MaxConnsPerHost:       5000,
    MaxIdleConns:          10000,
    MaxIdleConnsPerHost:   2000,
    IdleConnTimeout:       0 * time.Second,
    ExpectContinueTimeout: 1 * time.Second,
}

@bcmills bcmills added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Aug 10, 2023
@bcmills bcmills added this to the Backlog milestone Aug 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

4 participants