net/http: connCount underflow panic with MaxConnsPerHost #38172

Closed

andybalholm opened this issue Mar 31, 2020 · 19 comments
Comments

@andybalholm (Contributor) commented Mar 31, 2020

I'm splitting this off from #34941 because it seems to be a separate issue, although with the same symptom.

$ go version
go version go1.14 darwin/amd64

I compiled this binary on macOS, but I'm running it on Linux amd64.

Here is the stack trace:

panic: net/http: internal error: connCount underflow

goroutine 2114 [running]:
net/http.(*Transport).decConnsPerHost(0x1094ca0, 0x0, 0x0, 0xc001f110b5, 0x4, 0xc001a27320, 0x14, 0x0)
		/usr/local/go/src/net/http/transport.go:1391 +0x590
net/http.(*persistConn).closeLocked(0xc0000cd320, 0xb8a620, 0xc000096b60)
		/usr/local/go/src/net/http/transport.go:2584 +0xf5
net/http.(*persistConn).close(0xc0000cd320, 0xb8a620, 0xc000096b60)
		/usr/local/go/src/net/http/transport.go:2574 +0x83
net/http.(*persistConn).readLoop.func1(0xc0000cd320, 0xc0073b4d60)
		/usr/local/go/src/net/http/transport.go:1946 +0x41
net/http.(*persistConn).readLoop(0xc0000cd320)
		/usr/local/go/src/net/http/transport.go:2122 +0x12ae
created by net/http.(*Transport).dialConn
		/usr/local/go/src/net/http/transport.go:1647 +0xc56

The usage is in a forward proxy server (github.com/andybalholm/redwood). About 2000 users run through the server where I have been watching these errors most closely, so the usage is heavy and probably about as close to random as you are likely to find.

I thought that switching to two Transports (one for HTTP/1.1 and one for HTTP/2) instead of one http.Transport that was configured to automatically switch protocols would get rid of the panics. But it didn't seem to help. (So it seems that the panic doesn't depend on having HTTP/2 enabled on the Transport.)

I had Transport.MaxConnsPerHost set to 8. Setting it to 0 (unlimited) made the panics go away.
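
For context, a minimal sketch (not the actual redwood configuration) of the kind of Transport that hits this code path; any non-zero MaxConnsPerHost turns on the per-host connection accounting in decConnsPerHost, which is where the underflow panic fires:

package main

import "net/http"

// Minimal sketch, not the actual redwood configuration: a non-zero
// MaxConnsPerHost enables the per-host connection counting that panics here;
// setting it to 0 (unlimited) bypasses that accounting entirely.
var limitedTransport = &http.Transport{
	Proxy:           http.ProxyFromEnvironment,
	MaxConnsPerHost: 8, // the value in use when the panics occurred
}

func main() {
	client := &http.Client{Transport: limitedTransport}
	_ = client // requests through this client share at most 8 conns per host
}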

@andybalholm (Contributor, Author) commented Mar 31, 2020

I just noticed that the binary that stack trace came from was compiled with Go 1.14, not Go 1.14.1 as I had originally thought. But we got the same panics with a binary compiled on Linux with Go 1.14.1.

andybalholm changed the title from "net/http: `connCount underflow` panic with MaxConnsPerHost" to "net/http: connCount underflow panic with MaxConnsPerHost" on Mar 31, 2020
andybons added this to the Unplanned milestone on Apr 1, 2020
@andybons (Member) commented Apr 1, 2020

@bcmills (Member) commented Apr 1, 2020

@andybalholm, could you try testing with a go build from head, or patching in CL 220905? I wonder if this is another symptom of the bug that was fixed there.

bcmills modified the milestones (Unplanned → Backlog) on Apr 1, 2020
@andybalholm (Contributor, Author) commented Apr 1, 2020

Building with

go version devel +620208790e Wed Apr 1 19:32:57 2020 +0000 darwin/amd64

I still get the panic:

panic: net/http: internal error: connCount underflow

goroutine 67124 [running]:
net/http.(*Transport).decConnsPerHost(0x108df80, 0x0, 0x0, 0xc00b2800e4, 0x4, 0xc002061900, 0x17, 0x0)
	/usr/local/go/src/net/http/transport.go:1388 +0x590
net/http.(*persistConn).closeLocked(0xc00af806c0, 0xb847e0, 0xc00006cb60)
	/usr/local/go/src/net/http/transport.go:2586 +0xf5
net/http.(*persistConn).close(0xc00af806c0, 0xb847e0, 0xc00006cb60)
	/usr/local/go/src/net/http/transport.go:2576 +0x83
net/http.(*persistConn).closeConnIfStillIdle(0xc00af806c0)
	/usr/local/go/src/net/http/transport.go:1890 +0xcb
created by time.goFunc
	/usr/local/go/src/time/sleep.go:169 +0x44
bcmills removed the WaitingForInfo label on Apr 1, 2020
@fraenkel (Contributor) commented Apr 2, 2020

@bcmills This is an HTTP request issue, not HTTP/2.

@bcmills (Member) commented Apr 7, 2020

@andybalholm, could you share some more detail about your usage? (Any unusual Transport customizations, proxies, or anything like that?)

@rsc (Contributor) commented Apr 7, 2020

If you have a bunch of these panic traces, are there any that don't have that stack sequence (decConnsPerHost called from closeLocked called from close)?

@andybalholm (Contributor, Author) commented Apr 7, 2020

@bcmills Here is my transport:

var dialer = &net.Dialer{
	Timeout:   30 * time.Second,
	KeepAlive: 30 * time.Second,
	DualStack: true,
}

var httpTransport = &http.Transport{
	Proxy:                 http.ProxyFromEnvironment,
	DialContext:           dialer.DialContext,
	TLSHandshakeTimeout:   10 * time.Second,
	ExpectContinueTimeout: 1 * time.Second,
}

The environment doesn't specify a proxy. I call http2.ConfigureTransport(httpTransport) elsewhere in the code (but omitting that call doesn't fix the problem).

@andybalholm (Contributor, Author) commented Apr 7, 2020

@rsc All the stack traces seem to have that sequence. And close is being called from either net/http.(*persistConn).readLoop.func1 or net/http.(*persistConn).closeConnIfStillIdle.

@fraenkel (Contributor) commented Apr 8, 2020

One thing I realize is that we don't check the return value of removeIdleConnLocked in closeConnIfStillIdle before we blindly close the persistConn.

@fraenkel (Contributor) commented Apr 8, 2020

More interesting is that the order of pc.close() and removeIdleConn varies, which means we could easily get into this situation.
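
To illustrate the hazard being described (a toy model only, not the net/http implementation): if two cleanup paths can each decide to close the same connection, the per-host count is decremented twice for a single increment, and the second decrement is exactly the connCount underflow panic above:

package main

import "sync"

// Toy model of the per-host accounting, not the real net/http code.
type hostConns struct {
	mu sync.Mutex
	n  int
}

func (h *hostConns) inc() {
	h.mu.Lock()
	h.n++
	h.mu.Unlock()
}

func (h *hostConns) dec() {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.n <= 0 {
		panic("net/http: internal error: connCount underflow")
	}
	h.n--
}

func main() {
	var h hostConns
	h.inc() // connection dialed: count goes to 1
	h.dec() // one cleanup path closes it (e.g. the read loop)
	h.dec() // a second path closes it again (e.g. the idle timer) -> panic
}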

@fraenkel (Contributor) commented Apr 9, 2020

@andybalholm I am confused. You mention that you set MaxConnsPerHost = 8, but the code above does not show you setting it. Where/when did you do that?

@andybalholm (Contributor, Author) commented Apr 9, 2020

@bcmills (Member) commented Apr 9, 2020

@andybalholm, do you set those after the Transport is already in use? I don't believe that's safe in general. (It wouldn't surprise me if it's even a data race.)

What happens if you run your program under the race detector?

@andybalholm (Contributor, Author) commented Apr 9, 2020

@fraenkel (Contributor) commented Apr 9, 2020

Can you run with -race? That would catch this pretty fast if the configuration is being done later.
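
(For anyone following along: the race detector is enabled by rebuilding with the -race flag; reports go to stderr as races are detected. The binary name below is just an example.)

$ go build -race .
$ ./redwood    # run as usual; any data race prints a WARNING: DATA RACE report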

@andybalholm (Contributor, Author) commented Apr 10, 2020

I can't run it with -race on the server where I was getting the panics; it's too heavily loaded. But I tried it on a local server with just me proxied through it, and that was enough:

WARNING: DATA RACE
Read at 0x000001587540 by goroutine 87:
  net/http.(*Transport).onceSetNextProtoDefaults()
      /usr/local/go/src/net/http/transport.go:344 +0x61
  net/http.(*Transport).onceSetNextProtoDefaults-fm()
      /usr/local/go/src/net/http/transport.go:343 +0x41
  sync.(*Once).doSlow()
      /usr/local/go/src/sync/once.go:66 +0x103
  sync.(*Once).Do()
      /usr/local/go/src/sync/once.go:57 +0x68
  net/http.(*Transport).roundTrip()
      /usr/local/go/src/net/http/transport.go:485 +0x8b
  net/http.(*Transport).RoundTrip()
      /usr/local/go/src/net/http/roundtrip.go:17 +0x42
  main.proxyHandler.ServeHTTP()
      /root/src/redwood/proxy.go:328 +0x17fb
  main.(*proxyHandler).ServeHTTP()
      <autogenerated>:1 +0x114
  net/http.serverHandler.ServeHTTP()
      /usr/local/go/src/net/http/server.go:2807 +0xce
  net/http.initALPNRequest.ServeHTTP()
      /usr/local/go/src/net/http/server.go:3381 +0xfc
  net/http.(*initALPNRequest).ServeHTTP()
      <autogenerated>:1 +0xa6
  net/http.Handler.ServeHTTP-fm()
      /usr/local/go/src/net/http/server.go:87 +0x64
  golang.org/x/net/http2.(*serverConn).runHandler()
      /root/go/pkg/mod/golang.org/x/net@v0.0.0-20190213061140-3a22650c66bd/http2/server.go:2095 +0xc0

Previous write at 0x000001587540 by main goroutine:
  golang.org/x/net/http2.configureTransport()
      /root/go/pkg/mod/golang.org/x/net@v0.0.0-20190213061140-3a22650c66bd/http2/transport.go:174 +0x420
  golang.org/x/net/http2.ConfigureTransport()
      /root/go/pkg/mod/golang.org/x/net@v0.0.0-20190213061140-3a22650c66bd/http2/transport.go:136 +0x7ca
  main.main()
      /root/src/redwood/redwood.go:88 +0x7ab

It's been several years since I tried running with -race. I didn't find anything then, and I figured only a heavily-loaded production server would have enough concurrency to expose races if there were any. But now I know that running with -race locally is worth doing.

Reordering the startup steps gets rid of the race.
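
For anyone hitting the same race, a minimal sketch of the ordering the report points at (assumed structure, not the actual redwood code): finish configuring the Transport, including the http2.ConfigureTransport call, before starting any server goroutine that sends requests through it.

package main

import (
	"log"
	"net/http"

	"golang.org/x/net/http2"
)

var httpTransport = &http.Transport{
	Proxy: http.ProxyFromEnvironment,
}

func main() {
	// Write side of the race: mutate the Transport while main still owns it
	// exclusively, before any request can race with this configuration.
	if err := http2.ConfigureTransport(httpTransport); err != nil {
		log.Fatal(err)
	}

	// Read side of the race: only now start the listener whose handlers call
	// httpTransport.RoundTrip concurrently.
	proxy := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// ... proxy the request via httpTransport ...
	})
	log.Fatal(http.ListenAndServe(":8080", proxy))
}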

@mrjrieke commented Aug 12, 2020

Running go 1.14.6 or 1.14.7

Since this is a different code path and it's happening during automatic cleanup, it seems like this shouldn't happen...

panic: net/http: internal error: connCount underflow
goroutine 25 [running]:
net/http.(*Transport).decConnsPerHost(0xc0004ba280, 0x0, 0x0, 0x7fae7523e750, 0x5, 0xc0004af9e0, 0x18, 0x0)
/usr/local/go/src/net/http/transport.go:1391 +0x590
net/http.(*persistConn).closeLocked(0xc0000a4000, 0x132adcf00, 0xc000118c50)
/usr/local/go/src/net/http/transport.go:2584 +0xf5
net/http.(*persistConn).close(0xc0000a4000, 0x132adcf00, 0xc000118c50)
/usr/local/go/src/net/http/transport.go:2574 +0x83
net/http.(*persistConn).closeConnIfStillIdle(0xc0000a4000)
/usr/local/go/src/net/http/transport.go:1888 +0xcb
created by time.goFunc
/usr/local/go/src/time/sleep.go:168 +0x44

Should I open a new issue for this?

@networkimprov commented Aug 12, 2020

@mrjrieke please try your code with the race detector, as that proved to be the problem above. If that doesn't reveal a mistake, please file a new issue.
