Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: apparent deadlock in TestTransportMaxConnsPerHost #45570

Closed
bcmills opened this issue Apr 14, 2021 · 4 comments
Closed

net/http: apparent deadlock in TestTransportMaxConnsPerHost #45570

bcmills opened this issue Apr 14, 2021 · 4 comments
Assignees
Milestone

Comments

@bcmills
Copy link
Member

@bcmills bcmills commented Apr 14, 2021

2021-04-14T03:15:34-e7ab1a5/windows-amd64-2008

The test timed out with many copies of this goroutine stack:

goroutine 2201 [select, 5 minutes]:
net/http.(*Transport).getConn(0xc0000637c0, 0xc0003bb540, 0x0, 0xc000382300, 0x4, 0xc00038f4d0, 0xf, 0x0, 0x0, 0x0, ...)
	C:/workdir/go/src/net/http/transport.go:1370 +0x66f
net/http.(*Transport).roundTrip(0xc0000637c0, 0xc0000ebf00, 0x0, 0x0, 0x0)
	C:/workdir/go/src/net/http/transport.go:579 +0x9b0
net/http.(*Transport).RoundTrip(0xc0000637c0, 0xc0000ebf00, 0xc0000637c0, 0x0, 0x0)
	C:/workdir/go/src/net/http/roundtrip.go:18 +0x3c
net/http.send(0xc0000ebf00, 0x9bdda0, 0xc0000637c0, 0x0, 0x0, 0x0, 0xc00038a1c0, 0x0, 0x1, 0x0)
	C:/workdir/go/src/net/http/client.go:251 +0x714
net/http.(*Client).send(0xc0002b9020, 0xc0000ebf00, 0x0, 0x0, 0x0, 0xc00038a1c0, 0x0, 0x1, 0x20)
	C:/workdir/go/src/net/http/client.go:175 +0xcf
net/http.(*Client).do(0xc0002b9020, 0xc0000ebf00, 0x0, 0x0, 0x0)
	C:/workdir/go/src/net/http/client.go:723 +0xb2e
net/http.(*Client).Do(...)
	C:/workdir/go/src/net/http/client.go:591
net/http_test.TestTransportMaxConnsPerHost.func2.2()
	C:/workdir/go/src/net/http/transport_test.go:652 +0x425
net/http_test.TestTransportMaxConnsPerHost.func2.3(0xc000018300, 0xc0002b90e0)
	C:/workdir/go/src/net/http/transport_test.go:668 +0x58
created by net/http_test.TestTransportMaxConnsPerHost.func2
	C:/workdir/go/src/net/http/transport_test.go:666 +0x386

This is the first such failure in the logs since #34941 was fixed in 2019, so tentatively marking as release-blocker for Go 1.17 — it may be a regression.

CC @bradfitz @fraenkel @neild @empijei

@bcmills bcmills added this to the Go1.17 milestone Apr 14, 2021
@neild neild self-assigned this Apr 14, 2021
@fraenkel
Copy link
Contributor

@fraenkel fraenkel commented Apr 15, 2021

It is a valid issue. I will send in a fix which will probably cause some other failure if it happens again.
The current issue is that we expect 1 connection and the channel is sized to 1. If have an additional dial, we hang.

@gopherbot
Copy link

@gopherbot gopherbot commented Apr 15, 2021

Change https://golang.org/cl/310213 mentions this issue: http: allow multiple dials in TestTransportMaxConnsPerHost

@neild
Copy link
Contributor

@neild neild commented Apr 15, 2021

I have not been able to reproduce this failure after leaving a windows-amd64-2008 builder running the test on repeat for an hour. Whatever happened seems to be rare.

The problem seems to be an unexpected additional Dial call. My best guess is that the test's connection dropped for some unknown reason, resulting in a redial. The test hangs and times out in this situation, rather than reporting an error. CL 310213 fixes the hang.

I don't think this is a release blocker unless it crops up again.

gopherbot pushed a commit that referenced this issue Apr 15, 2021
If there is more than the expected single dial, the channel will block.
Allow at least one connection per client, and do the expected cleanup.

Updates #45570

Change-Id: Iaecd45298a7d7c591b7d7b1be13cea6e4a1e2e85
Reviewed-on: https://go-review.googlesource.com/c/go/+/310213
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Damien Neil <dneil@google.com>
Trust: Damien Neil <dneil@google.com>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
@neild
Copy link
Contributor

@neild neild commented Apr 15, 2021

I'm going to tentatively close this. CL 310213 should fix the test to not panic on failure, and I can't reproduce that failure at all. We can revisit if it flakes again.

@neild neild closed this Apr 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants