net: apparent deadlock in TestCloseWrite on darwin-arm64-corellium #34837

Open
bcmills opened this issue Oct 11, 2019 · 11 comments

@bcmills (Member) commented Oct 11, 2019

From the darwin-arm64-corellium builder (https://build.golang.org/log/0f26cd7aadb20043bcb06081b5b9c0a633bcb9fe):

panic: test timed out after 3m0s

goroutine 601 [running]:
testing.(*M).startAlarm.func1()
	/tmp/workdir-host-darwin-arm64-corellium-ios/go/src/testing/testing.go:1377 +0xc0
created by time.goFunc
	/tmp/workdir-host-darwin-arm64-corellium-ios/go/src/time/sleep.go:168 +0x38

[…]

goroutine 599 [IO wait, 2 minutes]:
internal/poll.runtime_pollWait(0x10578ce98, 0x72, 0x102f20380)
	/tmp/workdir-host-darwin-arm64-corellium-ios/go/src/runtime/netpoll.go:184 +0x3c
internal/poll.(*pollDesc).wait(0x130356718, 0x72, 0x0, 0x1, 0xffffffffffffffff)
	/tmp/workdir-host-darwin-arm64-corellium-ios/go/src/internal/poll/fd_poll_runtime.go:87 +0x30
internal/poll.(*pollDesc).waitRead(...)
	/tmp/workdir-host-darwin-arm64-corellium-ios/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0x130356700, 0x13011000f, 0x1, 0x1, 0x0, 0x0, 0x0)
	/tmp/workdir-host-darwin-arm64-corellium-ios/go/src/internal/poll/fd_unix.go:169 +0x1b8
net.(*netFD).Read(0x130356700, 0x13011000f, 0x1, 0x1, 0x130356700, 0x102e44fd4, 0x1)
	/tmp/workdir-host-darwin-arm64-corellium-ios/go/src/net/fd_unix.go:202 +0x3c
net.(*conn).Read(0x130018010, 0x13011000f, 0x1, 0x1, 0x0, 0x0, 0x0)
	/tmp/workdir-host-darwin-arm64-corellium-ios/go/src/net/net.go:184 +0x68
net.TestCloseWrite(0x13017c600)
	/tmp/workdir-host-darwin-arm64-corellium-ios/go/src/net/net_test.go:151 +0x3cc
testing.tRunner(0x13017c600, 0x102f1ee80)
	/tmp/workdir-host-darwin-arm64-corellium-ios/go/src/testing/testing.go:909 +0xb0
created by testing.(*T).Run
	/tmp/workdir-host-darwin-arm64-corellium-ios/go/src/testing/testing.go:960 +0x29c

[…]

FAIL	net	180.181s

CC @mikioh @bradfitz @ianlancetaylor

@bcmills added this to the Go1.14 milestone Oct 11, 2019

@bcmills (Member, Author) commented Oct 11, 2019

Another one on darwin-arm64-corellium. Is it possible that this is a recent regression?

https://build.golang.org/log/175d78eebd4aa686657b7faf57755ff9ee52d02e

@ianlancetaylor (Contributor) commented Oct 11, 2019

I don't really see how, but it's conceivable that this is a recent regression due to https://golang.org/cl/197938.

@ianlancetaylor (Contributor) commented Oct 11, 2019

The test is both fairly straightforward and not all that important. If someone wants to debug it, great, but I would be inclined to just skip it on darwin/arm64.
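
A skip along the lines suggested above could look roughly like the sketch below. The helper name `shouldSkipCloseWrite` is hypothetical and not part of the `net` package; in a real test the check would sit at the top of `TestCloseWrite` and call `t.Skipf` with a reference to this issue instead of printing.

```go
package main

import (
	"fmt"
	"runtime"
)

// shouldSkipCloseWrite reports whether the test should be skipped for the
// given GOOS/GOARCH pair. The name is hypothetical, chosen for illustration.
func shouldSkipCloseWrite(goos, goarch string) bool {
	return goos == "darwin" && goarch == "arm64"
}

func main() {
	// Inside a test this branch would be:
	//   t.Skipf("skipping on %s/%s; see golang.org/issue/34837", ...)
	if shouldSkipCloseWrite(runtime.GOOS, runtime.GOARCH) {
		fmt.Printf("would skip on %s/%s\n", runtime.GOOS, runtime.GOARCH)
		return
	}
	fmt.Printf("would run on %s/%s\n", runtime.GOOS, runtime.GOARCH)
}
```

Keeping the platform check in a small pure function makes it easy to verify without running on the affected builder.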

@ianlancetaylor (Contributor) commented Dec 5, 2019

Hasn't happened since November 7. I'm calling this fixed.

@bcmills (Member, Author) commented Mar 16, 2020

2020-03-14T04:12:41-70dc28f/darwin-arm64-corellium
2020-03-03T19:53:02-24343cb/darwin-arm64-corellium
2020-02-27T21:24:58-1c4e515/darwin-arm64-corellium

Given the apparent slowness of the network stack on this builder (#37322, #35498, and others), I wonder if this test is deadlocking due to a race that the other builders just aren't slow enough to trigger.

@bcmills (Member, Author) commented Mar 16, 2020

Or perhaps the test is timing out somewhere and is written in such a way that timeouts manifest as deadlocks?

@bcmills modified the milestones: Go1.14, Unplanned Apr 8, 2020
@bcmills added the OS-Darwin label Apr 8, 2020

@gopherbot commented Apr 8, 2020

Change https://golang.org/cl/227588 mentions this issue: net: convert many Close tests to use parallel subtests

gopherbot pushed a commit that referenced this issue Apr 9, 2020
Also set a deadline in TestCloseWrite so that we can more easily
determine which kind of connection is getting stuck on the
darwin-arm64-corellium builder (#34837).

Change-Id: I8ccacbf436e8e493fb2298a79b17e0af8fc6eb81
Reviewed-on: https://go-review.googlesource.com/c/go/+/227588
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@bcmills (Member, Author) commented Apr 14, 2020

Looks like at least the tcp stack is affected (2020-04-13T21:56:15-1b15c7f/darwin-arm64-corellium):

--- FAIL: TestCloseWrite (0.00s)
    --- FAIL: TestCloseWrite/tcp (158.92s)
        net_test.go:172: got (0, read tcp 127.0.0.1:58175->127.0.0.1:58174: i/o timeout); want (0, io.EOF)
        net_test.go:112: got (0, read tcp4 127.0.0.1:58174->127.0.0.1:58175: i/o timeout); want (0, io.EOF)
FAIL
FAIL	net	162.420s