Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sync: apparent deadlock in TestWaitGroupMisuse3 #35774

Open
bcmills opened this issue Nov 22, 2019 · 3 comments

Comments

@bcmills
Copy link
Member

@bcmills bcmills commented Nov 22, 2019

Looks like there has been a recent regression in sync.TestWaitGroupMisuse3.

It's not obvious to me whether this is a bug in the test itself, or a bug in the runtime (or the sync package) exposed by the test.

panic: test timed out after 3m0s

goroutine 3650 [running]:
testing.(*M).startAlarm.func1()
	/workdir/go/src/testing/testing.go:1422 +0x145
created by time.goFunc
	/workdir/go/src/time/sleep.go:168 +0x46

goroutine 1 [chan receive, 2 minutes]:
testing.(*T).Run(0xc00014c7e0, 0x60f12e, 0x14, 0x617d10, 0x5dd71d00)
	/workdir/go/src/testing/testing.go:1006 +0x68e
testing.runTests.func1(0xc00014c000)
	/workdir/go/src/testing/testing.go:1247 +0xdc
testing.tRunner(0xc00014c000, 0xc00011bc70)
	/workdir/go/src/testing/testing.go:954 +0x19c
testing.runTests(0xc00012c020, 0x746ba0, 0x1f, 0x1f, 0x7f7af6be0100)
	/workdir/go/src/testing/testing.go:1245 +0x3a5
testing.(*M).Run(0xc000148000, 0x0)
	/workdir/go/src/testing/testing.go:1162 +0x454
main.main()
	_testmain.go:198 +0xc1

goroutine 3444 [semacquire, 2 minutes]:
sync.runtime_Semacquire(0xc000466078)
	/workdir/go/src/runtime/sema.go:56 +0x42
sync.(*WaitGroup).Wait(0xc000466070)
	/workdir/go/src/sync/waitgroup.go:130 +0xd6
sync_test.TestWaitGroupMisuse3(0xc00014c7e0)
	/workdir/go/src/sync/waitgroup_test.go:174 +0x2b0
testing.tRunner(0xc00014c7e0, 0x617d10)
	/workdir/go/src/testing/testing.go:954 +0x19c
created by testing.(*T).Run
	/workdir/go/src/testing/testing.go:1005 +0x660
FAIL	sync	180.007s

2019-11-21T23:15:28-bbbc658/linux-amd64-noopt
2019-11-03T01:44:46-d2c039f/openbsd-amd64-62
2019-11-01T07:59:09-5b31021/linux-386-387

CC @aclements @ianlancetaylor @cherrymui @rhysh @CAFxX

@bcmills

This comment has been minimized.

Copy link
Member Author

@bcmills bcmills commented Nov 22, 2019

Hmm, the first failure predates CL 206180, so that can't be implicated. I don't see any other changes to sync, so this looks like a runtime bug to me.

@rhysh

This comment has been minimized.

Copy link
Contributor

@rhysh rhysh commented Nov 23, 2019

I took a shot at pinning this down, but it appears present (on linux/amd64) in go1.13.3 and as long ago as go1.7.6 (the oldest I can easily cross-compile). It's possible that it's become more frequent.

With -test.run=TestWaitGroupMisuse3 and without -test.timeout, it shows up as fatal error: all goroutines are asleep - deadlock!. For example, with go1.13.3 on a big linux/amd64 machine with GOMAXPROCS trimmed down to 4, using stress:

492232 runs so far, 5 failures

/tmp/go-stress101518475
fatal error: all goroutines are asleep - deadlock!

goroutine 1 [chan receive]:
testing.(*T).Run(0xc0000ac100, 0x5ac2f4, 0x14, 0x5b4a70, 0x483601)
	/usr/local/go/src/testing/testing.go:961 +0x377
testing.runTests.func1(0xc0000ac000)
	/usr/local/go/src/testing/testing.go:1202 +0x78
testing.tRunner(0xc0000ac000, 0xc00007edc0)
	/usr/local/go/src/testing/testing.go:909 +0xc9
testing.runTests(0xc00000c0e0, 0x6e8ac0, 0x1f, 0x1f, 0x0)
	/usr/local/go/src/testing/testing.go:1200 +0x2a7
testing.(*M).Run(0xc00009e000, 0x0)
	/usr/local/go/src/testing/testing.go:1117 +0x176
main.main()
	_testmain.go:198 +0x135

goroutine 6 [semacquire]:
sync.runtime_Semacquire(0xc000012138)
	/usr/local/go/src/runtime/sema.go:56 +0x42
sync.(*WaitGroup).Wait(0xc000012130)
	/usr/local/go/src/sync/waitgroup.go:130 +0x64
sync_test.TestWaitGroupMisuse3(0xc0000ac100)
	/usr/local/go/src/sync/waitgroup_test.go:174 +0x206
testing.tRunner(0xc0000ac100, 0x5b4a70)
	/usr/local/go/src/testing/testing.go:909 +0xc9
created by testing.(*T).Run
	/usr/local/go/src/testing/testing.go:960 +0x350


ERROR: exit status 2
@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Dec 2, 2019

This hasn't happened since November 21. The failure was uncommon before then so this doesn't prove that it's been fixed. But it could have been fixed by https://golang.org/cl/208379, committed November 22. CC @mknyszek .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.