os/signal: test flaky on darwin-amd64 #31264

Open
bcmills opened this issue Apr 4, 2019 · 7 comments

@bcmills
Member

bcmills commented Apr 4, 2019

The os/signal test seems to time out fairly frequently on the darwin-amd64-race builder.
It's not obvious to me whether the test is deadlocked or just slow.

Example: https://build.golang.org/log/7c20aa477d782f38c6184e7f8bce05c9648b57b3

panic: test timed out after 3m0s

goroutine 20 [running]:
testing.(*M).startAlarm.func1()
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/testing/testing.go:1332 +0x11c
created by time.goFunc
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/time/sleep.go:169 +0x52

goroutine 1 [chan receive]:
testing.(*T).Run(0xc0000c0000, 0x42154f9, 0x9, 0x421e888, 0x1)
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/testing/testing.go:917 +0x68a
testing.runTests.func1(0xc0000c0000)
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/testing/testing.go:1157 +0xa7
testing.tRunner(0xc0000c0000, 0xc000046db0)
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/testing/testing.go:865 +0x163
testing.runTests(0xc00000e060, 0x435ace0, 0xb, 0xb, 0x0)
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/testing/testing.go:1155 +0x522
testing.(*M).Run(0xc0000ba000, 0x0)
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/testing/testing.go:1072 +0x2ea
main.main()
	_testmain.go:64 +0x21f

goroutine 4 [syscall]:
os/signal.signal_recv(0x4247580)
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/runtime/sigqueue.go:139 +0x9f
os/signal.loop()
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/os/signal/signal_unix.go:23 +0x30
created by os/signal.init.0
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/os/signal/signal_unix.go:29 +0x4f

goroutine 10 [runnable]:
runtime.Gosched(...)
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/runtime/proc.go:266
os/signal.signalWaitUntilIdle()
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/runtime/sigqueue.go:172 +0x31
os/signal.Stop(0xc00009c300)
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/os/signal/signal.go:196 +0x2f3
runtime.Goexit()
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/runtime/panic.go:503 +0xec
testing.(*common).FailNow(0xc0000c0300)
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/testing/testing.go:609 +0x5b
testing.(*common).Fatalf(0xc0000c0300, 0x4218523, 0x16, 0xc0000e6d18, 0x1, 0x1)
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/testing/testing.go:672 +0x9f
os/signal.waitSig(0xc0000c0300, 0xc00009c2a0, 0x4247580, 0x42436a8)
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/os/signal/signal_test.go:32 +0x309
os/signal.testCancel(0xc0000c0300, 0xc00003b700)
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/os/signal/signal_test.go:133 +0x1f9
os/signal.TestReset(0xc0000c0300)
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/os/signal/signal_test.go:187 +0x3e
testing.tRunner(0xc0000c0300, 0x421e888)
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/testing/testing.go:865 +0x163
created by testing.(*T).Run
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/testing/testing.go:916 +0x652

goroutine 8 [runnable]:
runtime.Gosched(...)
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/runtime/proc.go:266
os/signal.signalWaitUntilIdle()
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/runtime/sigqueue.go:172 +0x31
os/signal.Stop(0xc00009c180)
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/os/signal/signal.go:196 +0x2f3
os/signal.TestStress.func1(0xc000022480, 0xc0000224e0)
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/os/signal/signal_test.go:92 +0x194
created by os/signal.TestStress
	/var/folders/dx/k53rs1s93538b4x20g46cj_w0000gn/T/workdir/go/src/os/signal/signal_test.go:79 +0xe1
FAIL	os/signal	180.028s

@bcmills bcmills added this to the Go1.13 milestone Apr 4, 2019

@bcmills

Member Author

bcmills commented Apr 4, 2019

CC @ianlancetaylor for os/signal

@ianlancetaylor

Contributor

ianlancetaylor commented Apr 4, 2019

In the stack trace two goroutines are stuck in this loop in signalWaitUntilIdle (in runtime/sigqueue.go):

	for atomic.Load(&sig.delivering) != 0 {
		Gosched()
	}

sig.delivering can only be non-zero while sigsend (also in runtime/sigqueue.go) is executing. sigsend is invoked by the signal handler. No code in sigsend blocks, and every return path leaves sig.delivering at the value it had on entry.

So this seems to be impossible. The failure presumably has something to do with the race detector, but I don't know why it would be Darwin specific.

@bradfitz

Member

bradfitz commented Apr 5, 2019

@dvyukov for any ideas.

@bcmills

Member Author

bcmills commented Apr 5, 2019

Some thoughts:

  • It could be another manifestation of the race in #20748.
  • Perhaps the fact that it's Darwin-specific has something to do with the interaction with libc. (Do we invoke sigaction via libc on Darwin now?) I seem to recall that TSAN tends to defer signal delivery until the next libc call (see #18717), so if we're hitting the TSAN hooks for signal delivery and then the test is idling in a way that doesn't ping back into libc, that could deadlock it.

@bradfitz

Member

bradfitz commented Apr 5, 2019

Not just the race builder, it turns out. @andybons notes that it's also here:

https://build.golang.org/log/65130fd6cf64710896cae0a569087f8fa8df66fc (darwin-amd64-10_14)

@bradfitz bradfitz changed the title from "os/signal: test flaky on darwin-amd64-race builder" to "os/signal: test flaky on darwin-amd64" Apr 5, 2019

@bradfitz

Member

bradfitz commented Apr 5, 2019

And that last one is on the Go 1.12 release branch.

@randall77, @ianlancetaylor, should this block the Go 1.12.2 release?

@ianlancetaylor

Contributor

ianlancetaylor commented Apr 5, 2019

It's hard to believe that this is new in 1.12.2 compared to 1.12, so I don't see a reason to block the 1.12.2 release.
