Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: apparent deadlock in TestSignalM #35276

Closed
bcmills opened this issue Oct 31, 2019 · 9 comments
Assignees
Milestone

Comments

@bcmills
Copy link
Member

@bcmills bcmills commented Oct 31, 2019

From darwin-386-10_14 (https://build.golang.org/log/b5777ae134f6711c97b26f42b83697bc014a335b):

panic: test timed out after 5m0s

[…]

goroutine 63722 [syscall, 4 minutes, locked to thread]:
runtime.notetsleepg(0x383998, 0x3b9aca00, 0x0, 0x489008)
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/runtime/lock_sema.go:286 +0x2e fp=0x11c0e798 sp=0x11c0e774 pc=0xad1e
runtime.WaitForSigusr1(0x11c0e7d4, 0x3b9aca00, 0x0, 0x11c0e7e0, 0x1c8001, 0x34dc8, 0x200534)
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/runtime/export_unix_test.go:47 +0x9d fp=0x11c0e7b8 sp=0x11c0e798 pc=0x56ddd
runtime_test.TestSignalM.func1(0x119a9680, 0x1146ca60, 0x1146ca68, 0x1146ca70)
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/runtime/crash_unix_test.go:321 +0x56 fp=0x11c0e7e0 sp=0x11c0e7b8 pc=0x17b3b6
runtime.goexit()
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/runtime/asm_386.s:1337 +0x1 fp=0x11c0e7e4 sp=0x11c0e7e0 pc=0x5c631
created by runtime_test.TestSignalM
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/runtime/crash_unix_test.go:319 +0xbd

CC @cherrymui @mknyszek

@bcmills

This comment has been minimized.

Copy link
Member Author

@bcmills bcmills commented Oct 31, 2019

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Nov 2, 2019

Change https://golang.org/cl/204620 mentions this issue: runtime: remove write barrier in WaitForSigusr1

gopherbot pushed a commit that referenced this issue Nov 6, 2019
WaitForSigusr1 registers a callback to be called on SIGUSR1 directly
from the runtime signal handler. Currently, this callback has a write
barrier in it, which can crash with a nil P if the GC is active and
the signal arrives on an M that doesn't have a P.

Fix this by recording the ID of the M that receives the signal instead
of the M itself, since that's all we needed anyway. To make sure there
are no other problems, this also lifts the callback into a package
function and marks it "go:nowritebarrierrec".

Fixes #35248.

Updates #35276, since in principle a write barrier at exactly the
wrong time while entering the scheduler could cause issues, though I
suspect that bug is unrelated.

Change-Id: I47b4bc73782efbb613785a93e381d8aaf6850826
Reviewed-on: https://go-review.googlesource.com/c/go/+/204620
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@aclements

This comment has been minimized.

Copy link
Member

@aclements aclements commented Nov 7, 2019

This has happened a few times since "runtime: remove write barrier in WaitForSigusr1" landed, so that CL clearly didn't fix it (it was only a guess that it might).

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 8, 2019

2019-10-29T14:22:15-a0c1e8d/darwin-386-10_14
2019-10-29T14:22:15-a0c1e8d/darwin-amd64-10_12
2019-10-29T22:56:47-15ea61c/darwin-amd64-10_12
2019-10-30T14:51:03-cc4b824/darwin-386-10_14
2019-10-30T20:46:44-81a74b4/aix-ppc64
2019-10-30T20:46:44-81a74b4/dragonfly-amd64
2019-10-31T17:21:56-48c0cef/darwin-amd64-nocgo
2019-11-01T07:59:09-5b31021/dragonfly-amd64
2019-11-01T14:19:56-8405cd3/darwin-amd64-nocgo
2019-11-01T21:41:41-9bde9b4/darwin-386-10_14
2019-11-02T03:00:20-1e4a358/darwin-arm64-corellium
2019-11-02T05:36:43-971ec87/darwin-amd64-10_12
2019-11-02T15:46:47-dc0c23e/aix-ppc64
2019-11-02T21:51:09-61fa798/dragonfly-amd64
2019-11-02T21:51:21-177a36a/darwin-amd64-10_14
2019-11-03T04:58:01-32190b0/darwin-amd64-10_14
2019-11-03T05:01:00-4497d7e/darwin-amd64-10_12
2019-11-04T01:28:26-6b67f7d/darwin-amd64-10_14
2019-11-04T21:30:29-5a7c571/darwin-386-10_14
2019-11-04T23:16:09-2566e21/dragonfly-amd64
2019-11-05T02:54:42-0f992b9/darwin-386-10_14
2019-11-05T16:31:48-414c1d4/dragonfly-amd64
2019-11-05T19:06:28-7e71c9c/aix-ppc64
2019-11-05T19:06:28-7e71c9c/darwin-386-10_14
2019-11-05T19:06:28-7e71c9c/dragonfly-amd64
2019-11-06T09:09:21-0ea7440/dragonfly-amd64
2019-11-06T09:09:59-0c5d545/dragonfly-amd64
2019-11-06T23:48:45-6ce4384/darwin-386-10_14
2019-11-07T06:58:44-f5949b6/darwin-amd64-nocgo
2019-11-07T08:25:32-7a2baa9/darwin-amd64-10_14
2019-11-07T16:35:53-b3a3613/darwin-386-10_14
2019-11-07T19:14:27-7331708/darwin-amd64-10_14
2019-11-07T19:18:59-a96cfa7/darwin-amd64-nocgo
2019-11-07T21:59:16-4cde749/darwin-amd64-nocgo
2019-11-07T22:17:54-52d5e76/darwin-386-10_14
2019-11-07T22:17:54-52d5e76/darwin-amd64-10_12

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 8, 2019

The failures are only on Darwin, Dragonfly, and AIX.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 8, 2019

OK, I think I see what the problem is, at least on Darwin. We are calling notewakeup from a signal handler. But on Darwin the note functions are not async-signal-safe. We already had to work around this once before, for #31264.

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Nov 8, 2019

Change https://golang.org/cl/206078 mentions this issue: runtime: use pipe rather than note in TestSignalM

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Nov 8, 2019

Change https://golang.org/cl/206077 mentions this issue: runtime: add pipe/pipe2 on Solaris

gopherbot pushed a commit that referenced this issue Nov 8, 2019
This adds pipe/pipe2 on Solaris as they exist on other Unix systems.
They were not added previously because Solaris does not need them
for netpollBreak. They are added now in preparation for using pipes
in TestSignalM.

Updates #35276

Change-Id: I53dfdf077430153155f0a79715af98b0972a841c
Reviewed-on: https://go-review.googlesource.com/c/go/+/206077
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@gopherbot gopherbot closed this in 99957b6 Nov 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.