Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: apparent deadlock in TestSignalM #35276

Closed
bcmills opened this issue Oct 31, 2019 · 10 comments
Closed

runtime: apparent deadlock in TestSignalM #35276

bcmills opened this issue Oct 31, 2019 · 10 comments
Assignees
Milestone

Comments

@bcmills
Copy link
Member

@bcmills bcmills commented Oct 31, 2019

From darwin-386-10_14 (https://build.golang.org/log/b5777ae134f6711c97b26f42b83697bc014a335b):

panic: test timed out after 5m0s

[…]

goroutine 63722 [syscall, 4 minutes, locked to thread]:
runtime.notetsleepg(0x383998, 0x3b9aca00, 0x0, 0x489008)
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/runtime/lock_sema.go:286 +0x2e fp=0x11c0e798 sp=0x11c0e774 pc=0xad1e
runtime.WaitForSigusr1(0x11c0e7d4, 0x3b9aca00, 0x0, 0x11c0e7e0, 0x1c8001, 0x34dc8, 0x200534)
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/runtime/export_unix_test.go:47 +0x9d fp=0x11c0e7b8 sp=0x11c0e798 pc=0x56ddd
runtime_test.TestSignalM.func1(0x119a9680, 0x1146ca60, 0x1146ca68, 0x1146ca70)
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/runtime/crash_unix_test.go:321 +0x56 fp=0x11c0e7e0 sp=0x11c0e7b8 pc=0x17b3b6
runtime.goexit()
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/runtime/asm_386.s:1337 +0x1 fp=0x11c0e7e4 sp=0x11c0e7e0 pc=0x5c631
created by runtime_test.TestSignalM
	/var/folders/9w/4l2_g3kx01x199n37fbmv3s80000gn/T/workdir-host-darwin-10_14/go/src/runtime/crash_unix_test.go:319 +0xbd

CC @cherrymui @mknyszek

@bcmills
Copy link
Member Author

@bcmills bcmills commented Oct 31, 2019

@gopherbot
Copy link

@gopherbot gopherbot commented Nov 2, 2019

Change https://golang.org/cl/204620 mentions this issue: runtime: remove write barrier in WaitForSigusr1

gopherbot pushed a commit that referenced this issue Nov 6, 2019
WaitForSigusr1 registers a callback to be called on SIGUSR1 directly
from the runtime signal handler. Currently, this callback has a write
barrier in it, which can crash with a nil P if the GC is active and
the signal arrives on an M that doesn't have a P.

Fix this by recording the ID of the M that receives the signal instead
of the M itself, since that's all we needed anyway. To make sure there
are no other problems, this also lifts the callback into a package
function and marks it "go:nowritebarrierrec".

Fixes #35248.

Updates #35276, since in principle a write barrier at exactly the
wrong time while entering the scheduler could cause issues, though I
suspect that bug is unrelated.

Change-Id: I47b4bc73782efbb613785a93e381d8aaf6850826
Reviewed-on: https://go-review.googlesource.com/c/go/+/204620
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@aclements
Copy link
Member

@aclements aclements commented Nov 7, 2019

This has happened a few times since "runtime: remove write barrier in WaitForSigusr1" landed, so that CL clearly didn't fix it (it was only a guess that it might).

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 8, 2019

2019-10-29T14:22:15-a0c1e8d/darwin-386-10_14
2019-10-29T14:22:15-a0c1e8d/darwin-amd64-10_12
2019-10-29T22:56:47-15ea61c/darwin-amd64-10_12
2019-10-30T14:51:03-cc4b824/darwin-386-10_14
2019-10-30T20:46:44-81a74b4/aix-ppc64
2019-10-30T20:46:44-81a74b4/dragonfly-amd64
2019-10-31T17:21:56-48c0cef/darwin-amd64-nocgo
2019-11-01T07:59:09-5b31021/dragonfly-amd64
2019-11-01T14:19:56-8405cd3/darwin-amd64-nocgo
2019-11-01T21:41:41-9bde9b4/darwin-386-10_14
2019-11-02T03:00:20-1e4a358/darwin-arm64-corellium
2019-11-02T05:36:43-971ec87/darwin-amd64-10_12
2019-11-02T15:46:47-dc0c23e/aix-ppc64
2019-11-02T21:51:09-61fa798/dragonfly-amd64
2019-11-02T21:51:21-177a36a/darwin-amd64-10_14
2019-11-03T04:58:01-32190b0/darwin-amd64-10_14
2019-11-03T05:01:00-4497d7e/darwin-amd64-10_12
2019-11-04T01:28:26-6b67f7d/darwin-amd64-10_14
2019-11-04T21:30:29-5a7c571/darwin-386-10_14
2019-11-04T23:16:09-2566e21/dragonfly-amd64
2019-11-05T02:54:42-0f992b9/darwin-386-10_14
2019-11-05T16:31:48-414c1d4/dragonfly-amd64
2019-11-05T19:06:28-7e71c9c/aix-ppc64
2019-11-05T19:06:28-7e71c9c/darwin-386-10_14
2019-11-05T19:06:28-7e71c9c/dragonfly-amd64
2019-11-06T09:09:21-0ea7440/dragonfly-amd64
2019-11-06T09:09:59-0c5d545/dragonfly-amd64
2019-11-06T23:48:45-6ce4384/darwin-386-10_14
2019-11-07T06:58:44-f5949b6/darwin-amd64-nocgo
2019-11-07T08:25:32-7a2baa9/darwin-amd64-10_14
2019-11-07T16:35:53-b3a3613/darwin-386-10_14
2019-11-07T19:14:27-7331708/darwin-amd64-10_14
2019-11-07T19:18:59-a96cfa7/darwin-amd64-nocgo
2019-11-07T21:59:16-4cde749/darwin-amd64-nocgo
2019-11-07T22:17:54-52d5e76/darwin-386-10_14
2019-11-07T22:17:54-52d5e76/darwin-amd64-10_12

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 8, 2019

The failures are only on Darwin, Dragonfly, and AIX.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 8, 2019

OK, I think I see what the problem is, at least on Darwin. We are calling notewakeup from a signal handler. But on Darwin the note functions are not async-signal-safe. We already had to work around this once before, for #31264.

@gopherbot
Copy link

@gopherbot gopherbot commented Nov 8, 2019

Change https://golang.org/cl/206078 mentions this issue: runtime: use pipe rather than note in TestSignalM

@gopherbot
Copy link

@gopherbot gopherbot commented Nov 8, 2019

Change https://golang.org/cl/206077 mentions this issue: runtime: add pipe/pipe2 on Solaris

gopherbot pushed a commit that referenced this issue Nov 8, 2019
This adds pipe/pipe2 on Solaris as they exist on other Unix systems.
They were not added previously because Solaris does not need them
for netpollBreak. They are added now in preparation for using pipes
in TestSignalM.

Updates #35276

Change-Id: I53dfdf077430153155f0a79715af98b0972a841c
Reviewed-on: https://go-review.googlesource.com/c/go/+/206077
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@gopherbot gopherbot closed this in 99957b6 Nov 12, 2019
@gopherbot
Copy link

@gopherbot gopherbot commented Nov 17, 2019

Change https://golang.org/cl/207577 mentions this issue: runtime: remove stray errno check from TestSignalM

gopherbot pushed a commit that referenced this issue Nov 18, 2019
CL 206078 introduced a stray errno check that was always false. This CL removes it.

Updates #35276

Change-Id: I6996bb595d347fe81752786a3d83d3432735c9cb
GitHub-Last-Rev: e026e71
GitHub-Pull-Request: #35650
Reviewed-on: https://go-review.googlesource.com/c/go/+/207577
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.