os: on NetBSD, *Process.Wait sometimes deadlocks after cmd.Process.Signal returns "process already finished" #48789
Comments
@bsiegert, @coypoop: do you know who might be able to look into this on the NetBSD side? @ianlancetaylor, @tklauser: is this possibly related to #13987?
#44801 may be related, in that it involves a hang in …
I see two possible sequences that could lead to this. First, … Second, … Neither should be possible. The first case seems more likely. If we don't see a response from somebody familiar with NetBSD, we should probably move the …
The first case seems more likely to me too, given that the failure seems to have started occurring only after https://golang.org/cl/315281 was submitted on 2021-05-02. I'll send a CL to move the …
Change https://golang.org/cl/354249 mentions this issue:
CL 315281 changed the os package to use wait6 on netbsd. This seems to be causing frequent test failures as reported in #48789. Revert that change using wait6 on netbsd for now.
Updates #13987
Updates #16028
For #48789
Change-Id: Ieddffc65611c7f449971eaa8ed6f4299a5f742c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/354249
Trust: Tobias Klauser <tobias.klauser@gmail.com>
Trust: Bryan C. Mills <bcmills@google.com>
Trust: Benny Siegert <bsiegert@gmail.com>
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Benny Siegert <bsiegert@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@bcmills Have you seen more deadlocks since the commit above landed at the beginning of October?
Looks like the deadlocks ended at that CL. 👍 The only failures I can find involving …
2021-11-05T16:51:14-c58417b/netbsd-amd64-9_0
We believe that the …
2021-12-14T01:48:22-1afa432/netbsd-amd64-9_0-n2
Hmm, maybe not! That send is here: … which is not on the …
Change https://go.dev/cl/431855 mentions this issue:
Resend of CL 315281 which was partially reverted by CL 354249 after the original CL was suspected to cause test failures as reported in #48789. It seems that both wait4 and wait6 lead to that particular deadlock, so let's use wait6. That way we at least don't hit #13987 on netbsd.
Updates #13987
For #48789
For #50138
Change-Id: Iadc4a771217b7e9e821502e89afa07036e0dcb6f
Reviewed-on: https://go-review.googlesource.com/c/go/+/431855
Reviewed-by: Benny Siegert <bsiegert@gmail.com>
Auto-Submit: Tobias Klauser <tobias.klauser@gmail.com>
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Change https://go.dev/cl/483396 mentions this issue:
CL 431855 changed (*Process).blockUntilWaitable on netbsd to use wait6 again.
Update #48789
Change-Id: I948f5445a44ab2e82c02560480a2a244d2b5f473
Reviewed-on: https://go-review.googlesource.com/c/go/+/483396
Reviewed-by: Benny Siegert <bsiegert@gmail.com>
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
I've noticed a recurring pattern in cmd/go test failures on NetBSD builders, and I believe that it indicates a bug in either os.Process or the kernel itself on that platform.
The cmd/go tests start processes running “in the background” using os/exec, and use a waitOrStop function to terminate any remaining processes at the conclusion of each test.
The waitOrStop function starts a goroutine, then blocks on cmd.Wait(). The background goroutine blocks until either the call to cmd.Wait completes or the Context is canceled; if the Context is canceled first, it sends a signal to the process. If that signal fails with "os: process already finished", then we assume that the process actually has already finished, and the background goroutine simply blocks until the call to cmd.Wait (inevitably) returns.
That all happens here:
go/src/cmd/go/script_test.go, lines 1164 to 1206 at a05a7d4
What I'm seeing on (some of?) the NetBSD builders is that after cmd.Process.Signal fails with "process already finished", the call to cmd.Wait continues to block, seemingly forever.
The relevant goroutine traces are:
Note that goroutine 1537 is blocked at script_test.go:1176, which is the send on errc after cmd.Process.Signal fails with "os: process already finished". Goroutine 1536 is blocked at the call to cmd.Wait, which is itself blocked on syscall.Wait4.
The failure rate with these symptoms is fairly high: on the order of 20 failures per month.
greplogs --dashboard -md -l -e '(?m)panic: test timed out.*(?:.*\n)*.*\[syscall, .* minutes\]:\n(?:.+\n\t.+\n)*syscall\.Wait.*\n\t.+\n(?:.+\n\t.+\n)*cmd/go_test\.waitOrStop'
2021-10-04T22:46:23-17674e2/netbsd-386-9_0
2021-10-04T18:15:09-9f8d558/netbsd-386-9_0
2021-09-30T19:56:06-eb9f090/netbsd-386-9_0
2021-09-29T15:23:27-aeb4fba/netbsd-amd64-9_0
2021-09-28T17:18:36-ff7b041/netbsd-amd64-9_0
2021-09-28T15:26:21-583eeaa/netbsd-386-9_0
2021-09-27T18:57:20-3d795ea/netbsd-amd64-9_0
2021-09-22T16:24:17-74ba70b/netbsd-amd64-9_0
2021-09-22T15:00:53-91c2318/netbsd-amd64-9_0
2021-09-21T20:39:31-48cf96c/netbsd-386-9_0
2021-09-20T23:04:13-d7e3e44/netbsd-arm64-bsiegert
2021-09-20T00:13:47-a83a558/netbsd-amd64-9_0
2021-09-17T19:32:44-74e384f/netbsd-amd64-9_0
2021-09-16T23:57:40-8d2a9c3/netbsd-386-9_0
2021-09-16T19:38:19-bcdc61d/netbsd-amd64-9_0
2021-09-10T17:11:39-5a4b9f9/netbsd-amd64-9_0
2021-09-09T16:32:28-a53e3d5/netbsd-amd64-9_0
2021-09-08T11:57:03-9295723/netbsd-386-9_0
2021-09-07T03:56:13-6226020/netbsd-386-9_0
2021-09-04T10:58:11-5ec298d/netbsd-386-9_0
2021-08-31T16:43:46-6815235/netbsd-386-9_0
2021-08-30T22:07:49-b06cfe9/netbsd-386-9_0
2021-08-27T05:13:44-2c60a99/netbsd-386-9_0
2021-08-24T22:23:12-54cdef1/netbsd-386-9_0
2021-08-23T21:22:58-8157960/netbsd-386-9_0
2021-08-23T21:22:58-8157960/netbsd-arm64-bsiegert
2021-08-22T21:43:43-1958582/netbsd-386-9_0
2021-08-20T03:25:17-c92c2c9/netbsd-386-9_0
2021-08-19T20:50:13-65074a4/netbsd-amd64-9_0
2021-08-18T21:19:22-c2bd9ee/netbsd-386-9_0
2021-08-18T20:11:28-165ebd8/netbsd-386-9_0
2021-08-17T16:22:15-cf12b0d/netbsd-386-9_0
2021-08-17T15:00:04-3001b0a/netbsd-386-9_0
2021-08-17T04:37:32-a304273/netbsd-386-9_0
2021-08-17T01:29:37-1951afc/netbsd-386-9_0
2021-08-16T18:44:38-56a919f/netbsd-386-9_0
2021-08-16T18:44:32-ff36d11/netbsd-amd64-9_0
2021-08-16T13:38:52-a192ef8/netbsd-386-9_0
2021-08-12T17:43:16-39634e7/netbsd-386-9_0
2021-08-09T20:06:35-f1dce31/netbsd-386-9_0
2021-08-06T16:51:12-70546f6/netbsd-386-9_0
2021-07-28T03:27:13-b39e0f4/netbsd-386-9_0
2021-07-15T20:39:22-0941dbc/netbsd-amd64-9_0
2021-06-29T16:57:13-3463852/netbsd-386-9_0
2021-06-28T20:51:30-956c81b/netbsd-386-9_0
2021-06-24T03:45:33-a9bb382/netbsd-386-9_0
2021-06-15T20:59:42-d77f4c0/netbsd-amd64-9_0
2021-06-11T20:31:30-16b5d76/netbsd-386-9_0
2021-06-09T17:11:44-df35ade/netbsd-386-9_0
2021-06-09T15:09:13-139e935/netbsd-386-9_0
2021-05-21T17:43:46-4fda54c/netbsd-arm64-bsiegert
2021-05-21T17:35:47-8876b9b/netbsd-386-9_0
2021-05-17T16:02:12-b1aff42/netbsd-386-9_0
2021-05-12T15:23:09-0388670/netbsd-386-9_0
2021-05-12T02:04:57-1a0ea1a/netbsd-386-9_0
2021-05-11T18:22:54-9b84814/netbsd-386-9_0
2021-05-10T13:16:56-2870259/netbsd-386-9_0
2021-05-06T16:00:55-6c591f7/netbsd-386-9_0
2020-09-19T05:13:19-ccf581f/netbsd-386-9_0
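The query above is an ordinary RE2 regular expression, so Go's regexp package can be used to check what it selects: a test timeout whose goroutine dump contains a goroutine in a syscall state, with syscall.Wait* somewhere in its stack below cmd/go_test.waitOrStop. The two log excerpts below are hand-written, hypothetical examples (including the placeholder file paths), not output from a real builder.

```go
package main

import (
	"fmt"
	"regexp"
)

// waitHangPattern is the regular expression from the greplogs query above.
// Go's regexp package uses RE2 syntax, which supports (?m), non-capturing
// groups, and the \t, \[ and \. escapes used here.
var waitHangPattern = regexp.MustCompile(`(?m)panic: test timed out.*(?:.*\n)*.*\[syscall, .* minutes\]:\n(?:.+\n\t.+\n)*syscall\.Wait.*\n\t.+\n(?:.+\n\t.+\n)*cmd/go_test\.waitOrStop`)

// hangLog is hypothetical: a goroutine stuck in syscall.Wait4 under waitOrStop.
const hangLog = "panic: test timed out after 10m0s\n\n" +
	"goroutine 1536 [syscall, 9 minutes]:\n" +
	"syscall.Syscall6(...)\n" +
	"\t/hypothetical/asm.s:1 +0x0\n" +
	"syscall.Wait4(...)\n" +
	"\t/hypothetical/syscall.go:1 +0x0\n" +
	"os.(*Process).wait(...)\n" +
	"\t/hypothetical/exec.go:1 +0x0\n" +
	"cmd/go_test.waitOrStop(...)\n" +
	"\t/hypothetical/script_test.go:1 +0x0\n"

// otherLog is hypothetical: a timeout with no goroutine in a syscall state,
// so the pattern should not match it.
const otherLog = "panic: test timed out after 10m0s\n\n" +
	"goroutine 1537 [chan send, 9 minutes]:\n" +
	"cmd/go_test.waitOrStop.func1(...)\n" +
	"\t/hypothetical/script_test.go:1 +0x0\n"

func main() {
	fmt.Println(waitHangPattern.MatchString(hangLog))  // true
	fmt.Println(waitHangPattern.MatchString(otherLog)) // false
}
```

Note that the query keys on the goroutine blocked in syscall.Wait4 (goroutine 1536 in the description above), not on the goroutine blocked on the channel send, which on its own would not match.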