os/signal: timeout in TestAllThreadsSyscallSignals #44193
Tentatively milestoned to Go 1.16 because the function under test is new (https://tip.golang.org/doc/go1.16#syscall).
@laboger FYI
Please assign this to me. I have no idea what might have happened here, yet, but I'll try to figure it out. Looking at the build.golang.org status page, is this a one-off failure?
This does seem to have occurred only once.
Quickly looking at the code https://tip.golang.org/src/os/signal/signal_linux_test.go it looks like the test was evaluating whether or not it should be skipped at the time of the 3 minute timeout. Capturing the following just in case it otherwise gets lost at some point.
#42178 (comment) concerning the ppc64 build confirms that this builder uses CGO_ENABLED=0. My recollection from working on that issue is that this architecture is noticeably slower overall than the systems I typically work with, which makes a timeout somewhat more plausible here. However, the test should take nowhere near 3 minutes to run. Moreover, the crash trace does not show a timeout inside the test's loop. Instead, it timed out before that loop even starts, in the code that runs after the syscall has completed on all threads, just as the runtime is trying to restart the world (and re-enable GC). I've not yet reproduced the failure, and the build servers do not appear to have failed again, but I'm still investigating.
@AndrewGMorgan, another theory to consider: perhaps the actual slowdown was in some other
Did this ever repeat on the build servers?
Running against HEAD, the whole
Compressed into one bug update, hammering on this test for many hundreds of iterations, with some
It's elusive and I'm not convinced it reproduces in the same place every time (different
I'm using this command to reproduce the failure (it happens after a few attempts in a row, and a pass takes about 23s, so a 40s timeout seems fair):
I'll keep trying to narrow it down, but we might want to get someone more familiar with
Turns out it did! Just not on
Interesting. This should be easier to debug. That being said, so far, my recipe for reproducing it does not work on this architecture for me. Just to keep all the crash dumps in the same place this
A modified recipe. Compile the test into a standalone binary
Variants of the above sequence don't seem to be able to break into the hung program. Very strange.
I've found at least one issue that leads to a deadlock. It explains why I needed to use
Tl;dr this is, indeed, a pair of bugs with the
I've added a signal test (
I'm probably being a little loose with technical language here, but hopefully the following explains the problematic conditions effectively... First, a recap for context:
The two issues, with the
So far, I've run my fix against
I'll give it a whirl on the
The mentioned fix is https://go-review.googlesource.com/c/go/+/305149 .
Change https://golang.org/cl/305149 mentions this issue:
Since this is a bug in 1.16, should this issue be tagged with that too?
@gopherbot Please open backport to 1.16
This bug can cause a rare deadlock if a signal is received during a call to
Backport issue(s) opened: #45307 (for 1.16). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.
Change https://golang.org/cl/316869 mentions this issue: |
…dling fixes
The runtime support for syscall.AllThreadsSyscall() functions had some corner case deadlock issues when signal handling was in use. This was observed in at least 3 build test failures on ppc64 and amd64 architecture CGO_ENABLED=0 builds over the last few months.
The fixes involve more controlled handling of signals while the AllThreads mechanism is being executed. Further details are discussed in bug #44193.
The all-threads syscall support is new in go1.16, so earlier releases are not affected by this bug.
Fixes #45307
Change-Id: I01ba8508a6e1bb2d872751f50da86dd07911a41d
Reviewed-on: https://go-review.googlesource.com/c/go/+/305149
Reviewed-by: Michael Pratt <mpratt@google.com>
Trust: Michael Pratt <mpratt@google.com>
Trust: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
(cherry picked from commit 7e97e4e)
Reviewed-on: https://go-review.googlesource.com/c/go/+/316869
Run-TryBot: Ian Lance Taylor <iant@golang.org>
It's not clear to me whether this is a deadlock, a livelock, or just a slow test, but the similarity to #43149 is concerning (CC @AndrewGMorgan @ianlancetaylor).
2021-02-09T18:40:13-e9c9683/linux-ppc64-buildlet