-
Notifications
You must be signed in to change notification settings - Fork 18k
runtime: hang in runtime test #13642
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
The sleep in goroutine 232349 is only 5 ms (it's in a loop, but we limit it to 20 iterations), so while it's perfectly legitimate for the system to not have anything to do, the M blocked in timerproc's semasleep should have woken up and kicked the system back in to action. This smells like a scheduler issue to me, maybe related to the recent spinning changes? @dvyukov It's unfortunate we don't have tracebacks from the other Ms. I'm working on reproducing this, but I don't have high hopes. |
I'll see what I can do. It's possible this is fairly reproducible. For a
few weeks I've thought that test was just slow and was often killing
all.bash at that point as "good enough for local testing". I only noticed
this because I was timing my old and new laptops running ./all.bash against
each other, and the old one was still running this test when I came back
from lunch.
|
No luck reproducing this yet. It's interesting that the test didn't time itself out after 5 minutes. That suggests that the testing package's alarm is stuck in the same limbo as the test's 5ms sleep. There have been a few recent instances of this test timing out on the dashboard, but they're all 5 minute time-outs from the testing package: The most recent instances of the build system timing out this test were several months ago: |
On the other hand we know this machine is not completely healthy. Maybe
there's more wrong than a dead battery. I'll try to reproduce it overnight
with GOTRACEBACK=crash. If not, I'll close the issue.
|
One output with GOTRACEBACK=crash. Normally a run takes about 200 seconds. This one timed out (on its own) after 300 (edit: no, 1000). Because it wasn't a signal-induced exit the thread stacks were not dumped.
goroutine 232206 is calling gcMarkDone and has been preempted on entry to the function. I can't tell if things had been stuck like this for long. It's possible this is just the state at this point, but it's also possible that regular (non-timer-triggered) goroutines were not being run at all. |
Current theory is that the various hardware resets changed the Mac sleep settings. Now it goes to sleep even when plugged in. When it gets woken up, lots of time has gone by and the test "times out". Or the test gets stuck, although I can't explain that part. Anyway, closing. |
There may be a real problem. Discussion of non-rsc-broken-Mac failures contineus on #13645. |
From all.bash on my Mac at 61eb705 (Dec 14). Killed runtime.test after 30+ minutes.
The text was updated successfully, but these errors were encountered: