runtime: "spinning with local work" #10573
Comments
I've seen this failure on travis as well: https://travis-ci.org/tsuru/tsuru/jobs/59940568#L575. |
I can't reproduce this locally, so debugging has been slow going. Here's what I've figured out so far. The "local work" returned by runqget always comes from p.runnext (the newly introduced field) and the run queue is otherwise empty (runqhead == runqtail). The sequence of events is something like:
Either something in the above trace is wrong, or we've somehow added something to the P's run queue in the middle of it. |
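To make the observation above concrete, here is a minimal toy model of a P's run queue with a one-slot runnext in front of a ring buffer. It is only a sketch: the type and method names echo the identifiers mentioned in this thread, but the layout, the ring size, and the lack of atomics are simplifying assumptions, not the runtime's actual code. It shows how runqget can return "local work" while runqhead == runqtail.

```go
package main

import "fmt"

// g is a stand-in for a goroutine descriptor (hypothetical, for illustration).
type g struct{ id int }

// p is a toy P: a one-slot runnext plus a small ring buffer.
// The real runtime uses atomics and a larger queue; this sketch does not.
type p struct {
	runnext  *g
	runqhead uint32
	runqtail uint32
	runq     [8]*g
}

// runqput queues gp. With next set, gp takes the runnext slot and the
// previous occupant, if any, is moved to the tail of the ring.
func (pp *p) runqput(gp *g, next bool) {
	if next {
		gp, pp.runnext = pp.runnext, gp
		if gp == nil {
			return // runnext was empty; nothing to move to the ring
		}
	}
	if pp.runqtail-pp.runqhead == uint32(len(pp.runq)) {
		panic("toy ring is full") // the sketch has no overflow path
	}
	pp.runq[pp.runqtail%uint32(len(pp.runq))] = gp
	pp.runqtail++
}

// runqget returns runnext if it is set, otherwise the head of the ring,
// otherwise nil.
func (pp *p) runqget() *g {
	if gp := pp.runnext; gp != nil {
		pp.runnext = nil
		return gp
	}
	if pp.runqhead == pp.runqtail {
		return nil
	}
	gp := pp.runq[pp.runqhead%uint32(len(pp.runq))]
	pp.runqhead++
	return gp
}

func main() {
	pp := &p{}
	pp.runqput(&g{id: 1}, true) // a freshly readied G lands in runnext

	fmt.Println("ring empty:", pp.runqhead == pp.runqtail)
	fmt.Println("runqget found work:", pp.runqget() != nil)
}
```

In this model, a check that only compares runqhead and runqtail would consider the P idle even though runqget still has a G to hand out.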
On darwin/amd64, go test -short std is almost guaranteed to trigger this, either in cmd/go or in cmd/internal/gc.
|
Finally caught it red-handed calling runqput between steps 2 and 3. newm calls allocm, which "borrows" the P we're about to start and uses it to allocate, which assists GC, which has some chance of finishing the mark, which ready()s the main GC goroutine, which gets put on the borrowed P's run queue.
This bug was introduced in ce502b0 when I switched this synchronization from using notes to using park/ready. It doesn't actually have anything to do with the new time slice handoff like I initially thought; this would have happened even without that. Now I just have to figure out a good way to fix it.
The full failure is here: https://storage.googleapis.com/go-build-log/7ab81449/windows-amd64-gce_edd15a5c.log from my debug hacks in https://go-review.googlesource.com/#/c/9331/. And the relevant part:
|
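The chain of events described in the previous comment (a P verified to be idle, then borrowed during M allocation, then found to have work) can be sketched as a small deterministic model. Everything here is hypothetical and heavily simplified: allocM, startSpinningM, and the string-based run queue are illustrative stand-ins, not the runtime's functions, and the boolean flag replaces the "some chance of finishing the mark" condition.

```go
package main

import (
	"errors"
	"fmt"
)

// p is a toy P whose run queue is just a slice of goroutine names.
type p struct{ runq []string }

func (pp *p) empty() bool { return len(pp.runq) == 0 }

// errSpinning mirrors the fatal error text quoted in this issue.
var errSpinning = errors.New("spinning with local work")

// allocM stands in for allocating a new M. It borrows pp for the
// allocation. If finishesMark is true, the allocation's GC assist
// completes the mark phase and readies the GC goroutine onto pp.
func allocM(pp *p, finishesMark bool) {
	if finishesMark {
		pp.runq = append(pp.runq, "main GC goroutine")
	}
}

// startSpinningM models the sequence: check that the P is idle,
// allocate the M using that P, then have the new spinning M verify
// that its P has no local work.
func startSpinningM(pp *p, finishesMark bool) error {
	if !pp.empty() {
		return errors.New("P already has local work")
	}
	allocM(pp, finishesMark) // the window in which work can appear
	if !pp.empty() {
		return errSpinning
	}
	return nil
}

func main() {
	fmt.Println("allocation does not finish mark:", startSpinningM(&p{}, false))
	fmt.Println("allocation finishes mark:", startSpinningM(&p{}, true))
}
```

The point of the sketch is only that the emptiness check and the later assertion bracket a step that can itself enqueue work on the same P; it does not represent how the fix in the linked CL works.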
https://go-review.googlesource.com/#/c/9332/ @minux, since you seem to be able to reproduce this pretty reliably, would you mind giving that a whirl? |
I could reproduce reliably, and CL 9332 fixes it for me. |
I have run go test -short std 10 times with the patch, and none of the runs failed.
|
CL https://golang.org/cl/9332 mentions this issue. |
Several of the builders have started crashing sometimes with "fatal error: spinning with local work". Here's the earliest one: http://build.golang.org/log/ed94ee0fce2dfcdf7cfca23716438f7f19596db2 (linux-386-387 on 0e6a6c5). This is presumably the fault of e870f06, which changed the scheduler to hand off the remainder of the time slice to the most recently ready()d G.