runtime: timer self-deadlock due to preemption point #38070
Comments
@gopherbot Please open a backport to 1.14.
Backport issue(s) opened: #38072 (for 1.14). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.
Change https://golang.org/cl/225497 mentions this issue.
Does this deadlock account for the test failures observed in #37894?
(Looks like probably not.)
@bcmills It looks to me like those test failures are in fact this bug.
… timerModifying

Currently, if a goroutine is preempted while owning a timer in the timerModifying state, it could self-deadlock. When the goroutine is preempted and calls into the scheduler, it could call checkTimers. If checkTimers encounters the timerModifying timer and calls runtimer on it, then runtimer will spin, waiting for that timer to leave the timerModifying state, which it never will.

So far we have been lucky that, for the most part, there were no preemption points while a timer is in timerModifying. However, CL 221077 seems to have introduced one, leading to sporadic self-deadlocks.

This change explicitly disables preemption while a goroutine holds a timer in timerModifying. Since only checkTimers (and thus runtimer) is called from the scheduler, this is sufficient to prevent preemption-based self-deadlocks.

For #38070
Fixes #38072

Change-Id: Idbfac310889c92773023733ff7e2ff87e9896f0c
Reviewed-on: https://go-review.googlesource.com/c/go/+/225497
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
(cherry picked from commit e8be350)
Reviewed-on: https://go-review.googlesource.com/c/go/+/225521
Run-TryBot: Ian Lance Taylor <iant@golang.org>
@mknyszek, thanks for debugging it and fixing it so quickly.
I think I've hit the same bug in a long-running server with many thousands of goroutines. Early this morning, the process froze for good: even a goroutine that was simply logging some trivial stats every minute or so stopped responding.

One of the runnable goroutines is at

@mknyszek do you think it's the same bug, or should I file a new one?

I should note that this server works fine with Go 1.13.x, but started hanging like this after we jumped to 1.14.x. I had been testing it with 1.14.x for weeks before, but we hadn't started running the long-lived staging servers with 1.14.x until earlier this week.
Go 1.14 has a timer reset deadlock (golang/go#38070). This also downgrades quic-go until either a go patch release fixes this issue or a version of quic-go is released that works with go 1.13.
Go 1.14 currently has a bug that is being exposed by our latest deps (golang/go#38070). Resolves #60.
golang/go#38070 was resolved
The problem was discovered in kubernetes/kubernetes#88638, where Kubernetes was seeing some kubelets get wedged. They grabbed a stack trace for me, and after combing over it I summarized the problem. Below are the highlights:
(kubernetes/kubernetes#88638 (comment))
(kubernetes/kubernetes#88638 (comment))
CC @ianlancetaylor