runtime: timer self-deadlock due to preemption point #38070

@mknyszek

The problem was discovered in kubernetes/kubernetes#88638, where Kubernetes was seeing some kubelets get wedged. They grabbed a stack trace for me, and after combing over it I summarized the problem. Below are the highlights:

[...] There's a self-deadlock going on in the timer system.

  1. A goroutine does: timer.Reset -> resettimer -> modtimer -> wakeNetPoller
    • At this point, the timer it's resetting is in the timerModifying state.
  2. Now, at the beginning of wakeNetPoller, we get a preemption request (synchronous, so unrelated to asynchronous preemption), so we call into morestack.
  3. The chain now goes morestack -> gopreempt_m -> goschedImpl -> schedule -> checkTimers -> runtimer
  4. Now we try to run the timer we were modifying, but it's currently being modified, so we loop and osyield. It never stops being in timerModifying, though.

(kubernetes/kubernetes#88638 (comment))

[...] this can only happen if the timer is in the timerDeleted state and at the top of the heap, and indeed (*time.Timer).Reset does stop (delete) the timer before resetting it. So, this doesn't imply a racy use of timers on Kubernetes' end.

Looking at the state machine in time.go again, [...] it could only ever happen with the wakeNetPoller call, because in addInitializedTimer we grab a runtime lock around cleantimers and doaddtimer, which prevents preemption.

Maybe what happened is that we accidentally inserted a preemption point? What if wakeNetPoller was previously inlined into resettimer, but is not inlined in addInitializedTimer? That would mean we added a preemption point unintentionally.

One idea for the fix is to just... make sure wakeNetPoller gets inlined (and maybe https://go-review.googlesource.com/c/go/+/224902 actually does that), but that seems fragile. We should probably say that while a goroutine owns a timer in timerModifying, it must not be preempted, because preemption can lead to an operation that waits on that same timer (the same probably holds for the other -ing states).

(kubernetes/kubernetes#88638 (comment))
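The proposed rule (no preemption while a goroutine owns a timer in timerModifying) can be sketched with a toy model. Everything here is hypothetical: the `g` struct, its `noPreempt` counter, and the `guarded` flag are illustration-only names, not runtime fields, and this is not presented as the actual fix. The idea is simply that a preemption request arriving inside the guarded region is deferred until the timer has left timerModifying.

```go
package main

import "fmt"

type timerStatus int

const (
	timerWaiting timerStatus = iota
	timerModifying
)

type timer struct{ status timerStatus }

// g is a toy goroutine. While noPreempt > 0, preemption requests are
// recorded and deferred instead of entering the scheduler.
type g struct {
	noPreempt       int
	preemptDeferred bool
}

// preemptionPoint reports whether entering the scheduler right here would
// wedge, i.e. whether the scheduler would try to run a timer that this
// goroutine still holds in timerModifying.
func (gp *g) preemptionPoint(t *timer) bool {
	if gp.noPreempt > 0 {
		gp.preemptDeferred = true // handled later, once it is safe
		return false
	}
	return t.status == timerModifying
}

// modtimer models the modifying path with a preemption point in the middle.
// guarded selects whether the timerModifying region disables preemption.
func modtimer(gp *g, t *timer, guarded bool) (wedged bool) {
	if guarded {
		gp.noPreempt++
	}
	t.status = timerModifying
	wedged = gp.preemptionPoint(t)
	t.status = timerWaiting
	if guarded {
		gp.noPreempt--
	}
	return wedged
}

func main() {
	fmt.Println("unguarded wedged:", modtimer(&g{}, &timer{}, false))
	fmt.Println("guarded wedged:", modtimer(&g{}, &timer{}, true))
}
```

Unlike relying on inlining, this makes the invariant explicit at the place where the timer state changes, so it does not depend on compiler decisions.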

CC @ianlancetaylor

Labels: FrozenDueToAge, NeedsFix (the path to resolution is known, but the work has not been done)