Skip to content

Commit

Permalink
runtime: make GODEBUG=dontfreezetheworld=1 safer
Browse files Browse the repository at this point in the history
GODEBUG=dontfreezetheworld=1 allows goroutines to continue execution
during fatal panic. This increases the chance that tracebackothers will
encounter running goroutines that it must skip, which is expected and
fine. However, it also introduces the risk that a goroutine transitions
from stopped to running in the middle of traceback, which is unsafe and
may cause traceback crashes.

Mitigate this by halting M execution if it naturally enters the
scheduler. This ensures that goroutines cannot transition from stopped
to running after freezetheworld. We simply deadlock rather than using
gcstopm to continue keeping disturbance to scheduler state to a minimum.

Change-Id: I9aa8d84abf038ae17142f34f4384e920b1490e81
Reviewed-on: https://go-review.googlesource.com/c/go/+/501255
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
  • Loading branch information
prattmic authored and gopherbot committed Jun 6, 2023
1 parent 1aaf1b2 commit 5b6e6d2
Show file tree
Hide file tree
Showing 3 changed files with 48 additions and 9 deletions.
13 changes: 7 additions & 6 deletions src/runtime/extern.go
Original file line number Diff line number Diff line change
Expand Up @@ -56,13 +56,14 @@ It is a comma-separated list of name=val pairs setting these named variables:
requires a rebuild), see https://pkg.go.dev/internal/goexperiment for details.
dontfreezetheworld: by default, the start of a fatal panic or throw
"freezes the world", stopping all goroutines, which makes it possible
to traceback all goroutines (running goroutines cannot be traced), and
"freezes the world", preempting all threads to stop all running
goroutines, which makes it possible to traceback all goroutines, and
keeps their state close to the point of panic. Setting
dontfreezetheworld=1 disables freeze, allowing goroutines to continue
executing during panic processing. This can be useful when debugging
the runtime scheduler, as freezetheworld perturbs scheduler state and
thus may hide problems.
dontfreezetheworld=1 disables this preemption, allowing goroutines to
continue executing during panic processing. Note that goroutines that
naturally enter the scheduler will still stop. This can be useful when
debugging the runtime scheduler, as freezetheworld perturbs scheduler
state and thus may hide problems.
efence: setting efence=1 causes the allocator to run in a mode
where each object is allocated on a unique page and addresses are
Expand Down
3 changes: 0 additions & 3 deletions src/runtime/panic.go
Original file line number Diff line number Diff line change
Expand Up @@ -1247,9 +1247,6 @@ func startpanic_m() bool {
if debug.schedtrace > 0 || debug.scheddetail > 0 {
schedtrace(true)
}
if debug.dontfreezetheworld > 0 {
return true
}
freezetheworld()
return true
case 1:
Expand Down
41 changes: 41 additions & 0 deletions src/runtime/proc.go
Original file line number Diff line number Diff line change
Expand Up @@ -921,6 +921,35 @@ var freezing atomic.Bool
// This function must not lock any mutexes.
func freezetheworld() {
freezing.Store(true)
if debug.dontfreezetheworld > 0 {
// Don't prempt Ps to stop goroutines. That will perturb
// scheduler state, making debugging more difficult. Instead,
// allow goroutines to continue execution.
//
// fatalpanic will tracebackothers to trace all goroutines. It
// is unsafe to trace a running goroutine, so tracebackothers
// will skip running goroutines. That is OK and expected, we
// expect users of dontfreezetheworld to use core files anyway.
//
// However, allowing the scheduler to continue running free
// introduces a race: a goroutine may be stopped when
// tracebackothers checks its status, and then start running
// later when we are in the middle of traceback, potentially
// causing a crash.
//
// To mitigate this, when an M naturally enters the scheduler,
// schedule checks if freezing is set and if so stops
// execution. This guarantees that while Gs can transition from
// running to stopped, they can never transition from stopped
// to running.
//
// The sleep here allows racing Ms that missed freezing and are
// about to run a G to complete the transition to running
// before we start traceback.
usleep(1000)
return
}

// stopwait and preemption requests can be lost
// due to races with concurrently executing threads,
// so try several times
Expand Down Expand Up @@ -3552,6 +3581,18 @@ top:

gp, inheritTime, tryWakeP := findRunnable() // blocks until work is available

if debug.dontfreezetheworld > 0 && freezing.Load() {
// See comment in freezetheworld. We don't want to perturb
// scheduler state, so we didn't gcstopm in findRunnable, but
// also don't want to allow new goroutines to run.
//
// Deadlock here rather than in the findRunnable loop so if
// findRunnable is stuck in a loop we don't perturb that
// either.
lock(&deadlock)
lock(&deadlock)
}

// This thread is going to run a goroutine and is not spinning anymore,
// so if it was marked as spinning we need to reset it now and potentially
// start a new spinning M.
Expand Down

0 comments on commit 5b6e6d2

Please sign in to comment.