Skip to content

runtime/pprof: crash "cannot read stack of running goroutine" in goroutine profile #74090

Closed
@mknyszek

Description

@mknyszek

This crash was seen internally at Google. The root cause is a bug in https://go.dev/cl/650697 and related to handling of system goroutines (specifically weird goroutines like finalizer and cleanup goroutines that may sometimes be system and sometimes user goroutines) in goroutine profiles. Fix incoming.

Full description:

Before CL 650697, there was only one system goroutine that could dynamically change between being a user goroutine and a system goroutine, and that was the finalizer/cleanup goroutine. In goroutine profiles, it was handled explicitly. It's status would be checked during the first STW, and its stack would be recorded. This let the goroutine profiler completely ignore system goroutines once the world was started again.

CL 650697 added dedicated cleanup goroutines (there may be more than one), and with this, the logic for finalizer goroutines no longer scaled. In that CL, I let the isSystemGoroutine check be dynamic and dropped the special case, but this was based on incorrect assumptions. Namely, it's possible for the scheduler to observe, for example, the finalizer goroutine as a system goroutine and ignore it, but then later the goroutine profiler itself sees it as a user goroutine. At that point it's too late and already running. This violates the invariant of the goroutine profile that all goroutines are handled by the profiler before they start executing. In practice, the result is that the goroutine profiler can crash when it checks this invariant (not checking the invariant means racily reading goroutine stack memory).

The root cause of the problem is that these system goroutines do not participate in the goroutine profiler's state machine. Normally, when profiling, goroutines transition from 'absent' to 'in-progress' to 'satisfied'. However with system goroutines, the state machine is ignored entirely. They always stay in the 'absent' state. This means that if a goroutine transitions from system to user, it is eligible for a profile record when it shouldn't be. That transition shouldn't be allowed to occur with respect to the goroutine profiler, because the goroutine profiler is trying to snapshot the state of every goroutine.

Metadata

Metadata

Assignees

Labels

BugReportIssues describing a possible bug in the Go implementation.NeedsFixThe path to resolution is known, but the work has not been done.compiler/runtimeIssues related to the Go compiler and/or runtime.release-blocker

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions