proposal: runtime/pprof: cross system stack transitions in the heap profiler #66385

nsrip-dd opened this issue Mar 18, 2024 · 1 comment


Proposal Details


I propose that the heap profiler cross system stack transitions in tracebacks, to be consistent with the other profilers.

The user-visible changes would be:

  • For heap allocations on the system stack, we would see the user stack leading to the allocation
  • For the same allocations, we would no longer see the runtime frames (since the heap profiler hides them)


The runtime profilers are inconsistent in how they handle system stack transitions in tracebacks. Given a sequence of calls like this:

main.main                  <--+
                              +-- User portion
                              |
runtime.systemstack_switch <--+

runtime.systemstack        <--+
                              +-- System portion
runtime.interestingEvent      |
runtime.recordEvent        <--+

The profilers report a traceback like so:

  • The CPU profiler shows both the system and user portion of the traceback
  • The block and mutex profilers show the user portion of the traceback
    • The recently-added runtime lock profiling shows the system and user portions
  • The runtime execution tracer shows the user portion of the traceback
  • The heap profiler shows only the system portion of the traceback, when the sampled allocation happens on a system stack
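
To make the comparison above concrete, here is a minimal sketch that models each profiler as a pair of flags and prints what it would report for the example call sequence. The `policy` type, the `report` function, and the "heap (proposed)" entry are hypothetical names for illustration only; none of this is runtime code, and the proposed entry shows the visible result after the heap profiler hides runtime frames.

```go
package main

import "fmt"

// policy is a hypothetical description of which portions of a traceback a
// profiler reports when the event occurs on the system stack.
type policy struct {
	name   string
	user   bool
	system bool
}

// policies mirrors the list above, plus the proposed heap profiler behavior.
var policies = []policy{
	{"CPU", true, true},
	{"block/mutex", true, false},
	{"runtime lock", true, true},
	{"execution tracer", true, false},
	{"heap (current)", false, true},
	{"heap (proposed)", true, false},
}

// report returns the frames visible under p, ordered root first.
func report(p policy, user, system []string) []string {
	var out []string
	if p.user {
		out = append(out, user...)
	}
	if p.system {
		out = append(out, system...)
	}
	return out
}

func main() {
	user := []string{"main.main", "runtime.systemstack_switch"}
	system := []string{"runtime.systemstack", "runtime.interestingEvent", "runtime.recordEvent"}
	for _, p := range policies {
		fmt.Printf("%-18s %v\n", p.name, report(p, user, system))
	}
}
```

In this model, the current heap profiler is the only entry whose `user` flag is false, which is the inconsistency the proposal targets.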

As a rule of thumb, I think we want the entire sequence of calls leading up to the event of interest, possibly excluding implementation details at the end of the sequence. More often than not, the user portion of the traceback is the most informative part for a developer.

The heap profiler is the only one which won't show the user portion of the stack consistently. We see this in practice, for example, when starting a new goroutine requires allocating a new g. Today we'd see a traceback leading from runtime.systemstack to runtime.malg, but we wouldn't see the user portion of the call stack leading to the go statement. Note that under this proposal we wouldn't see the system stack frames after the go statement, because the heap profiler elides runtime frames from the end of tracebacks. (Source)

This is partly motivated by the effort to use frame pointer unwinding for more of the runtime profilers. Naive frame pointer unwinding isn't going to know whether or not it's crossing the systemstack transition. Either crossing the transition or capturing only the user portion of the call stack would be much more straightforward to match with frame pointer unwinding than capturing only the system portion.
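
The point about naive unwinding can be illustrated with a toy model. This sketch assumes, as the paragraph above implies, that the chain of saved frame pointers continues from the system stack into the user stack. The `frame` type and both unwinders are hypothetical and do not reflect the runtime's actual data structures.

```go
package main

import "fmt"

// frame is a toy stand-in for a stack frame. parent plays the role of the
// saved frame pointer. onSystemStack is bookkeeping that a naive unwinder
// never consults.
type frame struct {
	fn            string
	onSystemStack bool
	parent        *frame
}

// build links the frames, given root first, and returns the leaf frame.
func build(user, system []string) *frame {
	var cur *frame
	for _, fn := range user {
		cur = &frame{fn: fn, parent: cur}
	}
	for _, fn := range system {
		cur = &frame{fn: fn, onSystemStack: true, parent: cur}
	}
	return cur
}

// naiveUnwind follows saved frame pointers until the chain ends. It crosses
// the stack transition without noticing it.
func naiveUnwind(leaf *frame) []string {
	var out []string
	for f := leaf; f != nil; f = f.parent {
		out = append(out, f.fn)
	}
	return out
}

// systemOnlyUnwind reproduces the current heap profiler result, which needs
// an extra check on every step to detect the transition.
func systemOnlyUnwind(leaf *frame) []string {
	var out []string
	for f := leaf; f != nil && f.onSystemStack; f = f.parent {
		out = append(out, f.fn)
	}
	return out
}

func main() {
	leaf := build(
		[]string{"main.main", "runtime.systemstack_switch"},
		[]string{"runtime.systemstack", "runtime.interestingEvent", "runtime.recordEvent"},
	)
	fmt.Println("naive:      ", naiveUnwind(leaf))
	fmt.Println("system only:", systemOnlyUnwind(leaf))
}
```

In this model, stopping at the transition is the variant that needs extra information at every step, which is why matching the current heap profiler output is the awkward case for frame pointer unwinding.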

cc @golang/runtime @prattmic

gopherbot added this to the Proposal milestone on Mar 18, 2024

Change mentions this issue: runtime: use frame pointer unwinding for the heap profiler
