cmd/pprof: account CPU and allocations of goroutines to the frame where they are created #32223

Open
CAFxX opened this issue May 24, 2019 · 6 comments

@CAFxX
Contributor

commented May 24, 2019

When troubleshooting CPU usage of production services it would be useful to have an option, at least in the flamegraph visualization, to account the CPU time and memory allocations of a goroutine to the frame that created the goroutine.

Currently, the way I do this is to take a CPU or memory profile and then go through the code to reconstruct where the goroutines were created, so that I can identify the full stack trace that led to the excessive CPU or memory usage.

The way I imagine this could work in the flamegraph would be by considering stack traces to include not just the stack of the goroutine, but also the transitive stacks of the goroutines that created the current goroutine (up to a maximum limit that, if reached, would cause the option to be disabled).

Currently, AFAIK, this would be hard to do as described, since we only record the PC of where the goroutine is created. I am not knowledgeable enough to know whether there are other ways to do (now, or in the future) what I described above; if such a way existed it would make profiling much more effective and easier to use when dealing with large codebases that are go-happy.

@CAFxX CAFxX changed the title cmd/pprof: proportionally account CPU and allocations of goroutines to frame where they are created cmd/pprof: account CPU and allocations of goroutines to the frame where they are created May 24, 2019

@bcmills

Member

commented May 24, 2019

“The frame that created the goroutine” is not always the right one for accounting. Every goroutine traces back to either main or init, so if you apply that transitively then it becomes useless, and if it isn't transitive then it becomes very confusing.

That said, we could probably do some sort of attribution using runtime/trace regions, if we don't already (CC @hyangah @dvyukov).

@CAFxX

Contributor Author

commented May 24, 2019

Every goroutine traces back to either main or init, so if you apply that transitively then it becomes useless

Can you elaborate on why you think that would be useless? That's exactly what I would like to have: if a path through the call graph rooted in main created a goroutine, I would like the CPU time (or memory) consumed by that goroutine be accounted to main, as if it was a normal child "routine" instead of a goroutine.

Consider that right now all CPU/mem (including the ones used by goroutines) is already accounted as a child of an "implicit" root node... this, in my mind, doesn't make the current visualization useless.

@hyangah

Contributor

commented May 28, 2019

That's exactly what I would like to have: if a path through the call graph rooted in main created a goroutine, I would like the CPU time (or memory) consumed by that goroutine be accounted to main, as if it was a normal child "routine" instead of a goroutine.

Most user-created goroutines will be rooted in init or main, and it's hard for me to imagine the usefulness of such an analysis. @CAFxX do you have a specific example demonstrating that such profiling and analysis was useful?

How about other tools, such as tagging the CPU profile with runtime/pprof.Label, or some static code analysis tools?

Keeping track of every goroutine's creation call stack may not be cheap, and we need to balance the profiling cost against the usefulness of the profile.

@CAFxX

Contributor Author

commented May 29, 2019

Most user-created goroutines will be rooted in init or main, and it's hard for me to imagine the usefulness of such an analysis.

Agreed that most goroutines will be rooted in either init or main, but I already addressed that:

Consider that right now all CPU/mem (including the ones used by goroutines) is already accounted as a child of an "implicit" root node... this, in my mind, doesn't make the current visualization useless.

The reason for the current choice is obviously technical (it's cheaper to root everything in an implicit root node than to keep track of the full stacks of the transitively-spawning goroutines). But if the argument is that rooting everything in a small subset of roots makes no sense, that argument does not seem very compelling to me, as that's what we already do (by rooting everything in a single, arbitrary, implicit root).

@CAFxX do you have a specific example that demonstrates such profiling and analysis was useful?

Sure. Consider some sort of server that idiomatically spawns one goroutine for each request, and that independently does some background processing.

Without the proposed visualization, there is no intuitive way to account the resources consumed by the goroutines to the server part (vs. the background processing part).

You may argue that in such a simple case you would easily see that the resources consumed by the goroutine can only belong to the server. The obvious counterpoint is that it's not always easy, in the real world:

  • you can have multiple listeners (e.g. for gRPC and HTTP, or gRPC and pubsub), that (after API adaptation) spawn goroutines running the same code: in this case it's impossible to know how to split the resource consumption between the listeners
  • the request goroutine can itself spawn one or more goroutines, e.g. to send parallel subrequests, or to handle timeouts, or to do async on-demand processing: in this case you would have multiple goroutine stacks all rooted on the implicit root node, with no easy way to reconstruct the call graph that triggered the resource usage

There are many more potential scenarios: the two above are things I actually struggle daily with.

@hyangah

Contributor

commented May 29, 2019

Sure. Consider some sort of server that idiomatically spawns one goroutine for each request, and that independently does some background processing.

Without the proposed visualization, there is no intuitive way to account the resources consumed by the goroutines to the server part (vs. the background processing part).

You may argue that in such a simple case you would easily see that the resources consumed by the goroutine can only belong to the server. The obvious counterpoint is that it's not always easy, in the real world:

you can have multiple listeners (e.g. for gRPC and HTTP, or gRPC and pubsub), that (after API adaptation) spawn goroutines running the same code: in this case it's impossible to know how to split the resource consumption between the listeners
the request goroutine can itself spawn one or more goroutines, e.g. to send parallel subrequests, or to handle timeouts, or to do async on-demand processing: in this case you would have multiple goroutine stacks all rooted on the implicit root node, with no easy way to reconstruct the call graph that triggered the resource usage
There are many more potential scenarios: the two above are things I actually struggle daily with.

That is exactly what runtime/pprof.Labels and the related APIs were designed for. That requires explicit labeling, but it provides more flexibility than classifying profiles based on which frame created the goroutine. Labels are also propagated to child goroutines. There are blog posts and tutorials on the web (https://rakyll.org/profiler-labels/, etc.). Tracing and profiling libraries such as OpenCensus support the labels, which opens up the possibility of profiling across distributed processes. The pprof tool offers options to filter and focus based on the labels (in pprof terminology they are called tags; see options such as -tagfocus and -taghide).

Currently only CPU profiles support labels and #23458 is a tracking issue to expand the label support to memory allocation profiles.

@gopherbot


commented Aug 7, 2019

Change https://golang.org/cl/189317 mentions this issue: runtime/pprof: Mention goroutine label heritability

gopherbot pushed a commit that referenced this issue Aug 7, 2019
runtime/pprof: Mention goroutine label heritability
Document goroutine label inheritance. Goroutine labels are copied upon
goroutine creation and there is a test enforcing this, but it was not
mentioned in the docstrings for `Do` or `SetGoroutineLabels`.

Add notes to both of those functions' docstrings so it's clear that one
does not need to set labels as soon as a new goroutine is spawned if
they want to propagate tags.

Updates #32223
Updates #23458

Change-Id: Idfa33031af0104b884b03ca855ac82b98500c8b4
Reviewed-on: https://go-review.googlesource.com/c/go/+/189317
Reviewed-by: Ian Lance Taylor <iant@golang.org>