cmd/pprof: proportionally account CPU time spent in GC to the allocating frames #32222

Open
CAFxX opened this issue May 24, 2019 · 4 comments

Comments

@CAFxX
Contributor

commented May 24, 2019

When troubleshooting CPU usage of production services it would be useful to have an option, at least in the flamegraph/dot visualization, to proportionally assign the CPU time spent in GC to the frames that cause heap allocations.

Currently, I take both CPU and memory allocation profiles and then "mentally account" for GC CPU time proportionally, based on which frames are actually causing allocations.

If pprof were to offer this option, such a proportional assignment would not have to be extremely precise: for my purposes, it would be OK even if it was based on an estimation of the amount of memory that is allocated.

Similarly, it would not be required to show how time is spent inside the runtime when this option is enabled. The rationale for this is that when I'm tracking down excessive GC CPU usage, I normally don't expect to be hitting a GC bug, but rather excessive allocations in my code. Also, I guess, it would make the implementation of this option much easier.

The way I imagine this could work in the flamegraph would be by having an additional "virtual" single-level stack frame as a child of each one of the frames that are performing heap allocations; the virtual stack frame would be called something like "GC (estimated)". In the graphviz output there would be a single virtual GC node, with its CPU time proportionally assigned to the incoming edges coming from frames that allocate.

I don't have strong opinions about whether GC assist time (if any) should be included in the virtual GC stack-frame, or kept separate.

@bcmills

Member

commented May 24, 2019

@bcmills added this to the Unplanned milestone on May 24, 2019

@hyangah

Contributor

commented May 28, 2019

This needs changes in the runtime's logic around GC and CPU profiling, and changes in the pprof tool to handle the extra information. @aclements, what do you think?

@aclements

Member

commented May 30, 2019

I'm not sure I entirely understand the proposal. The costs of allocation are already attributed to the allocation sites (including assists, I believe?), but the costs of marking are proportional to retention, not allocation, even though marking is triggered by allocation. E.g., suppose there are two allocation sites, each responsible for half of allocated bytes, but one is always very short-lived and the other long-lived. All of the cost of marking comes from the long-lived allocations. On the other hand, if it weren't for the short-lived allocations, you'd be GC'ing half as often in this example. So what is the "right" attribution of costs in the example?

@hyangah

Contributor

commented May 31, 2019

@aclements good point.

It's more complicated than it seems, and not only implementation-wise: many factors influence GC, and allocation is just one of them. No matter how we try to associate GC time with one or two of those factors, the result is still an approximation. I am afraid that would cause more confusion than reporting the CPU time as GC. I'd prefer having users look into memory-related profiles. @CAFxX what do you think?
