When troubleshooting the CPU usage of production services, it would be useful to have an option, at least in the flamegraph/dot visualizations, to proportionally assign the CPU time spent in GC to the frames that cause heap allocations.
Currently, the way I do this is to take both CPU and memory allocation profiles and then "mentally account" for GC CPU time proportionally, based on which frames actually cause allocations.
If pprof were to offer this option, the proportional assignment would not need to be very precise: for my purposes, it would be fine even if it were based on an estimate of the amount of memory allocated.
Similarly, when this option is enabled it would not be necessary to show how time is spent inside the runtime. The rationale is that when I'm tracking down excessive GC CPU usage, I normally don't expect to be hitting a GC bug, but rather excessive allocations in my code. I suspect this would also make the option much easier to implement.
The way I imagine this working in the flamegraph is an additional "virtual" single-level stack frame as a child of each frame that performs heap allocations; the virtual frame would be called something like "GC (estimated)". In the graphviz output there would be a single virtual GC node, with its CPU time proportionally assigned to the incoming edges from the frames that allocate.
I don't have strong opinions about whether GC assist time (if any) should be included in the virtual GC stack frame or kept separate.
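A minimal sketch of the proposed attribution, to make the idea concrete. It is not pprof code: `AllocSample`, `attributeGC`, and the folded-stack output are hypothetical, and it assumes the per-stack byte counts come from something like the `alloc_space` sample type of a heap profile and the GC total from the CPU profile. Each allocating stack gets an extra virtual "GC (estimated)" leaf whose value is the GC CPU time scaled by that stack's share of allocated bytes.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// AllocSample is a hypothetical, simplified allocation-profile sample:
// a call stack (root first) and the bytes allocated by that stack.
type AllocSample struct {
	Stack []string
	Bytes int64
}

// gcFrame is the virtual frame name suggested in the proposal.
const gcFrame = "GC (estimated)"

// attributeGC spreads gcSeconds of GC CPU time over the allocating stacks
// in proportion to the bytes each stack allocated. It returns folded-stack
// lines ("a;b;c value") in which every allocating stack gains a virtual
// gcFrame leaf carrying its estimated share of GC time.
func attributeGC(gcSeconds float64, samples []AllocSample) []string {
	bytesByStack := make(map[string]int64)
	var total int64
	for _, s := range samples {
		key := strings.Join(s.Stack, ";")
		bytesByStack[key] += s.Bytes
		total += s.Bytes
	}
	if total == 0 {
		return nil // nothing allocated, nothing to attribute
	}
	lines := make([]string, 0, len(bytesByStack))
	for stack, b := range bytesByStack {
		share := gcSeconds * float64(b) / float64(total)
		lines = append(lines, fmt.Sprintf("%s;%s %.3f", stack, gcFrame, share))
	}
	sort.Strings(lines) // deterministic output order
	return lines
}

func main() {
	samples := []AllocSample{
		{Stack: []string{"main", "handle", "decode"}, Bytes: 200},
		{Stack: []string{"main", "handle", "render"}, Bytes: 100},
	}
	// Suppose the CPU profile attributed 3s to GC.
	for _, line := range attributeGC(3.0, samples) {
		fmt.Println(line)
	}
}
```

With 200 and 100 bytes allocated, `decode` gets 2s and `render` gets 1s of estimated GC time. For the graphviz view, the same per-frame share would be the weight of the edge from that frame into the single virtual GC node.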
I'm not sure I entirely understand the proposal. The costs of allocation are already attributed to the allocation sites (including assists, I believe?), but the costs of marking are proportional to retention, not allocation, even though marking is triggered by allocation. E.g., suppose there are two allocation sites, each responsible for half of allocated bytes, but one is always very short-lived and the other long-lived. All of the cost of marking comes from the long-lived allocations. On the other hand, if it weren't for the short-lived allocations, you'd be GC'ing half as often in this example. So what is the "right" attribution of costs in the example?
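The two-site example can be put into numbers with a toy model. This is only an illustration under stated assumptions: `Site`, `shareBy`, and `markCost` are hypothetical, and `markCost` assumes a GC cycle runs every `budget` allocated bytes and that each cycle's marking work equals the retained bytes, which is a simplification and not the Go runtime's actual pacer. It shows that splitting by allocation gives 50/50, splitting by retention gives 0/100, and yet removing the short-lived site halves the total marking cost.

```go
package main

import "fmt"

// Site is a hypothetical allocation site: bytes allocated and bytes still
// live (and therefore scanned by the marker) when a cycle runs.
type Site struct {
	Name      string
	Allocated float64
	Retained  float64
}

// shareBy returns each site's fraction of a cost split in proportion to key.
func shareBy(sites []Site, key func(Site) float64) map[string]float64 {
	var total float64
	for _, s := range sites {
		total += key(s)
	}
	out := make(map[string]float64, len(sites))
	for _, s := range sites {
		if total > 0 {
			out[s.Name] = key(s) / total
		}
	}
	return out
}

// markCost is a toy model (an assumption): one GC cycle per budget bytes
// allocated, each cycle marking all retained bytes.
func markCost(sites []Site, budget float64) float64 {
	var allocated, retained float64
	for _, s := range sites {
		allocated += s.Allocated
		retained += s.Retained
	}
	return (allocated / budget) * retained
}

func main() {
	// Each site allocates half the bytes; only longLived is still reachable.
	sites := []Site{
		{Name: "shortLived", Allocated: 50, Retained: 0},
		{Name: "longLived", Allocated: 50, Retained: 50},
	}
	byAlloc := shareBy(sites, func(s Site) float64 { return s.Allocated })
	byRet := shareBy(sites, func(s Site) float64 { return s.Retained })
	fmt.Printf("by allocation: short=%.2f long=%.2f\n", byAlloc["shortLived"], byAlloc["longLived"])
	fmt.Printf("by retention:  short=%.2f long=%.2f\n", byRet["shortLived"], byRet["longLived"])
	fmt.Printf("mark cost, both sites:    %.0f\n", markCost(sites, 50))
	fmt.Printf("mark cost, longLived only: %.0f\n", markCost(sites[1:], 50))
}
```

Neither split captures both effects at once: retention says the short-lived site costs nothing, while the counterfactual says it is responsible for half the marking work.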
It's more complicated than it seems, and not only implementation-wise: many factors influence GC, and allocation is just one of them. No matter how we try to associate GC time with one or two of those factors, the result is still an approximation. I'm afraid that would cause more confusion than simply reporting the CPU time as GC. I'd prefer having users look into the memory-related profiles. @CAFxX, what do you think?