-
Notifications
You must be signed in to change notification settings - Fork 610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix truncated Go CPU profiles #3344
Fix truncated Go CPU profiles #3344
Conversation
ea435bc
to
cfc7a8e
Compare
Found a good sample – CloudWatch exporter. AWS Smithy Go code has always been very "deep" because of the countless middlewares, callbacks, hooks, etc. flamegraph_2024-06-07_1220-to-2024-06-07_1320.pb.gz RE: connecting / overlapping frames – the easiest way to examine truncated stack traces I found is to switch to the sandwich view, where the "ladder" at the top indicates presence of the issue (this is mimir querier in the screenshot, btw) |
There's one thing that may require an adjustment: a very conservative recursion check that may cause a situation where part of the profiles are not repaired, which makes the whole idea somewhat ineffective pyroscope/pkg/pprof/fix_go_heap_truncated.go Lines 171 to 174 in 4837be6
I suggest that we double it regardless of the profile/sample type to relax the restriction. If it doesn't help and CPU stack traces are still not repaired, we may need to play with constants – token size, suffix length, overlap size, etc. |
cfc7a8e
to
1a2f73b
Compare
My strategy is using pprof's new sandwich view, it just takes a bit more than doing it in Pyroscope itself.
pyroscope/pkg/pprof/fix_go_heap_truncated.go Lines 171 to 174 in 4837be6
The first time I see a difference in cloud-watch-exporter's profiles is if I am using values fairly high 56 or higher. |
Yeah, this is more like a safety check. I also haven't seen too many examples. I'm suggesting increasing it because I see that the repair does not happen in some cases where it should have worked; and this limit is the only meaningful explanation I can find. Actually, I added this limit after testing it in one of our internal deployments, due to the observed CPU burn |
Not quite sure where the requirement for minGroupSize 2 comes from
03ae168
to
041574a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🚀
This extends the truncation fixing of heap profiles to also cover CPU profiles.
I am also adding an mimir-querier profile, but that seems to be not fixable, as it is doens't contain the connecting stacks.