Fix truncated Go CPU profiles #3344

Merged
merged 4 commits into from
Jun 11, 2024


Contributor

@simonswine simonswine commented Jun 6, 2024

This extends the truncation fixing of heap profiles to also cover CPU profiles.

I am also adding a mimir-querier profile, but it does not seem to be fixable, as it doesn't contain the connecting stacks.
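
To illustrate why the connecting stacks matter, here is a minimal Go sketch of overlap-based repair. It is not the code in pkg/pprof/fix_go_truncated.go: the `Stack` type, the `repair` and `sameFrames` functions, and the `overlap` parameter are hypothetical names chosen for illustration, and frames are plain strings rather than pprof locations. The idea is that a truncated stack can only be extended when some other stack contains its root-most frames and continues past them toward the root.

```go
package main

// Stack lists frames from leaf (index 0) to root (last index).
// Hypothetical type; a real profile stores location IDs, not names.
type Stack []string

// sameFrames reports whether a and b hold the same frames in the same order.
func sameFrames(a, b Stack) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// repair tries to extend the truncated stack s toward the root.
// A donor qualifies when the last `overlap` frames of s occur contiguously
// inside it and the donor has at least one more frame after that run.
// Those extra frames are the "connecting" part that gets appended to s.
func repair(s Stack, donors []Stack, overlap int) (Stack, bool) {
	if overlap <= 0 || len(s) < overlap {
		return s, false
	}
	suffix := s[len(s)-overlap:]
	for _, d := range donors {
		// The bound i+overlap < len(d) guarantees the donor continues
		// past the matched run, so there is something to copy.
		for i := 0; i+overlap < len(d); i++ {
			if sameFrames(d[i:i+overlap], suffix) {
				out := append(Stack{}, s...)
				return append(out, d[i+overlap:]...), true
			}
		}
	}
	// No donor connects s to the root: the stack stays as it is.
	return s, false
}
```

With a donor `[c d e main]`, the truncated stack `[a b c d]`, and `overlap = 2`, `repair` returns `[a b c d e main]`. A stack whose last frames appear in no donor is returned unchanged, which is the situation described for the mimir-querier profile.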

image

@simonswine simonswine changed the title Fix truncated CPU profiles WIP: Fix truncated CPU profiles Jun 6, 2024
@simonswine simonswine force-pushed the 20240606_fix-truncated-pprofs branch from ea435bc to cfc7a8e Compare June 6, 2024 17:00
Three outdated, resolved review threads on pkg/pprof/fix_go_truncated.go
@kolesnikovae
Collaborator

kolesnikovae commented Jun 7, 2024

Found a good sample – CloudWatch exporter. AWS Smithy Go code has always been very "deep" because of the countless middlewares, callbacks, hooks, etc.

image

flamegraph_2024-06-07_1220-to-2024-06-07_1320.pb.gz


RE: connecting / overlapping frames – the easiest way to examine truncated stack traces I found is to switch to the sandwich view, where the "ladder" at the top indicates presence of the issue (this is mimir querier in the screenshot, btw)

image

@kolesnikovae
Collaborator

kolesnikovae commented Jun 7, 2024

There's one thing that may require an adjustment: a very conservative recursion check that may cause a situation where part of the profiles are not repaired, which makes the whole idea somewhat ineffective

if j == tokenLen {
	// Profiles with deeply recursive stack traces are ignored.
	return
}

I suggest that we double it regardless of the profile/sample type to relax the restriction.

If it doesn't help and CPU stack traces are still not repaired, we may need to play with constants – token size, suffix length, overlap size, etc.
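
The trade-off behind such a limit can be sketched as a bounded search. This is a hedged illustration only, not the loop around `j == tokenLen` in the actual code: `findShift`, `matches`, and `limit` are hypothetical names, and stacks are modeled as slices of location IDs. The sketch shows that a small limit caps the work per stack but misses matches that sit deeper than the limit, while a larger limit finds them at a higher CPU cost.

```go
package main

// matches reports whether window and token hold the same IDs in order.
func matches(window, token []uint64) bool {
	if len(window) != len(token) {
		return false
	}
	for i := range window {
		if window[i] != token[i] {
			return false
		}
	}
	return true
}

// findShift looks for token inside stack, trying start offsets
// 0, 1, ..., limit-1. It returns the first matching offset, or -1 when
// no match is found within the limit (or within the stack itself).
// The limit bounds the number of comparisons per stack.
func findShift(stack, token []uint64, limit int) int {
	for j := 0; j < limit && j+len(token) <= len(stack); j++ {
		if matches(stack[j:j+len(token)], token) {
			return j
		}
	}
	return -1
}
```

For a stack that starts with 40 recursive frames followed by the token, `findShift` with a limit of 32 gives up and returns -1, whereas a limit of 64 finds the token at offset 40. The numbers here are made up for the example and are not the constants used in the pull request.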

@simonswine simonswine force-pushed the 20240606_fix-truncated-pprofs branch from cfc7a8e to 1a2f73b Compare June 7, 2024 17:20
@simonswine
Contributor Author

> RE: connecting / overlapping frames – the easiest way to examine truncated stack traces I found is to switch to the sandwich view, where the "ladder" at the top indicates presence of the issue (this is mimir querier in the screenshot, btw)

My strategy is to use pprof's new sandwich view; it just takes a bit more effort than doing it in Pyroscope itself.

> There's one thing that may require an adjustment: a very conservative recursion check that may cause a situation where part of the profiles are not repaired, which makes the whole idea somewhat ineffective
>
> if j == tokenLen {
> 	// Profiles with deeply recursive stack traces are ignored.
> 	return
> }

I only start to see a difference in the cloud-watch-exporter profiles with fairly high values, 56 or higher.

@kolesnikovae
Collaborator

> I only start to see a difference in the cloud-watch-exporter profiles with fairly high values, 56 or higher.

Yeah, this is more like a safety check. I also haven't seen too many examples. I'm suggesting increasing it because I see that the repair does not happen in some cases where it should have worked, and this limit is the only meaningful explanation I can find. Actually, I added this limit after testing it in one of our internal deployments, due to the observed CPU burn.

@simonswine simonswine force-pushed the 20240606_fix-truncated-pprofs branch from 03ae168 to 041574a Compare June 11, 2024 13:16
@simonswine simonswine changed the title WIP: Fix truncated CPU profiles Fix truncated Go CPU profiles Jun 11, 2024
@simonswine simonswine marked this pull request as ready for review June 11, 2024 13:18
@simonswine simonswine requested a review from a team as a code owner June 11, 2024 13:18
Collaborator

@kolesnikovae kolesnikovae left a comment


LGTM 🚀

@simonswine simonswine merged commit 30af212 into grafana:main Jun 11, 2024
16 checks passed