perf: optimize pprof parsing in pull mode. #628
Conversation
Our profiling data suggested that a lot of time was spent in the binary search used to find locations and functions while writing scraped profiles. This is an attempt to improve performance by preprocessing the functions and locations into a lookup table. A benchmark is included to showcase the results with smaller and bigger profiles. As expected, there's no gain with small profiles (quite the opposite, since there's now an extra preprocessing step). On the other hand, there are big gains as profiles get bigger (2x for the biggest case). While it'd be possible to find some heuristic to disable the optimization below a certain size threshold, the absolute difference between small and big profiles is so large that I don't think it's worth it, at least as a first approach.
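To make the approach concrete, here is a minimal sketch of the two lookup strategies side by side. This is not the PR's actual code: the `Location` type, `findLocationBinary`, and `buildLocationIndex` are simplified stand-ins for the pprof proto types and helpers.

```go
package pprofindex

import "sort"

// Location is a simplified stand-in for the pprof proto message:
// every entry carries a unique, nonzero ID.
type Location struct {
	ID   uint64
	Line []int64 // simplified payload
}

// findLocationBinary is the old approach: a binary search over the
// ID-sorted slice on every single lookup, O(log n) per call.
func findLocationBinary(locs []*Location, id uint64) *Location {
	i := sort.Search(len(locs), func(i int) bool { return locs[i].ID >= id })
	if i < len(locs) && locs[i].ID == id {
		return locs[i]
	}
	return nil
}

// buildLocationIndex is the new approach: one O(n) preprocessing pass
// that trades up-front work (and memory) for O(1) lookups afterwards.
func buildLocationIndex(locs []*Location) map[uint64]*Location {
	idx := make(map[uint64]*Location, len(locs))
	for _, l := range locs {
		idx[l.ID] = l
	}
	return idx
}
```

The preprocessing pass is exactly why small profiles get slightly slower: the table has to be built even when only a handful of lookups follow.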
Codecov Report
```diff
@@            Coverage Diff             @@
##             main     #628      +/-   ##
==========================================
- Coverage   76.28%   75.76%   -0.51%
==========================================
  Files          43       45       +2
  Lines        1454     1588     +134
  Branches      284      292       +8
==========================================
+ Hits         1109     1203      +94
- Misses        313      356      +43
+ Partials       32       29       -3
```
Continue to review full report at Codecov.
Going to try this with #630
Ran it with #630 — it looks better, but now there's this mapaccess function that takes up a lot of time.
Yes, that's the tradeoff of using a map instead of the binary search. Looking deeper, the binary search already expects the identifiers to be consecutive numbers starting at 1, which means we can get rid of the map and use a slice instead. The results should be better now, even for smaller profiles. Using a map for caching (first change 4aaec20):
Using a slice for caching (second change 14eaf48):
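For illustration, a minimal sketch of the slice-based cache described above, reusing the simplified `Location` type from the earlier sketch (again, not the PR's actual code). When the IDs are exactly the consecutive numbers 1..n, ID i lives at slice index i-1, so each lookup is a bounds check plus an array index, with no hashing and no `runtime.mapaccess` in the hot path:

```go
// buildLocationSlice assumes the fast-path layout (IDs are exactly
// 1..n): ID i is stored at index i-1.
func buildLocationSlice(locs []*Location) []*Location {
	cache := make([]*Location, len(locs))
	for _, l := range locs {
		cache[l.ID-1] = l
	}
	return cache
}

// lookupLocation returns the location with the given ID, or nil if
// the ID is out of range. No hashing happens on this path.
func lookupLocation(cache []*Location, id uint64) *Location {
	if id == 0 || id > uint64(len(cache)) {
		return nil
	}
	return cache[id-1]
}
```

The map version pays the hashing cost on every lookup, which is exactly the `mapaccess` time visible in the profile; the slice version removes it at the price of assuming the ID layout.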
I have run the second change with #630 too, and it shows there's no downside now:
@abeaumont Awesome. I also looked at other metrics like memory usage and everything is looking great there as well. I'm going to finish #630, make it post one of those benchmarking summaries and then merge this one.
I have checked the pprof spec and, unfortunately, consecutive IDs are not guaranteed: https://github.com/google/pprof/blob/master/proto/profile.proto#L164-L165. I'll merge this with both the slice and map approaches, making the slice the fast path.
The pprof specification doesn't guarantee that IDs are consecutive. That general case is now supported, while still providing a fast path for the common case in which functions and locations have (sorted) consecutive IDs starting from 1.
I added generic support for any valid location/function ID layout, as specified in the pprof format, in 217066e, while still retaining the slice-based fast path. As suggested by @kolesnikovae, we don't even create a new slice now, since the IDs are usually already sorted (and we can sort them when they are not). I also added some tests, as the implementation got quite a bit more complex.
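A hedged sketch of what such a fast path could look like, reusing the simplified `Location` type and `buildLocationIndex` helper from the sketches above (this is illustrative only, not the actual implementation in 217066e): check once whether the IDs are sorted and consecutive starting at 1, and fall back to a map only when they are not.

```go
// idsAreConsecutive reports whether the locations follow the common
// layout: sorted, consecutive IDs starting at 1. The pprof spec only
// requires IDs to be unique and nonzero, so this must be checked
// rather than assumed.
func idsAreConsecutive(locs []*Location) bool {
	for i, l := range locs {
		if l.ID != uint64(i+1) {
			return false
		}
	}
	return true
}

// locationIndex keeps one of two representations: a dense slice for
// the common consecutive-ID case, or a map for everything else.
type locationIndex struct {
	fast []*Location          // ID i lives at index i-1
	slow map[uint64]*Location // sparse fallback
}

func newLocationIndex(locs []*Location) locationIndex {
	if idsAreConsecutive(locs) {
		// No new slice is allocated: the input already has the
		// dense layout, so it can be indexed directly.
		return locationIndex{fast: locs}
	}
	return locationIndex{slow: buildLocationIndex(locs)}
}

func (x locationIndex) get(id uint64) *Location {
	if x.fast != nil {
		if id == 0 || id > uint64(len(x.fast)) {
			return nil
		}
		return x.fast[id-1]
	}
	return x.slow[id]
}
```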