
perf: optimize pprof parsing in pull mode. #628

Merged
merged 4 commits into main from perf/optimize-pprof-scrapping on Dec 22, 2021

Conversation

@abeaumont (Contributor) commented Dec 17, 2021

Our profiling data suggested that a lot of time was invested in the
binary search to find locations and functions while writing scraped
profiles. This is an attempt to improve the performance by
preprocessing the functions and locations and putting them into a
table.

A benchmark is included to showcase the results with smaller and
bigger profiles. As expected, there's no gain with small profiles;
quite the opposite, since there's now an extra preprocessing step.
On the other hand, the gains grow as profiles get bigger (2x for
the biggest case).

While it'd be possible to find a heuristic that disables the
optimization below a certain size threshold, the absolute time
difference between small and big profiles is so large that I don't
think it's worth it, at least as a first approach.
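
For illustration, here's a minimal Go sketch of the idea; the type and function names are hypothetical stand-ins, not the actual pyroscope code. Instead of binary-searching the profile's ID-sorted location slice on every sample, one preprocessing pass builds an ID-keyed table so each lookup is O(1):

```go
package pprofcache

// Location is a simplified stand-in for the pprof Location message
// (hypothetical; the real message carries more fields). The same
// idea applies to Function.
type Location struct {
	ID uint64
	// line/function details elided
}

// Before: binary search over the ID-sorted slice, O(log n) per lookup.
func findLocation(locs []*Location, id uint64) *Location {
	lo, hi := 0, len(locs)
	for lo < hi {
		mid := (lo + hi) / 2
		if locs[mid].ID < id {
			lo = mid + 1
		} else {
			hi = mid
		}
	}
	if lo < len(locs) && locs[lo].ID == id {
		return locs[lo]
	}
	return nil
}

// After: one O(n) preprocessing pass into an ID-keyed map, so each
// subsequent lookup while writing scraped samples is O(1).
func buildLocationTable(locs []*Location) map[uint64]*Location {
	table := make(map[uint64]*Location, len(locs))
	for _, l := range locs {
		table[l.ID] = l
	}
	return table
}
```

The map costs one allocation and a full pass up front, which is why small profiles get slightly slower while large ones get much faster.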

github-actions bot commented Dec 17, 2021

size-limit report 📦

| Path | Size | Loading time (3g) | Running time (snapdragon) | Total time |
|---|---|---|---|---|
| webapp/public/assets/app.js | 291.62 KB (0%) | 5.9 s (0%) | 209 ms (+99.84% 🔺) | 6.1 s |

codecov bot commented Dec 17, 2021

Codecov Report

Merging #628 (217066e) into main (fc05077) will decrease coverage by 0.52%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #628      +/-   ##
==========================================
- Coverage   76.28%   75.76%   -0.51%     
==========================================
  Files          43       45       +2     
  Lines        1454     1588     +134     
  Branches      284      292       +8     
==========================================
+ Hits         1109     1203      +94     
- Misses        313      356      +43     
+ Partials       32       29       -3     
| Impacted Files | Coverage Δ |
|---|---|
| ...ents/FlameGraph/FlameGraphComponent/DiffLegend.tsx | 87.88% <0.00%> (-12.12%) ⬇️ |
| webapp/javascript/redux/actionTypes.js | 100.00% <0.00%> (ø) |
| webapp/javascript/components/CustomDatePicker.jsx | |
| webapp/javascript/components/CustomDatePicker.tsx | 25.93% <0.00%> (ø) |
| webapp/javascript/ui/Notifications.tsx | 60.00% <0.00%> (ø) |
| webapp/javascript/redux/reducers/notifications.ts | 37.50% <0.00%> (ø) |
| webapp/javascript/redux/actions.js | 21.24% <0.00%> (+1.89%) ⬆️ |
| webapp/javascript/util/formatDate.js | 100.00% <0.00%> (+96.30%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fc05077...217066e.

pyroscopebot (Collaborator) commented Dec 17, 2021

Parameters

| Name | Value |
|---|---|
| BENCH_RUN_FOR | 10m |
| PYROBENCH_RAND_SEED | 2306912 |
| PYROBENCH_PROFILE_WIDTH | 20 |
| PYROBENCH_PROFILE_DEPTH | 20 |
| PYROBENCH_PROFILE_SYMBOL_LENGTH | 30 |
| PYROBENCH_APPS | 20 |
| PYROBENCH_CLIENTS | 20 |
| PYROBENCH_REQUESTS | 10000 |

Result

| | main | pr | diff | threshold |
|---|---|---|---|---|
| throughput | 147.30 | 147.32 | 0.02 (0.02%) | 5% |
| total items processed | 121200.00 | 121707.00 | 507.00 (0.42%) | 5% |

| Name | Query for main | Query for pr |
|---|---|---|
| throughput | rate(pyroscope_http_request_duration_seconds_count{handler="/ingest", instance="pyroscope:4040"}[5m]) | rate(pyroscope_http_request_duration_seconds_count{handler="/ingest", instance="pyroscope_main:4040"}[5m]) |
| total items processed | pyroscope_http_request_duration_seconds_count{handler="/ingest", instance="pyroscope:4040"} | pyroscope_http_request_duration_seconds_count{handler="/ingest", instance="pyroscope_main:4040"} |

Screenshots

[Screenshots: Throughput, Disk Usage, Memory, Upload Errors (Total), Successful Uploads (Total), CPU Utilization]

Generated by 🚫 dangerJS against 5e1dbdd

@petethepig (Member)

Going to try this with #630

@petethepig (Member)

Ran it with #630 — it looks better, but now there's this mapaccess function that takes up a lot of time.
[Screenshot: Screen Shot 2021-12-17 at 5:36:28 PM]

@abeaumont (Contributor, Author)

Yes, that's the tradeoff of using a map instead of the binary search. Looking deeper, the binary search already expects the identifiers to be consecutive numbers starting at 1, which means we can get rid of the map and use a slice instead. The results should now be better even for smaller profiles.

Using a map for caching (first change 4aaec20):

BenchmarkProfile_ParseNoCache/Locations:_29,_functions_28-16            248778     4098 ns/op
BenchmarkProfile_ParseWithCache/Locations:_29,_functions_28-16          170401     6764 ns/op
BenchmarkProfile_ParseNoCache_Big/Locations:_1145,_functions_1145-16       120  9858162 ns/op
BenchmarkProfile_ParseWithCache_Big/Locations_1145,_functions_1145-16      216  5477587 ns/op

Using a slice for caching (second change 14eaf48):

BenchmarkProfile_ParseNoCache/Locations:_29,_functions_28-16            290997     4151 ns/op
BenchmarkProfile_ParseWithCache/Locations:_29,_functions_28-16          467185     2513 ns/op
BenchmarkProfile_ParseNoCache_Big/Locations:_1145,_functions_1145-16       109  9495172 ns/op
BenchmarkProfile_ParseWithCache_Big/Locations_1145,_functions_1145-16      314  3826945 ns/op
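
As a rough sketch of the slice-based variant (reusing the hypothetical Location type from the earlier sketch, not the merged code), it leans on the same assumption the binary search already made, namely that IDs are consecutive and start at 1, so the ID itself serves as the index and the mapaccess/hashing cost disappears:

```go
// Second change, sketched: with consecutive IDs starting at 1, the ID
// is a direct index, so a slice replaces the map and avoids hashing.
func buildLocationSlice(locs []*Location) []*Location {
	table := make([]*Location, len(locs)+1) // index 0 unused; IDs start at 1
	for _, l := range locs {
		table[l.ID] = l // valid only while 1 <= ID <= len(locs)
	}
	return table
}

// Lookup is just a bounds check plus an index.
func lookupLocation(table []*Location, id uint64) *Location {
	if id == 0 || id >= uint64(len(table)) {
		return nil
	}
	return table[id]
}
```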

@abeaumont (Contributor, Author)

I ran the second change with #630 too, and you can see there's no downside now:

[Screenshot: issue-628]

@petethepig (Member) commented Dec 22, 2021

@abeaumont Awesome. I also looked at other metrics like memory usage and everything is looking great there as well. I'm going to finish #630, make it post one of those benchmarking summaries and then merge this one.

@abeaumont (Contributor, Author)

I have checked the pprof spec and, unfortunately, consecutive IDs are not guaranteed: https://github.com/google/pprof/blob/master/proto/profile.proto#L164-L165. I'll merge this with both the slice and map approaches, making the slice the fast path.

The pprof specification doesn't guarantee that IDs are consecutive,
and that case is now supported, while still providing a fast path for
the common case in which functions and locations have (sorted)
consecutive IDs starting from 1.
@abeaumont (Contributor, Author)

I added generic support for any valid location/function layout allowed by the pprof format in 217066e, while still retaining the slice-based fast path. As suggested by @kolesnikovae, we don't even create a new slice now, since the IDs are usually already sorted (and we can sort them when they are not).

I also added some tests as the implementation got quite a bit more complex.
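
Roughly, the final approach can be sketched like this (hypothetical names again, and a map fallback standing in for the sort-based handling described above): detect whether the IDs are already the sorted sequence 1..n, and if so index the profile's own slice directly with no extra allocation; otherwise fall back to a general path that accepts any valid IDs:

```go
// Fast-path check: are the IDs exactly 1..n in order? If so, the
// profile's existing slice can be indexed as locs[id-1] directly,
// with no new slice or map allocated at all.
func hasConsecutiveIDs(locs []*Location) bool {
	for i, l := range locs {
		if l.ID != uint64(i)+1 {
			return false
		}
	}
	return true
}

// locationLookup returns a lookup function, choosing the fast path
// when possible and a map-based fallback otherwise.
func locationLookup(locs []*Location) func(id uint64) *Location {
	if hasConsecutiveIDs(locs) {
		return func(id uint64) *Location {
			if id == 0 || id > uint64(len(locs)) {
				return nil
			}
			return locs[id-1] // the ID is the 1-based index into locs
		}
	}
	// Slow path: arbitrary valid (nonzero) IDs go through a map.
	table := make(map[uint64]*Location, len(locs))
	for _, l := range locs {
		table[l.ID] = l
	}
	return func(id uint64) *Location { return table[id] }
}
```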

@petethepig petethepig merged commit c626be1 into main Dec 22, 2021
@petethepig petethepig deleted the perf/optimize-pprof-scrapping branch December 22, 2021 23:23
korniltsev pushed a commit that referenced this pull request Jul 18, 2023
* feat: support for global size-based retention policy

* feat: use manager for ingester subservices

* add querier block eviction test

* decouple retention enforcer and ingester