perf: optimize pprof parsing in pull mode. #628
Conversation
Our profiling data suggested that a lot of time was spent in the binary search used to find locations and functions while writing scraped profiles. This is an attempt to improve performance by preprocessing the functions and locations into a lookup table. A benchmark is included to showcase the results with smaller and bigger profiles. As expected, there's no gain with small profiles (quite the opposite, since there's now an extra preprocessing step). On the other hand, there are big gains as profiles get bigger (2x for the biggest case). While it'd be possible to find some heuristic to disable the optimization below a certain size threshold, the absolute difference between small and big profiles is so large that I don't think it's worth it, at least as a first approach.
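To make the approach concrete, here is a minimal sketch of the two lookup strategies side by side. This is not the PR's actual code: the `Location` type, `findLocationBinary`, and `buildLocationIndex` are simplified stand-ins for the pprof proto types and helpers.

```go
package pprofindex

import "sort"

// Location is a simplified stand-in for the pprof proto message:
// every entry carries a unique, nonzero ID.
type Location struct {
	ID   uint64
	Line []int64 // simplified payload
}

// findLocationBinary is the old approach: a binary search over the
// ID-sorted slice on every single lookup, O(log n) per call.
func findLocationBinary(locs []*Location, id uint64) *Location {
	i := sort.Search(len(locs), func(i int) bool { return locs[i].ID >= id })
	if i < len(locs) && locs[i].ID == id {
		return locs[i]
	}
	return nil
}

// buildLocationIndex is the new approach: one O(n) preprocessing pass
// that trades up-front work (and memory) for O(1) lookups afterwards.
func buildLocationIndex(locs []*Location) map[uint64]*Location {
	idx := make(map[uint64]*Location, len(locs))
	for _, l := range locs {
		idx[l.ID] = l
	}
	return idx
}
```

The preprocessing pass is exactly why small profiles get slightly slower: the table has to be built even when only a handful of lookups follow.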
Codecov Report
```diff
@@            Coverage Diff             @@
##             main     #628      +/-   ##
==========================================
- Coverage   76.28%   75.76%   -0.51%
==========================================
  Files          43       45       +2
  Lines        1454     1588     +134
  Branches      284      292       +8
==========================================
+ Hits         1109     1203      +94
- Misses        313      356      +43
+ Partials       32       29       -3
```
Continue to review full report at Codecov.
Going to try this with #630
Ran it with #630 — it looks better, but now there's this mapaccess function that takes up a lot of time.
Yes, that's the tradeoff of using a map instead of the binary search. Looking deeper, the binary search already expects the identifiers to be consecutive numbers starting at 1, which means we can get rid of the map and use a slice instead. The results should be better now, even for smaller profiles. Using a map for caching (first change 4aaec20):
Using a slice for caching (second change 14eaf48):
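For illustration, a minimal sketch of the slice-based cache described above, reusing the simplified `Location` type from the earlier sketch (again, not the PR's actual code). When the IDs are exactly the consecutive numbers 1..n, ID i lives at slice index i-1, so each lookup is a bounds check plus an array index, with no hashing and no `runtime.mapaccess` in the hot path:

```go
// buildLocationSlice assumes the fast-path layout (IDs are exactly
// 1..n): ID i is stored at index i-1.
func buildLocationSlice(locs []*Location) []*Location {
	cache := make([]*Location, len(locs))
	for _, l := range locs {
		cache[l.ID-1] = l
	}
	return cache
}

// lookupLocation returns the location with the given ID, or nil if
// the ID is out of range. No hashing happens on this path.
func lookupLocation(cache []*Location, id uint64) *Location {
	if id == 0 || id > uint64(len(cache)) {
		return nil
	}
	return cache[id-1]
}
```

The map version pays the hashing cost on every lookup, which is exactly the `mapaccess` time visible in the profile; the slice version removes it at the price of assuming the ID layout.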
I have run the second change with #630 too, and it shows there's no downside now:
@abeaumont Awesome. I also looked at other metrics like memory usage and everything is looking great there as well. I'm going to finish #630, make it post one of those benchmarking summaries and then merge this one.
I have checked the pprof spec and, unfortunately, consecutive IDs are not guaranteed: https://github.com/google/pprof/blob/master/proto/profile.proto#L164-L165. I'll merge this with both the slice and map approaches, making the slice the fast path.
The pprof specification doesn't guarantee that IDs are consecutive. That general case is now supported, while still providing a fast path for the common case in which functions and locations have (sorted) consecutive IDs starting from 1.
I added generic support for any valid location/function ID layout, as specified in the pprof format, in 217066e, while still retaining the slice-based fast path. As suggested by @kolesnikovae, we don't even create a new slice now, since the IDs are usually already sorted (and we can sort them when they are not). I also added some tests, as the implementation got quite a bit more complex.
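A hedged sketch of what such a fast path could look like, reusing the simplified `Location` type and `buildLocationIndex` helper from the sketches above (this is illustrative only, not the actual implementation in 217066e): check once whether the IDs are sorted and consecutive starting at 1, and fall back to a map only when they are not.

```go
// idsAreConsecutive reports whether the locations follow the common
// layout: sorted, consecutive IDs starting at 1. The pprof spec only
// requires IDs to be unique and nonzero, so this must be checked
// rather than assumed.
func idsAreConsecutive(locs []*Location) bool {
	for i, l := range locs {
		if l.ID != uint64(i+1) {
			return false
		}
	}
	return true
}

// locationIndex keeps one of two representations: a dense slice for
// the common consecutive-ID case, or a map for everything else.
type locationIndex struct {
	fast []*Location          // ID i lives at index i-1
	slow map[uint64]*Location // sparse fallback
}

func newLocationIndex(locs []*Location) locationIndex {
	if idsAreConsecutive(locs) {
		// No new slice is allocated: the input already has the
		// dense layout, so it can be indexed directly.
		return locationIndex{fast: locs}
	}
	return locationIndex{slow: buildLocationIndex(locs)}
}

func (x locationIndex) get(id uint64) *Location {
	if x.fast != nil {
		if id == 0 || id > uint64(len(x.fast)) {
			return nil
		}
		return x.fast[id-1]
	}
	return x.slow[id]
}
```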