
feat: pprof truncation #2754

Merged: 5 commits merged into main from feat/pprof-truncation on Nov 27, 2023
Conversation

@kolesnikovae (Collaborator) commented Nov 24, 2023

The change is aimed at making location information such as file names and line numbers available in the frontend. Currently, we only preserve function names.

Internally we have two APIs for fetching profiling data (see the sketch after this list):

  1. SelectMergeStacktraces returns data in the flamebearer (flamegraph) format, which contains only function names. This method is used by the frontend to build flamegraphs, the top table, etc.
  2. SelectMergeProfile returns data in pprof format, including file names, line numbers, etc. This method is used by profilecli and by the frontend for exporting data in pprof format.
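
For illustration, here is a minimal, hypothetical sketch of the two query paths; the type and method shapes below are simplified assumptions and do not mirror the actual Pyroscope service definitions.

```go
// Hypothetical, simplified view of the two internal query paths.
// The real Pyroscope RPC definitions differ in detail.
package example

import "context"

// Flamebearer carries only function names arranged into flamegraph levels.
type Flamebearer struct {
	Names  []string
	Levels [][]int64
}

// Location is a stand-in for a pprof location: it references a function
// and carries a file name and line number, which the flamebearer lacks.
type Location struct {
	FunctionName string
	Filename     string
	Line         int64
}

// PprofProfile is a stand-in for the full pprof data model.
type PprofProfile struct {
	Locations []Location
}

// Querier sketches the two endpoints described above: one returns the
// lightweight flamegraph representation, the other the richer pprof model.
type Querier interface {
	SelectMergeStacktraces(ctx context.Context, query string) (*Flamebearer, error)
	SelectMergeProfile(ctx context.Context, query string) (*PprofProfile, error)
}
```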

Using SelectMergeProfile (pprof format) to build flamegraphs is complicated by the fact that its data model is more sophisticated than the flamegraph one. Moreover, when we build a flamegraph we only keep a certain number of "top nodes" to minimize the amount of work. The PR adds pprof truncation that is fully aligned with the approach we use in the SelectMergeStacktraces API (and thus with the resulting flamegraph); a simplified sketch of the idea follows the list below:

  • The query frontend still uses SelectMergeStacktraces, so the PR has no immediate impact. The next step is to switch the query-frontend from the SelectMergeStacktraces endpoint to SelectMergeProfile.
  • This change does not affect pprof export: the exported profile includes all samples.
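
A minimal sketch of the truncation idea, purely for illustration: keep the max_nodes heaviest stack nodes and fold everything else into a single placeholder node. This is not the code added in this PR; the names and the flat node representation are assumptions.

```go
// Illustrative sketch of max_nodes truncation over a flat list of nodes.
// The real implementation operates on the stack trace / symbols model.
package example

import "sort"

type node struct {
	Name  string
	Total int64
}

// truncate keeps the maxNodes nodes with the largest totals and collapses
// the rest into a single "other" node, so the result stays bounded
// regardless of the profile size.
func truncate(nodes []node, maxNodes int) []node {
	if maxNodes <= 0 || len(nodes) <= maxNodes {
		return nodes
	}
	sort.Slice(nodes, func(i, j int) bool { return nodes[i].Total > nodes[j].Total })
	kept := append([]node(nil), nodes[:maxNodes]...)
	var other int64
	for _, n := range nodes[maxNodes:] {
		other += n.Total
	}
	return append(kept, node{Name: "other", Total: other})
}
```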

From a performance point of view, using pprof internally is beneficial for medium and large profiles (2-3 times faster) and might be more expensive for small ones (the max_nodes value follows the slash in the benchmark names):

Benchmark_block_Resolver_ResolvePprof_Small/0-10  	    1758	    696170 ns/op	 1047428 B/op	   10904 allocs/op
Benchmark_block_Resolver_ResolvePprof_Small/1K-10 	    1278	    949870 ns/op	 1047081 B/op	   10397 allocs/op
Benchmark_block_Resolver_ResolvePprof_Small/8K-10 	    1383	    857939 ns/op	 1097747 B/op	   10201 allocs/op
Benchmark_block_Resolver_ResolvePprof_Big/0-10    	       4	 323580094 ns/op	558700138 B/op	  992110 allocs/op
Benchmark_block_Resolver_ResolvePprof_Big/8K-10   	       2	 542241000 ns/op	278519352 B/op	  136130 allocs/op
Benchmark_block_Resolver_ResolvePprof_Big/16K-10  	       2	 555162334 ns/op	285653820 B/op	  230880 allocs/op
Benchmark_block_Resolver_ResolvePprof_Big/32K-10  	       2	 578640708 ns/op	298259452 B/op	  388564 allocs/op
Benchmark_block_Resolver_ResolvePprof_Big/64K-10  	       2	 640742208 ns/op	323215032 B/op	  657606 allocs/op
Benchmark_block_Resolver_ResolveTree_Small-10     	    2335	    504526 ns/op	  278999 B/op	    5781 allocs/op
Benchmark_block_Resolver_ResolveTree_Big-10       	       1	1458598750 ns/op	250876448 B/op	 6469298 allocs/op
  • big: an 8 MB profile (gzipped) with more than 1.5M call sites.
  • small: a Go heap profile with ~500 call sites.
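
The benchmarks above can be reproduced with the standard Go benchmark tooling; the exact package path is not given here, so the command below simply runs them across the whole module:

```sh
go test ./... -run '^$' -bench 'Benchmark_block_Resolver_Resolve' -benchmem
```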

Please note that the method we currently use (Benchmark_block_Resolver_ResolveTree) does not support truncation at symbolication time: we can only drop insignificant nodes after the complete tree has been built.

A note about pprof symbolication without truncation: it is 4-5 times faster than the tree approach, but allocates roughly twice as much memory on average.

Benchmark_block_Resolver_ResolveTree_Big-10       	       1	1458598750 ns/op	250876448 B/op	 6469298 allocs/op
Benchmark_block_Resolver_ResolvePprof_Big/0-10    	       4	 323580094 ns/op	558700138 B/op	  992110 allocs/op
Benchmark_block_Resolver_ResolvePprof_Big/64K-10  	       2	 640742208 ns/op	323215032 B/op	  657606 allocs/op
Benchmark_block_Resolver_ResolvePprof_Big/256K-10 	       2	 882890292 ns/op	459456996 B/op	 1699440 allocs/op

After a certain point truncation becomes pretty much pointless; however, we still need to make the profile lighter for the frontend.


The performance aspect is crucial here because the symbolication process is synchronous and not parallelized (which is a separate issue). In pathological cases this stage can make a noticeable contribution to the overall query duration.


@kolesnikovae kolesnikovae changed the title from "WIP: feat: pprof truncation" to "feat: pprof truncation" on Nov 27, 2023
@kolesnikovae kolesnikovae marked this pull request as ready for review November 27, 2023 07:25
@kolesnikovae kolesnikovae requested a review from a team as a code owner November 27, 2023 07:25
@@ -13,193 +10,114 @@ type pprofProtoSymbols struct {
profile googlev1.Profile
symbols *Symbols
samples *schemav1.Samples
tree *model.StacktraceTree
lut []uint32
Contributor: What does it stand for? Locations...?

@kolesnikovae (Collaborator, author): Oh, that's yet another lookup table

@cyriltovena (Contributor) left a comment:

LGTM

@kolesnikovae kolesnikovae merged commit 000930f into main Nov 27, 2023
19 checks passed
@kolesnikovae kolesnikovae deleted the feat/pprof-truncation branch November 27, 2023 08:53