
feat: pprof truncation #2754

Merged: 5 commits merged into main from feat/pprof-truncation on Nov 27, 2023
Conversation

@kolesnikovae (Collaborator) commented Nov 24, 2023

The change is aimed at making location information such as file names and line numbers available in the frontend. Currently, we only preserve function names.

Internally we have two APIs for fetching profiling data (see the sketch after this list):

  1. SelectMergeStacktraces returns data in the flamebearer (flamegraph) format, which contains only function names. This method is used by the frontend to build flamegraphs, the top table, etc.
  2. SelectMergeProfile returns data in pprof format, including file names, line numbers, etc. This method is used by profilecli and by the frontend for exporting data in pprof format.
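
For illustration, here is a minimal, hypothetical sketch of the two query paths; the type and method shapes below are simplified assumptions and do not mirror the actual Pyroscope service definitions.

```go
// Hypothetical, simplified view of the two internal query paths.
// The real Pyroscope RPC definitions differ in detail.
package example

import "context"

// Flamebearer carries only function names arranged into flamegraph levels.
type Flamebearer struct {
	Names  []string
	Levels [][]int64
}

// Location is a stand-in for a pprof location: it references a function
// and carries a file name and line number, which the flamebearer lacks.
type Location struct {
	FunctionName string
	Filename     string
	Line         int64
}

// PprofProfile is a stand-in for the full pprof data model.
type PprofProfile struct {
	Locations []Location
}

// Querier sketches the two endpoints described above: one returns the
// lightweight flamegraph representation, the other the richer pprof model.
type Querier interface {
	SelectMergeStacktraces(ctx context.Context, query string) (*Flamebearer, error)
	SelectMergeProfile(ctx context.Context, query string) (*PprofProfile, error)
}
```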

Using SelectMergeProfile (pprof format) to build flamegraphs is complicated by the fact that its data model is more sophisticated than the flamegraph one. Moreover, when we build a flamegraph we only keep a certain number of "top nodes" to minimize the amount of work. The PR adds pprof truncation that is fully aligned with the approach we use in the SelectMergeStacktraces API (and thus with the resulting flamegraph); a simplified sketch of the idea follows the list below:

  • The query frontend still uses SelectMergeStacktraces, so the PR has no immediate impact. The next step is to switch the query-frontend from the SelectMergeStacktraces endpoint to SelectMergeProfile.
  • This change does not affect pprof export: the exported profile includes all samples.
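
A minimal sketch of the truncation idea, purely for illustration: keep the max_nodes heaviest stack nodes and fold everything else into a single placeholder node. This is not the code added in this PR; the names and the flat node representation are assumptions.

```go
// Illustrative sketch of max_nodes truncation over a flat list of nodes.
// The real implementation operates on the stack trace / symbols model.
package example

import "sort"

type node struct {
	Name  string
	Total int64
}

// truncate keeps the maxNodes nodes with the largest totals and collapses
// the rest into a single "other" node, so the result stays bounded
// regardless of the profile size.
func truncate(nodes []node, maxNodes int) []node {
	if maxNodes <= 0 || len(nodes) <= maxNodes {
		return nodes
	}
	sort.Slice(nodes, func(i, j int) bool { return nodes[i].Total > nodes[j].Total })
	kept := append([]node(nil), nodes[:maxNodes]...)
	var other int64
	for _, n := range nodes[maxNodes:] {
		other += n.Total
	}
	return append(kept, node{Name: "other", Total: other})
}
```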

From a performance point of view, using pprof internally is beneficial for medium and large profiles (2-3 times faster) and might be more expensive for small ones (the max_nodes value follows the slash in the benchmark names):

Benchmark_block_Resolver_ResolvePprof_Small/0-10  	    1758	    696170 ns/op	 1047428 B/op	   10904 allocs/op
Benchmark_block_Resolver_ResolvePprof_Small/1K-10 	    1278	    949870 ns/op	 1047081 B/op	   10397 allocs/op
Benchmark_block_Resolver_ResolvePprof_Small/8K-10 	    1383	    857939 ns/op	 1097747 B/op	   10201 allocs/op
Benchmark_block_Resolver_ResolvePprof_Big/0-10    	       4	 323580094 ns/op	558700138 B/op	  992110 allocs/op
Benchmark_block_Resolver_ResolvePprof_Big/8K-10   	       2	 542241000 ns/op	278519352 B/op	  136130 allocs/op
Benchmark_block_Resolver_ResolvePprof_Big/16K-10  	       2	 555162334 ns/op	285653820 B/op	  230880 allocs/op
Benchmark_block_Resolver_ResolvePprof_Big/32K-10  	       2	 578640708 ns/op	298259452 B/op	  388564 allocs/op
Benchmark_block_Resolver_ResolvePprof_Big/64K-10  	       2	 640742208 ns/op	323215032 B/op	  657606 allocs/op
Benchmark_block_Resolver_ResolveTree_Small-10     	    2335	    504526 ns/op	  278999 B/op	    5781 allocs/op
Benchmark_block_Resolver_ResolveTree_Big-10       	       1	1458598750 ns/op	250876448 B/op	 6469298 allocs/op
  • big: an 8 MB profile (gzipped) with more than 1.5M call sites.
  • small: a Go heap profile with ~500 call sites.
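
The benchmarks above can be reproduced with the standard Go benchmark tooling; the exact package path is not given here, so the command below simply runs them across the whole module:

```sh
go test ./... -run '^$' -bench 'Benchmark_block_Resolver_Resolve' -benchmem
```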

Please note that the method we currently use (Benchmark_block_Resolver_ResolveTree) does not support truncation at symbolication time: we can only drop insignificant nodes after the complete tree has been built.

A note about pprof symbolication without truncation: it is 4-5 times faster than the tree approach, but allocates roughly twice as much memory on average.

Benchmark_block_Resolver_ResolveTree_Big-10       	       1	1458598750 ns/op	250876448 B/op	 6469298 allocs/op
Benchmark_block_Resolver_ResolvePprof_Big/0-10    	       4	 323580094 ns/op	558700138 B/op	  992110 allocs/op
Benchmark_block_Resolver_ResolvePprof_Big/64K-10  	       2	 640742208 ns/op	323215032 B/op	  657606 allocs/op
Benchmark_block_Resolver_ResolvePprof_Big/256K-10 	       2	 882890292 ns/op	459456996 B/op	 1699440 allocs/op

After a certain point truncation becomes pretty much pointless; however, we still need to make the profile lighter for the frontend.


The performance aspect is crucial here because the symbolication process is synchronous and not parallelized (which is a separate issue). In pathological cases this stage can make a noticeable contribution to the overall query duration.


@kolesnikovae kolesnikovae changed the title from "WIP: feat: pprof truncation" to "feat: pprof truncation" on Nov 27, 2023
@kolesnikovae kolesnikovae marked this pull request as ready for review November 27, 2023 07:25
@kolesnikovae kolesnikovae requested a review from a team as a code owner November 27, 2023 07:25
@@ -13,193 +10,114 @@ type pprofProtoSymbols struct {
profile googlev1.Profile
symbols *Symbols
samples *schemav1.Samples
tree *model.StacktraceTree
lut []uint32
Contributor: What does it stand for? Locations...?

@kolesnikovae (Collaborator, author): Oh, that's yet another lookup table

@cyriltovena (Contributor) left a comment:

LGTM

@kolesnikovae kolesnikovae merged commit 000930f into main Nov 27, 2023
19 checks passed
@kolesnikovae kolesnikovae deleted the feat/pprof-truncation branch November 27, 2023 08:53