Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve memory allocation during parsing #25

Merged
merged 17 commits into from
Aug 31, 2023
Merged

improve memory allocation during parsing #25

merged 17 commits into from
Aug 31, 2023

Conversation

korniltsev
Copy link
Collaborator

@korniltsev korniltsev commented Aug 31, 2023

This PR is a complete rewrite of the parser to improve memory allocations.

  • The new implementation does not do any constant pool resolving, with new API, the user of the parse is responsible for resolving constant pool references to the actual bits.
  • The constant pool objects and events are now parsed with generated code (most of the files in types package, except idmap.go).
  • resolving methods is done without hashing for single chunk JFR file(there is a lazy fallback for hash resolving when there are multiple chunks). (our sdks dont do chunk splitting)
goos: linux
goarch: amd64
pkg: github.com/grafana/jfr-parser/parser
cpu: AMD Ryzen 9 5950X 16-Core Processor            
                                                     │    old.txt    │              new.txt               │
                                                     │    sec/op     │   sec/op     vs base               │
Parse/example-32                                       10.208m ±  5%   1.721m ± 6%  -83.15% (p=0.002 n=6)
Parse/async-profiler-32                                 30.29m ±  7%   15.15m ± 2%  -50.01% (p=0.002 n=6)
Parse/cortex-dev-01__kafka-0__cpu_lock0_alloc0__0-32   20.970m ± 13%   5.751m ± 3%  -72.57% (p=0.002 n=6)
Parse/goland-32                                         617.5m ±  3%   113.8m ± 1%  -81.57% (p=0.002 n=6)
Parse/goland-multichunk-32                             327.09m ±  6%   73.19m ± 5%  -77.62% (p=0.002 n=6)
geomean                                                 66.60m         16.57m       -75.12%

                                                     │    old.txt    │               new.txt               │
                                                     │     B/op      │     B/op      vs base               │
Parse/example-32                                       5437.5Ki ± 0%   317.1Ki ± 0%  -94.17% (p=0.002 n=6)
Parse/async-profiler-32                                3903.2Ki ± 0%   185.1Ki ± 0%  -95.26% (p=0.002 n=6)
Parse/cortex-dev-01__kafka-0__cpu_lock0_alloc0__0-32   9076.8Ki ± 0%   623.3Ki ± 0%  -93.13% (p=0.002 n=6)
Parse/goland-32                                        443.02Mi ± 0%   14.91Mi ± 0%  -96.63% (p=0.002 n=6)
Parse/goland-multichunk-32                             242.20Mi ± 0%   12.38Mi ± 0%  -94.89% (p=0.002 n=6)
geomean                                                 28.63Mi        1.445Mi       -94.96%

                                                     │   old.txt   │              new.txt               │
                                                     │  allocs/op  │  allocs/op   vs base               │
Parse/example-32                                       1.180k ± 0%   1.324k ± 0%  +12.20% (p=0.002 n=6)
Parse/async-profiler-32                                 927.0 ± 0%    725.0 ± 0%  -21.79% (p=0.002 n=6)
Parse/cortex-dev-01__kafka-0__cpu_lock0_alloc0__0-32   4.886k ± 0%   2.450k ± 0%  -49.86% (p=0.002 n=6)
Parse/goland-32                                        36.04k ± 0%   37.73k ± 0%   +4.68% (p=0.002 n=6)
Parse/goland-multichunk-32                             21.20k ± 0%   20.20k ± 0%   -4.73% (p=0.002 n=6)
geomean                                                5.275k        4.474k       -15.19%

Future improvements:

  • batch allocate stackframes to reduce number of allocations
  • some of the fields were skipped to save memory(for example line numbers) as they are currently not used. we may want to improve and skip them with a runtime option
  • readin metadata currently allocates a lot of objects with temporary maps, we could reduce it, by using slices and reusing slices
  • when field is skipped in the generated code we can avoid computng the result skipped value with shifts and ors.

@korniltsev korniltsev changed the title WIP v2 improve memory allocation during parsing Aug 31, 2023
@korniltsev korniltsev marked this pull request as ready for review August 31, 2023 08:00
@cyriltovena
Copy link

I think we should move this code into pyroscope. Specially if in the future we only want to stream and output pprof. This will make it easier to avoid allocation

some of the fields were skipped to save memory(for example line numbers) as they are currently not used. we may want to improve and skip them with a runtime option

We should pass the line number in pprof if we can, this is supported by the database. Does it also have filename ?

Copy link

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM may be we should have a header for generated files and a makefile.

@korniltsev korniltsev merged commit 08fa3a9 into main Aug 31, 2023
1 check passed
@zdyj3170101136
Copy link
Contributor

very big merge request

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants