improve memory allocation during parsing #25

korniltsev · 2023-08-31T02:23:55Z

This PR is a complete rewrite of the parser to improve memory allocations.

The new implementation does not do any constant pool resolving, with new API, the user of the parse is responsible for resolving constant pool references to the actual bits.
The constant pool objects and events are now parsed with generated code (most of the files in types package, except idmap.go).
resolving methods is done without hashing for single chunk JFR file(there is a lazy fallback for hash resolving when there are multiple chunks). (our sdks dont do chunk splitting)

goos: linux
goarch: amd64
pkg: github.com/grafana/jfr-parser/parser
cpu: AMD Ryzen 9 5950X 16-Core Processor            
                                                     │    old.txt    │              new.txt               │
                                                     │    sec/op     │   sec/op     vs base               │
Parse/example-32                                       10.208m ±  5%   1.721m ± 6%  -83.15% (p=0.002 n=6)
Parse/async-profiler-32                                 30.29m ±  7%   15.15m ± 2%  -50.01% (p=0.002 n=6)
Parse/cortex-dev-01__kafka-0__cpu_lock0_alloc0__0-32   20.970m ± 13%   5.751m ± 3%  -72.57% (p=0.002 n=6)
Parse/goland-32                                         617.5m ±  3%   113.8m ± 1%  -81.57% (p=0.002 n=6)
Parse/goland-multichunk-32                             327.09m ±  6%   73.19m ± 5%  -77.62% (p=0.002 n=6)
geomean                                                 66.60m         16.57m       -75.12%

                                                     │    old.txt    │               new.txt               │
                                                     │     B/op      │     B/op      vs base               │
Parse/example-32                                       5437.5Ki ± 0%   317.1Ki ± 0%  -94.17% (p=0.002 n=6)
Parse/async-profiler-32                                3903.2Ki ± 0%   185.1Ki ± 0%  -95.26% (p=0.002 n=6)
Parse/cortex-dev-01__kafka-0__cpu_lock0_alloc0__0-32   9076.8Ki ± 0%   623.3Ki ± 0%  -93.13% (p=0.002 n=6)
Parse/goland-32                                        443.02Mi ± 0%   14.91Mi ± 0%  -96.63% (p=0.002 n=6)
Parse/goland-multichunk-32                             242.20Mi ± 0%   12.38Mi ± 0%  -94.89% (p=0.002 n=6)
geomean                                                 28.63Mi        1.445Mi       -94.96%

                                                     │   old.txt   │              new.txt               │
                                                     │  allocs/op  │  allocs/op   vs base               │
Parse/example-32                                       1.180k ± 0%   1.324k ± 0%  +12.20% (p=0.002 n=6)
Parse/async-profiler-32                                 927.0 ± 0%    725.0 ± 0%  -21.79% (p=0.002 n=6)
Parse/cortex-dev-01__kafka-0__cpu_lock0_alloc0__0-32   4.886k ± 0%   2.450k ± 0%  -49.86% (p=0.002 n=6)
Parse/goland-32                                        36.04k ± 0%   37.73k ± 0%   +4.68% (p=0.002 n=6)
Parse/goland-multichunk-32                             21.20k ± 0%   20.20k ± 0%   -4.73% (p=0.002 n=6)
geomean                                                5.275k        4.474k       -15.19%

Future improvements:

batch allocate stackframes to reduce number of allocations
some of the fields were skipped to save memory(for example line numbers) as they are currently not used. we may want to improve and skip them with a runtime option
readin metadata currently allocates a lot of objects with temporary maps, we could reduce it, by using slices and reusing slices
when field is skipped in the generated code we can avoid computng the result skipped value with shifts and ors.

cyriltovena · 2023-08-31T12:21:15Z

I think we should move this code into pyroscope. Specially if in the future we only want to stream and output pprof. This will make it easier to avoid allocation

some of the fields were skipped to save memory(for example line numbers) as they are currently not used. we may want to improve and skip them with a runtime option

We should pass the line number in pprof if we can, this is supported by the database. Does it also have filename ?

cyriltovena

LGTM may be we should have a header for generated files and a makefile.

zdyj3170101136 · 2023-10-27T08:23:20Z

very big merge request

korniltsev added 16 commits August 29, 2023 13:20

add more testdata, make it independent of the parser

6d653a4

add v2

16a4747

regen cpool execpt stacktrace

9ee39c7

fix

ad2f636

fix test

9e0b311

fix

8c7adea

cleanup

25a2643

rename package

00e6058

fix symbol processor

3f0ee07

add Threadstate

c9e5530

skip some fields

5f15745

fix varLong

513bf6e

make bench a little bit more honest

46e242f

todo

bf3e1eb

cleanup api a bit

978abd5

cleanup api a bit

6e62bc9

korniltsev mentioned this pull request Aug 31, 2023

fix: update jfr parser grafana/pyroscope#2340

Merged

korniltsev changed the title ~~WIP v2~~ improve memory allocation during parsing Aug 31, 2023

korniltsev marked this pull request as ready for review August 31, 2023 08:00

cyriltovena approved these changes Aug 31, 2023

View reviewed changes

add header and makefile

c881e11

korniltsev merged commit 08fa3a9 into main Aug 31, 2023
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

improve memory allocation during parsing #25

improve memory allocation during parsing #25

korniltsev commented Aug 31, 2023 •

edited

cyriltovena commented Aug 31, 2023

cyriltovena left a comment

zdyj3170101136 commented Oct 27, 2023

improve memory allocation during parsing #25

improve memory allocation during parsing #25

Conversation

korniltsev commented Aug 31, 2023 • edited

cyriltovena commented Aug 31, 2023

cyriltovena left a comment

Choose a reason for hiding this comment

zdyj3170101136 commented Oct 27, 2023

korniltsev commented Aug 31, 2023 •

edited