Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement #22

Merged
merged 24 commits into from
Aug 11, 2023
Merged

Enhancement #22

merged 24 commits into from
Aug 11, 2023

Conversation

zdyj3170101136
Copy link
Contributor

@zdyj3170101136 zdyj3170101136 commented Aug 1, 2023

reduce 99% memory allocation and save 55% cpu.

go version:

go version go1.18.9 linux/amd64

benchstat:

goos: linux
goarch: amd64
pkg: github.com/pyroscope-io/jfr-parser/parser
cpu: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
                       │   old.txt    │               new.txt               │
                       │    sec/op    │   sec/op     vs base                │
Parse/example-8          20.914m ± 1%   8.647m ± 1%  -58.65% (p=0.000 n=10)
Parse/async-profiler-8   118.61m ± 1%   40.10m ± 0%  -66.19% (p=0.000 n=10)
geomean                   49.81m        18.62m       -62.61%

                       │    old.txt    │               new.txt                │
                       │     B/op      │     B/op      vs base                │
Parse/example-8           6.424Mi ± 0%   5.276Mi ± 0%  -17.87% (p=0.000 n=10)
Parse/async-profiler-8   42.484Mi ± 0%   4.380Mi ± 0%  -89.69% (p=0.000 n=10)
geomean                   16.52Mi        4.807Mi       -70.90%

                       │    old.txt    │               new.txt               │
                       │   allocs/op   │  allocs/op   vs base                │
Parse/example-8          169.264k ± 0%   1.193k ± 0%  -99.30% (p=0.000 n=10)
Parse/async-profiler-8   439964.0 ± 0%    938.0 ± 0%  -99.79% (p=0.000 n=10)
geomean                    272.9k        1.058k       -99.61%

@zdyj3170101136
Copy link
Contributor Author

@abeaumont @korniltsev @petethepig

@korniltsev
Copy link
Collaborator

That's quite a lot of changes.
Could you describe what the changes are?

@zdyj3170101136
Copy link
Contributor Author

zdyj3170101136 commented Aug 2, 2023

That's quite a lot of changes. Could you describe what the changes are?

1, the reader use binary.Read to get int and string, let the memory escape to heap.
commit 39df228 fix this.
2, in types.go, even for base type like int, it will malloc a Int type, which pointer to int, and return as interface(), let the memory escape to heap. commit e587646 fix it, add argument reader.Reader to function parseField, change toLong(p Parseable) to toLong(r reader.Reader).
3, batch allocate stackframe and constants, commit 2aa0b33 and 1dcd969 fix it.
4, in building cpool, perallocate each class type. commit 5660ef6 and a0d5347 fix it.
5, perallocate stackframe, it is special because cpool could not malloc it directly from ParseClass but from parsefields. commit a1b27a8 and commit 550121d fix it.
6, let chunk implement chunk.Next() interface(), so we could cache common event_type, commit a526cc1 fix it.
7, the ClassMap store ClassMetaData as value in default, this lead to a lot of duffcopy, now the ClassMap store pointer to ClassMetaData. commit 7151b79 fix it.
8, parseFields use for _, v range over class.fields, lots of copy. fix this in 91d9ca7.
9, there have a lot of type which only have base type like string, int. but also implement Resolve interface{}, change it to return directly. commit d66951d fix this.

each commit is independent, if you think this pr is too large, you could review commit one by one.
and ask me to pull a merge request.

@korniltsev
Copy link
Collaborator

Thanks a lot for an explanation and for the PR
I will look into it soon.

Could you please also add benchmark results to the PR description, if possible using benchstat

@zdyj3170101136
Copy link
Contributor Author

zdyj3170101136 commented Aug 2, 2023

Thanks a lot for an explanation and for the PR I will look into it soon.

Could you please also add benchmark results to the PR description, if possible using benchstat

already add benchmark.

@zdyj3170101136
Copy link
Contributor Author

more detailed benchstat.
each number repersent operation as #22 (comment) said.
each compared to last operation.
seems 8 and 9 not really have enhancement.
operation 6 have small enhancement cause the example jfr file is small, only contains small number of event.
1,

goos: linux
goarch: amd64
pkg: github.com/pyroscope-io/jfr-parser/parser
cpu: Intel Xeon Processor (Icelake)
         │   old.txt   │               old.txt1               │
         │   sec/op    │    sec/op     vs base                │
Parse-16   26.00m ± 9%   21.86m ± 14%  -15.91% (p=0.003 n=10)

         │   old.txt    │              old.txt1               │
         │     B/op     │     B/op      vs base               │
Parse-16   6.426Mi ± 0%   6.139Mi ± 0%  -4.46% (p=0.000 n=10)

         │   old.txt   │              old.txt1              │
         │  allocs/op  │  allocs/op   vs base               │
Parse-16   169.3k ± 0%   154.1k ± 0%  -8.94% (p=0.000 n=10)

2,

goos: linux
goarch: amd64
pkg: github.com/pyroscope-io/jfr-parser/parser
cpu: Intel Xeon Processor (Icelake)
         │   old.txt1   │               old.txt2               │
         │    sec/op    │    sec/op     vs base                │
Parse-16   21.86m ± 14%   17.51m ± 10%  -19.89% (p=0.000 n=10)

         │   old.txt1   │              old.txt2               │
         │     B/op     │     B/op      vs base               │
Parse-16   6.139Mi ± 0%   5.872Mi ± 0%  -4.35% (p=0.000 n=10)

         │   old.txt1   │              old.txt2               │
         │  allocs/op   │  allocs/op   vs base                │
Parse-16   154.14k ± 0%   93.08k ± 0%  -39.61% (p=0.000 n=10)

3,

goos: linux
goarch: amd64
pkg: github.com/pyroscope-io/jfr-parser/parser
cpu: Intel Xeon Processor (Icelake)
         │   old.txt2   │              old.txt3              │
         │    sec/op    │   sec/op     vs base               │
Parse-16   17.51m ± 10%   16.05m ± 9%  -8.33% (p=0.029 n=10)

         │   old.txt2   │               old.txt3               │
         │     B/op     │     B/op      vs base                │
Parse-16   5.872Mi ± 0%   4.537Mi ± 0%  -22.73% (p=0.000 n=10)

         │  old.txt2   │              old.txt3               │
         │  allocs/op  │  allocs/op   vs base                │
Parse-16   93.08k ± 0%   60.35k ± 0%  -35.17% (p=0.000 n=10)

4,

goos: linux
goarch: amd64
pkg: github.com/pyroscope-io/jfr-parser/parser
cpu: Intel Xeon Processor (Icelake)
         │  old.txt3   │            old.txt4            │
         │   sec/op    │    sec/op     vs base          │
Parse-16   16.05m ± 9%   15.18m ± 10%  ~ (p=0.190 n=10)

         │   old.txt3   │              old.txt4               │
         │     B/op     │     B/op      vs base               │
Parse-16   4.537Mi ± 0%   4.525Mi ± 0%  -0.27% (p=0.000 n=10)

         │  old.txt3   │              old.txt4              │
         │  allocs/op  │  allocs/op   vs base               │
Parse-16   60.35k ± 0%   56.42k ± 0%  -6.51% (p=0.000 n=10)

5,

goos: linux
goarch: amd64
pkg: github.com/pyroscope-io/jfr-parser/parser
cpu: Intel Xeon Processor (Icelake)
         │   old.txt4   │              old.txt5               │
         │    sec/op    │   sec/op     vs base                │
Parse-16   15.18m ± 10%   11.01m ± 4%  -27.46% (p=0.000 n=10)

         │   old.txt4   │               old.txt5               │
         │     B/op     │     B/op      vs base                │
Parse-16   4.525Mi ± 0%   5.487Mi ± 0%  +21.27% (p=0.000 n=10)

         │   old.txt4   │              old.txt5               │
         │  allocs/op   │  allocs/op   vs base                │
Parse-16   56.418k ± 0%   2.684k ± 0%  -95.24% (p=0.000 n=10)

6

goos: linux
goarch: amd64
pkg: github.com/pyroscope-io/jfr-parser/parser
cpu: Intel Xeon Processor (Icelake)
         │  old.txt5   │              old.txt6              │
         │   sec/op    │   sec/op     vs base               │
Parse-16   11.01m ± 4%   10.28m ± 6%  -6.64% (p=0.001 n=10)

         │   old.txt5   │              old.txt6               │
         │     B/op     │     B/op      vs base               │
Parse-16   5.487Mi ± 0%   5.282Mi ± 0%  -3.73% (p=0.000 n=10)

         │  old.txt5   │              old.txt6               │
         │  allocs/op  │  allocs/op   vs base                │
Parse-16   2.684k ± 0%   1.236k ± 0%  -53.95% (p=0.000 n=10)

7

goos: linux
goarch: amd64
pkg: github.com/pyroscope-io/jfr-parser/parser
cpu: Intel Xeon Processor (Icelake)
         │  old.txt6   │           old.txt7            │
         │   sec/op    │   sec/op     vs base          │
Parse-16   10.28m ± 6%   10.23m ± 4%  ~ (p=0.190 n=10)

         │   old.txt6   │              old.txt7               │
         │     B/op     │     B/op      vs base               │
Parse-16   5.282Mi ± 0%   5.275Mi ± 0%  -0.14% (p=0.000 n=10)

         │  old.txt6   │              old.txt7              │
         │  allocs/op  │  allocs/op   vs base               │
Parse-16   1.236k ± 0%   1.193k ± 0%  -3.48% (p=0.000 n=10)

8, 9

goos: linux
goarch: amd64
pkg: github.com/pyroscope-io/jfr-parser/parser
cpu: Intel Xeon Processor (Icelake)
         │   old.txt7   │           old.txt8            │
         │    sec/op    │   sec/op     vs base          │
Parse-16   10.225m ± 4%   9.987m ± 4%  ~ (p=0.280 n=10)

         │   old.txt7   │            old.txt8            │
         │     B/op     │     B/op      vs base          │
Parse-16   5.275Mi ± 0%   5.275Mi ± 0%  ~ (p=0.579 n=10)

         │  old.txt7   │           old.txt8            │
         │  allocs/op  │  allocs/op   vs base          │
Parse-16   1.193k ± 0%   1.193k ± 0%  ~ (p=0.628 n=10)

Copy link
Collaborator

@korniltsev korniltsev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work! Huge improvements! Thanks a lot, I love it.

I left a bunch of comments but most are nitpicking.
Except the one about getPointerToStackFrames, this is the only thing we need to fix/clarify to merge the PR.

reader/reader.go Show resolved Hide resolved
parser/chunk.go Outdated Show resolved Hide resolved
parser/types.go Outdated Show resolved Hide resolved
parser/types.go Show resolved Hide resolved
parser/types.go Show resolved Hide resolved
parser/checkpoint.go Outdated Show resolved Hide resolved
@korniltsev
Copy link
Collaborator

korniltsev commented Aug 5, 2023

Would you like to add more jfr files for benchmarking/testing?

@zdyj3170101136
Copy link
Contributor Author

Would you like to add more jfr files for benchmarking/testing?

i add a jfr file of async-profiler 2.10 output, a typical usage in my company:

asprof -e cpu -i 10ms --alloc 512k --wall 200ms --lock 10ms -d 60

parser/chunk.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@korniltsev korniltsev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's resolve conflicts and address #22 (comment)

@korniltsev
Copy link
Collaborator

Code looks good now.
I will do some more testing on our jfrs in 1-2 days and merge it.

@zdyj3170101136
Copy link
Contributor Author

@korniltsev do not forget it

@korniltsev korniltsev merged commit e11aad6 into grafana:main Aug 11, 2023
1 check passed
@korniltsev
Copy link
Collaborator

Great Job! Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants