Improve the per-VU buffer pool #2879
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2879 +/- ##
==========================================
+ Coverage 76.84% 76.86% +0.02%
==========================================
Files 229 224 -5
Lines 16924 16913 -11
==========================================
- Hits 13005 13000 -5
+ Misses 3077 3075 -2
+ Partials 842 838 -4
You are right, done!
…t/k6 into feature/buffer-sync-pool
setupData  []byte
console    *console
BufferPool *lib.BufferPool
Why did you add a `BufferPool` to the `js.Runner`?
I added it to reuse the same pool for all VUs. I suppose the `js.Runner` instance is unique per load test execution.
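For anyone following along, here is a minimal, hypothetical sketch of the wiring being discussed: the runner owns a single `sync.Pool`-backed buffer pool and hands the same instance to every VU it creates. The `Runner`, `VU`, and `NewVU` names below are stand-ins for illustration, not the actual k6 types.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// BufferPool is a thin wrapper around sync.Pool that hands out *bytes.Buffer values.
type BufferPool struct{ p sync.Pool }

func NewBufferPool() *BufferPool {
	return &BufferPool{p: sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}}
}

func (bp *BufferPool) Get() *bytes.Buffer  { return bp.p.Get().(*bytes.Buffer) }
func (bp *BufferPool) Put(b *bytes.Buffer) { b.Reset(); bp.p.Put(b) }

// Runner stands in for js.Runner: one pool per test run.
type Runner struct{ BufferPool *BufferPool }

// VU stands in for a single virtual user; it borrows the runner's pool.
type VU struct{ pool *BufferPool }

func (r *Runner) NewVU() *VU { return &VU{pool: r.BufferPool} }

func main() {
	r := &Runner{BufferPool: NewBufferPool()}
	vu1, vu2 := r.NewVU(), r.NewVU()
	fmt.Println(vu1.pool == vu2.pool) // true: buffers released by one VU can be reused by another
}
```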
Yes, but is that actually better for performance than a separate `BufferPool` per VU? In the original simplistic checks I did for #794, a global `sync.Pool` actually performed a lot worse than a per-VU `sync.Pool`.
Maybe that has changed in recent Go and k6 versions, but we'd need to see a new benchmark comparison between both to justify the decision to go with a global pool. And, as a small nitpick, even if the global pool is performing better now, it should probably still be unexported.
I ran the test (BenchmarkReadResponseBody) with both configurations, and having a global pool improves both memory usage and execution time. The first run (with the global pool) allocates less memory and completes more iterations.
I think the result makes sense: because we reuse the pooled objects more, we don't have to allocate as much memory or spend time creating new objects.
```
Global pool for all VUs
BenchmarkReadResponseBody-10 1 30455673750 ns/op 138908381552 B/op 69924811 allocs/op
checks.........................: 100.00% ✓ 222192 ✗ 0
data_received..................: 15 GB 507 MB/s
data_sent......................: 21 MB 689 kB/s
http_req_blocked...............: avg=16.18µs min=0s med=1µs max=75.67ms p(90)=2µs p(95)=3µs
http_req_connecting............: avg=5.29µs min=0s med=0s max=31.02ms p(90)=0s p(95)=0s
http_req_duration..............: avg=2.31ms min=74µs med=1.14ms max=93.67ms p(90)=5.53ms p(95)=8.32ms
{ expected_response:true }...: avg=2.31ms min=74µs med=1.14ms max=93.67ms p(90)=5.53ms p(95)=8.32ms
http_req_failed................: 0.00% ✓ 0 ✗ 222192
http_req_receiving.............: avg=420.75µs min=9µs med=34µs max=56.95ms p(90)=938µs p(95)=1.88ms
http_req_sending...............: avg=27.8µs min=1µs med=3µs max=69.87ms p(90)=15µs p(95)=43µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=1.86ms min=60µs med=860µs max=93.54ms p(90)=4.62ms p(95)=6.87ms
http_reqs......................: 222192 7378.159847/s
iteration_duration.............: avg=108.23ms min=101.33ms med=105.72ms max=204.63ms p(90)=117.13ms p(95)=121.4ms
iterations.....................: 27774 922.269981/s
vus............................: 100 min=100 max=100
vus_max........................: 100 min=100 max=100
Individual pool per VU
BenchmarkReadResponseBody-10 1 30450333375 ns/op 174401310944 B/op 73585506 allocs/op
checks.........................: 100.00% ✓ 217008 ✗ 0
data_received..................: 15 GB 495 MB/s
data_sent......................: 20 MB 673 kB/s
http_req_blocked...............: avg=27µs min=0s med=1µs max=44.97ms p(90)=2µs p(95)=3µs
http_req_connecting............: avg=14.71µs min=0s med=0s max=42.62ms p(90)=0s p(95)=0s
http_req_duration..............: avg=2.99ms min=75µs med=1.36ms max=106.88ms p(90)=7.15ms p(95)=10.76ms
{ expected_response:true }...: avg=2.99ms min=75µs med=1.36ms max=106.88ms p(90)=7.15ms p(95)=10.76ms
http_req_failed................: 0.00% ✓ 0 ✗ 217008
http_req_receiving.............: avg=590.9µs min=10µs med=59µs max=71.81ms p(90)=1.14ms p(95)=2.56ms
http_req_sending...............: avg=34.37µs min=1µs med=3µs max=64.95ms p(90)=15µs p(95)=45µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=2.37ms min=61µs med=1.02ms max=92.94ms p(90)=5.8ms p(95)=8.92ms
http_reqs......................: 217008 7209.308715/s
iteration_duration.............: avg=110.64ms min=101.37ms med=107.07ms max=227.74ms p(90)=121.7ms p(95)=129.85ms
iterations.....................: 27126 901.163589/s
vus............................: 100 min=100 max=100
vus_max........................: 100 min=100 max=100
```
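To make the comparison concrete, here is a small, hypothetical Go benchmark sketch (not the actual `BenchmarkReadResponseBody`) that contrasts a single shared pool against one pool per goroutine; the 64 KiB body size and the helper names are made up for illustration.

```go
package pool

import (
	"bytes"
	"sync"
	"testing"
)

var body = make([]byte, 64*1024) // stand-in for a 64 KiB response body

func newPool() *sync.Pool {
	return &sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}
}

// useBuffer simulates copying a response body into a pooled buffer and returning it.
func useBuffer(p *sync.Pool) {
	buf := p.Get().(*bytes.Buffer)
	buf.Write(body)
	buf.Reset()
	p.Put(buf)
}

// One pool shared by every goroutine, analogous to the global pool on js.Runner.
func BenchmarkSharedPool(b *testing.B) {
	shared := newPool()
	b.ReportAllocs()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			useBuffer(shared)
		}
	})
}

// One pool per goroutine, analogous to the previous per-VU pool.
func BenchmarkPerGoroutinePool(b *testing.B) {
	b.ReportAllocs()
	b.RunParallel(func(pb *testing.PB) {
		local := newPool()
		for pb.Next() {
			useBuffer(local)
		}
	})
}
```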
Changes look good to me, thanks 👍
I am postponing the final decision about where the buffer sync pool should live to Ned, since I feel he has better context.
OK, so I finally had time to re-run my original benchmarks from #794 (comment), with these 4 configurations:
Some details about how I ran all of the tests:
As you can see, the test has a few significant differences from the one you added to the benchmarks. It is much, much larger in terms of VUs and iterations, but it also expresses the work in terms of test iterations, not in terms of test duration. So, assuming the server is roughly constant, a more performant k6 implementation will be done with the script sooner. And having the server outside of the measured k6 process helps us to (mostly) isolate the benchmark to just the k6 performance.

So, it seems like the local per-VU

However, things are more complicated when we compare 1. and 2. 😕 A global

Personally, I would be fine with merging the PR as it is, since having a ~2.5x memory reduction is probably worth the ~10% CPU increase/throughput decrease (depending on what we measure and compare). We can just leave #794 open and investigate some of the other potential improvements from this comment (or others) at some later point? Alternatively, we can try to see if some further simple improvement won't push things enough so there are no downsides to merging the PR?

@grafana/k6-core, what do you think? @imiric, you recently did grafana/k6-docs#992 and we now have https://github.com/grafana/k6-benchmarks, maybe it's worth it to run the benchmarks with some of the same variants I tested here?

FWIW, I probably won't be able to dedicate time again for a more thorough investigation here or more benchmarks for at least the next 2 weeks 😞
Yes, the GC now has to run to remove those buffers, so it consumes some CPU when it runs; you can see it using pprof. In any case, the memory optimization is bigger than the CPU cost.
Sure, I committed the test with a small number of VUs and iterations to avoid longer builds, but of course I also ran it with bigger numbers and the result was the same: the memory optimization, plus the increase in GC activity related to this kind of pool.
Nice work @davidpst! 👏 I just have one minor comment.
I ran the script from k6-benchmarks on `master` (367163b) and this branch (b4a0314) on my local machine, and noticed the same behavior you guys did. This branch has much lower memory usage, at the slight cost of higher CPU usage and lower RPS.
See the results here. The graphs are a bit wonky, since the RAM line is squashed due to the higher order of RPS, but the effect is still visible. Otherwise, GitHub renders CSVs nicely in tables. :)
Question: would it be a good idea to make the new `BufferPool` the default implementation, but give the user the choice to fall back to the previous behavior (via an env var), if they decide they'd rather not sacrifice CPU/RPS? I know we'd like to abandon the `bpool` dependency, but this approach might be safer in some circumstances.
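A rough, hypothetical sketch of what that escape hatch could look like, assuming an invented `K6_LEGACY_BUFFER_POOL` environment variable; the interface and type names do not exist in k6, and the "legacy" branch here just allocates fresh buffers rather than pulling in `bpool`.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"sync"
)

// bufferSource is the minimal surface both implementations would share.
type bufferSource interface {
	Get() *bytes.Buffer
	Put(*bytes.Buffer)
}

// syncPool is the new sync.Pool-backed behavior from this PR (simplified).
type syncPool struct{ p sync.Pool }

func (s *syncPool) Get() *bytes.Buffer  { return s.p.Get().(*bytes.Buffer) }
func (s *syncPool) Put(b *bytes.Buffer) { b.Reset(); s.p.Put(b) }

// freshAlloc stands in for the fallback path; a real implementation would
// keep using the previous bpool-based pool instead of allocating every time.
type freshAlloc struct{}

func (freshAlloc) Get() *bytes.Buffer { return new(bytes.Buffer) }
func (freshAlloc) Put(*bytes.Buffer)  {}

// newBufferSource picks the implementation based on the hypothetical env var.
func newBufferSource() bufferSource {
	if os.Getenv("K6_LEGACY_BUFFER_POOL") != "" {
		return freshAlloc{}
	}
	return &syncPool{p: sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}}
}

func main() {
	src := newBufferSource()
	buf := src.Get()
	buf.WriteString("hello")
	fmt.Println(buf.Len()) // 5
	src.Put(buf)
}
```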
This still requires more thorough testing, like @na-- says, but I don't think these additional benchmarks should block this PR from merging. We've seen that the memory improvement is substantial, and we can continue testing this on `master`.
@davidpst After an internal discussion with the team, we decided it was best to leave this PR open for a while, so that we can run a few more tests with it.
Namely, we'd like to test the performance with https://github.com/valyala/bytebufferpool, and see if we can reduce the CPU/RPS impact.
This will happen sometime in the upcoming weeks, but I can't say exactly when. Sorry about the delay!
Here are the benchmark results when using https://github.com/valyala/bytebufferpool. It's a negligible improvement in memory usage, practically within the margin of error, but it has an even bigger CPU and RPS penalty than a plain `sync.Pool`.

I also tried https://github.com/libp2p/go-buffer-pool, which does something similar to what is mentioned in this comment. It manages a pool of

It's slightly better than bytebufferpool and a single

So I don't see any conclusive improvements with these minor changes. Maybe we'll need a custom buffer implementation, like @na-- suggested, but that will have to wait for another time, or for someone else to pick it up.
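For reference, this is roughly how `valyala/bytebufferpool` reads when dropped in behind a Get/Put pair; it's only an illustrative sketch of the swap, not the code that was actually benchmarked.

```go
package main

import (
	"fmt"

	"github.com/valyala/bytebufferpool"
)

func main() {
	buf := bytebufferpool.Get() // *bytebufferpool.ByteBuffer from the package-level pool
	buf.WriteString("response body goes here")
	fmt.Println(buf.Len())
	bytebufferpool.Put(buf) // Put resets the buffer before returning it to the pool
}
```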
Let's merge it then 🤷♂️; we can keep the issue open for now, since we can probably claw back some of that CPU inefficiency with a bit of effort.
Sorry, forgot to re-approve this.
Thanks for your work @davidpst! 🙇
We detected an incompatibility with k6-core: (https://github.com/grafana/xk6-browser/actions/runs/4434049483/attempts/2) - BPool becomes BufferPool in grafana/k6#2879. This PR fixes the issue. Co-authored-by: ka3de <daniel.jimenez@grafana.com>
This is an implementation of a shared buffer pool for all the VUs using `sync.Pool`. It aims to improve the memory consumption discussed in #794.
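As a rough illustration of where such a pool pays off, here is a hedged, self-contained sketch of reading an HTTP response body through a shared `sync.Pool` of `*bytes.Buffer` values; the names and structure are simplified and are not the actual k6 code.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

// A shared pool of *bytes.Buffer, in the spirit of the pool this PR introduces.
var bufPool = sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}

// readBody copies the response body into a pooled buffer instead of allocating
// a fresh buffer per request, then copies the bytes out before recycling it.
func readBody(resp *http.Response) ([]byte, error) {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset()
		bufPool.Put(buf)
	}()
	if _, err := io.Copy(buf, resp.Body); err != nil {
		return nil, err
	}
	out := make([]byte, buf.Len())
	copy(out, buf.Bytes())
	return out, nil
}

func main() {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "hello from the test server")
	}))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := readBody(resp)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```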