LineReader: Reuse temporary buffer to reduce per-line allocation #27782
Conversation
@@ -0,0 +1,66 @@
package readfile
Not sure if you all have preferences on where/how to include benchmarks. Happy to move/remove this as desired.
It's fine where you put it.
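For context, the benchmarks added in bench_test.go exercise this package's `LineReader`; the sketch below is only an illustration of the general shape of such a benchmark (generated input, `b.ReportAllocs` to surface alloc/op and allocs/op), not the actual test code, and it uses `bufio.Scanner` as a stand-in for the reader under test.

```go
package readfile_test

import (
	"bufio"
	"bytes"
	"strings"
	"testing"
)

// BenchmarkReadLines is an illustrative sketch: it reads generated lines and
// reports allocations, which is what the alloc/op and allocs/op columns in
// the results below compare.
func BenchmarkReadLines(b *testing.B) {
	input := []byte(strings.Repeat("a short log line\n", 1000))
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		scanner := bufio.NewScanner(bytes.NewReader(input))
		for scanner.Scan() {
			_ = scanner.Bytes()
		}
		if err := scanner.Err(); err != nil {
			b.Fatal(err)
		}
	}
}
```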
💚 Build Succeeded
jenkins run tests
Hey @kvch thanks for taking a look! I think I fixed the lint error (missing license header). Could we test again? 😄
/test
Pinging @elastic/agent (Team:Agent) |
Co-authored-by: Noémi Ványi <kvch@users.noreply.github.com>
Thank you!
) (cherry picked from commit 0e3788b)
* master:
  [Auditbeat] scanner honor include_files (elastic#27722)
  chore(ci): remove not used param when triggering e2e tests (elastic#27823)
  LineReader: Reuse temporary buffer to reduce per-line allocation (elastic#27782)
) (#27824) (cherry picked from commit 0e3788b) Co-authored-by: Brad Moylan <moylan.brad@gmail.com>
…stic#27782)
What does this PR do?
Previously, the `LineReader` would allocate a `[]byte` of size `config.BufferSize` before decoding each line. The underlying array's size is fixed, so `outBuffer.Append` retains all of it even when the appended bytes are much shorter. With this change, we store a single `tempBuffer []byte` that is reused across lines anywhere we need temporary storage. Converting to `outBuffer.Write` forces the buffer to copy data out of `tempBuffer`, but it only allocates space for the bytes actually written.
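A minimal, self-contained sketch of the pattern (not the actual libbeat code; `lineReader`, `decodeLine`, and the use of `bytes.Buffer` here are illustrative stand-ins): one scratch buffer is allocated once and reused across lines, and `Write` copies only the decoded bytes into the output buffer instead of retaining the full-size backing array.

```go
package main

import (
	"bytes"
	"fmt"
)

// lineReader is a hypothetical stand-in for the real LineReader: the point is
// the reusable tempBuffer plus outBuffer.Write instead of a per-line make +
// Append.
type lineReader struct {
	bufferSize int
	tempBuffer []byte        // reused across lines; allocated at most once
	outBuffer  *bytes.Buffer // stand-in for the real output buffer type
}

// decodeLine is a placeholder for the real charset-decoding step; here it
// just copies src into dst and reports how many bytes were produced.
func decodeLine(dst, src []byte) (int, error) {
	return copy(dst, src), nil
}

func (r *lineReader) readLine(line []byte) error {
	// Before this change: buf := make([]byte, r.bufferSize) on every line,
	// then appending buf kept the whole bufferSize-sized backing array alive.
	if cap(r.tempBuffer) < r.bufferSize {
		r.tempBuffer = make([]byte, r.bufferSize)
	}
	n, err := decodeLine(r.tempBuffer[:r.bufferSize], line)
	if err != nil {
		return err
	}
	// Write copies the n decoded bytes out of tempBuffer, so the output
	// buffer only grows by what was actually written.
	_, err = r.outBuffer.Write(r.tempBuffer[:n])
	return err
}

func main() {
	r := &lineReader{bufferSize: 16 * 1024, outBuffer: &bytes.Buffer{}}
	for _, l := range []string{"short line\n", "another line\n"} {
		if err := r.readLine([]byte(l)); err != nil {
			panic(err)
		}
	}
	fmt.Printf("buffered %d bytes\n", r.outBuffer.Len())
}
```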
Why is it important?
In our production environment, we run beats with k8s-enforced memory limits and are trying to resolve OOMs. The `LineReader` code path contributes a significant amount of memory allocation. The benchmarks added in bench_test.go show that this change reduces the memory profile across various line lengths:
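```
goos: darwin
goarch: amd64
pkg: github.com/elastic/beats/v7/libbeat/reader/readfile
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz

name                                 old time/op    new time/op    delta
EncoderReader/buffer-sized_lines-16     125µs ± 3%      94µs ± 9%  -24.55%  (p=0.008 n=5+5)
EncoderReader/short_lines-16           52.6µs ± 4%    36.3µs ±10%  -30.88%  (p=0.008 n=5+5)
EncoderReader/long_lines-16            1.82ms ± 2%    1.70ms ±10%     ~     (p=0.151 n=5+5)
EncoderReader/skip_lines-16             133µs ± 3%     140µs ± 8%     ~     (p=0.151 n=5+5)

name                                 old alloc/op   new alloc/op   delta
EncoderReader/buffer-sized_lines-16     442kB ± 0%     239kB ± 0%  -46.07%  (p=0.000 n=4+5)
EncoderReader/short_lines-16            118kB ± 0%      15kB ± 0%  -87.27%  (p=0.008 n=5+5)
EncoderReader/long_lines-16            8.73MB ± 0%    7.63MB ± 0%  -12.62%  (p=0.000 n=4+5)
EncoderReader/skip_lines-16             270kB ± 0%     220kB ± 0%  -18.58%  (p=0.008 n=5+5)

name                                 old allocs/op  new allocs/op  delta
EncoderReader/buffer-sized_lines-16       718 ± 0%       519 ± 0%  -27.72%  (p=0.008 n=5+5)
EncoderReader/short_lines-16              522 ± 0%       421 ± 0%  -19.35%  (p=0.008 n=5+5)
EncoderReader/long_lines-16             2.65k ± 0%     1.58k ± 0%  -40.54%  (p=0.008 n=5+5)
EncoderReader/skip_lines-16               420 ± 0%       419 ± 0%   -0.24%  (p=0.008 n=5+5)
```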
Checklist
- I have made corresponding changes to the documentation
- I have made corresponding change to the default configuration files
- I have added an entry in `CHANGELOG.next.asciidoc` or `CHANGELOG-developer.next.asciidoc`.
Related issues