Cherry-pick #14068 to 7.4: [Filebeat] Reduce memory usage of multiline #14073

Merged
merged 1 commit into from Oct 15, 2019

adriansr
Contributor

Cherry-pick of PR #14068 to the 7.4 branch. Original message:

The use of time.After when multiline is enabled and lines don't match the multiline pattern increases the memory usage (from 30MB to 1GB).

This extra memory is attributed to unexpired timers that the Go runtime allocates internally whenever time.After(duration) is called. According to the docs: "If efficiency is a concern, use NewTimer instead and call Timer.Stop if the timer is no longer needed."

It's not clear to me why this problem only appears when many lines in the input file don't match the pattern.

Reproduced with this config:

multiline.pattern: '^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\.\d{3}'
multiline.negate: true
multiline.match: after
multiline.max_lines: 50
max_bytes: 1048576

and a random input file generated by:

cat <<EOF > genrandom.py
import base64
import os
import sys

LINE_SIZE = 160
# Number of raw bytes that base64-encode to LINE_SIZE characters.
RAW_SIZE = int(LINE_SIZE * 3 / 4)

if len(sys.argv) != 2:
    print('Usage:', sys.argv[0], '<SIZE IN MEGABYTES>')
    sys.exit(1)

limit = int(sys.argv[1]) * 1024 * 1024
size = 0
f = open('/dev/urandom', 'rb')

while size < limit:
    # Progress goes to stderr (fd 2) so it stays out of the redirected log.
    os.write(2, '> written {0:.2f} MiB\r'.format(size / (1024.0 * 1024.0)).encode())
    raw = f.read(RAW_SIZE * 100)
    while len(raw) >= RAW_SIZE:
        part, raw = raw[:RAW_SIZE], raw[RAW_SIZE:]
        line = base64.b64encode(part).decode('ascii')
        print(line)
        size += len(line)
os.write(2, b'Done.\n')
EOF
python3 genrandom.py 1024 > 1gb.log

(cherry picked from commit ce651e0)
@adriansr adriansr merged commit 12ee6cd into elastic:7.4 Oct 15, 2019
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…14073)

(cherry picked from commit 572ee79)