Fluentd retains excessive amounts of memory after handling traffic peaks #1657

Closed
@joell

When setting up a simple two-part Fluentd pipeline, (TCP -> forwarder) -> (forwardee -> disk), and giving it 5 million JSON objects to process all at once, resident-set memory consumption jumps from an initial 30MB to between 200MB and 450MB and does not come back down after processing is complete. This is observed using version 2.3.5-1.el7 of the TD Agent RPM package running on CentOS 7. (The version of Fluentd in that package is 0.12.36.)

Steps to reproduce:

# define and start a simple TCP forwarder
$ cat > test-in.conf <<'EOF'
<source>
  @type  tcp
  tag    testing
  format json
  port   10130
</source>

<match **>
  @type forward
  require_ack_response true
  flush_interval 10s

  <server>
    host 127.0.0.1
    port 10131
  </server>
</match>
EOF
$ td-agent --no-supervisor -c test-in.conf &

# define and start a simple forwardee that logs to disk
$ cat > test-out.conf <<'EOF'
<source>
   @type forward
   port  10131
</source>

<match **>
  @type              file
  format             json
  path               "/tmp/fluent-test"
  flush_interval     10s
  buffer_chunk_limit 16m
  compress           gzip
</match>
EOF
$ td-agent --no-supervisor -c test-out.conf &

# observe initial memory consumption
$ ps -o pid,vsz,rss,cmd | grep '[t]d-'
 4254 431724 30544 /opt/td-agent/embedded/bin/ruby /sbin/td-agent --no-supervisor -c test-out.conf
 4259 498816 30052 /opt/td-agent/embedded/bin/ruby /sbin/td-agent --no-supervisor -c test-in.conf

# pump a bunch of data through it
$ seq 1 5000000 | sed 's/.*/{"num": &, "filler": "this is filler text to make the event larger"}/' > /dev/tcp/localhost/10130 &

# wait for all data to flush to disk and CPU to return to idle
$ watch -n1 'ls -lt /tmp/fluent-test.* | head -5'

# observe final memory consumption
$ ps -o pid,vsz,rss,cmd | grep '[t]d-'
 4338 628248 288972 /opt/td-agent/embedded/bin/ruby /sbin/td-agent --no-supervisor -c test-out.conf
 4343 1023868 461616 /opt/td-agent/embedded/bin/ruby /sbin/td-agent --no-supervisor -c test-in.conf

As you can see from the RSS numbers, each td-agent process started out at around 30MB, and they ended at ~290MB and ~460MB, respectively. Neither process releases that memory, even after waiting a while. (In the real-world staging system we initially discovered this on, memory consumption of the test-out.conf-equivalent configuration reached over 3GB, and the test-in.conf-equivalent was a Fluent Bit instance exhibiting a recently-fixed duplication bug.)
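
The before-and-after `ps` comparison can also be scripted, which makes it easier to sample RSS repeatedly while waiting to see whether the memory is ever released. The following is a minimal sketch, assuming a Linux `/proc` filesystem. The helper names `rss_kb` and `kb_to_mb`, and the optional second argument (an alternate proc root, useful for testing), are illustrative and not part of Fluentd or td-agent.

```shell
# Print the resident set size of a process in kB, as reported by the kernel.
# $1 = PID, $2 = proc root (hypothetical testing hook, defaults to /proc).
rss_kb() {
  awk '/^VmRSS:/ { print $2 }' "${2:-/proc}/$1/status"
}

# Convert kB to whole MB (integer division, so the result is rounded down).
kb_to_mb() {
  echo $(( $1 / 1024 ))
}

# Example usage: sample the forwarder (PID 4343 in the run above) every 10 seconds.
# while sleep 10; do echo "$(date +%T) $(kb_to_mb "$(rss_kb 4343)") MB"; done
```

Reading `VmRSS` from `/proc/<pid>/status` gives the same figure as the `rss` column of `ps`, in the same unit (kB), so the two can be compared directly.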

Reviewing a Fluentd-related Kubernetes issue during our own diagnostics, we noticed that the behavior we observed seemed similar to the Fluentd behavior described there when built without jemalloc. This led us to check whether the td-agent binary we were using was in fact linked with jemalloc. According to the FAQ, jemalloc is used when building the Treasure Data RPMs, and though we found jemalloc libraries installed on the system, we couldn't find any evidence of jemalloc in the running process's memory. Specifically, we tried the following things:

# given a td-agent process with PID 4343...

# no jemalloc shared library mentioned in the memory mapping
$ pmap 4343 | grep jemalloc
# ... so it doesn't look like it's dynamically linked

# grab the entire memory space and search it for references to jemalloc
$ gcore 4343
$ strings core.4343 | fgrep jemalloc

# ... but if it were statically linked, you'd expect to find some of these strings
$ strings /opt/td-agent/embedded/lib/libjemalloc.a | fgrep jemalloc

In short, this leads us to wonder... are the binaries invoked by td-agent actually linked with jemalloc? If they are not, is the old memory fragmentation problem that jemalloc solved what we are observing here? (And if they aren't, am I raising this issue in the wrong place, and if so where should I raise it?)
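
The shared-library half of the checks above can be wrapped into small helpers so they are easy to repeat against both td-agent processes. This is only a sketch, assuming Linux; `has_jemalloc_mapped` and `preload_of` are hypothetical names, and the optional second argument is an alternate proc root for testing. The `LD_PRELOAD` check rests on an assumption this report does not confirm: that a packaged service might inject the allocator through the environment rather than linking it into the Ruby binary.

```shell
# Succeed if the process has any mapping whose path mentions jemalloc.
# /proc/<pid>/maps is the same data that pmap presents.
# $1 = PID, $2 = proc root (hypothetical testing hook, defaults to /proc).
has_jemalloc_mapped() {
  grep -q 'jemalloc' "${2:-/proc}/$1/maps"
}

# Print the LD_PRELOAD value from the process's initial environment, if set.
# /proc/<pid>/environ is NUL-separated, so translate to newlines first.
preload_of() {
  tr '\0' '\n' < "${2:-/proc}/$1/environ" | sed -n 's/^LD_PRELOAD=//p'
}

# Example usage against the forwarder from the run above:
# has_jemalloc_mapped 4343 && echo "jemalloc mapped" || echo "jemalloc not mapped"
# preload_of 4343
```

Neither helper detects a statically linked allocator; for that case the `gcore` and `strings` comparison above is still the relevant check.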
