Fluentd 0.14.13 gets buffer overflow errors with the same settings as Fluentd 0.12.32 #1485

Closed
@qingling128

Description

During some basic load testing, we found that the upgrade to Fluentd 0.14.13 from 0.12.32 caused buffer overflow errors to be triggered very frequently.

[Error]

2017-03-01 17:02:20 +0000 [warn]: #0 failed to write data into buffer by buffer overflow action=:block

[Steps to reproduce]

  1. Create a GCP instance (1 vCPU, 3.75GB, Debian GNU/Linux 8)
  2. Install docker:
$ sudo apt-get install docker.io
  3. Create some test logs with the logs generator container:
$ mkdir ~/logs
$ sudo docker run -i -e "LOGS_GENERATOR_DURATION=1s" -e "LOGS_GENERATOR_LINES_TOTAL=1000000" gcr.io/google_containers/logs-generator:v0.1.0 2>&1 | awk '{print "{\"log\":\"" $0 "\"}"}' > ~/logs/log.log
  4. Create the config:
$ mkdir ~/config.d
$ cat ~/config.d/config.conf
<match fluent.**>
  type null
</match>

<source>
  type tail
  format json
  time_key time
  path /var/log/containers/*.log
  pos_file /var/log/gcp-containers.log.pos
  time_format %Y-%m-%dT%H:%M:%S.%NZ
  tag reform.*
  read_from_head true
</source>

<match reform.**>
  type record_reformer
  enable_ruby true
  tag kubernetes.${tag_suffix[4].split('-')[0..-2].join('-')}
</match>

# We use 2 output stanzas - one to handle the container logs and one to handle
# the node daemon logs, the latter of which explicitly sends its logs to the
# compute.googleapis.com service rather than container.googleapis.com to keep
# them separate since most users don't care about the node logs.
<match kubernetes.**>
  type google_cloud
  # Set the buffer type to file to improve the reliability and reduce the memory consumption
  buffer_type file
  buffer_path /var/log/fluentd-buffers/kubernetes.containers.buffer
  # Set queue_full action to block because we want to pause gracefully
  # in case of the off-the-limits load instead of throwing an exception
  buffer_queue_full_action block
  # Set the chunk limit conservatively to avoid exceeding the GCL limit
  # of 10MiB per write request.
  buffer_chunk_limit 2M
  # Cap the combined memory usage of this buffer and the one below to
  # 2MiB/chunk * (6 + 2) chunks = 16 MiB
  buffer_queue_limit 6
  # Never wait more than 5 seconds before flushing logs in the non-error case.
  flush_interval 5s
  # Never wait longer than 30 seconds between retries.
  max_retry_wait 30
  # Disable the limit on the number of retries (retry forever).
  disable_retry_limit
  # Use multiple threads for processing.
  num_threads 2
</match>

# Keep a smaller buffer here since these logs are less important than the user's
# container logs.
<match **>
  type google_cloud
  detect_subservice false
  buffer_type file
  buffer_path /var/log/fluentd-buffers/kubernetes.system.buffer
  buffer_queue_full_action block
  buffer_chunk_limit 2M
  buffer_queue_limit 2
  flush_interval 5s
  max_retry_wait 30
  disable_retry_limit
  num_threads 2
</match>
  5. Start the container and check the logs. Run these from the home directory so that $(pwd)/logs resolves to ~/logs.
# Fluentd 0.12.32
export FLUENTD_ID=`sudo docker run -d -v $(pwd)/logs:/var/log/containers -v ~/config.d:/etc/fluent/config.d qingling128/testing:buffer-overflow-test-0-12-32`
sudo docker logs -f $FLUENTD_ID

# Fluentd 0.14.13
export FLUENTD_ID=`sudo docker run -d -v $(pwd)/logs:/var/log/containers -v ~/config.d:/etc/fluent/config.d qingling128/testing:buffer-overflow-test-0-14-13`
sudo docker logs -f $FLUENTD_ID
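
To compare the two runs by number rather than by eye, the warnings can be counted. This is only a minimal sketch: `count_overflow_warnings` is a hypothetical helper, not part of Fluentd or Docker, and it simply matches the warning text shown under [Error].

```shell
# Hypothetical helper: count buffer overflow warnings in a Fluentd log
# stream read from stdin.
count_overflow_warnings() {
  # grep -c prints the number of matching lines. It exits non-zero when
  # nothing matches, so "|| true" keeps the helper usable under "set -e".
  grep -c 'failed to write data into buffer by buffer overflow' || true
}

# Usage against a container started as in the last step
# (assumes FLUENTD_ID is still set):
#   sudo docker logs "$FLUENTD_ID" 2>&1 | count_overflow_warnings
```

Running it once per image gives one number for 0.12.32 and one for 0.14.13 under the same config.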

Both images are built from fluentd-gcp-image; the only difference is the Fluentd version in the Gemfile. buffer-overflow-test-0-12-32 runs Fluentd 0.12.32 and processes the logs successfully with the config settings above. buffer-overflow-test-0-14-13 runs Fluentd 0.14.13 and cannot process the logs without buffer overflow errors unless we increase the buffer size and thread count to:

buffer_chunk_limit 8M
buffer_queue_limit 32
num_threads 8
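
For scale, here is a rough sketch of how much larger the increased buffer is for the container-log output. It assumes nominal capacity is simply chunk limit times queue limit, the same arithmetic the config comment uses; how 0.14 maps these legacy parameters internally is not covered here. `buffer_capacity_mib` is a hypothetical helper.

```shell
# Nominal buffer capacity in MiB = chunk limit (MiB) * queue limit (chunks).
# Assumption: capacity is just the product of the two settings.
buffer_capacity_mib() {
  echo $(( $1 * $2 ))
}

original=$(buffer_capacity_mib 2 6)    # buffer_chunk_limit 2M, buffer_queue_limit 6
increased=$(buffer_capacity_mib 8 32)  # buffer_chunk_limit 8M, buffer_queue_limit 32

# Integer division, so the ratio is rounded down.
echo "original: ${original} MiB, increased: ${increased} MiB, ratio: $(( increased / original ))x"
# prints "original: 12 MiB, increased: 256 MiB, ratio: 21x"
```

Under that assumption, 0.14.13 needs roughly 21 times the nominal buffer space, plus four times as many flush threads, to handle the same input.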

Labels: bug, v0.14