Description
During some basic load testing, we found that upgrading Fluentd from 0.12.32 to 0.14.13 caused buffer overflow errors to be triggered very frequently.
[Error]
2017-03-01 17:02:20 +0000 [warn]: #0 failed to write data into buffer by buffer overflow action=:block
[Steps to reproduce]
- Create a GCP instance (1 vCPU, 3.75GB, Debian GNU/Linux 8)
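An equivalent instance can be created with the gcloud CLI, for example (instance name and zone are arbitrary; n1-standard-1 corresponds to 1 vCPU / 3.75GB):
$ gcloud compute instances create fluentd-buffer-test --machine-type n1-standard-1 --image-family debian-8 --image-project debian-cloud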
- Install docker:
$ sudo apt-get install docker.io
- Create some test logs with the logs generator container:
$ mkdir ~/logs
$ sudo docker run -i -e "LOGS_GENERATOR_DURATION=1s" -e "LOGS_GENERATOR_LINES_TOTAL=1000000" gcr.io/google_containers/logs-generator:v0.1.0 2>&1 | awk '{print "{\"log\":\"" $0 "\"}"}' > ~/logs/log.log
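Each line of ~/logs/log.log ends up as a JSON object with a single log field wrapping one generated line (the exact message format depends on the logs-generator version), roughly:
{"log":"<one line of generated log output>"}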
- Create configs
$ mkdir ~/config.d
$ cat ~/config.d/config.conf
<match fluent.**>
  type null
</match>

<source>
  type tail
  format json
  time_key time
  path /var/log/containers/*.log
  pos_file /var/log/gcp-containers.log.pos
  time_format %Y-%m-%dT%H:%M:%S.%NZ
  tag reform.*
  read_from_head true
</source>

<match reform.**>
  type record_reformer
  enable_ruby true
  tag kubernetes.${tag_suffix[4].split('-')[0..-2].join('-')}
</match>

# We use 2 output stanzas - one to handle the container logs and one to handle
# the node daemon logs, the latter of which explicitly sends its logs to the
# compute.googleapis.com service rather than container.googleapis.com to keep
# them separate since most users don't care about the node logs.
<match kubernetes.**>
  type google_cloud
  # Set the buffer type to file to improve the reliability and reduce the memory consumption
  buffer_type file
  buffer_path /var/log/fluentd-buffers/kubernetes.containers.buffer
  # Set queue_full action to block because we want to pause gracefully
  # when the load exceeds the limits instead of throwing an exception
  buffer_queue_full_action block
  # Set the chunk limit conservatively to avoid exceeding the GCL limit
  # of 10MiB per write request.
  buffer_chunk_limit 2M
  # Cap the combined memory usage of this buffer and the one below to
  # 2MiB/chunk * (6 + 2) chunks = 16 MiB
  buffer_queue_limit 6
  # Never wait more than 5 seconds before flushing logs in the non-error case.
  flush_interval 5s
  # Never wait longer than 30 seconds between retries.
  max_retry_wait 30
  # Disable the limit on the number of retries (retry forever).
  disable_retry_limit
  # Use multiple threads for processing.
  num_threads 2
</match>

# Keep a smaller buffer here since these logs are less important than the user's
# container logs.
<match **>
  type google_cloud
  detect_subservice false
  buffer_type file
  buffer_path /var/log/fluentd-buffers/kubernetes.system.buffer
  buffer_queue_full_action block
  buffer_chunk_limit 2M
  buffer_queue_limit 2
  flush_interval 5s
  max_retry_wait 30
  disable_retry_limit
  num_threads 2
</match>
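For reference, under Fluentd 0.14 these v0.12-style buffer parameters go through the compatibility layer. A rough sketch of the native 0.14 equivalent of the buffer settings in the first google_cloud stanza (parameter names taken from the 0.14 buffer API; not part of the original config):
<buffer>
  @type file
  path /var/log/fluentd-buffers/kubernetes.containers.buffer
  chunk_limit_size 2M
  queue_limit_length 6
  overflow_action block
  flush_interval 5s
  retry_max_interval 30
  retry_forever true
  flush_thread_count 2
</buffer>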
- Start each container and check the logs:
$ export FLUENTD_ID=`sudo docker run -d -v $(pwd)/logs:/var/log/containers -v ~/config.d:/etc/fluent/config.d qingling128/testing:buffer-overflow-test-0-12-32`
$ sudo docker logs -f $FLUENTD_ID
$ export FLUENTD_ID=`sudo docker run -d -v $(pwd)/logs:/var/log/containers -v ~/config.d:/etc/fluent/config.d qingling128/testing:buffer-overflow-test-0-14-13`
$ sudo docker logs -f $FLUENTD_ID
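With the 0.14.13 image, the warning shown at the top of this report shows up repeatedly in the container output; a quick way to spot it:
$ sudo docker logs $FLUENTD_ID 2>&1 | grep "buffer overflow"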
Both images are built from fluentd-gcp-image; the only difference is the Fluentd version in the Gemfile. buffer-overflow-test-0-12-32 has Fluentd 0.12.32 and can process the logs successfully with the config settings above. buffer-overflow-test-0-14-13 has Fluentd 0.14.13 and can't process the logs without hitting the buffer overflow error unless we increase the buffer settings to:
buffer_chunk_limit 8M
buffer_queue_limit 32
num_threads 8
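For scale, assuming only the kubernetes.** stanza is changed, that raises the cap on the containers buffer from 2MiB/chunk * 6 chunks = 12 MiB to 8MiB/chunk * 32 chunks = 256 MiB of queued chunks.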