Multiline logs split when rotated by Kubelet (containerd) #4094

Closed
m5lapp opened this issue Mar 13, 2023 · 5 comments
Labels
stale, waiting-for-user (Similar to "moreinfo", but especially need feedback from user)

Comments

@m5lapp

m5lapp commented Mar 13, 2023

Describe the bug

We are running Fluentd in a Kubernetes environment that uses the containerd runtime, with logs rotated by the Kubelet when they reach 10MB in size and the last four rotated files plus the current log retained. When rotation happens while a large multi-line log is being written, so that the entry is split across the old and the new log file, Fluentd picks it up as many single-line logs rather than one multi-line log.

To Reproduce

  1. Run Kubelet with the containerd runtime
  2. Configure Kubelet to rotate logs with the following two parameters (see the example configuration after this list):
    1. containerLogMaxFiles: 5
    2. containerLogMaxSize: 10Mi
  3. When the log file is close to the 10MB limit, start writing a large multi-line log
    1. The first line should begin with the tag [SKY_LOG_START] as per the configuration below
  4. Ensure that the multi-line log spans both the old and the new rotated logs files
    1. The multi-line log should end with [SKY_LOG_END] as per the configuration below
  5. Observe in the output log file that the multi-line log you wrote has been captured as many single-line log entries and not one multi-line log
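
For reference, here is a minimal sketch of the Kubelet settings from step 2, assuming they are applied through a KubeletConfiguration file (only the rotation-related fields are shown; everything else about the file is omitted):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Keep the current log plus the last four rotated files
containerLogMaxFiles: 5
# Rotate a container log file once it reaches 10MiB
containerLogMaxSize: 10Mi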

Expected behavior

A multi-line log that gets split across two files when our logs are rotated should still be picked up as a single log entry by Fluentd.

Your Environment

- Fluentd version: v1.14.3
- TD Agent version: N/A
- Operating system: Red Hat Enterprise Linux 8.7
- Kernel version: 4.18.0-425.10.1.el8_7.x86_64
- Kubernetes version: 1.21.13
- containerd version: v1.6.4

Your Configuration

<system>
  log_level info
</system>

# Enable monitoring endpoint for readiness and liveness probes
<source>
    @type monitor_agent
    bind 0.0.0.0
    port 24220
</source>
<source>
    @type tail
    enable_stat_watcher false
    pos_file /var/lib/containerd/fluentd/fluentd-containers.pos
    tag containers.*
    read_from_head true
    # Read from /var/log/containers/ as filenames have details used by the
    # kubernetes_metadata plugin.
    path /var/log/containers/*.log
    refresh_interval 1s
    <parse>
        @type regexp
        keep_time_key true
        expression /^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$/
    </parse>
</source>

<filter containers.**>
    @type kubernetes_metadata
    verify_ssl false
</filter>

<filter containers.**>
    @type concat
    key log
    separator "\n"
    stream_identity_key kubernetes.pod_name
    multiline_start_regexp /\[SKY_LOG_START\]/
    multiline_end_regexp /\[SKY_LOG_END\]/
    flush_interval 5
    timeout_label @NORMAL
</filter>

<filter **>
    @type record_transformer
    <record>
        location slu
        environment sandbox
    </record>
</filter>

<match **>
    @type relabel
    @label @NORMAL
</match>

<label @NORMAL>
    <match **>
        @type file
        @id output
        delimiter SPACE
        output_tag false
        output_time false
        path /var/lib/containerd/fluentd/containers
        
        #Rotate the output file every hour
        time_slice_format %Y%m%d%H
        append true
        flush_interval 0s
        flush_at_shutdown true
        #Use local time for the output time slices
        localtime true
    </match>
</label>

## Prometheus Input Plugin Configuration

# Prometheus input plugin that exports metrics
<source>
    @type prometheus
</source>

# Prometheus input plugin that collects metrics from MonitorAgent
<source>
    @type prometheus_monitor
    <labels>
        host ${hostname}
    </labels>
</source>

# Prometheus input plugin that collects metrics for output plugin
<source>
    @type prometheus_output_monitor
    <labels>
        host ${hostname}
    </labels>
</source>
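
For context, the expression in the tail source above parses containerd's CRI log format, where each raw line carries a timestamp, the stream name, a partial/full tag, and the message, which the regexp captures as time, stream, logtag and log. Illustrative raw lines (values made up) would look like this before the concat filter joins the [SKY_LOG_START]...[SKY_LOG_END] block:

2023-03-10T10:33:01.123456789Z stdout F [SKY_LOG_START] payment request failed
2023-03-10T10:33:01.123457000Z stdout F   at com.example.Handler.process(Handler.java:42)
2023-03-10T10:33:01.123458000Z stdout F [SKY_LOG_END]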

Your Error Log

N/A

Additional context

When the Kubelet performs the log rotation, it swaps the inodes of the current log and the most recently rotated log as follows:

393234 -rw-r-----. 1 root root 9.7M Mar 10 10:33 0.log
393223 -rw-r-----. 1 root root  11M Mar 10 10:32 0.log.20230310-103201

# Rotation 1
393223 -rw-r-----. 1 root root 4.6M Mar 10 10:33 0.log
393234 -rw-r-----. 1 root root  11M Mar 10 10:33 0.log.20230310-103312

# Rotation 2
393234 -rw-r-----. 1 root root 6.7M Mar 10 10:44 0.log
393223 -rw-r-----. 1 root root  12M Mar 10 10:43 0.log.20230310-104354
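
One way to watch this swap while reproducing the issue (just a sketch; the exact path depends on the namespace, pod and container) is to poll the inode numbers during rotation:

# Poll the pod's log directory once per second; the inode numbers of 0.log
# and the newest rotated file exchange places when the Kubelet rotates the log.
watch -n 1 'ls -li /var/log/pods/<namespace>_<pod-name>_<pod-uid>/<container>/'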
@github-actions

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 7 days

@github-actions github-actions bot added the stale label Apr 13, 2023
@mmn01-sky

Are there any updates on this? Keen to get this resolved.

@daipom daipom removed the stale label Apr 13, 2023
@daipom
Contributor

daipom commented Apr 13, 2023

Sorry for the delay.

This looks like an issue with fluent-plugin-concat.

Isn't filter_concat timing out before in_tail reads the new file?
If you are getting a log message Timeout flush: xxx, that should be the cause.

It seems unlikely that the rotation would take more than 5 seconds, though, since you have set refresh_interval 1s, so in_tail should follow the new file almost immediately...

Anyway, to narrow down the cause, we should check whether this log message occurs during the rotation.

Example:

[info]: #0 following tail of /test/fluentd/input/test.log
[info]: #0 fluentd worker is now running worker=0
(datetime) test: {"message":"[START]Hello"}
[info]: #0 Timeout flush: test:default
[info]: #0 following tail of /test/fluentd/input/test2.log
(datetime) test: {"message":"World![END]"}
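
If that Timeout flush message does show up around the rotation, one thing worth trying, purely as a sketch and not a confirmed fix, is giving filter_concat more time to see the continuation lines from the newly rotated file, for example by raising its flush_interval:

<filter containers.**>
    @type concat
    key log
    separator "\n"
    stream_identity_key kubernetes.pod_name
    multiline_start_regexp /\[SKY_LOG_START\]/
    multiline_end_regexp /\[SKY_LOG_END\]/
    # Give in_tail time to pick up the rotated file before the pending
    # multi-line entry is flushed as separate records (30 is illustrative).
    flush_interval 30
    timeout_label @NORMAL
</filter>

This only widens the window, of course; it does not address the split itself, so confirming whether the Timeout flush message appears is still the more useful first step.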

@daipom daipom added the waiting-for-user label Apr 13, 2023
@github-actions

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 7 days

@github-actions github-actions bot added the stale label May 14, 2023
@github-actions

This issue was automatically closed because of stale in 7 days

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) May 22, 2023