
Need the ability to limit disk buffer size and remove obsolete buffer files #2136

@yiwenshao

Description


Is your feature request related to a problem? Please describe.
I am using fluent-bit to collect logs. To ensure data integrity, I use "storage.type filesystem". However, I have encountered two problems.

  1. There is no way to limit the disk space used by an input. If the output blocks, the buffer directory will grow indefinitely.
  2. Old files in the buffer directory are not removed. Since it's hard to correlate a buffer directory with its input, it's also hard to find the right directory when removing obsolete files manually.
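To see how much space each input is actually consuming, a small script can walk storage.path and total the bytes under each per-input directory. This is a minimal sketch, assuming the layout described in this issue (one subdirectory per input instance, such as tail.0, directly under storage.path); `buffer_usage` is a hypothetical helper, not part of fluent-bit.

```python
import os
from pathlib import Path
from typing import Dict


def buffer_usage(storage_path: str) -> Dict[str, int]:
    """Return the total bytes stored under each per-input buffer directory.

    Assumes the layout <storage.path>/<input instance>/..., for example
    /data/hostpath/fluent-bit/buffer/tail.0/...
    Plain files directly under storage.path are ignored.
    """
    usage: Dict[str, int] = {}
    for entry in sorted(Path(storage_path).iterdir()):
        if not entry.is_dir():
            continue
        total = 0
        # Recurse in case an input directory contains subdirectories.
        for dirpath, _dirnames, filenames in os.walk(entry):
            for name in filenames:
                total += os.path.getsize(os.path.join(dirpath, name))
        usage[entry.name] = total
    return usage
```

For example, `buffer_usage("/data/hostpath/fluent-bit/buffer")` returns a mapping like `{"tail.0": ..., "tail.1": ...}`, which can be polled by a monitoring job until a real per-input limit exists.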

For example, my configuration is:

  • fluent-bit.conf
[SERVICE]
    # This is a commented line
    Daemon                    off
    flush                     1
    log_Level                 debug
    storage.path              /data/hostpath/fluent-bit/buffer
    storage.sync              normal
    storage.checksum          on
    storage.backlog.mem_limit 30M
    HTTP_Server               On
    HTTP_Listen               0.0.0.0
    HTTP_PORT                 2020

@INCLUDE conf/*.conf

  • The configuration in conf/:
[INPUT]
    Name              tail
    Mem_Buf_Limit     1M
    Buffer_Chunk_Size 32KB
    storage.type      filesystem
    Buffer_Max_Size   64KB
    Path              /data/hostpath/fluent-bit/logs/log.block
    DB                /data/hostpath/fluent-bit/pos/log.block.pos
    Tag               log.block.*
    Path_Key          file

[OUTPUT]
    Name        es
    Retry_Limit False
    Match       log.block.*
    Host        192.168.0.12
    Port        9200
    Index       test
    Type        _doc
  1. Suppose that Elasticsearch is temporarily unreachable (e.g. because of a network failure). fluent-bit will keep reading log files and storing records in /data/hostpath/fluent-bit/buffer/tail.0. Since no limit is imposed, the disk may fill up, which will harm the system.

  2. The files in the buffer are not deleted. If we change the path to /data/hostpath/fluent-bit/logs/log.block.v2 and restart fluent-bit, it will still use /data/hostpath/fluent-bit/buffer/tail.0 as its buffer. In other words, the configuration instructs fluent-bit to collect logs from /data/hostpath/fluent-bit/logs/log.block.v2, while fluent-bit may still send contents of /data/hostpath/fluent-bit/logs/log.block to the output. We can delete /data/hostpath/fluent-bit/buffer/tail.0 manually to avoid this; however, it becomes hard to find which tail.x directory to delete as the number of inputs in conf/* grows.
    Maybe using the tag name instead of tail.x as the buffer directory name would avoid this? For example, the buffer path for the above configuration would become log.block.* instead of tail.0.
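A tag-based directory name would need some escaping, since tags can contain characters such as `*` that are awkward in a path. The sketch below is purely illustrative: `buffer_dir_for_tag` is a hypothetical helper showing one possible mapping from a configured Tag to a directory name; fluent-bit itself names buffer directories after the input instance (tail.0, tail.1, ...), not after the tag.

```python
import re

# Anything outside this conservative set is replaced with "_".
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def buffer_dir_for_tag(tag: str) -> str:
    """Map a configured input Tag to a filesystem-safe directory name.

    Hypothetical helper, not fluent-bit behavior.
    """
    name = _UNSAFE.sub("_", tag)
    # Reject names the filesystem treats specially.
    if name in ("", ".", ".."):
        raise ValueError(f"tag {tag!r} does not yield a usable directory name")
    return name
```

With this mapping, the configuration above would buffer into `log.block._`. Note that distinct tags such as `log.block.*` and `log.block.?` collapse to the same name, so a real implementation would still need a collision check or an explicit per-input path.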

  • By writing all the inputs in one file, we may be able to find the corresponding buffer directory for an input. For example:

     [INPUT]
     Path /data/f1
     .....
     [INPUT]
     Path /data/f2
     [INPUT]
     Path /data/f3
    

    We may find that files f1, f2 and f3 use tail.0, tail.1, and tail.2 respectively. However, this is not something that can be relied on. Also, in my situation, the inputs are spread across the conf/* directory, which makes this correlation hard to find. If we could specify the buffer path, the problem would go away. For example:

       [INPUT]
       Path /data/f1
       Buffer_path /data/hostpath/fluent-bit/buffer/tag1
       .....
       [INPUT]
       Path /data/f2
       Buffer_path /data/hostpath/fluent-bit/buffer/tag2
       [INPUT]
       Path /data/f3
       Buffer_path /data/hostpath/fluent-bit/buffer/tag3
    
  • Another reason why I want to find the corresponding buffer directory for an input is that files in the buffer may be broken and will never be flushed. Those files need to be deleted (as mentioned here).
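Until the buffer path can be set per input, obsolete or broken chunks have to be found and removed by hand once the right tail.x directory is known. A minimal sketch, assuming chunk files end in `.flb` and that fluent-bit is stopped while this runs (deleting chunks from under a running instance is unsafe); the helper names and the age threshold are hypothetical:

```python
import time
from pathlib import Path
from typing import List, Optional


def stale_chunks(buffer_dir: str, max_age_s: float,
                 now: Optional[float] = None) -> List[Path]:
    """List chunk files (assumed *.flb) whose mtime is older than max_age_s."""
    now = time.time() if now is None else now
    stale = []
    for path in sorted(Path(buffer_dir).glob("*.flb")):
        if now - path.stat().st_mtime > max_age_s:
            stale.append(path)
    return stale


def remove_stale_chunks(buffer_dir: str, max_age_s: float,
                        dry_run: bool = True) -> List[Path]:
    """Delete stale chunk files, or only report them when dry_run is True.

    Only run this while fluent-bit is stopped, so no chunk is in use.
    """
    victims = stale_chunks(buffer_dir, max_age_s)
    if not dry_run:
        for path in victims:
            path.unlink()
    return victims
```

For example, `remove_stale_chunks("/data/hostpath/fluent-bit/buffer/tail.0", 7 * 24 * 3600)` reports chunks untouched for a week without deleting them; pass `dry_run=False` to remove them. Age is only a heuristic: it cannot tell a broken chunk from one that is merely waiting out a long outage.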

Describe the solution you'd like

  • Add a parameter (Disk_Buf_Limit?) to the input section to limit the disk buffer size (similar to Mem_Buf_Limit).
  • Allow specifying the buffer path in the input section, just as we do in fluentd.
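The proposed Disk_Buf_Limit could behave like Mem_Buf_Limit: track the bytes an input has on disk, refuse new chunks (pausing the input) once the quota would be exceeded, and resume when the output flushes. The class below is a hypothetical model of that accounting, written in Python rather than fluent-bit's C; the name `DiskBufferLimiter` and the pause/resume policy are assumptions, not existing fluent-bit behavior.

```python
from dataclasses import dataclass


@dataclass
class DiskBufferLimiter:
    """Hypothetical per-input disk quota, modeled on Mem_Buf_Limit."""
    limit_bytes: int
    used_bytes: int = 0
    paused: bool = False

    def try_append(self, chunk_bytes: int) -> bool:
        """Accept a chunk if it fits under the quota; otherwise pause the input."""
        if self.used_bytes + chunk_bytes > self.limit_bytes:
            self.paused = True
            return False
        self.used_bytes += chunk_bytes
        return True

    def on_flushed(self, chunk_bytes: int) -> None:
        """Release space after the output delivered a chunk; resume if below the limit."""
        self.used_bytes = max(0, self.used_bytes - chunk_bytes)
        if self.used_bytes < self.limit_bytes:
            self.paused = False
```

Pausing leaves unread data in the source log files, where the tail DB offset can pick it up later, provided log rotation does not remove those files first. The alternative policy, dropping the oldest chunks, bounds disk usage without stalling the input but loses data; which trade-off is acceptable depends on how the source logs are rotated.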

Describe alternatives you've considered
If the buffer directory name contained the tag name instead of tail.x, we would still be able to correlate an input with its buffer path and delete obsolete files if needed.
