Is your feature request related to a problem? Please describe.
I am using fluent-bit to collect logs. To ensure data integrity, I use "storage.type filesystem". However, I have run into two problems.
- There is no way to limit the disk space used by an input. If the output blocks, the buffer directory will grow indefinitely.
- Old files in the buffer directory are not removed. Since it is hard to correlate a buffer directory with its input, it is also hard to find the right directory when removing obsolete files manually.
For example, my configuration is:
[SERVICE]
    # This is a commented line
    Daemon off
    flush 1
    log_Level debug
    storage.path /data/hostpath/fluent-bit/buffer
    storage.sync normal
    storage.checksum on
    storage.backlog.mem_limit 30M
    HTTP_Server On
    HTTP_Listen 0.0.0.0
    HTTP_PORT 2020
@INCLUDE conf/*.conf
- The configuration in conf/:
[INPUT]
    Name tail
    Mem_Buf_Limit 1M
    Buffer_Chunk_Size 32KB
    storage.type filesystem
    Buffer_Max_Size 64KB
    Path /data/hostpath/fluent-bit/logs/log.block
    DB /data/hostpath/fluent-bit/pos/log.block.pos
    Tag log.block.*
    Path_Key file
[OUTPUT]
    Name es
    Retry_Limit False
    Match log.block.*
    Host 192.168.0.12
    Port 9200
    Index test
    Type _doc
- Suppose that Elasticsearch is temporarily unreachable (network failure). fluent-bit will keep reading the log files and storing records in /data/hostpath/fluent-bit/buffer/tail.0. Since no limit is imposed, the disk may fill up, which will harm the system.
- With the files in the buffer not deleted, if we change the path to /data/hostpath/fluent-bit/logs/log.block.v2 and restart fluent-bit, fluent-bit will still use /data/hostpath/fluent-bit/buffer/tail.0 as the buffer. This means the configuration instructs fluent-bit to collect logs from /data/hostpath/fluent-bit/logs/log.block.v2, while fluent-bit may still send contents of /data/hostpath/fluent-bit/logs/log.block to the output. We can delete /data/hostpath/fluent-bit/buffer/tail.0 manually to avoid this; however, it becomes hard to find which tail.x directory to delete as the number of inputs in conf/* grows.
  Maybe using the tag name instead of tail.x as the buffer directory name would avoid this? For example, the buffer path for the above configuration would become log.block.* instead of tail.0.
- By writing all the inputs in one file, we may be able to find the corresponding buffer directory for an input. For example:
[INPUT]
    Path /data/f1
    .....
[INPUT]
    Path /data/f2
[INPUT]
    Path /data/f3
We may find that files f1, f2 and f3 use tail.0, tail.1 and tail.2 respectively. However, this is not something that can be relied on. Also, in my situation the inputs are spread across the files in the conf/ directory, which makes this correlation hard to find. If we could specify the buffer path, the problem would be gone. For example:
[INPUT]
    Path /data/f1
    Buffer_path /data/hostpath/fluent-bit/buffer/tag1
    .....
[INPUT]
    Path /data/f2
    Buffer_path /data/hostpath/fluent-bit/buffer/tag2
[INPUT]
    Path /data/f3
    Buffer_path /data/hostpath/fluent-bit/buffer/tag3
- Another reason I want to find the corresponding buffer directory for an input is that files in the buffer may be broken and will never be flushed. Those files need to be deleted (as mentioned here).
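To illustrate the manual cleanup problem, here is a rough Python sketch that lists files under a buffer directory that have not been modified for a while, so they can be inspected before anyone deletes them. It only looks at the filesystem and does not talk to fluent-bit. Using the modification time as a sign of "obsolete or broken" is my own assumption: a chunk that is simply waiting for a blocked output will also look old. The function name and the one-day threshold are made up for illustration.

```python
import os
import time


def stale_files(buffer_dir, max_age_seconds, now=None):
    """Return (path, age_seconds) pairs for files older than max_age_seconds.

    Age is measured from the file's modification time. Oldest files come first.
    """
    now = time.time() if now is None else now
    found = []
    for root, _dirs, files in os.walk(buffer_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                age = now - os.path.getmtime(path)
            except OSError:
                # The file may have been flushed and removed while we walked.
                continue
            if age > max_age_seconds:
                found.append((path, age))
    found.sort(key=lambda item: item[1], reverse=True)
    return found


if __name__ == "__main__":
    # Buffer directory from the configuration above; threshold is hypothetical.
    BUFFER_DIR = "/data/hostpath/fluent-bit/buffer/tail.0"
    ONE_DAY = 24 * 60 * 60
    if os.path.isdir(BUFFER_DIR):
        for path, age in stale_files(BUFFER_DIR, ONE_DAY):
            print(f"{path}: not modified for {int(age)} s")
```

The script only reports candidates and deletes nothing, because it cannot tell a broken chunk from one that is still pending. It also does not solve the real problem: I still have to know which tail.x directory belongs to which input.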
Describe the solution you'd like
- Add a parameter (Disk_Buf_Limit?) to the input section to limit the disk buffer size (something like Mem_Buf_Limit).
- Allow specifying the buffer path in the input section, just like in fluentd.
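Until something like Disk_Buf_Limit exists, the only stopgap I can think of is to watch the buffer from outside fluent-bit. Below is a minimal Python sketch that sums the size of each subdirectory under storage.path and reports the ones above a threshold. It assumes each input gets its own subdirectory there (as with tail.0 above). The 500 MiB threshold and the function names are hypothetical.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A chunk may be flushed and removed while we walk; skip it.
                pass
    return total


def buffer_usage(storage_path):
    """Map each immediate subdirectory of storage_path to its size in bytes."""
    usage = {}
    for entry in os.scandir(storage_path):
        if entry.is_dir():
            usage[entry.name] = dir_size_bytes(entry.path)
    return usage


def over_limit(usage, limit_bytes):
    """Return the names of buffer directories larger than limit_bytes, sorted."""
    return sorted(name for name, size in usage.items() if size > limit_bytes)


if __name__ == "__main__":
    STORAGE_PATH = "/data/hostpath/fluent-bit/buffer"  # storage.path from above
    LIMIT = 500 * 1024 * 1024  # hypothetical threshold, 500 MiB
    if os.path.isdir(STORAGE_PATH):
        usage = buffer_usage(STORAGE_PATH)
        for name in over_limit(usage, LIMIT):
            print(f"{name}: {usage[name]} bytes exceeds {LIMIT} bytes")
```

This only detects the growth; it cannot stop fluent-bit from writing more chunks, which is why a limit enforced by fluent-bit itself is what I am asking for.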
Describe alternatives you've considered
If the buffer directory name contained the tag name instead of tail.x, we would still be able to correlate an input with its buffer path, and delete obsolete files if needed.