in_tail: plugin does not pick up rotated logs under heavy load #1108
Comments
We have the same problem :\
ping?
There's a timer checking rotated files every 2.5 seconds (fluent-bit/plugins/in_tail/tail_fs_stat.c, lines 176 to 178 at commit 8cc3a18); did it work for you?
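For context, the stat-based path mentioned there boils down to remembering the inode and read offset of each tracked file and re-checking them on a timer. A minimal sketch of that idea (illustrative C only, not the actual fluent-bit source; the struct and function names are made up):

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Sketch of stat()-based rotation detection (illustrative, not the actual
 * fluent-bit code): remember the inode and read offset of each tracked file
 * and re-check them on a timer. */
struct tracked_file {
    const char *path;   /* path being tailed */
    ino_t inode;        /* inode seen when the file was opened */
    off_t offset;       /* how far we have read so far */
};

/* Returns 1 if the file at f->path is no longer the file we were reading. */
static int was_rotated(const struct tracked_file *f)
{
    struct stat st;

    if (stat(f->path, &st) != 0) {
        return 1;      /* path is gone: rotated away or deleted */
    }
    if (st.st_ino != f->inode) {
        return 1;      /* a new file has replaced the old one */
    }
    if (st.st_size < f->offset) {
        return 1;      /* the file was truncated in place */
    }
    return 0;          /* same file, keep tailing from the saved offset */
}
```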
@l2dy, no, it did not, because there's an inotify version of this code, which is what actually runs in fluent-bit (fluent-bit/plugins/in_tail/tail_fs_inotify.c, lines 159 to 160 at commit f6bb10a).
Indeed. If you use Fluent Bit built with [...]
@holycheater Did you find any solution to this problem?
We made a hack to restart fluent-bit after rotating logs. Building fluent-bit without inotify (so it would just stat() the watched files on a regular interval) should work too.
Having a similar issue in a gcloud k8s cluster with fluent-bit v1.2.0. The node becomes problematic, with other pods/apps unable to start because the node's inotify watches are exhausted. Increasing the watches is just a workaround and will only delay the issue, as the number of actual files never gets that high (after restarting fluent-bit the number reported drops to ~120 files actually open).
I've been encountering this as well. I've been sending a million lines to stdout very quickly from a single Docker container. I reliably get roughly the first ~650,000 lines. The last line I receive is always the last line written before the log was rotated; I never get even one line from the new file after rotation. Poking around in the sqlite database, the record shows no recognition of the rotation and an offset that exceeds the new file's length. @edsiper any insight on this?
I am seeing the same issue at high scale. Any workarounds/updates?
We had the same issue running fluent-bit on EKS. Logs came in at a rate of 50k req/s. After updating fluent-bit to v1.6.8, bumping up the resources, and using a combination of memory and filesystem for buffering messages, we were able to avoid the log loss. We also modified the log rotation on EKS to rotate files only after reaching 10 GB.
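For anyone looking for the concrete knobs behind "a combination of memory and filesystem for buffering", a minimal sketch using Fluent Bit's standard storage options (paths and limits are placeholders, not the poster's actual values; check the docs for your version):

```
[SERVICE]
    storage.path              /var/lib/fluent-bit/buffer
    storage.sync              normal
    storage.backlog.mem_limit 50M

[INPUT]
    Name            tail
    Path            /var/log/containers/*.log
    Mem_Buf_Limit   50MB
    storage.type    filesystem
```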
@akashmantry, our team also experienced a similar issue in GKE. After a very spiky load (10 MB/s) lasting a few seconds, the file is rotated, but it seems that Fluent Bit does not track the new file (only after one more rotation does it detect a new file). Do you know whether upgrading to 1.6.8 by itself could resolve this issue, or should we combine all the methods?
@loburm you might have to try a combination of these to figure out what works for your use case. Switching to the latest version is always recommended. For us, moving to Kafka and increasing the log file size eliminated the log loss problems.
@edsiper I think we had a similar case in GKE recently. First there was a heavy load (multiple MB per second), during which log rotation happened. Fluent Bit did not manage to detect the new file and continued reading logs only after the next rotation.
This issue is stale because it has been open 90 days with no activity. Remove the stale label or comment, or this will be closed in 5 days. Maintainers can add the [...]
This issue was closed because it has been stalled for 5 days with no activity. |
Bug Report
Describe the bug
tail_fs_event receives IN_Q_OVERFLOW inotify events from time to time, thus missing IN_MOVE_SELF events.
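To illustrate the failure mode with plain inotify (a sketch, not the fluent-bit source; the watched path is a placeholder): IN_Q_OVERFLOW is delivered once the kernel's event queue exceeds fs.inotify.max_queued_events, and everything that did not fit, potentially including the IN_MOVE_SELF generated by the rename, is silently dropped.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/inotify.h>

/* Sketch of plain inotify usage.  When writes outpace the reader, the kernel
 * drops events past fs.inotify.max_queued_events and delivers IN_Q_OVERFLOW
 * instead, so the rename's IN_MOVE_SELF can be lost exactly as described. */
int main(void)
{
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    int fd = inotify_init();

    if (fd < 0 || inotify_add_watch(fd, "/var/log/nginx/access.log",
                                    IN_MODIFY | IN_MOVE_SELF) < 0) {
        return 1;
    }

    for (;;) {
        ssize_t len = read(fd, buf, sizeof(buf));
        if (len <= 0) {
            break;
        }
        for (char *p = buf; p < buf + len; ) {
            const struct inotify_event *ev = (const struct inotify_event *) p;

            if (ev->mask & IN_Q_OVERFLOW) {
                fprintf(stderr, "inotify queue overflow: events were lost\n");
            }
            if (ev->mask & IN_MOVE_SELF) {
                fprintf(stderr, "watched file was renamed (rotation)\n");
            }
            p += sizeof(struct inotify_event) + ev->len;
        }
    }
    return 0;
}
```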
To Reproduce
Tail a lot of files by pattern while they are being written to heavily. The setup I have reads around 30 nginx access log files matched by a pattern.
example cfg:
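A minimal illustrative configuration of the kind described (placeholder paths and values, not the reporter's actual settings):

```
[SERVICE]
    Flush     1
    Log_Level info

[INPUT]
    Name             tail
    Path             /var/log/nginx/*access*.log
    Tag              nginx.access
    DB               /var/lib/fluent-bit/tail-nginx.db
    Refresh_Interval 5
    Rotate_Wait      30

[OUTPUT]
    Name  forward
    Match nginx.*
    Host  127.0.0.1
    Port  24224
```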
Example fluent-bit log with traces (had to add some extra output to tackle the problem)
sysctl params related to inotify:
Expected behavior
Reload files after log rotation.
Your Environment
Additional context
We are trying to collect access logs on several nginx machines. Every morning after log rotation, the tail plugin fails to pick up most of the rename events (more often than not, all of them).
Basically, it looks like this happens because files are tailed synchronously with receiving inotify events (reading, filling up the buffer, processing the content), while further events keep piling up in the kernel queue.
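One way to shrink that window, sketched below in illustrative C (not a patch to fluent-bit; the helper names are made up): drain every event currently queued on the inotify descriptor into a local buffer first, and only afterwards do the expensive reading and flushing, so the kernel queue is emptied as quickly as possible.

```c
#include <poll.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/inotify.h>

#define MAX_PENDING 1024

/* Only the fixed fields are kept; the kernel's struct inotify_event ends in
 * a flexible name[] member that file (not directory) watches leave empty. */
struct pending_event {
    int wd;
    uint32_t mask;
};

static struct pending_event pending[MAX_PENDING];
static int n_pending = 0;

/* Phase 1 (cheap): copy everything the kernel has queued.  The descriptor
 * is non-blocking, so read() stops with -1/EAGAIN once the queue is empty. */
static void drain_events(int fd)
{
    char buf[8192] __attribute__((aligned(__alignof__(struct inotify_event))));
    ssize_t len;

    while ((len = read(fd, buf, sizeof(buf))) > 0) {
        for (char *p = buf; p < buf + len; ) {
            const struct inotify_event *ev = (const struct inotify_event *) p;
            if (n_pending < MAX_PENDING) {
                pending[n_pending].wd = ev->wd;
                pending[n_pending].mask = ev->mask;
                n_pending++;
            }
            p += sizeof(struct inotify_event) + ev->len;
        }
    }
}

/* Phase 2 (expensive): only now read file data, handle IN_MOVE_SELF
 * rotations, flush buffers, etc., with the kernel queue already drained. */
static void process_pending(void)
{
    for (int i = 0; i < n_pending; i++) {
        /* ... tail the file behind pending[i].wd, and re-open it if
         * pending[i].mask has IN_MOVE_SELF set ... */
    }
    n_pending = 0;
}

int main(void)
{
    int fd = inotify_init1(IN_NONBLOCK);
    if (fd < 0) {
        return 1;
    }
    /* inotify_add_watch(fd, "<each tailed file>", IN_MODIFY | IN_MOVE_SELF); */

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    for (;;) {
        if (poll(&pfd, 1, -1) > 0) {
            drain_events(fd);     /* empty the kernel queue first... */
            process_pending();    /* ...then do the slow per-file work */
        }
    }
}
```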