This repository has been archived by the owner on May 25, 2022. It is now read-only.

file_input - Data loss when tailing symlink file with log rotation (k8s container logs) #85

Closed
djaglowski opened this issue Mar 31, 2021 · 4 comments · Fixed by #168 or #182

Comments

@djaglowski
Member

See contrib issue for details.

@djaglowski djaglowski added the help wanted Extra attention is needed label Mar 31, 2021
@rockb1017
Contributor

Would the required changes be in operator/builtin/input/file/ ?

@djaglowski
Member Author

Yes, exactly!

@rockb1017
Contributor

I have written up a draft doc for another solution:
https://docs.google.com/document/d/186ry60Tb2jYc-PUpAmGkLnDe5zglmgLtb0m5C_9H7W4/edit?usp=sharing
I am leaning toward Solution B in the doc.
Please take a look and comment.

@djaglowski
Member Author

Here's my understanding of how this will work:

Current functionality that must be maintained:

  • max_concurrent_files must be respected at all times. The user must be able to set this value and know that it is respected.
  • If the number of matching files exceeds max_concurrent_files, then the files will be broken into batches, and each subsequent polling interval will consume the next batch. (The new implementation may change the size of the batches.)
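
To make the batching behavior above concrete, here is a minimal Python sketch (not the operator's actual Go code). The function name `next_batch`, the `start` cursor, and the wrap-around policy are assumptions made for illustration; only max_concurrent_files comes from this thread.

```python
from typing import List, Tuple


def next_batch(
    matches: List[str], max_concurrent_files: int, start: int
) -> Tuple[List[str], int]:
    """Return the files to consume this poll and the cursor for the next poll.

    Hypothetical helper: the real operator may size and order batches differently.
    """
    if max_concurrent_files < 1:
        raise ValueError("max_concurrent_files must be at least 1")
    if start >= len(matches):
        start = 0  # every file has had a turn, so begin again from the first
    # Never hand out more than max_concurrent_files paths in one interval.
    batch = matches[start:start + max_concurrent_files]
    return batch, start + len(batch)
```

With five matching files and a limit of two, successive polls would consume two files, two files, then one file, and then start over from the beginning.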

New functionality that will be added:

  • Files will be kept open between polling intervals. When the next interval is reached, the current set of matching files will be opened and cross-referenced against the files still held open from the previous interval, to determine whether any of the old files have been rotated out of the matching pattern. Those that have been rotated out will be consumed. All old files will then be closed. The new files will then be consumed and kept open.
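
The polling cycle described above can be sketched roughly as follows. This is an illustrative Python sketch for a POSIX file system, not the operator's Go implementation. Identifying a file by its (device, inode) pair is an assumption of this sketch (the thread does not say how files are cross-referenced), and `file_id` and `poll` are hypothetical names.

```python
import os
from typing import BinaryIO, Dict, List, Tuple

# Assumed identity for cross-referencing: (device, inode). Not from the thread.
FileID = Tuple[int, int]


def file_id(handle: BinaryIO) -> FileID:
    info = os.fstat(handle.fileno())
    return (info.st_dev, info.st_ino)


def poll(
    old_open: Dict[FileID, BinaryIO], matching_paths: List[str]
) -> Tuple[List[bytes], Dict[FileID, BinaryIO]]:
    """Run one polling interval; return the new data and the handles kept open."""
    # 1. Open the current matches while the old handles are still held.
    new_open: Dict[FileID, BinaryIO] = {}
    for path in matching_paths:
        handle = open(path, "rb")
        fid = file_id(handle)
        if fid in new_open:
            handle.close()  # same file matched twice, e.g. via a symlink
            continue
        new_open[fid] = handle

    chunks: List[bytes] = []
    for fid, old in old_open.items():
        if fid not in new_open:
            # 2. Rotated out of the pattern: drain whatever is still unread.
            chunks.append(old.read())
        else:
            # Still matching: resume the new handle where the old one stopped.
            new_open[fid].seek(old.tell())
        # 3. Close every old handle.
        old.close()

    # 4. Consume the new files and keep them open for the next interval.
    for handle in new_open.values():
        chunks.append(handle.read())
    return [chunk for chunk in chunks if chunk], new_open
```

Because the old handle stays open across the rename, lines written just before rotation remain readable even though the path now points at a different file.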

A limitation will be documented: if both (1) the number of matching files exceeds max_concurrent_files and (2) some files are being rotated out of the matching pattern(s), then some log lines may be lost. This is because the batching functionality described above may leave no overlap between the files opened in consecutive polling intervals, in which case we cannot detect that one of the open files has been rotated out of the matching pattern.

Most importantly, we need to ensure that the user has the ability to decide which tradeoff they are willing to make:

  • If it is most important that the max number of concurrent files is respected, then the user can set the max_concurrent_files value as necessary.
  • If it is most important that no log lines are lost due to rotation out of the matching pattern, then the user can set max_concurrent_files to a very high number. (We should consider allowing max_concurrent_files=0 to represent unlimited, though the file system will ultimately set a limit.)
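
As a rough illustration of this tradeoff, the check below reports whether batching would be in effect, which is the first condition of the limitation described earlier. It is a hypothetical Python helper, not part of the operator, and it treats max_concurrent_files=0 as "unlimited", which is only a proposal in this thread.

```python
def batching_active(num_matching_files: int, max_concurrent_files: int) -> bool:
    """Return True when matches exceed the limit, so files are read in batches.

    Hypothetical helper. When this is True and files also rotate out of the
    matching pattern, some log lines may be lost.
    """
    if max_concurrent_files < 0:
        raise ValueError("max_concurrent_files must not be negative")
    if max_concurrent_files == 0:
        # Proposed in this thread, not confirmed behavior: 0 means no limit.
        return False
    return num_matching_files > max_concurrent_files
```

A user who cares most about not losing lines would pick a limit for which this returns False for the expected number of matching files.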
