tail_files behavior #2613

Closed
jjfalling opened this issue Sep 21, 2016 · 3 comments

Comments

@jjfalling
Contributor

Most users might expect that tail_files would apply ReadFromLast only on the first scan and not take it into account afterwards. This behavior was not possible in 1.x because of how filebeat worked, but it should be possible to change it in the 5.x versions.

@ruflin
Member

ruflin commented Sep 21, 2016

During startup of filebeat 5.x, a first scan is done to set all the states. tail_files should apply during this scan but not during any following scans. The problem in 5.x is that tail_files currently still applies at the harvester level, as the offset is set inside the harvester. This logic could be moved to the prospector, which would make it possible to apply it only during the first scan.
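
A minimal sketch of that idea, not the actual filebeat code: the `Prospector` class, its `scan` method and the `first_scan` flag are hypothetical names chosen for illustration, and loading existing states from the registry is deliberately left out. It only shows the offset decision moving from the harvester to the component that runs the scans.

```python
class Prospector:
    """Hypothetical stand-in for a prospector; not filebeat's implementation."""

    def __init__(self, tail_files):
        self.tail_files = tail_files
        self.first_scan = True
        self.states = {}  # path -> offset the harvester would start from

    def scan(self, files):
        """files maps path -> current size in bytes."""
        for path, size in files.items():
            if path in self.states:
                continue  # already tracked during this run
            if self.tail_files and self.first_scan:
                # First scan only: start at the end of the file.
                self.states[path] = size
            else:
                # Later scans: new files are read from the beginning.
                self.states[path] = 0
        self.first_scan = False
        return dict(self.states)


if __name__ == "__main__":
    p = Prospector(tail_files=True)
    print(p.scan({"a.log": 120}))
    print(p.scan({"a.log": 150, "b.log": 40}))
```

With the decision made here, a file that shows up after startup is read from offset 0 even when `tail_files` is enabled, which is the behaviour requested in this issue.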

@ruflin
Member

ruflin commented Oct 25, 2016

@jjfalling One question that popped up during implementation: should only the files for which no state exists yet be tailed, or all of them? That is, if a file was harvested before and there is already a state with an offset pointing to the middle of the file, should filebeat set the offset to the end in the first scan, or, because a state exists, resume sending from the middle of the file?

ruflin added a commit to ruflin/beats that referenced this issue Nov 1, 2016
Previously, if a file fell under ignore_older on startup and no previous state was found, no state was persisted either. This was important in 1.x releases to prevent the registry file from containing too many files, as no cleanup was possible. With the new clean_* options the registry file can now be cleaned up. For consistency, all files that fall under ignore_older should have a state, even if they were not crawled before.

This change is also required to move tail_files under the prospector (elastic#2613). Otherwise, for files which fall under `ignore_older`, the state could not be set to the end of the file.
tsg pushed a commit that referenced this issue Nov 1, 2016
@ruflin
Member

ruflin commented Nov 1, 2016

Checking the behaviour of nxlog:

ReadFromLast
This optional directive takes a boolean value. If it is set to TRUE, it instructs the module to only read logs which arrived after nxlog was started in case the saved position could not be read (for example on first start). When SavePos is TRUE and a previously saved position value could be read, the module will resume reading from this saved position. If this is FALSE, the module will read all logs from the file. This can result in quite a lot of messages which is usually not the expected behaviour. If this directive is not specified, it defaults to TRUE.

In summary, nxlog does not apply ReadFromLast (its counterpart to tail_files) if a state already exists.
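
The rule quoted above can be written as a small decision function. This is only a sketch of the described behaviour: `start_offset` and its parameters are hypothetical names, and it assumes that a readable saved position always wins, whatever the value of ReadFromLast. It does not handle a saved position that lies beyond the current end of the file.

```python
def start_offset(saved_offset, file_size, read_from_last=True):
    """Return the offset at which reading would start.

    saved_offset is None when no saved position could be read
    (for example on first start).
    """
    if saved_offset is not None:
        # A state exists: resume from it, never jump to the end.
        return saved_offset
    # No state: either skip existing content or read everything.
    return file_size if read_from_last else 0


if __name__ == "__main__":
    print(start_offset(None, 1000))   # no state, ReadFromLast on
    print(start_offset(400, 1000))    # state pointing to the middle of the file
    print(start_offset(None, 1000, read_from_last=False))
```

Applied to the question raised earlier in this thread, this corresponds to tailing only the files for which no state exists yet.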

ruflin added a commit to ruflin/beats that referenced this issue Nov 3, 2016
tail_files is now only applied on the first run and ignored after that. Also, for all files that fall under tail_files and do not have a state yet, a state is written directly.

* Implement tail_files by setting ignore_older to 1ns for the first run
* Fix typo in stats variable names

Closes elastic#2613 and elastic#2788
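
A rough sketch of how an ignore_older of 1ns on the first run gives the tail_files behaviour, assuming that a file older than ignore_older gets a state at its end without being harvested. `FileInfo`, `scan` and `run_scans` are hypothetical names, states start empty (no registry), and offsets are not advanced by a harvester, so this is not the code from the commit.

```python
from dataclasses import dataclass

NANOSECOND = 1e-9  # in seconds


@dataclass
class FileInfo:
    size: int    # bytes
    age: float   # seconds since last modification


def scan(files, states, ignore_older):
    """Update states in place; return paths a harvester would be started for.

    ignore_older is in seconds; 0 disables it.
    """
    to_harvest = []
    for path, info in files.items():
        ignored = ignore_older > 0 and info.age > ignore_older
        if path in states:
            if not ignored and info.size > states[path]:
                to_harvest.append(path)  # continue from the stored offset
            continue
        if ignored:
            # Too old: persist a state at the end of the file, do not read it.
            states[path] = info.size
        else:
            states[path] = 0
            to_harvest.append(path)
    return to_harvest


def run_scans(scans, tail_files, ignore_older):
    """Run consecutive scans; only the first one uses the 1ns override."""
    states, results = {}, []
    for i, files in enumerate(scans):
        effective = NANOSECOND if (tail_files and i == 0) else ignore_older
        results.append(scan(files, states, effective))
    return states, results


if __name__ == "__main__":
    first = {"a.log": FileInfo(size=100, age=5.0)}
    second = {"a.log": FileInfo(size=150, age=0.5),
              "b.log": FileInfo(size=20, age=0.2)}
    print(run_scans([first, second], tail_files=True, ignore_older=0))
```

In the example, `a.log` exists at startup and gets a state at offset 100 without being read, while the data appended later and the new `b.log` are picked up by the second scan.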
urso closed this as completed in #2932 Nov 7, 2016
urso pushed a commit that referenced this issue Nov 7, 2016