Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filebeat 1.1.2 does not record correct inode values of rotated logs on exit #1281

Closed
jstangroome opened this issue Apr 1, 2016 · 7 comments

Comments

Projects
None yet
3 participants
@jstangroome
Copy link

commented Apr 1, 2016

From the discussion at https://discuss.elastic.co/t/filebeat-1-1-2-does-not-record-correct-inode-values-of-rotated-logs-on-exit/45749

I am seeing filebeat re-ship rotated log files when the filebeat service restarts.

We are experiencing the problem on a local ext4 file system on Ubuntu 14.04 running in AWS EC2.

While filebeat remains running, it is correctly processing the files. The problem is that when it persists its state to the registry file, it is persisting the older inode values of the files before they were rotated, not the inodes of the files at those paths at the time filebeat exited.

I have created a basic configuration and bash script which starts filebeat, writes to a log file, rotates it a couple of times (with pauses in between to allow filebeat to notice) and then stops filebeat.

Full repro script, conf, and output log available as a gist:
https://gist.github.com/jstangroome/a1a70306cbd7a9420be15ec2bbe109a0

After filebeat stops, the registry has persisted the wrong inodes for the log files present on disk. As such, when filebeat starts again, an existing log file will appear to be a new file without an existing offset and its contents will be shipped a second time.

My repro script was run on a local ext4 file system on Ubuntu 14.04 running in a Virtualbox VM.

This might be an edge case not handled in #1010

@tsg tsg added bug Filebeat labels Apr 1, 2016

@tsg

This comment has been minimized.

Copy link
Collaborator

commented Apr 1, 2016

Thanks for the well documented bug report. We'll investigate.

@ruflin

This comment has been minimized.

Copy link
Collaborator

commented Apr 12, 2016

@jstangroome I can confirm this bug also on master and Mac OS X. Will not create a system test to reproduce it.

ruflin added a commit to ruflin/beats that referenced this issue Apr 12, 2016

ruflin added a commit to ruflin/beats that referenced this issue Apr 12, 2016

@ruflin

This comment has been minimized.

Copy link
Collaborator

commented Apr 12, 2016

Here is a PR that introduces a system check to test for the problem: #1375 Next step: Fixing it.

ruflin added a commit to ruflin/beats that referenced this issue Apr 12, 2016

Fix filebeat file rotation registrar issue elastic#1281
When multiple files were rotated at the same time, the first rotation was overwriting the states for the other rotated files. This happened because the new file directly had new content inside and the state for all the other files was fetched from disk, resetting the changes which were in memory.

The problem is now fixed by not writing the state to disk every time state changes are discovered by the registrar, as it is up to the prospector to decide if these changes are the most recent ones. Now changes get written to disk every time a scan is done. The downside of this solution is that the registrar is also updated, if no changes did happen.

* System test added to verify that issue is fixed
* Closes elastic#1281
@ruflin

This comment has been minimized.

Copy link
Collaborator

commented Apr 12, 2016

@jstangroome I think I found a solution (see #1375). The problem occured when multiple files were rotated at the same time and it was written to the first file before the rotation was detected. Can you test if master works now for you?

@tsg I tried to find a fix that is minimal and backportable. Should we backport it?

@tsg

This comment has been minimized.

Copy link
Collaborator

commented Apr 12, 2016

@ruflin We're not yet sure if we'll have another release in the 1.X series, but if the cherry-pick is trivial, please do it to the 1.2 branch. Alternatively, you can tag this as backport and we do it when we're sure we will have a release.

@ruflin ruflin added the backport label Apr 12, 2016

@ruflin

This comment has been minimized.

Copy link
Collaborator

commented Apr 12, 2016

@jstangroome We really appreciate all the work you put into this one. Thanks a lot.

@jstangroome

This comment has been minimized.

Copy link
Author

commented Apr 12, 2016

@ruflin No problems. I'm just happy you've been able to push a fix so quickly and I'm hoping I'll get the opportunity to try the latest build in our environment soon.

ruflin added a commit to ruflin/beats that referenced this issue Apr 13, 2016

Fix filebeat file rotation registrar issue elastic#1281
When multiple files were rotated at the same time, the first rotation was overwriting the states for the other rotated files. This happened because the new file directly had new content inside and the state for all the other files was fetched from disk, resetting the changes which were in memory.

The problem is now fixed by not writing the state to disk every time state changes are discovered by the registrar, as it is up to the prospector to decide if these changes are the most recent ones. Now changes get written to disk every time a scan is done. The downside of this solution is that the registrar is also updated, if no changes did happen.

* System test added to verify that issue is fixed
* Closes elastic#1281
* Refactor some variables for better sync handling and fix tests

ruflin added a commit to ruflin/beats that referenced this issue Apr 15, 2016

Fix filebeat file rotation registrar issue elastic#1281
When multiple files were rotated at the same time, the first rotation was overwriting the states for the other rotated files. This happened because the new file directly had new content inside and the state for all the other files was fetched from disk, resetting the changes which were in memory.

The problem is now fixed by not writing the state to disk every time state changes are discovered by the registrar, as it is up to the prospector to decide if these changes are the most recent ones. Now changes get written to disk every time a scan is done. The downside of this solution is that the registrar is also updated, if no changes did happen.

* System test added to verify that issue is fixed
* Closes elastic#1281
* Refactor some variables for better sync handling and fix tests
* Make offset private and only set it through method

ruflin added a commit to ruflin/beats that referenced this issue Apr 15, 2016

Fix filebeat file rotation registrar issue elastic#1281
When multiple files were rotated at the same time, the first rotation was overwriting the states for the other rotated files. This happened because the new file directly had new content inside and the state for all the other files was fetched from disk, resetting the changes which were in memory.

The problem is now fixed by not writing the state to disk every time state changes are discovered by the registrar, as it is up to the prospector to decide if these changes are the most recent ones. Now changes get written to disk every time a scan is done. The downside of this solution is that the registrar is also updated, if no changes did happen.

* System test added to verify that issue is fixed
* Closes elastic#1281
* Refactor some variables for better sync handling and fix tests
* Make offset private and only set it through method

ruflin added a commit to ruflin/beats that referenced this issue Apr 15, 2016

Fix filebeat file rotation registrar issue elastic#1281
When multiple files were rotated at the same time, the first rotation was overwriting the states for the other rotated files. This happened because the new file directly had new content inside and the state for all the other files was fetched from disk, resetting the changes which were in memory.

The problem is now fixed by not writing the state to disk every time state changes are discovered by the registrar, as it is up to the prospector to decide if these changes are the most recent ones. Now changes get written to disk every time a scan is done. The downside of this solution is that the registrar is also updated, if no changes did happen.

* System test added to verify that issue is fixed
* Closes elastic#1281
* Refactor some variables for better sync handling and fix tests
* Make offset private and only set it through method

ruflin added a commit to ruflin/beats that referenced this issue Apr 15, 2016

Fix filebeat file rotation registrar issue elastic#1281
When multiple files were rotated at the same time, the first rotation was overwriting the states for the other rotated files. This happened because the new file directly had new content inside and the state for all the other files was fetched from disk, resetting the changes which were in memory.

The problem is now fixed by not writing the state to disk every time state changes are discovered by the registrar, as it is up to the prospector to decide if these changes are the most recent ones. Now changes get written to disk every time a scan is done. The downside of this solution is that the registrar is also updated, if no changes did happen.

* System test added to verify that issue is fixed
* Closes elastic#1281
* Refactor some variables for better sync handling and fix tests
* Make offset private and only set it through method

ruflin added a commit to ruflin/beats that referenced this issue Apr 18, 2016

Fix filebeat file rotation registrar issue elastic#1281
When multiple files were rotated at the same time, the first rotation was overwriting the states for the other rotated files. This happened because the new file directly had new content inside and the state for all the other files was fetched from disk, resetting the changes which were in memory.

The problem is now fixed by not writing the state to disk every time state changes are discovered by the registrar, as it is up to the prospector to decide if these changes are the most recent ones. Now changes get written to disk every time a scan is done. The downside of this solution is that the registrar is also updated, if no changes did happen.

* System test added to verify that issue is fixed
* Closes elastic#1281
* Refactor some variables for better sync handling and fix tests
* Make offset private and only set it through method
* Rename system tests to close_older as this is what is actually checked
* Update registrar only when file was rotated or changed
* Add test to check state between shutdown and rotation.Only update registry when changes happen.

ruflin added a commit to ruflin/beats that referenced this issue Apr 18, 2016

Fix filebeat file rotation registrar issue elastic#1281
When multiple files were rotated at the same time, the first rotation was overwriting the states for the other rotated files. This happened because the new file directly had new content inside and the state for all the other files was fetched from disk, resetting the changes which were in memory.

The problem is now fixed by not writing the state to disk every time state changes are discovered by the registrar, as it is up to the prospector to decide if these changes are the most recent ones. Now changes get written to disk every time a scan is done. The downside of this solution is that the registrar is also updated, if no changes did happen.

* System test added to verify that issue is fixed
* Closes elastic#1281
* Refactor some variables for better sync handling and fix tests
* Make offset private and only set it through method
* Rename system tests to close_older as this is what is actually checked
* Update registrar only when file was rotated or changed
* Add test to check state between shutdown and rotation.Only update registry when changes happen.

@urso urso closed this in #1375 Apr 18, 2016

urso added a commit that referenced this issue Apr 18, 2016

Merge pull request #1375 from ruflin/fb-registry
Fix filebeat file rotation registrar issue #1281
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.