filebeat - inode re-use on OS leading to loss of beats data #1341

Closed

billmurrin opened this issue Apr 7, 2016 · 7 comments

@billmurrin

I have two related discussions on the Elastic Beats forum:

https://discuss.elastic.co/t/deleting-filebeat-registry-file/46112
https://discuss.elastic.co/t/old-file-with-new-name-found/41766/19

The bottom line is that the way the filebeat registry persists data is causing data loss on my platform.

I have a solution that produces a log file every 10 minutes; that data is archived/compressed hourly, so at any given time there are only a handful of uncompressed files available until the next compression run. Since a log is generated every 10 minutes, this adds up to a lot of logs per day, and the inodes are getting re-used.

The registry retains an entry for every log since I started running filebeat. This causes an inode re-use to be detected as an update to a previously harvested file, and one of three things happens (I haven't pinpointed why); the sketch after the list illustrates the mechanism.

  • The file picks up at the offset of the original file.
  • The file restarts at offset zero.
  • The file is bypassed and no data is sent.
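
To make the failure modes concrete, here is a minimal sketch (not filebeat source; the registry is heavily simplified) of why a state store keyed by (device, inode) misfires once the OS hands a new file an old inode:

import os

# (device, inode) -> {"source": path, "offset": bytes_already_read}
registry = {}

def resume_offset(path):
    st = os.stat(path)
    state = registry.get((st.st_dev, st.st_ino))
    if state is not None and state["source"] != path:
        # The inode matches an old entry, so this looks like a rename of a
        # previously harvested file. With inode re-use it is really a
        # brand-new file: resuming at the stored offset skips data, and if
        # the new file is shorter than that offset it is treated as
        # truncated and read again from offset 0.
        return state["offset"]
    return 0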

Here are a few examples from my blog post

This is my filebeat log showing an example of when it does not send any data:

2016-03-28T14:10:07Z INFO Old file with new name found: /mydata/hello.20160327.0940.data is no /mydata/hello.20160328.1400.data
2016-03-28T14:10:07Z INFO Detected rename of a previously harvested file: /mydata/hello.20160327.0940.data -> /mydata/hello.20160328.1400.data
2016-03-28T14:10:07Z INFO Harvester started for file: /mydata/hello.20160328.1400.data

Here are the corresponding entries from the registry file (note both paths share inode 920):

"/mydata/hello.20160324.1210.data": {
"source":"/mydata/hello.20160327.0940.data","offset":34145599,"FileStateOS": {
"inode":920,"device":64792}
"/mydata/hello.20160327.0940.data": {
"source":"/mydata/hello.20160328.1400.data","offset":31301200,"FileStateOS": {
"inode":920,"device":64792}`

And here is an example of when the file restarts at offset zero:

2016-03-28T14:30:01Z INFO Old file with new name found: /mydata/hello.20160327.1410.data is no /mydata/hello.20160328.1420.data
2016-03-28T14:30:01Z INFO Detected rename of a previously harvested file: /mydata/hello.20160327.1410.data -> /mydata/hello.20160328.1420.data
2016-03-28T14:30:01Z INFO Harvester started for file: /mydata/hello.20160328.1420.data
2016-03-28T14:30:01Z INFO File was truncated. Begin reading file from offset 0: /mydata/hello.20160328.1420.data

Here are the corresponding registry entries for that file:

"/mydata/hello.20160327.1410.data": {
"source":"/mydata/hello.20160328.1420.data","offset":41483772,"FileStateOS": {
"inode":5381,"device":64792}
}
"/mydata/hello.20160328.1420.data": {
"source":"/mydata/hello.20160328.1420.data","offset":32127895,"FileStateOS": {
"inode":5381,"device":64792}
}

After upgrading to 1.2, I still see the same issue occurring. I have ignore_older and close_older set to one hour, but it doesn't matter, because filebeat believes the file is an update of something it has already processed. This issue makes the fidelity of the data too low for me to recommend the solution at this point; I cannot operationalize something with 10-minute gaps in it. Appreciate any help you can provide on getting this issue resolved. Mahalo.
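
For reference, the relevant prospector settings look roughly like this (a sketch, assuming filebeat 1.2 syntax and the /mydata path from the logs above):

filebeat:
  prospectors:
    - paths:
        - "/mydata/*.data"
      ignore_older: 1h
      close_older: 1h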

@driskell commented Apr 7, 2016

This relates to #1022. Once the registry is maintained and entries for deleted files are removed, it should fix things like this.

@billmurrin (Author)

Thank you. I also want to add that you should store an epoch timestamp or something similar in the registry so that ignore_older can be properly assessed.
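
A sketch of what such an entry might look like (last_seen is a hypothetical field name, not something filebeat writes today):

"/mydata/hello.20160328.1400.data": {
"source":"/mydata/hello.20160328.1400.data","offset":31301200,"last_seen":1459174207,"FileStateOS": {
"inode":920,"device":64792}
}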

@ruflin (Member) commented Apr 12, 2016

@billmurrin As far as I understand, the script you wrote solved your issue? There is also another issue related to log rotation and overwriting states that is fixed here: #1375. But in your case there is no log rotation?

@billmurrin (Author)

I have not cron'd the script I wrote to run automatically, so I am still seeing the inode re-use issues periodically, though far less than before. The log rotation in my case is the archival of the logs; the logs, I guess you could say, rotate out when they get compressed/gzip'd.
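
(The script itself isn't posted in the thread; a minimal sketch of the approach, assuming the filebeat 1.x registry format shown above and an assumed registry path, might look like this. Filebeat should be stopped while the registry is rewritten.)

import json
import os

REGISTRY = "/var/lib/filebeat/registry"  # assumed default Linux location

with open(REGISTRY) as f:
    entries = json.load(f)

# Drop entries whose harvested source file no longer exists on disk, so
# stale inode/offset pairs cannot be matched against re-used inodes.
live = {path: state for path, state in entries.items()
        if os.path.exists(state.get("source", path))}

with open(REGISTRY, "w") as f:
    json.dump(live, f)

print("kept %d of %d entries" % (len(live), len(entries)))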

@morganchristiansson commented May 25, 2016

Also see #271

edit: yes - elastic/filebeat#271

@ruflin (Member) commented May 26, 2016

@morganchristiansson I assume you were referring to https://github.com/elastic/filebeat/issues/271 ?

@ruflin (Member) commented Jul 18, 2016

@billmurrin This problem should now be solvable through clean_removed and clean_idle / clean_inactive (still needs renaming): #1922. Closing this issue. It would be great if you could test out this feature; it is currently part of the nightly builds and will ship soonish with beta1.
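
A sketch of what that configuration might look like (assuming the 5.0 prospector syntax; clean_inactive has to be larger than ignore_older plus the scan frequency):

filebeat.prospectors:
  - paths:
      - "/mydata/*.data"
    ignore_older: 1h
    clean_inactive: 2h
    clean_removed: true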

ruflin closed this as completed Jul 18, 2016