filebeat - inode re-use on OS leading to loss of beats data #1341

Closed

billmurrin opened this issue Apr 7, 2016 · 7 comments

@billmurrin

I have two related discussions on the Elastic Beats forum:

https://discuss.elastic.co/t/deleting-filebeat-registry-file/46112
https://discuss.elastic.co/t/old-file-with-new-name-found/41766/19

The bottom line is that the way the filebeat registry persists data is causing data loss on my platform.

I have a solution that produces a log file every 10 minutes; that data is archived/compressed hourly, so at any given time there are only a handful of uncompressed files available until the next compression run. Since a log is generated every 10 minutes, this adds up to a lot of logs per day, and the inodes are getting re-used.

The registry retains an entry for every log since I started running filebeat. This causes an inode re-use to be detected as an update to a previously harvested file, and one of three things happens (I haven't pinpointed why); the sketch after the list illustrates the mechanism.

  • The file picks up at the offset of the original file.
  • The file restarts at offset zero.
  • The file is bypassed and no data is sent.
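
To make the failure modes concrete, here is a minimal sketch (not filebeat source; the registry is heavily simplified) of why a state store keyed by (device, inode) misfires once the OS hands a new file an old inode:

import os

# (device, inode) -> {"source": path, "offset": bytes_already_read}
registry = {}

def resume_offset(path):
    st = os.stat(path)
    state = registry.get((st.st_dev, st.st_ino))
    if state is not None and state["source"] != path:
        # The inode matches an old entry, so this looks like a rename of a
        # previously harvested file. With inode re-use it is really a
        # brand-new file: resuming at the stored offset skips data, and if
        # the new file is shorter than that offset it is treated as
        # truncated and read again from offset 0.
        return state["offset"]
    return 0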

Here are a few examples from my blog post

This is my filebeat log showing an example of when it does not send any data:

2016-03-28T14:10:07Z INFO Old file with new name found: /mydata/hello.20160327.0940.data is no /mydata/hello.20160328.1400.data
2016-03-28T14:10:07Z INFO Detected rename of a previously harvested file: /mydata/hello.20160327.0940.data -> /mydata/hello.20160328.1400.data
2016-03-28T14:10:07Z INFO Harvester started for file: /mydata/hello.20160328.1400.data

Here are the corresponding entries from the registry file (note both paths share inode 920):

"/mydata/hello.20160324.1210.data": {
"source":"/mydata/hello.20160327.0940.data","offset":34145599,"FileStateOS": {
"inode":920,"device":64792}
"/mydata/hello.20160327.0940.data": {
"source":"/mydata/hello.20160328.1400.data","offset":31301200,"FileStateOS": {
"inode":920,"device":64792}`

And here is an example of when the file restarts at offset zero:

2016-03-28T14:30:01Z INFO Old file with new name found: /mydata/hello.20160327.1410.data is no /mydata/hello.20160328.1420.data
2016-03-28T14:30:01Z INFO Detected rename of a previously harvested file: /mydata/hello.20160327.1410.data -> /mydata/hello.20160328.1420.data
2016-03-28T14:30:01Z INFO Harvester started for file: /mydata/hello.20160328.1420.data
2016-03-28T14:30:01Z INFO File was truncated. Begin reading file from offset 0: /mydata/hello.20160328.1420.data

Here are the corresponding registry entries for that file:

"/mydata/hello.20160327.1410.data": {
"source":"/mydata/hello.20160328.1420.data","offset":41483772,"FileStateOS": {
"inode":5381,"device":64792}
}
"/mydata/hello.20160328.1420.data": {
"source":"/mydata/hello.20160328.1420.data","offset":32127895,"FileStateOS": {
"inode":5381,"device":64792}
}

After upgrading to 1.2, I still see the same issue occurring. I have ignore_older and close_older set to one hour, but it doesn't matter, because filebeat believes the file is an update of something it has already processed. This issue makes the fidelity of the data too low for me to recommend the solution at this point; I cannot operationalize something with 10-minute gaps in it. Appreciate any help you can provide on getting this issue resolved. Mahalo.
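
For reference, the relevant prospector settings look roughly like this (a sketch, assuming filebeat 1.2 syntax and the /mydata path from the logs above):

filebeat:
  prospectors:
    - paths:
        - "/mydata/*.data"
      ignore_older: 1h
      close_older: 1h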

@driskell commented Apr 7, 2016

This relates to #1022. Once the registry is maintained and entries for deleted files are removed, it should fix things like this.

@billmurrin (Author)

Thank you. I also want to add that you should store an epoch timestamp or something similar in the registry so that ignore_older can be properly assessed.
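
A sketch of what such an entry might look like (last_seen is a hypothetical field name, not something filebeat writes today):

"/mydata/hello.20160328.1400.data": {
"source":"/mydata/hello.20160328.1400.data","offset":31301200,"last_seen":1459174207,"FileStateOS": {
"inode":920,"device":64792}
}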

@ruflin (Member) commented Apr 12, 2016

@billmurrin As far as I understand, the script you wrote solved your issue? There is also another issue related to log rotation and overwriting states that is fixed here: #1375. But in your case there is no log rotation?

@billmurrin (Author)

I have not cron'd the script I wrote to run automatically, so I am still seeing the inode re-use issues periodically, though far less than before. The log rotation in my case is the archival of the logs; the logs, I guess you could say, rotate out when they get compressed/gzip'd.
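
(The script itself isn't posted in the thread; a minimal sketch of the approach, assuming the filebeat 1.x registry format shown above and an assumed registry path, might look like this. Filebeat should be stopped while the registry is rewritten.)

import json
import os

REGISTRY = "/var/lib/filebeat/registry"  # assumed default Linux location

with open(REGISTRY) as f:
    entries = json.load(f)

# Drop entries whose harvested source file no longer exists on disk, so
# stale inode/offset pairs cannot be matched against re-used inodes.
live = {path: state for path, state in entries.items()
        if os.path.exists(state.get("source", path))}

with open(REGISTRY, "w") as f:
    json.dump(live, f)

print("kept %d of %d entries" % (len(live), len(entries)))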

@morganchristiansson commented May 25, 2016

Also see #271

edit: yes - elastic/filebeat#271

@ruflin (Member) commented May 26, 2016

@morganchristiansson I assume you were referring to https://github.com/elastic/filebeat/issues/271 ?

@ruflin (Member) commented Jul 18, 2016

@billmurrin This problem should now be solvable through clean_removed and clean_idle / clean_inactive (still needs renaming): #1922. Closing this issue. It would be great if you could test out this feature; it is currently part of the nightly builds and will ship soonish with beta1.
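
A sketch of what that configuration might look like (assuming the 5.0 prospector syntax; clean_inactive has to be larger than ignore_older plus the scan frequency):

filebeat.prospectors:
  - paths:
      - "/mydata/*.data"
    ignore_older: 1h
    clean_inactive: 2h
    clean_removed: true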

ruflin closed this as completed Jul 18, 2016