From a8448ee204c1305771cc488bed628e74c91b9c8d Mon Sep 17 00:00:00 2001
From: Nicolas Ruflin
Date: Thu, 3 Nov 2016 15:39:56 +0100
Subject: [PATCH] Set offset of files under ignore_older to file.size() (#2907)

* Set offset of files under ignore_older to file.size()

Until now, if a file fell under ignore_older, offset 0 was set for the
file. If the file was updated again, all content of the file would be
read.

This change sets the offset for files falling under ignore_older to the
size of the file. This applies on start / restart for files which were
not seen before. The assumption behind this change is that files falling
under ignore_older are normally not updated, and if these files are
updated, only the newly added lines are expected to be read, not the
complete file.

The offset is set only once, when a file falls under ignore_older for
the first time and no state exists. For files which were harvested and
then fall under ignore_older, the offset is already file.size().

As this change only applies to files without a state, it should also not
have any side effects on Windows, where it can happen that a file was
updated but the timestamp wasn't, so the file falls under ignore_older.
The reason it doesn't have an effect is that ignore_older only applies
an offset if there is no previous state.

* Add doc line
---
 CHANGELOG.asciidoc                                    | 3 +++
 filebeat/docs/faq.asciidoc                            | 8 ++++----
 .../reference/configuration/filebeat-options.asciidoc | 2 +-
 filebeat/prospector/prospector_log.go                 | 4 ++++
 filebeat/tests/system/test_registrar.py               | 4 ++--
 5 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/CHANGELOG.asciidoc b/CHANGELOG.asciidoc
index f672fd38bce..3953d39c2a5 100644
--- a/CHANGELOG.asciidoc
+++ b/CHANGELOG.asciidoc
@@ -21,6 +21,9 @@ https://github.com/elastic/beats/compare/v5.0.0...master[Check the HEAD diff]
 *Topbeat*
 
 *Filebeat*
+- If a file is falling under ignore_older during startup, offset is now set to end of file instead of 0.
+  With the previous logic the whole file was sent in case a line was added and it was inconsistent with
+  files which were harvested previously. {pull}2907[2907]
 
 *Winlogbeat*
 
diff --git a/filebeat/docs/faq.asciidoc b/filebeat/docs/faq.asciidoc
index b00506055d7..fbb1bbbd46b 100644
--- a/filebeat/docs/faq.asciidoc
+++ b/filebeat/docs/faq.asciidoc
@@ -20,8 +20,8 @@ Filebeat might be incorrectly configured or unable to send events to the output.
 
 * Make sure the config file specifies the correct path to the file that you are collecting. See <> for more information.
-* Verify that the file is not older than the value specified by <>. By default, Filebeat
-stops reading files that are older than 24 hours. You can change this behavior by specifying a different value for
+* Verify that the file is not older than the value specified by <>. ignore_older is disabled by
+default so this depends on the value you have set. You can change this behavior by specifying a different value for
 <>.
 * Make sure that Filebeat is able to send events to the configured output. Run Filebeat in debug mode to determine whether it's publishing events successfully:
@@ -47,7 +47,7 @@ There are additional configuration options that you can use to close file handle
 The `close_renamed` and `close_removed` options can be useful on Windows to resolve issues related to file rotation. See <>. The `close_eof` option can be useful in environments with a large number of files that have only very few entries. The `close_timeout` option is useful in environments where closing file handlers is more important than sending all log lines. For more details, see <>.
 
-Make sure that you read the documentation for these configuration options before using any of them.
+Make sure that you read the documentation for these configuration options before using any of them.
 
 [float]
 [[reduce-registry-size]]
@@ -112,4 +112,4 @@ harvested, a newline character is required after the last line, or Filebeat will
 the file.
 
 include::../../libbeat/docs/faq-limit-bandwidth.asciidoc[]
-include::../../libbeat/docs/shared-faq.asciidoc[]
\ No newline at end of file
+include::../../libbeat/docs/shared-faq.asciidoc[]
diff --git a/filebeat/docs/reference/configuration/filebeat-options.asciidoc b/filebeat/docs/reference/configuration/filebeat-options.asciidoc
index 8ad3b368d19..80257f03d15 100644
--- a/filebeat/docs/reference/configuration/filebeat-options.asciidoc
+++ b/filebeat/docs/reference/configuration/filebeat-options.asciidoc
@@ -175,7 +175,7 @@ The files affected by this setting fall into two categories:
 * Files that were never harvested
 * Files that were harvested but weren't updated for longer than `ignore_older`
 
-When a file that has never been harvested is updated, the reading starts from the beginning as the state of the file was created with the offset 0. For a file that has been harvested previously, reading continues at the last position.
+For files which were never seen before, the offset state is set to the end of the file. If a state already exists, the offset is not changed. In case a file is updated again later, reading continues at the set offset position.
 
 The `ignore_older` setting relies on the modification time of the file to determine if a file is ignored. If the modification time of the file is not updated when lines are written to a file (which can happen on Windows), the `ignore_older` setting may cause Filebeat to ignore files even though content was added at a later time.
diff --git a/filebeat/prospector/prospector_log.go b/filebeat/prospector/prospector_log.go
index 8551070b720..3e779555d40 100644
--- a/filebeat/prospector/prospector_log.go
+++ b/filebeat/prospector/prospector_log.go
@@ -298,6 +298,10 @@ func (p *ProspectorLog) handleIgnoreOlder(lastState, newState file.State) error
 		return nil
 	}
 
+	// Set offset to end of file to be consistent with files which were harvested before
+	// See https://github.com/elastic/beats/pull/2907
+	newState.Offset = newState.Fileinfo.Size()
+
 	// Write state for ignore_older file as none exists yet
 	newState.Finished = true
 	err := p.Prospector.updateState(input.NewEvent(newState))
diff --git a/filebeat/tests/system/test_registrar.py b/filebeat/tests/system/test_registrar.py
index b4a8ced0dfa..c2ea05abf14 100644
--- a/filebeat/tests/system/test_registrar.py
+++ b/filebeat/tests/system/test_registrar.py
@@ -1328,8 +1328,8 @@ def test_ignore_older_state(self):
         data = self.get_registry()
         assert len(data) == 1
 
-        # Check that offset is 0 even though there is content in it
-        assert data[0]["offset"] == 0
+        # Check that offset is set to the end of the file
+        assert data[0]["offset"] == os.path.getsize(testfile1)
 
     def test_ignore_older_state_clean_inactive(self):
         """