Set offset of files under ignore_older to file.size()
Until now, if a file fell under ignore_older, its offset was set to 0. If the file was updated again, its entire content was read. This change sets the offset for files falling under ignore_older to the size of the file. This applies on start / restart for files which were not seen before.

The assumption behind this change is that files falling under ignore_older are normally not updated, and if they are updated, only the newly added lines are expected to be read, not the complete file.

The offset is set only once, when a file falls under ignore_older for the first time and no state exists. For files which were harvested and then fall under ignore_older, the offset is already file.size().

As this change only applies to files without a state, it should also not have any side effects on Windows, where it can happen that a file was updated but the timestamp wasn't, so the file falls under ignore_older. The reason it doesn't have an effect is that ignore_older only applies an offset if there is no previous state.
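
The rule described above can be sketched roughly as follows. This is a simplified illustration, not the code from this commit: `fileState` and `ignoreOlderState` are hypothetical names standing in for Filebeat's `file.State` and the logic in `handleIgnoreOlder`, and the registry lookup is reduced to a possibly nil pointer.

```go
package main

// fileState is a hypothetical, simplified stand-in for Filebeat's file.State.
type fileState struct {
	Offset   int64 // byte position at which reading would resume
	Finished bool  // true when no harvester is active for the file
}

// ignoreOlderState returns the state to keep for a file that falls under
// ignore_older. prev is the previously stored state, or nil if the file
// was never seen before. size is the current size of the file in bytes.
func ignoreOlderState(prev *fileState, size int64) fileState {
	if prev != nil {
		// A state already exists: leave its offset unchanged.
		return *prev
	}
	// No state yet: start at the end of the file so that only lines
	// appended later are read, not the complete file.
	return fileState{Offset: size, Finished: true}
}
```

Under these assumptions, a file of 120 bytes with no previous state ends up with offset 120, while a file with a stored offset of 40 keeps that offset regardless of its current size.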
ruflin committed Nov 3, 2016
1 parent 858ea49 commit a516a23
Showing 5 changed files with 12 additions and 7 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.asciidoc
@@ -21,6 +21,9 @@ https://github.com/elastic/beats/compare/v5.0.0...master[Check the HEAD diff]
*Topbeat*

*Filebeat*
- If a file is falling under ignore_older during startup, offset is now set to end of file instead of 0.
With the previous logic the whole file was sent in case a line was added and it was inconsistent with
files which were harvested previously. {pull}2907[2907]

*Winlogbeat*

8 changes: 4 additions & 4 deletions filebeat/docs/faq.asciidoc
@@ -20,8 +20,8 @@ Filebeat might be incorrectly configured or unable to send events to the output.

* Make sure the config file specifies the correct path to the file that you are collecting. See <<filebeat-configuration>>
for more information.
* Verify that the file is not older than the value specified by <<ignore-older,`ignore_older`>>. By default, Filebeat
stops reading files that are older than 24 hours. You can change this behavior by specifying a different value for
* Verify that the file is not older than the value specified by <<ignore-older,`ignore_older`>>. ignore_older is disabled by
default so this depends on the value you have set. You can change this behavior by specifying a different value for
<<ignore-older,`ignore_older`>>.
* Make sure that Filebeat is able to send events to the configured output. Run Filebeat in debug mode to determine whether
it's publishing events successfully:
@@ -47,7 +47,7 @@ There are additional configuration options that you can use to close file handle

The `close_renamed` and `close_removed` options can be useful on Windows to resolve issues related to file rotation. See <<windows-file-rotation>>. The `close_eof` option can be useful in environments with a large number of files that have only very few entries. The `close_timeout` option is useful in environments where closing file handlers is more important than sending all log lines. For more details, see <<configuration-filebeat-options>>.

Make sure that you read the documentation for these configuration options before using any of them.
Make sure that you read the documentation for these configuration options before using any of them.

[float]
[[reduce-registry-size]]
@@ -112,4 +112,4 @@ harvested, a newline character is required after the last line, or Filebeat will
the file.

include::../../libbeat/docs/faq-limit-bandwidth.asciidoc[]
include::../../libbeat/docs/shared-faq.asciidoc[]
include::../../libbeat/docs/shared-faq.asciidoc[]
@@ -175,7 +175,7 @@ The files affected by this setting fall into two categories:
* Files that were never harvested
* Files that were harvested but weren't updated for longer than `ignore_older`

When a file that has never been harvested is updated, the reading starts from the beginning as the state of the file was created with the offset 0. For a file that has been harvested previously, reading continues at the last position.
For files which were never seen before, the offset state is set to the end of the file. If a state already exists, the offset is not changed. In case a file is updated again later, reading continues at the set offset position.

The `ignore_older` setting relies on the modification time of the file to determine if a file is ignored. If the modification time of the file is not updated when lines are written to a file (which can happen on Windows), the `ignore_older` setting may cause Filebeat to ignore files even though content was added at a later time.

2 changes: 2 additions & 0 deletions filebeat/prospector/prospector_log.go
@@ -298,6 +298,8 @@ func (p *ProspectorLog) handleIgnoreOlder(lastState, newState file.State) error
		return nil
	}

	newState.Offset = newState.Fileinfo.Size()

	// Write state for ignore_older file as none exists yet
	newState.Finished = true
	err := p.Prospector.updateState(input.NewEvent(newState))
4 changes: 2 additions & 2 deletions filebeat/tests/system/test_registrar.py
@@ -1328,8 +1328,8 @@ def test_ignore_older_state(self):
data = self.get_registry()
assert len(data) == 1

# Check that offset is 0 even though there is content in it
assert data[0]["offset"] == 0
# Check that offset is set to the end of the file
assert data[0]["offset"] == os.path.getsize(testfile1)

def test_ignore_older_state_clean_inactive(self):
"""