
@manusfreedom

To use all features of:
jordansissel/ruby-filewatch#32

@manusfreedom manusfreedom changed the title from "- Add follow_only_path option" to "Add follow_only_path option, linked to https://github.com/jordansissel/ruby-filewatch/pull/32" on Jun 10, 2015
@purbon

purbon commented Oct 27, 2015

please jenkins, test this.

@elasticsearch-release

Jenkins standing by to test this. If you aren't a maintainer, you can ignore this comment. Someone with commit access, please review this and clear it for Jenkins to run; then say 'jenkins, test it'.

@purbon

purbon commented Oct 27, 2015

please jenkins, test this.

@guyboertje
Contributor

Filewatch code has now been copied into this plugin code base and been extensively refactored. The changes mentioned above did not make it into the copy.

Our preferred approach is to use fingerprinting, which allows us to evaluate whether content has been seen before, regardless of path or inode.

On file discovery, fingerprints (one-way hashes) are taken of a chunk of bytes at each of two well-known offsets in the file. We then try to match the file with one we have already seen in the sincedb collection: we look for a match on the first fingerprint and, if found, verify against the second.
One big challenge arises when the discovered file is very small but growing: we have to delay taking the fingerprints until the file is large enough, and only then can we match. In true tail cases, such as a rotated file, the content is new. In read cases, however, where the same content is accidentally copied in, we need to build the fingerprints before we can match and determine whether new unread content exists beyond where we last read in the previously seen content.
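
For illustration only, the two-fingerprint matching described above might be sketched as follows. This is not the plugin's actual code: the chunk size, the two offsets, the choice of SHA-256, the helper names `fingerprints` and `match_seen`, and the shape of the `seen` hash (a stand-in for the sincedb collection) are all assumptions made for the example.

```ruby
require "digest"

# Hypothetical values: the comment does not state the chunk size or offsets.
CHUNK_SIZE = 255
OFFSETS = [0, 4096].freeze

# Returns one hex digest per offset, or nil for any chunk the file
# is still too short to fill.
def fingerprints(path)
  size = File.size(path)
  OFFSETS.map do |offset|
    next nil if size < offset + CHUNK_SIZE
    Digest::SHA256.hexdigest(File.binread(path, CHUNK_SIZE, offset))
  end
end

# `seen` stands in for the sincedb collection. It maps a first
# fingerprint to a record such as { second: "...", position: 1234 }.
# Returns [:defer, nil] while the file is too small to fingerprint,
# [:seen, record] when both fingerprints match, and [:new, nil] otherwise.
def match_seen(path, seen)
  first, second = fingerprints(path)
  return [:defer, nil] if first.nil? || second.nil?

  entry = seen[first]
  return [:seen, entry] if entry && entry[:second] == second

  [:new, nil]
end
```

In this sketch a `:defer` result corresponds to the small-but-growing case: the caller would retry on a later discovery pass once the file is large enough. On a `:seen` result, the stored `position` could be compared with the current file size to decide whether unread content exists beyond the last read point.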

@guyboertje guyboertje closed this May 3, 2018

4 participants