Fix: sincedb_clean_after not being respected #276
Conversation
Until now, the plugin relied on other files coming in or existing files getting new content, which would eventually trigger the sincedb flush (due to changes). However, if there isn't any activity going on, we still need to clean up sincedb periodically to respect the clean_after setting...
CI 🔴 is intermittent, here's a 🟢: https://travis-ci.org/github/kares/logstash-input-file/builds/725910454
LGTM!
sleep(@settings.stat_interval)
# we need to check potential expired keys (sincedb_clean_after) periodically
sincedb_collection.flush_at_interval
🤔
We already have a SincedbCollection#write_if_requested in this until quit? loop, which conditionally sends SincedbCollection#flush_at_interval IFF a write has been requested. Should we simply change this to send SincedbCollection#flush_at_interval instead, or is there a valid reason to sometimes flush twice in a loop?
Also, this isn't new to your PR, but it looks like we are passing an instance of SincedbCollection around a fair bit, with this flush_at_interval and other state-mutating methods being invoked by potentially multiple threads simultaneously 😩. We should probably file a separate issue to audit the thread-safety.
As described in the bug report and in the PR description, the problem is that when no writes are requested (no newly discovered files and no new content in watched files), write_if_requested is simply a no-op, and thus sincedb_clean_after won't get triggered.

The plugin should guarantee a cleanup of expired sincedb entries regardless of whether there's actual new content; currently cleanup is only guaranteed to happen on shutdown.

I have decided to add an explicit flush_at_interval alongside the existing write_if_requested, since they accomplish two different things we want to happen in the loop: a) write changes, if any; b) guarantee periodic sincedb flushes. The actual write won't happen twice in the loop, since flushing only happens if sincedb_write_interval time has passed.

The whole thing is ripe for deeper refactoring, but unfortunately I do not have the cycles to get into that.
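The interval gating described above can be sketched as follows. This is a minimal illustration of the behavior, not the plugin's actual implementation; the class name and internals are hypothetical, only the `write_if_requested` / `flush_at_interval` method names and the `sincedb_write_interval` setting come from the discussion.

```ruby
# Sketch: a flush is attempted both conditionally (write_if_requested)
# and unconditionally (flush_at_interval), but the actual write only
# happens once sincedb_write_interval has elapsed -- so attempting it
# twice per loop iteration is harmless.
class SincedbSketch
  attr_reader :flush_count

  def initialize(write_interval)
    @write_interval  = write_interval # seconds between allowed flushes
    @last_flush      = Time.now
    @write_requested = false
    @flush_count     = 0
  end

  def request_write!
    @write_requested = true
  end

  # no-op unless a write was explicitly requested
  def write_if_requested(now = Time.now)
    flush_at_interval(now) if @write_requested
  end

  # always attempted, but only flushes once the interval has elapsed
  def flush_at_interval(now = Time.now)
    return if (now - @last_flush) < @write_interval
    @last_flush      = now
    @write_requested = false
    @flush_count += 1 # stands in for the real sincedb write + expiry cleanup
  end
end
```

With no write requested, `write_if_requested` does nothing, while `flush_at_interval` still guarantees a periodic flush once the interval passes, which is what lets the `sincedb_clean_after` expiry run even when files are idle.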
@yaauie is your preference still (given ^^^): "I think that the change to FileWatch::Watch#subscribe can be simplified a bit by jumping straight to invoking SincedbCollection#flush_at_interval exactly once per loop instead of invoking it once-or-twice."

I do not have strong opinions either way given the current state of the code.
You're right that the actual flush is safeguarded to not run more often than the interval, so it doesn't matter much whether we attempt it once or twice in the loop. I'll update my review to an LGTM.
+1 to the reorganization and the as_of change to make sure we mark the time-of-action correctly in cases where the clock advances while we are preparing to flush.

I think that the change to FileWatch::Watch#subscribe can be simplified a bit by jumping straight to invoking SincedbCollection#flush_at_interval exactly once per loop instead of invoking it once-or-twice.
LGTM 👍
In certain cases in read mode, such as the one described in #250, sincedb is not updated. There was no periodic check for sincedb updates that would cause the sincedb_clean_after entries to be cleaned up. The cleanup relied on new files being discovered or new content being added to existing files, causing sincedb updates.

The fix here is to periodically flush sincedb (from the watch loop). Additionally, to make the process more deterministic, there's a minor change to make sure the same "updated" timestamp is used to mark the last-changed time.
resolves #250
expected to also resolve #260
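The "same timestamp" point in the description can be illustrated with a small sketch: capture the current time once (the as_of value) and use that single timestamp both for the expiry decision and for recording the flush, so the clock advancing mid-flush cannot cause drift between the two. The function and variable names below are hypothetical, not the plugin's code.

```ruby
# Sketch: use one captured timestamp ("as_of") for both the
# clean_after expiry check and the recorded flush time, so the
# decision and the record always agree.
def flush_expired(entries, clean_after, state)
  as_of = Time.now                                    # single time-of-action
  entries.reject! { |_path, seen_at| (as_of - seen_at) > clean_after }
  state[:last_flushed_at] = as_of                     # same timestamp, no drift
  entries
end
```

Calling `Time.now` twice instead (once to filter, once to record) would let the recorded flush time land after the cutoff used for filtering, which is exactly the non-determinism the as_of change avoids.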