Fix: sincedb_clean_after not being respected #276

kares · 2020-09-10T11:27:17Z

In certain cases in read mode, such as the one described in #250, sincedb is not updated.

There was no periodic check of sincedb that would trigger the cleanup of entries older than sincedb_clean_after.
Cleanup relied on new files being discovered or new content being added to existing files, either of which causes a sincedb update.

The fix here is to periodically flush sincedb (from the watch loop).
Besides, to make the process more deterministic, there's a minor change to make sure the same "updated" timestamp is used to mark the last changed time.

resolves #250
expected to also resolve #260

Until now the plugin relied on other files coming in or existing files getting new content, which would eventually trigger a sincedb flush (due to the changes). However, if there isn't any activity going on, we still need to clean up sincedb periodically to respect the sincedb_clean_after setting.
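
To illustrate the failure mode, here is a minimal sketch in plain Ruby. It is not the plugin's code: `ToySincedb`, `track`, `run_idle_loop` and the integer "seconds" are hypothetical, and only the method names `write_if_requested` and `flush_at_interval` are borrowed from this PR. It assumes expired entries are dropped only when a flush actually runs.

```ruby
# Hypothetical stand-in for the sincedb: entries expire only when a flush runs.
class ToySincedb
  attr_reader :entries, :flush_count

  def initialize(clean_after:, write_interval:, now:)
    @clean_after = clean_after        # seconds before an idle entry expires
    @write_interval = write_interval  # minimum seconds between flushes
    @entries = {}                     # key => last-changed timestamp
    @write_requested = false
    @last_flush_at = now
    @flush_count = 0
  end

  # Record activity for a key; this is what requests a write.
  def track(key, now)
    @entries[key] = now
    @write_requested = true
  end

  # Behaviour before the fix: flush only when some activity requested a write.
  def write_if_requested(now)
    flush_at_interval(now) if @write_requested
  end

  # Flush (and drop expired entries) at most once per write interval.
  def flush_at_interval(now)
    return if now - @last_flush_at < @write_interval
    @entries.delete_if { |_key, changed_at| now - changed_at > @clean_after }
    @write_requested = false
    @last_flush_at = now
    @flush_count += 1
  end
end

# Simulate an idle watch loop with integer "seconds" instead of sleeping.
def run_idle_loop(db, ticks:, periodic_flush:)
  (1..ticks).each do |now|
    db.write_if_requested(now)
    db.flush_at_interval(now) if periodic_flush
  end
end

old_loop = ToySincedb.new(clean_after: 5, write_interval: 2, now: 0)
old_loop.track("inode-1", 0)
run_idle_loop(old_loop, ticks: 10, periodic_flush: false)

new_loop = ToySincedb.new(clean_after: 5, write_interval: 2, now: 0)
new_loop.track("inode-1", 0)
run_idle_loop(new_loop, ticks: 10, periodic_flush: true)
```

In this toy model the loop without the periodic flush keeps `inode-1` forever once the single pending write has been served, while the loop with it drops the entry as soon as a flush runs after the expiry time.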

kares · 2020-09-10T15:15:30Z

The CI 🔴 is intermittent; here's a 🟢 run: https://travis-ci.org/github/kares/logstash-input-file/builds/725910454

colinsurprenant

LGTM!

yaauie · 2020-09-10T18:47:22Z

lib/filewatch/watch.rb

        sleep(@settings.stat_interval)
+        # we need to check potential expired keys (sincedb_clean_after) periodically
+        sincedb_collection.flush_at_interval


🤔

We already have a SincedbCollection#write_if_requested in this until quit? loop, which conditionally sends SincedbCollection#flush_at_interval IFF a write has been requested. Should we simply change this to send SincedbCollection#flush_at_interval instead, or is there a valid reason to sometimes flush twice in a loop?

Also, this isn't new to your PR, but it looks like we are passing an instance of SincedbCollection around a fair bit, with this flush_at_interval and other state-mutating methods being invoked by potentially multiple threads simultaneously 😩. We should probably file a separate issue to audit the thread-safety.

As described in the bug report and in the PR description, the problem occurs when no writes are requested (no newly discovered files and no new content in watched files): write_if_requested is then simply a no-op, so sincedb_clean_after never gets triggered.

The plugin should guarantee cleanup of expired sincedb entries regardless of whether there is any new content. Currently, cleanup is only guaranteed to happen on shutdown.

I decided to add an explicit flush_at_interval alongside the existing write_if_requested, since they accomplish two different things we want to happen in the loop: a) write changes, if any, and b) guarantee periodic sincedb flushes. The actual write won't happen twice in one loop iteration, since flushing only happens once sincedb_write_interval has elapsed.

The whole thing is ripe for deeper refactoring, but unfortunately I do not have the cycles to get into that.

@yaauie is this still your preference, given the above?

> I think that the change to FileWatch::Watch#subscribe can be simplified a bit by jumping straight to invoking SincedbCollection#flush_at_interval exactly once per loop instead of invoking it once-or-twice.

I don't have a strong opinion either way, given the current state of the code.

You're right that the actual flush is safeguarded so it cannot run more often than the interval, so it doesn't matter much whether we attempt it once or twice in the loop. I'll update my review to an LGTM.
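
The safeguard agreed on in this thread can be sketched as follows. This is an assumption-laden illustration, not the plugin's implementation: `IntervalFlusher` and its fields are hypothetical, and the interval of 15 "seconds" is chosen only for the example. It shows that attempting the flush twice per loop iteration still writes at most once per interval.

```ruby
# Hypothetical flusher: only the interval guard is modelled here.
class IntervalFlusher
  attr_reader :writes

  def initialize(write_interval:, now:)
    @write_interval = write_interval  # minimum seconds between writes
    @last_write_at = now
    @writes = []                      # timestamps at which a write happened
  end

  # Returns true when a write happened, false when the guard skipped it.
  def flush_at_interval(now)
    return false if now - @last_write_at < @write_interval
    @last_write_at = now
    @writes << now
    true
  end
end

flusher = IntervalFlusher.new(write_interval: 15, now: 0)
(1..45).each do |now|
  flusher.flush_at_interval(now)  # stands in for write_if_requested with a pending write
  flusher.flush_at_interval(now)  # stands in for the explicit periodic flush
end
```

The second call in each iteration always sees an interval that has not yet elapsed, so it returns without writing.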

yaauie

+1 to the reorganization and the as_of change to make sure we mark the time-of-action correctly in cases where the clock advances while we are preparing to flush.

I think that the change to FileWatch::Watch#subscribe can be simplified a bit by jumping straight to invoking SincedbCollection#flush_at_interval exactly once per loop instead of invoking it once-or-twice.
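
A rough sketch of the as_of idea mentioned above: capture the timestamp once and use that same value for both the expiry check and the recorded time of the flush, even if the clock advances while the flush is being prepared. Everything here apart from the name `as_of` is hypothetical (`TimestampedCollection`, `touch`, `slow_serialize`), and the plugin's real method signatures may differ.

```ruby
# Hypothetical collection; `as_of` mirrors the name used in the review, the rest is assumed.
class TimestampedCollection
  attr_reader :entries, :last_flushed_at

  def initialize
    @entries = {}            # key => last-changed timestamp
    @last_flushed_at = nil
  end

  def touch(key, as_of)
    @entries[key] = as_of
  end

  # Capture the time once and thread it through every step of the flush.
  def flush(clean_after:, as_of: Time.now.to_f)
    @entries.delete_if { |_key, changed_at| as_of - changed_at > clean_after }
    slow_serialize             # the wall clock advances while this runs
    @last_flushed_at = as_of   # still the moment the flush was decided
  end

  private

  # Placeholder for the time spent serializing entries to disk.
  def slow_serialize
    sleep 0.01
  end
end

collection = TimestampedCollection.new
collection.touch("inode-1", 100.0)
collection.touch("inode-2", 109.0)
collection.flush(clean_after: 5, as_of: 110.0)
```

Passing the timestamp explicitly also makes the behaviour easy to test, since no call inside `flush` reads the clock a second time.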

yaauie

LGTM 👍

kares added 3 commits September 10, 2020 11:58

Refactor: keep same ts for serializer writes

ab0cca9

Refactor: remove (confusing) un-used method

62f2f2d

kares changed the title ~~Sincedb clean~~ Fix: sincedb_clean_after not being respected Sep 10, 2020

kares added 2 commits September 10, 2020 13:38

Test: try harder to verify (due slow CI)

5484bf1

Changelog and version bump

a5a002e

kares assigned elasticsearch-bot Sep 10, 2020

colinsurprenant approved these changes Sep 10, 2020

yaauie assigned yaauie and unassigned elasticsearch-bot Sep 10, 2020

yaauie reviewed Sep 10, 2020

yaauie approved these changes Sep 25, 2020

kares merged commit 20db56d into logstash-plugins:master Sep 25, 2020

andsel mentioned this pull request Oct 16, 2020

Update patch plugin versions in gemfile lock for 6.8.13 elastic/logstash#12352

Closed

kares mentioned this pull request Nov 11, 2020

Same inode being used - SinceDB not updated properly with logstash-input-file v4.2.1 #279

Closed
