Dynamic log suppression #6132
Labels: enhancement, frozen-due-to-age
Sometimes we log a shitload of data - when we're stuck retrying failing sync for lots of items, for example, or when there's a bug and we get stuck retrying something like socket accepts thousands of times a second. We do want to log most of this, but at a reasonable rate. Some requirements I have on the logging:
Failed items must be logged. At least once, and with some indication that failures are ongoing. I don't need to see every time everything fails, but a grep for something that has failed must show something.
Repeated log entries are fine if the volume is low. For example, if I manually pause and resume a folder I want log output for that in real time, even if I do it twice in succession.
We should not log repeated log entries infinitely.
I propose that we add some sort of suppression to the logging layer. Something like the following would work for me.
Assume we keep track of the number of log entries produced per minute, over at least the last three to five minutes.
Once every minute we rotate the buckets and decide whether suppression is on or off by comparing the rate to some threshold. If the rate is low enough to not cause problems (say, <100 messages/minute on average over the last five minutes) we don't do any kind of suppression; if it's higher we enable suppression. There is a log entry when we enable or disable suppression.
Assume also that we keep track of the hashes of log messages we've printed (sans timestamp etc.), in the same one-minute buckets.
When we're in suppression and a log entry matches an existing hash we don't log it and just bump a counter somewhere instead.
Once a minute we log the number of suppressed messages and clear the counter.
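The hash-based deduplication in the last two points could look something like this. It's a sketch under assumed names (`suppressor`, `Log`, `Flush`); for simplicity the seen-hash set here is a single map rather than per-minute buckets, and FNV-1a is just one cheap hash choice:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// suppressor remembers hashes of messages already printed and, while
// suppression is active, drops repeats and counts them instead.
type suppressor struct {
	seen       map[uint64]struct{}
	suppressed int
}

func newSuppressor() *suppressor {
	return &suppressor{seen: make(map[uint64]struct{})}
}

// Log prints msg unless suppression is on and the message's hash has
// been seen before, in which case it just bumps the counter.
func (s *suppressor) Log(msg string, suppressing bool) {
	h := fnv.New64a()
	h.Write([]byte(msg))
	sum := h.Sum64()
	if suppressing {
		if _, ok := s.seen[sum]; ok {
			s.suppressed++
			return
		}
	}
	s.seen[sum] = struct{}{}
	fmt.Println(msg)
}

// Flush runs once a minute: report and reset the suppressed count.
func (s *suppressor) Flush() {
	if s.suppressed > 0 {
		fmt.Printf("%d previously seen log messages suppressed since last minute\n", s.suppressed)
		s.suppressed = 0
	}
}

func main() {
	s := newSuppressor()
	s.Log("pull failed: item1", true) // first time: printed
	s.Log("pull failed: item1", true) // repeat while suppressing: counted
	s.Log("pull failed: item1", true)
	s.Flush() // prints "2 previously seen log messages suppressed since last minute"
}
```

Note that each failed item still gets printed once even under suppression, which satisfies the "a grep must show something" requirement.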
The result would be that under normal circumstances there isn't any suppression. Once we end up in a state with 100k failed items we'd log them once, trip the threshold, and from then on just say "100k previously seen log messages suppressed since $last_minute" and that'd be that.
The cost here would be keeping hashes of the last few minutes' log entries. In normal operation that's roughly zero. When there's a shitload of entries we will incur some memory cost and may want a limit. If we limit ourselves to say 1M log messages tracked that might be around 32 megabytes of hashes and the same again in map overhead and whatnot. I think that's an OK price to pay. There could of course be an off switch.
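One minimal way to enforce that limit, as a sketch: once the set holds the cap's worth of entries, stop remembering new hashes, so messages beyond that point are simply always printed rather than tracked. The function and limit here are illustrative only:

```go
package main

import "fmt"

// remember adds sum to the seen set unless the set is already at its
// size limit. It reports whether sum was already tracked, i.e. whether
// a matching message could be suppressed.
func remember(seen map[uint64]struct{}, sum uint64, limit int) bool {
	if _, ok := seen[sum]; ok {
		return true // already tracked: candidate for suppression
	}
	if len(seen) < limit {
		seen[sum] = struct{}{} // still under the cap: start tracking
	}
	return false
}

func main() {
	seen := make(map[uint64]struct{})
	for i := uint64(0); i < 10; i++ {
		remember(seen, i, 4)
	}
	fmt.Println(len(seen)) // prints 4: the set never grows past the limit
}
```

With a 1M-entry cap and 32-byte hashes this bounds the hash storage at the ~32 MB estimated above, trading away deduplication for anything past the cap.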
As a special case we could always do a comparison to exactly the last message printed and not repeat that, instead saying "last message repeated 48736 times" once every minute or so. This is just for when we get stuck in some buggy accept loop.
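This special case needs no hash set at all, just the previous message and a counter. A sketch with made-up names (`repeatSquasher` and friends); the periodic tick that would also call `flush` once a minute is omitted:

```go
package main

import "fmt"

// repeatSquasher collapses immediate repeats of the last printed
// message into a single "last message repeated N times" line.
type repeatSquasher struct {
	last    string
	repeats int
}

// Log prints msg unless it is identical to the previous message, in
// which case it only increments the repeat counter.
func (r *repeatSquasher) Log(msg string) {
	if msg == r.last {
		r.repeats++
		return
	}
	r.flush()
	r.last = msg
	fmt.Println(msg)
}

// flush emits the pending repeat count; it runs when the message
// changes, and would also run on the once-a-minute tick.
func (r *repeatSquasher) flush() {
	if r.repeats > 0 {
		fmt.Printf("last message repeated %d times\n", r.repeats)
		r.repeats = 0
	}
}

func main() {
	r := &repeatSquasher{}
	r.Log("accept failed: EMFILE") // printed
	r.Log("accept failed: EMFILE") // squashed
	r.Log("accept failed: EMFILE") // squashed
	r.Log("listener restarted")    // prints "last message repeated 2 times", then the new message
}
```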