Rate limit duplicate log lines #2326

Closed
phemmer opened this issue Jan 26, 2017 · 6 comments
Comments

@phemmer
Contributor

phemmer commented Jan 26, 2017

Feature Request

Opening a feature request kicks off a discussion.

Proposal:

Telegraf should rate limit duplicate log entries to prevent the same line from spamming the logs.

Current behavior:

There is no rate limiting on duplicate lines, so telegraf can repeat the same error hundreds of times per second.

Desired behavior:

Telegraf should keep track of what it has logged, and suppress any log message that was already output within the last X seconds, where X is configurable.
Note that duplicate lines should not need to be consecutive to be suppressed. That is, the following should not happen:

00:00:01 Error: foo
00:00:01 Error: bar
00:00:01 Error: foo
00:00:01 Error: bar
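
One way the desired behavior could be sketched in Go (the language Telegraf is written in): remember when each distinct message was last emitted and drop repeats inside the window, whether or not they are consecutive. The names `dedupFilter`, `allow`, and `replayExample` are hypothetical and not part of Telegraf, and the 10-second window stands in for the configurable X.

```go
package main

import (
	"fmt"
	"time"
)

// dedupFilter is a hypothetical helper, not a Telegraf type. It remembers
// when each distinct message was last emitted.
type dedupFilter struct {
	window   time.Duration
	lastSeen map[string]time.Time
}

func newDedupFilter(window time.Duration) *dedupFilter {
	return &dedupFilter{window: window, lastSeen: make(map[string]time.Time)}
}

// allow reports whether msg should be written at time now. A message is
// suppressed if the same text was emitted less than window ago, even when
// other messages were logged in between.
func (d *dedupFilter) allow(msg string, now time.Time) bool {
	if last, ok := d.lastSeen[msg]; ok && now.Sub(last) < d.window {
		return false
	}
	d.lastSeen[msg] = now
	return true
}

// replayExample feeds the four lines from the issue through the filter,
// all at the same timestamp, and returns the lines that survive.
func replayExample() []string {
	f := newDedupFilter(10 * time.Second) // assumed value for X
	now := time.Date(2017, 1, 26, 0, 0, 1, 0, time.UTC)
	var out []string
	for _, msg := range []string{"Error: foo", "Error: bar", "Error: foo", "Error: bar"} {
		if f.allow(msg, now) {
			out = append(out, now.Format("15:04:05")+" "+msg)
		}
	}
	return out
}

func main() {
	for _, line := range replayExample() {
		fmt.Println(line)
	}
}
```

With the four lines from the example, only the first `foo` and the first `bar` are printed. This sketch never evicts old entries and is not safe for concurrent use; a real implementation would need to handle both.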

Use case:

When errors occur in some of the plugins, they can be extremely repetitive in their messages, sometimes generating hundreds of identical messages per second. This can drown out other, quieter errors. It can also saturate or fill up log destinations (e.g. bandwidth or disk space).

@sparrc sparrc added this to the Future Milestone milestone Jan 27, 2017
@phemmer
Contributor Author

phemmer commented Feb 5, 2017

One of the basic problems with solving this, and other things (such as tracking error count per-plugin, or starting telegraf in a new namespace via exec (#2087)), is that some of the plugins write errors directly to STDOUT/STDERR.

We could do something like opening a pipe and redirecting STDOUT/STDERR into the pipe, but with this we don't know which plugin generated the error, and it prevents us from properly tracking errors per plugin (#1348).
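
A minimal Go sketch of the pipe idea, to make the drawback concrete. The helpers `captureStderr` and `noisyPlugins` are hypothetical. Swapping the `os.Stderr` variable only catches code that looks the variable up at write time; the standard `log` package's default logger keeps the original file, and anything writing to file descriptor 2 directly is missed, so a real redirect would need more than this.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
)

// captureStderr (hypothetical) points os.Stderr at the write end of a pipe
// while fn runs and returns every line that was written to it.
func captureStderr(fn func()) ([]string, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return nil, err
	}
	orig := os.Stderr
	os.Stderr = w

	// Read concurrently so that writers never block on a full pipe.
	done := make(chan []string)
	go func() {
		var lines []string
		sc := bufio.NewScanner(r)
		for sc.Scan() {
			lines = append(lines, sc.Text())
		}
		done <- lines
	}()

	fn()
	os.Stderr = orig
	w.Close() // signals EOF to the reader goroutine
	lines := <-done
	r.Close()
	return lines, nil
}

// noisyPlugins stands in for two different plugins writing to STDERR.
func noisyPlugins() {
	fmt.Fprintln(os.Stderr, "Error: foo") // imagine this is plugin A
	fmt.Fprintln(os.Stderr, "Error: bar") // imagine this is plugin B
}

func main() {
	lines, err := captureStderr(noisyPlugins)
	if err != nil {
		fmt.Println("pipe error:", err)
		return
	}
	for _, l := range lines {
		// Nothing in the captured text says which plugin wrote it.
		fmt.Println("captured:", l)
	}
}
```

The captured lines are plain text with no origin attached, which is the limitation described above: per-plugin tracking is not possible from the pipe alone.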

We could also force plugins to use Accumulator.AddError() by redirecting STDOUT/STDERR to /dev/null. The only downside to this that I can think of is if a plugin wants to log non-error messages.
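
As a rough illustration of why routing errors through an `AddError` method helps with per-plugin tracking, here is a self-contained Go sketch. `pluginAccumulator` is a hypothetical stand-in, not Telegraf's actual accumulator, and the plugin names and message format are made up for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// pluginAccumulator is a hypothetical stand-in for an accumulator that is
// handed to exactly one plugin, so every error arrives with a known origin.
type pluginAccumulator struct {
	plugin string
	counts map[string]int // shared error counters, keyed by plugin name
}

// AddError records the error against the owning plugin and logs it with
// the plugin name attached. Nil errors are ignored.
func (a *pluginAccumulator) AddError(err error) {
	if err == nil {
		return
	}
	a.counts[a.plugin]++
	fmt.Printf("error in %s: %v\n", a.plugin, err)
}

// countExample simulates two plugins reporting errors and returns the
// resulting per-plugin error counts.
func countExample() map[string]int {
	counts := make(map[string]int)
	foo := &pluginAccumulator{plugin: "inputs.foo", counts: counts}
	bar := &pluginAccumulator{plugin: "inputs.bar", counts: counts}

	foo.AddError(errors.New("connection refused"))
	foo.AddError(errors.New("connection refused"))
	bar.AddError(errors.New("timeout"))
	return counts
}

func main() {
	counts := countExample()
	fmt.Println("inputs.foo:", counts["inputs.foo"])
	fmt.Println("inputs.bar:", counts["inputs.bar"])
}
```

Because each error passes through a single choke point that knows the plugin, the same place could also apply duplicate suppression. The shared map here is not safe for concurrent use; that is left out to keep the sketch short.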

The only other option I can think of is to provide a per-plugin logger for plugins to use. We would again redirect STDOUT/STDERR to /dev/null.
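
A per-plugin logger could look roughly like the following Go sketch, built on the standard library's `log.New`. The helper `newPluginLogger`, the plugin names, and the prefix format are assumptions for illustration, not an existing Telegraf API. Timestamps are turned off (flags set to 0) only to keep the output deterministic.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
)

// newPluginLogger (hypothetical) returns a logger that tags every line
// with the plugin's name and writes to the shared destination w.
func newPluginLogger(w io.Writer, plugin string) *log.Logger {
	return log.New(w, "["+plugin+"] ", 0)
}

// logExample has two plugins log through their own loggers into one
// buffer and returns what ended up in it.
func logExample() string {
	var buf bytes.Buffer
	foo := newPluginLogger(&buf, "inputs.foo")
	bar := newPluginLogger(&buf, "inputs.bar")

	foo.Println("connection refused")   // an error message
	bar.Println("collected 12 metrics") // a non-error message works too
	return buf.String()
}

func main() {
	fmt.Print(logExample())
	// Output:
	// [inputs.foo] connection refused
	// [inputs.bar] collected 12 metrics
}
```

Unlike the `AddError` route, this leaves room for non-error messages, and the shared `io.Writer` is a natural place to send everything to syslog or to apply rate limiting.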

I think these latter 2 options are the better solutions: aside from addressing #1348 & #2087, they also make it much easier for admins to ensure telegraf logs go to syslog. Personally this isn't a problem for me, as my systems use systemd, so STDOUT/STDERR is automatically collected into the journal. But for non-systemd users, getting the logs into syslog is a lot harder/messier.

@sparrc thoughts?

@sparrc
Contributor

sparrc commented Feb 5, 2017

I think I'd prefer not to have a per-plugin logger, but could be convinced otherwise if it's really needed. Whatever we do, it needs to reflect influxdata/influxdb#7671

@sparrc sparrc closed this as completed Feb 5, 2017
@phemmer
Contributor Author

phemmer commented Feb 6, 2017

So what does closing this mean? The issues mentioned will not be addressed?

@sparrc
Contributor

sparrc commented Feb 6, 2017

oops, didn't mean to close it

@sparrc sparrc reopened this Feb 6, 2017
@phemmer phemmer mentioned this issue Feb 7, 2017
2 tasks
@danielnelson danielnelson removed this from the Future Milestone milestone Jun 14, 2017
@MyaLongmire
Contributor

I believe this is handled by systemd and journalctl. Therefore this does not involve telegraf and I will close the issue. If you are still having problems feel free to open another issue.

@phemmer
Contributor Author

phemmer commented May 31, 2022

No, systemd does not handle this. Systemd rate limits all log lines, not just duplicate log lines. In fact, this functionality is needed to keep the systemd rate limit from suppressing important information.

But the primary importance is to prevent spammy plugins from drowning out legitimate messages.
