Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not allow NaN to be produced as a metric #10

Merged
merged 1 commit into from
Oct 11, 2020
Merged

Conversation

paulgear
Copy link
Owner

@paulgear paulgear commented Oct 5, 2020

@rodrigogansobarbieri This is my alternative to PR #7. I don't think any metrics should be producing NaN at all - it's better to leave them out than to report NaN to Nagios or Telegraf. Let me know what you think. I think this should at least prevent the "CRITICAL: offset is out of range (nan)" nonsense, but I need to develop some test cases and see if it makes sense every time.

@rodrigogansobarbieri
Copy link

Hi Paul. I tested the change and it looks good! It addresses the "nan" case and prints the real offset. However, since it also happens with the case of having no sync peer, shouldn't this error message have a higher priority to prevent obfuscating this condition to Nagios?

@paulgear
Copy link
Owner Author

paulgear commented Oct 5, 2020

Thanks @rodrigogansobarbieri - I think this will still work, because in the case you provided, offset was still reasonable, even though the system had no sync peer. I'll make sure that gets included as one of my test cases.

I have no plans to change the logic so that loss of a sync peer is OK. I'm open to arguments to the contrary, but I think a better solution is to use Telegraf instead of Nagios and use Prometheus alerting (or the TICK stack equivalent) to alert when sync is lost for a period of time. (cc: @afreiberger in issue #6) Solving this in the Nagios check would require maintaining state across separate runs of the check, which is not something it has any support for at the moment.

@paulgear
Copy link
Owner Author

I'm going to merge this as-is for now, and add the unit tests in a later PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants