Decouple severity from Nagios plugin return codes #9

Merged
merged 1 commit into from Jan 14, 2016

saj
Contributor

@saj saj commented Jan 4, 2016

An alternative solution to the problem identified in #8.

Compared with #8, this approach should be fully backwards-compatible -- but comes at the cost of additional complexity.

This change is backwards-compatible.

Nagios plugins for non-trivial services will often poke multiple data
points looking for anomalous or undesirable behaviour.  Each data point
is often reflected with its own check Result in nagiosplugin.  If all
data points are within tolerance, we can rely on nagiosplugin to return
an OK result to its parent process.  Similarly, if one or more data
points are outside tolerance, we can rely on nagiosplugin to return a
non-OK (WARNING or CRITICAL) result to its parent process.  This pattern
greatly simplifies plugin development:  plugins may focus on the
monotonous task of data-gathering, leaving the ultimate 'result join' to
the nagiosplugin library.

Life gets complicated when one or more data points are unable to be
queried (for whatever reason).  Prior to this change, any single UNKNOWN
Result would have superseded all other batched check Results:  including
other WARNING or CRITICAL Results.  Amongst organisations that do not
treat an individual UNKNOWN result as a pageable event, this behaviour
suppressed actual failures (see #8).
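
To make the masking concrete, here is a minimal sketch of a 'result join' that orders results by the raw Nagios exit codes (OK=0, WARNING=1, CRITICAL=2, UNKNOWN=3). It is not the library's code: `worstByCode` is a hypothetical helper written only for illustration, on the assumption that the old ordering simply compared those numeric constants.

```go
package main

import "fmt"

// Status mirrors the conventional Nagios plugin exit codes.
type Status int

const (
	OK Status = iota
	WARNING
	CRITICAL
	UNKNOWN
)

func (s Status) String() string {
	return [...]string{"OK", "WARNING", "CRITICAL", "UNKNOWN"}[s]
}

// worstByCode is a hypothetical helper (not part of nagiosplugin).
// It joins results by picking the highest numeric exit code.
func worstByCode(results []Status) Status {
	worst := OK
	for _, r := range results {
		if r > worst {
			worst = r
		}
	}
	return worst
}

func main() {
	// One data point failed hard, another could not be queried.
	results := []Status{OK, CRITICAL, UNKNOWN}
	// UNKNOWN (3) outranks CRITICAL (2), so the real failure is hidden.
	fmt.Println(worstByCode(results))
}
```

A site that pages on CRITICAL but not on UNKNOWN would never see the failure in this batch.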

This commit introduces the notion of a 'status policy':  a mapping
between a conventional check status and its severity, relative to other
statuses.  Results are now ordered by severity instead of the fixed
numeric constants defined by Nagios.  Organisations may now prioritise
plugin return codes to match their established monitoring policy.  Unit
tests in check_test.go demonstrate use.

This decoupling is invisible by default.  The default status policy
mimics old behaviour.  To enable alternative severity prioritisation,
the caller must invoke the new NewCheckWithOptions() initialiser.
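
The idea can be sketched roughly as follows. This is a self-contained illustration, not the code in this commit: `statusPolicy`, `options`, `check`, `newCheckWithOptions` and their fields are hypothetical stand-ins, and the real signature of NewCheckWithOptions() is not shown in this thread. The sketch assumes a policy is just a status-to-severity map and that an empty options value falls back to a default that matches the numeric codes.

```go
package main

import "fmt"

// Status mirrors the conventional Nagios plugin exit codes.
type Status int

const (
	OK Status = iota
	WARNING
	CRITICAL
	UNKNOWN
)

func (s Status) String() string {
	return [...]string{"OK", "WARNING", "CRITICAL", "UNKNOWN"}[s]
}

// statusPolicy (hypothetical) maps a status to its relative severity.
// A higher number means a more severe status.
type statusPolicy map[Status]int

// defaultPolicy mimics the old behaviour: severity equals the exit code.
func defaultPolicy() statusPolicy {
	return statusPolicy{OK: 0, WARNING: 1, CRITICAL: 2, UNKNOWN: 3}
}

// options and check are hypothetical stand-ins for the library's types.
type options struct {
	policy statusPolicy
}

type check struct {
	policy  statusPolicy
	results []Status
}

// newCheckWithOptions falls back to the default policy when none is given.
func newCheckWithOptions(o options) *check {
	p := o.policy
	if p == nil {
		p = defaultPolicy()
	}
	return &check{policy: p}
}

func (c *check) add(s Status) { c.results = append(c.results, s) }

// worst returns the most severe batched result according to the policy.
func (c *check) worst() Status {
	if len(c.results) == 0 {
		return OK
	}
	worst := c.results[0]
	for _, r := range c.results[1:] {
		if c.policy[r] > c.policy[worst] {
			worst = r
		}
	}
	return worst
}

func main() {
	legacy := newCheckWithOptions(options{})
	// Rank UNKNOWN below WARNING and CRITICAL.
	custom := newCheckWithOptions(options{
		policy: statusPolicy{OK: 0, UNKNOWN: 1, WARNING: 2, CRITICAL: 3},
	})
	for _, s := range []Status{OK, CRITICAL, UNKNOWN} {
		legacy.add(s)
		custom.add(s)
	}
	fmt.Println(legacy.worst()) // default policy: UNKNOWN still wins
	fmt.Println(custom.worst()) // custom policy: CRITICAL surfaces
}
```

With the same three results, only the custom policy lets the CRITICAL through, while callers that never pass a policy see no change.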

@saj saj changed the title from "Decouple plugin return codes from result severity" to "Decouple severity from Nagios plugin return codes" on Jan 4, 2016
@olorin
Owner

olorin commented Jan 13, 2016

Apologies for the significant delay, been flat out this month. I will look at this tonight.

@olorin
Owner

olorin commented Jan 14, 2016

I think this is a good solution; the additional complexity is unfortunate, but I think the flexibility here more than outweighs it - nice. 👍

Thanks for reworking this change, the effort's appreciated - and I'll try to be quicker with the next one. :)

olorin added a commit that referenced this pull request Jan 14, 2016
Decouple severity from Nagios plugin return codes
@olorin olorin merged commit 45f5074 into olorin:master Jan 14, 2016
@saj saj deleted the decouple-severity branch January 15, 2016 12:21

2 participants