Decouple severity from Nagios plugin return codes #9

saj · 2016-01-04T13:09:23Z

An alternative solution to the problem identified in #8.

Compared with #8, this approach should be fully backwards-compatible -- but comes at the cost of additional complexity.

This change is backwards-compatible.

Nagios plugins for non-trivial services will often poke multiple data points looking for anomalous or undesirable behaviour. Each data point is often reflected with its own check Result in nagiosplugin. If all data points are within tolerance, we can rely on nagiosplugin to return an OK result to its parent process. Similarly, if one or more data points are outside tolerance, we can rely on nagiosplugin to return a non-OK (WARNING or CRITICAL) result to its parent process. This pattern greatly simplifies plugin development: plugins may focus on the monotonous task of data-gathering, leaving the ultimate 'result join' to the nagiosplugin library.

Life gets complicated when one or more data points cannot be queried (for whatever reason). Prior to this change, any single UNKNOWN Result would have superseded all other batched check Results, including other WARNING or CRITICAL Results. Amongst organisations that do not treat an individual UNKNOWN result as a pageable event, this behaviour suppressed actual failures (see #8).

This commit introduces the notion of a 'status policy': a mapping between a conventional check status and its severity, relative to other statuses. Results are now ordered by severity instead of by the fixed numeric constants defined by Nagios. Organisations may now prioritise plugin return codes to match their established monitoring policy. Unit tests in check_test.go demonstrate use.

This decoupling is invisible by default. The default status policy mimics the old behaviour. To enable alternative severity prioritisation, the caller must invoke the new NewCheckWithOptions() initialiser.
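To illustrate the idea of a status policy, here is a minimal, self-contained sketch. It does not use the nagiosplugin API: the names `Status`, `severityPolicy`, `newPolicy` and `worst` are hypothetical and exist only for this example, and the "alternative" ordering is just one plausible choice. The point it tries to show is that the policy decides which Result wins, while the exit code stays the conventional Nagios constant.

```go
package main

import "fmt"

// Status mirrors the conventional Nagios plugin return codes.
type Status int

const (
	OK Status = iota // 0
	Warning          // 1
	Critical         // 2
	Unknown          // 3
)

func (s Status) String() string {
	switch s {
	case OK:
		return "OK"
	case Warning:
		return "WARNING"
	case Critical:
		return "CRITICAL"
	default:
		return "UNKNOWN"
	}
}

// severityPolicy (hypothetical) maps each status to a relative severity.
// A higher rank supersedes a lower one.
type severityPolicy map[Status]int

// newPolicy builds a policy from statuses listed least severe first.
func newPolicy(order ...Status) severityPolicy {
	p := make(severityPolicy, len(order))
	for rank, s := range order {
		p[s] = rank
	}
	return p
}

// worst returns the most severe status among results under policy p.
// Returning Unknown for an empty slice is an assumption of this sketch.
func worst(p severityPolicy, results []Status) Status {
	if len(results) == 0 {
		return Unknown
	}
	w := results[0]
	for _, r := range results[1:] {
		if p[r] > p[w] {
			w = r
		}
	}
	return w
}

func main() {
	results := []Status{OK, Critical, Unknown}

	// Severity follows the numeric return codes, so UNKNOWN wins.
	legacy := newPolicy(OK, Warning, Critical, Unknown)
	// UNKNOWN is ranked below WARNING and CRITICAL.
	alternative := newPolicy(OK, Unknown, Warning, Critical)

	for _, p := range []severityPolicy{legacy, alternative} {
		s := worst(p, results)
		// The exit code is still the conventional Nagios constant.
		fmt.Printf("%s (exit code %d)\n", s, int(s))
	}
}
```

With the same three Results, the first policy prints `UNKNOWN (exit code 3)` and the second prints `CRITICAL (exit code 2)`, so a failed query no longer hides the real failure.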

olorin · 2016-01-13T22:35:29Z

Apologies for the significant delay, been flat out this month. I will look at this tonight.

olorin · 2016-01-14T07:52:30Z

I think this is a good solution; the additional complexity is unfortunate, but I think the flexibility here more than outweighs it - nice. 👍

Thanks for reworking this change, the effort's appreciated - and I'll try to be quicker with the next one. :)

saj mentioned this pull request Jan 4, 2016

Make WARNING/CRITICAL supersede UNKNOWN status #8

Closed

saj changed the title ~~Decouple plugin return codes from result severity~~ Decouple severity from Nagios plugin return codes Jan 4, 2016

olorin added a commit that referenced this pull request Jan 14, 2016

Merge pull request #9 from saj/decouple-severity

45f5074

Decouple severity from Nagios plugin return codes

olorin merged commit 45f5074 into olorin:master Jan 14, 2016

saj deleted the decouple-severity branch January 15, 2016 12:21

saj mentioned this pull request Jan 15, 2016

Add NewOUWCStatusPolicy() and document its use #10

Merged
