
FatalContext #3

Closed
mpounsett opened this issue Jan 14, 2015 · 1 comment
Labels
feature (New feature or request), major, wontfix (This will not be worked on)

Comments

@mpounsett
Owner

Original report by Christian Kauhaus (Bitbucket: ckauhaus, GitHub: ckauhaus).


spaans@fox-it.com wrote:

I'd like to share one small bit of code which I found myself reusing
again and again, for situations in which you want to signal that
something went wrong but cannot do so by crossing a (possibly
unknown) threshold or by raising an exception (since the condition is
really a critical failure, whereas an uncaught exception gets reported
as unknown).
This is the idiom I use:


import nagiosplugin


class FatalContext(nagiosplugin.Context):
    """Context that turns any metric routed to it into a CRITICAL result."""

    def evaluate(self, metric, resource):
        return nagiosplugin.Result(nagiosplugin.state.Critical,
                                   "Fatal Error: %r" % metric.value,
                                   metric)

...

    archive_fatal_ctx = FatalContext('fatal_archive')

...
            try:
                sess = get_session(session_id)
            except IOError as e:
                yield nagiosplugin.Metric('fatal_%s' % self.name_postfix, repr(e))
                return
...
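
For reference, here is a minimal, self-contained sketch of how these fragments might fit together, using the FatalContext class defined above. The SessionArchive resource, the get_session() stub, and the 'sessions' threshold context are illustrative placeholders, not part of the original report:

import nagiosplugin


def get_session(session_id):
    """Stand-in for the reporter's real session lookup."""
    raise IOError('archive backend unreachable')


class SessionArchive(nagiosplugin.Resource):
    name_postfix = 'archive'

    def probe(self):
        try:
            sess = get_session('example')
        except IOError as e:
            # Route the failure to the FatalContext instead of letting the
            # exception escape (an escaped exception is reported as UNKNOWN).
            # The metric name matches the context name 'fatal_archive'.
            yield nagiosplugin.Metric('fatal_%s' % self.name_postfix, repr(e))
            return
        yield nagiosplugin.Metric('sessions', len(sess))


@nagiosplugin.guarded
def main():
    check = nagiosplugin.Check(
        SessionArchive(),
        FatalContext('fatal_archive'),           # picks up the 'fatal_archive' metric
        nagiosplugin.ScalarContext('sessions'),  # ordinary threshold context
    )
    check.main()


if __name__ == '__main__':
    main()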

If you find this a useful scenario as well, go ahead and put it into the
nagiosplugin distribution.

@mpounsett added the major, library, feature, and needs review labels and removed the library label on Nov 7, 2019
@mpounsett removed the needs review label on Feb 6, 2022
@mpounsett
Owner Author

I think any kind of exception raised while taking a measurement means that the measurement couldn't be taken. By definition that can't be CRITICAL or WARNING, since you don't know the result of the measurement you were trying to take.

If, for example, you want to know "is this port answering?" and the answer is "no", then the connection-failure exception should be caught and a False or 0 (zero) result returned. If you're trying to measure whether a daemon is returning the correct content and you're getting an exception because of a timeout, that is the very definition of UNKNOWN. I don't think these two examples should be mixed in a single check; mixing them is the sort of thing that leads to an unexpected exception making you want to return CRITICAL.
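
To make that distinction concrete, a rough sketch (the host, port, and threshold are illustrative; this is not code from the issue): a caught connection refusal is a successful measurement with answer "no", while a timeout escapes and becomes UNKNOWN.

import socket

import nagiosplugin


class PortAnswering(nagiosplugin.Resource):
    """Measurement: does the port accept connections? (1 = yes, 0 = no)"""

    def __init__(self, host, port):
        self.host = host
        self.port = port

    def probe(self):
        try:
            with socket.create_connection((self.host, self.port), timeout=5):
                answering = 1
        except ConnectionRefusedError:
            # The measurement succeeded and the answer is "no": report 0 and
            # let the context turn that into CRITICAL.
            answering = 0
        # Anything else (e.g. a timeout or DNS failure) escapes and is
        # reported by nagiosplugin.guarded as UNKNOWN -- we couldn't measure.
        yield nagiosplugin.Metric('answering', answering, context='answering')


@nagiosplugin.guarded
def main():
    check = nagiosplugin.Check(
        PortAnswering('localhost', 25),
        # CRITICAL if the value falls outside the range 1:1, i.e. is not 1
        nagiosplugin.ScalarContext('answering', critical='1:1'),
    )
    check.main()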

For that reason I'm going to mark this wontfix, because I don't think it's a bug that unhandled exceptions result in an UNKNOWN state.

If you've got an argument for why I'm wrong, I'm willing to entertain the idea... I just can't think of a use case where this is a good idea.

@mpounsett added the wontfix label on Feb 6, 2022