Added updating of existing Incidents so new incidents are constantly … #15

Open
wants to merge 1 commit into
base: master

rjr162
Contributor

@rjr162 rjr162 commented Dec 27, 2016

…created with each new warning/critical

Added a metric update option by using the event_handler in the following fashion:

    event_handler   cachet_notify!<host> -m=true

The cachet_notify script checks for the presence of '-m=true' in the string and then takes the first 'word' (everything before the first space) as the component name.
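A minimal sketch of that parsing step, written in Python purely for illustration. It is not the script's actual code, and `split_component_and_metric_flag` is a hypothetical name. It takes the argument string that follows `cachet_notify!`, treats the first space-separated word as the component name, and reports whether `-m=true` is present.

```python
def split_component_and_metric_flag(arg_string):
    """Return (component_name, metric_enabled) for a string like 'web01 -m=true'.

    Hypothetical helper: it mirrors the behaviour described above,
    not the real cachet_notify implementation.
    """
    # The metric option is considered enabled if the literal flag appears anywhere.
    metric_enabled = "-m=true" in arg_string
    # The component name is everything before the first space.
    component_name = arg_string.strip().split(" ", 1)[0]
    return component_name, metric_enabled


print(split_component_and_metric_flag("web01 -m=true"))
print(split_component_and_metric_flag("web01"))
```

Here `web01` is a made-up host name standing in for `<host>`.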

Edits by Ron Rossman Jr <ronrossman@gmail.com>
@rjr162
Contributor Author

rjr162 commented Dec 27, 2016

It's been a while since the last merge and feedback. I think I have all the original issues resolved, and I added a metric component for anyone using metrics for page load times. The downside is that I don't think there's a way to tell Cachet not to auto-update at the set interval (the lowest value is 1), which can create a bit of wonkiness in the chart. That comes from the Cachet auto-update side of things combined with the fact that Nagios only fires the event handler when there's a status change. It may be better to use the Python Cachet utility for URL checking if you want something that just runs and updates the page load time metric on a consistent basis.

To use it, just add -m=true after the component name in the event handler. Give it a test if you wish and give some feedback. You may need to pre-create the metric in Cachet for it to work right. If it's too poor, no issues yanking that part out.

@2Belette

2Belette commented Apr 6, 2017

It has been a while since I gave you feedback: I had no time to test and the server needed to be reinstalled. It is done now :)
I have tested, but I have an issue: multiple events keep being created instead of one event being updated when using -m=true.

I also tried testing with:

./cachet_notify 'host.fr' 'dispo' CRITICAL HARD 'test service down' -m=true

I got

KO HARD: creating incident
Array
(
    [name] => nagios dispo
    [message] => test service down
    [status] => 1
    [visible] => 1
    [component_id] => 5
    [component_status] => 4
    [notify] => 1
)

But if I do a

./cachet_notify 'host.fr' 'dispo' OK HARD 'test service down' -m=true

I got:

OK Hard: creating incident
Array
(
    [name] => nagios dispo
    [message] => test service down
    [status] => 4
    [visible] => 1
    [component_id] => 5
    [component_status] => 1
    [notify] => 1
)
OK HARD: updating incident
Can't find incident "nagios dispo"

And on Cachet I still get two incidents created: one for the CRITICAL, and one for the OK when it goes back to normal.
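For comparison, the behaviour I expected can be sketched as a simple "update if open, otherwise create" lookup. This is only an illustration in Python under stated assumptions: `report` is a hypothetical function, the list of dicts stands in for Cachet's incidents (no real Cachet API call is made), and the two status values are simply the ones printed in the output above.

```python
# Status values copied from the output above: 1 on CRITICAL, 4 on OK.
STATUS_PROBLEM = 1
STATUS_FIXED = 4


def report(incidents, name, message, status):
    """Update the open incident called `name`, or create one if none is open.

    `incidents` is a plain list of dicts standing in for Cachet's incident
    store; this is a hypothetical sketch, not the script's real logic.
    """
    for incident in incidents:
        # An incident counts as open until it has been marked fixed.
        if incident["name"] == name and incident["status"] != STATUS_FIXED:
            incident["message"] = message
            incident["status"] = status
            return "updated"
    incidents.append({"name": name, "message": message, "status": status})
    return "created"


incidents = []
print(report(incidents, "nagios dispo", "test service down", STATUS_PROBLEM))
print(report(incidents, "nagios dispo", "test service down", STATUS_FIXED))
print(len(incidents))
```

With this logic the CRITICAL call creates one incident and the following OK call updates that same incident, so only one entry exists afterwards.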

For Nagios alerts I get the same issue, or sometimes it doesn't update Cachet at all...

Any idea?
many thanks

EDIT:
I am still trying to understand the issue. In the meantime, I can confirm that the earlier pull request you made to fix going back to Normal status after CRITICAL or WARNING seems to work well :)

Another thing: at the beginning of cachet_notify you test the number of parameters. I think this has to be extended to 7, as -m=true adds one more; I needed to change it to make it work.
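Roughly what I mean, sketched in Python for illustration only. It assumes the count includes the script name (so the five positional arguments from my test command give 6, and 7 with the flag); `parse_args` and the dictionary keys are hypothetical names, not taken from the script.

```python
def parse_args(argv):
    """Validate and unpack arguments; argv[0] is the script name.

    Hypothetical sketch: accepts the five positional arguments with or
    without a trailing -m=true.
    """
    # 5 positional arguments + script name = 6; the optional flag makes 7.
    if len(argv) not in (6, 7):
        raise ValueError("expected 5 arguments, optionally followed by -m=true")
    host, service, state, state_type, message = argv[1:6]
    metric = len(argv) == 7 and argv[6] == "-m=true"
    return {
        "host": host,
        "service": service,
        "state": state,
        "state_type": state_type,
        "message": message,
        "metric": metric,
    }


print(parse_args(["./cachet_notify", "host.fr", "dispo", "CRITICAL", "HARD",
                  "test service down", "-m=true"]))
```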

@rjr162
Contributor Author

rjr162 commented Apr 8, 2017 via email

@2Belette

Thanks for your reply ;)

Another thing I am thinking about is that it would be useful to select which alerts we want to receive from Nagios. For example, I have "hacked" your script to exit(0) for Warning Soft; if I don't do that, Cachet receives too many false positives from Nagios on my installation.

It would be great to add a parameter, say -warning or -critical, where -warning includes both and -critical only critical alerts.
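Something along these lines, as a rough Python sketch of the idea. Neither flag exists in the script today, `should_forward` is a hypothetical name, and letting OK states always pass (so an incident can still be closed) is my own assumption.

```python
def should_forward(state, threshold):
    """Decide whether a Nagios state should be sent on to Cachet.

    Hypothetical filter: threshold is the proposed '-warning' or
    '-critical' parameter.
    """
    state = state.upper()
    if state == "OK":
        # Assumption: recoveries are always forwarded.
        return True
    if threshold == "-critical":
        return state == "CRITICAL"
    if threshold == "-warning":
        # -warning includes both warning and critical alerts.
        return state in ("WARNING", "CRITICAL")
    raise ValueError("threshold must be -warning or -critical")


print(should_forward("WARNING", "-critical"))
print(should_forward("WARNING", "-warning"))
```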

Just an idea

PS: I confirm the metrics are messed up, and Cachet keeps creating multiple events instead of updating the same one.

Many thanks :)
