[bug:1665361] Alerts for offline nodes #6

Closed
gluster-ant opened this issue Mar 12, 2020 · 4 comments
Labels: Migrated (the bugs migrated from Bugzilla to GitHub), Type:Bug

Comments

@gluster-ant
Collaborator

URL: https://bugzilla.redhat.com/1665361
Creator: nigelb at redhat
Time: 20190111T06:57:04

I want a report that tells us which Jenkins nodes are offline and why. "Offline" here means offline as far as Jenkins is concerned. We often have failures on a few nodes, and it takes us a few weeks to get around to fixing them.

This bug covers both choosing a solution and implementing it.

Option 1: A Jenkins job that makes API calls and sends us an email if any machines are offline.

Option 2: A Nagios check that alerts us. This is slightly more explosive :)

@gluster-ant gluster-ant added Migrated The bugs migrated from bugzilla to Github Type:Bug labels Mar 12, 2020
@gluster-ant
Collaborator Author

Time: 20190114T10:23:14
mscherer at redhat commented:
I suspect option 2 is not what we want.

But yeah, Nagios does handle this quite well, doing notifications, etc. But we would still need to write the basic script that does the API call anyway. The difference would be between "send an email" and "do an API call to Nagios to trigger an alert", and I think we could switch between them quite easily if needed.
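
To illustrate the point that the API-calling script is the same either way, here is a minimal sketch in Python. It assumes the Jenkins `/computer/api/json` endpoint and its `computer`, `displayName`, `offline`, and `offlineCauseReason` fields. The base URL is a placeholder, the function names are hypothetical, and authentication is left out. For the Nagios side it uses a plugin-style status line and exit code rather than an API call to Nagios, which is a swapped-in simplification.

```python
import json
import sys
import urllib.request

# Assumption: placeholder URL, replace with the real Jenkins base URL.
JENKINS_URL = "https://jenkins.example.org"


def fetch_computers(base_url=JENKINS_URL):
    """Fetch the node list from the Jenkins /computer/api/json endpoint."""
    url = f"{base_url}/computer/api/json"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def offline_nodes(payload):
    """Return (name, reason) pairs for every node Jenkins reports as offline."""
    result = []
    for node in payload.get("computer", []):
        if node.get("offline"):
            reason = node.get("offlineCauseReason") or "no reason given"
            result.append((node.get("displayName", "unknown"), reason))
    return result


def email_body(nodes):
    """Option 1: render the offline nodes as plain text for an email."""
    if not nodes:
        return "All Jenkins nodes are online."
    lines = [f"{name}: {reason}" for name, reason in nodes]
    return "Offline Jenkins nodes:\n" + "\n".join(lines)


def nagios_status(nodes):
    """Option 2: return (exit_code, message) in Nagios plugin style."""
    if not nodes:
        return 0, "OK - all Jenkins nodes online"
    names = ", ".join(name for name, _ in nodes)
    return 2, f"CRITICAL - {len(nodes)} Jenkins node(s) offline: {names}"


if __name__ == "__main__":
    nodes = offline_nodes(fetch_computers())
    # Switching between the two options only changes this last step.
    code, message = nagios_status(nodes)
    print(message)
    sys.exit(code)
```

Only the last step differs between the two options: `email_body` produces text to mail out, while `nagios_status` produces the exit code and status line a Nagios check would report.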

@gluster-ant
Collaborator Author

Time: 20190527T02:09:34
sankarshan at redhat commented:
Is there any decision on whether Option 1 can be implemented? Deepshikha, can we have Naresh look into this?

@gluster-ant
Collaborator Author

Time: 20190527T04:00:05
dkhandel at redhat commented:
In my opinion, we should have this in Nagios rather than in a Jenkins job that sends alerts. Nagios is already in place for the builders to alert on memory failures and the like. I don't receive its notifications (that's a different story), but it would be good to have just one source of alerting.

Naresh can look at the script if we agree on this.

@mscherer
Collaborator

As we have alerts, I am closing this bug.
