Suppressing alerts programmatically #2673
Comments
This is a nice idea. I could probably provide a method for this to be done over a unix socket file...
Alarms are always applied before templates, no matter the order in which they are read. However, having now checked the system calls, it seems that if overlapping templates or alarms are given (i.e. two alarms with the same name, or two templates with the same name), the result is random. I will try to fix this...
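A minimal sketch of the deterministic behaviour being described, using a made-up in-memory model rather than netdata's actual C code: files are read in sorted filename order so that a later duplicate (same name and same target) always replaces an earlier one, and a chart-specific alarm takes precedence over a template instance with the same name. The `Definition` type, the function names and the `disk.util` context string are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Definition:
    kind: str    # "alarm" (bound to one chart) or "template" (bound to a context)
    name: str    # e.g. "10min_disk_utilization"
    target: str  # chart id for an alarm, context for a template
    source: str  # file the definition was read from


Key = Tuple[str, str]  # (name, target)


def load_definitions(
    files: Dict[str, List[Definition]],
) -> Tuple[Dict[Key, Definition], Dict[Key, Definition]]:
    """Read files in sorted filename order; for duplicates with the same
    name and target, the definition from the later file wins."""
    alarms: Dict[Key, Definition] = {}
    templates: Dict[Key, Definition] = {}
    for filename in sorted(files):
        for d in files[filename]:
            table = alarms if d.kind == "alarm" else templates
            table[(d.name, d.target)] = d
    return alarms, templates


def active_for_chart(chart_id: str, context: str,
                     alarms: Dict[Key, Definition],
                     templates: Dict[Key, Definition]) -> Dict[str, Definition]:
    """Alarms are applied before templates: a template instance is only
    created if the chart has no alarm with the same name."""
    active: Dict[str, Definition] = {}
    for (name, target), d in alarms.items():
        if target == chart_id:
            active[name] = d
    for (name, target), d in templates.items():
        if target == context and name not in active:
            active[name] = d
    return active
```

Sorting the filenames is what removes the randomness: the outcome no longer depends on the order in which the directory listing happens to return files.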
Thanks. Thinking about it a bit more, I probably want to suppress the entire template rather than the individual alarms: that saves me from having to configure which disks are part of the RAID array, since they probably all are.
While I would really like this feature for things other than disk I/O too, couldn't the solution for this instead be that netdata explicitly reads…
Related to #3187
I have been working on this one instead of #3187, which adds more complexity. The API will support silencing or enabling all health checks, as well as selected alarms and templates, for specific hosts, families, charts and contexts, in whichever combinations of these criteria make sense. I expect it to be finished next week.
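One way to picture the selector logic described here is the following sketch. It is not netdata's implementation: the `Selector` and `AlarmInstance` types are hypothetical, and shell-style globbing via Python's `fnmatch` stands in for whatever pattern syntax the real API uses. A selector matches an alarm instance only when every criterion it sets matches; unset criteria match anything.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Iterable, Optional


@dataclass(frozen=True)
class AlarmInstance:
    name: str      # e.g. "10min_disk_utilization"
    template: str  # template it was created from ("" for a plain alarm)
    host: str
    family: str
    chart: str     # e.g. "disk_util.sda"
    context: str


@dataclass(frozen=True)
class Selector:
    # Each field is an optional shell-style pattern; None means "any value".
    alarm: Optional[str] = None
    template: Optional[str] = None
    host: Optional[str] = None
    family: Optional[str] = None
    chart: Optional[str] = None
    context: Optional[str] = None

    def matches(self, inst: AlarmInstance) -> bool:
        pairs = [
            (self.alarm, inst.name), (self.template, inst.template),
            (self.host, inst.host), (self.family, inst.family),
            (self.chart, inst.chart), (self.context, inst.context),
        ]
        # Every criterion that is set must match (logical AND).
        return all(p is None or fnmatchcase(v, p) for p, v in pairs)


def is_silenced(inst: AlarmInstance, selectors: Iterable[Selector],
                silence_all: bool = False) -> bool:
    """Silenced if a global switch is on, or any selector matches (logical OR)."""
    return silence_all or any(s.matches(inst) for s in selectors)
```

With a selector on the template plus a chart pattern, the reporter's case (all `sd*` RAID members, nothing else) needs one rule rather than one per disk.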
Moving to next sprint due to #5017 |
##### Summary
fixes #2673, fixes #2149, fixes #5017, fixes #3830, fixes #3187, fixes #5154

Implements a command API for health which will accept commands via a socket to selectively suppress health checks. Allows different ports to accept different request types (streaming, dashboard, api, registry, netdata.conf, badges, management). Removes support for multi-threaded and single-threaded web servers.

##### Component Name
health, daemon
I got alerts for high disk utilisation and backlog from about 1am to 4am last night.
I only looked at them more than an hour later, but I think I have found the culprit: it's the monthly mdadm RAID scrub, which starts at 00:57 on the first Sunday of the month.
What I would like to do is have some way to automatically silence/suppress the alert here. That is, modify the cronjob to:
So while disks.conf defines the template `10min_disk_utilization`, I just want to disable the specific alarm instances `disk_util.sda.10min_disk_utilization` and `disk_util.sdb.10min_disk_utilization`.
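The instance names above appear to be the chart id followed by the alarm (template) name. A small sketch, assuming that naming pattern holds and that alarm names contain no `.`, for generating and splitting those names for a list of RAID member disks (the `raid_disks` list and the helper names are illustrative, not part of netdata):

```python
from typing import List, Tuple


def instance_name(chart_id: str, alarm_name: str) -> str:
    # e.g. ("disk_util.sda", "10min_disk_utilization") -> "disk_util.sda.10min_disk_utilization"
    return f"{chart_id}.{alarm_name}"


def split_instance(instance: str) -> Tuple[str, str]:
    # Assumes the alarm name has no '.', so the last dot separates it from the chart id.
    chart_id, alarm_name = instance.rsplit(".", 1)
    return chart_id, alarm_name


def raid_instances(disks: List[str], alarm_name: str) -> List[str]:
    # One instance per disk chart; "disk_util.<disk>" follows the chart ids in this issue.
    return [instance_name(f"disk_util.{d}", alarm_name) for d in disks]


raid_disks = ["sda", "sdb"]  # illustrative RAID members
print(raid_instances(raid_disks, "10min_disk_utilization"))
```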
I see that it should be possible:
As far as I can see, the health API only allows querying alarms, not controlling them, so I think I have to drop a file under `/opt/netdata/etc/netdata/health.d/` and then send a SIGUSR2. Is that correct? In what order are these files read (e.g. alphabetically), and does it matter here? That is, if netdata reads an `alarm` definition before the corresponding `template`, will it still do the right thing?
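The workflow asked about here (write a file into `health.d`, then signal the daemon) could be scripted roughly as below. This is a sketch under assumptions: the directory is the one from this issue and differs between installs, the caller must supply the netdata PID and the config text (the sketch deliberately does not guess at alarm syntax), and it presumes that SIGUSR2 triggers a health reload, as the question suggests. The file is written under a temporary name and then renamed, so the daemon should never see a half-written file.

```python
import os
import signal
from pathlib import Path

# Directory taken from this issue; other installs may use a different prefix.
HEALTH_D = Path("/opt/netdata/etc/netdata/health.d")


def install_override(name: str, text: str, health_d: Path = HEALTH_D) -> Path:
    """Write a health config file atomically: temp file first, then rename."""
    target = health_d / name
    tmp = health_d / (name + ".tmp")  # not ending in .conf while being written
    tmp.write_text(text)
    os.replace(tmp, target)  # atomic on POSIX when both paths are on one filesystem
    return target


def remove_override(name: str, health_d: Path = HEALTH_D) -> None:
    """Delete a previously installed override, ignoring it if already gone."""
    (health_d / name).unlink(missing_ok=True)


def reload_health(pid: int) -> None:
    """Send SIGUSR2 to the given netdata PID (assumed to reload health config)."""
    os.kill(pid, signal.SIGUSR2)
```

A scrub wrapper would call `install_override(...)` and `reload_health(pid)` before the scrub, then `remove_override(...)` and `reload_health(pid)` again afterwards.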
And what's the best way to make a "null alarm" which does nothing, to override an alarm from a template?
Thanks,
Brian.