Improve stock Netdata alarms #4727

Open
cakrit opened this issue Nov 23, 2018 · 5 comments

Contributor

@cakrit cakrit commented Nov 23, 2018

From #810 :

We have many things to do, like:

- add more alarms for common issues, especially for the monitored applications
- fix the thresholds and calculations, so that the alarms are more useful and less chatty
- provide better notifications
- etc.

We need to discuss this, prioritizing the data collection plugins by importance and finding users willing to help us test new alarms in production.

We need a lot of help from the community on this one, so please help us! A lot of PRs will be linked to this general issue. We will also create other issues linked to this one, asking for help from past contributors.

Collaborator

@Ferroin Ferroin commented Nov 26, 2018

I intend to look at the applications we can monitor that I actually understand and make PRs to add alarms for them, even if they are just examples that are disabled by default.

Member

@ktsaou ktsaou commented Nov 26, 2018

@Ferroin this is good.
If you also disable alarms by default on your installations, I think we should change the thresholds or make them more reasonable.

Collaborator

@Ferroin Ferroin commented Nov 26, 2018

The only ones I disable by default are the softnet ones, but I only do so after making certain that the hardware can sustain the network workloads the system is supposed to handle.

Personally, I think most of the default thresholds are reasonable, except possibly the disk backlog alarm, but that's going to be so configuration-specific that I see little value in trying to fine-tune it.

@ktsaou ktsaou referenced this issue Dec 7, 2018
5 of 14 tasks complete

@shreyabhandare shreyabhandare commented Dec 14, 2018

Hi, can someone guide me on how to start with this issue? Is this a good place to start for a complete newbie?

Contributor Author

@cakrit cakrit commented Dec 14, 2018

Hi @shreyabhandare and thanks for the offer!
Understanding alarms requires reading this. It's a bit like learning a new language, but it's also an excellent introduction for a newbie to the key concepts in Netdata. After reading the guide, you can find the stock alarms we have for the various collectors and look at improving the alarms for a monitored entity you are very familiar with.
