Improve stock Netdata alarms #4727

Closed
cakrit opened this issue Nov 23, 2018 · 6 comments
Labels: area/health, good first issue, help wanted, priority/medium

Comments

cakrit (Contributor) commented Nov 23, 2018

From #810:

We have many things to do, like:

- add more alarms for common issues, especially for the monitored applications
- fix the threshold and calculations, so that the alarms will be more useful and less chatty
- provide better notifications
- etc

We need to discuss this by prioritizing the data collection plugins by importance and finding users willing to help us test new alarms in production.

We need a lot of help from the community on this one, so please help us! A lot of PRs will be linked to this general issue. We will also create other issues linked to this one, asking for help from past contributors.

cakrit added the help wanted, area/health, and good first issue labels Nov 23, 2018
cakrit self-assigned this Nov 23, 2018

Ferroin (Member) commented Nov 26, 2018

I intend to look at the applications we can monitor that I actually understand and make PRs to add alarms for them, even if they are just examples that are disabled by default.

ktsaou (Member) commented Nov 26, 2018

@Ferroin this is good.
If you also disable alarms by default on your installations, I think we should change the thresholds or make them more reasonable.

Ferroin (Member) commented Nov 26, 2018

The only ones I disable by default are the softnet ones, and I only do so after making certain that the hardware can sustain the network workloads the system is supposed to handle.

Personally, I think most of the default thresholds are reasonable, except possibly the disk backlog alarm, but that's going to be so configuration specific that I see little value in trying to fine-tune it.

@shrebhan

Hi, can someone guide me on how to start with this issue? Is this a good place to start for a complete newbie?

cakrit (Contributor, Author) commented Dec 14, 2018

Hi @shreyabhandare and thanks for the offer!
Understanding alarms requires reading this. It's a bit like learning a new language, but it's an excellent introduction for a newbie to the key concepts in Netdata. After reading the guide, you can find the stock alarms we have for the various collectors and work on improving the alarms for a monitored entity you are very familiar with.

cakrit (Contributor, Author) commented Jan 21, 2020

No activity on this one. We'll improve case-by-case.

cakrit closed this as completed Jan 21, 2020