Monitoring a specific process #7660

Closed
tinyhammers opened this issue Jan 2, 2020 · 4 comments
Labels
area/health, no changelog, question

Comments

@tinyhammers
Contributor

Happy New Year! Hope you all had a nice break away from those pesky computers.
But, new year, new Netdata question.
Caveat: I opened my laptop this morning and couldn't remember what my job is, let alone how to do it, so I could be missing the obvious here.

I need to monitor a specific process (in this case, Icecast). I've added icecast: *icecast* to apps_groups.conf and I've got it showing as a dimension in the apps.processes and apps.threads charts. This is awesome.

What I need to do now is raise an alarm if the number of processes drops below 1.
I've stolen the template from #4614 and have sort of got an idea, but I can't figure out how to make it do what I want.

Currently I have this

   on: apps.processes
   os: linux
hosts: *
families: *
lookup: min -1s unaligned of icecast
units: processes
every: 10s
 crit: $this == 0
delay: up 10s down 1m multiplier 2 max 10m
 info: Icecast has died
   to: sysadmin

As you can see, it's pretty much copypasta from #4614. I removed the warning line as I don't want a warning, I just need to know if Icecast is running or not.
I assumed that crit: $this == 0 would then shout if there were no processes running.

It does not.

Was crossing my fingers that the lookup line would work as it did in issue 4614, but maybe the problem is there?
Totally confused, and still full of cheese from Xmas to be honest.

@tinyhammers added the no changelog and question labels Jan 2, 2020
@thiagoftsm
Contributor

Hi @tinyhammers,

The families: * line only applies to templates, but here you are configuring an alarm.
Do you see anything about this alarm in your error.log?
There is also an example in our documentation: https://docs.netdata.cloud/health/reference/#example-1
You can also find a complete example in this thread: #873 (comment).

Best regards!

@ilyam8
Member

ilyam8 commented Jan 2, 2020

@tinyhammers

Try this:

alarm: apps_icecast_processes
   on: apps.processes
 calc: $icecast
units: processes
every: 10s
 crit: $this == nan OR $this == 0
delay: up 10s down 1m multiplier 2 max 10m
 info: icecast has died
   to: sysadmin

@tinyhammers
Contributor Author

@ilyam8 ❤️
Best new years gift ever.
That is working perfectly, thank you so much.
Could you explain what the line crit: $this == nan OR $this == 0 is doing, so I can try and get a better understanding of how it works?
Thanks again for helping. You guys are amazing 😍

@ilyam8
Member

ilyam8 commented Jan 3, 2020

@tinyhammers ☺️

Actually, it should be:

alarm: apps_icecast_processes
   on: apps.processes
 calc: $icecast
units: processes
every: 10s
 crit: $this == nan
delay: up 10s down 1m multiplier 2 max 10m
 info: icecast is not up
   to: sysadmin

$this == 0 doesn't work, because if there are no processes the value is nan, not 0.

   on: apps.processes
 calc: $icecast
 crit: $this == nan

You can read it as: the apps.processes chart has no icecast dimension.
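
If it helps to see the logic outside of Netdata, here is a rough Python model of the two checks. This is only a sketch, not Netdata's code: the chart snapshot, the dimension names nginx and sshd, and the helper functions are all made up for illustration. The one thing it assumes from this thread is that a dimension missing from the chart evaluates to nan.

```python
import math

# Hypothetical snapshot of the apps.processes chart: dimension name -> latest value.
# There is no "icecast" entry because no matching process is running.
chart = {"nginx": 4.0, "sshd": 2.0}


def dimension_value(chart, name):
    """Return the dimension's value, or NaN when the chart has no such dimension."""
    return chart.get(name, math.nan)


def crit_equals_zero(value):
    # Models `crit: $this == 0`. NaN never compares equal to 0,
    # so this check stays silent when the dimension is missing.
    return value == 0


def crit_is_nan(value):
    # Models `crit: $this == nan`. In Python the NaN test has to be
    # math.isnan(), because nan == nan is False in plain float comparison.
    return math.isnan(value)


this = dimension_value(chart, "icecast")
print(crit_equals_zero(this))  # False: the first version of the alarm never fires
print(crit_is_nan(this))       # True: the corrected alarm fires
```

So the first alarm was waiting for a 0 that never arrives, while the corrected one asks whether the value exists at all.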

@ilyam8 closed this as completed Jan 3, 2020