Alert if docker container stops/dies #1504
brian-brazil added the question label Mar 24, 2016
There's no easy way to do this. Rather than asking "did a container die" it's better to ask "do I have enough containers running?" or "is my latency acceptable?", as a dead container doesn't automatically mean that there's any user impact or human involvement required. This can be done by aggregation on
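As a rough illustration of the "do I have enough running?" framing, the queries below aggregate the `up` metric across one job. This is only a sketch: the job name `myjob` and the thresholds are placeholders, and it assumes each instance of the service is its own scrape target.

```promql
# Fires when fewer than 2 targets of the job were scraped successfully.
sum(up{job="myjob"}) < 2

# Alternative: fires when less than half of the job's targets are up.
avg(up{job="myjob"}) < 0.5
```

Both expressions return nothing if the job has no targets at all, so they complement rather than replace a check for missing series.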
@brian-brazil is there an example of the aggregation of up() and another alert?
trompx commented Mar 29, 2016
Hello, I have the same problem and was wondering whether, when there is no data, the instance label could at least still be returned. I have an inhibit rule that is supposed to mute all alerts for down containers when cAdvisor is down, based on the criterion equal: ['instance'], so it doesn't work in this case because no labels are returned.
If there's no data, we can't return anything as there's nothing to work off. |
@brian-brazil is there any description of how up works? As an experiment, I tried creating jobs in prometheus.yml for two of my containers, where I am running kibana:port and elasticsearch:port (snippet below), but then the value of up is 0 for both of them, since neither Kibana nor Elasticsearch has a data output stream. Please see the picture below. So I am still unsure how to use 'up{job="myjob"}', since the only job that returns value=1 is cAdvisor :(
@brian-brazil "you need to maintain a separate source of truth to know what's meant to be running": that would be my docker-compose.yml.
A better way would be to auto-generate alerts using absent based on your configuration management, but this is all getting very complex. You should look at monitoring services, not individual containers.
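A minimal sketch of what one such generated expression could look like, reusing the metric and label from the original question. The service name `service1` is just an example, and the label name depends on your cAdvisor version and setup.

```promql
# absent() returns a single series with value 1 when no series match
# the selector, and an empty result otherwise.
absent(container_cpu_usage_seconds_total{com_docker_compose_service="service1"})
```

Because `absent()` copies the labels from equality matchers in the selector, the resulting alert would carry `com_docker_compose_service="service1"`. One such expression per expected service could then be generated from whatever lists the services that should be running.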
@brian-brazil This could easily have been solved if Prometheus could return null when there is no data matching a query such as
It returns empty in that case; Prometheus has no notion of null. It sounds like you're looking for
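To illustrate the difference between "empty" and a value, here is a sketch of a common PromQL idiom that substitutes 0 when a query matches nothing. The selector is borrowed from the original question and is only an example.

```promql
# count() over zero series yields an empty result, not 0.
# "or vector(0)" falls back to a constant 0 in that case,
# so the comparison has something to evaluate.
(
  count(container_cpu_usage_seconds_total{com_docker_compose_service="service1"})
  or vector(0)
) < 1
```

The result of this expression has no labels, so unlike `absent()` it does not carry the service name.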
@brian-brazil any examples/documentation on absent()? I would try to experiment with it if I can figure out how best to use it.
trompx commented Mar 29, 2016
In my case, I thought that since Prometheus pulls data from all the target groups (instances) of all the jobs, it would at least pass the instance ip:port when a value is missing, as it still scrapes those instances and therefore has that information. Anyway, I am deploying my infrastructure with Ansible, so I guess the best way is to dynamically generate the alerts/prometheus.yml file to have one job per type of monitored container (webserver/mysql/redis/kibana etc.). Edit: I misread mokshpooja's answer, so up won't return anything since some containers output nothing. Too bad. Thank you for the info.
@trompx So here is how I am solving this, based on @brian-brazil's feedback. For services that do not output or are not a separate target-
For services with output or a target-
fabxc added kind/question and removed question labels Apr 28, 2016
commarla commented May 24, 2016
Hi, I am working on this. My exporter is scraped every 5 seconds. Is there a way to reduce this time? How does it work? Thanks,
That's due to staleness, see #398.
commarla commented May 24, 2016
Thanks @brian-brazil, I found
Stef3478 commented Jun 9, 2016
Hi, I'm working on this as well. When I use this:
Stef3478 commented Jun 10, 2016
Never mind, I'm using Thanks
Seems like this is resolved. Closing.
juliusv closed this Jul 23, 2016
fuzzyami commented Nov 27, 2016
helletheone commented May 29, 2017
So at the moment there is no real solution for monitoring, for example, 100 containers, right?
andrewhowdencom commented May 30, 2017
@helletheone if you're looking more broadly for a snapshot of whether ${X} containers are unavailable,
harakiri406 commented Aug 31, 2017
Just leaving this here to show how I did it:
hiscal2015 commented Dec 13, 2017
@mokshpooja I have the same issue, what's your final solution?
occelebi commented Jul 18, 2018
Is there any improvement for monitoring multiple containers on a host in one rule?
sohel2020 commented Dec 13, 2018
https://github.com/stefanprodan/dockprom/blob/master/prometheus/alert.rules#L45 So how can I write a single alert rule for these two containers, and send the right container name in the alert description?
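One possible approach, sketched here under the assumption that the containers are identified by the `com_docker_compose_service` label used in the original question (`service1` and `service2` are placeholder names), is to join several `absent()` calls with `or`:

```promql
# Each absent() yields a series labelled with its own equality matcher,
# so the alert can tell which container is missing.
absent(container_cpu_usage_seconds_total{com_docker_compose_service="service1"})
  or
absent(container_cpu_usage_seconds_total{com_docker_compose_service="service2"})
```

The surviving label can then be referenced in the alert description through the usual label templating, for example `{{ $labels.com_docker_compose_service }}`.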
mokshpooja commented Mar 24, 2016
Hi Guys,
I hope you can guide me. I have my setup on AWS, where I am trying to monitor several containers using cAdvisor + Prometheus + Alertmanager. What I want to do is send an email alert (with the service/container name) if a container goes down for some reason. The problem is that if a container dies, cAdvisor no longer collects any metrics about it. Thus any query results in "no data", since there are no matches for the query.
Eg: container_cpu_usage_seconds_total{com_docker_compose_service="service1"}<=0
Would not work since there is no data to match with.
Is there a workaround to launch an alert if 1 container dies?