
labels are not kept #2313

Closed
Cas-pian opened this Issue Jan 3, 2017 · 4 comments

Cas-pian commented Jan 3, 2017

What did you do?
I started Prometheus and Alertmanager to monitor metrics from cAdvisor, with one alert rule like this:

ALERT fetch_down
        IF count(container_cpu_user_seconds_total{image="testimg"}) != 2
        FOR 15s
        LABELS {
                info="fetch_down"
        }
        ANNOTATIONS {
                summary = "all testService containers are down" ,
                description = "{{ $labels.job }} on {{ $labels.instance }} detected testService down !",
        }

What did you expect to see?
The description of the alerts sent to Alertmanager should be "cadvisor on 10.0.0.3 got test img down!".
What did you see instead? Under which circumstances?
The alerts sent to Alertmanager (captured with tcpdump) have the description: " on got test img down!"
But when I use this expression instead, I get the right description:

IF container_cpu_user_seconds_total{image="testimg"} > 0

Environment

  • System information:
    Linux 4.4.0-34-generic x86_64

  • Prometheus version:
    prometheus, version 1.1.3 (branch: master, revision: ac374aa)
    build user: root@3e392b8b8b44
    build date: 20160916-11:36:30
    go version: go1.6.3

  • Alertmanager version:
    alertmanager, version 0.5.1 (branch: master, revision: 0ea1cac51e6a620ec09d053f0484b97932b5c902)
    build user: root@fb407787b8bf
    build date: 20161125-08:14:40
    go version: go1.7.3

  • Prometheus configuration file:

scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.

  # cadvisor
  - job_name: 'cadvisor'
    scrape_interval: 5s

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_timeout: 4s

    static_configs:
      - targets: ['10.0.0.3:8081', '10.0.0.4:8081']
        labels:
          group: 'development'

matthiasr commented Jan 3, 2017

The count aggregation aggregates these labels away. You can take the query from the IF clause and run it in the Prometheus server console to see what the result would be (remove the condition if nothing is over it at the moment).

To preserve them, use count by(job,instance) (…).
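Applied to the reported rule, the IF clause would become something like the following (a sketch, not from the thread). Note that grouping changes what is counted: with two scrape targets, each (job, instance) group counts only its own series, so the != 2 threshold from the original rule would likely need to become != 1 per instance.

        IF count by(job, instance) (container_cpu_user_seconds_total{image="testimg"}) != 1

Because the result keeps the job and instance labels, {{ $labels.job }} and {{ $labels.instance }} are then available in the annotation templates.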

Aside from this, consider using the container_last_seen timestamp for container-upness alerts. The CPU metrics may linger around for a while after the container is gone – but then they disappear (they do not go to 0). In your specific case you know the expected count, so it's probably okay; just take care to verify that the alert fires when you think it should.
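A sketch of that alternative in the same Prometheus 1.x rule syntax as the report (the alert name and the 60-second staleness threshold are assumptions, not from the thread; this also only fires while the container_last_seen series itself is still present):

ALERT testimg_container_gone
        IF time() - container_last_seen{image="testimg"} > 60
        FOR 15s
        ANNOTATIONS {
                description = "{{ $labels.job }} on {{ $labels.instance }}: no testimg container seen in the last minute"
        }

Since nothing is aggregated here, the job and instance labels survive and the annotation templates resolve as expected.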

Cas-pian commented Jan 3, 2017

@matthiasr, many thanks for your response. container_last_seen seems to be a good idea.
I have no idea how to write the expression with count by to evaluate the total number of these containers while still recording the instance label in the description — is that possible?

grobie commented Mar 5, 2017

Please use the prometheus-users mailing list for questions.

grobie closed this Mar 5, 2017

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 23, 2019
