Alert will not trigger #4231

Closed
jurgenweber opened this Issue Jun 7, 2018 · 12 comments

jurgenweber commented Jun 7, 2018

Bug Report

What did you do?
Added an alert.
What did you expect to see?
When the alert condition is met, the alert fires.
What did you see instead? Under which circumstances?
No alert.
Environment

Using the helm chart:
https://github.com/kubernetes/charts/tree/master/stable/prometheus

  • System information:

    repository: prom/prometheus

  • Prometheus version:

    tag: v2.2.1

  • Alertmanager version:

    repository: prom/alertmanager
    tag: v0.14.0

  • Prometheus configuration file:

          - alert: BurstBalanceLow
            expr: aws_ebs_burst_balance_maximum < 90.0
            for: 1m
            labels:
              severity: critical
            annotations:
              description: 'Burst balance exceeds threshold (currently {{ $value|humanize }}%)'
              summary: Burst Balance low on {{ $labels.volume_id }}

I don't get it, it should be simple but it just does not alert. I have two EBS volumes (data exported using https://github.com/kubernetes/charts/tree/master/stable/prometheus-cloudwatch-exporter) that are constantly under that 90 threshold, and it never alerts.

http://take.ms/gphy5

Also other alerts work just fine.

Thanks

krasi-georgiev (Member) commented Jun 7, 2018

The GitHub issues are focused on bug reports 🐞 and this looks more like a support request 👍

I would recommend looking through the official docs and examples, as well as searching the Google group or asking in the IRC channel; I am sure there will be someone to help you out.

irc: #prometheus
group: https://groups.google.com/forum/#!forum/prometheus-users
docs: https://prometheus.io/docs/introduction/overview/

Feel free to reopen if you are 100% convinced that this is not a support request but a bug in Prometheus.

jurgenweber (Author) commented Jun 7, 2018

No, it's a bug report. This just plain doesn't work, and I can't find anything about it anywhere else.

krasi-georgiev (Member) commented Jun 8, 2018

@jurgenweber can you show some logs that might be related to the issue?
A screenshot of the alerts tab might give more info; please also include the minimal Prometheus config with which we can replicate the issue.

You are probably familiar with how alerting works, but here is the doc page just in case.
https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/

@simonpasquier is a bit more familiar with alerting, so he might give us some more ideas.

krasi-georgiev reopened this Jun 8, 2018

jurgenweber (Author) commented Jun 8, 2018

Hi

I tried debug logs, which are very noisy. I left it running for a while and searched for the name of the alert without result, but when searching for the metric I sometimes see:

[prometheus-server-7c856f6d97-lsll4 prometheus-server] level=debug ts=2018-06-08T00:14:41.580927833Z caller=scrape.go:840 component="scrape manager" scrape_pool=prometheus-cloudwatch-exporter target=http://super-magician-prometheus-cloudwatch-exporter.devops.svc.cluster.local:80/metrics msg="Out of order sample" series="aws_ebs_burst_balance_maximum{job="aws_ebs",instance="",volume_id="vol-00b28190910ae9c46",}"

(I will keep watching the logs for another hour or so and see if anything interesting pops up).

Screenshot: http://take.ms/qbHEs; the config is provided in the original post. It's a simple alert. Please note other alerts look fine; it's not like this is the first alert I have ever made. This one just will not trigger even though it is true, so I feel like I am hitting some weird edge case. I have researched it for days (because it just seemed so simple, what is going on?!) and my counterpart worked on it also and came to the same dead end I have.

Thanks

krasi-georgiev (Member) commented Jun 8, 2018

I meant the job config; I just wanted to see the scrape frequency, since according to the docs this might have an influence.

The screenshot shows a different alert config than the original post, but I assume you tried many different variations?

jurgenweber (Author) commented Jun 8, 2018

ah, right.

      # prometheus cloudwatch exporter
      - job_name: 'prometheus-cloudwatch-exporter'

        static_configs:
          - targets: ['super-magician-prometheus-cloudwatch-exporter.devops.svc.cluster.local:80']
            labels:
              group: 'devops'

The rest are defaults as per the helm chart.

Yeah, many different variants... the screenshot is 'what we want', and then I tried to make it simpler: removing the ! (aws_ebs_burst_balance_maximum{volume_id!="vol-0c3627f137e583133"} < 80 for 5m vs aws_ebs_burst_balance_maximum < 90 for 1m), increasing the threshold to ensure there was always something that should be triggering it, etc. You will note both the warning and the critical alert we are after are in the screenshot. Neither works.
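
As a rough sketch, the two variants described above would look something like this as rule entries (thresholds and the volume ID are taken from the comment; the alert names are illustrative only, not the actual rule file):

          - alert: BurstBalanceLowFiltered  # variant with the volume_id exclusion
            expr: aws_ebs_burst_balance_maximum{volume_id!="vol-0c3627f137e583133"} < 80
            for: 5m
          - alert: BurstBalanceLow          # simpler variant from the original post
            expr: aws_ebs_burst_balance_maximum < 90
            for: 1m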

jurgenweber (Author) commented Jun 8, 2018

Maybe this screenshot is a bit more meaningful/helpful: http://take.ms/2Iv1I

On the left is the alert... on the right, a graph showing two items that are under that limit... and should be triggering it right now.

brian-brazil (Member) commented Jun 8, 2018

You didn't mention that you were running a bleeding-edge cloudwatch exporter binary, which includes as-yet-unreleased code. This is not a problem with Prometheus; you need to add an offset 10m to your alert.
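
For illustration, a minimal sketch of the rule from the original post with that offset applied (assuming the CloudWatch data really is about 10 minutes behind; everything else is unchanged):

          - alert: BurstBalanceLow
            # offset 10m makes the expression evaluate samples from 10 minutes ago,
            # so the rule looks at data that has actually arrived
            expr: aws_ebs_burst_balance_maximum offset 10m < 90.0
            for: 1m
            labels:
              severity: critical
            annotations:
              description: 'Burst balance exceeds threshold (currently {{ $value|humanize }}%)'
              summary: Burst Balance low on {{ $labels.volume_id }}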

krasi-georgiev (Member) commented Jun 8, 2018

@brian-brazil that is interesting. Is this offset because the scraped data is 10 minutes behind?

The graph shows a gap at the end, so I guess that confirms it.

brian-brazil (Member) commented Jun 8, 2018

Is this offset because the scraped data is 10 minutes behind?

Yes.

jurgenweber (Author) commented Jun 10, 2018

Well, I had no way of knowing that it is some 'bleeding edge' thing. It doesn't have a sign on it; I just found it in my travels.

I will add an offset, thanks for your help.

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 22, 2019
