
send firing and resolved status alert to alertmanager in loop #3606

Closed
yylt opened this Issue Dec 21, 2017 · 7 comments

yylt commented Dec 21, 2017

What did you do?
I created a scrape job to collect metrics from an unreachable node, and Prometheus sent alerts to Alertmanager.

What did you expect to see?
Prometheus should always send firing-status alerts (or nothing) to Alertmanager, and should never send a resolved-status alert to Alertmanager.

What did you see instead? Under which circumstances?
In fact, what I saw is the same as in this issue:
prometheus/alertmanager#952

Environment
Prometheus v1.8.2
Alertmanager v0.11.0

  • System information:

    Linux 3.10.0

  • Prometheus command line and configuration file:
    prometheus -alertmanager.url=http://alertmanager-service:9093 -web.listen-address=:9091 -config.file=/etc/prometheus/prometheus.yaml

global:
  scrape_interval: 1m
  scrape_timeout: 55s
  evaluation_interval: 1m
scrape_configs:
- job_name: test
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - 1.2.3.4:90
 
  • Alertmanager configuration file:

    inhibit_rules:
    - source_match:
        severity: 'critical'
      target_match:
        severity: 'warning'
      equal: ['alertname', 'service']
  • Logs:
brian-brazil (Member) commented Dec 21, 2017

Can you tell us exactly where you believe the bug is?

yylt (Author) commented Dec 21, 2017

When executing alert rule statements, Prometheus just assumes the data has been stored. But in my test environment node-exporter restarts periodically, intentionally making Prometheus lose some data, such as:

 node_monitstatus{target='127.0.0.1'} = null 

In this case, Prometheus sends an inactive-status alert, which updates the resolved time at Alertmanager; as a result, Alertmanager sends the alert [resolved] to the receivers.

So I made a change: before executing the alert rule statement, I added some code that parses the statement and tries to fetch data for node_statement{target='1.2.3.4'}; when the result is empty, the evaluation does not go forward.
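
For illustration, here is a minimal sketch of the kind of rule that runs into this, written in the Prometheus 1.x rule syntax matching the reported version. The rule name, comparison, and FOR duration are assumptions, not taken from the actual rule file:

    # Hypothetical rule: it fires while the node_monitstatus series exists and is 0.
    # Once node-exporter restarts and the series disappears, the expression returns
    # no samples, the alert goes inactive, and a resolved notification is sent.
    ALERT NodeMonitStatusDown
      IF node_monitstatus{target="127.0.0.1"} == 0
      FOR 2m
      LABELS { severity = "critical" }
      ANNOTATIONS { summary = "node_monitstatus reports the node as down" }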

brian-brazil (Member) commented Dec 21, 2017

An alert needs to be active on every alert cycle, otherwise it is considered resolved. This is expected behaviour.

yylt (Author) commented Dec 21, 2017

Thanks, but if the data acquired by the scrape job does not exist in Prometheus, I don't think the alert should be considered resolved in this situation.

brian-brazil (Member) commented Dec 21, 2017

It sounds like you're looking for alerting with the absent function.
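
For example, a sketch of such a rule in the Prometheus 1.x syntax (the rule name, FOR duration, and labels here are illustrative, not prescribed):

    # Hypothetical rule: absent() returns 1 when no series matches the selector,
    # so this alert fires when the data is missing instead of silently resolving.
    ALERT NodeMonitStatusAbsent
      IF absent(node_monitstatus{target="127.0.0.1"})
      FOR 5m
      LABELS { severity = "warning" }
      ANNOTATIONS { summary = "node_monitstatus series is missing" }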

strzelecki-maciek commented Jan 31, 2018

I'll just add my 2 cents here. I have had a funny issue of a "rogue-alert-resolver" Prometheus instance.

I am sharing the alert definitions between nodes. The pair of alerts in question had a "job not up" definition over 2m and 4m.

Two nodes are scraping these jobs every minute.

The third node was scraping this job every 5 minutes and kept resolving the alerts every couple of minutes. It could not possibly have ever gotten a "job is up" state (the exporters were shut down), yet it kept resolving.

Seems like a similar issue. Obviously the job's scrape interval needs to be smaller than the alert threshold. On the other hand, this is unexpected behaviour: missing (time-misaligned?) data ends up reporting an OK state even though the target is consistently in a DOWN state.
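
For illustration, a sketch of the mismatch described above, using the same Prometheus 1.x rule syntax as elsewhere in this thread; the job name, rule name, and exact expression are assumptions:

    # Hypothetical scrape config on the "third" node:
    #   scrape_configs:
    #   - job_name: some_exporter
    #     scrape_interval: 5m
    #
    # Shared alert rule:
    ALERT JobNotUp
      IF up{job="some_exporter"} == 0
      FOR 4m
      LABELS { severity = "warning" }
    #
    # With a 5m scrape interval, the up series can have no sample inside the
    # default ~5m lookback window at some evaluation cycles, so the expression
    # returns nothing, the pending/firing state is dropped, and a resolved
    # notification is sent even though the exporter never came back.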

lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 22, 2019