Alerting - False alerts and Missing alerts #2938

Closed
parikhravish opened this Issue Jul 12, 2017 · 7 comments

parikhravish commented Jul 12, 2017

What did you do?
An alert was generated for an event (identified by its label set). The event then repeated a few times. However, since Prometheus generates the alert ID from the label set, it treated each repeat as a continuation of the existing alert rather than a new alert, and because of the repeat interval no new alert was sent.
What did you expect to see?
An alert sent whenever the event repeats itself, but not while it is merely continuing.
What did you see instead? Under which circumstances?
The new event did not generate a new alert because the label set was the same.
Environment
Docker

  • System information:
    Darwin ML07DTSG15KG8WL 15.6.0

  • Prometheus version:
    1.7.1

  • Alertmanager version:
    0.7.1


Member

juliusv commented Jul 12, 2017

@parikhravish Prometheus alerts are not event-based, they are state-based. As long as a particular label combination is present in the output of an alerting rule, that particular labeled alert will keep firing to the Alertmanager. Alertmanager then makes events (notifications) out of these states by applying grouping, waiting, and re-notification rules to the states.

Only when an alert element stops being present in the output of an alerting rule does Prometheus tell Alertmanager to explicitly resolve it. So if an alert appears, disappears, and then reappears, you will get two notifications, but not if the alert stays firing the whole time.
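
To illustrate the state-based model described above, here is a minimal Python sketch. It is not Prometheus code: the `labelset` and `transitions` helpers and the `ServiceDown` alert name are hypothetical, and each inner list simply stands in for the output of one rule evaluation cycle. A label set produces a "firing" transition when it first appears in the output and a "resolved" transition when it drops out.

```python
from typing import FrozenSet, Iterable, List, Tuple

LabelSet = FrozenSet[Tuple[str, str]]


def labelset(**labels: str) -> LabelSet:
    """Build a hashable label set from keyword arguments."""
    return frozenset(labels.items())


def transitions(
    evaluations: Iterable[Iterable[LabelSet]],
) -> List[Tuple[int, str, LabelSet]]:
    """Turn per-cycle rule output (state) into firing/resolved transitions."""
    active: set = set()
    events = []
    for cycle, output in enumerate(evaluations):
        current = set(output)
        for labels in current - active:  # newly present in the rule output
            events.append((cycle, "firing", labels))
        for labels in active - current:  # no longer present: resolve explicitly
            events.append((cycle, "resolved", labels))
        active = current
    return events


svc_a = labelset(alertname="ServiceDown", service="A")
# Rule output per evaluation cycle: down, still down, back up, down again.
events = transitions([[svc_a], [svc_a], [], [svc_a]])
for cycle, state, labels in events:
    print(cycle, state, dict(labels))
```

In this sketch the second cycle produces no transition because the state has not changed, while the disappearance in the third cycle is what makes the fourth cycle count as a new firing.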

Author

parikhravish commented Jul 12, 2017

Yes. For example, in my case the situation is as follows:

Service A is working fine - no alert.
Service A goes down - alert issued (taking the various intervals into account).
Service A remains down - no alert issued (due to the repeat interval, assuming this happens within it).
Service A comes back up - no alert.
Service A goes down again - no alert issued (due to the repeat interval, assuming this happens within it). This is where I was expecting an alert. It seems the alert ID is generated from the label set, so Alertmanager treats this event as a continuation of the first occurrence.

Basically, I am trying to differentiate between an event continuing and a new event for the same service (same label set).
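
The identity problem described here can be sketched in a few lines of Python. This is only an illustration: the `fingerprint` function below is hypothetical and is not the hashing scheme Prometheus or Alertmanager actually uses. It assumes, as the comment does, that an alert's ID depends on nothing but its label set, so two separate outages of the same service are indistinguishable by ID alone.

```python
import hashlib
from typing import Dict


def fingerprint(labels: Dict[str, str]) -> str:
    """Hypothetical alert ID: a hash of the sorted label pairs only."""
    canonical = "\x00".join(f"{k}={v}" for k, v in sorted(labels.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


first_outage = {"alertname": "ServiceDown", "service": "A"}
second_outage = {"service": "A", "alertname": "ServiceDown"}  # same labels, later outage
other_service = {"alertname": "ServiceDown", "service": "B"}

print(fingerprint(first_outage) == fingerprint(second_outage))  # True
print(fingerprint(first_outage) == fingerprint(other_service))  # False
```

Under this assumption, telling a continuing outage from a new one requires information outside the ID, such as whether the alert was resolved in between.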

Member

juliusv commented Jul 12, 2017

@parikhravish I see. So this is about the Alertmanager side (the issue probably belongs there then). As far as I know, Alertmanager should send out a new notification if an alert was resolved in Alertmanager previously and then becomes active again. /cc @fabxc to confirm / deny.
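
The expected behaviour described above can be modelled with a toy Python sketch. It is not Alertmanager's implementation: the `ToyNotifier` class, the `service-A-down` ID, and the use of seconds for time are all assumptions made for illustration, and grouping, `group_wait`, and resolved notifications are ignored. The only point is that a resolution clears the repeat-interval bookkeeping, so the next firing is treated as new.

```python
from typing import Dict


class ToyNotifier:
    """Toy model of the behaviour described above; not Alertmanager's real logic."""

    def __init__(self, repeat_interval: float) -> None:
        self.repeat_interval = repeat_interval
        self.last_sent: Dict[str, float] = {}  # alert ID -> time of last notification

    def observe(self, alert_id: str, firing: bool, now: float) -> bool:
        """Record the alert's state at time `now`; return True if a notification is sent."""
        if not firing:
            # Resolution forgets the alert, so the next firing counts as new.
            self.last_sent.pop(alert_id, None)
            return False
        last = self.last_sent.get(alert_id)
        if last is None or now - last >= self.repeat_interval:
            self.last_sent[alert_id] = now
            return True
        return False  # still firing, within the repeat interval


notifier = ToyNotifier(repeat_interval=3600)
# (time in seconds, firing?): down, still down, back up, down again.
timeline = [(0, True), (600, True), (1200, False), (1800, True)]
sent = [notifier.observe("service-A-down", firing, t) for t, firing in timeline]
print(sent)  # [True, False, False, True]
```

In this model the final firing at 1800 seconds triggers a notification even though it falls within the repeat interval of the first one, which is the outcome the reporter was expecting.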

Author

parikhravish commented Jul 13, 2017

Any updates on this? Should I reopen this on the Alertmanager side? @juliusv @fabxc

Member

juliusv commented Jul 13, 2017

@parikhravish Yes, as far as I understand it, it should be an Alertmanager issue.

Member

mxinden commented Jul 13, 2017

For documentation purposes, this was moved to prometheus/alertmanager#904. Please close here.

lock bot commented Mar 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 23, 2019
