Toggle ability to - send resolved notifications for inhibited alerts #2754

Open
dofinn opened this issue Nov 4, 2021 · 1 comment

@dofinn

dofinn commented Nov 4, 2021

What did you do?

Prom = Prometheus
AM = Alertmanager
PD = PagerDuty

In the flow below, a hypothetical inhibition rule says “when alert2 is firing, alert1 should be suppressed”.

  1. Prom: alert1 fires
  • AM gets alert1, routes to PD
  • PD receives alert1
  2. Prom: alert2 fires
  • AM gets alert2
  • AM suppresses alert1
  • AM routes alert2 to PD
  • PD receives alert2
  • PD now has two alerts
    • Alert1
    • Alert2
  3. Prom: alert1 resolves
  • AM resolves alert1
  • PD receives no notification as the alert is suppressed
  • PD: alert1 becomes orphaned
  4. Prom: alert2 resolves
  • AM resolves alert2
  • PD receives resolved notification for alert2
  • PD resolves alert2
  • PD retains alert1, which is now orphaned

This is undesirable at scale: our pager and dashboards become overwhelmed with orphaned alerts, which can require extensive investigation before they are manually marked as resolved.
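
To make the flow above easy to reproduce, here is a toy model of it. This is only a sketch: `Pager` and `ToyAlertmanager` are made-up classes for illustration, not Alertmanager or PagerDuty code, and the model assumes that every notification for an inhibited alert is dropped.

```python
# Toy model of the flow described above; not Alertmanager or PagerDuty code.
class Pager:
    """Stands in for PagerDuty: tracks which incidents are open."""

    def __init__(self):
        self.open_incidents = set()

    def notify(self, alertname, status):
        if status == "firing":
            self.open_incidents.add(alertname)
        else:  # "resolved"
            self.open_incidents.discard(alertname)


class ToyAlertmanager:
    """Assumption: drops every notification for an alert while it is inhibited."""

    def __init__(self, pager, inhibits):
        self.pager = pager
        self.inhibits = inhibits  # source alertname -> target alertname
        self.firing = set()

    def _inhibited(self, alertname):
        return any(src in self.firing and tgt == alertname
                   for src, tgt in self.inhibits.items())

    def fire(self, alertname):
        self.firing.add(alertname)
        if not self._inhibited(alertname):
            self.pager.notify(alertname, "firing")

    def resolve(self, alertname):
        inhibited = self._inhibited(alertname)  # check before state changes
        self.firing.discard(alertname)
        if not inhibited:
            self.pager.notify(alertname, "resolved")


def run_flow():
    pd = Pager()
    am = ToyAlertmanager(pd, inhibits={"alert2": "alert1"})
    am.fire("alert1")     # step 1: PD opens alert1
    am.fire("alert2")     # step 2: PD opens alert2, alert1 is now inhibited
    am.resolve("alert1")  # step 3: resolved notification is dropped
    am.resolve("alert2")  # step 4: PD closes alert2
    return pd.open_incidents
```

Calling `run_flow()` returns the incidents still open in the pager at the end of step 4, which in this model is the orphaned alert1.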

What did you expect to see?

When an alert that has fired and been routed externally becomes inhibited, its resolved notification should bypass inhibition so that the external receiver learns it is resolved.

What did you see instead? Under which circumstances?

Orphaned alerts, as described above.

Environment

OpenShift 4.[7,8,9].*

  • Alertmanager version:
❯ getComponentVersion prometheus-alertmanager 4.7.22
OCP @ 4.7.22
prometheus-alertmanager @ 0.21.0
❯ getComponentVersion prometheus-alertmanager 4.8.12 
OCP @ 4.8.12
prometheus-alertmanager @ 0.21.0
❯ getComponentVersion prometheus-alertmanager 4.9.4
OCP @ 4.9.4
prometheus-alertmanager @ 0.22.2
  • Prometheus version:
❯ getComponentVersion prometheus 4.7.22 
OCP @ 4.7.22
prometheus @ 2.23.0
❯ getComponentVersion prometheus 4.8.12
OCP @ 4.8.12
prometheus @ 2.26.1
❯ getComponentVersion prometheus 4.9.4 
OCP @ 4.9.4
prometheus @ 2.29.2
  • Alertmanager configuration file:
- target_match_re:
    alertname: alert1
  source_match:
    alertname: alert2
  equal:
  - namespace
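
For readers less familiar with inhibit rules, this is roughly how I understand the rule above to be evaluated. It is a simplified sketch, not the Alertmanager implementation: `RULE`, `matches` and `is_inhibited` are hypothetical names, regex matchers are assumed to be fully anchored, a missing label is treated like an empty one, and edge cases such as an alert matching both sides of the rule are ignored.

```python
import re

# Hypothetical mirror of the inhibit rule above as a plain dict.
RULE = {
    "source_match": {"alertname": "alert2"},
    "target_match_re": {"alertname": "alert1"},
    "equal": ["namespace"],
}


def matches(labels, exact=None, regex=None):
    """True if the label set satisfies every exact and regex matcher."""
    exact = exact or {}
    regex = regex or {}
    return (
        all(labels.get(k, "") == v for k, v in exact.items())
        and all(re.fullmatch(p, labels.get(k, "")) for k, p in regex.items())
    )


def is_inhibited(target, firing_alerts, rule=RULE):
    """True if some firing source alert mutes `target` under `rule`."""
    if not matches(target, regex=rule["target_match_re"]):
        return False
    for source in firing_alerts:
        if not matches(source, exact=rule["source_match"]):
            continue
        # Source and target must agree on every label listed in `equal`.
        if all(source.get(l, "") == target.get(l, "") for l in rule["equal"]):
            return True
    return False
```

With this rule, alert1 in namespace `a` is muted only while an alert2 from the same namespace is firing; an alert2 from another namespace has no effect on it.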

Feature Request

Add an additional boolean configuration option similar to send_resolved.

# Whether or not to notify about resolved alerts.
[ send_resolved: <boolean> | default = true ]
# Whether or not to notify about resolved alerts that were firing before being inhibited.
[ inhibited_send_resolved: <boolean> | default = false ]

Desired outcome: Any alert that has fired and been sent externally should be able to send its resolved status regardless of that alert's state (active, suppressed).
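
As a sketch of the decision this option would control: `inhibited_send_resolved` is the proposed option and does not exist in Alertmanager today, and `AlertState` and `should_notify` are hypothetical names used only for illustration. The model also assumes a resolved notification only makes sense for an alert whose firing notification was already sent.

```python
from dataclasses import dataclass


@dataclass
class AlertState:
    status: str            # "firing" or "resolved"
    inhibited: bool        # currently muted by an inhibit rule
    notified_firing: bool  # a firing notification already went out


def should_notify(alert, send_resolved=True, inhibited_send_resolved=False):
    """Decide whether a notification goes to the receiver (toy model)."""
    if alert.status == "firing":
        return not alert.inhibited
    if not send_resolved:
        return False
    if not alert.inhibited:
        return alert.notified_firing
    # Proposed behaviour: let the resolved notification through even though
    # the alert is inhibited, but only if the receiver saw it firing.
    return inhibited_send_resolved and alert.notified_firing
```

With the default of `false` the behaviour stays as it is today; setting the proposed option to `true` only changes the case of an inhibited, resolved alert that had already been notified.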

How it could work:

  • alert1 fires
  • AM sends external notification
  • alert2 fires
  • AM sends alert2 external notification
  • AM inhibits alert1
  • alert1 resolves

Here we could explore two options:

  1. If alert1 resolves AND was notified externally prior to becoming inhibited:
    -> send the resolved notification regardless of inhibited status.
    ^ decouples alert state

  2. If alert1 resolves AND was notified externally prior to becoming inhibited:
    -> store the resolved state in memory
    -> send the resolved notification when the inhibitor resolves
    ^ coupled alert state

Not knowing the codebase well, I prefer 1 as it is a truer representation of the state of the system.
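
To compare the two options, here is a small sketch of when the resolved notifications would go out under each. `resolved_notifications` is a hypothetical helper, not Alertmanager code, and it assumes every firing alert was notified externally before any inhibition began.

```python
def resolved_notifications(events, inhibitor, target, mode):
    """Return (time, alertname) pairs of resolved notifications sent.

    events: ordered list of (time, alertname, "firing" | "resolved").
    mode: "immediate" (option 1) or "deferred" (option 2).
    """
    firing, pending, sent = set(), [], []
    for t, name, status in events:
        if status == "firing":
            firing.add(name)
            continue
        firing.discard(name)
        if name == target and inhibitor in firing:
            if mode == "immediate":
                sent.append((t, name))  # option 1: bypass inhibition
            else:
                pending.append(name)    # option 2: hold in memory
            continue
        sent.append((t, name))
        if name == inhibitor:
            # Inhibitor is gone: flush what option 2 was holding back.
            sent.extend((t, p) for p in pending)
            pending.clear()
    return sent


FLOW = [
    (0, "alert1", "firing"),
    (1, "alert2", "firing"),
    (2, "alert1", "resolved"),
    (3, "alert2", "resolved"),
]
```

For `FLOW`, option 1 sends the resolve for alert1 at time 2, when it actually resolved, while option 2 holds it until alert2 resolves at time 3. If the pending state in option 2 lives only in memory, it would presumably be lost on a restart, which is another reason I lean towards option 1.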

@jan--f
Contributor

jan--f commented Nov 4, 2021

The discussion in #226 is probably relevant.


3 participants