Muted alerts are not suppressed in API #3513

Open
grobinson-grafana opened this issue Sep 7, 2023 · 1 comment

Comments

@grobinson-grafana
Contributor

grobinson-grafana commented Sep 7, 2023

What

When an alert is silenced in Alertmanager, its status changes from active to suppressed, and the JSON response for /api/v2/alerts contains the ID(s) of all silences that suppressed it:

[
  {
    "annotations": {},
    "endsAt": "2023-09-07T10:43:50.291+01:00",
    "fingerprint": "3fff2c2d7595e046",
    "receivers": [
      {
        "name": "test"
      }
    ],
    "startsAt": "2023-09-07T10:38:50.291+01:00",
    "status": {
      "inhibitedBy": [],
      "silencedBy": [
        "ed411cae-78ea-47bc-90f5-1e474d6f526e"
      ],
      "state": "suppressed"
    },
    "updatedAt": "2023-09-07T10:38:50.291+01:00",
    "labels": {
      "foo": "bar"
    }
  }
]
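
As a rough illustration of how a client might read these fields, here is a minimal Python sketch. It parses a trimmed copy of the response above from an embedded string rather than calling a live Alertmanager, and the helper name `suppression_sources` is made up for this example.

```python
import json

# Trimmed copy of the /api/v2/alerts response shown above.
payload = """
[
  {
    "fingerprint": "3fff2c2d7595e046",
    "status": {
      "inhibitedBy": [],
      "silencedBy": ["ed411cae-78ea-47bc-90f5-1e474d6f526e"],
      "state": "suppressed"
    },
    "labels": {"foo": "bar"}
  }
]
"""


def suppression_sources(alert):
    """Return (state, silence IDs, inhibiting fingerprints) for one alert."""
    status = alert.get("status", {})
    return (
        status.get("state"),
        list(status.get("silencedBy", [])),
        list(status.get("inhibitedBy", [])),
    )


alerts = json.loads(payload)
summary = {a["fingerprint"]: suppression_sources(a) for a in alerts}
for fingerprint, (state, silenced_by, inhibited_by) in summary.items():
    print(fingerprint, state, silenced_by, inhibited_by)
```

For a silenced alert the state is "suppressed" and the silence ID explains why. Nothing in this payload could explain a mute caused by a time interval.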

However, the same cannot be said for active and mute time intervals. The alert below was muted by a time interval, yet the API still reports it as active:

ts=2023-09-07T09:54:39.161Z caller=notify.go:902 level=debug component=dispatcher msg="Notifications not sent, route is within mute time"

[
  {
    "annotations": {},
    "endsAt": "2023-09-07T10:59:09.159+01:00",
    "fingerprint": "3fff2c2d7595e046",
    "receivers": [
      {
        "name": "test"
      }
    ],
    "startsAt": "2023-09-07T10:54:09.159+01:00",
    "status": {
      "inhibitedBy": [],
      "silencedBy": [],
      "state": "active"
    },
    "updatedAt": "2023-09-07T10:54:09.159+01:00",
    "labels": {
      "foo": "bar"
    }
  }
]

I expected to see "state": "suppressed" and "mutedBy", just like we have "inhibitedBy" and "silencedBy".

Background

Alertmanager supports a number of features to suppress alerts. These are:

  1. Silences
  2. Inhibition rules
  3. Active time intervals
  4. Mute time intervals

However, while silences and inhibition rules target alerts, active and mute time intervals target routes. This means that a silenced or inhibited alert is suppressed irrespective of how it's routed, but for active and mute time intervals an alert can be active when matched to one route and suppressed when matched to another. If continue matching is also set to true, then an alert can be both active and suppressed at the same time.
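
The following Python sketch illustrates that last point under simplifying assumptions. `Route`, `is_muted` and `states_per_route` are hypothetical names, not Alertmanager code. The model only covers sibling routes with daily mute windows and ignores nested routes and active time intervals.

```python
from dataclasses import dataclass, field
from datetime import datetime, time


@dataclass
class Route:
    """Hypothetical, simplified stand-in for an Alertmanager route."""
    name: str
    match: dict
    mute_windows: list = field(default_factory=list)  # [(start, end)] as datetime.time
    continue_matching: bool = False


def is_muted(route, now):
    """True if `now` falls inside any of the route's mute windows."""
    return any(start <= now.time() < end for start, end in route.mute_windows)


def states_per_route(labels, routes, now):
    """Walk sibling routes in order and record the alert's state in each match."""
    states = {}
    for route in routes:
        if all(labels.get(k) == v for k, v in route.match.items()):
            states[route.name] = "suppressed" if is_muted(route, now) else "active"
            if not route.continue_matching:
                break
    return states


routes = [
    Route("team-a", {"foo": "bar"}, [(time(9, 0), time(17, 0))], continue_matching=True),
    Route("team-b", {"foo": "bar"}),
]
now = datetime(2023, 9, 7, 10, 54)
result = states_per_route({"foo": "bar"}, routes, now)
print(result)  # {'team-a': 'suppressed', 'team-b': 'active'}
```

The same alert ends up with two different states, one per route, which is why a single state field on the alert cannot describe it.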

Problems

This presents a number of problems for the Alertmanager API:

  1. /api/v2/alerts contains a list of Silence IDs and alert rule fingerprints that are silencing or inhibiting the alert. It does not show which groups the alert is silenced or inhibited in as it does not need to. Both silences and inhibition rules work irrespective of routes, so the alert is silenced and inhibited in all groups, even when continue matching is set to true.
  2. Active and mute time intervals are per route, which means an alert can be active when matched to one route and suppressed when matched to another. However, /api/v2/alerts does not work with groups. Instead, there is /api/v2/alerts/groups. Should /api/v2/alerts show which groups an alert is suppressed in when the rest of the endpoint does not return information about groups?
  3. What should the state be of an alert that is active in one group but suppressed in another? Should it be active, suppressed, or something else?

Proposal

  1. I propose that we DO NOT update /api/v2/alerts to show alerts as suppressed by active and mute time intervals. There are two reasons for this: (a) it does not feel right to add information about groups when /api/v2/alerts does not work with groups, and (b) an existing endpoint, /api/v2/alerts/groups, already has this data.
  2. We update /api/v2/alerts/groups to show whether an alert is suppressed by the active and mute time intervals of the route that created the group, and no other routes.
  3. In the long term we should remove status from /api/v2/alerts, which includes state, inhibitedBy and silencedBy. The state of an alert cannot be fully known until it has been routed and put into a group, and so status should be removed from this endpoint. This is a breaking change.
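
To make point 2 concrete, here is a hedged Python sketch of what a per-group view of one alert could look like. The function `group_view` and the field names `routeID` and `groupKey` are assumptions made for illustration, not the actual response schema. The sketch only considers time intervals and ignores silences and inhibitions.

```python
def group_view(fingerprint, groups, muted_by_group):
    """Build a per-group view of one alert.

    groups: list of (route_id, group_key) pairs the alert belongs to.
    muted_by_group: {(route_id, group_key): [time interval names]}.
    Both shapes are hypothetical.
    """
    view = []
    for route_id, group_key in groups:
        names = muted_by_group.get((route_id, group_key), [])
        view.append({
            "routeID": route_id,
            "groupKey": group_key,
            "fingerprint": fingerprint,
            "status": {
                # Suppressed only if this group's own route muted it.
                "state": "suppressed" if names else "active",
                "mutedBy": list(names),
            },
        })
    return view


groups = [("route-0", "group-a"), ("route-1", "group-b")]
muted = {("route-0", "group-a"): ["weekends"]}
view = group_view("3fff2c2d7595e046", groups, muted)
for entry in view:
    print(entry["groupKey"], entry["status"])
```

Each group reports only the intervals of the route that created it, so the same alert can be suppressed in one entry and active in the other without contradiction.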

PRs

The following PRs should be reviewed and merged in the order specified:

  1. #3513: Rewrite TestTimeActiveStage tests #3795
  2. #3513: Rewrite TestTimeMuteStage tests #3794
  3. #3513: TimeMuter returns the names of time intervals #3791
  4. #3513: Add GroupMarker interface #3792
  5. #3513: Mark muted alerts #3793
  6. #3513: Show muted alerts in the Alert Groups API #3797
@yarix

yarix commented Sep 7, 2023

If an alert is muted and also silenced, then the alert should be (or stay) in state "suppressed" and the JSON should contain the list of "silencedBy".
As for "mutedBy", it sounds like a nice addition to the JSON that helps explain the state of the alert.

@grobinson-grafana grobinson-grafana changed the title from "No marker for mutes?" to "Muted alerts are not suppressed in API" on Mar 4, 2024
gotjosh pushed a commit that referenced this issue Apr 30, 2024
* TimeMuter returns the names of time intervals

This commit updates the TimeMuter interface to also return the names
of the time intervals that muted the alerts.

Signed-off-by: George Robinson <george.robinson@grafana.com>

---------

Signed-off-by: George Robinson <george.robinson@grafana.com>
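
The idea in this commit can be sketched in Python as follows. This is an illustration only and not the Go interface from the commit. The class `TimeIntervalMuter`, its constructor and the daily-window representation are assumptions.

```python
from datetime import datetime, time


class TimeIntervalMuter:
    """Hypothetical sketch: maps interval names to daily (start, end) windows."""

    def __init__(self, intervals):
        self.intervals = intervals  # {name: [(start, end), ...]}

    def mutes(self, names, now):
        """Return (muted, matching_names) for the named intervals at `now`."""
        matched = []
        for name in names:
            if name not in self.intervals:
                raise KeyError(f"time interval {name!r} not found")
            if any(start <= now.time() < end for start, end in self.intervals[name]):
                matched.append(name)
        return bool(matched), matched


muter = TimeIntervalMuter({
    "nights": [(time(0, 0), time(6, 0))],
    "business-hours": [(time(9, 0), time(17, 0))],
})
muted, names = muter.mutes(["nights", "business-hours"], datetime(2023, 9, 7, 10, 54))
print(muted, names)  # True ['business-hours']
```

Returning the names alongside the boolean lets a caller record which intervals caused the mute instead of only the fact that one happened.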
gotjosh pushed a commit that referenced this issue Apr 30, 2024
* Add GroupMarker interface

This commit adds a new GroupMarker interface that marks the status
of groups. For example, whether an alert is muted because of one
or more active or mute time intervals.

It renames the existing Marker interface to AlertMarker to avoid
confusion.

Signed-off-by: George Robinson <george.robinson@grafana.com>

---------

Signed-off-by: George Robinson <george.robinson@grafana.com>
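
A group marker of this kind could look roughly like the in-memory Python sketch below. The class `MemGroupMarker` and its method names are assumptions chosen for illustration and may not match the Go interface added in the commit.

```python
import threading


class MemGroupMarker:
    """Hypothetical in-memory marker keyed by (route ID, group key)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._muted = {}

    def set_muted(self, route_id, group_key, names):
        """Record the time intervals that currently mute this group."""
        with self._lock:
            self._muted[(route_id, group_key)] = list(names)

    def muted(self, route_id, group_key):
        """Return (interval names, is_muted) for this group."""
        with self._lock:
            names = self._muted.get((route_id, group_key), [])
            return list(names), bool(names)

    def delete_by_group_key(self, route_id, group_key):
        """Forget the group, for example once it no longer exists."""
        with self._lock:
            self._muted.pop((route_id, group_key), None)


marker = MemGroupMarker()
marker.set_muted("route-0", "group-a", ["weekends"])
print(marker.muted("route-0", "group-a"))
print(marker.muted("route-1", "group-a"))
```

Keying on both the route and the group keeps the marker consistent with the observation that mutes apply per route, not per alert.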
TheMeier pushed a commit to TheMeier/alertmanager that referenced this issue May 3, 2024
…theus#3791)

* TimeMuter returns the names of time intervals

This commit updates the TimeMuter interface to also return the names
of the time intervals that muted the alerts.

Signed-off-by: George Robinson <george.robinson@grafana.com>

---------

Signed-off-by: George Robinson <george.robinson@grafana.com>
TheMeier pushed a commit to TheMeier/alertmanager that referenced this issue May 3, 2024
* Add GroupMarker interface

This commit adds a new GroupMarker interface that marks the status
of groups. For example, whether an alert is muted because of one
or more active or mute time intervals.

It renames the existing Marker interface to AlertMarker to avoid
confusion.

Signed-off-by: George Robinson <george.robinson@grafana.com>

---------

Signed-off-by: George Robinson <george.robinson@grafana.com>
gotjosh pushed a commit that referenced this issue May 13, 2024
* Mark muted groups

This commit updates TimeMuteStage and TimeActiveStage to mark groups
as muted when their alerts are muted by an active or mute time interval,
and to remove any existing markers when outside all active and mute
time intervals.

Signed-off-by: George Robinson <george.robinson@grafana.com>

* Remove unlock to defer

Signed-off-by: George Robinson <george.robinson@grafana.com>

---------

Signed-off-by: George Robinson <george.robinson@grafana.com>
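
The marking behaviour described in this commit can be approximated by the Python sketch below. It models only a mute-style stage, uses a plain dict as the marker, and every name in it (`time_mute_stage`, `in_effect`, the route and group identifiers) is hypothetical.

```python
def time_mute_stage(marker, route_id, group_key, alerts, interval_names, in_effect):
    """Hypothetical stage: drop alerts and mark the group while muted.

    marker: dict mapping (route_id, group_key) to the muting interval names.
    in_effect: set of time interval names that apply at the current time.
    """
    matched = [name for name in interval_names if name in in_effect]
    key = (route_id, group_key)
    if matched:
        marker[key] = matched  # record why the group is muted
        return []              # nothing is passed on to the next stage
    marker.pop(key, None)      # outside all intervals: clear any stale marker
    return alerts


marker = {}
alerts = ["3fff2c2d7595e046"]
during = time_mute_stage(marker, "route-0", "group-a", alerts, ["maintenance"], {"maintenance"})
marked = dict(marker)
after = time_mute_stage(marker, "route-0", "group-a", alerts, ["maintenance"], set())
print(during, marked, after, marker)
```

Clearing the marker on the non-muted path matters, because otherwise a group would keep reporting an interval that has already ended.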
2 participants