[v8.1.x] Alerting: Fix alert flapping in the internal alertmanager #38829

Merged
gotjosh merged 1 commit into v8.1.x from backport-38648-to-v8.1.x on Sep 2, 2021

Conversation

grafanabot
Contributor

Backport dd502f2 from #38648

* Alerting: Fix alert flapping in the alertmanager

Fixes a bug that caused alerts evaluated at short intervals (under 1 minute) to flap in the Alertmanager,
mostly due to a combination of `EndsAt` and the resend delay.

The Alertmanager uses `EndsAt` as a heuristic to decide when it should resolve a firing alert if it hasn't heard
back from the alert generation system.

Because Grafana sent the alert with an `EndsAt` equal to the `For` of the alert itself,
and we had a hard-coded 1 minute resend delay (only applicable to firing alerts), a firing alert would resolve in the Alertmanager before we re-notified that it was still firing.

This commit increases the `EndsAt` by 3x the resend delay or the alert interval (whichever is higher). The `resendDelay` has been decreased to 30 seconds.

(cherry picked from commit dd502f2)
@grafanabot grafanabot requested a review from a team as a code owner September 2, 2021 15:27
@grafanabot grafanabot added this to the 8.1.3 milestone Sep 2, 2021
Contributor

@gotjosh gotjosh left a comment

LGTM.

@gotjosh gotjosh merged commit 3760a3a into v8.1.x Sep 2, 2021
@gotjosh gotjosh deleted the backport-38648-to-v8.1.x branch September 2, 2021 16:03
stevepostill pushed a commit to Reveal-International/grafana that referenced this pull request Nov 3, 2021
…-github to revdev

* commit '0d046c46b480ffc5fd602142a2b0e5dfc765b620': (63 commits)
  "Release: Updated versions in package to 8.1.3" (grafana#38965)
  Change grabpl version to 2.3.4 (grafana#38967)
  PieChart: Display "No data" when there is no data (grafana#38808) (grafana#38960)
  track signature files + add warn log (grafana#38938) (grafana#38958)
  Docs: Clarify delta value (grafana#38824) (grafana#38916)
  Postgres/MySQL/MSSQL: Fix region annotations not displayed correctly (grafana#38936) (grafana#38953)
  uPlot: Fix default value for plot legend visibility (grafana#36660) (grafana#38930)
  Prometheus: Fix validate selector in metrics browser (grafana#38921) (grafana#38926)
  Dashboard: Forces panel re-render when exiting panel edit (grafana#38913) (grafana#38919)
  [v8.1.x] Dashboard: Fix UIDs are not preserved when importing/creating dashboards thru importing .json file (grafana#38892)
  OAuth: add docs for disableAutoLogin param (grafana#38752) (grafana#38896)
  LibraryPanels: Prevents duplicate repeated panels from being created (grafana#38804) (grafana#38863)
  Adding missing information, more than just the manual backport. (grafana#38837)
  Elasticsearch: Prevent pipeline aggregations to show up in terms order by options (grafana#38448) (grafana#38830)
  Build: Upgrade grabpl to 2.4.2 (grafana#38820) (grafana#38828)
  Alerting: Fix alert flapping in the internal alertmanager (grafana#38648) (grafana#38829)
  [v8.1.x] Chore: Update to alpine:3.14.2 (grafana#38821)
  Update Dockerfile (grafana#38785) (grafana#38815)
  Deprecate browser access mode for the Graphite data source. (grafana#38783) (grafana#38809)
  Live: prepend orgId when publishing from HTTP (grafana#38775) (grafana#38793)
  ...
2 participants