Alerting: Allow more time before Alertmanager expire-resolves alerts #77094
What is this feature?
We recently saw an issue where long-running queries (and the dropped ruler ticks they cause) led to flapping: the alert timed out in Alertmanager and was force-resolved.
Prometheus itself intermittently ran into the same issue (prometheus/prometheus#5277), though through a different vector: there, alerts were failing to reach the Alertmanager.
Still, this change allows more "slack" in the event of ruler delays and brings us back in sync with Prometheus. The original choice of `3` seems to be Prometheus-inspired, so let's follow the trend there.
Which issue(s) does this PR fix?:
n/a
Special notes for your reviewer:
Please check that: