Alerting: Fix database unavailable removes rules from scheduler #49874
LGTM, but please see my comments. I'm happy to take another look.
// If the database is unavailable or the query returns an error then return
// the alert rules from the most recent tick
if err := sch.ruleStore.GetAlertRulesForScheduling(ctx, &q); err != nil {
	sch.log.Error("failed to get most recent alert rules", "error", err)
How do you feel about including the number of alerts we're about to return from the cache?
We also need a changelog entry.
The scheduler has two methods for updating or deleting a rule (UpdateAlertRule and DeleteAlertRule). I think we should update the cache when DeleteAlertRule is called.
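One way to read this suggestion is sketched below, assuming the cache is a map keyed by rule. Only the method name `DeleteAlertRule` comes from the comment above; the `ruleKey`, `rule`, `ruleCache`, and `sched` types and their fields are hypothetical, and stopping the rule's evaluation routine is left out.

```go
package main

import (
	"fmt"
	"sync"
)

// ruleKey and rule are hypothetical, simplified types for illustration.
type ruleKey struct {
	OrgID int64
	UID   string
}

type rule struct {
	Key   ruleKey
	Title string
}

// ruleCache holds the rules returned by the most recent successful query.
type ruleCache struct {
	mu    sync.Mutex
	rules map[ruleKey]rule
}

func newRuleCache() *ruleCache {
	return &ruleCache{rules: make(map[ruleKey]rule)}
}

// set replaces the cache contents after a successful query.
func (c *ruleCache) set(rules []rule) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.rules = make(map[ruleKey]rule, len(rules))
	for _, r := range rules {
		c.rules[r.Key] = r
	}
}

// del removes a rule and reports whether it was present.
func (c *ruleCache) del(k ruleKey) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	_, ok := c.rules[k]
	delete(c.rules, k)
	return ok
}

// size returns the number of cached rules.
func (c *ruleCache) size() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.rules)
}

type sched struct {
	cache *ruleCache
}

// DeleteAlertRule evicts the rule from the cache so that a later fallback
// to cached rules cannot bring a deleted rule back. The body is an
// assumption; the real method also stops the rule's evaluation.
func (s *sched) DeleteAlertRule(k ruleKey) {
	s.cache.del(k)
}

func main() {
	s := &sched{cache: newRuleCache()}
	a := ruleKey{OrgID: 1, UID: "a"}
	b := ruleKey{OrgID: 1, UID: "b"}
	s.cache.set([]rule{{Key: a, Title: "A"}, {Key: b, Title: "B"}})

	fmt.Println(s.cache.size()) // 2
	s.DeleteAlertRule(a)
	fmt.Println(s.cache.size()) // 1
}
```

Without the eviction, a database outage right after a delete could cause the fallback path to schedule the deleted rule again from the stale cache.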
Force-pushed from 2545019 to cedf93a
Drone build failed: https://drone.grafana.net/grafana/grafana-enterprise/20556
Force-pushed from 1d9014e to 47e3d01
Drone build failed: https://drone.grafana.net/grafana/grafana-enterprise/20721
Force-pushed from 47e3d01 to b0ccef4
Drone build failed: https://drone.grafana.net/grafana/grafana-enterprise/20730
Force-pushed from b0ccef4 to 7671952
Drone build failed: https://drone.grafana.net/grafana/grafana-enterprise/20739
Drone build failed: https://drone.grafana.net/grafana/grafana-enterprise/21204
Force-pushed from a23bca2 to 68cb26f
Drone build failed: https://drone.grafana.net/grafana/grafana-enterprise/21206
Force-pushed from 68cb26f to 3263d06
Drone build failed: https://drone.grafana.net/grafana/grafana-enterprise/21388
Force-pushed from 4845265 to bff7491
LGTM
Drone build failed: https://drone.grafana.net/grafana/grafana-enterprise/21442
(cherry picked from commit c83f843)
What this PR does / why we need it:
This pull request fixes a bug in Grafana where an intermittent database failure, a network failure between Grafana and the database, or an error when querying the database would cause all alert rules to be unscheduled. With this change, scheduled alert rules are left unchanged unless the query succeeds.
Which issue(s) this PR fixes:
Fixes #49855
Special notes for your reviewer:
Release notice breaking change
This change fixes a bug in Grafana where an intermittent database failure, a network failure between Grafana and the database, or an error when querying the database would cause all alert rules to be unscheduled. Following this change, scheduled alert rules are not updated unless the query is successful.
The get_alert_rules_duration_seconds metric has been renamed to schedule_query_alert_rules_duration_seconds.