-
Notifications
You must be signed in to change notification settings - Fork 11.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alerting: In migration, fallback to '1s' for malformed min interval #78614
Alerting: In migration, fallback to '1s' for malformed min interval #78614
Conversation
// The reason for this is because of a bug in legacy alerting which allows arbitrary variables to be used | ||
// as the min interval, even though those variables do not work and will cause the legacy alert | ||
// to fail with `interval calculation failed: time: invalid duration`. | ||
if _, err := interval.GetIntervalFrom(ds, simpleJson, time.Millisecond*1); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a scenario where interval
and intervalMs
could be already set? interval
being a variable and intervalMs
being a valid duration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I've seen in my tests, and if I can believe the comment above GetIntervalFrom():
grafana/pkg/services/ngalert/migration/cond_trans.go
Lines 343 to 347 in 82f3127
// interval.GetIntervalFrom has two problems (but they do not affect us here): | |
// - it returns the min-interval, so it should be called interval.GetMinIntervalFrom | |
// - it falls back to model.intervalMs. it should not, because that one is the real final | |
// interval-value calculated by the browser. but, in this specific case (old-alert), | |
// that value is not set, so the fallback never happens. |
Then intervalMs
shouldn't be set in legacy alerting. That being said, I did go back and forth a bit on whether I should set the interval
to an empty string instead of 1s
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, one small question.
The reason for this is because of a bug in legacy alerting which allows arbitrary variables to be used as the min interval, even though those variables do not work and will cause the legacy alert to fail with `interval calculation failed: time: invalid duration`. Instead of failing the migration, we opt to 'fix' these queries during migration.
0c4bcdd
to
5d331fe
Compare
What is this feature?
During legacy migration, when we encounter an alert datasource query with a min interval (
interval
field in the query model) that is not parseable, instead of failing the migration we fallback to a min interval of1s
and continue.Note that this doesn't mean an
intervalMs
of1s
will be used, theintervalMs
(actual interval of the query) will still be calculated based on ~max((to-from)/maxDataPoints, minInterval)
wheremaxDataPoints=1500
.Why do we need this feature?
There is a bug in legacy alerting (existing for a few major versions) which allows arbitrary dashboard variables to be used as the min interval, even though those variables do not work and will cause the legacy alert to fail with
interval calculation failed: time: invalid duration
.Legacy alerts with these broken intervals are likely rare, so failing the migration is not unreasonable. However, since legacy alerts with these variables used to be functional at some point, it seems low risk and an altogether benefit to fix them during migration.
Who is this feature for?
Users migrating from legacy alerting to Grafana Alerting.
Which issue(s) does this PR fix?:
Fixes #78606
Special notes for your reviewer:
Little bit of "boy scout rule" here as I chose to fix datasource not being passed to
calculateInterval
in adjacent code. It should have little to no functional effect, but is technically correct based on how legacy alerting works:grafana/pkg/services/alerting/conditions/query.go
Line 261 in 025b2f3