Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alerting: In migration, fallback to '1s' for malformed min interval #78614

Merged
merged 2 commits into from
Nov 24, 2023

Conversation

JacobsonMT
Copy link
Member

What is this feature?

During legacy migration, when we encounter an alert datasource query with a min interval (interval field in the query model) that is not parseable, instead of failing the migration we fallback to a min interval of 1s and continue.

Note that this doesn't mean an intervalMs of 1s will be used, the intervalMs (actual interval of the query) will still be calculated based on ~max((to-from)/maxDataPoints, minInterval) where maxDataPoints=1500.

Why do we need this feature?

There is a bug in legacy alerting (existing for a few major versions) which allows arbitrary dashboard variables to be used as the min interval, even though those variables do not work and will cause the legacy alert to fail with interval calculation failed: time: invalid duration.

Legacy alerts with these broken intervals are likely rare, so failing the migration is not unreasonable. However, since legacy alerts with these variables used to be functional at some point, it seems low risk and an altogether benefit to fix them during migration.

Who is this feature for?

Users migrating from legacy alerting to Grafana Alerting.

Which issue(s) does this PR fix?:

Fixes #78606

Special notes for your reviewer:

Little bit of "boy scout rule" here as I chose to fix datasource not being passed to calculateInterval in adjacent code. It should have little to no functional effect, but is technically correct based on how legacy alerting works:

calculatedInterval, err := calculateInterval(timeRange, queryModel, datasource)

@JacobsonMT JacobsonMT added this to the 10.3.x milestone Nov 24, 2023
@JacobsonMT JacobsonMT requested review from a team, rwwiv, yuri-tceretian, grobinson-grafana and JohnnyQQQQ and removed request for a team November 24, 2023 01:57
// The reason for this is because of a bug in legacy alerting which allows arbitrary variables to be used
// as the min interval, even though those variables do not work and will cause the legacy alert
// to fail with `interval calculation failed: time: invalid duration`.
if _, err := interval.GetIntervalFrom(ds, simpleJson, time.Millisecond*1); err != nil {
Copy link
Member

@JohnnyQQQQ JohnnyQQQQ Nov 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a scenario where interval and intervalMs could be already set? interval being a variable and intervalMs being a valid duration?

Copy link
Member Author

@JacobsonMT JacobsonMT Nov 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I've seen in my tests, and if I can believe the comment above GetIntervalFrom():

// interval.GetIntervalFrom has two problems (but they do not affect us here):
// - it returns the min-interval, so it should be called interval.GetMinIntervalFrom
// - it falls back to model.intervalMs. it should not, because that one is the real final
// interval-value calculated by the browser. but, in this specific case (old-alert),
// that value is not set, so the fallback never happens.

Then intervalMs shouldn't be set in legacy alerting. That being said, I did go back and forth a bit on whether I should set the interval to an empty string instead of 1s.

Copy link
Member

@JohnnyQQQQ JohnnyQQQQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one small question.

@JacobsonMT JacobsonMT added the area/alerting/migration Issues relating to legacy alerting migration label Nov 24, 2023
The reason for this is because of a bug in legacy alerting which allows
arbitrary variables to be used as the min interval, even though those
variables do not work and will cause the legacy alert to fail with
`interval calculation failed: time: invalid duration`.

 Instead of failing the migration, we opt to 'fix' these queries during
 migration.
@JacobsonMT JacobsonMT force-pushed the jacobsonmt/migration-query-interval-variables branch from 0c4bcdd to 5d331fe Compare November 24, 2023 16:14
@JacobsonMT JacobsonMT merged commit 4b439b7 into main Nov 24, 2023
14 checks passed
@JacobsonMT JacobsonMT deleted the jacobsonmt/migration-query-interval-variables branch November 24, 2023 16:27
@aangelisc aangelisc modified the milestones: 10.3.x, 10.2.3 Dec 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
add to changelog area/alerting/migration Issues relating to legacy alerting migration area/alerting Grafana Alerting area/backend no-backport Skip backport of PR
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Alerts with a variable as interval cannot be migrated
3 participants