-
Notifications
You must be signed in to change notification settings - Fork 11.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alerting: Don't record annotations for mapped NoData transitions, when NoData is mapped to OK #77164
Conversation
(t.State.StateReason == models.StateReasonNoData && t.PreviousStateReason == "") { | ||
return false | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it can be simplified and check for rule.NoDataState == models.Ok
should be dropped because it leaks the knowledge about mapping outside the state management domain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we 100% sure that Normal(NoData) can only happen as a result of NoData->OK mapping, and there is absolutely no other vector where this state can appear?
State management already knows about several aspects of the rule configuration, so the only tradeoff here is safety. If a source of Normal (NoData) is added in the future that doesn't come from this mapping, the model can break down a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently yes, the only vector for Normal (NoData)
is mapping NoData to OK.
To your point, the check for Normal (Paused) -> Normal
doesn't use rule.IsPaused
because StateReasonPaused
identifies the rule state as paused. Likewise, StateReasonNoData
does the same for a NoData eval. If this model breaks down a bit it would be because the semantics of StateReasonNoData
have changed, and that change should include making sure code using the constant is aware of the new semantics.
require.True(t, shouldRecordAnnotation(r, pauseBackward), "Normal(Paused) -> Normal(NoData) should be true") | ||
require.True(t, shouldRecordAnnotation(r, errorForward), "Normal(NoData) -> Normal(Error) should be true") | ||
require.True(t, shouldRecordAnnotation(r, errorBackward), "Normal(Error) -> Normal(NoData) should be true") | ||
require.True(t, shouldRecordAnnotation(r, missingSeriesBackward), "Normal(MissingSeries) -> Normal(NoData) should be true") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can drop all transitions from to Error\NoData and keep MissingSeries, Paused and Updated
Taking over from @alexweav while he's out. I'm pushing up a change to avoid checking |
91d7e5c
to
663d2f8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
What is this feature?
Mapping to NoData to OK is akin to saying that you don't care about NoData for a certain rule. It's often used in the event of intermittently failing queries.
However, alerting still emits annotations when such a mapped rule goes between OK(NoData) and OK. On flaky datasources, which is the most common situation for this, this causes annotation spam on attached panels, making graphs unreadable.
Since the user already configured the engine in a way that indicates that they don't care about these transitions, let's also not emit annotations for them.
This only applies to panel annotations and the old state history. The new state history will still record these transitions.
Why do we need this feature?
Prevent annotation spam/unreadable graphs on intermittent queries, when the user already indicated that they want to ignore NoData and treat it as OK.
Which issue(s) does this PR fix?:
Fixes #76861
Special notes for your reviewer:
Please check that: