Alerting update eval engine to return errors and no data as separate models #59973
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 2 weeks if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
type EvaluationResult struct {
	Error error
	// NoData contains the DatasourceUID for RefIDs that returned no data.
	NoData *NoDataResult
Should we instead have the following? I'm not sure we get much from having NoDataResult?
- NoData *NoDataResult
+ NoData map[string][]string
	})
}
return evalResults
result.NoData = &NoDataResult{DatasourceToRefID: datasourceUIDsToRefIDs(execResults.NoData)}
This pull request was removed from the 9.4.0 milestone because 9.4.0 is currently being released.
This pull request was removed from the 9.5.0 milestone because 9.5.0 is currently being released.
This pull request has been automatically closed because it has not had activity in the last 2 weeks. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
Background
The alerting engine uses the Grafana expression service to evaluate queries and transform the response. The engine transforms the expression service's result into a list of results, each containing the original value, the labels, and the state (Alerting, Normal, etc.).
Execution of an alert rule query generally finishes with one of 3 types of results:
null. The null value is treated as NoData but it is specific to only the dimension (alert instance) that is identified by the set of labels, or in other words, it is complete no data.
All those results are returned to the state manager as a list of eval.Result, where every element is treated as a separate state in the state manager and is identified by its set of labels. In the case of an error, the list will contain only one element with state Error. In the case of "global" no-data, it will contain a single element with state NoData. It is important to note that in both cases the set of labels is empty.
As mentioned above, the state manager processes every result from the list of results individually. In the case of Error or NoData states, it checks the alert rule specification and determines what needs to be done with that "abnormal" result. The rule specification provides 3 mapping options:
DatasourceNoData or DatasourceError.
Reference to the documentation
grafana/docs/sources/alerting/alerting-rules/create-grafana-managed-rule.md
Lines 72 to 86 in 0e4108f
According to the documentation, if the abnormal state is mapped to either OK or Alerting, it should switch the current state to OK or Alerting (or Pending, depending on the For setting). However, that is not true in the general case. The problem is that the abnormal result is still treated by the state manager as an individual dimension (aka state, aka instance). As I mentioned before, the abnormal result usually does not have any labels, or, due to its abnormality, its set of labels can differ from those of the current states. This causes the state manager to create a new state instead of switching existing instances to the desired state.
Therefore, the outcome of mapping abnormal results to the Normal and Alerting states is not what the user expects or what the documentation declares.
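The behaviour described above can be illustrated with a small, self-contained sketch. It is not Grafana code: `Result`, `State`, `key`, and `process` are hypothetical stand-ins for `eval.Result` and the state manager, and they assume only what the text says, namely that states are tracked per label set and that a "global" no-data result carries an empty label set.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// State is a simplified stand-in for an alert instance state.
type State string

const (
	Normal   State = "Normal"
	Alerting State = "Alerting"
	NoData   State = "NoData"
)

// Result is a simplified, hypothetical stand-in for eval.Result.
type Result struct {
	Labels map[string]string
	State  State
}

// key builds a stable identifier from a label set.
func key(labels map[string]string) string {
	parts := make([]string, 0, len(labels))
	for k, v := range labels {
		parts = append(parts, k+"="+v)
	}
	sort.Strings(parts)
	return strings.Join(parts, ",")
}

// process mimics a state manager that tracks one state per label set.
// noDataAs is the state that the rule maps NoData results to.
func process(states map[string]State, results []Result, noDataAs State) {
	for _, r := range results {
		s := r.State
		if s == NoData {
			s = noDataAs
		}
		states[key(r.Labels)] = s
	}
}

func main() {
	states := map[string]State{}
	// First evaluation: two healthy dimensions.
	process(states, []Result{
		{Labels: map[string]string{"host": "a"}, State: Normal},
		{Labels: map[string]string{"host": "b"}, State: Normal},
	}, Alerting)
	// Second evaluation: "global" no-data, a single result with no labels.
	process(states, []Result{{State: NoData}}, Alerting)

	// prints: 3 Normal Normal Alerting
	fmt.Println(len(states), states["host=a"], states["host=b"], states[""])
}
```

In this sketch the two existing instances stay Normal and a third, label-less instance appears in Alerting, which is the mismatch with the documented mapping.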
What is this feature?
This PR does two things:
Why do we need this feature?
This fixes the bug of mapping Error|NoData results to OK|Alerting states.
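As a rough illustration of why a separate no-data model helps, the sketch below shows one way a caller could apply the mapped state to existing instances once no-data is reported outside the per-dimension list. This is an assumption, not the PR's implementation: the `Results` field, the `apply` helper, and the `map[string][]string` shape (borrowed from the review suggestion) are hypothetical.

```go
package main

import "fmt"

// State is a simplified stand-in for an alert instance state.
type State string

const (
	Normal   State = "Normal"
	Alerting State = "Alerting"
)

// EvaluationResult loosely follows the struct quoted in the review thread.
// The Results field and the map shape of NoData are assumptions.
type EvaluationResult struct {
	Results map[string]State    // per-dimension states keyed by label set
	Error   error               // evaluation error, if any
	NoData  map[string][]string // datasource UID -> RefIDs that returned no data
}

// apply is a hypothetical helper: when the evaluation reports no data,
// it switches every existing instance to the mapped state instead of
// creating a new label-less instance.
func apply(states map[string]State, res EvaluationResult, noDataAs State) {
	if len(res.NoData) > 0 {
		for k := range states {
			states[k] = noDataAs
		}
		return
	}
	for k, s := range res.Results {
		states[k] = s
	}
}

func main() {
	states := map[string]State{"host=a": Normal, "host=b": Normal}
	res := EvaluationResult{NoData: map[string][]string{"ds-1": {"A"}}}
	apply(states, res, Alerting)

	// prints: 2 Alerting Alerting
	fmt.Println(len(states), states["host=a"], states["host=b"])
}
```

Here the number of instances stays at two and both move to Alerting, which matches the behaviour the documentation describes for the mapping.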