Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alerting: Show alert rule thresholds in panels #79971

Closed
wants to merge 15 commits into from

Conversation

gillesdemey
Copy link
Member

@gillesdemey gillesdemey commented Jan 3, 2024

What is this feature?

This feature adds visualisations of thresholds for dashboard panels when a Grafana alert is linked to it.

There are several caveats here worth mentioning;

  1. It will only visualise the thresholds for the first alert linked to the panel, subsequent linked alert rules will be ignored and thresholds are not merged.
  2. refIDs for the panel's queries have to match the refID of the alert rule linked to it. If they don't match it will either not show any thresholds or show incorrect ones.
  3. Since it uses fieldConfig.overrides it will always take precedence over any existing manually configured threshold(s). The PanelQueryRunner currently has no way to check for any manually configured thresholds.
  4. If any transformations are added to the data frames, the thresholds won't show
image

Why do we need this feature?

In legacy alerting the threshold of the alert was embedded in the panel's alert rule definition. Since we've decoupled alert rules from panels in the new Grafana alerting users had to manually configure thresholds on the panel.

This was both a duplicated effort and since there is only a weak reference between a panel and alert rules it also introduced the possibility of threshold definitions to drift between the panels visualisation and the alert rule definition.

Which issue(s) does this PR fix?:

Fixes #78748

Special notes for your reviewer:

It will use the fieldConfig.overrides so the data frames can update the panel's visualisation options and show the thresholds.

A user might have existing manual thresholds configured that we don't know about. I really haven't found any other way to update the panel's options from the PanelQueryRunner without refactoring a bunch here.

Setting this to draft for now, still needs some tests added and some need fixing.

Any pointers for alternative implementations are welcome!

Please check that:

@gillesdemey gillesdemey added area/alerting Grafana Alerting pr/early-feedback WIP state but looking for early feedback area/frontend labels Jan 3, 2024
@gillesdemey gillesdemey added this to the 10.4.x milestone Jan 3, 2024
@grafana-delivery-bot grafana-delivery-bot bot modified the milestones: 10.4.x, 10.3.x Jan 3, 2024
@gillesdemey gillesdemey marked this pull request as ready for review January 5, 2024 00:14
@gillesdemey gillesdemey requested review from grafanabot and a team as code owners January 5, 2024 00:14
@gillesdemey gillesdemey requested review from tskarhed, JoaoSilvaGrafana, dprokop, kaydelaney and konrad147 and removed request for a team January 5, 2024 00:14
);

// combineLatest will allow partial results depending on which API call resolves first
return combineLatest([fetchPromRules, fetchRulerRules]).pipe(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Comment on lines +124 to +148
if (isAlertingRule(rule) && rule.annotations && rule.annotations[Annotation.panelID]) {
this.hasAlertRules[dashboard.uid] = true;
const panelId = Number(rule.annotations[Annotation.panelID]);
const state = promAlertStateToAlertState(rule.state);

// there can be multiple alerts per panel, so we make sure we get the most severe state:
// alerting > pending > ok
if (!panelIdToAlertState[panelId]) {
panelIdToAlertState[panelId] = {
state,
id: Object.keys(panelIdToAlertState).length,
panelId,
dashboardId: dashboard.id,
};
} else if (state === AlertState.Alerting && panelIdToAlertState[panelId].state !== AlertState.Alerting) {
panelIdToAlertState[panelId].state = AlertState.Alerting;
} else if (
state === AlertState.Pending &&
panelIdToAlertState[panelId].state !== AlertState.Alerting &&
panelIdToAlertState[panelId].state !== AlertState.Pending
) {
panelIdToAlertState[panelId].state = AlertState.Pending;
}
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is existing code but how about extracting this function for clarity?

Copy link
Contributor

@konrad147 konrad147 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member

@soniaAguilarPeiron soniaAguilarPeiron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!🚀

@torkelo
Copy link
Member

torkelo commented Feb 15, 2024

don't merge this, PanelQueryRunner / old dashboard architecture is in a code freeze condition. Unless there is a matching feature PR for scenes this feature will be removed as we migrate to scenes in the next few weeks

@gillesdemey gillesdemey marked this pull request as draft February 15, 2024 17:28
@gillesdemey
Copy link
Member Author

Setting this back to draft

Copy link
Member

@torkelo torkelo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you save the alert rule can't alerting update the dashboard panel config? So this happens on save instead every query refresh?

Or when we open the dashboard we do it once as part of the dashboard load?

try {
dashData = getDashboardQueryRunner().getResult(panel.id);
} catch (err) {
dashData = emptyResult();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we joining in the dashboard query result here again? it is already joined with the query result here:

panelData = mergePanelAndDashData(observable, getDashboardQueryRunner().getResult(panel.id));

@torkelo
Copy link
Member

torkelo commented Feb 16, 2024

@gillesdemey we had an idea of using a plain transformation here, a transformation can be async (it's observable). It can modify the data coming in and set the field config to include the thrresholds config

we could start with a manually added transformation (that users add). named "Sync alert rule thresholds"

@gillesdemey
Copy link
Member Author

I think that's a great idea! It would definitely fit better with the scenes model 👍

@aangelisc aangelisc modified the milestones: 10.4.x, 11.0.x Feb 20, 2024
Copy link
Contributor

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 2 weeks if no further activity occurs. Please feel free to give a status update or ping for review. Thank you for your contributions!

@github-actions github-actions bot added the stale Issue with no recent activity label Mar 22, 2024
@ashharrison90 ashharrison90 modified the milestones: 11.0.x, 11.1.x Mar 26, 2024
@github-actions github-actions bot removed the stale Issue with no recent activity label Mar 27, 2024
Copy link
Contributor

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 2 weeks if no further activity occurs. Please feel free to give a status update or ping for review. Thank you for your contributions!

Copy link
Contributor

This pull request has been automatically closed because it has not had any further activity in the last 2 weeks. Thank you for your contributions!

@github-actions github-actions bot closed this May 10, 2024
@kevinwcyu kevinwcyu modified the milestones: 11.1.x, 11.1.0 Jun 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
add to changelog area/alerting Grafana Alerting area/frontend stale Issue with no recent activity
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Reinstate alert thresholds
8 participants