Alerting: Make Loki & Prometheus instant vector by default #66797

Merged (32 commits) on Apr 27, 2023

@gillesdemey gillesdemey commented Apr 18, 2023

What is this feature?

This feature makes the following changes to Alerting:

Non-visually

  • Adds support for getDefaultQuery() for datasources
  • Makes Prometheus and Loki queries use instant responses by default in Alerting
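
A minimal sketch of how a per-app default query hook might look. The types below (`App`, `PromLikeQuery`, `DataSourceLike`) and the `newQuery` helper are simplified, hypothetical stand-ins written for illustration, not the actual Grafana interfaces; only the `getDefaultQuery()` name comes from this PR.

```typescript
// Hypothetical stand-ins for Grafana's types; names and shapes are assumptions.
type App = 'explore' | 'dashboard' | 'unified-alerting';

interface PromLikeQuery {
  refId: string;
  expr: string;
  instant: boolean;
  range: boolean;
}

interface DataSourceLike<TQuery> {
  // Optional hook: the defaults a new query should start with for a given app.
  getDefaultQuery?(app: App): Partial<TQuery>;
}

const promLikeDataSource: DataSourceLike<PromLikeQuery> = {
  getDefaultQuery(app) {
    // In alerting, default to a single instant evaluation instead of a range.
    if (app === 'unified-alerting') {
      return { expr: '', instant: true, range: false };
    }
    return { expr: '', instant: false, range: true };
  },
};

// Merge the data source defaults into a freshly created query.
function newQuery(ds: DataSourceLike<PromLikeQuery>, app: App, refId: string): PromLikeQuery {
  const fallback: PromLikeQuery = { refId, expr: '', instant: false, range: true };
  return { ...fallback, ...(ds.getDefaultQuery?.(app) ?? {}), refId };
}

const alertQuery = newQuery(promLikeDataSource, 'unified-alerting', 'A');
const exploreQuery = newQuery(promLikeDataSource, 'explore', 'A');
```

Because the hook is optional, data sources that do not implement it keep the fallback defaults.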

Visually

  • Removes the table and stat visualisations as these were broken for dataframes with lots of labels
  • Automatically shows the most useful visualization based on the dataframe type (timeseries vs numeric-multi)
  • Uses the <GraphContainer /> from the explore view since it has a few features that are useful in Alerting
    • Slicing the set of data frames / series to a reasonable default
    • Toggling series works out-of-the-box
    • Bar / Points visualizations
    • Loading states
  • Uses the existing <ExpressionResult /> component to render numeric-multi dataframes
  • Updates the <ExpressionResult /> to syntax color labels
  • Adds threshold visualisations to <GrafanaRuleQueryViewer />
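
The automatic choice between the two visualisations could be sketched roughly as follows. `FrameLike`, `Viz` and `pickVisualization` are hypothetical names, and the frame type strings are assumptions about how the data frame type is reported; the real components inspect richer data frames.

```typescript
// Hypothetical, simplified frame shape; a real data frame carries far more.
type FrameType = 'timeseries-multi' | 'timeseries-wide' | 'numeric-multi' | 'numeric-wide';

interface FrameLike {
  name: string;
  meta?: { type?: FrameType };
}

type Viz = 'graph' | 'expression-result';

// Pick a visualisation: a graph only when every frame is a time series,
// otherwise fall back to the label/value rendering used for numeric frames.
function pickVisualization(frames: FrameLike[]): Viz {
  const isTimeSeriesData =
    frames.length > 0 &&
    frames.every((frame) => frame.meta?.type?.startsWith('timeseries') ?? false);
  return isTimeSeriesData ? 'graph' : 'expression-result';
}

const rangeResult: FrameLike[] = [{ name: 'A', meta: { type: 'timeseries-multi' } }];
const instantResult: FrameLike[] = [{ name: 'A', meta: { type: 'numeric-multi' } }];
```

Frames without type metadata are treated as non-time-series here, which is a design choice of this sketch rather than something stated in the PR.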

And makes the following changes to other parts of Grafana that aren't Alerting:

  • Updates the <GraphContainer /> to support thresholds
  • Adds an onDataSourceLoaded() hook to <QueryEditorRow />


Why do we need this feature?

This would reduce the TCO for Loki and Mimir by no longer querying a range of data for each alert rule evaluation.

Which issue(s) does this PR fix?:

Fixes #

Special notes for your reviewer:

Setting this to draft for now until I can figure out how to write some tests for some of the most important bits here.

  • Still need to figure out how to make this work when the datasource is the default datasource
  • Also need to figure out how we can remove the default expressions when an instant datasource is selected
@github-actions
Contributor

Backend code coverage report for PR #66797
No changes

@github-actions
Contributor

github-actions bot commented Apr 19, 2023

Frontend code coverage report for PR #66797

Plugin    Main     PR       Difference
explore   86.34%   86.34%   0%
loki      84.69%   84.69%   0%

@gillesdemey
Member Author

I'll double check with the alerting BE team whether the feature flag is relevant to us; the numeric-multi dataframes are returned by the /api/v1/eval endpoint and not by the Loki / Prom data sources directly.

Member

@ivanahuckova ivanahuckova left a comment

I reviewed the Loki part of the change and added feedback. Let me know what you think!

public/app/plugins/datasource/loki/datasource.ts (outdated)
@gillesdemey gillesdemey force-pushed the alerting/loki-prometheus-instant-vector branch from f213a30 to eb6f806 on April 24, 2023 14:23
@@ -261,6 +260,9 @@ export function getThresholdsForQueries(queries: AlertQuery[]) {
// now also sort the threshold values, if we don't then they will look weird in the time series panel
// TODO this doesn't work for negative values for now, those need to be sorted inverse
thresholds[refId].config.steps.sort((a, b) => a.value - b.value);

// also make sure we remove any "undefined" values from our steps in case the threshold config is incomplete
thresholds[refId].config.steps = thresholds[refId].config.steps.filter((step) => step.value !== undefined);
Member

Is this change needed for this PR, or did you add it as an improvement?

Member Author

Not strictly needed; it fixes a small bug I found while testing my changes, where an incomplete threshold definition (like "is between 0 and <undefined>") would break the time series visualization and show a blank canvas.
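
To illustrate the bug described here, a small self-contained sketch: `ThresholdStep` and `cleanSteps` are hypothetical names, and the step shape is simplified. Note that, unlike the diff quoted earlier in this thread, this sketch drops the undefined values before sorting so the comparator only ever sees numbers.

```typescript
// Simplified, hypothetical threshold step; the real config has more fields.
interface ThresholdStep {
  value: number | undefined;
  color: string;
}

type DefinedStep = ThresholdStep & { value: number };

// Drop steps without a value (incomplete threshold config), then sort ascending.
// filter() returns a new array, so the caller's input is not mutated by sort().
function cleanSteps(steps: ThresholdStep[]): DefinedStep[] {
  return steps
    .filter((step): step is DefinedStep => step.value !== undefined)
    .sort((a, b) => a.value - b.value);
}

// "is between 10 and <undefined>", plus a base step at 0.
const cleaned = cleanSteps([
  { value: 10, color: 'red' },
  { value: undefined, color: 'red' },
  { value: 0, color: 'transparent' },
]);
```

As the TODO in the diff points out, plain ascending order may still not be what is wanted for negative values.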

Member

@soniaAguilarPeiron soniaAguilarPeiron left a comment

LGTM!🚀
I added some non-blocking comments

@konrad147
Contributor

Great job. Visualizations are handled much better now!
I have one general concern about the use of instant vectors: with them, we can't visualize the query as a time series chart, which might be helpful in many scenarios (especially when selecting a sensible threshold value).
In the future, we could try to use an instant query for the alert definition and a range query just for the preview.

{({ width }) => (
<div style={{ width }}>
{isTimeSeriesData ? (
<GraphContainer
Contributor

It looks like PanelRenderer has built-in error handling. As we no longer use it, we need to handle errors on our own.

Member Author

Added error handling back in, using the statusMessage of the <PanelChrome /> in <VizWrapper />.


@gillesdemey
Member Author

Yeah, valid concern; unfortunately we have to choose between:

  • Instant vector – better for querying, as less data is transferred and it is faster, but no time series viz
  • Range – more data transfer and slower, but a nice time series viz

Unless we figure out how to rewrite queries I don't see how we could have both :(
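
For a rough feel of the trade-off, here is a sketch against the Prometheus HTTP API, where `/api/v1/query` evaluates at a single timestamp and `/api/v1/query_range` evaluates at every step between `start` and `end`. The helper names and the base URL are assumptions for illustration, and this is not how Grafana itself issues the requests.

```typescript
// Instant query: one evaluation at `timeSec`, so at most one sample per series.
function instantQueryUrl(base: string, expr: string, timeSec: number): string {
  return `${base}/api/v1/query?query=${encodeURIComponent(expr)}&time=${timeSec}`;
}

// Range query: one evaluation per step between start and end (inclusive).
function rangeQueryUrl(
  base: string,
  expr: string,
  startSec: number,
  endSec: number,
  stepSec: number
): string {
  return (
    `${base}/api/v1/query_range?query=${encodeURIComponent(expr)}` +
    `&start=${startSec}&end=${endSec}&step=${stepSec}`
  );
}

// Upper bound on samples per series for a range query:
// evaluations happen at start, start + step, ... up to end.
function samplesPerSeries(startSec: number, endSec: number, stepSec: number): number {
  return Math.floor((endSec - startSec) / stepSec) + 1;
}

const base = 'http://localhost:9090'; // hypothetical Prometheus address
const instantUrl = instantQueryUrl(base, 'up', 1700000000);
const rangeUrl = rangeQueryUrl(base, 'up', 1700000000 - 3600, 1700000000, 15);
const rangeSamples = samplesPerSeries(0, 3600, 15); // one hour at a 15s step
```

Under these assumptions, a one-hour range at a 15-second step returns up to 241 samples per series where the instant query returns one, which is the cost that each alert rule evaluation avoids.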

Contributor

@konrad147 konrad147 left a comment

LGTM! Great job!


7 participants