Alerting: Support hysteresis command expression #75189

yuri-tceretian · 2023-09-20T17:55:38Z

What is this feature?
This PR adds support for the new command type introduced in #70998. The hysteresis command needs information about the results of the previous evaluation, so this PR updates the state manager and scheduler to supply that information to the evaluation engine:

* The state manager now keeps a fingerprint of the result instance for each State. It differs from CacheID and Labelshash because it is calculated from the labels of the raw result.
* The scheduler wraps the state manager and the rule into an intermediate structure, AlertingResultsFromRuleState, and provides it to the rule evaluator.
* The evaluator analyzes the expression tree (the parsed rule query) and, if it contains a HysteresisCommand, populates it with the data taken from AlertingResultsFromRuleState.
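
For illustration, a fingerprint calculated from the labels of a raw result could look roughly like the sketch below. This is not the code from this PR: the function name `fingerprint`, the FNV-1a hash, and the separator byte are assumptions chosen for the example. The only property it tries to show is that the value depends on the label set alone and not on map iteration order.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// fingerprint is a hypothetical helper: it hashes label names and values
// in sorted order so that equal label sets always give the same value.
func fingerprint(labels map[string]string) uint64 {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)

	h := fnv.New64a()
	for _, name := range names {
		// 0xff separates names from values so ("ab","c") != ("a","bc").
		h.Write([]byte(name))
		h.Write([]byte{0xff})
		h.Write([]byte(labels[name]))
		h.Write([]byte{0xff})
	}
	return h.Sum64()
}

func main() {
	a := fingerprint(map[string]string{"instance": "a", "job": "api"})
	b := fingerprint(map[string]string{"job": "api", "instance": "a"})
	fmt.Println(a == b) // same label set, same fingerprint
}
```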

The structure AlertingResultsFromRuleState implements the logic that determines which metric/dimension/state is considered "loaded":

* The state is Pending or Alerting.
* The state reason is empty. This excludes exceptional states, such as Error or NoData, that are executed as Alerting.
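
The two conditions above can be sketched as a simple filter. The types below are simplified stand-ins written for this example, not the actual `state.State` definition, and `loadedFingerprints` is a hypothetical name.

```go
package main

import "fmt"

// EvalState and State are simplified, hypothetical stand-ins for the
// alerting state types; only the fields needed for the example exist.
type EvalState string

const (
	Normal   EvalState = "Normal"
	Pending  EvalState = "Pending"
	Alerting EvalState = "Alerting"
)

type State struct {
	State             EvalState
	StateReason       string
	ResultFingerprint uint64
}

// loadedFingerprints returns the fingerprints of states that count as
// "loaded": Pending or Alerting, with an empty state reason.
func loadedFingerprints(states []State) map[uint64]struct{} {
	loaded := make(map[uint64]struct{})
	for _, s := range states {
		if s.State != Pending && s.State != Alerting {
			continue
		}
		if s.StateReason != "" {
			// e.g. Error or NoData executed as Alerting
			continue
		}
		loaded[s.ResultFingerprint] = struct{}{}
	}
	return loaded
}

func main() {
	states := []State{
		{State: Alerting, ResultFingerprint: 1},
		{State: Pending, ResultFingerprint: 2},
		{State: Normal, ResultFingerprint: 3},
		{State: Alerting, StateReason: "Error", ResultFingerprint: 4},
	}
	// Only fingerprints 1 and 2 satisfy both conditions.
	fmt.Println(len(loadedFingerprints(states)))
}
```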

Why do we need this feature?
To support hysteresis expressions in alerting. This helps users reduce flapping of alert rules when the metric value frequently crosses the threshold boundary.
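
As a rough illustration of why this reduces flapping, here is a minimal sketch of a "greater than" threshold with a separate recovery threshold. It is not the HysteresisCommand implementation: the names `isFiring` and `evaluate` and the thresholds 90 and 80 are made up for the example, and the "loaded" flag is carried over from the previous iteration instead of being read from the state manager.

```go
package main

import "fmt"

// isFiring applies the load threshold to series that were not firing
// and the (lower) unload threshold to series that already were.
func isFiring(value, load, unload float64, wasLoaded bool) bool {
	if wasLoaded {
		return value > unload
	}
	return value > load
}

// evaluate runs isFiring over consecutive samples, feeding each result
// back in as the "loaded" flag for the next sample.
func evaluate(values []float64, load, unload float64) []bool {
	firing := make([]bool, len(values))
	loaded := false
	for i, v := range values {
		loaded = isFiring(v, load, unload, loaded)
		firing[i] = loaded
	}
	return firing
}

func main() {
	values := []float64{85, 95, 88, 92, 78, 85}

	// Plain threshold at 90: the dip to 88 resolves the alert, then 92 fires again.
	fmt.Println(evaluate(values, 90, 90)) // [false true false true false false]

	// Fire above 90, recover only at or below 80: the alert stays firing through the dip.
	fmt.Println(evaluate(values, 90, 80)) // [false true true true false false]
}
```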

Who is this feature for?
Alerting

Which issue(s) does this PR fix?:

Related #6202

Special notes for your reviewer:

The PR can be reviewed commit by commit. In some commits I explain the chosen approach.
The majority of the changes are tests. The core backend changes are in just two commits, da01d9a and ef96be8.
The feature is behind the flag recoveryThreshold. If it is disabled, the expr engine does not create the HysteresisCommand struct and the code in this PR does nothing.

UI screen recording

hysteresis_UI.mp4

When viewing the rule

Please check that:

It works as expected from a user's perspective.
If this is a pre-GA feature, it is behind a feature toggle.
The docs are updated, and if this is a notable improvement, it's added to our What's New doc.

ephemeral-instances-bot · 2023-12-06T17:30:19Z

Your instance can be accessed at: https://ephemeral1511182175189yuritce.grafana-dev.net
The instance is not using the CDN assets.
How to access / How to update instance config / How to build a specific branch

soniaAguilarPeiron · 2023-12-07T12:39:30Z

Last commit: skip validation of the hysteresis values when the hysteresis checkbox is unchecked.

fix-hysteresis-validation.mp4

soniaAguilarPeiron · 2023-12-07T12:39:43Z

/deploy-to-hg

soniaAguilarPeiron · 2023-12-14T14:43:10Z

/deploy-to-hg

soniaAguilarPeiron · 2023-12-15T09:02:44Z

My last commit fixes the styling of the hysteresis fields in dashboard panel expressions.

Before the change:

After the change:

soniaAguilarPeiron · 2023-12-18T14:58:40Z

After a conversation with @yuri-tceretian, we agreed to hide hysteresis in panel expressions.

JacobsonMT · 2024-01-04T15:35:28Z

/deploy-to-hg

ephemeral-instances-bot · 2024-01-04T15:36:31Z

Preparing your instance. A comment containing your instance's url will be added to this PR when the instance is ready.
Your instance will be ready in ~10 minutes.
Check the GitHub actions tab to follow the workflow progress
Slack channel: #proj-ephemeral-hg-instances
Building instance with yuri-tceretian/expr-hysteresis-alerting oss branch and main enterprise branch. How to choose a branch

ephemeral-instances-bot · 2024-01-04T15:47:00Z

Your instance can be accessed at: https://ephemeral1511182175189yuritce.grafana-dev.net
The instance is not using the CDN assets.
How to access / How to update instance config / How to build a specific branch

JacobsonMT

Great job! Confirmed working as expected. There are parts of the implementation I'm not 100% sure about, but since this is behind a feature flag, it's not necessary to dwell on those details at this stage. 🚀

Backend:
* Update the Grafana Alerting engine to provide feedback to HysteresisCommand. The feedback information is stored in state.Manager as a fingerprint of each state. The fingerprint is persisted to the database. Only fingerprints that belong to Pending and Alerting states are considered as "loaded" and provided back to the command.
  - add ResultFingerprint to state.State. It's different from other fingerprints we store in the state because it is calculated from the result labels.
  - add rule_fingerprint column to alert_instance
  - update alerting evaluator to accept AlertingResultsReader via context, and update scheduler to provide it.
  - add AlertingResultsFromRuleState that implements the new interface in eval package
  - update getExprRequest to patch the hysteresis command.
* Only one "Recovery Threshold" query is allowed to be used in the alert rule and it must be the Condition.

Frontend:
* Add hysteresis option to Threshold in UI. It's called "Recovery Threshold"
* Add test for getUnloadEvaluatorTypeFromCondition
* Hide hysteresis in panel expressions
* Refactor isInvalid and add test for it
* Remove unnecessary React.memo
* Add tests for updateEvaluatorConditions

---------

Co-authored-by: Sonia Aguilar <soniaaguilarpeiron@gmail.com>

yuri-tceretian self-assigned this Sep 20, 2023

grafana-delivery-bot bot added this to the 10.2.x milestone Sep 20, 2023

grafana-pr-automation bot added area/frontend type/docs area/backend/db/migration area/backend labels Sep 20, 2023

yuri-tceretian force-pushed the yuri-tceretian/expr-hysteresis-alerting branch from d64d02a to 5e1b397 Compare September 20, 2023 19:52

yuri-tceretian force-pushed the yuri-tceretian/expr-hysteresis-alerting branch from 72aa808 to 24d23fd Compare October 10, 2023 14:56

yuri-tceretian marked this pull request as ready for review October 11, 2023 09:30

yuri-tceretian requested review from a team as code owners October 11, 2023 09:30

yuri-tceretian requested a review from a team October 11, 2023 09:30

yuri-tceretian requested a review from a team as a code owner October 11, 2023 09:30

yuri-tceretian requested review from gillesdemey, VikaCep, konrad147, soniaAguilarPeiron, rwwiv, JacobsonMT, grobinson-grafana, papagian, zserge and yangkb09 and removed request for a team October 11, 2023 09:30

yuri-tceretian added the add to changelog, no-backport, and add to what's new labels Oct 11, 2023

soniaAguilarPeiron reviewed Oct 16, 2023

yuri-tceretian mentioned this pull request Dec 6, 2023

SSE: Add utility methods for HysteresisCommand #79157

Merged

3 tasks

Skip validation when unchecking hysteresis checkbox

e009d71

grafana deleted a comment from ephemeral-instances-bot bot Dec 7, 2023

Merge branch 'up/main' into yuri-tceretian/expr-hysteresis-alerting

3d15919

# Conflicts: # pkg/expr/threshold_test.go

grafana deleted a comment from ephemeral-instances-bot bot Dec 14, 2023

Fix hysteresis fields style in dashboard panel

3cb7c60

Hide hysteresis in panel expressions

8c78706

grafana deleted a comment from ephemeral-instances-bot bot Jan 3, 2024

yuri-tceretian added 2 commits January 3, 2024 15:20

Merge branch 'up/main' into yuri-tceretian/expr-hysteresis-alerting

c1f9ddc

# Conflicts: # pkg/tests/api/alerting/testing.go

use new sendRequest function

52370ca

grafana deleted a comment from ephemeral-instances-bot bot Jan 4, 2024

JacobsonMT approved these changes Jan 4, 2024

yuri-tceretian merged commit f6a4674 into main Jan 4, 2024
15 checks passed

yuri-tceretian deleted the yuri-tceretian/expr-hysteresis-alerting branch January 4, 2024 16:47

gillesdemey mentioned this pull request Jan 5, 2024

feature request: hysteresis for alerts #6202

Closed

yuri-tceretian mentioned this pull request Jan 5, 2024

Alerting: Enable recovery threshold feature by default #80088

Merged

3 tasks

summerwollin modified the milestones: 10.3.x, 10.3.0 Jan 22, 2024
