-
-
Notifications
You must be signed in to change notification settings - Fork 4.5k
fix(aci): support notifications for non-dual written metric detectors #103938
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,13 +4,13 @@ | |
| from typing import NotRequired, TypedDict | ||
| from urllib import parse | ||
|
|
||
| import sentry_sdk | ||
| from django.db.models import Max | ||
| from django.urls import reverse | ||
| from django.utils.translation import gettext as _ | ||
|
|
||
| from sentry import features | ||
| from sentry.constants import CRASH_RATE_ALERT_AGGREGATE_ALIAS | ||
| from sentry.incidents.endpoints.serializers.utils import get_fake_id_from_object_id | ||
| from sentry.incidents.logic import GetMetricIssueAggregatesParams, get_metric_issue_aggregates | ||
| from sentry.incidents.models.alert_rule import AlertRule, AlertRuleThresholdType | ||
| from sentry.incidents.models.incident import ( | ||
|
|
@@ -239,7 +239,8 @@ def incident_attachment_info( | |
| if alert_rule_id is None: | ||
| raise ValueError("Alert rule id not found when querying for AlertRuleDetector") | ||
| except AlertRuleDetector.DoesNotExist: | ||
| raise ValueError("Alert rule detector not found when querying for AlertRuleDetector") | ||
| # the corresponding metric detector was not dual written | ||
| alert_rule_id = get_fake_id_from_object_id(alert_context.action_identifier_id) | ||
|
|
||
| workflow_engine_params = title_link_params.copy() | ||
|
|
||
|
|
@@ -248,8 +249,11 @@ def incident_attachment_info( | |
| group_open_period_id=metric_issue_context.open_period_identifier | ||
| ) | ||
| workflow_engine_params["alert"] = str(open_period_incident.incident_identifier) | ||
| except IncidentGroupOpenPeriod.DoesNotExist as e: | ||
| sentry_sdk.capture_exception(e) | ||
| except IncidentGroupOpenPeriod.DoesNotExist: | ||
| # the corresponding metric detector was not dual written | ||
| workflow_engine_params["alert"] = str( | ||
| get_fake_id_from_object_id(metric_issue_context.open_period_identifier) | ||
| ) | ||
|
Comment on lines
251
to
+256
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Bug: API endpoints fail with 404 for metric alert links containing fake IDs due to missing conversion logic. 🔍 Detailed AnalysisFake IDs generated for metric alerts are embedded in URLs and passed to API endpoints. The 💡 Suggested FixImplement logic in 🤖 Prompt for AI AgentDid we get this right? 👍 / 👎 to inform future reviews.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
|
|
||
| title_link = build_title_link(alert_rule_id, organization, workflow_engine_params) | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,167 @@ | ||
| import uuid | ||
|
|
||
| import pytest | ||
|
|
||
| from sentry.incidents.endpoints.serializers.utils import get_fake_id_from_object_id | ||
| from sentry.incidents.grouptype import MetricIssue | ||
| from sentry.incidents.logic import CRITICAL_TRIGGER_LABEL | ||
| from sentry.incidents.models.alert_rule import AlertRuleDetectionType, AlertRuleThresholdType | ||
| from sentry.incidents.models.incident import IncidentStatus | ||
| from sentry.incidents.typings.metric_detector import AlertContext, MetricIssueContext | ||
| from sentry.integrations.metric_alerts import incident_attachment_info | ||
| from sentry.models.groupopenperiod import GroupOpenPeriod | ||
| from sentry.testutils.cases import BaseIncidentsTest, TestCase | ||
|
|
||
| pytestmark = pytest.mark.sentry_metrics | ||
|
|
||
|
|
||
| def incident_attachment_info_with_metric_value(incident, new_status, metric_value): | ||
| return incident_attachment_info( | ||
| organization=incident.organization, | ||
| alert_context=AlertContext.from_alert_rule_incident(incident.alert_rule), | ||
| metric_issue_context=MetricIssueContext.from_legacy_models( | ||
| incident, new_status, metric_value | ||
| ), | ||
| ) | ||
|
|
||
|
|
||
| class IncidentAttachmentInfoTest(TestCase, BaseIncidentsTest): | ||
| def setUp(self) -> None: | ||
| super().setUp() | ||
| self.group = self.create_group(type=MetricIssue.type_id) | ||
|
|
||
| def test_returns_correct_info_with_workflow_engine_dual_write(self) -> None: | ||
| """ | ||
| This tests that we look up the correct incident and alert rule during dual write ACI migration. | ||
| """ | ||
| alert_rule = self.create_alert_rule() | ||
| date_started = self.now | ||
| incident = self.create_incident( | ||
| self.organization, | ||
| title="Incident #1", | ||
| projects=[self.project], | ||
| alert_rule=alert_rule, | ||
| status=IncidentStatus.CLOSED.value, | ||
| date_started=date_started, | ||
| ) | ||
| trigger = self.create_alert_rule_trigger(alert_rule, CRITICAL_TRIGGER_LABEL, 100) | ||
| self.create_alert_rule_trigger_action( | ||
| alert_rule_trigger=trigger, triggered_for_incident=incident | ||
| ) | ||
| metric_value = 123 | ||
| referrer = "metric_alert_custom" | ||
| notification_uuid = str(uuid.uuid4()) | ||
|
|
||
| detector = self.create_detector(project=self.project) | ||
| self.create_alert_rule_detector(alert_rule_id=alert_rule.id, detector=detector) | ||
|
|
||
| open_period = ( | ||
| GroupOpenPeriod.objects.filter(group=self.group).order_by("-date_started").first() | ||
| ) | ||
| assert open_period is not None | ||
| self.create_incident_group_open_period(incident, open_period) | ||
|
|
||
| metric_issue_context = MetricIssueContext.from_legacy_models( | ||
| incident, IncidentStatus.CLOSED, metric_value | ||
| ) | ||
| # Setting the open period identifier to the open period id, since we are testing the lookup | ||
| metric_issue_context.open_period_identifier = open_period.id | ||
| metric_issue_context.group = self.group | ||
| assert metric_issue_context.group is not None | ||
|
|
||
| data = incident_attachment_info( | ||
| organization=incident.organization, | ||
| alert_context=AlertContext( | ||
| name=alert_rule.name, | ||
| # Setting the action identifier id to the detector id since that's what the NOA does | ||
| action_identifier_id=detector.id, | ||
| threshold_type=AlertRuleThresholdType(alert_rule.threshold_type), | ||
| detection_type=AlertRuleDetectionType(alert_rule.detection_type), | ||
| comparison_delta=alert_rule.comparison_delta, | ||
| sensitivity=alert_rule.sensitivity, | ||
| resolve_threshold=alert_rule.resolve_threshold, | ||
| alert_threshold=None, | ||
| ), | ||
| metric_issue_context=metric_issue_context, | ||
| notification_uuid=notification_uuid, | ||
| referrer=referrer, | ||
| ) | ||
|
|
||
| assert data["title"] == f"Resolved: {alert_rule.name}" | ||
| assert data["status"] == "Resolved" | ||
| assert data["text"] == "123 events in the last 10 minutes" | ||
| # We still build the link using the alert_rule_id and the incident identifier | ||
| assert ( | ||
| data["title_link"] | ||
| == f"http://testserver/organizations/baz/alerts/rules/details/{alert_rule.id}/?alert={incident.identifier}&referrer={referrer}&detection_type=static¬ification_uuid={notification_uuid}" | ||
| ) | ||
| assert ( | ||
| data["logo_url"] | ||
| == "http://testserver/_static/{version}/sentry/images/sentry-email-avatar.png" | ||
| ) | ||
|
|
||
| def test_returns_correct_info_placeholder_incident(self) -> None: | ||
| """ | ||
| Test that we use the fake incident ID to build the title link if no IGOP entry exists (if the detector was not dual written). | ||
| """ | ||
| alert_rule = self.create_alert_rule() | ||
| date_started = self.now | ||
| incident = self.create_incident( | ||
| self.organization, | ||
| title="Incident #1", | ||
| projects=[self.project], | ||
| alert_rule=alert_rule, | ||
| status=IncidentStatus.CLOSED.value, | ||
| date_started=date_started, | ||
| ) | ||
| trigger = self.create_alert_rule_trigger(alert_rule, CRITICAL_TRIGGER_LABEL, 100) | ||
| self.create_alert_rule_trigger_action( | ||
| alert_rule_trigger=trigger, triggered_for_incident=incident | ||
| ) | ||
| metric_value = 123 | ||
| referrer = "metric_alert_custom" | ||
| notification_uuid = str(uuid.uuid4()) | ||
|
|
||
| detector = self.create_detector(project=self.project) | ||
|
|
||
| open_period = ( | ||
| GroupOpenPeriod.objects.filter(group=self.group).order_by("-date_started").first() | ||
| ) | ||
| assert open_period is not None | ||
|
|
||
| metric_issue_context = MetricIssueContext.from_legacy_models( | ||
| incident, IncidentStatus.CLOSED, metric_value | ||
| ) | ||
| # Setting the open period identifier to the open period id, since we are testing the lookup | ||
| metric_issue_context.open_period_identifier = open_period.id | ||
| metric_issue_context.group = self.group | ||
| assert metric_issue_context.group is not None | ||
|
|
||
| # XXX: for convenience, we populate the AlertContext with alert rule/incident information. In this test, | ||
| # we're just interested in how the method handles missing AlertRuleDetectors/IGOPs. | ||
| data = incident_attachment_info( | ||
| organization=incident.organization, | ||
| alert_context=AlertContext( | ||
| name=alert_rule.name, | ||
| # Setting the action identifier id to the detector id since that's what the NOA does | ||
| action_identifier_id=detector.id, | ||
| threshold_type=AlertRuleThresholdType(alert_rule.threshold_type), | ||
| detection_type=AlertRuleDetectionType(alert_rule.detection_type), | ||
| comparison_delta=alert_rule.comparison_delta, | ||
| sensitivity=alert_rule.sensitivity, | ||
| resolve_threshold=alert_rule.resolve_threshold, | ||
| alert_threshold=None, | ||
| ), | ||
| metric_issue_context=metric_issue_context, | ||
| notification_uuid=notification_uuid, | ||
| referrer=referrer, | ||
| ) | ||
|
|
||
| fake_alert_rule_id = get_fake_id_from_object_id(detector.id) | ||
| fake_incident_identifier = get_fake_id_from_object_id(open_period.id) | ||
|
|
||
| # Build the link using the fake alert_rule_id and the fake incident identifier | ||
| assert ( | ||
| data["title_link"] | ||
| == f"http://testserver/organizations/baz/alerts/rules/details/{fake_alert_rule_id}/?alert={fake_incident_identifier}&referrer={referrer}&detection_type=static¬ification_uuid={notification_uuid}" | ||
| ) | ||
|
Comment on lines
+164
to
+167
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Forgive me if I missed some context, but will users actually see these links? They won't work, right? Or will we be sending different links for single written detectors that work?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, they will see these links. Mia has worked on redirects, so they will link to the monitor. |
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug:
AlertRuleDetector.objects.values_list(...).get()returnsNonefor issue rule detectors, causingValueErrorinstead of graceful fallback.Severity: HIGH | Confidence: High
🔍 Detailed Analysis
The code in
metric_alerts.pyattempts to retrievealert_rule_idfromAlertRuleDetector. If anAlertRuleDetectorexists but hasrule_idset andalert_rule_idisNULL(indicating it's linked to an issue rule detector), the.get()call will returnNone. The subsequentif alert_rule_id is None:check then raises aValueError, preventing the intended graceful fallback to a fake ID. This occurs when a detector is misconfigured or partially migrated, leading to an unexpected crash instead of a graceful fallback.💡 Suggested Fix
Modify the code to handle
alert_rule_idbeingNonewhenrule_idis present, allowing the graceful fallback mechanism to activate.🤖 Prompt for AI Agent
Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID:
3334006There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue alerts don't go through this code. Silly bot.