Skip to content

Conversation

@cathteng
Copy link
Member

Same as #103000

But adds in fetching the detector correctly for Activities. Detector is specifically needed to send activity notifications correctly for metric issue resolution

@cathteng cathteng requested review from a team as code owners November 10, 2025 21:54
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Nov 10, 2025
return Detector.objects.get(project_id=group.project_id, type=group.issue_type.slug)
except Detector.DoesNotExist:
# return issue stream detector
return Detector.objects.get(project_id=group.project_id, type=IssueStreamGroupType.slug)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Error handling: Logging noise and crashes

The get_detector_by_group function uses logger.exception() for expected control flow when DetectorGroup.DoesNotExist is raised. This logs full stack traces to error tracking for normal fallback behavior, polluting monitoring systems. Additionally, the final fallback Detector.objects.get(project_id=group.project_id, type=IssueStreamGroupType.slug) can raise an unhandled Detector.DoesNotExist exception if no issue stream detector exists for the project, causing crashes instead of graceful error handling.

Fix in Cursor Fix in Web

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should crash if there's no detector at the end

@codecov
Copy link

codecov bot commented Nov 10, 2025

❌ 1 Tests Failed:

Tests completed Failed Passed Skipped
29667 1 29666 243
View the top 1 failed test(s) by shortest run time
tests.sentry.workflow_engine.endpoints.test_organization_test_fire_action.TestFireActionsEndpointTest::test_pagerduty_action
Stack Traces | 4.21s run time
#x1B[1m#x1B[.../workflow_engine/endpoints/test_organization_test_fire_action.py#x1B[0m:103: in test_pagerduty_action
    assert pagerduty_data["payload"]["summary"].startswith(
#x1B[1m#x1B[31mE   AssertionError: assert False#x1B[0m
#x1B[1m#x1B[31mE    +  where False = <built-in method startswith of str object at 0x7f35fc45c150>('[Cool Gibbon]:')#x1B[0m
#x1B[1m#x1B[31mE    +    where <built-in method startswith of str object at 0x7f35fc45c150> = '[Golden Hare]: This is an example None exception'.startswith#x1B[0m

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

pass

try:
return Detector.objects.get(project_id=group.project_id, type=group.issue_type.slug)
Copy link
Member

@ceorourke ceorourke Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What other types of detectors (besides the issue stream type handled later) are there only 1 per project? Is it just the error detector type or are there others?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The performance detectors will probably be 1 per project(?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so there's an assumption that anything found for DetectorGroup is for metric issues, then it tries the other types and finally falls back to issue stream group types? Would an uptime or cron detector potentially hit a MultipleObjectsReturned? Maybe it'd be safer to pass a list of types

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think uptime and crons hit DetectorGroup too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated it so that this is only used for activity notifications (e.g. resolve for metric issues)

We only process updates if the detector_id is attached to the StatusChangeMessage

detector_id = status_change_message.get("detector_id")
if detector_id is None:
# We should not hit this case, it's should only occur if there is a bug
# passing it from the workflow_engine to the issue platform.
metrics.incr("workflow_engine.tasks.error.no_detector_id")
return

If detector_id exists, I believe it must have existed when the occurrence to create the issue was sent to issue platform, meaning we would have already written the DetectorGroup row

if is_new and occurrence.evidence_data and "detector_id" in occurrence.evidence_data:
associate_new_group_with_detector(group, occurrence.evidence_data["detector_id"])

@cathteng cathteng requested a review from saponifi3d November 10, 2025 23:16
@cathteng cathteng force-pushed the cathy/aci/remove-detector-action-trigger branch from c64e0f8 to 065ac2a Compare November 11, 2025 22:48
def trigger(self, event_data: WorkflowEventData) -> None:
from sentry.workflow_engine.processors.detector import get_detector_by_group

detector = get_detector_by_group(event_data.group)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we considered tracking how often a Detector is found here?
We know coverage isn't 100%, so while this is probably going to work for newer groups, maybe not older ones.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I should update this to also use either get_detector_by_group or get_detector_by_event depending on whether the event is a GroupEvent or Activity

We will only send the activity through workflow engine if a detector_id exists in issue platform

if detector_id is None:
# We should not hit this case, it's should only occur if there is a bug
# passing it from the workflow_engine to the issue platform.
metrics.incr("workflow_engine.tasks.error.no_detector_id")
return
process_workflow_activity.delay(
activity_id=activity.id,
group_id=group.id,
detector_id=detector_id,
)

I am assuming that if that's the case, then we would have already written the DetectorGroup row

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracking is good, yes

Copy link
Member

@kcons kcons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern here is that we have known incompleteness of DetectorGroup that we were investigating yesterday, so moving to rely on it exclusively for action triggering will be a bit of a regression.
For error it's nbd because there's an easy fallback, but for others it seems worth being confident on coverage of DetectorGroup before making it load-bearing.

)
raise Detector.DoesNotExist("Detector not found for event data")

raise TypeError(f"Cannot determine the detector from {type(event_data.event)}.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes more sense as an else above.

"group_id": event_data.group.id,
},
)
raise Detector.DoesNotExist("Detector not found for event data")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems slightly better to re-raise, as we've already sent an exception to sentry, and if we reraise and someone tries to send it again it should be dropped by dedup, vs raising a new one which will result in potential double-reporting.

try:
return DetectorGroup.objects.get(group=group).detector
except DetectorGroup.DoesNotExist:
logger.exception(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This'll probably block deploy. We know DetectorGroup coverage to be incomplete for metric and error issues.
I think it's okay to report that, just making sure we're aware.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, this will fail to fetch detectors for most metric issue groups.


try:
return Detector.objects.get(project_id=group.project_id, type=group.issue_type.slug)
except Detector.DoesNotExist:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind, this sort of logic wouldn't need to depend on our awareness of current group type conventions..
it'd be something like

if detector_type := group.issue_type.singleton_detector_type:
return Detector.objects.get(....)

that is, each type explicitly is configured to use a detectors or not. The scheme for detector association being knowable based on GroupType seems like a nice thing to have. "Which groups use the issue stream detector?" would be answerable with a grep.

Copy link
Contributor

@mifu67 mifu67 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you're already aware of this, but just calling out that we shouldn't merge this until the backfill is done, or metric actions will break.

try:
return DetectorGroup.objects.get(group=group).detector
except DetectorGroup.DoesNotExist:
logger.exception(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, this will fail to fetch detectors for most metric issue groups.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants