ref(seer): Refactor night shift into modules and use search backend#112635

Merged
chromy merged 8 commits into master from trevor-e/ref/night-shift-llm-proxy
Apr 10, 2026

Conversation

Member

@trevor-e trevor-e commented Apr 9, 2026

Refactors night shift from a single file into a package with separate modules for cron scheduling, simple triage, agentic triage, and shared models.

Switches fixability_score_strategy from a direct ORM query to search.backend.query() with the recommended sort. This applies the same ranking algorithm used in the issues list UI (recency, spike detection, severity, user impact, event volume) as a pre-filter; the results are then re-ranked by fixability score in memory.
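
The two-stage selection described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `Candidate`, `fetch_recommended`, and `rank_by_fixability` are hypothetical names, the in-memory list stands in for whatever `search.backend.query()` returns, and dropping unscored issues is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    # Hypothetical stand-in for an issue returned by the search backend.
    group_id: int
    recommended_rank: int  # position in the recommended sort (0 = best)
    fixability_score: Optional[float]  # None if no score is available


def fetch_recommended(pool: List[Candidate], limit: int) -> List[Candidate]:
    """Stage 1 (pre-filter): keep the top `limit` issues by recommended sort."""
    return sorted(pool, key=lambda c: c.recommended_rank)[:limit]


def rank_by_fixability(candidates: List[Candidate], max_results: int) -> List[Candidate]:
    """Stage 2 (re-rank): order the pre-filtered issues by fixability score, highest first."""
    # Assumption: issues without a score are dropped rather than ranked last.
    scored = [c for c in candidates if c.fixability_score is not None]
    return sorted(scored, key=lambda c: c.fixability_score, reverse=True)[:max_results]


pool = [
    Candidate(1, recommended_rank=0, fixability_score=0.2),
    Candidate(2, recommended_rank=1, fixability_score=0.9),
    Candidate(3, recommended_rank=2, fixability_score=None),
    Candidate(4, recommended_rank=3, fixability_score=0.99),
]
top = rank_by_fixability(fetch_recommended(pool, limit=3), max_results=2)
print([c.group_id for c in top])  # [2, 1]
```

Note that issue 4 has the highest fixability score but never reaches the re-rank stage, because the pre-filter cut it off first. That is the trade-off of ranking in memory over a bounded candidate set.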

Other changes:

  • Adds reset_snuba_data flag to SnubaTestCase so tests can opt out of dropping ClickHouse data between runs
  • Adds bin/seer/trigger-night-shift script for local testing
  • Registers a new Snuba referrer for the search query
  • Uses pydantic model_validate_json for LLM response parsing in agentic triage
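
As a rough illustration of the first item, an opt-out flag of this kind is typically a class attribute that a setup hook consults. The sketch below uses plain classes rather than pytest fixtures; `SnubaLikeTestCase`, `KeepsDataTest`, and the `store` list are hypothetical stand-ins and not Sentry's actual `SnubaTestCase`.

```python
from typing import List


class SnubaLikeTestCase:
    # Hypothetical simplification of a test base class.
    # Subclasses set this to False to keep existing data between runs.
    reset_snuba_data = True

    def __init__(self, store: List[str]) -> None:
        self.store = store

    def initialize(self) -> None:
        # Only wipe the backing store when the class has not opted out.
        if self.reset_snuba_data:
            self.store.clear()


class KeepsDataTest(SnubaLikeTestCase):
    reset_snuba_data = False


default_store = ["event-a", "event-b"]
SnubaLikeTestCase(default_store).initialize()

kept_store = ["event-a", "event-b"]
KeepsDataTest(kept_store).initialize()

print(default_store, kept_store)  # [] ['event-a', 'event-b']
```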

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Apr 9, 2026
Comment thread src/sentry/tasks/seer/night_shift/agentic_triage.py
  @pytest.fixture(autouse=True)
- def initialize(self, reset_snuba, call_snuba):
+ def initialize(self, request, call_snuba):
      if self.reset_snuba_data:
Member Author

This was driving me insane. Running tests should not reset your entire devbox's data.

Contributor

🥲 🥲 🥲

Comment thread src/sentry/tasks/seer/night_shift/agentic_triage.py
Comment thread src/sentry/tasks/seer/night_shift/agentic_triage.py
@trevor-e trevor-e marked this pull request as ready for review April 10, 2026 02:47
@trevor-e trevor-e requested review from a team as code owners April 10, 2026 02:47
Comment on lines +59 to +63
search_filters=[
    SearchFilter(SearchKey("status"), "=", SearchValue([GroupStatus.UNRESOLVED])),
    SearchFilter(SearchKey("issue.seer_last_run"), "=", SearchValue("")),
],
referrer=Referrer.SEER_NIGHT_SHIFT_FIXABILITY_SCORE_STRATEGY.value,
Contributor

Bug: The new search filter for issue.seer_last_run only checks one of two timestamp fields, allowing previously processed issues to be re-triaged, unlike the old logic which checked both.
Severity: HIGH

Suggested Fix

Modify the search query to ensure it checks both seer_autofix_last_triggered and seer_explorer_autofix_last_triggered for null values, restoring the original behavior. This will prevent issues processed by either Seer pathway from being re-selected for triage.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/sentry/tasks/seer/night_shift/simple_triage.py#L59-L63

Potential issue: The refactored `fixability_score_strategy` function changes the logic
for filtering previously processed Seer issues. The new `SearchFilter` for
`issue.seer_last_run` resolves to checking only one of two timestamp fields
(`seer_autofix_last_triggered` or `seer_explorer_autofix_last_triggered`) based on a
feature flag. The previous implementation correctly checked for null values in both
fields. This regression means that an issue processed by one Seer pathway can be
incorrectly re-selected for triage by the other, leading to duplicated work and wasted
compute resources.

Member Author

This is fine, we can re-run an explorer client run even if they had a legacy run.

Contributor

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 7d9b2a5. Configure here.

limit=NIGHT_SHIFT_ISSUE_FETCH_LIMIT,
search_filters=[
    SearchFilter(SearchKey("status"), "=", SearchValue([GroupStatus.UNRESOLVED])),
    SearchFilter(SearchKey("issue.seer_last_run"), "=", SearchValue("")),
Contributor

Seer last run filter checks only one field

Medium Severity

The old ORM query filtered on both seer_autofix_last_triggered__isnull=True AND seer_explorer_autofix_last_triggered__isnull=True, ensuring groups triggered by either mechanism were excluded. The new issue.seer_last_run search filter maps to a ScalarCondition that only checks one of those fields based on the organizations:autofix-on-explorer feature flag. This means groups previously processed via the other code path could be re-selected as candidates, leading to duplicate processing.
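
The difference between the two filters can be illustrated with a small sketch. It is not the Sentry implementation: `GroupStub`, `old_filter`, and `new_filter` are hypothetical, and the mapping of the feature flag to a specific field (flag on means the explorer field is checked) is an assumption made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class GroupStub:
    # Hypothetical stand-in for the two timestamp columns named above.
    seer_autofix_last_triggered: Optional[datetime] = None
    seer_explorer_autofix_last_triggered: Optional[datetime] = None


def old_filter(group: GroupStub) -> bool:
    """Old behavior as described: eligible only if neither pathway has run."""
    return (
        group.seer_autofix_last_triggered is None
        and group.seer_explorer_autofix_last_triggered is None
    )


def new_filter(group: GroupStub, autofix_on_explorer: bool) -> bool:
    """New behavior as described: the flag selects which single field is checked."""
    # Assumption: flag on -> explorer field, flag off -> legacy field.
    if autofix_on_explorer:
        return group.seer_explorer_autofix_last_triggered is None
    return group.seer_autofix_last_triggered is None


legacy_only = GroupStub(seer_autofix_last_triggered=datetime(2026, 4, 1))
print(old_filter(legacy_only))                            # False
print(new_filter(legacy_only, autofix_on_explorer=True))  # True
```

Under these assumptions, a group with only a legacy run is excluded by the old filter but becomes a candidate again under the new one when the flag is on.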

Member Author

Same as above, this is fine and we can re-run an explorer client run even if they had a legacy run.

Contributor

@chromy chromy left a comment

lgtm

referrer="night_shift.triage",
prompt=_build_triage_prompt(candidates),
system_prompt="",
temperature=0.0,
Contributor

temperature=0.0 on purpose here?

@chromy chromy merged commit dca7d73 into master Apr 10, 2026
77 checks passed
@chromy chromy deleted the trevor-e/ref/night-shift-llm-proxy branch April 10, 2026 11:02