feat(supergroups): lightweight RCA prototype#110191

Merged
yuvmen merged 4 commits into master from cvxluo/spvvxrzymyty on Mar 13, 2026

@cvxluo (Contributor) commented Mar 9, 2026

Summary

Adds a lightweight Explorer RCA that runs in parallel alongside the main explorer-based autofix, for debugging and evaluation purposes.

  • SeerExplorerClient gains max_iterations support — allows capping the number of agent iterations per run
  • trigger_lightweight_rca() — fires a low-intelligence, limited-iteration (3 max) Explorer RCA on an issue, gated by its own feature flag (projects:supergroup-lightweight-rca)
  • Wired into _trigger_autofix_task — lightweight RCA triggers only on the explorer code path (alongside trigger_autofix_explorer), so it only fires for Seer customers using explorer-based autofix. Failures are caught and logged without blocking the main flow.
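
The gating and failure isolation described in the bullets above can be sketched roughly as follows. This is a self-contained illustration, not the code in this PR: the dict-based `group`, the `enabled_flags` set, the `simulate_failure` key, and `trigger_autofix_task_sketch` are all hypothetical stand-ins. Only the flag name and the catch-and-log behavior come from the summary.

```python
import logging
from typing import Optional, Set

logger = logging.getLogger(__name__)

# Flag name taken from the PR summary.
LIGHTWEIGHT_FLAG = "projects:supergroup-lightweight-rca"


def trigger_lightweight_rca(group: dict, enabled_flags: Set[str]) -> Optional[str]:
    """Hypothetical stand-in: returns a run label, or None when gated off."""
    if LIGHTWEIGHT_FLAG not in enabled_flags:
        return None
    if group.get("simulate_failure"):
        # Simulates an error from the downstream service.
        raise RuntimeError("simulated Seer error")
    return "lightweight-rca-%s" % group["id"]


def trigger_autofix_task_sketch(group: dict, enabled_flags: Set[str]) -> str:
    """Hypothetical stand-in for the explorer code path."""
    # Stand-in for the main explorer-based autofix trigger.
    main_run = "autofix-%s" % group["id"]
    try:
        trigger_lightweight_rca(group, enabled_flags)
    except Exception:
        # Failures are logged and swallowed so the main flow is unaffected.
        logger.exception("lightweight RCA failed", extra={"group_id": group["id"]})
    return main_run
```

Catching a broad `Exception` is a deliberate choice in this sketch: the lightweight run is an evaluation aid, so no failure in it should surface to the caller.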

Why this is invisible to users

Lightweight runs use category_key="lightweight_rca", while the frontend and API query category_key="autofix" — so these runs never appear in the UI or API responses.
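
As a rough illustration of why the category filter hides these runs, consider the sketch below. The `Run` dataclass and `runs_for_category` helper are hypothetical and stand in for whatever query the frontend and API actually issue. Only the two `category_key` values come from this PR.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Run:
    # Hypothetical minimal shape of a stored Explorer run.
    run_id: int
    category_key: str


def runs_for_category(runs: List[Run], category_key: str) -> List[Run]:
    """Return only the runs stored under the requested category."""
    return [run for run in runs if run.category_key == category_key]


runs = [
    Run(1, "autofix"),
    Run(2, "lightweight_rca"),
    Run(3, "autofix"),
]

# A consumer that asks for "autofix" never sees run 2.
visible = runs_for_category(runs, "autofix")
```

The invisibility depends entirely on every consumer passing a category. A query with no category filter would not be covered by this mechanism.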

Feature flags

| Flag | Scope | Purpose |
| --- | --- | --- |
| projects:supergroup-lightweight-rca | Project | Gates lightweight RCA triggering |
| organizations:seer-explorer + organizations:autofix-on-explorer | Organization | Required for the explorer path where lightweight RCA lives |

Relates to ID-1385

cvxluo and others added 2 commits March 5, 2026 15:27
Add max_iterations field to ExplorerChatRequest TypedDict and SeerExplorerClient.

When set, it's passed through to the Seer API to limit agent iterations, enabling lightweight/fast Explorer runs.


Co-authored-by: Claude <noreply@anthropic.com>
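
One way an optional `max_iterations` field could be threaded through a TypedDict request is sketched below. The `query` field and the `build_request` helper are assumptions made for illustration. The real ExplorerChatRequest has fields that this page does not show.

```python
from typing import Optional, TypedDict


class ExplorerChatRequest(TypedDict, total=False):
    # `query` is a hypothetical placeholder for the request's other fields.
    query: str
    # Optional cap on agent iterations, included only when the caller sets it.
    max_iterations: int


def build_request(query: str, max_iterations: Optional[int] = None) -> ExplorerChatRequest:
    """Build the payload, adding max_iterations only when it is set."""
    payload: ExplorerChatRequest = {"query": query}
    if max_iterations is not None:
        payload["max_iterations"] = max_iterations
    return payload
```

Leaving the key out when it is unset, rather than sending a null value, keeps existing requests byte-for-byte unchanged.
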
…er RCA

Creates a function to trigger low-cost, fast Explorer RCA runs with max_iterations=3 and intelligence_level=low. Results are stored with category_key=lightweight_rca for later quality evaluation against full RCA.


Co-authored-by: Claude <noreply@anthropic.com>
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Mar 9, 2026
Call trigger_lightweight_rca() inside _trigger_autofix_task alongside
trigger_autofix_explorer(), so it only fires on the explorer code path
for Seer customers. Failures are caught and logged without blocking the
main autofix flow. Lightweight runs remain invisible to users via their
separate category_key="lightweight_rca".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@yuvmen yuvmen marked this pull request as ready for review March 11, 2026 22:23
@yuvmen yuvmen requested a review from a team as a code owner March 11, 2026 22:23
Contributor

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Lightweight RCA runs are meant to be invisible, but AutofixOnCompletionHook
sends external webhooks (seer.root_cause_completed) and triggers supergroup
embeddings on completion. Pass on_completion_hook=None instead to avoid
unintended side effects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
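
The effect of passing `on_completion_hook=None` can be illustrated with a small sketch. `complete_run`, `autofix_hook`, and the `sent_events` list are hypothetical. Only the `on_completion_hook` parameter name and the `seer.root_cause_completed` event name come from the commit message above.

```python
from typing import Callable, List, Optional, Tuple

# Records the side effects a hook would cause (stand-in for real webhooks).
sent_events: List[Tuple[str, int]] = []


def autofix_hook(run_id: int) -> None:
    """Hypothetical hook that emits an external event when a run finishes."""
    sent_events.append(("seer.root_cause_completed", run_id))


def complete_run(run_id: int, on_completion_hook: Optional[Callable[[int], None]] = None) -> None:
    """Finish a run and invoke the hook only if one was supplied."""
    if on_completion_hook is not None:
        on_completion_hook(run_id)


complete_run(1, on_completion_hook=autofix_hook)  # regular run: event is emitted
complete_run(2, on_completion_hook=None)          # lightweight run: no side effects
```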

linear-code bot commented Mar 12, 2026

    stopping_point=stopping_point,
)
try:
    trigger_lightweight_rca(group)
Contributor Author

will the lightweight runs be exposed to the user through the history? not sure if there's a way to trigger an explorer run and keep it hidden to the user

Member

Seems like today in Sentry, whenever we get runs we use the "autofix" category, so we should be OK. Technically, though, the Seer API could return them if asked without a category. I don't see a place in our product where we ever just expose all Explorer runs, though we might at some point. It might be safer to have some way to hide it in Seer as experimental or something so it doesn't get accidentally exposed, but it doesn't seem like a must right now.

Contributor Author

hm i thought if you open explorer (cmd /) and then do /resume you could see all your runs. not working for me though. agreed that i'm not too concerned about it right now, but might be worth asking the explorer team

Member

oh interesting, I was not aware. I'll ask them if there are surfaces we are unaware of

Member

@yuvmen, Mar 12, 2026

We should be fine, it seems: /resume filters on user_id, which autofix runs (and our new ones) don't set, so they don't show up. I asked them about it and nothing else came up, so I think we're good.
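
The /resume behavior described in this comment can be pictured with the sketch below. The actual /resume implementation is not shown in this PR, so `ExplorerRun` and `resumable_runs` are purely hypothetical. The only detail taken from the thread is that the listing filters on user_id and that autofix and lightweight runs do not set one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ExplorerRun:
    # Hypothetical minimal shape of a run record.
    run_id: int
    user_id: Optional[int]


def resumable_runs(runs: List[ExplorerRun], user_id: int) -> List[ExplorerRun]:
    """Return only runs owned by the given user; runs without a user never match."""
    return [run for run in runs if run.user_id == user_id]


all_runs = [
    ExplorerRun(1, user_id=42),    # interactive run started by user 42
    ExplorerRun(2, user_id=None),  # autofix or lightweight run, no user set
    ExplorerRun(3, user_id=7),     # interactive run started by another user
]

mine = resumable_runs(all_runs, user_id=42)
```
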

project=group.project,
user=None,
intelligence_level="low",
max_iterations=3,
Contributor Author

re: kush's comment in slack, i'm thinking about it and 3 iterations does seem quite low. maybe restricting the number of tools / making the tool call returns smaller will be a better optimization, but since we probably want to test this out and optimize from there, this isn't blocking

Member

From Kush's comment, this might be blocking. I am looking into it on the Seer side to understand; we might have something to change there before we merge this.


short_id = group.qualified_short_id or str(group.id)
title = group.title or "Unknown error"
culprit = group.culprit or "unknown"
Contributor Author

just curious, what is culprit on issues? do we pass this to normal explorer?

Member

yeah, it's the same as the regular one:

culprit=group.culprit or "unknown",

@yuvmen (Member) commented Mar 13, 2026

@kddubey now that #5257 is merged do you think we can move forward with this?

@yuvmen yuvmen requested a review from kddubey March 13, 2026 17:27
@yuvmen yuvmen merged commit 72961bf into master Mar 13, 2026
60 checks passed
@yuvmen yuvmen deleted the cvxluo/spvvxrzymyty branch March 13, 2026 19:51
Labels

Scope: Backend Automatically applied to PRs that change backend components

3 participants