feat(seer): Poll search agent using UUID instead of numeric run ID#115307

Open
trevor-e wants to merge 32 commits into master from telkins/seer-run-outbox-frontend

@trevor-e
Member

@trevor-e trevor-e commented May 11, 2026

Summary

  • Updates the frontend polling hook (useAskSeerPolling) to use sentry_run_id (UUID) from the start response instead of the numeric run_id
  • Backward compatible: falls back to run_id when sentry_run_id is not present (feature flag off)

Test plan

  • TypeScript typecheck passes
  • ESLint passes
  • E2E tested locally: start response returns both IDs, frontend polls using UUID, results display correctly

trevor-e and others added 27 commits May 7, 2026 15:55
Register an outbox category and cell receiver to handle SeerRun
creation via the hybrid cloud outbox system.
…flag

Rename idempotency_key to external_idempotency_key in the SeerRun outbox
receiver to match the field name Seer's request models expect. Register
the organizations:seer-run-mirror feature flag for future write-site gating.
Wrap response.json() in the SeerRun outbox receiver in a JSONDecodeError guard. A 2xx response with a malformed body would otherwise raise uncaught and trap the outbox row in indefinite retry. Treat it as terminal, matching how 4xx is handled.

Also declare external_idempotency_key on AgentChatRequest and SearchAgentStartRequest and cast the receiver bodies to those TypedDicts so the call signatures type-check.
The previous fix imported JSONDecodeError via sentry.utils.json (which re-exports simplejson). urllib3 BaseHTTPResponse.json() raises stdlib json.JSONDecodeError, an unrelated class. The except clause never matched, so a 2xx with malformed body would still propagate uncaught and trap the outbox row in indefinite retry. Switch to stdlib json with a noqa for the S003 rule.
Match the immediate-neighbor convention (.get() + try/except DoesNotExist) for the missing-row check, and place handle_seer_run_create at the end of cell.py so newest receivers append rather than prepend.
Match against SeerRunType(run.type) instead of the raw str field, then call assert_never on the default branch. mypy now flags any new SeerRunType variant that does not have a case in handle_seer_run_create at compile time.
Behind the organizations:seer-run-mirror flag, the search-agent endpoint
now creates a SeerRun + CellOutbox row in a transaction. The receiver
fires on commit (flush=True), makes the HTTPS call to Seer with
run.uuid as external_idempotency_key, and fills in seer_run_state_id.
Synchronous flush preserves the existing endpoint contract: the
endpoint still returns run_id from the response.

The other write sites (start_run, autofix, PR review, replay) follow in
their own PRs.
Push the seer-run-mirror flag check inside send_search_agent_start_request
so the endpoint has a single call site that returns run_id directly.
Eliminates the parallel start_search_agent_via_outbox helper and the
flag-aware branching in the endpoint, and inlines the body construction
that previously lived in _build_search_agent_body.
…al handling

Wrap payload extraction and SeerRunType parsing in a single try/except so
malformed outbox rows mark the run FAILED instead of crashing the receiver
and stalling the queue. Extract a small _mark_seer_run_failed helper
shared by the three terminal-failure sites (invalid payload, 4xx response,
2xx with malformed JSON body).
The previous refactor that folded the flag dispatch into
send_search_agent_start_request dropped the search_agent.missing_run_id
error log along with its organization/project/response_data context.
Restore it inside send_search_agent_start_request before the SeerApiError
is raised.
Add a proper create_seer_run factory method to Factories/Fixtures so
SeerRun test instances use the standard test helper pattern. Remove
test_passes_idempotency_key which tested an implementation detail
(single-line dict merge) already covered by the happy-path tests.

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
… guard

Two review fixes in the SeerRun outbox receiver:
- PR_REVIEW raised NotImplementedError, which the outbox treats as
  transient and retries forever. Mark the run FAILED and return instead
  until PR_REVIEW dispatch is wired.
- The idempotency early-return used truthiness on seer_run_state_id,
  which would re-issue the Seer request for the (legal) value 0. Compare
  against None explicitly.
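
The second fix comes down to the difference between a truthiness test and an explicit `None` comparison. A minimal sketch, with hypothetical function names standing in for the receiver's early-return condition:

```python
from typing import Optional


def already_dispatched_buggy(seer_run_state_id: Optional[int]) -> bool:
    # Truthiness treats 0 the same as None, so a run whose state ID is 0
    # looks like it was never dispatched and the request is issued again.
    return bool(seer_run_state_id)


def already_dispatched(seer_run_state_id: Optional[int]) -> bool:
    # Only a missing ID means the Seer request has not been made yet.
    return seer_run_state_id is not None
```

With `seer_run_state_id = 0`, the first version returns `False` and the second returns `True`; both agree for `None` and for any non-zero ID.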
A structurally valid 2xx Seer response that lacks a run_id field won't self-heal on retry — same terminal class as the malformed-JSON case immediately above. Mark the run FAILED and return instead of raising RuntimeError, which the outbox treats as transient and would retry indefinitely.
dict(viewer_context or {}) coerced None into an empty dict, which the receiver would then forward to _resolve_viewer_context as a non-None ViewerContext with null fields — triggering JWT signing instead of being skipped. Preserve None when the caller passes None so the downstream skip path stays intact for future write sites.
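
A small sketch of the coercion this commit removes. The helper name `copy_viewer_context` is hypothetical; only the `dict(viewer_context or {})` expression comes from the commit message.

```python
from typing import Any, Mapping, Optional


def copy_viewer_context(
    viewer_context: Optional[Mapping[str, Any]],
) -> Optional[dict]:
    # Preserve None so a downstream "skip when None" check still fires.
    if viewer_context is None:
        return None
    # Copy so later mutation of the caller's mapping does not leak in.
    return dict(viewer_context)
```

By contrast, `dict(None or {})` evaluates to `{}`, which is not `None` and so would pass an `is not None` check further down the call chain.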
urllib3's BaseHTTPResponse.json() raises UnicodeDecodeError for non-UTF-8 bodies in addition to json.JSONDecodeError. Both are terminal: a non-UTF-8 binary body from a misbehaving proxy won't self-heal on retry. Catch both in the same except clause.
Remove the two terminal-case comments that just restated the function flow, and move _mark_seer_run_failed below handle_seer_run_create per the public-then-private convention.
Caller saw an opaque OutboxFlushError when the synchronous drain failed; the endpoint's existing SeerApiError handler is a better fit. Same translation pattern token_exchange/{manual_,}refresher.py uses for the same reason. The async outbox retry will heal the mirror state separately.
When the synchronous drain fails, the SeerRun row is already committed and the async outbox retry will eventually heal it. Surface the run uuid as a retry_token in the 500 response so future frontend logic can resume that same run instead of creating a duplicate via a fresh idempotency key. No client changes today; the field is forward-compatible.
If response.json() returns a list/scalar/null instead of an object, data.get('run_id') would raise AttributeError and stall the outbox shard on retries. Add an isinstance check inside the existing try so non-dict bodies route through the same invalid_json_body terminal path as undecodable bodies.
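
Taken together, the body-handling fixes in the receiver (malformed JSON, non-UTF-8 bytes, non-object bodies, missing `run_id`) could look roughly like the sketch below. It assumes the raw response bytes are available and uses a hypothetical `extract_run_id` helper; the `missing_run_id` reason string is also an assumption, while `invalid_json_body` is the name used in the commit message. Note that it catches the stdlib `json.JSONDecodeError`, per the earlier fix.

```python
import json
from typing import Optional, Tuple


def extract_run_id(raw_body: bytes) -> Tuple[Optional[int], Optional[str]]:
    """Return (run_id, None) on success, or (None, reason) for a terminal failure."""
    try:
        data = json.loads(raw_body.decode("utf-8"))
    except (json.JSONDecodeError, UnicodeDecodeError):
        # Neither a malformed nor a non-UTF-8 body will self-heal on retry.
        return None, "invalid_json_body"
    if not isinstance(data, dict):
        # Lists, scalars, and null have no .get(); treat them the same way.
        return None, "invalid_json_body"
    run_id = data.get("run_id")
    if run_id is None:
        # Structurally valid but incomplete: also terminal.
        return None, "missing_run_id"
    return run_id, None
```

In this sketch a caller would mark the run FAILED and return whenever the second element is not `None`, rather than raising and letting the outbox retry.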
The async outbox drain (flush=False) relied on a cron that runs every
minute, which is too slow for an interactive feature. Use flush=True so
the Seer call happens immediately on transaction commit. Catch
OutboxFlushError gracefully and still return the SeerRun UUID so the
frontend can poll.
Use sentry_run_id (UUID) from the start response for polling when
available, avoiding exposure of sequential Seer run IDs to the frontend.
@github-actions github-actions Bot added the Scope: Frontend label May 11, 2026
@github-actions
Contributor

github-actions Bot commented May 11, 2026

📊 Type Coverage Diff

✅ No new type safety issues introduced. Coverage: 93.51%

Base automatically changed from telkins/seer-run-outbox to master May 13, 2026 16:07
@github-actions github-actions Bot added the Scope: Backend label May 13, 2026
@github-actions
Contributor

🚨 Warning: This pull request contains Frontend and Backend changes!

Making changes to Sentry's Frontend and Backend in a single pull request is discouraged, because the two are not deployed atomically. If the changes are interdependent, they must be split into two pull requests and made forward or backward compatible, so that the Backend or Frontend can be deployed safely and independently.

Have questions? Please ask in the #discuss-dev-infra channel.

Rename effectiveRunId to newRunId and assert that at least one of
sentry_run_id or run_id is present, matching the prior contract where
run_id was always required.

Co-Authored-By: Claude <noreply@anthropic.com>
Avoids confusion with the model's autoincrement ID. Matches the field
name already used in the autofix and explorer chat responses.

Co-Authored-By: Claude <noreply@anthropic.com>
@trevor-e trevor-e marked this pull request as ready for review May 13, 2026 23:43
@trevor-e trevor-e requested review from a team as code owners May 13, 2026 23:43