fix: backfill rollout status fields from logs when polling completes by SunnySoldier357 · Pull Request #451 · eval-protocol/python-sdk

SunnySoldier357 · 2026-05-06T00:11:51Z

Summary

Fireworks Tracing Tests is red on main and on every PR — see run 25409077579. The failure is a regression introduced by #446:

ERROR:root:❌ Rollout failed (non-retryable error encountered): InternalError()
    assert row.rollout_status.message == "test error"
E   AssertionError: assert '' == 'test error'

The lightweight /status endpoint on the tracing gateway is a point-read on the Status Spanner table, which only stores RolloutId, AccountId, StatusCode. Message, Details, and Extras still live exclusively on the Logs table. After #446 dropped the /logs backfill (commit 20b0f23), the SDK was constructing Status(code=..., message="", details=[]) on every completed rollout and EvalProtocolError(message="") on every failure — which is what the propagate-status integration test catches.

This PR restores the two-phase polling shape from the original design of #446:

1. Poll /status for the status code (cheap point-read, runs every poll_interval).
2. On a terminal (non-RUNNING) code, do exactly one async_search_logs call to backfill message / details / extras from the matching log row.

That's still ~1000× cheaper on the Logs table than the pre-#446 polling loop, because the search runs once per rollout completion instead of every poll interval.
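
The two-phase shape described above can be sketched roughly as follows. This is an illustration only, not the code in RemoteRolloutProcessor: `poll_then_backfill`, `FakeGateway`, the local `Status` dataclass, and the dict layout of log rows are all hypothetical stand-ins, and the 999 RUNNING polls are an assumed number chosen to show where a ratio on the order of 1000× could come from.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable

RUNNING = "RUNNING"  # hypothetical stand-in for the SDK's non-terminal status code


@dataclass
class Status:
    # Simplified local stand-in for the SDK's Status type.
    code: str
    message: str = ""
    details: list = field(default_factory=list)


async def poll_then_backfill(
    get_status_code: Callable[[], Awaitable[str]],
    search_logs: Callable[[], Awaitable[list[dict[str, Any]]]],
    poll_interval: float = 0.0,
) -> Status:
    # Phase 1: cheap point-read on the status code, repeated every poll_interval.
    while True:
        code = await get_status_code()
        if code != RUNNING:
            break
        await asyncio.sleep(poll_interval)

    # Phase 2: exactly one logs search, only after a terminal code was seen.
    for row in await search_logs():
        if row.get("code") == code:
            return Status(code, row.get("message", ""), row.get("details", []))
    # No matching log row: fall back to the bare code from /status.
    return Status(code)


class FakeGateway:
    """In-memory fake that counts reads; not the real tracing gateway."""

    def __init__(self, running_polls: int) -> None:
        self.remaining = running_polls
        self.status_reads = 0
        self.log_searches = 0

    async def get_status_code(self) -> str:
        self.status_reads += 1
        if self.remaining > 0:
            self.remaining -= 1
            return RUNNING
        return "INTERNAL"

    async def search_logs(self) -> list[dict[str, Any]]:
        self.log_searches += 1
        return [
            {"code": RUNNING, "message": "checkpoint"},
            {"code": "INTERNAL", "message": "test error", "details": []},
        ]


gateway = FakeGateway(running_polls=999)  # assumed poll count, for illustration
status = asyncio.run(poll_then_backfill(gateway.get_status_code, gateway.search_logs))
print(status.code, repr(status.message), gateway.status_reads, gateway.log_searches)
# prints: INTERNAL 'test error' 1000 1
```

In this fake run the Status table is read 1000 times while the Logs table is searched once. A loop that searched logs on every poll would have issued 1000 searches for the same rollout.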

The cleanest long-term fix is on the gateway side: extend either the Status Spanner schema (and spanner_reader.get_status) to also persist these fields, or have the /status handler do an internal Logs read on terminal codes and inline them into the response. Either change would let the SDK go back to a single read per completion. Filing a follow-up for that.

Test plan

Fireworks Tracing Tests workflow goes green on this branch.
test_remote_rollout_and_fetch_fireworks (happy path) still passes.
test_remote_rollout_and_fetch_fireworks_propagate_status sees Code.INTERNAL with message == "test error".

Made with Cursor

Note

Medium Risk
Changes terminal-status handling in RemoteRolloutProcessor by adding a one-time logs lookup and merging returned extras, which affects error propagation and could introduce edge cases if log entries are missing or mismatched.

Overview
Restores two-phase rollout polling in RemoteRolloutProcessor: continue polling /status for the status code, then on a terminal code perform a single async_search_logs query to backfill Status.message, Status.details, and execution extras.

When backfilling, it selects the log entry whose embedded status code matches the terminal /status code (avoiding intermediate RUNNING checkpoints) and filters out noisy extras keys before merging into row.execution_metadata.extra.
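
A minimal sketch of the selection and filtering step described above, under stated assumptions: the log entries are modeled as plain dicts with `status` and `extras` keys, `select_terminal_log` and `merge_extras` are hypothetical helper names, and the contents of `NOISY_EXTRA_KEYS` are invented because the PR text does not list which keys are filtered. The sketch returns the first entry whose code matches; the actual tie-breaking rule is not stated in the PR.

```python
from typing import Any, Optional

# Hypothetical key names; the PR does not say which extras keys count as noisy.
NOISY_EXTRA_KEYS = frozenset({"logger_name", "level"})


def select_terminal_log(
    logs: list[dict[str, Any]], terminal_code: str
) -> Optional[dict[str, Any]]:
    """Return the first log entry whose embedded status code matches /status."""
    for entry in logs:
        embedded = entry.get("status") or {}
        if embedded.get("code") == terminal_code:
            return entry
    return None  # caller keeps the bare code when nothing matches


def merge_extras(
    existing: dict[str, Any],
    log_entry: dict[str, Any],
    noisy_keys: frozenset = NOISY_EXTRA_KEYS,
) -> dict[str, Any]:
    """Copy non-noisy extras from the log entry over the existing extras."""
    merged = dict(existing)
    for key, value in (log_entry.get("extras") or {}).items():
        if key not in noisy_keys:
            merged[key] = value
    return merged


logs = [
    # Intermediate checkpoint that must not be picked up.
    {"status": {"code": "RUNNING", "message": "step 3"}, "extras": {"step": 3}},
    # Terminal entry that matches the code returned by /status.
    {
        "status": {"code": "INTERNAL", "message": "test error", "details": []},
        "extras": {"attempt": 2, "level": "ERROR"},
    },
]

chosen = select_terminal_log(logs, "INTERNAL")
message = chosen["status"]["message"] if chosen else ""
extra = merge_extras({"run": "abc"}, chosen or {})
print(message, extra)
# prints: test error {'run': 'abc', 'attempt': 2}
```

Matching on the code rather than taking the first status-bearing row is what keeps a RUNNING checkpoint's message from being attached to a terminal status.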

Reviewed by Cursor Bugbot for commit 3298857. Bugbot is set up for automated code reviews on this repo.

The lightweight `/status` endpoint on the tracing gateway only returns the status code; `Message`, `Details`, and `Extras` still live on the Logs table. After PR #446 stopped reading from `/logs` on terminal status, the SDK was constructing `Status(code=..., message="", details=[])` for every completed rollout and `EvalProtocolError(message="")` for failures, which broke `tests/remote_server/test_remote_fireworks_propagate_status.py` (`assert row.rollout_status.message == "test error"`).

Restore the two-phase polling shape from the original PR: poll `/status` for the code, and on a terminal (non-RUNNING) code do one `async_search_logs` call to backfill `message`/`details`/`extras` from the matching log row.

This is still ~1000x cheaper on the Logs table than the pre-#446 polling loop because the search runs once per rollout completion instead of every poll interval.

Made-with: Cursor
Co-authored-by: Cursor <cursoragent@cursor.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1a4b13417d


Bugbot pointed out that the backfill loop could pick an earlier RUNNING/partial status log instead of the terminal one when a rollout emits multiple status-bearing logs. The reported `code` was always correct (it came from /status), but `message`/`details`/`extras` could be attached from the wrong row and the raised exception would carry misleading text.

Match the log row's status code to the terminal code returned by /status so the backfill is deterministic.

Made-with: Cursor

SunnySoldier357 self-assigned this May 6, 2026

SunnySoldier357 force-pushed the sandeep/ep-status-polling-fix branch from 1a4b134 to 30f662f Compare May 6, 2026 00:13

chatgpt-codex-connector Bot reviewed May 6, 2026

Comment thread eval_protocol/pytest/remote_rollout_processor.py

SunnySoldier357 requested a review from benjibc May 6, 2026 01:00

benjibc approved these changes May 7, 2026

SunnySoldier357 merged commit 251ed86 into main May 7, 2026
17 checks passed

SunnySoldier357 deleted the sandeep/ep-status-polling-fix branch May 7, 2026 19:14

SunnySoldier357 merged 2 commits into main from sandeep/ep-status-polling-fix
