Return replay retry state from orchestration recovery#1728
Return replay retry state from orchestration recovery#1728juliusmarminge merged 3 commits intomainfrom
Conversation
- Distinguish replay progress from retry eligibility - Cover retry and no-op replay cases in tests
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Caller treats object return value as boolean
- Changed the truthiness check on completeReplayRecovery() to access .shouldReplay on the returned ReplayRecoveryCompletion object, so replay recovery is only triggered when shouldReplay is true.
Or push these changes by commenting:
@cursor push 4d69809910
Preview (4d69809910)
diff --git a/apps/web/src/routes/__root.tsx b/apps/web/src/routes/__root.tsx
--- a/apps/web/src/routes/__root.tsx
+++ b/apps/web/src/routes/__root.tsx
@@ -440,7 +440,7 @@
return;
}
- if (!disposed && recovery.completeReplayRecovery()) {
+ if (!disposed && recovery.completeReplayRecovery().shouldReplay) {
void recoverFromSequenceGap();
}
};You can send follow-ups to the cloud agent here.
ApprovabilityVerdict: Needs human review This PR introduces new runtime behavior for orchestration recovery: exponential backoff delays and capped retry attempts. While well-tested and authored by the module's owner, the changes affect when and how often the app retries during sequence gap recovery, warranting human review. You can customize Macroscope's approvability policy. Learn more. |
- wait briefly when replay recovery makes no progress - retry sequence-gap recovery after replay completion
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Unbounded retry loop when replay consistently makes no progress
- Reset highestObservedSequence to latestSequence when replay makes no progress, so stale observations no longer permanently satisfy the observedAhead condition and cause infinite retries.
Or push these changes by commenting:
@cursor push 89821febb3
Preview (89821febb3)
diff --git a/apps/web/src/orchestrationRecovery.test.ts b/apps/web/src/orchestrationRecovery.test.ts
--- a/apps/web/src/orchestrationRecovery.test.ts
+++ b/apps/web/src/orchestrationRecovery.test.ts
@@ -65,7 +65,7 @@
});
});
- it("retries replay when no progress was made but higher live sequences were observed", () => {
+ it("does not retry replay when no progress was made even if higher live sequences were previously observed", () => {
const coordinator = createOrchestrationRecoveryCoordinator();
coordinator.beginSnapshotRecovery("bootstrap");
@@ -75,11 +75,11 @@
expect(coordinator.completeReplayRecovery()).toEqual({
replayMadeProgress: false,
- shouldReplay: true,
+ shouldReplay: false,
});
expect(coordinator.getState()).toMatchObject({
latestSequence: 3,
- highestObservedSequence: 5,
+ highestObservedSequence: 3,
pendingReplay: false,
inFlight: null,
});
diff --git a/apps/web/src/orchestrationRecovery.ts b/apps/web/src/orchestrationRecovery.ts
--- a/apps/web/src/orchestrationRecovery.ts
+++ b/apps/web/src/orchestrationRecovery.ts
@@ -130,6 +130,9 @@
replayStartSequence !== null && state.latestSequence > replayStartSequence;
replayStartSequence = null;
state.inFlight = null;
+ if (!replayMadeProgress) {
+ state.highestObservedSequence = state.latestSequence;
+ }
const replayResolution = resolveReplayNeedAfterRecovery();
return {
replayMadeProgress,You can send follow-ups to the cloud agent here.
Reviewed by Cursor Bugbot for commit 5c18bad. Configure here.
- Add retry tracking for stalled orchestration replay - Reset retry budget when the replay frontier changes - Log when replay recovery stops after exhausting retries - Co-authored-by: codex <codex@users.noreply.github.com>
…-retry-state Merge upstream: Return replay retry state from orchestration recovery (pingdotgg#1728)


Summary
completeReplayRecovery()to return structured replay outcome data instead of a bare boolean.Testing
apps/web/src/orchestrationRecovery.test.tsto assert the new completion shape and retry cases.bun fmtbun lintbun typecheckbun run testNote
Medium Risk
Changes orchestration replay recovery control flow and retry behavior in the app root event router, which can impact client/server state synchronization during sequence gaps. Risk is mitigated by added unit tests but could still affect recovery edge cases (e.g., no-progress replays, disposal timing).
Overview
Replay recovery now returns structured completion data instead of a boolean.
completeReplayRecovery()returns{ replayMadeProgress, shouldReplay }, separating “did the replay advance the sequence” from “do we still need another replay.”Adds bounded, backoff-based retries for no-progress replays. New
deriveReplayRetryDecisiontracks consecutive no-progress attempts for the same replay frontier, retries with exponential delays (100ms base) up to a max (3), resets the budget when the frontier changes, and logs a warning when stopping early.Updates callers and tests.
EventRouteruses the new completion shape and retry decision (including clearing the tracker on replay failure), andorchestrationRecovery.test.tsis expanded to assert the new return type and retry/stop behavior.Reviewed by Cursor Bugbot for commit cb1880f. Bugbot is set up for automated code reviews on this repo. Configure here.
Note
Return structured replay retry state with exponential backoff from orchestration recovery
completeReplayRecoverynow returns aReplayRecoveryCompletionobject ({ replayMadeProgress, shouldReplay }) instead of a boolean, and no longer unconditionally clearspendingReplayon no-progress completion.deriveReplayRetryDecisionin orchestrationRecovery.ts to compute whether to retry a replay and with what delay: immediate retry on progress, capped exponential backoff (base 100ms, up to 3 attempts) when there is no progress on the same frontier, and budget reset when the frontier changes.completeReplayRecoverytriggered an immediate retry unconditionally; now retries are budgeted and backoff-gated when no progress is observed.Macroscope summarized cb1880f.