fix(workflow-engine): prevent loop state from being persisted during rollback replay by NathanFlurry · Pull Request #4386 · rivet-dev/rivet

NathanFlurry · 2026-03-08T22:09:08Z

Description

Fixes a bug where loop entry state (iteration count and state value) was being incorrectly persisted during rollback replay. Previously, the loop's progress was unconditionally updated every historyPruneInterval iterations, even when the engine was replaying history during a rollback. This corrupted the saved loop progress by advancing the persisted iteration counter while merely replaying previously-completed iterations.

The fix separates two concerns: state persistence (which should only happen outside rollback mode) and history pruning (which is safe and desirable regardless of mode).

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

rollback.test.ts: Verifies that when a loop iteration fails and triggers a rollback, the persisted loop iteration count reflects only the progress made before the failure (not the replayed iterations).
sleep.test.ts: Verifies that after a sleep inside a loop, resuming the workflow does not re-execute previous loop iterations.

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
My changes generate no new warnings
I have added tests that prove my fix is effective

…rollback replay

railway-app · 2026-03-08T22:09:17Z

🚅 Deployed to the rivet-pr-4386 environment in rivet-frontend

Service	Status	Web	Updated (UTC)
mcp-hub	✅ Success (View Logs)	Web	Mar 8, 2026 at 10:10 pm
ladle	❌ Build Failed (View Logs)	Web	Mar 8, 2026 at 10:09 pm
website	🕒 Building (View Logs)	Web	Mar 8, 2026 at 10:09 pm
frontend-inspector	🕒 Building (View Logs)	Web	Mar 8, 2026 at 10:09 pm
frontend-cloud	🕒 Building (View Logs)	Web	Mar 8, 2026 at 10:09 pm

claude · 2026-03-08T22:11:32Z

Code Review

Summary

This PR fixes a real bug: during rollback replay, the loop's persisted iteration and state were being overwritten with replayed values every historyPruneInterval iterations, corrupting the checkpoint. The fix is well-targeted and the separation of concerns (state persistence vs. history pruning) is the right approach.

Core Fix (`context.ts`)

Correctness — looks good.

The original code bundled two unrelated concerns into the same historyPruneInterval block:

Updating the persisted state/iteration on the entry.
Collecting and flushing loop history deletions.

The fix correctly pulls (1) out into a !rollbackMode guard and leaves (2) to run unconditionally. This is exactly right: pruning stale history entries during replay is safe and desirable; updating the persisted iteration counter during replay is not.

Subtle behavior change worth noting.

Before the fix, entry.dirty = true (with updated state/iteration) was only set every historyPruneInterval iterations. After the fix, entry.dirty = true is set on every non-rollback iteration. In practice this doesn't change flush frequency (flushes still only happen at the prune interval via deferredFlush and at loop break), but it does mean the latest state/iteration will be included in any flush that does happen to run. This is strictly more correct — it removes a class of subtle state-loss between prune intervals. Still, worth being aware this is a secondary behavior change.

At loop break — already correct (pre-existing).

The entry.kind.data.iteration = iteration and entry.dirty = true inside the "break" branch (lines 773–778) run unconditionally, including in rollback mode. That looks intentional: when the loop completes (even during a rollback replay path), the final output and metadata need to be persisted.

Tests

rollback.test.ts — the core regression test.

The test is clear and targets the right behavior. One minor observation: it asserts expect(rollbacks).toContain("outside") but doesn't verify the loop-internal rollbacks (loop-0, loop-1). Adding those checks would make it easier for future readers to understand exactly what was (and wasn't) rolled back. Not a blocker, just a suggestion.

The type-narrowing guard pattern used here:

expect(loopEntry?.kind.type).toBe("loop");
if (loopEntry?.kind.type !== "loop") {
  throw new Error("Expected loop entry");
}

is a clean way to satisfy TypeScript without suppressing the assertion — good pattern.

sleep.test.ts — covers a related replay path.

Testing both "yield" and "live" modes is the right call. The yield path (two separate runWorkflow calls across a real timer wait) is the more realistic scenario and the test handles it well with the 30ms wait.

The live mode path is less obviously exercising the same code path (since sleep resolution happens inline), but it still validates the seen side-effect array stays at ["first"].

Minor Nits

The updated comment // Periodically defer the flush so the next iteration can overlap with loop pruning and any pending dirty state writes. is clearer than the original. No issues.
The rollback test reaches Loop.continue(2) in the second run (after shouldFail = false), but the workflow hits ctx.loop which has no Loop.break. The test only asserts on the rejection path, so the Loop.continue(2) branch is never reached — the test effectively just validates the failure + rollback case. This is intentional and fine.

Overall

The fix is minimal, correct, and well-tested. The two tests together cover both the rollback corruption case and the sleep-resume case. This is ready to merge.

fix(workflow-engine): prevent loop state from being persisted during …

7cabb81

…rollback replay

railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4386 March 8, 2026 22:09 Destroyed

NathanFlurry merged commit b8de8f6 into main Mar 8, 2026
13 of 18 checks passed

NathanFlurry deleted the queue-next-timeout branch March 8, 2026 22:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(workflow-engine): prevent loop state from being persisted during rollback replay#4386

fix(workflow-engine): prevent loop state from being persisted during rollback replay#4386
NathanFlurry merged 1 commit intomainfrom
queue-next-timeout

NathanFlurry commented Mar 8, 2026

Uh oh!

railway-app bot commented Mar 8, 2026 •

edited

Loading

Uh oh!

Uh oh!

claude bot commented Mar 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

NathanFlurry commented Mar 8, 2026

Description

Type of change

How Has This Been Tested?

Checklist:

Uh oh!

railway-app bot commented Mar 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

claude bot commented Mar 8, 2026

Code Review

Summary

Core Fix (context.ts)

Tests

Minor Nits

Overall

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

railway-app bot commented Mar 8, 2026 •

edited

Loading

Core Fix (`context.ts`)