
fix(rivetkit): fix run workflow teardown #4239

Closed

NathanFlurry wants to merge 1 commit into 02-19-feat_rivetkit_caninvoke from 02-19-fix_rivetkit_fix_run_workflow_teardown

@NathanFlurry (Member)

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

NathanFlurry mentioned this pull request Feb 19, 2026
railway-app bot commented Feb 19, 2026

🚅 Deployed to the rivet-pr-4239 environment in rivet-frontend

Service status (last updated, UTC):

  • frontend-cloud: ❌ Build Failed, Feb 19, 2026 at 10:49 pm
  • frontend-inspector: 🕒 Building, Feb 19, 2026 at 10:49 pm
  • mcp-hub: ✅ Success, Feb 19, 2026 at 10:49 pm
  • ladle: ❌ Build Failed, Feb 19, 2026 at 10:48 pm
  • website: 🕒 Building, Feb 19, 2026 at 10:48 pm

railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4239 on February 19, 2026 at 22:48 (Destroyed)

NathanFlurry commented Feb 19, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

graphite-app bot commented Feb 19, 2026

Merge activity

  • Feb 19, 10:48 PM UTC: NathanFlurry added this pull request to the Graphite merge queue.
  • Feb 19, 10:49 PM UTC: CI is running for this pull request on a draft pull request (#4240) due to your merge queue CI optimization settings.
  • Feb 19, 10:50 PM UTC: Merged by the Graphite merge queue via draft PR: #4240.

graphite-app bot pushed a commit that referenced this pull request Feb 19, 2026
graphite-app bot closed this Feb 19, 2026
graphite-app bot deleted the 02-19-fix_rivetkit_fix_run_workflow_teardown branch February 19, 2026 22:50

claude bot commented Feb 19, 2026

PR Review: fix(rivetkit): fix run workflow teardown

Note: This PR was closed rather than merged directly; its commit was merged by the Graphite merge queue via draft PR #4240. Review provided for completeness.

Summary

This PR fixes a real problem: when an actor with a workflow needed to sleep or stop, it always waited out the full runStopTimeout (2000ms in the test fixture), because the workflow's run function returned a never-resolving promise. The fix replaces that promise with await handle.result, which settles when the workflow engine finishes or is evicted.


What's Working Well

Core fix is correct. The old pattern of new Promise<never>(() => {}) forced the actor's #waitForRunHandler to always wait until runStopTimeout. The new await handle.result approach allows the run function to terminate naturally when handle.evict() is called via the abort signal.
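To make the timing difference concrete, here is a minimal TypeScript sketch, assuming the actor's stop path races the run handler against runStopTimeout. waitForRunHandler, makeHandle, oldRun, newRun and demo are hypothetical stand-ins written for this illustration, not rivetkit's actual internals.

```typescript
// Hypothetical model of the actor stop path; names are illustrative only.
type WorkflowHandle = { result: Promise<void>; evict: () => void };

/** Waits for `run` to settle, but no longer than `timeoutMs`; returns elapsed ms. */
async function waitForRunHandler(run: Promise<unknown>, timeoutMs: number): Promise<number> {
  const start = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<void>((resolve) => {
    timer = setTimeout(resolve, timeoutMs);
  });
  // A rejection (e.g. from eviction) also counts as the handler having exited.
  const settled = run.then(
    () => undefined,
    () => undefined,
  );
  await Promise.race([settled, timeout]);
  clearTimeout(timer);
  return Date.now() - start;
}

/** Stand-in workflow handle whose result rejects once evict() is called. */
function makeHandle(): WorkflowHandle {
  let evict!: () => void;
  const result = new Promise<void>((_resolve, reject) => {
    evict = () => reject(new Error("workflow evicted"));
  });
  return { result, evict };
}

// Old pattern: the run function never settles, so stop always hits the timeout.
function oldRun(): Promise<never> {
  return new Promise<never>(() => {});
}

// New pattern: the run function settles as soon as the handle is evicted.
async function newRun(handle: WorkflowHandle): Promise<void> {
  await handle.result;
}

async function demo(timeoutMs: number): Promise<{ oldMs: number; newMs: number }> {
  const oldMs = await waitForRunHandler(oldRun(), timeoutMs);

  const handle = makeHandle();
  const run = newRun(handle);
  setTimeout(() => handle.evict(), 10); // stop requested shortly after start
  const newMs = await waitForRunHandler(run, timeoutMs);

  return { oldMs, newMs };
}

demo(200).then(({ oldMs, newMs }) => console.log({ oldMs, newMs }));
```

Here 200ms stands in for runStopTimeout: oldMs comes out at roughly the full timeout, while newMs tracks the 10ms eviction delay. Treating a rejection as "exited" matters because, as the review notes further down, an evicted handle.result rejects rather than resolves.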

Abort signal race condition is handled correctly. The pre-check for abortSignal.aborted before adding the listener avoids a missed event if the signal is already aborted by the time workflow() runs. Good defensive code.
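The pre-check pattern can be sketched as follows, assuming handle.evict() behaves like a plain synchronous callback; bindAbortToEvict and naiveBind are hypothetical helper names. The sketch relies on standard AbortSignal behavior: the abort event is dispatched once, so a listener attached after abort() never fires.

```typescript
// Hypothetical helper: forward an AbortSignal to a workflow's evict callback.
function bindAbortToEvict(signal: AbortSignal, evict: () => void): void {
  if (signal.aborted) {
    // "abort" has already been dispatched; a listener added now would never run.
    evict();
    return;
  }
  signal.addEventListener("abort", () => evict(), { once: true });
}

// Naive version without the pre-check, for comparison.
function naiveBind(signal: AbortSignal, evict: () => void): void {
  signal.addEventListener("abort", () => evict(), { once: true });
}

// Signal aborted before binding (e.g. the actor stopped before workflow() ran).
const early = new AbortController();
early.abort();
let earlyEvicts = 0;
bindAbortToEvict(early.signal, () => earlyEvicts++);
let naiveEvicts = 0;
naiveBind(early.signal, () => naiveEvicts++);

// Signal aborted after binding: the listener handles it.
const late = new AbortController();
let lateEvicts = 0;
bindAbortToEvict(late.signal, () => lateEvicts++);
late.abort();

console.log(earlyEvicts, naiveEvicts, lateEvicts); // 1 0 1
```

naiveEvicts stays at 0 because the early signal was already aborted when its listener was attached, which is exactly the missed event the pre-check avoids.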

#stopCalled guards in instance/mod.ts are correct. They prevent false "unexpected exit" crashes when the run handler terminates as part of a normal stop.
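A minimal sketch of the guard idea, assuming the instance is notified whenever the run handler's promise settles. ActorInstance, onRunHandlerExit and crashes are hypothetical names; only the #stopCalled flag comes from the PR.

```typescript
// Hypothetical model of the stop guard; only #stopCalled is taken from the PR.
class ActorInstance {
  #stopCalled = false;
  /** Reasons recorded for run-handler exits treated as crashes. */
  readonly crashes: string[] = [];

  /** Begin a normal stop; any later run-handler exit is expected. */
  stop(): void {
    this.#stopCalled = true;
  }

  /** Called whenever the run handler's promise resolves or rejects. */
  onRunHandlerExit(error?: unknown): void {
    if (this.#stopCalled) {
      // Normal shutdown: the handler ended because stop() asked it to.
      return;
    }
    this.crashes.push(
      error instanceof Error ? error.message : "run handler exited unexpectedly",
    );
  }
}

const stopping = new ActorInstance();
stopping.stop();
stopping.onRunHandlerExit(new Error("workflow evicted"));
console.log(stopping.crashes.length); // 0

const crashing = new ActorInstance();
crashing.onRunHandlerExit();
console.log(crashing.crashes.length); // 1
```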

Indentation cleanup in the run function signature is a welcome fix.


Issues to Address

1. Noisy error logging on every normal stop (medium)

When an actor with a workflow sleeps or is destroyed, the sequence is:

  1. abortController.abort() fires
  2. handle.evict() is called
  3. handle.result rejects with an eviction error
  4. The catch block logs "workflow run failed" at log.error level
  5. The error is rethrown and eventually swallowed in the #stopCalled guard

This means every time a workflow actor goes to sleep, a spurious error line appears in the logs. The old waitUntil path caught this silently. Suggested fix in workflow/mod.ts:

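```typescript
} catch (error) {
    // An aborted signal means this rejection is the expected eviction from a
    // normal sleep/stop, so only genuine failures are logged.
    if (!runCtx.abortSignal.aborted) {
        runCtx.log.error({
            msg: "workflow run failed",
            error: stringifyError(error),
        });
    }
    throw error;
}
```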
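The whole sequence, including the suggested guard, can be modeled end to end. This is a hypothetical sketch: runWorkflow, makeHandle and the Logger type are invented for illustration, and stringifyError is a local stand-in for the helper the suggested fix calls.

```typescript
// Hypothetical end-to-end model of: abort -> evict -> result rejects
// -> catch -> (conditional) log -> rethrow.
type Logger = { error: (entry: { msg: string; error: string }) => void };
type WorkflowHandle = { result: Promise<void>; evict: () => void };

// Local stand-in for the stringifyError helper used in the suggested fix.
function stringifyError(error: unknown): string {
  return error instanceof Error ? error.message : String(error);
}

// Handle whose result rejects on evict(), or with `failWith` on the next tick.
function makeHandle(failWith?: Error): WorkflowHandle {
  let evict!: () => void;
  const result = new Promise<void>((_resolve, reject) => {
    evict = () => reject(new Error("workflow evicted"));
    if (failWith) setTimeout(() => reject(failWith), 0);
  });
  return { result, evict };
}

async function runWorkflow(
  abortSignal: AbortSignal,
  handle: WorkflowHandle,
  log: Logger,
): Promise<void> {
  if (abortSignal.aborted) handle.evict();
  else abortSignal.addEventListener("abort", () => handle.evict(), { once: true });

  try {
    await handle.result;
  } catch (error) {
    // Only log failures that were not caused by a requested stop.
    if (!abortSignal.aborted) {
      log.error({ msg: "workflow run failed", error: stringifyError(error) });
    }
    throw error;
  }
}

async function demo(): Promise<{ stopLogs: number; failureLogs: number }> {
  const logged: string[] = [];
  const log: Logger = { error: (entry) => logged.push(entry.error) };

  // Normal sleep/stop: abort evicts the workflow; nothing should be logged.
  const stopCtl = new AbortController();
  const stopped = runWorkflow(stopCtl.signal, makeHandle(), log);
  stopCtl.abort();
  await stopped.catch(() => {}); // the caller swallows the expected rejection
  const stopLogs = logged.length;

  // Genuine failure with no abort: still logged.
  const failed = runWorkflow(new AbortController().signal, makeHandle(new Error("boom")), log);
  await failed.catch(() => {});

  return { stopLogs, failureLogs: logged.length - stopLogs };
}

demo().then((counts) => console.log(counts));
```

In this model the normal stop produces no error log while the genuine failure produces one. In both cases the error is still rethrown, so the caller (in the PR, the #stopCalled guard) decides what to do with it.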

2. Missing comment on sleepTimeout: 75 in test fixture (minor)

The workflowStopTeardownActor uses sleepTimeout: 75 and runStopTimeout: 2_000. The contrast is intentional (proving the fix works), but a short comment would help future readers understand why these specific values were chosen.
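One way to carry that explanation is a comment next to each value. The sketch below assumes the fixture options can be written as a plain object; workflowStopTeardownOptions as a standalone constant and FIRST_SLEEP_BOUND_MS are hypothetical, and the real fixture may be structured differently.

```typescript
// Hypothetical, commented version of the fixture values.
const workflowStopTeardownOptions = {
  // Very short idle window: the actor tries to sleep about 75ms after each
  // wake, so wake/sleep cycles happen quickly during the test.
  sleepTimeout: 75,
  // Deliberately far above the test's 1000ms bound: before the fix, every
  // sleep stalled for the full runStopTimeout, so slow teardown is obvious.
  runStopTimeout: 2_000,
} as const;

// The test's bound on time-to-first-sleep sits between the two values.
const FIRST_SLEEP_BOUND_MS = 1_000;
```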

3. sleepAts assertion could be tighter (minor)

The assertions wakeAts.length >= 2 and sleepAts.length >= 1 are consistent, but sleepAts.length >= wakeAts.length - 1 would more directly express the expected pairing relationship.
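A sketch of the tighter check, assuming wakeAts and sleepAts are arrays of timestamps recorded by the test; assertWakeSleepPairing is a hypothetical helper name.

```typescript
// Hypothetical helper expressing the wake/sleep pairing directly.
function assertWakeSleepPairing(wakeAts: number[], sleepAts: number[]): void {
  if (wakeAts.length < 2) {
    throw new Error(`expected at least 2 wakes, got ${wakeAts.length}`);
  }
  // Every wake except possibly the most recent one should be followed by a sleep.
  if (sleepAts.length < wakeAts.length - 1) {
    throw new Error(
      `expected at least ${wakeAts.length - 1} sleeps for ${wakeAts.length} wakes, got ${sleepAts.length}`,
    );
  }
}

const wakeAts = [0, 100, 200];
const sleepAts = [50];
// The original pair of assertions accepts this, although three wakes imply at least two sleeps.
console.log(wakeAts.length >= 2 && sleepAts.length >= 1); // true
```

Calling assertWakeSleepPairing(wakeAts, sleepAts) on the same data throws, which is the case the looser assertions let through.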


Test Coverage

The new test effectively validates the fix: checking that the first sleep happens within 1000ms would clearly fail without the fix (where sleep would take >=2000ms due to runStopTimeout). The test is correctly gated behind skip?.sleep.
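The timing check can be sketched as below. It assumes the 1000ms bound is measured from the first wake to the first sleep, which may not match the real test's reference point; firstSleepDelayMs and firstSleepIsFast are hypothetical helpers.

```typescript
// Hypothetical helpers mirroring the test's timing assertion.
const FIRST_SLEEP_BOUND_MS = 1_000;

/** Milliseconds between the first recorded wake and the first recorded sleep. */
function firstSleepDelayMs(wakeAts: number[], sleepAts: number[]): number {
  const firstWake = wakeAts[0];
  const firstSleep = sleepAts[0];
  if (firstWake === undefined || firstSleep === undefined) {
    throw new Error("need at least one wake and one sleep");
  }
  return firstSleep - firstWake;
}

/** True when the first sleep lands inside the bound, i.e. teardown did not wait out runStopTimeout. */
function firstSleepIsFast(wakeAts: number[], sleepAts: number[]): boolean {
  return firstSleepDelayMs(wakeAts, sleepAts) < FIRST_SLEEP_BOUND_MS;
}

// With the fix: roughly sleepTimeout plus a fast teardown.
console.log(firstSleepIsFast([0], [120])); // true
// Without the fix: the sleep stalls for at least runStopTimeout.
console.log(firstSleepIsFast([0], [2_100])); // false
```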


Bottom Line

The fix is sound and solves a real performance problem. The main issue worth addressing before merge is the log.error call in the catch block: it will produce misleading error logs every time a workflow actor sleeps normally in production.
