
fix(rivetkit): minor sleep-cleanup follow-ups#4761

Draft
NathanFlurry wants to merge 1 commit into sleep-cleanup/15-stale-prevent-sleep-refs from sleep-cleanup/16-minor-fixes

Conversation

@NathanFlurry
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Member Author

NathanFlurry commented Apr 24, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude

claude Bot commented Apr 24, 2026

Code Review: fix(rivetkit): minor sleep-cleanup follow-ups

Overview

This PR contains four targeted cleanups to the sleep/destroy lifecycle:

  1. `context.rs` — Consolidates destroy teardown to call `mark_destroy_requested()` instead of inlining the same sequence.
  2. `sleep.rs` — Improves two tests: exhaustive `CanSleep` match and a more realistic cancellation-propagation test.
  3. `task.rs` — Bug fix: serialization timeout cap now honors user-configured `sleepGracePeriod`.
  4. `native.ts` — Fixes the ordering of `destroy()` vs `markNativeDestroyRequested` so errors surface before state is mutated.

`context.rs` — Consolidation via `mark_destroy_requested()`

Good change. The refactoring correctly consolidates the inline sequence (`cancel_sleep_timer`, `flush_on_shutdown`, `destroy_completed.store(false)`, `abort_signal.cancel()`) into the shared `mark_destroy_requested()` helper (verified at L465-470: it contains all four operations). This eliminates a drift risk.

Minor comment nit: The comment says "`destroy_requested` is already true from the swap above; the redundant `store(true)` inside is harmless." However, `mark_destroy_requested()` stores `false` into `destroy_completed`, not `true`. The "store(true)" in the comment refers to the `destroy_requested` flag — technically accurate but potentially confusing for a reader who looks at the implementation and sees a `store(false)` call. Consider clarifying: "`destroy_requested` is already true from the check above; `mark_destroy_requested` will set it again, which is harmless."


`sleep.rs` — Test improvements

Cancellation-propagation test (`shutdown_deadline_token_aborts_select_awaiting_task`): Strong improvement. The old test only checked `token.is_cancelled()`, which any trivially correct implementation satisfies. The new test spawns a real `tokio::select!` task that blocks on `pending::<()>()`, meaning it only exits when cancellation propagates to the cloned token. The `yield_now().await` pre-check is a good guard against false positives.

Exhaustive match in `set_prevent_sleep_is_a_deprecated_noop`: The compile-time protection is clever — adding a new `CanSleep` variant without updating this match will be a build error rather than a silent pass. However, there is a test quality regression: the original code asserted `can_sleep().await == CanSleep::Yes` (verifying the specific expected behavior), while the new exhaustive match accepts any return value. The test no longer verifies that `set_prevent_sleep(true)` has no effect on the outcome. Consider combining both goals:

```rust
let result = ctx.can_sleep().await;
// Exhaustive match guards against reintroducing a PreventSleep variant at compile time.
match result {
    CanSleep::Yes => {} // expected: prevent_sleep is a no-op
    CanSleep::NotReady
    | CanSleep::NoSleep
    | CanSleep::ActiveHttpRequests
    | CanSleep::ActiveKeepAwake
    | CanSleep::ActiveInternalKeepAwake
    | CanSleep::ActiveRunHandler
    | CanSleep::ActiveDisconnectCallbacks
    | CanSleep::ActiveConnections
    | CanSleep::ActiveWebSocketCallbacks => panic!("expected CanSleep::Yes, got {result:?}"),
}
```


`task.rs` — Timeout cap fix

Genuine bug fix. An actor with a `sleepGracePeriod` longer than `SERIALIZE_STATE_SHUTDOWN_SANITY_CAP` (15s) would silently cap its `serializeState` timeout at 15s even though the user allocated more budget, leading to empty-delta saves within the user's own grace window. The `.max()` fix is correct.

Improved log message: Adding `actor_id` and `cap_ms` to the timeout error is a good observability improvement consistent with the structured logging conventions.

One concern: The cap is now raised for all stop reasons (destroy/stop, not only sleep) since `effective_sleep_grace_period()` is read unconditionally. If a destroy is triggered and the actor has a large `sleepGracePeriod`, the destroy path inherits the larger serialization timeout. This may be intentional (large state writes need the budget regardless of stop reason), but worth explicitly verifying — destroy already has its own independent timeout (`on_destroy_timeout`) that is separate from the sleep grace period.


`native.ts` — Destroy ordering fix

Correct fix. Calling `callNativeSync(() => this.#ctx.destroy())` before `markNativeDestroyRequested(ctx)` ensures that if the native destroy throws (`actor/starting` or `actor/stopping`), the `destroyCompletion` promise is never created and `destroyGate` is not left with stale state. The prior ordering could leak an unresolved Promise if the native call threw synchronously. The comment is clear and the reasoning is sound.


Summary

| Change | Assessment |
| --- | --- |
| `context.rs` consolidation | Correct; minor comment clarity nit |
| `sleep.rs` cancellation test | Strong improvement over the original |
| `sleep.rs` exhaustive match | Lost behavioral assertion; suggest keeping `CanSleep::Yes` as the expected arm |
| `task.rs` timeout cap fix | Real bug fix; verify semantics for non-sleep stop reasons |
| `native.ts` destroy ordering | Correct fix |

Overall this is a solid set of targeted cleanups. The main actionable items are restoring the `CanSleep::Yes` assertion in the prevent-sleep test, and confirming the destroy timeout cap semantics are intentional.

@github-actions
Contributor

Preview packages published to npm

Install with:

npm install rivetkit@pr-4761

All packages published as 0.0.0-pr.4761.c7ed968 with tag pr-4761.

Engine binary is shipped via @rivetkit/engine-cli on linux-x64-musl, linux-arm64-musl, darwin-x64, and darwin-arm64. Windows users should use the release installer or set RIVET_ENGINE_BINARY.

Docker images:

docker pull rivetdev/engine:slim-c7ed968
docker pull rivetdev/engine:full-c7ed968

Individual packages:

npm install rivetkit@pr-4761
npm install @rivetkit/react@pr-4761
npm install @rivetkit/rivetkit-napi@pr-4761
npm install @rivetkit/workflow-engine@pr-4761

