fix(rivetkit-core): return stopping not starting when sleep/destroy called mid-shutdown #4757
Conversation
Code Review - PR 4757 (sleep/destroy mid-shutdown fix)

This is a focused, well-scoped fix. The root cause is clearly identified: sleep.started is used as a proxy for two distinct states and the old code could not distinguish "never started" from "already shutting down", returning a misleading actor/starting error to callers who called sleep() or destroy() mid-shutdown.

Correctness - sleep() fix (context.rs): The disambiguation logic is sound. When started=false, reading sleep_requested || destroy_requested is sufficient to detect the in-flight-shutdown case. The swap(true) later in sleep() still guards against a second sleep() call in the same codepath, so there is no double-accept risk.

Correctness - destroy() fix (context.rs): The new guard allows destroy() to proceed when the actor has already entered sleep-shutdown (sleep_requested=true, started=false). The comment explaining "destroy is a stronger signal" is accurate and matches the existing sleep_then_destroy_signal_tasks_do_not_leak_after_teardown test. If the lifecycle machine has cleared started and both request flags are still false but mark_destroy_requested() has already been called, the guard correctly passes because mark_destroy_requested sets destroy_requested = true.

Test coverage: The new test sleep_after_grace_clears_started_returns_stopping_not_starting directly covers the sleep() regression. It correctly simulates the real lifecycle sequence: set started=true, call sleep() (sets sleep_requested=true), clear started=false (simulating transition_to(SleepGrace)), then assert second sleep() returns actor/stopping not actor/starting.

Missing parallel test for destroy(): There is a test for destroy() called before startup (destroy_before_started_errors_with_actor_starting) and double_destroy_errors_with_actor_stopping, but no test for the symmetric case that was also changed: calling destroy() mid-shutdown (after started is cleared). A test like destroy_after_grace_clears_started_succeeds_as_escalation would give confidence that the extension is exercised and does not accidentally accept destroy() in a fully-terminated state.

Test placement: Per CLAUDE.md, Rust tests should live under tests/, not inline in src/. The new test is added to the inline mod tests at the bottom of src/actor/sleep.rs rather than to tests/modules/sleep.rs. Existing lifecycle tests are also inline, so this may be an accepted exception because new_for_sleep_tests and set_sleep_started are pub(crate) helpers only reachable from the same crate. If these helpers can be made reachable from tests/, the new test should move there for consistency.

Ordering consistency: The existing sleep() path uses SeqCst for all three atomic loads/swaps. The new destroy() guard also uses SeqCst. Ordering is consistent.

Summary: The fix is correct and the test demonstrates the exact bug. Actionable feedback:
(1) Add a parallel test for destroy() mid-shutdown where started=false and sleep_requested=true should allow destroy() to succeed, not return Starting.
(2) Consider moving the new test and related sleep lifecycle tests to tests/modules/sleep.rs per CLAUDE.md guidance.
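For reference, the guard behavior described in this review can be sketched as a small standalone model. This is not the code from context.rs or sleep.rs: `SleepFlags`, `LifecycleError`, and the exact shape of both guards are assumptions made for illustration, built only from the three flags named above. The test name is the one proposed in this review, and the model does not cover the fully-terminated state.

```rust
use std::sync::atomic::{AtomicBool, Ordering::SeqCst};

// Hypothetical stand-ins for the actor/starting and actor/stopping errors.
#[derive(Debug, PartialEq)]
enum LifecycleError {
    Starting,
    Stopping,
}

// Hypothetical model of the three flags discussed in the review.
#[derive(Default)]
struct SleepFlags {
    started: AtomicBool,
    sleep_requested: AtomicBool,
    destroy_requested: AtomicBool,
}

impl SleepFlags {
    fn sleep(&self) -> Result<(), LifecycleError> {
        if !self.started.load(SeqCst) {
            // started=false is ambiguous: either the actor never started,
            // or a shutdown is already in flight and cleared the flag.
            if self.sleep_requested.load(SeqCst) || self.destroy_requested.load(SeqCst) {
                return Err(LifecycleError::Stopping);
            }
            return Err(LifecycleError::Starting);
        }
        // swap returns the previous value, so a second sleep() is rejected.
        if self.sleep_requested.swap(true, SeqCst) {
            return Err(LifecycleError::Stopping);
        }
        Ok(())
    }

    fn destroy(&self) -> Result<(), LifecycleError> {
        // Only report Starting when no shutdown has been requested at all.
        let shutting_down =
            self.sleep_requested.load(SeqCst) || self.destroy_requested.load(SeqCst);
        if !self.started.load(SeqCst) && !shutting_down {
            return Err(LifecycleError::Starting);
        }
        // Destroy may escalate an in-flight sleep, but only once.
        if self.destroy_requested.swap(true, SeqCst) {
            return Err(LifecycleError::Stopping);
        }
        Ok(())
    }
}

#[test]
fn destroy_after_grace_clears_started_succeeds_as_escalation() {
    let flags = SleepFlags::default();
    flags.started.store(true, SeqCst);
    assert_eq!(flags.sleep(), Ok(()));
    // Simulate the lifecycle machine clearing started during sleep grace.
    flags.started.store(false, SeqCst);
    assert_eq!(flags.destroy(), Ok(()));
    assert_eq!(flags.destroy(), Err(LifecycleError::Stopping));
}
```

In this model, a second sleep() after started is cleared returns Stopping rather than Starting, while destroy() is still accepted once as an escalation. The real test would presumably go through the crate's new_for_sleep_tests and set_sleep_started helpers instead of writing to the flags directly.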
…alled mid-shutdown
c7f9133 to ace5f49
1939547 to 68ddecb
Preview packages published to npm

Install with: npm install rivetkit@pr-4757

All packages published as

Engine binary is shipped via Docker images:
docker pull rivetdev/engine:slim-2ad2d5d
docker pull rivetdev/engine:full-2ad2d5d

Individual packages
npm install rivetkit@pr-4757
npm install @rivetkit/react@pr-4757
npm install @rivetkit/rivetkit-napi@pr-4757
npm install @rivetkit/workflow-engine@pr-4757

Description
Please include a summary of the changes and the related issue. Please also include relevant motivation and context.
Type of change
How Has This Been Tested?
Please describe the tests that you ran to verify your changes.
Checklist: