@NathanFlurry
Member

No description provided.

@vercel

vercel bot commented Nov 20, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

4 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
rivet-cloud Ignored Ignored Nov 20, 2025 7:59am
rivet-inspector Ignored Ignored Preview Nov 20, 2025 7:59am
rivet-site Ignored Ignored Preview Nov 20, 2025 7:59am
rivetkit-serverless Skipped Skipped Nov 20, 2025 7:59am

Member Author

NathanFlurry commented Nov 20, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude

claude bot commented Nov 20, 2025

Code Review

Summary

This PR fixes a critical bug where actors with CrashPolicy::Sleep were destroyed instead of being put to sleep on clean shutdowns (for example, when a runner shuts down gracefully). The fix changes the match logic so that CrashPolicy::Sleep puts the actor to sleep after both crashes and clean shutdowns.

Analysis

Previous Behavior (Bug):

  • The code matched on (input.crash_policy, failed) tuple
  • For CrashPolicy::Sleep, only the (CrashPolicy::Sleep, true) arm was handled
  • When failed = false (clean shutdown, e.g., runner shutdown), the code fell through to the wildcard _ pattern at line 695
  • This wildcard arm calls SetCompleteInput and returns StoppedResult::Destroy, which destroys the actor instead of putting it to sleep

New Behavior (Fix):

  • Now matches only on input.crash_policy
  • The CrashPolicy::Sleep arm handles both failed = true (crash) and failed = false (clean shutdown)
  • Both scenarios now correctly put the actor to sleep
  • Different log messages distinguish between crash and clean shutdown scenarios
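The difference between the two behaviors can be sketched in isolation. This is a simplified illustration, not the pegboard code: the function names `handle_stopped_before` and `handle_stopped_after`, the `StoppedResult::Sleep` variant, and the `eprintln!` log lines are assumptions standing in for the real workflow, and `CrashPolicy::Restart` is left out for brevity.

```rust
// Simplified stand-ins, not the real pegboard types. Only CrashPolicy::Sleep,
// CrashPolicy::Destroy and StoppedResult::Destroy are names taken from the
// review; everything else here is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CrashPolicy {
    Sleep,
    Destroy,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StoppedResult {
    /// Hypothetical variant meaning "the actor was put to sleep".
    Sleep,
    Destroy,
}

/// Shape of the old logic: matching on the (policy, failed) tuple.
pub fn handle_stopped_before(policy: CrashPolicy, failed: bool) -> StoppedResult {
    match (policy, failed) {
        (CrashPolicy::Sleep, true) => StoppedResult::Sleep,
        // (CrashPolicy::Sleep, false) has no arm of its own, so a clean
        // shutdown lands in the wildcard and the actor is destroyed.
        _ => StoppedResult::Destroy,
    }
}

/// Shape of the new logic: matching on the policy alone.
pub fn handle_stopped_after(policy: CrashPolicy, failed: bool) -> StoppedResult {
    match policy {
        CrashPolicy::Sleep => {
            // `failed` now only selects the log message.
            if failed {
                eprintln!("actor crashed, putting to sleep");
            } else {
                eprintln!("actor stopped cleanly, putting to sleep");
            }
            StoppedResult::Sleep
        }
        CrashPolicy::Destroy => StoppedResult::Destroy,
    }
}
```

In this sketch, `handle_stopped_before(CrashPolicy::Sleep, false)` returns `StoppedResult::Destroy`, while `handle_stopped_after` returns `StoppedResult::Sleep` for both values of `failed`.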

Code Quality: ✅ Excellent

Strengths:

  1. Correct logic fix: Aligns behavior with the intended semantics of CrashPolicy::Sleep - actors should sleep regardless of whether they crashed or stopped cleanly
  2. Improved observability: Added helpful logging to distinguish between crash-induced sleep vs clean shutdown sleep
  3. Clean implementation: Simplified the match statement by removing the unnecessary tuple pattern
  4. Follows conventions: Uses structured logging with ?actor_id as per CLAUDE.md guidelines

Potential Issues: ⚠️ Minor Considerations

  1. Test Coverage: There are no test files in the pegboard package. This critical actor lifecycle logic would benefit from integration tests covering:

    • Actor with CrashPolicy::Sleep stopping due to crash (StopCode != Ok)
    • Actor with CrashPolicy::Sleep stopping cleanly (StopCode::Ok)
    • Actor with CrashPolicy::Sleep stopping due to runner shutdown
    • Verification that state.sleeping = true is set in all cases
    • Verification that actors are NOT destroyed
  2. Documentation: Consider adding a comment explaining the semantic difference between:

    • CrashPolicy::Restart: Reschedule immediately (regardless of clean/crash)
    • CrashPolicy::Sleep: Put to sleep (regardless of clean/crash)
    • CrashPolicy::Destroy: Destroy (handled by wildcard)

    This would help future maintainers understand the intent.
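As a rough sketch of what both suggestions could look like together, the following models the policies with doc comments and checks the listed cases against a toy state. It is not the pegboard implementation: `ActorState`, its `destroyed` and `reschedules` fields, `StopCode::Error`, and the helpers `apply_stop` and `stopped` are hypothetical, and the policy semantics are taken from the summary in this review rather than from the source code. A runner shutdown is modeled as a clean stop (`StopCode::Ok`).

```rust
/// What happens to an actor after it stops (semantics as summarized in the review).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CrashPolicy {
    /// Reschedule immediately, whether the stop was clean or a crash.
    Restart,
    /// Put the actor to sleep, whether the stop was clean or a crash.
    Sleep,
    /// Destroy the actor.
    Destroy,
}

/// Toy stop code. `Ok` is named in the review; `Error` is a hypothetical
/// stand-in for any non-Ok code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StopCode {
    Ok,
    Error,
}

/// Toy actor state. Only `sleeping` is named in the review.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct ActorState {
    pub sleeping: bool,
    pub destroyed: bool,
    pub reschedules: u32,
}

/// Applies the crash policy. The stop code is deliberately ignored here:
/// in the fixed logic it would only influence logging.
pub fn apply_stop(state: &mut ActorState, policy: CrashPolicy, _code: StopCode) {
    match policy {
        CrashPolicy::Restart => state.reschedules += 1,
        CrashPolicy::Sleep => state.sleeping = true,
        CrashPolicy::Destroy => state.destroyed = true,
    }
}

/// Convenience helper: stop a fresh actor and return the resulting state.
pub fn stopped(policy: CrashPolicy, code: StopCode) -> ActorState {
    let mut state = ActorState::default();
    apply_stop(&mut state, policy, code);
    state
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn sleep_policy_sleeps_on_crash_and_clean_stop() {
        for code in [StopCode::Error, StopCode::Ok] {
            let state = stopped(CrashPolicy::Sleep, code);
            assert!(state.sleeping);
            assert!(!state.destroyed);
        }
    }
}
```

A real integration test would drive the actor workflow end to end rather than a toy function, but the assertions would have the same shape: `sleeping` is set and the actor is not destroyed, for every stop code.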

Performance: ✅ No Impact

The change simplifies the match logic and has no performance implications.

Security: ✅ No Concerns

No security implications from this change.

Recommendation: ✅ Approve with Suggestion

The fix is correct and addresses a critical bug. The code quality is excellent. I recommend:

  1. Short-term: Merge this fix as-is since it resolves the immediate issue
  2. Medium-term: Add integration tests to prevent regression
  3. Optional: Add explanatory comments about crash policy semantics

Great catch on this bug! The PR title accurately describes the fix, and the implementation is clean.


2 participants