fix(pegboard): fix actors getting destroyed on runner shutdown #3499

NathanFlurry · 2025-11-20T07:59:05Z

No description provided.

vercel · 2025-11-20T07:59:09Z

The latest updates on your projects. Learn more about Vercel for GitHub.

4 Skipped Deployments

Project	Deployment	Preview	Updated (UTC)
rivet-cloud	Ignored		Nov 20, 2025 7:59am
rivet-inspector	Ignored	Preview	Nov 20, 2025 7:59am
rivet-site	Ignored	Preview	Nov 20, 2025 7:59am
rivetkit-serverless	Skipped		Nov 20, 2025 7:59am

NathanFlurry · 2025-11-20T07:59:22Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

claude · 2025-11-20T08:00:46Z

Code Review

Summary

This PR fixes a critical bug where actors with CrashPolicy::Sleep were incorrectly destroyed on clean shutdowns (when a runner shuts down gracefully) instead of being put to sleep. The fix changes the match logic so that CrashPolicy::Sleep puts the actor to sleep in both the crash and clean shutdown scenarios.

Analysis

Previous Behavior (Bug):

- The code matched on the (input.crash_policy, failed) tuple
- For CrashPolicy::Sleep, only the (CrashPolicy::Sleep, true) arm was handled
- When failed = false (clean shutdown, e.g., runner shutdown), the code fell through to the wildcard _ pattern at line 695
- This wildcard arm calls SetCompleteInput and returns StoppedResult::Destroy, which destroys the actor instead of putting it to sleep

New Behavior (Fix):

- Now matches only on input.crash_policy
- The CrashPolicy::Sleep arm handles both failed = true (crash) and failed = false (clean shutdown)
- Both scenarios now correctly put the actor to sleep
- Different log messages distinguish between crash and clean shutdown scenarios
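
The before/after match logic described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pegboard source: `Outcome`, `outcome_before`, `outcome_after`, and `sleep_log_message` are hypothetical names, the log wording is invented, `CrashPolicy::Restart` is omitted, and the real arms run workflow steps (such as SetCompleteInput) rather than returning a plain enum.

```rust
// Simplified stand-ins for illustration only; not the real pegboard types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CrashPolicy {
    Sleep,
    Destroy,
}

// Hypothetical summary of what happens to the actor after it stops.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Sleep,
    Destroy,
}

// Previous behavior: matching on the (policy, failed) tuple means that
// (Sleep, false) is not covered and falls through to the wildcard arm.
pub fn outcome_before(policy: CrashPolicy, failed: bool) -> Outcome {
    match (policy, failed) {
        (CrashPolicy::Sleep, true) => Outcome::Sleep,
        _ => Outcome::Destroy,
    }
}

// Hypothetical log text; the wording in the actual PR may differ.
pub fn sleep_log_message(failed: bool) -> &'static str {
    if failed {
        "actor crashed, putting to sleep"
    } else {
        "actor stopped cleanly, putting to sleep"
    }
}

// New behavior: matching on the policy alone, so Sleep covers both the
// crash and the clean shutdown case. `failed` only selects the log message.
pub fn outcome_after(policy: CrashPolicy, failed: bool) -> Outcome {
    match policy {
        CrashPolicy::Sleep => {
            eprintln!("{}", sleep_log_message(failed));
            Outcome::Sleep
        }
        _ => Outcome::Destroy,
    }
}
```

In this sketch, `outcome_before(CrashPolicy::Sleep, false)` yields `Outcome::Destroy`, which is the bug, while `outcome_after(CrashPolicy::Sleep, false)` yields `Outcome::Sleep`.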

Code Quality: ✅ Excellent

Strengths:

- Correct logic fix: Aligns behavior with the intended semantics of CrashPolicy::Sleep: actors should sleep regardless of whether they crashed or stopped cleanly
- Improved observability: Added helpful logging to distinguish between crash-induced sleep and clean shutdown sleep
- Clean implementation: Simplified the match statement by removing the unnecessary tuple pattern
- Follows conventions: Uses structured logging with ?actor_id as per CLAUDE.md guidelines

Potential Issues: ⚠️ Minor Considerations

- Test Coverage: There are no test files in the pegboard package. This critical actor lifecycle logic would benefit from integration tests covering:
  - Actor with CrashPolicy::Sleep stopping due to crash (StopCode != Ok)
  - Actor with CrashPolicy::Sleep stopping cleanly (StopCode::Ok)
  - Actor with CrashPolicy::Sleep stopping due to runner shutdown
  - Verification that state.sleeping = true is set in all cases
  - Verification that actors are NOT destroyed
- Documentation: Consider adding a comment explaining the semantic difference between:
  - CrashPolicy::Restart: Reschedule immediately (regardless of clean/crash)
  - CrashPolicy::Sleep: Put to sleep (regardless of clean/crash)
  - CrashPolicy::Destroy: Destroy (handled by wildcard)

  This would help future maintainers understand the intent.
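
As a rough sketch of both suggestions, the block below pairs doc comments on the policy variants with unit-style checks for the listed cases. Everything here is an assumption for illustration: `ActorState`, `apply_policy`, and the test names are hypothetical, the enum is a stand-in for the real one, and real coverage would be integration tests against the pegboard workflow rather than a pure function.

```rust
/// Stand-in for the crash policy enum, documented with the semantics
/// described in this review (not copied from the pegboard source).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CrashPolicy {
    /// Reschedule immediately, whether the actor crashed or stopped cleanly.
    Restart,
    /// Put the actor to sleep, whether it crashed or stopped cleanly.
    Sleep,
    /// Destroy the actor (handled by the wildcard arm).
    Destroy,
}

/// Hypothetical, minimal actor state; only the flag the review mentions.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct ActorState {
    pub sleeping: bool,
}

/// Applies the policy after an actor stops and returns `true` when the actor
/// should be destroyed. `_failed` is unused on purpose: per the review, none
/// of the policies depends on crash versus clean stop for the outcome.
pub fn apply_policy(policy: CrashPolicy, _failed: bool, state: &mut ActorState) -> bool {
    match policy {
        // Rescheduling itself is out of scope for this sketch.
        CrashPolicy::Restart => false,
        CrashPolicy::Sleep => {
            state.sleeping = true;
            false
        }
        _ => true,
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn sleep_policy_sleeps_on_crash_and_clean_stop() {
        // A runner shutdown is modeled here as a clean stop (failed = false).
        for failed in [true, false] {
            let mut state = ActorState::default();
            let destroyed = apply_policy(CrashPolicy::Sleep, failed, &mut state);
            assert!(state.sleeping);
            assert!(!destroyed);
        }
    }

    #[test]
    fn destroy_policy_destroys() {
        let mut state = ActorState::default();
        assert!(apply_policy(CrashPolicy::Destroy, false, &mut state));
        assert!(!state.sleeping);
    }
}
```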

Performance: ✅ No Impact

The change simplifies the match logic and has no performance implications.

Security: ✅ No Concerns

No security implications from this change.

Recommendation: ✅ Approve with Suggestion

The fix is correct and addresses a critical bug. The code quality is excellent. I recommend:

- Short-term: Merge this fix as-is since it resolves the immediate issue
- Medium-term: Add integration tests to prevent regression
- Optional: Add explanatory comments about crash policy semantics

Great catch on this bug! The PR title accurately describes the fix, and the implementation is clean.

fix(pegboard): fix actors getting destroyed on runner shutdown

0f48573

NathanFlurry mentioned this pull request Nov 20, 2025

chore(rivetkit): fix hibernation implementation #3494

Open

This was referenced Nov 20, 2025

fix(pegboard-gateway): write hws immediately on socket hibernate #3496

Draft

fix(rivetkit): fix graceful runner shutdown #3497

Draft

fix(pegboard-gateway): ping hibernating requests both during open hws connections and during hibernation #3498

Draft
