Skip to content

fix(pegboard): restore hibernating requests on serverful start#4904

Closed
NathanFlurry wants to merge 1 commit intodriver-test-complaints/fix-shutdown-drainfrom
driver-test-complaints/restore-serverful-hibernating-requests
Closed

fix(pegboard): restore hibernating requests on serverful start#4904
NathanFlurry wants to merge 1 commit intodriver-test-complaints/fix-shutdown-drainfrom
driver-test-complaints/restore-serverful-hibernating-requests

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Copy link
Copy Markdown
Member Author

NathanFlurry commented May 3, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 3, 2026

PR Review: fix(pegboard): restore hibernating requests on serverful start

Overview

This PR fixes a bug where hibernating request IDs weren't being restored when a serverful actor starts during envoy reconnect. It introduces a new hibernating_requests module in pegboard-envoy with a reusable hydrate_command_wrapper function, and applies it in two places:

  1. conn.rs — the replay path: hydrates missed commands when an envoy reconnects and replays queued startup commands
  2. tunnel_to_ws_task.rs — the live path: hydrates commands as they arrive via the tunnel

The approach mirrors the existing logic in the deprecated pegboard-runner, now factored into a shared module.


Code Quality

  • The new module is clean and well-scoped. The if let CommandStartActor guard is safe — other command types pass through with Ok(()) immediately.
  • Splitting the hydration logic into its own module avoids duplication between the two call sites and matches the pattern already present in pegboard-runner/src/tunnel_to_ws_task.rs.
  • Error propagation in both call sites is correct: hydration failure aborts the loop and triggers shutdown_conn_actors in the conn replay path.

Issues

Stale comments in engine/packages/pegboard/src/workflows/actor/runtime.rs

Six locations in the actor v1 runtime still carry the old comment:

// Empty because request ids are ephemeral. This is intercepted by guard and
// populated before it reaches the runner

The actor2 runtime comment was updated correctly in this PR, but actor v1 was not. Per CLAUDE.md conventions, comments should not document deltas and should reflect the current architecture. These should be updated to match the actor2 wording:

"Request ids are ephemeral. Pegboard-envoy refreshes this on the WebSocket send path immediately before the actor start reaches envoy."

Affected lines: 729–730, 753–754, 845–846, 868–869, 980–981, 1003–1004


Performance Notes

The existing // TODO: Parallelize comment in tunnel_to_ws_task.rs (line 127) applies here: each CommandStartActor in a batch now triggers a separate FDB range scan sequentially. This is a pre-existing concern, not introduced by this PR, but the new DB call makes batched starts slightly more expensive.


Minor Suggestions

The doc comment on hydrate_command_wrapper mentions why stored serverful commands are empty, but doesn't explain why the live tunnel path also needs hydration (i.e., that the guard no longer fills this in on the envoy path). A brief note there would help future readers understand why hydration is needed in both call sites.


Summary

The fix is correct and the code is well-structured. The only item worth addressing before merge is updating the stale actor v1 comments to keep the architecture description consistent across the codebase. No blocking correctness issues.

@NathanFlurry NathanFlurry force-pushed the driver-test-complaints/restore-serverful-hibernating-requests branch from 3444072 to 4ade2d1 Compare May 3, 2026 07:12
@NathanFlurry NathanFlurry force-pushed the driver-test-complaints/close-sqlite-on-shutdown branch from 6aed029 to 9bf8eb7 Compare May 3, 2026 07:12
@NathanFlurry NathanFlurry marked this pull request as draft May 3, 2026 07:19
@NathanFlurry NathanFlurry force-pushed the driver-test-complaints/close-sqlite-on-shutdown branch from 9bf8eb7 to c4e495b Compare May 3, 2026 07:34
@NathanFlurry NathanFlurry force-pushed the driver-test-complaints/restore-serverful-hibernating-requests branch from 4ade2d1 to 4931c2e Compare May 3, 2026 07:34
@NathanFlurry NathanFlurry changed the base branch from driver-test-complaints/close-sqlite-on-shutdown to graphite-base/4904 May 3, 2026 21:03
@NathanFlurry NathanFlurry force-pushed the driver-test-complaints/restore-serverful-hibernating-requests branch from 4931c2e to f6d0558 Compare May 3, 2026 21:03
@NathanFlurry NathanFlurry force-pushed the graphite-base/4904 branch from c4e495b to ec8b474 Compare May 3, 2026 21:03
@NathanFlurry NathanFlurry changed the base branch from graphite-base/4904 to driver-test-complaints/fix-shutdown-drain May 3, 2026 21:03
This was referenced May 4, 2026
@NathanFlurry
Copy link
Copy Markdown
Member Author

Landed in main via stack-merge fast-forward push. Commits are in main; closing to match.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant