fix(rivetkit): restore hibernatable sockets and hydrate serverless starts#4658
Conversation
PR Review: fix(rivetkit): restore hibernatable sockets and hydrate serverless starts
OverviewThis PR fixes two related hibernation issues in the engine actor driver:
It also refactors the test suite to use retry-based polling (readAfterSleepCycle) instead of fixed waitFor delays, and updates gateway routing test paths from /api/hello to /request/api/hello. actor-driver.tsGood patterns:
Issues and suggestions: 1. Error path leaves dangling binding in bindHibernatableConnectSocket - If routeWebSocket throws, the binding is stored in hibernatableConnectBindings with no listeners. Consider wrapping in try/catch and calling deleteHibernatableConnectBinding + websocket.close on failure. 2. changed mutation inside Promise.all - Setting changed = true inside async callbacks is safe but non-obvious. Consider deriving it after Promise.all resolves: commands.some((c, i) => c !== decoded.val[i]). 3. O(n) scan in rebind methods - Both rebindHibernatableConnectSockets and rebindDynamicHibernatableRunnerWebSockets filter all bindings for a single actorId. With many actors this is O(n). Consider indexing by actorId. 4. Stale proxyToActorWs between rebinds - When isRestoringHibernatable and event.reason === dynamic.runtime.disposed, onProxyClose returns early. The old proxyToActorWs may remain open between rebinds. 5. Logging inconsistency - One log site uses err: instead of error: (around the dynamic hibernating requests restore block). All other log sites use error:. Test changes (actor-sleep.ts, actor-conn-hibernation.ts)Good patterns:
Minor observations: 1. waitForHibernatableRegistration magic number - The 100ms wait is undocumented. A brief comment explaining what it is waiting for would help future readers. 2. readAfterSleepCycle retries on all errors - The function always retries maxAttempts times regardless of error type. For permanent failures this wastes time. 3. alarms keep actor awake test semantic change - The original asserted exactly sleepCount === 1, verifying that the alarm delayed the first sleep. The new version asserts sleepCount >= 1. Probably fine, but a comment noting this trade-off would be helpful. gateway-routing.tsAll URL changes consistently add a /request/ prefix segment. No issues, but the PR description could note what router change this corresponds to. |
26f98bc to
fde1e0b
Compare
ccc38b5 to
8293235
Compare
fde1e0b to
789b9cd
Compare
8293235 to
1ba6a3b
Compare
1ba6a3b to
de87a8b
Compare

Description
Please include a summary of the changes and the related issue. Please also include relevant motivation and context.
Type of change
How Has This Been Tested?
Please describe the tests that you ran to verify your changes.
Checklist: