fix(engine): ensure first keepalive tick is not skipped to prevent timeout on ws hibernation #3507
Code Review: WebSocket Hibernation Keepalive Fix

Summary

This PR addresses a critical bug where the first keepalive tick was being skipped in the WebSocket hibernation mechanism, which could cause premature timeouts. The fix ensures that keepalive updates happen immediately and consistently throughout the hibernation period.

✅ Positive Changes
8d11332 to 79e5409
…meout on ws hibernation
79e5409 to 7e275b0
Code Review

Summary

This PR fixes a WebSocket hibernation timeout issue by ensuring the first keepalive tick is not skipped, and it refactors the MessageId type from a binary format to a structured type. Overall, the changes are solid and address a real issue, but there are a few concerns to address.

Critical Issues

1. Missing Space in Comparison Operator (packages/pegboard-gateway/src/shared_state.rs:521)

    if hs_elapsed> hibernation_timeout {

Missing space before the comparison operator:

    if hs_elapsed > hibernation_timeout {

2. Race Condition Risk (packages/pegboard-gateway/src/lib.rs:581-590)

The removal of the upfront …

Before: The hibernating request entry was created immediately, ensuring it existed even if the actor rewoke quickly.

After: If there are pending messages, the function returns early without creating the entry, and the keepalive task (which creates the entry) hasn't started yet.

Risk: If an actor hibernates, immediately rewakes due to pending messages, then hibernates again quickly, the second hibernation might not have a valid keepalive entry yet.

Recommendation: Consider keeping the …

Design Questions

3. MessageId Schema Change Implications

The change from …

Questions:
4. Keepalive Timing Behavior

The PR removes the initial …

Analysis:

Question: Is the immediate first tick intentional? The comment says "first keepalive tick is not skipped" but doesn't explain why we want it to fire immediately. Consider adding a comment explaining the reasoning (e.g., to establish the hibernating request entry ASAP).

Code Quality

5. Debug Logging Added (packages/pegboard-gateway/src/keepalive_task.rs:45-50)

✅ Good addition for debugging hibernation issues.

6. Removed Unused File (engine/sdks/typescript/runner/src/tunnel-id.ts)

✅ Clean removal of 104 lines of now-unused code.

7. Consistent ID Formatting

✅ Good refactoring to use …

8. Import Organization

Minor inconsistency: Some files have imports reordered (e.g., …

Performance Considerations

9. String Allocation in Hot Path

In …

    tracing::debug!(
        hs_elapsed=%hs_elapsed.as_secs_f64(),
        timeout=%hibernation_timeout.as_secs_f64(),
        "checking hibernating state elapsed time"
    );

Impact: This runs in the GC loop, which runs every 15 seconds. The impact is likely negligible, but consider using the …

Security Considerations

No security concerns identified. The changes are primarily internal refactoring and timing adjustments.

Test Coverage

Missing: No test changes or new tests added.

Recommendations:
Style & Conventions

Following CLAUDE.md conventions:

Recommendations

Must Fix:

Should Consider:

Nice to Have:

Overall: Approve with minor changes required. The core fix is sound, but please address the spacing issue and clarify the race condition safety before merging.

🤖 Generated with Claude Code
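The concerns in items 2 and 4 of the review above (whether a hibernating request entry exists before the first keepalive, and whether the first tick fires immediately) can be illustrated with a small model. This is only a sketch under stated assumptions, not the gateway's actual code: the function names `keepalive_offsets`, `is_expired`, and `gc_view` are hypothetical, and only the `hs_elapsed > hibernation_timeout` comparison is taken from the review.

```rust
use std::time::Duration;

/// Hypothetical model: offsets from the start of hibernation at which the
/// keepalive task writes (or refreshes) the hibernating request entry.
/// With `skip_first_tick = true` the first write happens one full interval
/// after hibernation starts; otherwise it happens immediately.
fn keepalive_offsets(
    interval: Duration,
    window: Duration,
    skip_first_tick: bool,
) -> Vec<Duration> {
    let mut offsets = Vec::new();
    let mut t = if skip_first_tick { interval } else { Duration::ZERO };
    while t <= window {
        offsets.push(t);
        t += interval;
    }
    offsets
}

/// Same shape as the check quoted in the review: the entry is expired once
/// the time since the last keepalive exceeds the timeout.
fn is_expired(hs_elapsed: Duration, hibernation_timeout: Duration) -> bool {
    hs_elapsed > hibernation_timeout
}

/// What a GC pass running at `gc_at` would observe in this model.
/// Returns `None` if no keepalive has been written yet (no entry exists),
/// otherwise `Some(expired)`.
fn gc_view(
    gc_at: Duration,
    offsets: &[Duration],
    hibernation_timeout: Duration,
) -> Option<bool> {
    // Most recent keepalive at or before the GC pass, if any.
    let last = offsets.iter().copied().filter(|t| *t <= gc_at).max()?;
    Some(is_expired(gc_at - last, hibernation_timeout))
}
```

With assumed values of a 10-second keepalive interval and a 15-second timeout, a GC pass 5 seconds into hibernation sees `Some(false)` when the first tick fires immediately and `None` when the first tick is skipped, which is the window the race-condition concern describes. The real interval and timeout values are not stated in the review.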
…meout on ws hibernation (#3507)

No description provided.