fix(pegboard-gateway): ping hibernating requests both during open ws connections and during hibernation #3498
base: 11-19-fix_rivetkit_fix_graceful_runner_shutdown
Conversation
## Code Review

I've reviewed this PR and have the following feedback:

### Overview

This PR extracts the keepalive task into a separate module and ensures it runs both during active WebSocket connections and during hibernation. The goal is to prevent hibernating requests from being garbage collected.

### Positive Changes

### Issues and Concerns

**1. Documentation typo (Minor)**

Line 13 in `keepalive_task.rs` - Missing space in the comment. Should be `/// Only ran for hibernating requests.` instead of `///Only ran for hibernating requests.`

**2. Potential unnecessary work during active connections (Performance)**

Lines 480-507 in `lib.rs` - The keepalive task now runs during active WebSocket connections when `can_hibernate` is true. Previously, it only ran during actual hibernation (`handle_websocket_hibernation`).

Question: Is it necessary to ping the `hibernating_request` UDB entry while the connection is still active? This adds database writes during normal operation. The original implementation only ran the keepalive task during actual hibernation (lines 603-616), which seems more efficient.

Recommendation: Consider whether the keepalive task should only run during actual hibernation, not during active connections. If it must run during active connections, please document why this change is necessary in the PR description or code comments.

**3. Task spawning pattern (Code Quality)**

Lines 485-491 in `lib.rs` - The keepalive task is spawned inside a `tokio::join!` future, which is different from how the other three tasks are spawned (lines 398-418). For consistency, consider spawning it outside the join block conditionally and then awaiting it inside, similar to how the `tunnel_to_ws`, `ws_to_tunnel`, and ping tasks are handled (see the sketch after this review).

**4. Result matching could be more robust (Correctness)**

Lines 511-523 in `lib.rs` - The result matching logic prefers the first non-aborted result, but doesn't explicitly handle cases where multiple tasks return non-aborted results. Consider being more explicit about the precedence for different `LifecycleResult` variants (`ServerClose` vs `ClientClose` vs `Aborted`).

**5. Clone variables optimization (Minor Performance)**

Lines 424-428 in `lib.rs` - Several clones are performed that may not be necessary. Most of these types are cheap to clone (the IDs are `Copy`, `ctx` is an `Arc`), but consider capturing them directly in the async block if possible.

### Testing Recommendations

### Security/Safety

No security concerns identified. The changes maintain proper error handling and resource cleanup.

### Verdict

The core fix looks sound, but there are some architectural questions (particularly issue 2 about running during active connections) that should be addressed before merging. The code quality could be improved with more consistent patterns and better documentation of the design decisions.
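For issue 3, a self-contained sketch of the conditional-spawn pattern being suggested; every name here (the stub task, `LifecycleResult`, the abort channel) is an assumption based on identifiers quoted in these reviews, not the PR's actual code:

```rust
use anyhow::Result;

// Stub standing in for the crate's real lifecycle enum.
enum LifecycleResult {
    Aborted,
}

// Hypothetical stand-in for the extracted keepalive task.
async fn keepalive_task(_abort_rx: tokio::sync::watch::Receiver<()>) -> Result<LifecycleResult> {
    Ok(LifecycleResult::Aborted)
}

// Sketch of the suggestion: spawn conditionally outside the join block,
// then await inside it, mirroring how the other three tasks are handled.
async fn lifecycle_sketch(can_hibernate: bool) -> Result<LifecycleResult> {
    let (_abort_tx, abort_rx) = tokio::sync::watch::channel(());

    // Spawn only when the request can hibernate.
    let keepalive_handle = can_hibernate.then(|| tokio::spawn(keepalive_task(abort_rx)));

    let keepalive_fut = async move {
        match keepalive_handle {
            // Flatten the JoinError into the task's anyhow::Result.
            Some(handle) => handle.await.map_err(anyhow::Error::from)?,
            // No hibernation support: report a no-op "aborted" result.
            None => Ok(LifecycleResult::Aborted),
        }
    };

    // The real code would join this with tunnel_to_ws / ws_to_tunnel / ping.
    let (keepalive_res,) = tokio::join!(keepalive_fut);
    keepalive_res
}
```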
## Code Review - PR #3498

### Summary

This PR refactors the keepalive task for hibernating requests by extracting it into a separate module and ensuring it runs both during active WebSocket connections and during hibernation. Overall, the changes look good and follow the repository's patterns well.

### Positive Observations

- ✅ **Good refactoring:** Extracting the keepalive logic into its own module improves organization
- ✅ **Consistent patterns:** The new task follows the same pattern as the other lifecycle tasks
- ✅ **Proper lifecycle management:** The keepalive task is correctly spawned and cleaned up in both the active-connection and hibernation paths
- ✅ **Jitter implementation:** Good use of jitter (0-128ms) to prevent thundering herd problems
- ✅ **Import organization:** Follows the repository convention of keeping imports at the top of the file

### Issues & Suggestions

**1. Typo in documentation (Minor)**

Location: `keepalive_task.rs:10`

```rust
/// Periodically pings writes keepalive in UDB. This is used to restore hibernating request IDs on
```

Should be:

```rust
/// Periodically writes keepalive pings in UDB. This is used to restore hibernating request IDs on
```

**2. Formatting issue in documentation (Minor)**

Location: `keepalive_task.rs:13`

There's a missing space:

```rust
///Only ran for hibernating requests.
```

Should be:

```rust
/// Only ran for hibernating requests.
```
3. Inconsistent lifecycle result handling (Moderate)Location: The lifecycle result matching logic shows asymmetry. When multiple tasks complete successfully but with different results, the code only checks the first two positions: (Ok(res), Ok(LifecycleResult::Aborted), _, _) => Ok(res),
(Ok(LifecycleResult::Aborted), Ok(res), _, _) => Ok(res),
// Unlikely case
(res, _, _, _) => res,Potential issue: If Suggestion: Consider a more comprehensive pattern that checks all positions for non-Aborted results: match (tunnel_to_ws_res, ws_to_tunnel_res, ping_res, keepalive_res) {
// Prefer error
(Err(err), _, _, _) => Err(err),
(_, Err(err), _, _) => Err(err),
(_, _, Err(err), _) => Err(err),
(_, _, _, Err(err)) => Err(err),
// Find first non-aborted result
(Ok(res), _, _, _) if !matches!(res, LifecycleResult::Aborted) => Ok(res),
(_, Ok(res), _, _) if !matches!(res, LifecycleResult::Aborted) => Ok(res),
(_, _, Ok(res), _) if !matches!(res, LifecycleResult::Aborted) => Ok(res),
(_, _, _, Ok(res)) if !matches!(res, LifecycleResult::Aborted) => Ok(res),
// All aborted (unlikely)
(res, _, _, _) => res,
**4. Error handling observation (Low)**

Location: `keepalive_task.rs`

The interval calculation could attach context to the `try_into()` error:

```rust
let mut ping_interval = tokio::time::interval(Duration::from_millis(
    (ctx.config()
        .pegboard()
        .hibernating_request_eligible_threshold()
        / 2)
        .try_into()
        .context("hibernating_request_eligible_threshold too large for interval")?,
));
```

However, this is optional since the threshold would need to be absurdly large to cause issues.

**5. Missing context on abort handling (Low)**

Location: `keepalive_task.rs`

The abort handling via the watch channel:

```rust
_ = keepalive_abort_rx.changed() => {
    tracing::debug!("keepalive task aborted");
    return Ok(LifecycleResult::Aborted);
}
```

### Performance Considerations

- ✅ The keepalive interval calculation (half the eligibility threshold) is reasonable
- ✅ Using …
- ✅ Jitter range (0-128ms) is reasonable for preventing synchronization (see the sketch after this review)

### Security Considerations

✅ No security concerns identified

### Test Coverage
### Verdict

Recommendation: Approve with minor fixes

The changes are well-structured and solve the stated problem effectively. The only critical item is the documentation typo, which is trivial to fix. The lifecycle result handling observation is worth considering but not blocking, since the current logic works for the common case. Great work on maintaining consistency with existing patterns! 🎯
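Illustrating the jitter behavior this review highlights, here is a minimal, self-contained sketch; the function name and the elided UDB write are assumptions, and only the 0-128ms range comes from the reviews:

```rust
use std::time::Duration;

use rand::Rng;

// Sketch of the jitter described above: delaying each keepalive write by a
// random 0-128ms spreads concurrent writers apart so their UDB writes do
// not synchronize into a thundering herd. The write itself is elided.
async fn jittered_keepalive_delay() {
    let jitter_ms: u64 = rand::thread_rng().gen_range(0..128);
    tokio::time::sleep(Duration::from_millis(jitter_ms)).await;
    // ... write the hibernating_request keepalive entry here ...
}
```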
## PR Review: Keepalive Task Refactoring

### Summary

This PR extracts keepalive logic into a dedicated task module and ensures hibernating requests are pinged both during active WebSocket connections and during hibernation. The refactoring improves code organization and fixes a potential bug where keepalive pings weren't being sent during active connections. (A rough sketch of the hibernation half follows this review.)

### Code Quality & Best Practices ✅

Strengths:

Minor Issues:

### Potential Bugs 🐛

Medium Priority:

### Performance Considerations ⚡

Good:

Potential Improvement:

### Security Concerns 🔒

No major security issues identified. Minor observations:

### Test Coverage 🧪

Critical Gap:

Recommendations:

### Architecture & Design 🏗️

Excellent:

### Adherence to CLAUDE.md Standards ✅

### Summary & Recommendations

Approve with minor changes recommended:

The core logic is sound and this is a good refactoring. The issues identified are relatively minor but should be addressed to improve robustness. Priority order:
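As a rough illustration of the behavior this review describes — keepalive pings continuing while a request hibernates — a select-based sketch; the `LifecycleResult` variant, the resume channel, and the handler shape are all assumptions, with only `HibernationResult` and the abort channel taken from code quoted elsewhere on this page:

```rust
use anyhow::Result;

// Stub enums standing in for the real crate types; these variant names are
// assumptions, not the actual rivet definitions.
enum LifecycleResult {
    Aborted,
}
enum HibernationResult {
    Continue,
    Close,
}

// Sketch of the hibernation half of the fix: the keepalive task keeps
// refreshing the hibernating_request entry while we wait for the client to
// reconnect, then is shut down cleanly instead of being abort()ed.
async fn hibernate_sketch(
    mut keepalive_handle: tokio::task::JoinHandle<Result<LifecycleResult>>,
    keepalive_abort_tx: tokio::sync::watch::Sender<()>,
    mut resume_rx: tokio::sync::mpsc::Receiver<()>,
) -> Result<HibernationResult> {
    tokio::select! {
        // If the keepalive task exits first, surface its error so the
        // request is not left hibernating without keepalive writes.
        res = &mut keepalive_handle => {
            res??;
            Ok(HibernationResult::Close)
        }
        // Normal path: the client reconnected, so stop the keepalive task.
        _ = resume_rx.recv() => {
            let _ = keepalive_abort_tx.send(());
            let _ = keepalive_handle.await;
            Ok(HibernationResult::Continue)
        }
    }
}
```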
## Code Review

### Overview

This PR refactors the keepalive logic for hibernating WebSocket requests by extracting it into a dedicated task module and running it alongside other lifecycle tasks. The change improves architecture and ensures hibernating requests are properly maintained both during active connections and during hibernation.

### Positive Aspects ✅

### Issues and Suggestions 🔍

**1. Typo in Comment (Minor)**

Location: `keepalive_task.rs`

```rust
///Only ran for hibernating requests.
```

Issue: Missing space after `///`.

Fix: Should be `/// Only ran for hibernating requests.`

**2. Import Organization (Style)**

Location: `lib.rs`

The diff shows imports were reordered:

```rust
use rivet_guard_core::{
    WebSocketHandle,
    custom_serve::{CustomServeTrait, HibernationResult},
    errors::{ServiceUnavailable, WebSocketServiceUnavailable},
    proxy_service::{ResponseBody, is_ws_hibernate},
    request_context::RequestContext,
    websocket_handle::WebSocketReceiver,
};
```

Issue: While this appears to be an auto-formatting change, the CLAUDE.md explicitly states:
Recommendation: Verify if these import reorderings were intentional or accidental. If accidental, they should be reverted to avoid conflicts with the team's formatting workflow.

**3. Potential Integer Overflow (Low Risk)**

Location: `keepalive_task.rs`

```rust
let mut ping_interval = tokio::time::interval(Duration::from_millis(
    (ctx.config()
        .pegboard()
        .hibernating_request_eligible_threshold()
        / 2)
        .try_into()?,
));
```

Issue: The code divides the threshold by 2 and converts it with `try_into()?`, which only fails if the value doesn't fit the target integer type.

Consideration:
Recommendation: Current implementation is acceptable, but consider adding a debug assertion or comment explaining the assumption (see the sketch after this review).

**4. Task Lifecycle Abort Pattern Inconsistency (Minor)**

Location: `lib.rs`

In the WebSocket handler:

```rust
let _ = keepalive_abort_tx.send(());
let _ = keepalive_handle.await;
```

Compare to the old implementation in the same function:

```rust
keepalive_handle.abort();
```

Observation: The new pattern is cleaner and more consistent with the other tasks in this file.

Recommendation: Consider whether you want to log if the keepalive task panicked:

```rust
if let Err(e) = keepalive_handle.await {
    if e.is_panic() {
        tracing::error!(error = ?e, "keepalive task panicked");
    }
}
```

**5. Keepalive Task Only Runs for `can_hibernate` Requests**
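A sketch of the debug assertion recommended in issue 3 above; the function shape is illustrative, and only the division by 2 and the `try_into()` conversion come from the quoted code:

```rust
use std::time::Duration;

use anyhow::{Context, Result};

// Sketch: make the "threshold is a small positive millisecond count"
// assumption explicit before converting, per the recommendation above.
fn build_ping_interval(threshold_ms: i64) -> Result<tokio::time::Interval> {
    debug_assert!(
        threshold_ms > 0,
        "hibernating_request_eligible_threshold must be positive"
    );
    Ok(tokio::time::interval(Duration::from_millis(
        (threshold_ms / 2)
            .try_into()
            .context("hibernating_request_eligible_threshold too large for interval")?,
    )))
}
```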
## Code Review

I've reviewed this PR and overall the changes look good! The refactoring properly extracts the keepalive logic into a dedicated module and correctly ensures it runs both during active WebSocket connections and hibernation.

### Strengths

### Issues Found

1. Typo in documentation (keepalive_task.rs:10)
2. Missing space in documentation (keepalive_task.rs:13)
3. Comment reference outdated (keepalive_task.rs:30-31)

### Recommendations

### Performance & Security

Overall Assessment: Solid refactoring with minor documentation fixes needed. Core logic is correct and properly integrated. Great work! 🎉
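Pulling together the fragments quoted across these reviews (half-threshold interval, 0-128ms jitter, watch-channel abort), here is a self-contained skeleton of what the keepalive task plausibly looks like; the UDB write is elided and every name not quoted above is an assumption:

```rust
use std::time::Duration;

use anyhow::{Context, Result};
use rand::Rng;

// Stub standing in for the crate's real lifecycle enum.
enum LifecycleResult {
    Aborted,
}

/// Only ran for hibernating requests.
async fn keepalive_task(
    threshold_ms: i64,
    mut abort_rx: tokio::sync::watch::Receiver<()>,
) -> Result<LifecycleResult> {
    // Ping at half the eligibility threshold so the entry is refreshed
    // well before it can be considered expired.
    let mut ping_interval = tokio::time::interval(Duration::from_millis(
        (threshold_ms / 2)
            .try_into()
            .context("hibernating_request_eligible_threshold too large for interval")?,
    ));

    loop {
        tokio::select! {
            _ = ping_interval.tick() => {
                // 0-128ms of jitter keeps concurrent gateways from
                // writing in lockstep.
                let jitter: u64 = rand::thread_rng().gen_range(0..128);
                tokio::time::sleep(Duration::from_millis(jitter)).await;
                // ... ping the hibernating_request UDB entry here ...
            }
            _ = abort_rx.changed() => {
                tracing::debug!("keepalive task aborted");
                return Ok(LifecycleResult::Aborted);
            }
        }
    }
}
```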

No description provided.