-
Notifications
You must be signed in to change notification settings - Fork 135
fix(gateway): prevent gc from removing hibernating in flight req, check actor started after sub when hibernating #3505
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
3 Skipped Deployments
|
Code ReviewI've reviewed PR #3505 and here's my feedback: SummaryThis PR fixes issues with hibernating WebSocket connections by:
Positive Changes✅ Race condition fix (lib.rs:639-652): The addition of fetching actor info after subscribing prevents a race condition where an actor could become ready between the subscription and the select. This is good defensive programming. ✅ Keepalive mechanism (keepalive_task.rs:49-57): Using ✅ GC timeout logic (shared_state.rs:503-505): Adding Issues Found🔴 Critical: Debug logging left in production code shared_state.rs:178, 373 tracing::info!(?message_id, ?payload, "---------------");
tracing::info!(?pending_msg.payload, ?pending_msg.message_index, "------2---------");These debug logs should be removed before merging. They appear to be temporary debugging statements. Recommendation: Remove these or convert to 🟡 Minor: Inconsistent logging format shared_state.rs:178 Recommendation: Use 🟡 Code quality: Parameter renaming lib.rs:270, 277, 568, 572 Recommendation: Fix the log message: 🟢 Minor: Comment typo keepalive_task.rs:11 Performance & Security✅ No performance concerns identified Test Coverage
Architecture Considerations✅ The two-phase approach (subscribe then check) correctly handles the race condition RecommendationsMust fix before merge:
Should fix: Nice to have: Overall, this is a solid fix for important race conditions in WebSocket hibernation. The logic is sound, but the debug logging needs to be cleaned up before merging. |
2cb75a1 to
a1a7bc8
Compare
PR Review: fix(gateway): prevent gc from removing hibernating in flight req, check actor started after sub when hibernatingSummaryThis PR addresses two critical race conditions in the pegboard gateway's hibernation handling:
✅ StrengthsArchitecture & Design
Code Quality
|
a1a7bc8 to
b1dae1e
Compare
8b84eb1 to
4f58c50
Compare
…ck actor started after sub when hibernating
b1dae1e to
b6dfb0d
Compare
Code ReviewThis PR addresses important issues with WebSocket hibernation handling in the pegboard gateway. Overall, the changes look solid and address real race conditions and lifecycle management issues. Here's my detailed feedback: ✅ Strengths
🔍 Issues & RecommendationsCritical - Debug Logging Left in CodeLine 188 in shared_state.rs: tracing::info!(?message_id, ?payload, "---------------");Line 383 in shared_state.rs: tracing::info!(?pending_msg.payload, ?pending_msg.message_index, "------2---------");These appear to be debug logs that should be removed before merging. They:
Recommendation: Remove these or change to Minor - Inconsistent Debug Log (Line 191 in shared_state.rs)tracing::debug!(?message_id, ?payload, "shared state send message");This debug log was added but duplicates information from the info logs above. Consider consolidating. Code Quality - Configuration Error HandlingLine 490 in shared_state.rs: Duration::from_millis(self.hibernation_timeout.try_into().unwrap_or(90_000))The
Performance - Redundant Variable AssignmentLines 422-436 in lib.rs: let keepalive = if can_hibernate {
Some(tokio::spawn(keepalive_task::task(
self.shared_state.clone(),
// ...
)))
} else {
None
};Then later checked with: let Some(keepalive) = keepalive else {
return Ok(LifecycleResult::Aborted);
};This pattern is correct but could be simplified. The task spawn could be deferred until needed in the join. Documentation - Missing Comment ContextThe new 🔒 Security Considerations
🧪 Testing RecommendationsThe PR should include tests for:
📋 Other Notes
SummaryThis is a solid fix for critical issues. The main blocker is the debug logging left in production code. Once that's addressed, this should be good to merge. Action Items:
|
Code Review - PR #3505SummaryThis PR fixes two critical issues with hibernating WebSocket connections in the pegboard gateway:
Positive Aspects ✅Architecture & Design
Code Quality
Issues & Concerns
|
PR Review: Fix Gateway Hibernation IssuesSummaryThis PR addresses two critical issues with hibernating WebSocket requests in the pegboard gateway:
Code Quality ✅Positive Changes:
Code Style:
Potential Issues & Concerns1. Debug Logging Left in Production Code
|
4f58c50 to
8b84eb1
Compare
b6dfb0d to
31af31a
Compare
Code ReviewI've reviewed this PR and identified several issues that should be addressed before merging. SummaryThis PR fixes two critical race conditions in the pegboard gateway's WebSocket hibernation handling:
🔴 Critical Issues1. Debug Logging Left in Production Code
tracing::info!(?message_id, ?payload, "shared state send message");
// ...
tracing::info!(?pending_msg.payload, ?pending_msg.message_index, "------2---------");These appear to be debug logs that need to be removed/fixed before merging:
Recommendation: Remove line 383 entirely. For line 188, either remove or change to 🟡 Issues to Address2. Silent Error Handling in Type Conversion
Duration::from_millis(self.hibernation_timeout.try_into().unwrap_or(90_000))The Recommendations:
3. Inconsistent Error Handling Pattern
if let Some(hs) = &mut req.hibernation_state {
hs.last_ping = Instant::now();
} else {
tracing::warn!("should not call keepalive_hws for non-hibernating ws");
}This logs a warning but returns let Some(hs) = &mut req.hibernation_state else {
bail!("cannot keepalive non-hibernating websocket");
};
hs.last_ping = Instant::now();This would match the pattern used elsewhere (e.g., lines 414-416). 4. Minor Typo in Log Message
tracing::debug!("exiting hibernating due to pending messages");Should be "exiting hibernation" (not "hibernating"). ✅ Positive Aspects
📋 RecommendationsBefore merge:
Nice to have:
🧪 Testing SuggestionsConsider adding tests for:
SummaryThe core fixes are architecturally sound and address real race conditions in the hibernation logic. The main blocker is the debug logging that shouldn't be in production code. Once the critical issues are addressed, this is good to merge. Overall: Approve with changes ⭐⭐⭐⭐ (4/5) |
PR Review: Fix Gateway Hibernating Request HandlingSummaryThis PR fixes critical issues with hibernating WebSocket request handling in the pegboard gateway, specifically preventing garbage collection from removing hibernating in-flight requests and adding proper actor state checks during hibernation. The changes are well-structured and address real race conditions. Code Quality & Best Practices✅ Strengths
|
31af31a to
b6dfb0d
Compare
Pull Request ReviewSummaryThis PR fixes two important race conditions in the pegboard gateway's hibernation logic:
Code Quality ✅Strengths:
Areas for Improvement: 1. Debug Log Cleanup (shared_state.rs:383)tracing::info\!(?pending_msg.payload, ?pending_msg.message_index, "------2---------");This appears to be leftover debug logging with a placeholder message. Should either be removed or converted to proper debug logging with a meaningful message. 2. Type Conversion Safety (shared_state.rs:490)Duration::from_millis(self.hibernation_timeout.try_into().unwrap_or(90_000))The fallback value (90 seconds) doesn't match the field type. Since
Potential Bugs 🐛1. Inconsistent Naming (keepalive_task.rs:11-12)Comment says "pings writes keepalive" - should be "periodically writes keepalive". Minor typo. 2. Race Condition Check Logic (lib.rs:311-314)ensure\!(
\!after_hibernation || \!new,
"should not be creating a new in flight entry after hibernation"
);Good defensive check! However, the error message could be more actionable. Consider logging additional context (actor_id, request_id) before the ensure! to help debug if this ever triggers. Performance Considerations ⚡Good Practices:
Potential Optimization: Security Concerns 🔒Positive:
Note: Test Coverage 🧪Missing Test Scenarios:
Recommended test additions: // Test that hibernating requests survive GC when keepalive is active
// Test that hibernating requests are GC'd after hibernation_timeout elapses
// Test the race condition fix: actor ready signal arrives during subscriptionArchitecture & Design 🏗️Strengths:
Suggestions:
async fn check_actor_ready(&self) -> Result<bool> {
if let Some(actor) = self.ctx.op(/*...*/).await? {
Ok(actor.runner_id.is_some())
} else {
Ok(false)
}
}Logging Improvements ✨Excellent changes:
Consistency Note: RecommendationsHigh Priority:
Medium Priority:
Low Priority:
VerdictApprove with minor changes requested ✅ This is a solid fix for important race conditions. The core logic is sound and well-implemented. The main issues are:
The changes follow the repository's conventions (conventional commits, structured logging, proper error handling) and integrate well with the existing architecture. Generated with Claude Code |
Merge activity
|
…ck actor started after sub when hibernating (#3505)

No description provided.