-
Notifications
You must be signed in to change notification settings - Fork 134
fix(guard): handle actor stopped during ws/req #3481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 11-13-chore_rename_retry_-_hibernate_in_runner_protocol
Are you sure you want to change the base?
fix(guard): handle actor stopped during ws/req #3481
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
3 Skipped Deployments
|
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
How to use the Graphite Merge QueueAdd the label merge-queue to this PR to add it to the merge queue. You must have a Graphite account in order to use the merge queue. Sign up using this link. An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue. This stack of pull requests is managed by Graphite. Learn more about stacking. |
Pull Request ReviewSummaryThis PR fixes a race condition where actor stops during WebSocket connections or HTTP requests weren't being properly handled, leading to requests hanging indefinitely. The fix adds subscriptions to the ✅ Positive Aspects
🔍 Code Quality Observations1. Duplicate Subscription PatternIn 2. Runner WebSocket Behavior Change (Important)In 3. Comment AccuracyLine 385 comment changed from "Send reclaimed messages" to "Send pending messages" - good improvement in clarity! ✅ 🐛 Potential Issues1. Subscription Timing in Hibernation FlowThe 2. Error Propagation During CleanupWhen sending the WebSocket close message fails, only an error log is produced. Should this error be propagated? If the close message fails to send, the runner might not clean up properly. 🔒 Security ConsiderationsNo security issues identified. The changes properly handle error cases and don't introduce new attack vectors. ✅ ⚡ Performance Considerations
🧪 Test CoverageMajor Gap: No tests are included with this PR for critical race condition fixes. Recommendations:
🎯 Recommendations SummaryMust Address:
Should Consider:
Final VerdictThis is a solid bug fix that addresses a real race condition. The implementation is clean and follows existing patterns. However, the lack of tests for such critical race condition fixes is concerning. Recommended Action: Approve with the requirement that tests be added before merging, especially for the WebSocket-specific behavior changes. Great work on identifying and fixing this race condition! 🚀 |
6c5e080 to
dc9084a
Compare
e83ac9b to
00cd5f2
Compare
PR Review: fix(guard): handle actor stopped during ws/reqSummaryThis PR adds handling for actor stopped events during WebSocket and HTTP request processing in the pegboard gateway, ensuring graceful cleanup when actors stop mid-request. The changes also improve WebSocket lifecycle management in the runner. Code Quality ✅Strengths:
Detailed Analysis1. pegboard-gateway/src/lib.rs - Actor Stopped HandlingLines 155-158, 290-293: Subscription creation let mut stopped_sub = self
.ctx
.subscribe::<pegboard::workflows::actor::Stopped>(("actor_id", self.actor_id))
.await?;✅ Good: Subscribes to actor stopped events before processing requests Lines 220-223, 356-359, 436-444: Select arms for stopped events _ = stopped_sub.next() => {
tracing::debug!("actor stopped while waiting for request response");
return Err(ServiceUnavailable.build());
}✅ Good: Appropriate error types returned for each context 2. pegboard-gateway/src/shared_state.rs - Pending Messages CheckLines 347-357: New pub async fn has_pending_websocket_messages(&self, request_id: RequestId) -> Result<bool> {
let Some(req) = self.in_flight_requests.get_async(&request_id).await else {
bail!("request not in flight");
};
if let Some(hs) = &req.hibernation_state {
Ok(!hs.pending_ws_msgs.is_empty())
} else {
Ok(false)
}
}✅ Good: Simple, focused method Lines 609-616: Early wake optimization in // Immediately rewake if we have pending messages
if self
.shared_state
.has_pending_websocket_messages(unique_request_id.into_bytes())
.await?
{
return Ok(HibernationResult::Continue);
}✅ Excellent: Prevents unnecessary hibernation when messages are pending 3. pegboard-runner/src/lib.rs - WebSocket Lifecycle ChangesLines 246-279: Simplified close message handling // Send close messages to all remaining active requests
let active_requests = conn.tunnel_active_requests.lock().await;
for (request_id, req) in &*active_requests {
// Websockets are not ephemeral like requests. If the runner ws closes they are not informed;
// instead they wait for the actor itself to stop.
if req.is_ws {
continue;
}
// ... send ToServerResponseAbort
}✅ Good: Clear distinction between WebSocket and HTTP request lifecycle Removed Code Analysis:
Potential Issues & SuggestionsMinor Issues
// Send pending messagesPrevious comment said "Send reclaimed messages" - good fix! ✅
// Subscribe before reading body
let mut stopped_sub = self.ctx.subscribe::<...>(...).await?;
let body_bytes = req.into_body().collect().await
.context("failed to read body")?
.to_bytes();
Questions for Discussion
Performance Considerations✅ Good: The early wake check (lines 609-616) prevents unnecessary hibernation overhead Security Concerns✅ No security issues identified
Test Coverage
Recommendations
VerdictApproval: ✅ Approved with Suggestions This is a solid PR that improves reliability by handling actor stopped events gracefully. The code follows existing patterns, has appropriate error handling, and includes a nice optimization for hibernation. The suggestions above are minor improvements that could be addressed in follow-up work. Strengths:
Action Items:
|
Code Review: PR #3481 - fix(guard): handle actor stopped during ws/reqSummaryThis PR adds proper handling for actor stop events during WebSocket and HTTP request processing in the pegboard gateway. The changes ensure graceful cleanup when actors stop while requests are in-flight. Code Quality ✅Strengths:
Potential Issues
|
PR Review: fix(guard): handle actor stopped during ws/reqOverviewThis PR addresses an important issue where the gateway doesn't properly handle actor stop events during WebSocket and HTTP request processing. The implementation subscribes to actor Code Quality & Best Practices✅ Strengths
🔍 Potential Issues1. Logic Error in
|

No description provided.