-
Notifications
You must be signed in to change notification settings - Fork 134
Fix clustering. #546
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix clustering. #546
Conversation
try_announced was being used instead of announced. This caused it to exit early and silently stop running the cluster code.
|
Important Review skippedReview was skipped due to path filters ⛔ Files ignored due to path filters (1)
CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including You can disable this status message by setting the WalkthroughRefactors Cluster::run control flow: when acting as root, await run_combined() and then bail with a “combined connection closed” error. In tokio::select!, each branch now adds error context and, on success, explicitly bails with a connection-closed message. Origin discovery switches from try_announced() to awaited announced(). Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant C as Cluster::run
participant RR as run_remote
participant RRs as run_remotes
participant RC as run_combined
participant OD as origins.announced()
rect rgb(235, 245, 255)
note over C: Startup and selection
C->>RR: maybe connect to root (branch)
C->>RRs: maybe accept remotes (branch)
C->>RC: root mode combined (branch)
C->>OD: await announced()
end
alt RR completes successfully
RR-->>C: Ok(())
note over C: Now bails: "connection to root closed"
else RR errors
RR-->>C: Err(e)
note over C: Propagate with context "failed to connect to root"
end
alt RRs completes successfully
RRs-->>C: Ok(())
note over C: Now bails: "connection to remotes closed"
else RRs errors
RRs-->>C: Err(e)
note over C: Propagate with context "failed to connect to remotes"
end
alt RC completes successfully
RC-->>C: Ok(())
note over C: Now bails: "combined connection closed"
else RC errors
RC-->>C: Err(e)
note over C: Propagate with context "failed to run combined"
end
note over OD: Switched from try_announced() to awaited announced()
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
✨ Finishing Touches🧪 Generate unit tests
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
|
Claude finished @kixelated's task —— View job Code Review Complete ✅Reviewing PR for bugs and security issues:
Review Findings👍 Looks good! This is a correct fix for a critical bug. The Issue Fixed:
The Fix:
Other Changes Reviewed:
No security issues, logic errors, or other bugs found. The fix resolves the silent clustering failure described in the PR. |
|
Claude finished @kixelated's task —— View job Code Review Complete ✅Reviewing PR for bugs and security issues:
Review Findings👍 Looks good! This is a correct fix for a critical bug. The Issue Fixed:
The Fix:
Other Changes Reviewed:
No security issues, logic errors, or other bugs found. The fix resolves the silent clustering failure described in the PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
rs/moq-relay/src/cluster.rs (3)
106-120: Select branches correctly add context and bail on success (good); consider including node detail and de-duplicating patterns.Nice tightening of semantics: errors gain context; successful branch completion becomes an Err to uphold the invariant.
Minor polish: include the target in messages for quicker triage, and reduce repetition.
Apply this minimal diff for clearer diagnostics:
- res = self.clone().run_remote(&connect, token.clone(), noop) => { - res.context("failed to connect to root")?; - anyhow::bail!("connection to root closed"); + res = self.clone().run_remote(&connect, token.clone(), noop) => { + res.with_context(|| format!("failed to connect to root {connect}"))?; + anyhow::bail!("connection to root {connect} closed"); }Optional follow-up: extract a small helper to wrap the “await -> context -> bail with message” pattern to avoid repeating it across the three arms.
156-156: Switch to awaited announced() fixes the early-exit bug.This addresses the core issue described in the PR: the loop now waits for cluster-origin announcements instead of dropping out when none are immediately available.
If you want, I can add an inline unit test (under a
mod testsin this file) that simulates an origin discovery stream and asserts the loop doesn’t terminate prematurely.
162-169: Prevent Panics on Early Origin CancellationThe call to
active.remove(node.as_str()).unwrap().abort();will panic if a cancellation event for
nodearrives before any task handle was ever inserted (or if it’s already been removed). Sinceactiveis initially empty and entries are only added once a new origin is discovered, it’s possible fororigin == Noneto occur first—triggering an unexpected crash.Apply this defensive check instead:
- active.remove(node.as_str()).unwrap().abort(); + if let Some(handle) = active.remove(node.as_str()) { + handle.abort(); + } else { + tracing::warn!(%node, "no active handle found to abort on origin cancellation"); + }• This ensures we only call
.abort()when a validAbortHandleexisted.
• Optionally, when inserting a new handle, you can abort any previous handle returned byactive.insertto avoid orphaned tasks:if let Some(prev) = active.insert(node.to_string(), handle.abort_handle()) { prev.abort(); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
rs/moq-relay/src/cluster.rs(3 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
rs/moq-relay/**
📄 CodeRabbit inference engine (CLAUDE.md)
The CDN/relay (rs/moq-relay) must not include application logic, media codecs, or track details
Files:
rs/moq-relay/src/cluster.rs
rs/**/src/**/*.rs
📄 CodeRabbit inference engine (CLAUDE.md)
Rust tests should be integrated within source files (inline unit tests)
Files:
rs/moq-relay/src/cluster.rs
rs/**
📄 CodeRabbit inference engine (CLAUDE.md)
Place Rust crates under the rs/ directory
Files:
rs/moq-relay/src/cluster.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Check
- GitHub Check: claude-review
🔇 Additional comments (1)
rs/moq-relay/src/cluster.rs (1)
81-83: Root path now bails on graceful shutdown (LGTM).Consistent with the “never return Ok” invariant. This prevents the root from silently exiting when combined closes.
Confirm the supervisor/main loop treats this Err as a controlled shutdown (e.g., logs and keeps the process running or restarts as intended), not as a fatal crash.
try_announced was being used instead of announced. This caused it to exit early and silently stop running the cluster code.
Summary by CodeRabbit
Bug Fixes
Refactor