@kixelated kixelated commented Aug 21, 2025

try_announced was being used instead of announced, which caused the origin discovery loop to exit early and silently stop running the cluster code.

Summary by CodeRabbit

  • Bug Fixes

    • More reliable connection handling across root and remote modes, with consistent “connection closed” signaling.
    • Clearer error messages when connections fail or close, reducing ambiguity for operators.
    • Improved origin discovery to avoid missing announcements under load.
  • Refactor

    • Standardized run logic to consistently propagate errors and explicitly surface connection-closed states, enhancing stability and diagnosability without changing public interfaces.


coderabbitai bot commented Aug 21, 2025

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (1)
  • rs/Cargo.lock is excluded by !**/*.lock

Walkthrough

Refactors Cluster::run control flow: when acting as root, await run_combined() and then bail with a “combined connection closed” error. In tokio::select!, each branch now adds error context and, on success, explicitly bails with a connection-closed message. Origin discovery switches from try_announced() to awaited announced().

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Cluster run control flow and error signaling (rs/moq-relay/src/cluster.rs) | Refactor tokio::select! branches: add contextual errors; on success, explicitly bail with connection-closed messages for run_remote, run_remotes, and run_combined. In root path, await run_combined() then bail. Change origin discovery from non-blocking try_announced() to awaited announced(). |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant C as Cluster::run
  participant RR as run_remote
  participant RRs as run_remotes
  participant RC as run_combined
  participant OD as origins.announced()

  rect rgb(235, 245, 255)
    note over C: Startup and selection
    C->>RR: maybe connect to root (branch)
    C->>RRs: maybe accept remotes (branch)
    C->>RC: root mode combined (branch)
    C->>OD: await announced()
  end

  alt RR completes successfully
    RR-->>C: Ok(())
    note over C: Now bails: "connection to root closed"
  else RR errors
    RR-->>C: Err(e)
    note over C: Propagate with context "failed to connect to root"
  end

  alt RRs completes successfully
    RRs-->>C: Ok(())
    note over C: Now bails: "connection to remotes closed"
  else RRs errors
    RRs-->>C: Err(e)
    note over C: Propagate with context "failed to connect to remotes"
  end

  alt RC completes successfully
    RC-->>C: Ok(())
    note over C: Now bails: "combined connection closed"
  else RC errors
    RC-->>C: Err(e)
    note over C: Propagate with context "failed to run combined"
  end

  note over OD: Switched from try_announced() to awaited announced()

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

A relay rabbit taps its paws in code,
When links go quiet, it squeaks “closed road.”
Selects now shout with clearer tone,
Origins await, no longer lone.
If runs succeed yet pathways fade—
We bail, we log, the trail is made. 🐇✨



claude bot commented Aug 21, 2025

Code Review Complete ✅

Reviewing PR for bugs and security issues:

  • Read and analyze the changed file rs/moq-relay/src/cluster.rs
  • Check for bugs, security issues, and logic problems
  • Provide review feedback

Review Findings

👍 Looks good! This is a correct fix for a critical bug.

The Issue Fixed:
The code was using origins.try_announced() (line 156) instead of origins.announced().await, which caused the clustering discovery loop to:

  • Exit immediately when no announcements were available (try_announced() returns None if nothing is ready)
  • Never wait for future announcements, silently breaking cluster functionality

The Fix:
Changing to origins.announced().await correctly:

  • Waits asynchronously for the next origin announcement
  • Keeps the discovery loop running to handle new cluster nodes
  • Maintains proper clustering behavior as intended

Other Changes Reviewed:

  • Error handling improvements in tokio::select! branches are correct
  • Added helpful comment about never returning Ok()
  • Consistent error propagation with anyhow::bail!

No security issues, logic errors, or other bugs found. The fix resolves the silent clustering failure described in the PR.
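The difference between a non-blocking poll and an awaited call can be sketched with standard-library channels. This is only an analogy: `std::sync::mpsc` stands in for the origin announcement stream, and `drain_non_blocking`, `drain_waiting`, and `spawn_announcer` are hypothetical names, not part of moq-relay. The relay itself uses async `announced().await` rather than a blocking `recv()`.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;
use std::time::Duration;

/// Buggy shape: a non-blocking poll treats "nothing ready yet" as "done".
fn drain_non_blocking(rx: &Receiver<String>) -> Vec<String> {
    let mut seen = Vec::new();
    loop {
        match rx.try_recv() {
            Ok(origin) => seen.push(origin),
            // Err covers both Empty (nothing ready right now) and Disconnected.
            // Exiting on Empty is the bug: later announcements are never seen.
            Err(_) => break,
        }
    }
    seen
}

/// Fixed shape: wait for each item and stop only when the sender is gone.
fn drain_waiting(rx: &Receiver<String>) -> Vec<String> {
    let mut seen = Vec::new();
    while let Ok(origin) = rx.recv() {
        seen.push(origin);
    }
    seen
}

/// Spawns a producer that announces `count` origins, each after a short delay.
fn spawn_announcer(count: usize) -> Receiver<String> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for i in 0..count {
            thread::sleep(Duration::from_millis(50));
            // Ignore send errors: the receiver may already have given up.
            let _ = tx.send(format!("node-{i}"));
        }
    });
    rx
}
```

Calling `drain_non_blocking` right after `spawn_announcer` returns before the first announcement arrives, while `drain_waiting` collects every announcement and returns only once the producer has finished.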

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
rs/moq-relay/src/cluster.rs (3)

106-120: Select branches correctly add context and bail on success (good); consider including node detail and de-duplicating patterns.

Nice tightening of semantics: errors gain context; successful branch completion becomes an Err to uphold the invariant.

Minor polish: include the target in messages for quicker triage, and reduce repetition.

Apply this minimal diff for clearer diagnostics:

-            res = self.clone().run_remote(&connect, token.clone(), noop) => {
-                res.context("failed to connect to root")?;
-                anyhow::bail!("connection to root closed");
+            res = self.clone().run_remote(&connect, token.clone(), noop) => {
+                res.with_context(|| format!("failed to connect to root {connect}"))?;
+                anyhow::bail!("connection to root {connect} closed");
             }

Optional follow-up: extract a small helper to wrap the “await -> context -> bail with message” pattern to avoid repeating it across the three arms.
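One possible shape for such a helper, as a minimal std-only sketch: the name `closed_is_error`, the `String` error type, and the `Infallible` success type are all assumptions made for illustration. The PR itself uses anyhow's `context` and `bail!`, and the real signature of `Cluster::run` is not shown in this conversation.

```rust
use std::convert::Infallible;
use std::fmt::Display;

/// Hypothetical helper: the result of a task that should run forever
/// becomes an error whether the task failed or returned cleanly.
fn closed_is_error<E: Display>(
    res: Result<(), E>,
    failed: &str,
    closed: &str,
) -> Result<Infallible, String> {
    match res {
        // The task failed: keep the cause and prepend context.
        Err(e) => Err(format!("{failed}: {e}")),
        // The task returned cleanly: for a long-running task that is still an error.
        Ok(()) => Err(closed.to_string()),
    }
}
```

Returning `Result<Infallible, String>` encodes the "never return Ok" invariant in the type, since no value of `Infallible` can be constructed. Each select arm would then reduce to a single call with its two messages.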


156-156: Switch to awaited announced() fixes the early-exit bug.

This addresses the core issue described in the PR: the loop now waits for cluster-origin announcements instead of dropping out when none are immediately available.

If you want, I can add an inline unit test (under a mod tests in this file) that simulates an origin discovery stream and asserts the loop doesn’t terminate prematurely.


162-169: Prevent Panics on Early Origin Cancellation

The call to

active.remove(node.as_str()).unwrap().abort();

will panic if a cancellation event for node arrives before any task handle was ever inserted (or if it’s already been removed). Since active is initially empty and entries are only added once a new origin is discovered, it’s possible for origin == None to occur first—triggering an unexpected crash.

Apply this defensive check instead:

-    active.remove(node.as_str()).unwrap().abort();
+    if let Some(handle) = active.remove(node.as_str()) {
+        handle.abort();
+    } else {
+        tracing::warn!(%node, "no active handle found to abort on origin cancellation");
+    }

• This ensures we only call .abort() when a valid AbortHandle existed.
• Optionally, when inserting a new handle, you can abort any previous handle returned by active.insert to avoid orphaned tasks:

if let Some(prev) = active.insert(node.to_string(), handle.abort_handle()) {
    prev.abort();
}
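The two suggestions above rely on `HashMap::remove` returning `None` for a missing key and `HashMap::insert` returning the previous value. A minimal std-only sketch of both behaviours follows; `Handle` is a hypothetical stand-in for the task's abort handle, and `cancel` and `register` are illustrative names that do not appear in cluster.rs.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a task's abort handle.
#[derive(Clone, Default)]
struct Handle {
    aborted: Arc<AtomicBool>,
}

impl Handle {
    fn abort(&self) {
        self.aborted.store(true, Ordering::SeqCst);
    }

    fn is_aborted(&self) -> bool {
        self.aborted.load(Ordering::SeqCst)
    }
}

/// Returns true if a handle was found and aborted, false if the node was unknown.
fn cancel(active: &mut HashMap<String, Handle>, node: &str) -> bool {
    match active.remove(node) {
        Some(handle) => {
            handle.abort();
            true
        }
        // No unwrap: an unknown node is reported to the caller instead of panicking.
        None => false,
    }
}

/// Registers a handle, aborting any previous handle stored for the same node.
fn register(active: &mut HashMap<String, Handle>, node: &str, handle: Handle) {
    if let Some(prev) = active.insert(node.to_string(), handle) {
        prev.abort();
    }
}
```

With this shape, a cancellation for a node that was never registered is a no-op the caller can log, and re-registering a node does not leave the earlier task running.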
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ceb95a1 and cd66f3b.

📒 Files selected for processing (1)
  • rs/moq-relay/src/cluster.rs (3 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
rs/moq-relay/**

📄 CodeRabbit inference engine (CLAUDE.md)

The CDN/relay (rs/moq-relay) must not include application logic, media codecs, or track details

Files:

  • rs/moq-relay/src/cluster.rs
rs/**/src/**/*.rs

📄 CodeRabbit inference engine (CLAUDE.md)

Rust tests should be integrated within source files (inline unit tests)

Files:

  • rs/moq-relay/src/cluster.rs
rs/**

📄 CodeRabbit inference engine (CLAUDE.md)

Place Rust crates under the rs/ directory

Files:

  • rs/moq-relay/src/cluster.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Check
  • GitHub Check: claude-review
🔇 Additional comments (1)
rs/moq-relay/src/cluster.rs (1)

81-83: Root path now bails on graceful shutdown (LGTM).

Consistent with the “never return Ok” invariant. This prevents the root from silently exiting when combined closes.

Confirm the supervisor/main loop treats this Err as a controlled shutdown (e.g., logs and keeps the process running or restarts as intended), not as a fatal crash.

@kixelated kixelated merged commit 9189d1c into main Aug 21, 2025
4 checks passed
@kixelated kixelated deleted the fix-clustering branch August 21, 2025 20:41