Skip to content

moq-relay: restore gossip-style cluster discovery via --cluster-mesh#1504

Open
kixelated wants to merge 7 commits into
mainfrom
claude/funny-euler-64507a
Open

moq-relay: restore gossip-style cluster discovery via --cluster-mesh#1504
kixelated wants to merge 7 commits into
mainfrom
claude/funny-euler-64507a

Conversation

@kixelated
Copy link
Copy Markdown
Collaborator

@kixelated kixelated commented May 25, 2026

Summary

Fixes #1499. Restores gossip-style peer discovery for moq-relay clusters by adding --cluster-mesh <self-url>, while keeping the new hop-routed origin and loop detection from #1322.

  • New: --cluster-mesh <self-url> publishes a placeholder broadcast at .internal/origins/<url> so peers reached via --cluster-connect discover and dial this relay automatically. Pairs with --cluster-connect for the bootstrap; multiple nodes form a full mesh after a single rendezvous. A relay with --cluster-mesh alone (no --cluster-connect) runs as a passive rendezvous without a QUIC client.
  • New: .block(prefix) view on OriginProducer / OriginConsumer (symmetric with scope and with_root) that refuses publishes and hides announces under a prefix. Applied to non-mTLS sessions in Cluster::publisher / Cluster::subscriber so JWT and anonymous clients can never see or publish into .internal/*, no matter how broad their declared scope or what their token.root is. (Block is applied to self.origin before with_root/scope so a JWT with token.root = \".internal\" cannot sidestep it; OriginProducer::consume() propagates blocked to the derived consumer.)
  • Removed flags --cluster-root and --cluster-node (the latter existed both pre-91ea43c5 and briefly in earlier commits of this branch) are kept as hidden Option<String> fields solely so the relay can bail at startup with a migration message pointing at --cluster-connect and --cluster-mesh. Catches old configs without surfacing them in --help.
  • Dial-task provenance: active: HashMap<String, DialEntry { handle, is_static }>. Static --cluster-connect peers are exempt from gossip-driven aborts so a peer restart doesn't kill the reconnect loop.
  • Docs: rewrote doc/bin/relay/cluster.md to cover both modes, the .internal/* access rule, and a migration table for root → connect and node → mesh. Updated doc/bin/relay/config.md, rs/moq-relay/README.md to match.
  • Demos: switched demo/relay/leaf{0,1}.toml to use mesh = instead of static connect = [root, leaf0], so the existing cluster-loop-detection exercise still runs but via discovery.

Why

Without gossip, every operator had to enumerate every peer URL in every relay's config, and adding a node required editing every existing node. The hop-based loop detection added in #1322 makes the old gossip trick safe to bring back without the special-case cluster: bool auth flag it used to need.

Reviewer notes

  • The .internal/ prefix matches the existing dotted-infrastructure convention (.stats/... from moq-lite/moq-relay: hop-based clustering #1322). Choosing it (versus internal/origins from the pre-rewrite code) gives a path that's also conventionally hidden from app-level subscribers.
  • Loop detection: when a relay's own registration is gossiped back through a peer, OriginProducer::publish_broadcast already refuses it because the hop chain contains this relay's OriginId. No special case needed.
  • Dial dedup: the gossip discovery loop and static --cluster-connect share a HashMap<String, DialEntry> so a peer reached via both paths only opens one session. DialEntry.is_static keeps gossip unannounces from tearing down static reconnect loops.
  • mTLS-only access: AuthToken::unrestricted() (mTLS) sets internal: true and skips the .block(\".internal\") wrap; JWT sessions always get the block, even with subscribe = [\"\"]. JWT cluster peers can still relay user traffic but cannot participate in gossip discovery.
  • Passive rendezvous: a relay with only --cluster-mesh set runs without a QUIC client, publishes its self-registration, and parks via std::future::pending() so the registration stays alive.

Test plan

  • cargo test -p moq-net — 332 pass, including 5 tests for the block() view (announce filter, publish refusal, get_broadcast filter, stacking, with_root interaction).
  • cargo test -p moq-relay — 87 pass, including 10 cluster tests: .internal/* invisible to broad-scope JWT, invisible even when token.root = \".internal\", visible to mTLS, --cluster-root errors with migration message (CLI + TOML), --cluster-node errors with migration message (CLI + TOML), passive rendezvous runs without a client and keeps its registration alive, gossip unannounce preserves static peers, cluster.mesh config round-trips.
  • cargo clippy -p moq-net -p moq-relay --all-targets --all-features -- -D warnings — clean.
  • cargo fmt --all --check and bun biome check — clean.
  • cargo check --workspace --exclude moq-gst — clean. (moq-gst fails on this machine due to missing gstreamer-1.0 system lib, unrelated.)
  • Manual e2e: bring up demo/relay/root.toml + leaf0 + leaf1, publish through leaf0, subscribe from leaf1; verify it flows without leaf1 having leaf0 in its static connect list. (Not run locally.)

🤖 Generated with Claude Code

The 91ea43c rewrite collapsed clustering onto a single hop-routed
origin and removed the old `--cluster-root`/`--cluster-node` flags
that let leaves discover each other through a rendezvous. The
replacement `--cluster-connect` requires every operator to enumerate
every peer URL in every relay's config, which is O(N^2) configuration
and bites anyone adding a node. Issue #1499 hit this after merging
main.

Hop-based loop detection now lives on every broadcast, so the old
"publish a registration broadcast to a well-known path" trick can
return without the `cluster: bool` auth flag that used to gate it.

Bring back `--cluster-node <self-url>`: when set the relay publishes
a placeholder broadcast at `.internal/origins/<url>` on its own
origin, and watches that prefix for other peers' registrations. Each
newly-announced URL is dialed with the same exponential-backoff loop
as `--cluster-connect`, deduplicated by URL so static and gossip
peers don't double-dial.

Scope the `.internal/*` namespace to mTLS sessions only. Add a small
`OriginProducer::block(prefix)` / `OriginConsumer::block(prefix)`
view to moq-net (symmetric with `scope`/`with_root`) that refuses
publishes and hides announces under a prefix. `Cluster::publisher`
and `Cluster::subscriber` apply `.block(".internal")` for any token
where `internal == false`, so even a broad-scope JWT can't see or
publish into the mesh namespace.

`--cluster-root` / `cluster.root` is kept as a hidden Option<String>
field that bails at startup with a migration message pointing at
`--cluster-connect` and `--cluster-node`.

Rewrite doc/bin/relay/cluster.md to describe both the static and
gossip modes, the `.internal/*` access rule, and a migration table.
Update doc/bin/relay/config.md, rs/moq-relay/README.md to match.
Switch demo/relay/leaf{0,1}.toml from static `connect = [root,
leaf]` to gossip (`connect = [root]` + `node`), so the loop-detection
exercise still runs but through discovery.

Closes #1499

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 25, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This PR adds per-handle absolute path-prefix blocking to OriginProducer/OriginConsumer and preserves blocked prefixes across scope/with_root/Clone. Relay clustering is extended with ClusterConfig.mesh (--cluster-mesh) and a combined static+gossip run loop that publishes/consumes registrations under .internal/origins/* and dials discovered peers; legacy cluster.node/cluster.root are retained as hidden migration-only flags that cause startup errors. Authentication now blocks .internal/* for non-internal sessions. Documentation, demos, and README/config references are updated for the static vs gossip clustering model and migration guidance.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: restoring gossip-style cluster discovery via the new --cluster-mesh flag, which is the primary feature added in this PR.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, providing detailed context about the new --cluster-mesh feature, .block() view, migration path for removed flags, and test coverage.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch claude/funny-euler-64507a

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@doc/bin/relay/cluster.md`:
- Line 83: The doc incorrectly claims a "cluster JWT must include the cluster
privileges that bypass" the `.internal/*` filter; update the paragraph about
"JWT-authenticated cluster sessions" to remove or correct that bypass claim and
clearly state that `.internal/*` endpoints are mTLS-only (JWT sessions are
tagged external and are filtered from `.internal/*` by default), and recommend
using mTLS for internal access; adjust the sentence referencing "cluster JWT" so
it no longer implies JWT can bypass `.internal/*` (or explicitly note any
supported exception if truly implemented).

In `@rs/moq-relay/src/cluster.rs`:
- Around line 196-205: The active map currently conflates statically configured
peers and gossip-discovered peers so an unannounce can abort static reconnect
loops; change the tracking so static peers are not aborted by gossip unannounces
by either (a) replacing active: Arc<Mutex<HashMap<String, AbortHandle>>> with a
map value that includes a provenance flag (e.g., enum or struct { handle:
AbortHandle, static: bool }) and only abort/remove entries when an unannounce
targets a non-static entry, or (b) maintain two maps (e.g., static_active and
discovered_active) and have spawn_dial, the initial loop that seeds from
self.config.connect, and the unannounce handler use the appropriate map; update
spawn_dial, any code that inserts/looks-up in active, and the unannounce/remove
logic to honor the static marker so static peers (seeded via
self.config.connect) keep their reconnect loop running.
- Around line 132-143: The current subscriber and publisher methods build a view
then call view.block(".internal") for non-internal tokens, which blocks
".internal" relative to the rooted view and allows bypass when token.root ==
".internal"; instead, for non-internal tokens apply block(".internal") to
self.origin before calling with_root(...) and scope(...). Update both
subscriber(&self, token: &AuthToken) and publisher(&self, token: &AuthToken) to
compute a base_origin = if token.internal { &self.origin } else {
&self.origin.block(".internal") } (or equivalent) and then call
base_origin.with_root(&token.root)?.scope(...) so the internal subtree is always
blocked for non-internal tokens while preserving original behavior for internal
tokens.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7e87d291-01d5-481b-bcb8-5b3e5a7e3ee0

📥 Commits

Reviewing files that changed from the base of the PR and between 7a8616d and 83b84a4.

📒 Files selected for processing (7)
  • demo/relay/leaf0.toml
  • demo/relay/leaf1.toml
  • doc/bin/relay/cluster.md
  • doc/bin/relay/config.md
  • rs/moq-net/src/model/origin.rs
  • rs/moq-relay/README.md
  • rs/moq-relay/src/cluster.rs

Comment thread doc/bin/relay/cluster.md Outdated
Comment thread rs/moq-relay/src/cluster.rs Outdated
Comment thread rs/moq-relay/src/cluster.rs Outdated
…hurn

Three review fixes for #1504.

1. block-before-root for non-internal tokens. Previously the relay
   built a view as `with_root(token.root).scope(token.subscribe).consume()`
   and *then* called `.block(".internal")` on it. The block stores
   absolute paths, computed at call time as `self.root.join(prefix)`.
   For a token with `root == ".internal"` the block resolved to
   absolute `.internal/.internal`, and the real `.internal/origins/*`
   mesh paths leaked through. Apply `.block(".internal")` to
   `self.origin` *before* `with_root` so the block is anchored at the
   true absolute root regardless of the token's `root` claim.

2. `OriginProducer::consume()` now propagates `blocked` to the derived
   consumer. The view is meaningfully a (publish-refuse, announce-hide)
   pair; the consumer half was unfiltered, so the producer's publish
   block left the announce side wide open. The fix in #1 exposed this:
   the regression test for token.root=".internal" wouldn't have failed
   if the consumer had been blocked. Now `.block().consume()` returns
   a consumer that also filters announces.

3. Static peers survive gossip unannounces. `active: HashMap<String,
   AbortHandle>` used to be shared between `--cluster-connect` (static)
   peers and gossip-discovered peers. A peer that appeared in both
   would have its static reconnect loop aborted as soon as its
   `.internal/origins/<peer>` registration was unannounced (e.g. peer
   restart). Replace the value with `DialEntry { handle, is_static }`
   and only abort discovered entries on unannounce.

Doc: rewrote the JWT subsection in `cluster.md` to drop the incorrect
claim that a "cluster JWT" can bypass `.internal/*` filtering.
`.internal/*` is mTLS-only; JWT cluster peers can relay user traffic
but cannot participate in gossip discovery.

Tests:
- `internal_paths_invisible_when_token_root_is_internal` exercises
  the block-before-root path.
- `gossip_unannounce_preserves_static_peer` exercises the new
  `DialEntry.is_static` branch by inserting both a static and a
  discovered entry into the active map and asserting only the
  discovered one is removed.

Extract `Cluster::handle_gossip_unannounce` so the test can hit the
exact same code path the discovery loop uses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
rs/moq-relay/src/cluster.rs (1)

187-196: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep --cluster-node usable as passive rendezvous mode.

run() currently treats node.is_some() as outbound dial work, so a node-only relay now fails without with_client() and also starts discovery that dials inbound-announced peers. That breaks the documented --cluster-node-only behavior where the relay should just advertise itself and accept inbound cluster connections.

Proposed fix
-		let has_work = !self.config.connect.is_empty() || self.config.node.is_some();
+		let has_static_dials = !self.config.connect.is_empty();
+		let has_work = has_static_dials || self.config.node.is_some();
 		if !has_work {
 			tracing::info!("no cluster peers configured; running standalone");
 			return Ok(());
 		}
 
-		anyhow::ensure!(
-			self.client.is_some(),
-			"cluster peers configured but no QUIC client attached (call Cluster::with_client)"
-		);
+		if has_static_dials {
+			anyhow::ensure!(
+				self.client.is_some(),
+				"cluster peers configured but no QUIC client attached (call Cluster::with_client)"
+			);
+		}
@@
-		if let Some(self_url) = self.config.node.clone() {
+		if has_static_dials {
+			if let Some(self_url) = self.config.node.clone() {
 			let this = self.clone();
 			let token = token.clone();
 			let active = active.clone();
 			tasks.spawn(async move {
 				this.run_discovery(self_url, token, active).await;
 			});
+			}
 		}

This contradicts the documented node-only rendezvous behavior in rs/moq-relay/README.md:67-76.

Also applies to: 231-239

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rs/moq-relay/src/cluster.rs` around lines 187 - 196, In run(), treat a
configured node as passive rendezvous (advertise/accept) and only require a QUIC
client when there is outbound dial work: replace the single has_work check with
two checks (e.g., has_any_config = !self.config.connect.is_empty() ||
self.config.node.is_some() and has_outbound = !self.config.connect.is_empty()),
and change the anyhow::ensure! that checks self.client to only fire when
has_outbound is true; apply the same split/logic to the second occurrence noted
around the 231-239 block so node-only mode no longer demands
Cluster::with_client and does not initiate outbound dialing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@rs/moq-relay/src/cluster.rs`:
- Around line 187-196: In run(), treat a configured node as passive rendezvous
(advertise/accept) and only require a QUIC client when there is outbound dial
work: replace the single has_work check with two checks (e.g., has_any_config =
!self.config.connect.is_empty() || self.config.node.is_some() and has_outbound =
!self.config.connect.is_empty()), and change the anyhow::ensure! that checks
self.client to only fire when has_outbound is true; apply the same split/logic
to the second occurrence noted around the 231-239 block so node-only mode no
longer demands Cluster::with_client and does not initiate outbound dialing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4f02c04d-4e2b-4716-b2fd-6d042b971a7a

📥 Commits

Reviewing files that changed from the base of the PR and between 83b84a4 and ea8678f.

📒 Files selected for processing (3)
  • doc/bin/relay/cluster.md
  • rs/moq-net/src/model/origin.rs
  • rs/moq-relay/src/cluster.rs
✅ Files skipped from review due to trivial changes (1)
  • doc/bin/relay/cluster.md

The README and cluster docs both describe `--cluster-node` alone as a
valid "rendezvous" mode: the relay advertises its URL on the cluster
origin and accepts inbound cluster sessions, but does not dial out.
The actual `run()` implementation was inconsistent with this:

- It required `Cluster::with_client` whenever any cluster work was
  configured, including node-only mode. A relay without outbound dials
  has no need for a QUIC client.
- It spawned the gossip discovery task whenever `node` was set, even
  with no `connect` peers. A passive rendezvous can't usefully dial
  peers it learns about (they already have inbound sessions to us;
  dialing back creates duplicates).
- The `_self_registration` BroadcastProducer was held in a local. If
  no dial tasks were spawned, the `while tasks.join_next()` loop
  exited immediately, dropping the broadcast and unannouncing the
  relay before any peer could see it.

Fix:

- Split `has_work` into `has_outbound = !connect.is_empty()` and
  `has_work = has_outbound || node.is_some()`.
- `with_client` is only required when `has_outbound`.
- Discovery is only spawned when `has_outbound` AND `node.is_some()`.
- When no tasks are spawned (passive rendezvous), park forever with
  `std::future::pending().await` so the self-registration broadcast
  stays alive. `cluster.run()` is one arm of a `tokio::select!` in
  `main.rs`, so the process still exits cleanly via the other arms.

Tests:
- `passive_rendezvous_runs_without_client_and_advertises_self` builds
  a Cluster with only `node` set, spawns `run()`, asserts the
  self-registration is visible on the origin, and confirms `run()`
  parks (doesn't return) so the registration stays published.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kixelated
Copy link
Copy Markdown
Collaborator Author

@coderabbitai, fixed in 0524a10 — split has_work into has_outbound = !connect.is_empty() and a node-only check. The with_client requirement and the discovery spawn are now both gated behind has_outbound. Also fixed a related bug where the _self_registration broadcast would be dropped immediately in passive-rendezvous mode (no tasks to wait on → while tasks.join_next() exited): run() now parks with std::future::pending() when no dial tasks are spawned, keeping the registration alive. Regression test: passive_rendezvous_runs_without_client_and_advertises_self. (Written by Claude)

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 25, 2026

Tip

For best results, initiate chat on the files or code changes.

@kixelated Thanks for the detailed write-up! The has_outbound split cleanly separates the two concerns, and parking with std::future::pending() is the right call to keep the registration broadcast alive in passive-rendezvous mode. The regression test covering that path is a nice addition too. Looks good!


If you found this review helpful, would you consider giving us a shout-out on X?

Thank you for using CodeRabbit!

…cluster-node

Use `--cluster-mesh <self-url>` for the gossip self-advertisement instead of
reusing the old `--cluster-node` name. Both `--cluster-root` and
`--cluster-node` (and their TOML counterparts) now error at startup with a
migration message pointing at `--cluster-connect` and `--cluster-mesh`.

`cluster.node` was a brief detour: the pre-91ea43c5 design used it for the
self-URL, then the rewrite removed it, and the prior commit on this branch
re-added it with the same meaning. Renaming to `mesh` makes the role explicit
and frees `node` for any future per-instance identity work without overlap.

ClusterConfig now has:
  - `connect: Vec<String>` (unchanged) -- static peer URLs to dial.
  - `mesh: Option<String>` (new) -- our own URL, advertised for gossip discovery.
  - `node: Option<String>` (hidden, deprecation-only) -- bails if set.
  - `root: Option<String>` (hidden, deprecation-only) -- bails if set.

Both deprecation messages reference the same two replacement flags so the
operator only has to learn one mapping.

Tests:
- Renamed `cluster_node_round_trips` -> `cluster_mesh_round_trips`.
- Added `cluster_node_errors_with_migration_message` (CLI) and
  `cluster_node_toml_parses_then_errors` (TOML) mirroring the root tests.
- `passive_rendezvous_runs_without_client_and_advertises_self` now uses
  `mesh: Some(...)`.

Docs: `doc/bin/relay/cluster.md`, `doc/bin/relay/config.md`,
`rs/moq-relay/README.md` updated. The migration table now covers both
`node -> mesh` and `root -> connect`.

Demos: `demo/relay/leaf{0,1}.toml` switched from `node =` to `mesh =`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kixelated kixelated changed the title moq-relay: restore gossip-style cluster discovery via --cluster-node moq-relay: restore gossip-style cluster discovery via --cluster-mesh May 26, 2026
Add a short section about keeping `gh pr edit --title/--body` in sync when
later commits change scope (renamed flag, reshaped API, extra fix landed).
The PR body becomes the squash-merge commit message, so a stale body means
a misleading entry in `git log` permanently. Calls out the common drift
spots and reminds to keep the `(Written by Claude)` marker on edits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
rs/moq-relay/README.md (1)

68-68: ⚡ Quick win

Replace the em dash in this bullet.

Use a period, comma, or parenthesis instead of an em dash to comply with repo style.

As per coding guidelines, md prose must not contain em dash characters.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rs/moq-relay/README.md` at line 68, The bullet "Gossip discovery —
`--cluster-mesh <self-url>`..." contains an em dash which violates the MD prose
style; replace the em dash with an approved punctuation (e.g., comma, period, or
parentheses) so the line reads like "Gossip discovery, `--cluster-mesh
<self-url>`..." or "Gossip discovery (`--cluster-mesh <self-url>`)...", keeping
the rest of the sentence intact and preserving the backticked flag tokens.
rs/moq-relay/src/cluster.rs (1)

173-183: ⚡ Quick win

Trim long explanatory comments to non-obvious constraints only.

Several new comments restate what the code already expresses. Please condense these to short why-focused notes to match the Rust comment style rule and keep scanability high.

As per coding guidelines, Rust comments should be brief and reserved for non-obvious why.

Also applies to: 232-233, 257-260, 273-276

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rs/moq-relay/src/cluster.rs` around lines 173 - 183, Condense the verbose doc
comment that starts with "Runs the cluster event loop." into a short,
why-focused note that only documents non-obvious constraints (e.g., the three
operational modes and when a QUIC client is required), instead of restating what
the code names already express; apply the same trimming to the other long
comments referenced around the same module (the blocks at the ranges
corresponding to lines 232-233, 257-260, and 273-276) so each comment briefly
explains the rationale or invariants rather than repeating implementation
details — update the comment text adjacent to the cluster runner function and
the related helpers (look for the "Runs the cluster event loop." docblock and
nearby helper comments) to be concise and idiomatic for Rust.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@doc/bin/relay/cluster.md`:
- Line 40: Update the sentence describing unannounce behavior to state that when
a leaf with `mesh` set unannounces its registration at `.internal/origins/<url>`
on the cluster origin, it causes gossip-driven dials to be aborted on peers that
discovered the leaf via the rendezvous broadcast, but it does not abort
connections to peers that were configured via static `connect` (i.e., explicitly
configured/static peers are exempt from gossip-driven aborts), so align this
wording with the behavior described around the `connect` exemption and the logic
implemented in the cluster code that avoids aborting static `connect` peers.

---

Nitpick comments:
In `@rs/moq-relay/README.md`:
- Line 68: The bullet "Gossip discovery — `--cluster-mesh <self-url>`..."
contains an em dash which violates the MD prose style; replace the em dash with
an approved punctuation (e.g., comma, period, or parentheses) so the line reads
like "Gossip discovery, `--cluster-mesh <self-url>`..." or "Gossip discovery
(`--cluster-mesh <self-url>`)...", keeping the rest of the sentence intact and
preserving the backticked flag tokens.

In `@rs/moq-relay/src/cluster.rs`:
- Around line 173-183: Condense the verbose doc comment that starts with "Runs
the cluster event loop." into a short, why-focused note that only documents
non-obvious constraints (e.g., the three operational modes and when a QUIC
client is required), instead of restating what the code names already express;
apply the same trimming to the other long comments referenced around the same
module (the blocks at the ranges corresponding to lines 232-233, 257-260, and
273-276) so each comment briefly explains the rationale or invariants rather
than repeating implementation details — update the comment text adjacent to the
cluster runner function and the related helpers (look for the "Runs the cluster
event loop." docblock and nearby helper comments) to be concise and idiomatic
for Rust.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: afb300b5-705a-48d4-9228-201a0612bcf7

📥 Commits

Reviewing files that changed from the base of the PR and between 0524a10 and ea4189e.

📒 Files selected for processing (7)
  • CLAUDE.md
  • demo/relay/leaf0.toml
  • demo/relay/leaf1.toml
  • doc/bin/relay/cluster.md
  • doc/bin/relay/config.md
  • rs/moq-relay/README.md
  • rs/moq-relay/src/cluster.rs
✅ Files skipped from review due to trivial changes (2)
  • CLAUDE.md
  • doc/bin/relay/config.md

Comment thread doc/bin/relay/cluster.md Outdated
Three small fixes from PR review on c2936e4:

1. doc/bin/relay/cluster.md: the "Removing a node unannounces..." sentence
   said the dial is aborted "on every other peer", but static `connect`
   peers are exempt by design. Reword to match the gossip-only abort
   behavior plus the static exemption already documented further down.

2. rs/moq-relay/README.md: replace em dashes in the clustering bullets
   with parenthetical phrasing to comply with the repo's no-em-dash
   prose rule.

3. rs/moq-relay/src/cluster.rs: trim verbose comments in `run()` per the
   project's comment guideline. The function docblock is shorter, and
   the inline notes around self-registration, dial tracking, discovery
   gating, and the parking branch are reduced to non-obvious why points.
   DialEntry's own docblock already covers the static/discovered split,
   so the duplicate explanation in `active`'s declaration is removed.

No behavior change; 10 cluster tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator Author

@kixelated kixelated left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the .block functionality, don't make .internal special. relay operators should figure it out.

Comment thread demo/relay/leaf0.toml Outdated
Comment thread demo/relay/leaf1.toml Outdated
Comment thread doc/bin/relay/cluster.md Outdated
Comment thread doc/bin/relay/cluster.md Outdated
Comment thread doc/bin/relay/cluster.md Outdated
Comment thread doc/bin/relay/cluster.md Outdated
Comment thread doc/bin/relay/cluster.md Outdated
Comment thread rs/moq-net/src/model/origin.rs Outdated
…view

Address review feedback on the cluster work:

- Revert OriginProducer::block / OriginConsumer::block and the
  `Cluster::access_origin` helper. The relay relies on token scopes
  to keep customers away from `.internal/*`; if a token's subscribe
  scope is broad enough to see infrastructure paths, that's an auth
  configuration issue, not something the origin view should
  defensively paper over. Removes the `blocked` field, `is_blocked`
  helpers, `new_with_blocked`, and the five block-related tests in
  moq-net (no behavior change for any other caller). Cluster
  subscriber/publisher collapse back to the simple
  `with_root(token.root).scope(token.subscribe).consume()` form.

- Rewrite doc/bin/relay/cluster.md for humans. Open with one sentence
  on what clustering does (relays joined to proxy
  announcements/subscriptions). Lead with a multi-hop chain example
  (eu-west <- us-east <- us-west) that shows the actual point of
  clustering: a relay closer to viewers caches and dedups fetches
  back toward the source, which a full mesh would skip. Treat
  `cluster.mesh` as an optional auto-discovery convenience layered
  on top, with a one-line note that you still need at least one
  connection somewhere for advertisements to flow.

- Trim rs/moq-relay/README.md and doc/bin/relay/config.md to match
  the new framing.

- Demo configs: drop `mesh =` from leaf0.toml and leaf1.toml.
  The demo is meant to exercise proxying (and loop detection via
  leaf1's connect=[root, leaf0] triangle), not gossip. Restore the
  original comments.

No public API change beyond removing block(), which had only landed
on this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

old leaf discovery strategy gone?

1 participant