stats: retain entries by liveness instead of a tick retention window#1548
Conversation
Replace the time-based `retention` knob with a state-based rule: a stats entry is emitted while the broadcast is live (any open counter still exceeds its `*_closed` counterpart, so a subscription could begin at any moment) and on the tick its snapshot changes, then dropped once every counter equals its `*_closed` counterpart. This keeps a "currently active" dashboard view accurate instead of letting still-announced but idle broadcasts fall out of the frame after N idle ticks. The global GC keeps the `Arc::strong_count > 1` check: it's the race-safe realization of "fully closed" (no live guard holds the entry) and avoids dropping a held-but-idle handle whose later bump would otherwise be lost. Removes `StatsConfig::retention` / `with_retention`, the `--stats-retention` / `MOQ_STATS_RETENTION` relay flag, and the now-unused tick counter / change-tick / within-retention machinery. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Patch bump for the stats liveness-retention change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (8)
✅ Files skipped from review due to trivial changes (5)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughThis PR removes the retention-window based snapshot lifecycle from the stats system and replaces it with a "live-or-changed" emission model. Entries are now emitted while their open counters differ from 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
doc/bin/relay/config.md (1)
183-187: 💤 Low valueConsider clarifying the dual emission conditions.
The sentence structure "An entry surfaces while the broadcast is live...and on the tick its snapshot changes" could be read as requiring both conditions simultaneously. Consider rewording to more explicitly state the two independent triggers:
"An entry surfaces while the broadcast is live (any open counter exceeds its
*_closedcounterpart) or whenever its snapshot changes on a tick. Once every counter equals its*_closedcounterpart, the entry is dropped."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@doc/bin/relay/config.md` around lines 183 - 187, The sentence describing when a counter snapshot entry "surfaces" is ambiguous about whether both conditions must hold; reword the paragraph around "counter snapshot" so it clearly lists the two independent triggers: (1) an entry surfaces while the broadcast is live (any open counter exceeds its `*_closed` counterpart) OR (2) an entry surfaces whenever its snapshot changes on a tick, and keep the final clause about dropping the entry when every counter equals its `*_closed` counterpart; update the sentence containing "An entry surfaces while the broadcast is live...and on the tick its snapshot changes" to reflect this explicit OR logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@rs/moq-net/Cargo.toml`:
- Line 8: The published version in Cargo.toml is incorrectly incremented as a
patch for a breaking change: update version = "0.1.6" to "0.2.0" to reflect the
removal of the public API (StatsConfig::retention field and
StatsConfig::with_retention()), so downstream crates depending on moq-net =
"0.1" won't receive an unexpected breaking change; change the version string in
Cargo.toml accordingly and ensure any release notes mention removal of
StatsConfig::retention and StatsConfig::with_retention().
In `@rs/moq-relay/Cargo.toml`:
- Line 8: The release version in Cargo.toml is still 0.12.3 but the PR removes
the --stats-retention CLI flag and MOQ_STATS_RETENTION env var (a breaking
change for 0.x), so update the package version to 0.13.0 in Cargo.toml and
update any release notes/changelog entries to reflect the breaking change;
ensure the unique symbol "version" in Cargo.toml is changed from "0.12.3" to
"0.13.0" and mention removal of --stats-retention / MOQ_STATS_RETENTION in the
changelog.
---
Nitpick comments:
In `@doc/bin/relay/config.md`:
- Around line 183-187: The sentence describing when a counter snapshot entry
"surfaces" is ambiguous about whether both conditions must hold; reword the
paragraph around "counter snapshot" so it clearly lists the two independent
triggers: (1) an entry surfaces while the broadcast is live (any open counter
exceeds its `*_closed` counterpart) OR (2) an entry surfaces whenever its
snapshot changes on a tick, and keep the final clause about dropping the entry
when every counter equals its `*_closed` counterpart; update the sentence
containing "An entry surfaces while the broadcast is live...and on the tick its
snapshot changes" to reflect this explicit OR logic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c0d5b0e1-5c29-4511-9a8f-94e3b7905a01
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (6)
doc/bin/relay/config.mdrs/moq-net/Cargo.tomlrs/moq-net/src/stats.rsrs/moq-relay/Cargo.tomlrs/moq-relay/src/config.rsrs/moq-relay/src/stats.rs
The previous "while live ... and on the tick its snapshot changes" wording read as a conjunction; the two triggers are independent. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The stats broadcast is a single stable .stats/node/{name} aggregate, so
there's no reason to lazily spawn it on first traffic and tear it down
when idle. Spawn the snapshot task in Stats::new when an origin is set
and run it for the lifetime of the aggregator (the task exits when the
last Stats clone drops and its Weak fails to upgrade). The broadcast now
stays announced even when idle (frames go to {}), which also stops the
endpoint itself from flickering as traffic ebbs and flows.
This removes the EMPTY_EXIT_TICKS / empty-tick exit logic, the task
respawn slot, and ensure_task/clear_task.
StatsConfig::origin is now Option<OriginProducer>: with no origin the
aggregator is a no-op (bumps dropped, nothing published, no task), which
replaces Stats::disabled() / StatsHandle::disabled(). Opt out via
Stats::default() / StatsHandle::default(). StatsConfig::new() no longer
takes an origin; set it with with_origin().
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reconcile main into dev. Key conflict resolutions: - conducer crate renamed to kio (main #1547): applied across all of dev's newer code; dropped the stale conducer path-dep, kept dev's new flate2 dep. - moq-mux: kept dev's thiserror Result (#1495); dropped main's CatalogSource as dead code since dev's catalog::Consumer already unifies Hang/MSF. - moq-net: kept dev's OriginConsumer/AnnounceConsumer split (#1434) and the TrackConsumer end_at cap; kept dev's non-optional auto-created origins on the lite session/publisher (#e770). - stats: combined main's StatsConfig + liveness retention (#1537, #1548) with dev's AnnounceConsumer usage. - libmoq + moq-native: kept main's auto-reconnect (#1544), terminal-callback contract (#1546), and consume_announced (#1552), adapted to dev's AnnounceConsumer and OriginProducer connect API. Restored the InitFailed error variant and made moq-rtc handle the now-fallible Log::init. cargo check/clippy/test all pass on the merged workspace.
Summary
Replaces the time-based
retentionknob in the stats aggregator with a state-based rule:*_closedcounterpart, so a subscription could begin at any moment) and on the tick its snapshot changes.*_closedcounterpart, no traffic can flow, so the entry is dropped immediately rather than lingering for N idle ticks.This keeps a "currently active broadcasts" dashboard view accurate: previously a still-announced but idle broadcast fell out of the frame after
retentionidle ticks even though a subscription could still arrive.Why
The old retention window was a time-based heuristic to smooth reconnect flicker. Keying on
announced != announced_closed(and the other open/closed pairs) is a more precise signal for "this broadcast can still carry traffic", which is what the metric actually wants to express.Notes for reviewers
Arc::strong_count(entry) > 1check. That's the race-safe realization of "fully closed":strong_count == 1means no live guard holds the entry, which (afterprocess_slotruns each tick) is exactly when every open/closed pair is equal. It also avoids dropping a held-but-idleBroadcastStatshandle whose later bump would otherwise land on an orphanedArcand be lost.Snapshotfields), so nojs/netcounterpart needed updating.moq_net::StatsConfig(retention/with_retentionremoved) and the relay's--stats-retention/MOQ_STATS_RETENTIONflag. Targetingmainper request despite the public-API change.Changes
rs/moq-net/src/stats.rs: liveness-based emission/GC; removedretention,tick_counter,last_change_tick,within_retention,all_slots, and the per-ticklivepath set; task exit now uses anEMPTY_EXIT_TICKSconstant.rs/moq-relay/src/stats.rs: dropped the--stats-retentionflag.rs/moq-relay/src/config.rs: updated the TOML-merge regression test.doc/bin/relay/config.md: removed the retention knob; updated emission description.Test plan
cargo test -p moq-net stats::— 16 passed (incl. newlive_entry_kept_while_idleandentry_dropped_once_fully_closed)cargo test -p moq-relay cli_does_not_clobber— passedcargo clippy -p moq-net -p moq-relay --all-targets— clean(Written by Claude)