Skip to content

stats: retain entries by liveness instead of a tick retention window#1548

Merged
kixelated merged 4 commits into
mainfrom
claude/loving-joliot-4592be
May 29, 2026
Merged

stats: retain entries by liveness instead of a tick retention window#1548
kixelated merged 4 commits into
mainfrom
claude/loving-joliot-4592be

Conversation

@kixelated
Copy link
Copy Markdown
Collaborator

Summary

Replaces the time-based retention knob in the stats aggregator with a state-based rule:

  • An entry is emitted into the frame while it's live (any open counter still exceeds its *_closed counterpart, so a subscription could begin at any moment) and on the tick its snapshot changes.
  • Once every counter equals its *_closed counterpart, no traffic can flow, so the entry is dropped immediately rather than lingering for N idle ticks.

This keeps a "currently active broadcasts" dashboard view accurate: previously a still-announced but idle broadcast fell out of the frame after retention idle ticks even though a subscription could still arrive.

Why

The old retention window was a time-based heuristic to smooth reconnect flicker. Keying on announced != announced_closed (and the other open/closed pairs) is a more precise signal for "this broadcast can still carry traffic", which is what the metric actually wants to express.

Notes for reviewers

  • The global GC keeps the Arc::strong_count(entry) > 1 check. That's the race-safe realization of "fully closed": strong_count == 1 means no live guard holds the entry, which (after process_slot runs each tick) is exactly when every open/closed pair is equal. It also avoids dropping a held-but-idle BroadcastStats handle whose later bump would otherwise land on an orphaned Arc and be lost.
  • The on-the-wire frame format is unchanged (same Snapshot fields), so no js/net counterpart needed updating.
  • Breaking change to moq_net::StatsConfig (retention / with_retention removed) and the relay's --stats-retention / MOQ_STATS_RETENTION flag. Targeting main per request despite the public-API change.

Changes

  • rs/moq-net/src/stats.rs: liveness-based emission/GC; removed retention, tick_counter, last_change_tick, within_retention, all_slots, and the per-tick live path set; task exit now uses an EMPTY_EXIT_TICKS constant.
  • rs/moq-relay/src/stats.rs: dropped the --stats-retention flag.
  • rs/moq-relay/src/config.rs: updated the TOML-merge regression test.
  • doc/bin/relay/config.md: removed the retention knob; updated emission description.

Test plan

  • cargo test -p moq-net stats:: — 16 passed (incl. new live_entry_kept_while_idle and entry_dropped_once_fully_closed)
  • cargo test -p moq-relay cli_does_not_clobber — passed
  • cargo clippy -p moq-net -p moq-relay --all-targets — clean

(Written by Claude)

kixelated and others added 2 commits May 29, 2026 14:01
Replace the time-based `retention` knob with a state-based rule: a stats
entry is emitted while the broadcast is live (any open counter still
exceeds its `*_closed` counterpart, so a subscription could begin at any
moment) and on the tick its snapshot changes, then dropped once every
counter equals its `*_closed` counterpart. This keeps a "currently
active" dashboard view accurate instead of letting still-announced but
idle broadcasts fall out of the frame after N idle ticks.

The global GC keeps the `Arc::strong_count > 1` check: it's the
race-safe realization of "fully closed" (no live guard holds the entry)
and avoids dropping a held-but-idle handle whose later bump would
otherwise be lost.

Removes `StatsConfig::retention` / `with_retention`, the
`--stats-retention` / `MOQ_STATS_RETENTION` relay flag, and the
now-unused tick counter / change-tick / within-retention machinery.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Patch bump for the stats liveness-retention change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ebbfca5a-083b-4d32-afb2-f39ea5857da3

📥 Commits

Reviewing files that changed from the base of the PR and between 331c4da and 67aabd7.

📒 Files selected for processing (8)
  • rs/moq-net/src/client.rs
  • rs/moq-net/src/lite/publisher.rs
  • rs/moq-net/src/lite/session.rs
  • rs/moq-net/src/lite/subscriber.rs
  • rs/moq-net/src/server.rs
  • rs/moq-net/src/stats.rs
  • rs/moq-relay/src/cluster.rs
  • rs/moq-relay/src/stats.rs
✅ Files skipped from review due to trivial changes (5)
  • rs/moq-net/src/lite/publisher.rs
  • rs/moq-net/src/client.rs
  • rs/moq-net/src/server.rs
  • rs/moq-net/src/lite/subscriber.rs
  • rs/moq-net/src/lite/session.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • rs/moq-relay/src/stats.rs

Walkthrough

This PR removes the retention-window based snapshot lifecycle from the stats system and replaces it with a "live-or-changed" emission model. Entries are now emitted while their open counters differ from *_closed counterparts, and are dropped once all counters match. The retention configuration parameter and all associated tick-tracking logic are removed from moq-net core stats, relay configuration, and supporting tests. Documentation is updated to reflect the new entry lifetime semantics based on counter state and Arc reference counting rather than configurable retention periods.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: replacing time-based retention with liveness-based entry retention in the stats aggregator.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, explaining the rationale, implementation details, and test coverage for the retention mechanism refactor.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch claude/loving-joliot-4592be

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@kixelated
Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
doc/bin/relay/config.md (1)

183-187: 💤 Low value

Consider clarifying the dual emission conditions.

The sentence structure "An entry surfaces while the broadcast is live...and on the tick its snapshot changes" could be read as requiring both conditions simultaneously. Consider rewording to more explicitly state the two independent triggers:

"An entry surfaces while the broadcast is live (any open counter exceeds its *_closed counterpart) or whenever its snapshot changes on a tick. Once every counter equals its *_closed counterpart, the entry is dropped."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@doc/bin/relay/config.md` around lines 183 - 187, The sentence describing when
a counter snapshot entry "surfaces" is ambiguous about whether both conditions
must hold; reword the paragraph around "counter snapshot" so it clearly lists
the two independent triggers: (1) an entry surfaces while the broadcast is live
(any open counter exceeds its `*_closed` counterpart) OR (2) an entry surfaces
whenever its snapshot changes on a tick, and keep the final clause about
dropping the entry when every counter equals its `*_closed` counterpart; update
the sentence containing "An entry surfaces while the broadcast is live...and on
the tick its snapshot changes" to reflect this explicit OR logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@rs/moq-net/Cargo.toml`:
- Line 8: The published version in Cargo.toml is incorrectly incremented as a
patch for a breaking change: update version = "0.1.6" to "0.2.0" to reflect the
removal of the public API (StatsConfig::retention field and
StatsConfig::with_retention()), so downstream crates depending on moq-net =
"0.1" won't receive an unexpected breaking change; change the version string in
Cargo.toml accordingly and ensure any release notes mention removal of
StatsConfig::retention and StatsConfig::with_retention().

In `@rs/moq-relay/Cargo.toml`:
- Line 8: The release version in Cargo.toml is still 0.12.3 but the PR removes
the --stats-retention CLI flag and MOQ_STATS_RETENTION env var (a breaking
change for 0.x), so update the package version to 0.13.0 in Cargo.toml and
update any release notes/changelog entries to reflect the breaking change;
ensure the unique symbol "version" in Cargo.toml is changed from "0.12.3" to
"0.13.0" and mention removal of --stats-retention / MOQ_STATS_RETENTION in the
changelog.

---

Nitpick comments:
In `@doc/bin/relay/config.md`:
- Around line 183-187: The sentence describing when a counter snapshot entry
"surfaces" is ambiguous about whether both conditions must hold; reword the
paragraph around "counter snapshot" so it clearly lists the two independent
triggers: (1) an entry surfaces while the broadcast is live (any open counter
exceeds its `*_closed` counterpart) OR (2) an entry surfaces whenever its
snapshot changes on a tick, and keep the final clause about dropping the entry
when every counter equals its `*_closed` counterpart; update the sentence
containing "An entry surfaces while the broadcast is live...and on the tick its
snapshot changes" to reflect this explicit OR logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c0d5b0e1-5c29-4511-9a8f-94e3b7905a01

📥 Commits

Reviewing files that changed from the base of the PR and between cf5dfd6 and 195be16.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • doc/bin/relay/config.md
  • rs/moq-net/Cargo.toml
  • rs/moq-net/src/stats.rs
  • rs/moq-relay/Cargo.toml
  • rs/moq-relay/src/config.rs
  • rs/moq-relay/src/stats.rs

Comment thread rs/moq-net/Cargo.toml
Comment thread rs/moq-relay/Cargo.toml
kixelated and others added 2 commits May 29, 2026 14:23
The previous "while live ... and on the tick its snapshot changes"
wording read as a conjunction; the two triggers are independent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The stats broadcast is a single stable .stats/node/{name} aggregate, so
there's no reason to lazily spawn it on first traffic and tear it down
when idle. Spawn the snapshot task in Stats::new when an origin is set
and run it for the lifetime of the aggregator (the task exits when the
last Stats clone drops and its Weak fails to upgrade). The broadcast now
stays announced even when idle (frames go to {}), which also stops the
endpoint itself from flickering as traffic ebbs and flows.

This removes the EMPTY_EXIT_TICKS / empty-tick exit logic, the task
respawn slot, and ensure_task/clear_task.

StatsConfig::origin is now Option<OriginProducer>: with no origin the
aggregator is a no-op (bumps dropped, nothing published, no task), which
replaces Stats::disabled() / StatsHandle::disabled(). Opt out via
Stats::default() / StatsHandle::default(). StatsConfig::new() no longer
takes an origin; set it with with_origin().

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@kixelated kixelated enabled auto-merge (squash) May 29, 2026 22:05
@kixelated kixelated merged commit 4d46f73 into main May 29, 2026
1 check passed
@kixelated kixelated deleted the claude/loving-joliot-4592be branch May 29, 2026 22:17
This was referenced May 29, 2026
kixelated added a commit that referenced this pull request May 30, 2026
Reconcile main into dev. Key conflict resolutions:

- conducer crate renamed to kio (main #1547): applied across all of dev's
  newer code; dropped the stale conducer path-dep, kept dev's new flate2 dep.
- moq-mux: kept dev's thiserror Result (#1495); dropped main's CatalogSource
  as dead code since dev's catalog::Consumer already unifies Hang/MSF.
- moq-net: kept dev's OriginConsumer/AnnounceConsumer split (#1434) and the
  TrackConsumer end_at cap; kept dev's non-optional auto-created origins on the
  lite session/publisher (#e770).
- stats: combined main's StatsConfig + liveness retention (#1537, #1548) with
  dev's AnnounceConsumer usage.
- libmoq + moq-native: kept main's auto-reconnect (#1544), terminal-callback
  contract (#1546), and consume_announced (#1552), adapted to dev's
  AnnounceConsumer and OriginProducer connect API. Restored the InitFailed
  error variant and made moq-rtc handle the now-fallible Log::init.

cargo check/clippy/test all pass on the merged workspace.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant