fix(moq-gst): stop moqsrc panicking on backwards timestamps; globally unique pad ids#1646
Conversation
… unique pad ids Two follow-ups from the #1633 review: - build_buffer pinned its PTS reference to the first decoded frame and computed `frame.timestamp - reference`. Frames arrive in decode order, so a B-frame's presentation timestamp can precede the reference; `Timestamp`'s `Sub` panics on underflow (`checked_sub(..).expect`), which crashed the pump task and leaked its pad, killing the rendition. Open-GOP / leading-B content triggers this. Use `checked_sub` and clamp to zero instead. - The pad-id counter was per-session (reset to 0 each run_session) despite the "process-unique" doc. A quick Paused->Ready->Paused restart could mint `video_0` while the previous session's cancelled pump was still removing its own `video_0`. Promote it to a process-wide `AtomicU64`. Verified with a publisher that sends a keyframe at ts=2000ms then frames at ts=1000ms+: unfixed build panics after 1 buffer; fixed build streams all 201 and EOSes cleanly. Real H264 decode (bbb.hang) still works. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
💤 Files with no reviewable changes (1)
WalkthroughThis PR refactors pad ID allocation to use a process-wide AtomicU64 (removing the per-session next_pad_id), updates reconcile call sites and allocation to use NEXT_PAD_ID.fetch_add(Ordering::Relaxed), introduces relative_pts(timestamp, reference) using checked_sub and clamping to gst::ClockTime::ZERO, updates build_buffer to call relative_pts, adds unit tests for relative_pts, and tweaks rs/moq-gst Cargo.toml to set doctest = false and remove test = false. 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
rs/moq-gst/src/source/imp.rs (1)
533-539: ⚡ Quick winAdd an in-file regression test for the backward-PTS clamp path.
This fix looks correct, but a small unit test here would lock in the no-panic underflow behavior.
As per coding guidelines, "Add tests where they're easy to write; Rust tests are integrated within source files".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rs/moq-gst/src/source/imp.rs` around lines 533 - 539, Add a small in-file unit test that exercises the backward-PTS clamp path so the checked_sub underflow returns gst::ClockTime::ZERO rather than panicking: create a test in imp.rs (#[cfg(test)] mod tests) that constructs a frame with a timestamp smaller than the reference used in the match (exercise the code path using frame.timestamp.checked_sub(reference) in the surrounding function), call the function or the minimal public/private helper that performs the subtraction, and assert the result equals gst::ClockTime::ZERO and that no panic occurs; keep the test self-contained using dummy/frame builders present in the file or constructing minimal types needed.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@rs/moq-gst/src/source/imp.rs`:
- Around line 533-539: Add a small in-file unit test that exercises the
backward-PTS clamp path so the checked_sub underflow returns
gst::ClockTime::ZERO rather than panicking: create a test in imp.rs
(#[cfg(test)] mod tests) that constructs a frame with a timestamp smaller than
the reference used in the match (exercise the code path using
frame.timestamp.checked_sub(reference) in the surrounding function), call the
function or the minimal public/private helper that performs the subtraction, and
assert the result equals gst::ClockTime::ZERO and that no panic occurs; keep the
test self-contained using dummy/frame builders present in the file or
constructing minimal types needed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e4aaeeb6-262c-41f7-a1e9-2da0ae2568b9
📒 Files selected for processing (1)
rs/moq-gst/src/source/imp.rs
Extract the relative-PTS math into `relative_pts` and add an in-file unit test asserting a timestamp before the reference clamps to zero (the B-frame / decode-order case) instead of panicking, plus that a forward delta is preserved. The lib had `test = false`, which kept `#[cfg(test)]` code from compiling at all; drop it so the test runs (via `cargo test -p moq-gst` -- moq-gst is excluded from default-members, so the normal `cargo test` run is unaffected). Revert that one line if you'd rather keep the crate test-free. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Addressed the nitpick in 720c83d: extracted |
Follow-up to #1633 (review findings #1 and #2).
1. PTS-underflow panic (the important one) 🟠
build_bufferpins its PTS reference to the first decoded frame and computesframe.timestamp - reference. Frames arrive in decode order, so a B-frame's presentation timestamp can fall before the reference.Timestamp'sSubischecked_sub(..).expect("time overflow")— it panics on underflow (moq-net/src/model/time.rs:281). That crashes the pump task and, because the panic happens before the pad-removal cleanup, leaks the pad and kills the rendition. Open-GOP / leading-B-frame content triggers it.Fix:
checked_suband clamp toClockTime::ZEROinstead of crashing.2. Pad-id counter was per-session 🟡
next_pad_idreset to0on everyrun_session, despite the doc claiming a "process-unique counter". A rapidPaused -> Ready -> Pausedrestart could mintvideo_0while the previous session's cancelled pump was still removing itsvideo_0(two same-named pads on the element). Promote it to a process-wideAtomicU64so it actually matches the doc and the collision window is gone.Test
Added a throwaway publisher that sends a keyframe at
ts=2000msthen frames atts=1000ms+(the backwards-timestamp case):panicked at time.rs:281: time overflowafter 1 buffer; pad leakedReal H264 decode (
cdn.moq.dev/demobbb.hang) still decodes (20/20 frames).clippy+fmt --checkclean.Deliberately not included (noted in the #1633 review as lower priority): blocking
pad.push()on tokio workers (situational), redundant CMAF init parsing per catalog update (minor perf), and reconcile unit tests.(Written by Claude)
🤖 Generated with Claude Code