Skip to content

libmoq: terminal-callback lifetime contract for C consumers#1546

Merged
kixelated merged 3 commits into
mainfrom
claude/libmoq-callback-lifetime
May 29, 2026
Merged

libmoq: terminal-callback lifetime contract for C consumers#1546
kixelated merged 3 commits into
mainfrom
claude/libmoq-callback-lifetime

Conversation

@kixelated
Copy link
Copy Markdown
Collaborator

@kixelated kixelated commented May 29, 2026

Summary

Fixes a use-after-free race in libmoq's C callback API and replaces it with an explicit, uniform callback-lifetime contract.

The race. Every consume/announce run-loop copied the callback ({user_data, fn ptr}, a Copy type) out from under the global State lock, dropped the lock, then invoked it. A concurrent *_close could .take() the slab entry and return in that window, letting the caller free user_data before the in-flight callback fired and dereferenced it. The lock is dropped on purpose (calling a C callback while holding the global mutex risks re-entrant deadlock), so the previous "check-then-call" guard couldn't close the gap.

The fix. *_close now only signals shutdown (drops the oneshot sender) without removing the entry or revoking the callback. The background task always delivers exactly one final callback as its last act, then removes its own slab entry, so user_data stays valid until that callback. The status code carries the lifecycle uniformly across every callback (on_status, on_announce, on_catalog, on_frame):

  • > 0 — a live result id (frame / catalog / announce; for sessions, the connection epoch)
  • 0 — closed cleanly (terminal)
  • < 0 — error (terminal)

After any <= 0 code, the callback is never invoked again and user_data is never touched again, so callers release user_data in that final callback. This is the read()/recv() EOF convention (0 = orderly close, negative = error) and the libuv uv_close(handle, cb) teardown model. It removes the need for the external weak-ref/refcount workaround C++/Qt bindings currently wrap around user_data.

The terminal callback fires on libmoq's internal thread, so thread-affine bindings (e.g. a Qt QObject) still hop to the owning thread to destroy the object; the user_data lifetime contract holds regardless.

Notable changes

  • *_close is signal-only and idempotent-erroring: returns 0 once, negative if already closing/closed. It no longer frees user_data.
  • Run-loops select on the close receiver per iteration with biased;, so a pending close is honored between every item. (The earlier outer select! polled the whole run future as one branch, so a continuously-ready stream could starve the close signal and a hot loop couldn't be interrupted.)
  • Breaking: moq_session_connect's on_status reports a positive connection epoch on every (re)connect (1 = first connect, >1 = reconnect), 0 on a clean close, and a negative code if reconnection permanently gives up. Previously 0 meant "connected". This integrates with libmoq: auto-reconnect sessions; conducer-based Reconnect notifications #1544 (auto-reconnect), merged into this branch: report() forwards epochs from moq_native::Reconnect, and the contract supplies the terminal 0/negative.
  • Callbacks are invoked outside the lock at the terminal site too (the old if let Some(e) = State::lock()...{ e.call() } held the guard across the call — a latent re-entrant-deadlock hazard).
  • Removes the now-unused Error::Closed (-1).
  • Bumps libmoq 0.2.17 -> 0.3.0 for the new public contract.

Per the Cross-Package Sync table this is C-API-only: the uniffi wrappers (py/swift/kt/go) manage lifetime through Rust ownership and are unaffected. Doc comments (which generate moq.h) and doc/lib/c are updated.

Test plan

  • cargo build -p libmoq (regenerates moq.h with the new contract docs)
  • cargo clippy -p libmoq --all-targets -- -D warnings
  • cargo test -p libmoq — 19/19 pass. Tests assert each consumer/session delivers a terminal <= 0 callback after close, and that user_data is drained before teardown.
  • Merged latest main (incl. libmoq: auto-reconnect sessions; conducer-based Reconnect notifications #1544 auto-reconnect); rebuilt/clippy/tested green against the refactored moq-net/moq-mux.
  • Confirm with a real C/Qt consumer that freeing user_data in the terminal callback resolves the UAF without the weak-ref shim.

🤖 Generated with Claude Code

(Written by Claude)

Fixes a use-after-free race in the C callback API. Every consume/announce
run-loop copied the callback out from under the global lock, dropped the
lock, then invoked it. A concurrent `*_close` could take the slab entry and
return, letting the caller free `user_data`, before the in-flight callback
fired and dereferenced it.

`*_close` now only signals shutdown (drops the oneshot sender) without
removing the entry or revoking the callback. The background task always
delivers exactly one final callback as its last act, then removes its own
entry, so `user_data` stays valid until that callback. The status code
carries the lifecycle uniformly across every callback:

- `> 0`  live result id (frame/catalog/announce; `1` = session connected)
- `0`    closed cleanly (terminal)
- `< 0`  error (terminal)

After any `<= 0` code the callback never fires again and `user_data` is
never touched again, so callers release `user_data` in that final callback
(libuv `uv_close` style). This removes the need for an external
weak-ref/refcount around `user_data` in C++/Qt bindings.

Session connect now reports `1` on connect (previously `0`); `0` is reserved
for a clean close. Removes the now-unused `Error::Closed` (-1).

Bumps libmoq 0.2.17 -> 0.3.0 for the new public contract.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 59132d27-4c26-4145-bbed-b0a969755ed4

📥 Commits

Reviewing files that changed from the base of the PR and between 67e4434 and 836c6ff.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • rs/libmoq/src/api.rs
  • rs/libmoq/src/session.rs
✅ Files skipped from review due to trivial changes (1)
  • rs/libmoq/src/api.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • rs/libmoq/src/session.rs

Walkthrough

This PR refactors callback lifetime semantics across the MoQ library by unifying how task shutdown and callback delivery are handled. The core change is switching from revoking callbacks by removing task entries to signaling shutdown via optional oneshot senders, allowing final terminal callbacks to fire after task completion. The Error::Closed variant is removed since the new pattern no longer needs it. All FFI documentation is updated to clarify that close functions return immediately without freeing user_data, and that user_data should be released during the terminal callback delivery. Tests are updated to explicitly verify terminal callback behavior across all consumer and session types.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main change: introducing a terminal-callback lifetime contract for C consumers in libmoq.
Description check ✅ Passed The description thoroughly explains the use-after-free race being fixed and documents the new callback-lifetime contract with specific examples and breaking changes.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch claude/libmoq-callback-lifetime

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@kixelated
Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Comment thread rs/libmoq/src/audio.rs Outdated
let res = tokio::select! {
res = Self::run(id, consumer) => res,
res = Self::run(on_frame, consumer) => res,
_ = channel.1 => Ok(()),
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be biased; so we won't busy loop on delivering new frames.

It might be better to move the channel inside the run function too, so it can cancel mid loop in case callback takes non-zero time to execute.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on both, fixed in 67e4434.

You're right that the outer select! { run(..), _ = close } couldn't interrupt a hot loop: run is polled as a single branch, so if the stream keeps yielding frames its internal while loop never returns Pending, the select never gets to poll the close receiver, and close is ignored until the stream stalls. A slow callback made it worse.

I moved the oneshot::Receiver into each run loop and select per-iteration with biased;:

let frame = tokio::select! {
    biased;
    _ = &mut close => return Ok(()),
    frame = consumer.read() => match frame? {
        Some(frame) => frame,
        None => return Ok(()),
    },
};

biased makes the close branch win deterministically over a ready frame, so once close is requested we stop before delivering another frame (rather than relying on random branch order and dribbling out extra frames). This now applies to the catalog / video / audio / raw-track / announce loops, and the raw-track inner per-frame loop too.

I left moq_session_connect on the outer select: connect_run has no per-item loop, only the connect() and closed() await points, which the outer select already cancels at, so there's no starvation there. Happy to make it consistent if you'd prefer.

(Written by Claude)

kixelated and others added 2 commits May 29, 2026 13:04
The previous outer `select! { run(..), close }` polled the whole run future
as one branch, so a continuously-ready stream let its internal loop spin
forever without yielding, starving the close signal: close could not
interrupt a hot frame loop, and a slow callback delayed it further.

Move the close receiver into each run loop and select on it per iteration
with `biased;`, so a pending close is checked (and wins) before reading or
delivering the next item. Applies to catalog/video/audio/raw-track/announce
loops. Session connect is unchanged: it has no per-item loop, just connect
and closed await points the outer select already cancels at.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Integrates #1544 (auto-reconnect sessions) with the terminal-callback
lifetime contract. session.rs now combines both:

- connect_run/report drive moq_native::Reconnect and report a positive
  connection epoch (1 = first connect, >1 = reconnect) per (re)connect.
- The contract's terminal rules apply: close() signals shutdown (Option
  sender), the task delivers exactly one final callback (0 = clean close,
  < 0 = reconnect gave up), then removes its own entry. user_data stays
  valid until that terminal callback.
- report() takes the callback by value and calls it without holding the
  lock, replacing the task_id slab-lookup + notify() revoke dance (close
  no longer revokes, so the lookup was redundant).

The moq_session_connect docs merge the reconnect/epoch wording with the
terminal-0-on-close contract.
@kixelated kixelated enabled auto-merge (squash) May 29, 2026 21:46
@kixelated kixelated merged commit 72c8aeb into main May 29, 2026
1 check passed
@kixelated kixelated deleted the claude/libmoq-callback-lifetime branch May 29, 2026 22:00
This was referenced May 29, 2026
kixelated added a commit that referenced this pull request May 30, 2026
Reconcile main into dev. Key conflict resolutions:

- conducer crate renamed to kio (main #1547): applied across all of dev's
  newer code; dropped the stale conducer path-dep, kept dev's new flate2 dep.
- moq-mux: kept dev's thiserror Result (#1495); dropped main's CatalogSource
  as dead code since dev's catalog::Consumer already unifies Hang/MSF.
- moq-net: kept dev's OriginConsumer/AnnounceConsumer split (#1434) and the
  TrackConsumer end_at cap; kept dev's non-optional auto-created origins on the
  lite session/publisher (#e770).
- stats: combined main's StatsConfig + liveness retention (#1537, #1548) with
  dev's AnnounceConsumer usage.
- libmoq + moq-native: kept main's auto-reconnect (#1544), terminal-callback
  contract (#1546), and consume_announced (#1552), adapted to dev's
  AnnounceConsumer and OriginProducer connect API. Restored the InitFailed
  error variant and made moq-rtc handle the now-fallible Log::init.

cargo check/clippy/test all pass on the merged workspace.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant