libmoq: terminal-callback lifetime contract for C consumers#1546
Conversation
Fixes a use-after-free race in the C callback API. Every consume/announce run-loop copied the callback out from under the global lock, dropped the lock, then invoked it. A concurrent `*_close` could take the slab entry and return, letting the caller free `user_data`, before the in-flight callback fired and dereferenced it. `*_close` now only signals shutdown (drops the oneshot sender) without removing the entry or revoking the callback. The background task always delivers exactly one final callback as its last act, then removes its own entry, so `user_data` stays valid until that callback. The status code carries the lifecycle uniformly across every callback: - `> 0` live result id (frame/catalog/announce; `1` = session connected) - `0` closed cleanly (terminal) - `< 0` error (terminal) After any `<= 0` code the callback never fires again and `user_data` is never touched again, so callers release `user_data` in that final callback (libuv `uv_close` style). This removes the need for an external weak-ref/refcount around `user_data` in C++/Qt bindings. Session connect now reports `1` on connect (previously `0`); `0` is reserved for a clean close. Removes the now-unused `Error::Closed` (-1). Bumps libmoq 0.2.17 -> 0.3.0 for the new public contract. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughThis PR refactors callback lifetime semantics across the MoQ library by unifying how task shutdown and callback delivery are handled. The core change is switching from revoking callbacks by removing task entries to signaling shutdown via optional oneshot senders, allowing final terminal callbacks to fire after task completion. The Error::Closed variant is removed since the new pattern no longer needs it. All FFI documentation is updated to clarify that close functions return immediately without freeing user_data, and that user_data should be released during the terminal callback delivery. Tests are updated to explicitly verify terminal callback behavior across all consumer and session types. 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
| let res = tokio::select! { | ||
| res = Self::run(id, consumer) => res, | ||
| res = Self::run(on_frame, consumer) => res, | ||
| _ = channel.1 => Ok(()), |
There was a problem hiding this comment.
should be biased; so we won't busy loop on delivering new frames.
It might be better to move the channel inside the run function too, so it can cancel mid loop in case callback takes non-zero time to execute.
There was a problem hiding this comment.
Good catch on both, fixed in 67e4434.
You're right that the outer select! { run(..), _ = close } couldn't interrupt a hot loop: run is polled as a single branch, so if the stream keeps yielding frames its internal while loop never returns Pending, the select never gets to poll the close receiver, and close is ignored until the stream stalls. A slow callback made it worse.
I moved the oneshot::Receiver into each run loop and select per-iteration with biased;:
let frame = tokio::select! {
biased;
_ = &mut close => return Ok(()),
frame = consumer.read() => match frame? {
Some(frame) => frame,
None => return Ok(()),
},
};biased makes the close branch win deterministically over a ready frame, so once close is requested we stop before delivering another frame (rather than relying on random branch order and dribbling out extra frames). This now applies to the catalog / video / audio / raw-track / announce loops, and the raw-track inner per-frame loop too.
I left moq_session_connect on the outer select: connect_run has no per-item loop, only the connect() and closed() await points, which the outer select already cancels at, so there's no starvation there. Happy to make it consistent if you'd prefer.
(Written by Claude)
The previous outer `select! { run(..), close }` polled the whole run future
as one branch, so a continuously-ready stream let its internal loop spin
forever without yielding, starving the close signal: close could not
interrupt a hot frame loop, and a slow callback delayed it further.
Move the close receiver into each run loop and select on it per iteration
with `biased;`, so a pending close is checked (and wins) before reading or
delivering the next item. Applies to catalog/video/audio/raw-track/announce
loops. Session connect is unchanged: it has no per-item loop, just connect
and closed await points the outer select already cancels at.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Integrates #1544 (auto-reconnect sessions) with the terminal-callback lifetime contract. session.rs now combines both: - connect_run/report drive moq_native::Reconnect and report a positive connection epoch (1 = first connect, >1 = reconnect) per (re)connect. - The contract's terminal rules apply: close() signals shutdown (Option sender), the task delivers exactly one final callback (0 = clean close, < 0 = reconnect gave up), then removes its own entry. user_data stays valid until that terminal callback. - report() takes the callback by value and calls it without holding the lock, replacing the task_id slab-lookup + notify() revoke dance (close no longer revokes, so the lookup was redundant). The moq_session_connect docs merge the reconnect/epoch wording with the terminal-0-on-close contract.
Reconcile main into dev. Key conflict resolutions: - conducer crate renamed to kio (main #1547): applied across all of dev's newer code; dropped the stale conducer path-dep, kept dev's new flate2 dep. - moq-mux: kept dev's thiserror Result (#1495); dropped main's CatalogSource as dead code since dev's catalog::Consumer already unifies Hang/MSF. - moq-net: kept dev's OriginConsumer/AnnounceConsumer split (#1434) and the TrackConsumer end_at cap; kept dev's non-optional auto-created origins on the lite session/publisher (#e770). - stats: combined main's StatsConfig + liveness retention (#1537, #1548) with dev's AnnounceConsumer usage. - libmoq + moq-native: kept main's auto-reconnect (#1544), terminal-callback contract (#1546), and consume_announced (#1552), adapted to dev's AnnounceConsumer and OriginProducer connect API. Restored the InitFailed error variant and made moq-rtc handle the now-fallible Log::init. cargo check/clippy/test all pass on the merged workspace.
Summary
Fixes a use-after-free race in libmoq's C callback API and replaces it with an explicit, uniform callback-lifetime contract.
The race. Every consume/announce run-loop copied the callback (
{user_data, fn ptr}, aCopytype) out from under the globalStatelock, dropped the lock, then invoked it. A concurrent*_closecould.take()the slab entry and return in that window, letting the caller freeuser_databefore the in-flight callback fired and dereferenced it. The lock is dropped on purpose (calling a C callback while holding the global mutex risks re-entrant deadlock), so the previous "check-then-call" guard couldn't close the gap.The fix.
*_closenow only signals shutdown (drops the oneshot sender) without removing the entry or revoking the callback. The background task always delivers exactly one final callback as its last act, then removes its own slab entry, souser_datastays valid until that callback. The status code carries the lifecycle uniformly across every callback (on_status,on_announce,on_catalog,on_frame):> 0— a live result id (frame / catalog / announce; for sessions, the connection epoch)0— closed cleanly (terminal)< 0— error (terminal)After any
<= 0code, the callback is never invoked again anduser_datais never touched again, so callers releaseuser_datain that final callback. This is theread()/recv()EOF convention (0= orderly close, negative = error) and the libuvuv_close(handle, cb)teardown model. It removes the need for the external weak-ref/refcount workaround C++/Qt bindings currently wrap arounduser_data.The terminal callback fires on libmoq's internal thread, so thread-affine bindings (e.g. a Qt
QObject) still hop to the owning thread to destroy the object; theuser_datalifetime contract holds regardless.Notable changes
*_closeis signal-only and idempotent-erroring: returns0once, negative if already closing/closed. It no longer freesuser_data.biased;, so a pending close is honored between every item. (The earlier outerselect!polled the whole run future as one branch, so a continuously-ready stream could starve the close signal and a hot loop couldn't be interrupted.)moq_session_connect'son_statusreports a positive connection epoch on every (re)connect (1= first connect,>1= reconnect),0on a clean close, and a negative code if reconnection permanently gives up. Previously0meant "connected". This integrates with libmoq: auto-reconnect sessions; conducer-based Reconnect notifications #1544 (auto-reconnect), merged into this branch:report()forwards epochs frommoq_native::Reconnect, and the contract supplies the terminal0/negative.if let Some(e) = State::lock()...{ e.call() }held the guard across the call — a latent re-entrant-deadlock hazard).Error::Closed(-1).libmoq0.2.17 -> 0.3.0 for the new public contract.Per the Cross-Package Sync table this is C-API-only: the uniffi wrappers (py/swift/kt/go) manage lifetime through Rust ownership and are unaffected. Doc comments (which generate
moq.h) anddoc/lib/care updated.Test plan
cargo build -p libmoq(regeneratesmoq.hwith the new contract docs)cargo clippy -p libmoq --all-targets -- -D warningscargo test -p libmoq— 19/19 pass. Tests assert each consumer/session delivers a terminal<= 0callback after close, and thatuser_datais drained before teardown.main(incl. libmoq: auto-reconnect sessions; conducer-based Reconnect notifications #1544 auto-reconnect); rebuilt/clippy/tested green against the refactored moq-net/moq-mux.user_datain the terminal callback resolves the UAF without the weak-ref shim.🤖 Generated with Claude Code
(Written by Claude)