Fix room composite duration to include post-participant silence tail #1169
When all participants leave a room during a room composite recording, the server disconnects the egress ~15-20s later. During that window the gst mixer produces silence that gets written to the file, but `ended_at` only reflected the last RTP packet, missing the silence tail. This change captures the pipeline running time at `MessageEOS` (when all content is flushed to sinks) and uses it as a floor for `ended_at` in `updateEndTime()`, so the reported duration matches the actual file content.
// The reported duration should include the silence tail. At minimum it
// should be longer than the active recording period (10s) plus most of
// the silence gap. We allow 5s of slack for pipeline startup/teardown.
Probably worth adding a comment that the default room `departure_timeout` is 20s and that this check accounts for it. Ideally, the test could create a room with a room config that sets an explicit departure timeout so that this check can be more precise, but adding a comment about the default is fine.
👍 Interesting, didn't know we can set it on room creation. We could experiment with lowering it, but only to the point where timing variation doesn't cause false negatives, so probably not much trimming is practical. Will update the comment with that.
Yeah, it is here: https://github.com/livekit/protocol/blob/f3f2ad607084f6940a2ea7edea096ec1481f5201/protobufs/livekit_room.proto#L89. But yeah, probably not worth the effort. The only issue would be the default changing and this check starting to fail because of it.
we could explicitly set that default value - will add that.
## Changelog

### Added

- V2 egress API: unified source and output config (StartEgressRequest, MediaSource, Output, StorageConfig), per-participant audio channel routing (AudioRoute), and request-level storage overrides (#1155)
- Support for egress auto retry (#1138)
- Faster than realtime recording (non-live) (#1192)
- Cgroup-aware memory monitoring for admission control and OOM kill (#1118)
- Support for mulaw and alaw input codecs (#1105)
- Add handler outcome Prometheus metric for SLO-based kill rate alerting (#1230)
- Add livekit_load_ratio metric for composite load-based autoscaling (#1234)
- Better multi-publisher support (#1214)
- Read K8s CPU requests to automatically set GOMAXPROCS for cgroup-aware scheduling (#1204)
- Backup storage observability, including storage event IPC for tracking uploads across primary/backup stores (#1120, #1184)
- Enable one-shot sender report sync mode for room composite behind config (#1158)
- Export InitLogger with configurable service name (#1218)
- Allow specifying affinity timeout (#1104)
- Instrument all leaky queues (#1116)
- Instrument data loss on video leaky queues (#1109)
- Add a 1G memory buffer for accepting requests (#1088)
- Log rss periodically (#1151)
- Log pipeline time to playing (#1103)
- Log file upload stats (size, duration) (#1096)
- Chrome 146 (#1173)

### Fixed

- Fix s3-compat multi-part uploads (#1228)
- Fix deadlock when AbortProcess/KillProcess call kill() under pm.mu lock (#1226)
- Use statistical cadence check to tolerate WebRTC NetEQ time-stretching (#1223)
- Fix MP3 duration metadata and CBR encoding (#1187)
- Fix room composite duration to include post-participant silence tail (#1169)
- Fix awaitMediaTracks race (#1165)
- Fix for x264 encoder errors causing egress failures at the end of the recording (#1095)
- Fix for data race inside monitor (#1091)
- Fixing one of the causes of pipeline frozen errors (#1129)
- Potential fix for rare issue causing tracks not to be recorded (#1130)
- Handle EOS when removing source bin (#1093)
- Drain appwriter (#1090)
- Drain audio tracks before removing their appsource (#1085)
- Heal track which enters into continuous flow flushing loop on pushing packets (#1142)
- One pacer per audio track (#1235)
- Set time provider only after pipeline reaches playing state (#1212)
- AbortProcess is not safe to be executed by multiple goroutines (#1208)
- Fixing unprotected state write from timer goroutine (#1206)
- Disable PTS adjustments on sender reports for track egresses (#1134)
- Disconnect from room on failing to await for some track (#1132)
- Retry chrome egress navigation on chrome cert verifier change (#1194)
- Return a 500 if handler fails to start (#1137)
- Make sure all data is read from LK server (#1111)
- Always check video dimensions (#1119)
- Fail web egress on HTTP 4xx/5xx page load errors (#1106)
- Avoid panic if we fail to parse a gst pipeline error (#1086)
- Address unsafe int casting (#1126)
- Get original room name from Start request (#1189)
- Suppress noisy colorimetry warnings (#1163)
- Don't log error on manifest upload failure if backup storage wasn't used (#1152)
- Don't count room composite SDK source against Pulse limits (#1114)
- Use a 10 min deadline for the RPC watchdog (#1207)
- Remove enable_room_composite_sdk_source and always enable sdk source when conditions allow it (#1122)
- Gstreamer logs based on configured level (#1161)
- Reintroduce sdk logs filtering (#1213)
- Using variant of OnDisconnected callback which passes a reason (#1221)
- AV sync content verification in integration tests (#1215)
- Fixing arm64 chrome installer (#1175)
- Use separate token with delete permission (#1172)
- Update module go.opentelemetry.io/otel to v1.41.0 [SECURITY] (#1201)
- Update module github.com/aws/aws-sdk-go-v2/service/s3 to v1.97.3 [SECURITY] (#1178)
- Update module github.com/go-jose/go-jose/v4 to v4.1.4 [SECURITY] (#1170)
- Update module google.golang.org/grpc to v1.79.3 [SECURITY] (#1153)
- Rename IOClient to SessionReporter (#1097)
- Move ProcessManager to interface and create a fake implementation (#1147)
- Enable staticcheck (#1094)
When all participants leave a room, the server disconnects the egress ~15-20s later. During that window the gst mixer produces silence that gets written to the file, but `ended_at` only reflected the last RTP packet, missing the silence tail. This change captures the pipeline running time at `MessageEOS` (when all content is flushed to sinks) and uses it as a floor for `ended_at` in `updateEndTime()`, so the reported duration matches the actual file content. Discovered accidentally during testing.