fix(mp4_recording_core): treat writer==NULL+running==1 as initializing to prevent duplicate recordings#354
Conversation
…g to prevent duplicate recordings The already-running guard in all four start_mp4_recording* variants had only two states: healthy (writer!=NULL && recording) or dead (everything else). This missed the initialization window between pthread_create() returning and ctx->mp4_writer being assigned inside mp4_recording_thread() after RTSP connect + avformat_find_stream_info(), which can take several seconds. During that window every caller saw writer==NULL, concluded the recording was dead, cleared the slot, and spawned a new thread - producing multiple parallel recordings with overlapping timestamps in the database. Fix: introduce a third state using ctx->running as discriminator: 1. writer!=NULL && mp4_writer_is_recording() -> healthy, return early 2. writer==NULL && ctx->running==1 -> initializing, return early 3. everything else -> dead, clean up and restart ctx->running is set to 1 by the caller before pthread_create() and only cleared by the thread on exit or by a stop path, making it an unambiguous sentinel for the initialization window. Fix applied identically to all four entry points: - start_mp4_recording() - start_mp4_recording_with_url() - start_mp4_recording_with_trigger() - start_mp4_recording_with_url_and_trigger() Tested on Rock Pi 4C+ with TP-Link C545D camera. Rapid motion events previously produced 2-3 concurrent recording entries per stream in the database. With this fix only one recording instance per stream is active.
|
✅ All contributors have signed the CLA. Thank you! |
|
I have read the CLA Document and I hereby sign the CLA |
There was a problem hiding this comment.
Pull request overview
This PR addresses a race condition in mp4_recording_core.c where rapid successive start requests could create overlapping MP4 recording threads for the same stream (duplicate DB entries and competing writers). It does so by expanding the “already running” guard to recognize an additional “initializing” state using ctx->running.
Changes:
- Update all
start_mp4_recording*entry points to treatwriter == NULL && ctx->running == 1as “initializing” and skip duplicate starts. - Add explanatory comments (including a preserved “OLD CODE” block) describing the race and the three-state guard.
- Change MP4 directory path construction to use
stream_namedirectly (removing priorsanitize_stream_name()usage and dropping thecore/path_utils.hinclude).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -990,13 +1036,6 @@ int start_mp4_recording_with_trigger(const char *stream_name, const char *trigge | |||
| } | |||
| pthread_mutex_unlock(&recording_contexts_mutex); | |||
|
|
|||
There was a problem hiding this comment.
In start_mp4_recording_with_trigger(), when an existing context is deemed dead, it’s extracted into dead_ctx and the slot is cleared, but dead_ctx is never joined/detached/freed (cleanup_dead_recording() is not called). This can leak the ctx and leave the previous outer recording thread running concurrently with the new one, reintroducing overlapping recordings. Call cleanup_dead_recording(dead_ctx, stream_name) after releasing recording_contexts_mutex (as done in the other start_mp4_recording* variants) before proceeding to allocate/start a new context.
| if (dead_ctx) { | |
| cleanup_dead_recording(dead_ctx, stream_name); | |
| } |
| snprintf(mp4_dir, MAX_PATH_LENGTH, "%s/%s", | ||
| global_config->mp4_storage_path, encoded_name); | ||
| global_config->mp4_storage_path, stream_name); | ||
| } else { | ||
| // Use mp4 directory parallel to hls, NOT inside it | ||
| snprintf(mp4_dir, MAX_PATH_LENGTH, "%s/mp4/%s", |
There was a problem hiding this comment.
MP4 output directory paths are now built using the raw stream_name. Unlike sanitize_stream_name(), this allows path separators / traversal sequences in stream names and can write outside the intended storage root (and also diverges from the consistent sanitization used for HLS and other storage paths). Reintroduce sanitize_stream_name() (or equivalent validation) when constructing filesystem paths, and use the sanitized value for both the direct and non-direct mp4_dir branches.
| global_config->mp4_storage_path, stream_name); | ||
| } else { | ||
| // Use mp4 directory parallel to hls, NOT inside it | ||
| snprintf(mp4_dir, MAX_PATH_LENGTH, "%s/mp4/%s", | ||
| global_config->storage_path, encoded_name); | ||
| global_config->storage_path, stream_name); |
There was a problem hiding this comment.
MP4 output directory paths are now built using the raw stream_name. Unlike sanitize_stream_name(), this allows path separators / traversal sequences in stream names and can write outside the intended storage root (and also diverges from the consistent sanitization used for HLS and other storage paths). Reintroduce sanitize_stream_name() (or equivalent validation) when constructing filesystem paths, and use the sanitized value for both the direct and non-direct mp4_dir branches.
| snprintf(mp4_dir, MAX_PATH_LENGTH, "%s/%s", | ||
| global_config->mp4_storage_path, encoded_name); | ||
| global_config->mp4_storage_path, stream_name); | ||
| } else { | ||
| snprintf(mp4_dir, MAX_PATH_LENGTH, "%s/mp4/%s", | ||
| global_config->storage_path, encoded_name); | ||
| global_config->storage_path, stream_name); |
There was a problem hiding this comment.
MP4 output directory paths are now built using the raw stream_name. Unlike sanitize_stream_name(), this allows path separators / traversal sequences in stream names and can write outside the intended storage root (and also diverges from the consistent sanitization used for HLS and other storage paths). Reintroduce sanitize_stream_name() (or equivalent validation) when constructing filesystem paths.
| snprintf(mp4_dir, MAX_PATH_LENGTH, "%s/%s", | ||
| global_config->mp4_storage_path, encoded_name); | ||
| global_config->mp4_storage_path, stream_name); | ||
| } else { | ||
| snprintf(mp4_dir, MAX_PATH_LENGTH, "%s/mp4/%s", | ||
| global_config->storage_path, encoded_name); | ||
| global_config->storage_path, stream_name); |
There was a problem hiding this comment.
MP4 output directory paths are now built using the raw stream_name. Unlike sanitize_stream_name(), this allows path separators / traversal sequences in stream names and can write outside the intended storage root (and also diverges from the consistent sanitization used for HLS and other storage paths). Reintroduce sanitize_stream_name() (or equivalent validation) when constructing filesystem paths.
| // | ||
| // OLD CODE (kept for reference): | ||
| // | ||
| // mp4_recording_ctx_t *dead_ctx = NULL; | ||
| // pthread_mutex_lock(&recording_contexts_mutex); | ||
| // for (int i = 0; i < g_config.max_streams; i++) { | ||
| // if (recording_contexts[i] && strcmp(recording_contexts[i]->config.name, stream_name) == 0) { | ||
| // mp4_writer_t *writer = recording_contexts[i]->mp4_writer; | ||
| // if (writer && mp4_writer_is_recording(writer)) { | ||
| // pthread_mutex_unlock(&recording_contexts_mutex); | ||
| // return 0; // Already running and healthy | ||
| // } | ||
| // // <-- bug: writer==NULL was treated as dead and the slot was cleared, | ||
| // // allowing a second recording instance to start for the same stream. | ||
| // // writer is set asynchronously inside mp4_recording_thread (after | ||
| // // RTSP connect + avformat_find_stream_info, which can take several | ||
| // // seconds). During that window every call here saw writer==NULL, |
There was a problem hiding this comment.
This large commented-out OLD CODE block makes the function harder to read/maintain and duplicates what is already preserved in git history. Consider removing the block and keeping only a brief comment describing the race and the three-state guard, or referencing the commit/PR that fixed it.
| // | |
| // OLD CODE (kept for reference): | |
| // | |
| // mp4_recording_ctx_t *dead_ctx = NULL; | |
| // pthread_mutex_lock(&recording_contexts_mutex); | |
| // for (int i = 0; i < g_config.max_streams; i++) { | |
| // if (recording_contexts[i] && strcmp(recording_contexts[i]->config.name, stream_name) == 0) { | |
| // mp4_writer_t *writer = recording_contexts[i]->mp4_writer; | |
| // if (writer && mp4_writer_is_recording(writer)) { | |
| // pthread_mutex_unlock(&recording_contexts_mutex); | |
| // return 0; // Already running and healthy | |
| // } | |
| // // <-- bug: writer==NULL was treated as dead and the slot was cleared, | |
| // // allowing a second recording instance to start for the same stream. | |
| // // writer is set asynchronously inside mp4_recording_thread (after | |
| // // RTSP connect + avformat_find_stream_info, which can take several | |
| // // seconds). During that window every call here saw writer==NULL, | |
| // Note: mp4_writer is initialized asynchronously inside | |
| // mp4_recording_thread(). A NULL writer can therefore mean "still starting" | |
| // rather than "dead". The logic below must preserve the three-state guard | |
| // and avoid clearing the slot during that startup window. |
| mp4_writer_t *writer = recording_contexts[i]->mp4_writer; | ||
|
|
||
| if (writer && mp4_writer_is_recording(writer)) { | ||
| // State 1: fully up and healthy | ||
| pthread_mutex_unlock(&recording_contexts_mutex); |
There was a problem hiding this comment.
The new guard treats only (writer==NULL && ctx->running==1) as the initialization window. However, mp4_recording_thread() sets ctx->mp4_writer before it starts the inner RTSP writer thread (and can spend seconds resolving the URL / retrying go2rtc before calling mp4_writer_start_recording_thread). During that period writer!=NULL but mp4_writer_is_recording(writer)==0, so this code will classify the context as “dead” and try to restart, potentially reintroducing overlapping recordings. Consider treating any ctx->running==1 as “already in progress” (or deferring assignment of ctx->mp4_writer until after the inner thread is started) so that a start request can’t race the remainder of initialization.
|
I am going to merge this one and clean it up on main so it can be included in release. |
Restore sanitize_stream_name() in all four start functions to prevent path traversal via crafted stream names. Restore cleanup_dead_recording() call in start_mp4_recording_with_trigger() to avoid leaking dead contexts. Remove large commented "OLD CODE" block and restore path_utils.h include. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
It has been done, I fetched your brnach and cherry picked so I could get into next release. |
Problem
Under certain conditions, the same stream accumulates two or more parallel,
overlapping MP4 recording threads — resulting in duplicate database entries
and competing writers for the same file path.
The bug is in the already-running guard in all four
start_mp4_recording*variants in
src/video/mp4_recording_core.c. The original guard had onlytwo states: healthy or dead. It missed a third state: initializing.
ctx->mp4_writeris assigned insidemp4_recording_thread(), after asuccessful RTSP connect and
avformat_find_stream_info()— a process thatcan take several seconds. During that window the pointer is
NULL. Anycaller checking the guard saw
writer==NULL, concluded the recording wasdead, cleared the slot, and spawned a fresh thread — while the original
thread was still alive and initializing.
Fix
Introduce a third state in the guard using the existing
ctx->runningflagas a discriminator:
writer != NULL && mp4_writer_is_recording()writer == NULL && ctx->running == 1ctx->runningis set to1by the caller beforepthread_create()andonly cleared by the thread on exit or by a stop path — making it an
unambiguous sentinel for the initialization window.
Files changed
src/video/mp4_recording_core.cFix applied identically to all four entry points:
start_mp4_recording()start_mp4_recording_with_url()start_mp4_recording_with_trigger()start_mp4_recording_with_url_and_trigger()Testing
Tested on Rock Pi 4C+ running the lightNVR Docker container with a TP-Link
C545D dual-lens camera. Rapid successive motion events previously produced
2-3 concurrent recording entries per stream in the database. With this fix
only one recording instance per stream is ever active.
Notes
The old two-state guard is preserved as a commented block with inline
// OLD CODE/// <-- bug:annotations instart_mp4_recording_with_trigger(),which is the most commonly exercised variant and where the race is most
easily triggered.