Skip to content

fix(mp4_recording_core): treat writer==NULL+running==1 as initializing to prevent duplicate recordings#354

Closed
origin2000 wants to merge 1 commit intoopensensor:mainfrom
origin2000:fix/mp4-duplicate-recording-race
Closed

fix(mp4_recording_core): treat writer==NULL+running==1 as initializing to prevent duplicate recordings#354
origin2000 wants to merge 1 commit intoopensensor:mainfrom
origin2000:fix/mp4-duplicate-recording-race

Conversation

@origin2000
Copy link
Copy Markdown
Contributor

Problem

Under certain conditions, the same stream accumulates two or more parallel,
overlapping MP4 recording threads — resulting in duplicate database entries
and competing writers for the same file path.

The bug is in the already-running guard in all four start_mp4_recording*
variants in src/video/mp4_recording_core.c. The original guard had only
two states: healthy or dead. It missed a third state: initializing.

ctx->mp4_writer is assigned inside mp4_recording_thread(), after a
successful RTSP connect and avformat_find_stream_info() — a process that
can take several seconds. During that window the pointer is NULL. Any
caller checking the guard saw writer==NULL, concluded the recording was
dead, cleared the slot, and spawned a fresh thread — while the original
thread was still alive and initializing.

Fix

Introduce a third state in the guard using the existing ctx->running flag
as a discriminator:

Condition Interpretation Action
writer != NULL && mp4_writer_is_recording() Healthy Return early
writer == NULL && ctx->running == 1 Initializing Return early
Everything else Dead Extract, join, restart

ctx->running is set to 1 by the caller before pthread_create() and
only cleared by the thread on exit or by a stop path — making it an
unambiguous sentinel for the initialization window.

Files changed

  • src/video/mp4_recording_core.c

Fix applied identically to all four entry points:

  • start_mp4_recording()
  • start_mp4_recording_with_url()
  • start_mp4_recording_with_trigger()
  • start_mp4_recording_with_url_and_trigger()

Testing

Tested on Rock Pi 4C+ running the lightNVR Docker container with a TP-Link
C545D dual-lens camera. Rapid successive motion events previously produced
2-3 concurrent recording entries per stream in the database. With this fix
only one recording instance per stream is ever active.

Notes

The old two-state guard is preserved as a commented block with inline
// OLD CODE / // <-- bug: annotations in start_mp4_recording_with_trigger(),
which is the most commonly exercised variant and where the race is most
easily triggered.

…g to prevent duplicate recordings

The already-running guard in all four start_mp4_recording* variants had
only two states: healthy (writer!=NULL && recording) or dead (everything
else). This missed the initialization window between pthread_create()
returning and ctx->mp4_writer being assigned inside mp4_recording_thread()
after RTSP connect + avformat_find_stream_info(), which can take several
seconds.

During that window every caller saw writer==NULL, concluded the recording
was dead, cleared the slot, and spawned a new thread - producing multiple
parallel recordings with overlapping timestamps in the database.

Fix: introduce a third state using ctx->running as discriminator:
  1. writer!=NULL && mp4_writer_is_recording() -> healthy, return early
  2. writer==NULL && ctx->running==1           -> initializing, return early
  3. everything else                           -> dead, clean up and restart

ctx->running is set to 1 by the caller before pthread_create() and only
cleared by the thread on exit or by a stop path, making it an unambiguous
sentinel for the initialization window.

Fix applied identically to all four entry points:
  - start_mp4_recording()
  - start_mp4_recording_with_url()
  - start_mp4_recording_with_trigger()
  - start_mp4_recording_with_url_and_trigger()

Tested on Rock Pi 4C+ with TP-Link C545D camera. Rapid motion events
previously produced 2-3 concurrent recording entries per stream in the
database. With this fix only one recording instance per stream is active.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 2, 2026

✅ All contributors have signed the CLA. Thank you!
Posted by the CLA Assistant Lite bot.

@origin2000
Copy link
Copy Markdown
Contributor Author

I have read the CLA Document and I hereby sign the CLA

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a race condition in mp4_recording_core.c where rapid successive start requests could create overlapping MP4 recording threads for the same stream (duplicate DB entries and competing writers). It does so by expanding the “already running” guard to recognize an additional “initializing” state using ctx->running.

Changes:

  • Update all start_mp4_recording* entry points to treat writer == NULL && ctx->running == 1 as “initializing” and skip duplicate starts.
  • Add explanatory comments (including a preserved “OLD CODE” block) describing the race and the three-state guard.
  • Change MP4 directory path construction to use stream_name directly (removing prior sanitize_stream_name() usage and dropping the core/path_utils.h include).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@@ -990,13 +1036,6 @@ int start_mp4_recording_with_trigger(const char *stream_name, const char *trigge
}
pthread_mutex_unlock(&recording_contexts_mutex);

Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In start_mp4_recording_with_trigger(), when an existing context is deemed dead, it’s extracted into dead_ctx and the slot is cleared, but dead_ctx is never joined/detached/freed (cleanup_dead_recording() is not called). This can leak the ctx and leave the previous outer recording thread running concurrently with the new one, reintroducing overlapping recordings. Call cleanup_dead_recording(dead_ctx, stream_name) after releasing recording_contexts_mutex (as done in the other start_mp4_recording* variants) before proceeding to allocate/start a new context.

Suggested change
if (dead_ctx) {
cleanup_dead_recording(dead_ctx, stream_name);
}

Copilot uses AI. Check for mistakes.
Comment on lines 658 to 662
snprintf(mp4_dir, MAX_PATH_LENGTH, "%s/%s",
global_config->mp4_storage_path, encoded_name);
global_config->mp4_storage_path, stream_name);
} else {
// Use mp4 directory parallel to hls, NOT inside it
snprintf(mp4_dir, MAX_PATH_LENGTH, "%s/mp4/%s",
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MP4 output directory paths are now built using the raw stream_name. Unlike sanitize_stream_name(), this allows path separators / traversal sequences in stream names and can write outside the intended storage root (and also diverges from the consistent sanitization used for HLS and other storage paths). Reintroduce sanitize_stream_name() (or equivalent validation) when constructing filesystem paths, and use the sanitized value for both the direct and non-direct mp4_dir branches.

Copilot uses AI. Check for mistakes.
Comment on lines +826 to +830
global_config->mp4_storage_path, stream_name);
} else {
// Use mp4 directory parallel to hls, NOT inside it
snprintf(mp4_dir, MAX_PATH_LENGTH, "%s/mp4/%s",
global_config->storage_path, encoded_name);
global_config->storage_path, stream_name);
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MP4 output directory paths are now built using the raw stream_name. Unlike sanitize_stream_name(), this allows path separators / traversal sequences in stream names and can write outside the intended storage root (and also diverges from the consistent sanitization used for HLS and other storage paths). Reintroduce sanitize_stream_name() (or equivalent validation) when constructing filesystem paths, and use the sanitized value for both the direct and non-direct mp4_dir branches.

Copilot uses AI. Check for mistakes.
Comment on lines 1089 to +1093
snprintf(mp4_dir, MAX_PATH_LENGTH, "%s/%s",
global_config->mp4_storage_path, encoded_name);
global_config->mp4_storage_path, stream_name);
} else {
snprintf(mp4_dir, MAX_PATH_LENGTH, "%s/mp4/%s",
global_config->storage_path, encoded_name);
global_config->storage_path, stream_name);
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MP4 output directory paths are now built using the raw stream_name. Unlike sanitize_stream_name(), this allows path separators / traversal sequences in stream names and can write outside the intended storage root (and also diverges from the consistent sanitization used for HLS and other storage paths). Reintroduce sanitize_stream_name() (or equivalent validation) when constructing filesystem paths.

Copilot uses AI. Check for mistakes.
Comment on lines 1242 to +1246
snprintf(mp4_dir, MAX_PATH_LENGTH, "%s/%s",
global_config->mp4_storage_path, encoded_name);
global_config->mp4_storage_path, stream_name);
} else {
snprintf(mp4_dir, MAX_PATH_LENGTH, "%s/mp4/%s",
global_config->storage_path, encoded_name);
global_config->storage_path, stream_name);
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MP4 output directory paths are now built using the raw stream_name. Unlike sanitize_stream_name(), this allows path separators / traversal sequences in stream names and can write outside the intended storage root (and also diverges from the consistent sanitization used for HLS and other storage paths). Reintroduce sanitize_stream_name() (or equivalent validation) when constructing filesystem paths.

Copilot uses AI. Check for mistakes.
Comment on lines +975 to +991
//
// OLD CODE (kept for reference):
//
// mp4_recording_ctx_t *dead_ctx = NULL;
// pthread_mutex_lock(&recording_contexts_mutex);
// for (int i = 0; i < g_config.max_streams; i++) {
// if (recording_contexts[i] && strcmp(recording_contexts[i]->config.name, stream_name) == 0) {
// mp4_writer_t *writer = recording_contexts[i]->mp4_writer;
// if (writer && mp4_writer_is_recording(writer)) {
// pthread_mutex_unlock(&recording_contexts_mutex);
// return 0; // Already running and healthy
// }
// // <-- bug: writer==NULL was treated as dead and the slot was cleared,
// // allowing a second recording instance to start for the same stream.
// // writer is set asynchronously inside mp4_recording_thread (after
// // RTSP connect + avformat_find_stream_info, which can take several
// // seconds). During that window every call here saw writer==NULL,
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This large commented-out OLD CODE block makes the function harder to read/maintain and duplicates what is already preserved in git history. Consider removing the block and keeping only a brief comment describing the race and the three-state guard, or referencing the commit/PR that fixed it.

Suggested change
//
// OLD CODE (kept for reference):
//
// mp4_recording_ctx_t *dead_ctx = NULL;
// pthread_mutex_lock(&recording_contexts_mutex);
// for (int i = 0; i < g_config.max_streams; i++) {
// if (recording_contexts[i] && strcmp(recording_contexts[i]->config.name, stream_name) == 0) {
// mp4_writer_t *writer = recording_contexts[i]->mp4_writer;
// if (writer && mp4_writer_is_recording(writer)) {
// pthread_mutex_unlock(&recording_contexts_mutex);
// return 0; // Already running and healthy
// }
// // <-- bug: writer==NULL was treated as dead and the slot was cleared,
// // allowing a second recording instance to start for the same stream.
// // writer is set asynchronously inside mp4_recording_thread (after
// // RTSP connect + avformat_find_stream_info, which can take several
// // seconds). During that window every call here saw writer==NULL,
// Note: mp4_writer is initialized asynchronously inside
// mp4_recording_thread(). A NULL writer can therefore mean "still starting"
// rather than "dead". The logic below must preserve the three-state guard
// and avoid clearing the slot during that startup window.

Copilot uses AI. Check for mistakes.
Comment on lines 1010 to 1014
mp4_writer_t *writer = recording_contexts[i]->mp4_writer;

if (writer && mp4_writer_is_recording(writer)) {
// State 1: fully up and healthy
pthread_mutex_unlock(&recording_contexts_mutex);
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new guard treats only (writer==NULL && ctx->running==1) as the initialization window. However, mp4_recording_thread() sets ctx->mp4_writer before it starts the inner RTSP writer thread (and can spend seconds resolving the URL / retrying go2rtc before calling mp4_writer_start_recording_thread). During that period writer!=NULL but mp4_writer_is_recording(writer)==0, so this code will classify the context as “dead” and try to restart, potentially reintroducing overlapping recordings. Consider treating any ctx->running==1 as “already in progress” (or deferring assignment of ctx->mp4_writer until after the inner thread is started) so that a start request can’t race the remainder of initialization.

Copilot uses AI. Check for mistakes.
@matteius
Copy link
Copy Markdown
Contributor

matteius commented Apr 3, 2026

I am going to merge this one and clean it up on main so it can be included in release.

matteius added a commit that referenced this pull request Apr 3, 2026
Restore sanitize_stream_name() in all four start functions to prevent
path traversal via crafted stream names.  Restore cleanup_dead_recording()
call in start_mp4_recording_with_trigger() to avoid leaking dead contexts.
Remove large commented "OLD CODE" block and restore path_utils.h include.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@matteius
Copy link
Copy Markdown
Contributor

matteius commented Apr 3, 2026

It has been done, I fetched your brnach and cherry picked so I could get into next release.

@matteius matteius closed this Apr 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants