mtmd: refactor video subproc handling #24316

Open
ngxson wants to merge 1 commit into master from xsn/video_refactor_subproc

ngxson commented Jun 8, 2026

Overview

Refactor `mtmd_helper_video` and add an RAII wrapper, `subprocess_handle`, to make the subprocess a bit safer to work with.
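
For readers unfamiliar with the pattern, here is a minimal sketch of what an RAII wrapper of this kind might look like. It is not the code from this PR: `fake_proc` and `subprocess_handle_sketch` are hypothetical names, and the `joined`/`destroyed` flags only stand in for waiting on the child and releasing its handle. The point is that cleanup runs exactly once, in a fixed order, and that the stdin pointer is nulled after `fclose()` so it cannot be closed twice.

```cpp
#include <cstdio>
#include <utility>

// Hypothetical stand-in for the C-style process struct; the real struct used
// by mtmd is not reproduced here.
struct fake_proc {
    std::FILE * stdin_file = nullptr;
    bool joined    = false;
    bool destroyed = false;
};

// Move-only owner: the destructor performs cleanup once, in a fixed order.
class subprocess_handle_sketch {
public:
    explicit subprocess_handle_sketch(fake_proc * p) : proc_(p) {}
    subprocess_handle_sketch(const subprocess_handle_sketch &) = delete;
    subprocess_handle_sketch & operator=(const subprocess_handle_sketch &) = delete;
    subprocess_handle_sketch(subprocess_handle_sketch && other) noexcept
        : proc_(std::exchange(other.proc_, nullptr)) {}
    subprocess_handle_sketch & operator=(subprocess_handle_sketch && other) noexcept {
        if (this != &other) {
            reset();
            proc_ = std::exchange(other.proc_, nullptr);
        }
        return *this;
    }
    ~subprocess_handle_sketch() { reset(); }

    // Close the child's stdin and null the pointer, so that later cleanup
    // sees nullptr instead of a dangling FILE* and skips the second fclose().
    void close_stdin() {
        if (proc_ && proc_->stdin_file) {
            std::fclose(proc_->stdin_file);
            proc_->stdin_file = nullptr;
        }
    }

private:
    void reset() {
        if (!proc_) {
            return;
        }
        close_stdin();
        proc_->joined    = true; // stands in for waiting on the child first
        proc_->destroyed = true; // stands in for releasing the handle afterwards
        proc_ = nullptr;
    }

    fake_proc * proc_;
};
```

Calling `close_stdin()` more than once, or letting the destructor run after an explicit close, is harmless in this sketch because the second call finds a null pointer.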

mudler left a comment

I've tested this, and it also fixes the issue that #24313 was addressing. Thanks @ngxson

mudler added a commit to mudler/LocalAI that referenced this pull request Jun 8, 2026
Upstream replaced the ad-hoc video stdin handling with a proper RAII
refactor (ggml-org/llama.cpp#24316, "mtmd: refactor video subproc
handling"), which includes the same `sp->stdin_file = nullptr` guard our
patch added (plus join-before-destroy ordering). Re-pin LLAMA_VERSION to
that branch head and drop patches/0001 - it's now redundant.

Verified e2e with gemma-4-e2b-it-qat-q4_0: no crash, video frames decode
and the model answers correctly (red clip -> "Red", blue -> "Blue").

NOTE: #24316 is not yet merged, so this pins to its branch-head commit
(28ca1e60). Re-pin to the squash-merge commit on master once it lands,
otherwise `git fetch` may lose the commit after the branch is deleted.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
mudler added a commit to mudler/LocalAI that referenced this pull request Jun 8, 2026
* chore(llama-cpp): bump to 8f83d6c for mtmd video input support

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(llama-cpp): forward video input to mtmd (template + non-template paths)

Wire request->videos() into grpc-server.cpp mirroring the existing image
and audio handling: a video_data build + non-template files extraction, and
input_video chat chunks on the tokenizer-template path. allow_video is
auto-set at model load by the vendored upstream chat_params.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): add video attachment support to the chat UI

Mirror the image/audio attachment path for video: emit video_url content
parts, accept video/* in the picker, keep video files as base64, show a
film icon badge, and render attached video inline with a <video> player.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(llama-cpp): patch mtmd video stdin double-close (heap crash)

Upstream mtmd video input (ggml-org/llama.cpp#24269) double-fcloses the
ffmpeg/ffprobe stdin FILE: feed_stdin() fclose()s the FILE returned by
subprocess_stdin() (which is sp->stdin_file), then subprocess_destroy()
fclose()s the same pointer again -> heap corruption that aborts the
backend on any base64 input_video request (the CLI --video file path is
unaffected). Vendor a one-line fix (null sp->stdin_file after fclose)
via prepare.sh's patches/ until upstream merges it.

Verified e2e with gemma-4-e2b-it-qat-q4_0: video frames decode via
ffmpeg and the model answers correctly (red clip -> 'Red', blue -> 'Blue').

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(llama-cpp): re-pin to upstream #24316, drop vendored stdin patch

Upstream replaced the ad-hoc video stdin handling with a proper RAII
refactor (ggml-org/llama.cpp#24316, "mtmd: refactor video subproc
handling"), which includes the same `sp->stdin_file = nullptr` guard our
patch added (plus join-before-destroy ordering). Re-pin LLAMA_VERSION to
that branch head and drop patches/0001 - it's now redundant.

Verified e2e with gemma-4-e2b-it-qat-q4_0: no crash, video frames decode
and the model answers correctly (red clip -> "Red", blue -> "Blue").

NOTE: #24316 is not yet merged, so this pins to its branch-head commit
(28ca1e60). Re-pin to the squash-merge commit on master once it lands,
otherwise `git fetch` may lose the commit after the branch is deleted.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
ngxson added the `merge ready` label Jun 8, 2026
cmd.push_back("-ss");
cmd.push_back(seek_buf);
}

I think either in this PR or a follow-up PR you might want to try adding -nostdin to ffmpeg. The test clip in the repo, test-3.mp4, encodes fine without it, but a longer clip I tested gets stuck midway, with the ffmpeg processes just sitting there and nothing progressing.

I think this might be because my clip is larger than the test clip: test-3.mp4 from the llama.cpp repo is 647K, but my clip is about 11 megabytes. Without -nostdin, test-3.mp4 still works fine but my larger test clip doesn't.

Suggested change
cmd.push_back("-nostdin");

https://ffmpeg.org/ffmpeg.html (search for -nostdin, no convenient anchor URL).

-stdin
Enable interaction on standard input. On by default unless standard input is used as an input. To explicitly disable interaction you need to specify -nostdin.

Disabling interaction on standard input is useful, for example, if ffmpeg is in the background process group. Roughly the same result can be achieved with ffmpeg ... < /dev/null but it requires a shell.

I am wondering if ffmpeg is being dumdum and not recognizing cache:pipe:0 as "standard input is being used as source". Maybe something there races to read stdin in some bad way; I didn't bother tracing syscalls or anything.

GitHub is rejecting the .mp4 I used for testing, so I uploaded it here: https://submarination.space/public/subnautica2_test_clip_2026_06_08.mp4 (~11 MB, a 42-second audioless clip of some Subnautica 2 gameplay)
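
To make the suggestion above concrete, here is a rough sketch of where the flag could sit in the argument list. `build_ffmpeg_cmd_sketch` and its `seek_sec` parameter are hypothetical and not taken from the PR; only `-nostdin`, `-ss`, `-i`, and the `cache:pipe:0` input come from the discussion, and the output options are left out on purpose.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: builds the front of an ffmpeg argv that reads the clip
// from stdin. Output options are omitted, so this is not a complete command.
std::vector<std::string> build_ffmpeg_cmd_sketch(double seek_sec) {
    std::vector<std::string> cmd;
    cmd.push_back("ffmpeg");
    // Explicitly disable interactive reads on stdin, since stdin carries the
    // video bytes here.
    cmd.push_back("-nostdin");
    if (seek_sec > 0.0) {
        char seek_buf[32];
        std::snprintf(seek_buf, sizeof(seek_buf), "%.3f", seek_sec);
        cmd.push_back("-ss"); // placed before -i, so it applies to the input
        cmd.push_back(seek_buf);
    }
    cmd.push_back("-i");
    cmd.push_back("cache:pipe:0");
    return cmd;
}
```

Whether this actually resolves the stall on larger clips is the commenter's hypothesis and would need to be confirmed against the real command line.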

Labels: examples, merge ready
