mtmd: refactor video subproc handling by ngxson · Pull Request #24316 · ggml-org/llama.cpp

ngxson · 2026-06-08T19:43:24Z

Overview

Refactor mtmd_helper_video and add an RAII wrapper, subprocess_handle, to make it a bit safer to work with.
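For readers unfamiliar with the pattern, here is a minimal sketch of RAII ownership of a C stream. It is not the PR's `subprocess_handle`; `file_closer`, `owned_file` and `write_all_and_close` are hypothetical names used only for illustration. The point is that the owner closes the stream exactly once, whether it is closed early or at scope exit.

```cpp
#include <cstdio>
#include <memory>

// Deleter that closes a C stream. std::unique_ptr only invokes it for
// non-null pointers, so an already-released stream is never closed again.
struct file_closer {
    void operator()(std::FILE * f) const noexcept { std::fclose(f); }
};

// Hypothetical alias for illustration; not the PR's subprocess_handle.
using owned_file = std::unique_ptr<std::FILE, file_closer>;

// Write a buffer to the stream, then close it early so a reader on the
// other end would see EOF. Returns true if every byte was written.
bool write_all_and_close(owned_file & out, const char * data, std::size_t len) {
    if (!out) {
        return false; // nothing to write to (never opened or already closed)
    }
    const bool ok = std::fwrite(data, 1, len, out.get()) == len;
    out.reset(); // closes once and leaves `out` empty; the destructor will not close again
    return ok;
}
```

Calling `write_all_and_close` a second time on the same `owned_file` returns `false` instead of touching a closed stream.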

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: yes

mudler

I've tested this, and it also fixes the issue that #24313 was addressing. Thanks @ngxson

* chore(llama-cpp): bump to 8f83d6c for mtmd video input support

  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(llama-cpp): forward video input to mtmd (template + non-template paths)

  Wire request->videos() into grpc-server.cpp mirroring the existing image and audio handling: a video_data build + non-template files extraction, and input_video chat chunks on the tokenizer-template path. allow_video is auto-set at model load by the vendored upstream chat_params.

  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): add video attachment support to the chat UI

  Mirror the image/audio attachment path for video: emit video_url content parts, accept video/* in the picker, keep video files as base64, show a film icon badge, and render attached video inline with a <video> player.

  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(llama-cpp): patch mtmd video stdin double-close (heap crash)

  Upstream mtmd video input (ggml-org/llama.cpp#24269) double-fcloses the ffmpeg/ffprobe stdin FILE: feed_stdin() fclose()s the FILE returned by subprocess_stdin() (which is sp->stdin_file), then subprocess_destroy() fclose()s the same pointer again -> heap corruption that aborts the backend on any base64 input_video request (the CLI --video file path is unaffected). Vendor a one-line fix (null sp->stdin_file after fclose) via prepare.sh's patches/ until upstream merges it.

  Verified e2e with gemma-4-e2b-it-qat-q4_0: video frames decode via ffmpeg and the model answers correctly (red clip -> 'Red', blue -> 'Blue').

  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(llama-cpp): re-pin to upstream #24316, drop vendored stdin patch

  Upstream replaced the ad-hoc video stdin handling with a proper RAII refactor (ggml-org/llama.cpp#24316, "mtmd: refactor video subproc handling"), which includes the same `sp->stdin_file = nullptr` guard our patch added (plus join-before-destroy ordering). Re-pin LLAMA_VERSION to that branch head and drop patches/0001 - it's now redundant.

  Verified e2e with gemma-4-e2b-it-qat-q4_0: no crash, video frames decode and the model answers correctly (red clip -> "Red", blue -> "Blue").

  NOTE: #24316 is not yet merged, so this pins to its branch-head commit (28ca1e60). Re-pin to the squash-merge commit on master once it lands, otherwise `git fetch` may lose the commit after the branch is deleted.

  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
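The double-close described in the commit message, and the null-after-close guard that fixes it, can be sketched in isolation. This is an illustration under stated assumptions, not the upstream code: `fake_subprocess`, `close_stdin`, `destroy` and `g_close_calls` are hypothetical stand-ins, and only the `stdin_file` field named in the message is modeled.

```cpp
#include <cstdio>

// Hypothetical miniature of the subprocess struct; only the stdin_file
// field mentioned in the commit message is modeled here.
struct fake_subprocess {
    std::FILE * stdin_file = nullptr;
};

// Counts real fclose() calls so the single-close behavior is observable.
static int g_close_calls = 0;

// Close the child's stdin stream and clear the pointer. Clearing is the
// guard: without it, a later cleanup pass would fclose() the same FILE
// a second time, which is undefined behavior (heap corruption in practice).
void close_stdin(fake_subprocess * sp) {
    if (sp->stdin_file != nullptr) {
        std::fclose(sp->stdin_file);
        ++g_close_calls;
        sp->stdin_file = nullptr;
    }
}

// Stand-in for the final cleanup step: it also closes stdin, but this is
// a no-op when close_stdin() has already run.
void destroy(fake_subprocess * sp) {
    close_stdin(sp);
}
```

With the guard in place, calling `close_stdin` after feeding the data and then `destroy` closes the stream once; the second call sees a null pointer and does nothing.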

Noeda · 2026-06-09T01:15:16Z

            cmd.push_back("-ss");
            cmd.push_back(seek_buf);
        }



I think either in this PR or a follow-up PR you might want to try adding -nostdin to ffmpeg. The test clip in the repo, test-3.mp4, encodes fine without it, but a longer clip I tested gets stuck midway, with the ffmpeg processes just sitting there and nothing progressing.

I think it might be because my clip is larger than the test clip: test-3.mp4 from the llama.cpp repo is 647K, but my clip is about 11 megabytes. Without -nostdin, test-3.mp4 still works fine but my larger test clip doesn't.

Suggested change

cmd.push_back("-nostdin");

https://ffmpeg.org/ffmpeg.html (search for -nostdin, no convenient anchor URL).

-stdin
Enable interaction on standard input. On by default unless standard input is used as an input. To explicitly disable interaction you need to specify -nostdin.

Disabling interaction on standard input is useful, for example, if ffmpeg is in the background process group. Roughly the same result can be achieved with ffmpeg ... < /dev/null but it requires a shell.

I am wondering if ffmpeg is being dumdum and not recognizing cache:pipe:0 as "standard input is used as an input". Maybe something in there races on reading stdin in some bad way; I didn't bother tracing syscalls or anything.
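A rough sketch of where the flag would sit in the argument vector, assuming the command is built as a list of strings like the `cmd.push_back(...)` lines quoted above. `build_ffmpeg_cmd` and its parameters are hypothetical and the option layout is illustrative, not the PR's actual command line; `-nostdin`, `-ss` and `-i` are regular ffmpeg options.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds an ffmpeg argv with interaction on stdin
// disabled. The real helper in the PR passes more options than shown here.
std::vector<std::string> build_ffmpeg_cmd(const std::string & input, const std::string & seek) {
    // -nostdin is a global option, so it goes before any input options.
    std::vector<std::string> cmd = {"ffmpeg", "-nostdin"};
    if (!seek.empty()) {
        // -ss placed before -i seeks in the input rather than the output.
        cmd.push_back("-ss");
        cmd.push_back(seek);
    }
    cmd.push_back("-i");
    cmd.push_back(input);
    return cmd;
}
```

For example, `build_ffmpeg_cmd("cache:pipe:0", "1.5")` yields `ffmpeg -nostdin -ss 1.5 -i cache:pipe:0`.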

GitHub is rejecting the .mp4 I used for testing, so I uploaded it here: https://submarination.space/public/subnautica2_test_clip_2026_06_08.mp4 (~11 MB, a 42-second clip of some Subnautica 2 gameplay, no audio)

mtmd: refactor video subproc handling

28ca1e6

ngxson requested a review from a team as a code owner June 8, 2026 19:43

github-actions Bot added the examples label Jun 8, 2026

ngxson mentioned this pull request Jun 8, 2026

mtmd: fix double-close of ffmpeg/ffprobe stdin in video helper #24313

Closed

CISC approved these changes Jun 8, 2026

mudler approved these changes Jun 8, 2026

localai-bot mentioned this pull request Jun 8, 2026

feat(llama-cpp): video input support (mtmd #24269) mudler/LocalAI#10216

Merged

ngxson added the "merge ready" label Jun 8, 2026

Noeda reviewed Jun 9, 2026

ngxson wants to merge 1 commit into master from xsn/video_refactor_subproc

4 participants