mtmd: refactor video subproc handling#24316
Conversation
Upstream replaced the ad-hoc video stdin handling with a proper RAII refactor (ggml-org/llama.cpp#24316, "mtmd: refactor video subproc handling"), which includes the same `sp->stdin_file = nullptr` guard our patch added (plus join-before-destroy ordering). Re-pin LLAMA_VERSION to that branch head and drop patches/0001 - it's now redundant. Verified e2e with gemma-4-e2b-it-qat-q4_0: no crash, video frames decode and the model answers correctly (red clip -> "Red", blue -> "Blue"). NOTE: #24316 is not yet merged, so this pins to its branch-head commit (28ca1e60). Re-pin to the squash-merge commit on master once it lands, otherwise `git fetch` may lose the commit after the branch is deleted. Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(llama-cpp): bump to 8f83d6c for mtmd video input support Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(llama-cpp): forward video input to mtmd (template + non-template paths) Wire request->videos() into grpc-server.cpp mirroring the existing image and audio handling: a video_data build + non-template files extraction, and input_video chat chunks on the tokenizer-template path. allow_video is auto-set at model load by the vendored upstream chat_params. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ui): add video attachment support to the chat UI Mirror the image/audio attachment path for video: emit video_url content parts, accept video/* in the picker, keep video files as base64, show a film icon badge, and render attached video inline with a <video> player. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(llama-cpp): patch mtmd video stdin double-close (heap crash) Upstream mtmd video input (ggml-org/llama.cpp#24269) double-fcloses the ffmpeg/ffprobe stdin FILE: feed_stdin() fclose()s the FILE returned by subprocess_stdin() (which is sp->stdin_file), then subprocess_destroy() fclose()s the same pointer again -> heap corruption that aborts the backend on any base64 input_video request (the CLI --video file path is unaffected). Vendor a one-line fix (null sp->stdin_file after fclose) via prepare.sh's patches/ until upstream merges it. Verified e2e with gemma-4-e2b-it-qat-q4_0: video frames decode via ffmpeg and the model answers correctly (red clip -> 'Red', blue -> 'Blue'). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(llama-cpp): re-pin to upstream #24316, drop vendored stdin patch Upstream replaced the ad-hoc video stdin handling with a proper RAII refactor (ggml-org/llama.cpp#24316, "mtmd: refactor video subproc handling"), which includes the same `sp->stdin_file = nullptr` guard our patch added (plus join-before-destroy ordering). Re-pin LLAMA_VERSION to that branch head and drop patches/0001 - it's now redundant. Verified e2e with gemma-4-e2b-it-qat-q4_0: no crash, video frames decode and the model answers correctly (red clip -> "Red", blue -> "Blue"). NOTE: #24316 is not yet merged, so this pins to its branch-head commit (28ca1e60). Re-pin to the squash-merge commit on master once it lands, otherwise `git fetch` may lose the commit after the branch is deleted. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
| cmd.push_back("-ss"); | ||
| cmd.push_back(seek_buf); | ||
| } | ||
|
|
There was a problem hiding this comment.
I think either in this PR or a follow-up PR you might want to try adding -nostdin to ffmpeg. The test clip in the repo test-3.mp4 encodes fine without it, but a longer clip I tested gets stuck midway, with ffmpegs just sitting there and nothing progressing.
I think might be because my clip is larger than the test clip. test-3.mp4 from llama.cpp repo is 647K but my clip is about 11 megabytes. Without -nostdin, test-3.mp4 still works fine but my larger test clip doesn't.
| cmd.push_back("-nostdin"); |
https://ffmpeg.org/ffmpeg.html (search for -nostdin, no convenient anchor URL).
-stdin
Enable interaction on standard input. On by default unless standard input is used as an input. To explicitly disable interaction you need to specify -nostdin.Disabling interaction on standard input is useful, for example, if ffmpeg is in the background process group. Roughly the same result can be achieved with ffmpeg ... < /dev/null but it requires a shell.
I am wondering if ffmpeg is being dumdum and not recognizing the cache:pipe:0 as "standard input is being used as source". Maybe something there tries to race reading stdin in some bad interaction way, didn't bother tracing syscalls or anything.
GitHub is rejecting the .mp4 I used for testing, so I uploaded it here: https://submarination.space/public/subnautica2_test_clip_2026_06_08.mp4 (~11mb, 42 seconds of audioless clip of some Subnautica 2 gameplay)
Overview
Refactor
mtmd_helper_video, add a RAII wrappersubprocess_handleto make it a bit safer to work withRequirements