fix(stablediffusion-ggml): mux LTX-2 audio into output MP4 by localai-bot · Pull Request #9990 · mudler/LocalAI

localai-bot · 2026-05-25T19:59:52Z

Summary

sd.cpp's generate_video() returns audio alongside frames for LTX-2.3 (its audio VAE produces a waveform), but our gosd wrapper collected the sd_audio_t* and immediately freed it without muxing it, so every LTX-2 generation landed as a silent MP4.
Stage the planar float32 waveform to a temp WAV (IEEE-float header, samples interleaved on the fly), then add it as a second ffmpeg input with -c:a aac -b:a 192k -map 0:v:0 -map 1:a:0 -shortest.
Temp WAV is unlink()'d on all ffmpeg exit paths (success, write error, waitpid error).
Non-LTX models (Wan i2v / FLF2V, Flux video) are unaffected: audio == nullptr → have_audio stays false → no audio flags added to ffmpeg argv, no temp file created.
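The conditional argv construction can be sketched as follows. This is a minimal C sketch, not the wrapper's actual code: the function name build_ffmpeg_argv, the MAX_FFMPEG_ARGS bound, and passing input 0 as an opaque NULL-terminated list are all hypothetical, and whatever video-encoding flags the real wrapper passes are left out. Only the audio flags (-i for the staged WAV, -map 0:v:0 -map 1:a:0, -c:a aac -b:a 192k, -shortest) come from this PR. When wav_path is NULL, as for Wan or Flux video, none of them are added.

```c
#include <stddef.h>

#define MAX_FFMPEG_ARGS 64

/* Hypothetical argv builder. video_in is a NULL-terminated list of arguments that
   describe ffmpeg input 0 (the frame stream). wav_path is NULL when the model
   returned no audio. Fills argv (capacity MAX_FFMPEG_ARGS), NULL-terminates it for
   execvp(), and returns argc, or -1 if the arguments would not fit. */
int build_ffmpeg_argv(const char **argv, const char *const *video_in,
                      const char *wav_path, const char *out_path) {
    /* Slots needed after input 0: 2 (-i wav) + 9 (audio flags) + 1 (output) + 1 (NULL). */
    const int reserved = 13;
    int n = 0;

    argv[n++] = "ffmpeg";
    argv[n++] = "-y";
    for (size_t i = 0; video_in[i] != NULL; i++) {
        if (n + reserved >= MAX_FFMPEG_ARGS) return -1;
        argv[n++] = video_in[i];
    }
    if (wav_path != NULL) {
        argv[n++] = "-i";                          /* input 1: the staged float WAV */
        argv[n++] = wav_path;
        argv[n++] = "-map"; argv[n++] = "0:v:0";   /* first video stream of input 0 */
        argv[n++] = "-map"; argv[n++] = "1:a:0";   /* first audio stream of input 1 */
        argv[n++] = "-c:a"; argv[n++] = "aac";
        argv[n++] = "-b:a"; argv[n++] = "192k";
        argv[n++] = "-shortest";                   /* stop when the shorter stream ends */
    }
    argv[n++] = out_path;
    argv[n] = NULL;
    return n;
}
```

Because the array is NULL-terminated it can be handed straight to execvp(); after waitpid() the caller would still unlink() the WAV on every exit path, success or error.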

Test plan

make libgosd-fallback.so builds cleanly with the change; gen_video and load_model symbols are still exported.
Verify on a live worker: install ltx-2.3-22b-distilled-ggml, POST /video with a non-trivial prompt and num_frames >= 25, download the resulting MP4 and confirm that ffprobe reports an aac audio stream alongside the h264 video stream.
Verify no regression on Wan: the same /video call against wan-2.1-i2v-14b-480p-ggml should still produce a single-stream h264 MP4 (no audio, no errors).
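The two ffprobe checks in the test plan can be scripted. This is a minimal C sketch, assuming the output of ffprobe -v error -show_entries stream=codec_name,codec_type -of csv=p=0 out.mp4 has already been captured as a string (for example with popen()) and that it prints one codec_name,codec_type line per stream, such as h264,video. The helpers has_stream and count_streams are hypothetical and not part of the PR.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if some line of `out` is exactly "<codec>,<type>". */
bool has_stream(const char *out, const char *codec, const char *type) {
    size_t clen = strlen(codec), tlen = strlen(type);
    const char *line = out;
    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        if (len == clen + 1 + tlen && strncmp(line, codec, clen) == 0 &&
            line[clen] == ',' && strncmp(line + clen + 1, type, tlen) == 0)
            return true;
        if (end == NULL) break;
        line = end + 1;
    }
    return false;
}

/* Counts non-empty lines, i.e. the number of streams ffprobe reported. */
int count_streams(const char *out) {
    int n = 0;
    const char *line = out;
    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        if (len > 0) n++;
        if (end == NULL) break;
        line = end + 1;
    }
    return n;
}
```

For the LTX-2 run, has_stream(out, "aac", "audio") should be true with two streams reported; for the Wan run, only a single h264,video line should appear.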

Assisted-by: Claude:claude-opus-4-7

sd.cpp's generate_video now returns an sd_audio_t* alongside the video frames for models with an audio VAE (LTX-2.3). Our gosd wrapper was already collecting that pointer but immediately freed it without ever muxing it into the output, so LTX-2 generations landed as silent MP4s even though the audio VAE decode succeeded.

Stage the planar float32 waveform to a temp WAV (IEEE float, header hand-built; samples interleaved on the fly), then add it as a second ffmpeg input with -c:a aac -map 0:v:0 -map 1:a:0 -shortest. The temp WAV is cleaned up unconditionally after ffmpeg exits, including on the write/waitpid error paths.

Non-LTX models (Wan i2v / FLF2V) keep their current behaviour: the audio arg is nullptr, the audio-related ffmpeg flags are not added, and no temp file is created.

Assisted-by: Claude:claude-opus-4-7
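The WAV staging step can be sketched as below. This is a minimal C sketch under stated assumptions: the PR does not show sd_audio_t's fields, so the hypothetical write_float_wav takes planar channel pointers, a channel count, a per-channel sample count and a sample rate directly, and it assumes float is 32-bit IEEE 754. It writes a 44-byte RIFF header with format tag 3 (WAVE_FORMAT_IEEE_FLOAT) and a 16-byte fmt chunk, then interleaves the planar samples frame by frame as little-endian float32. Strictly, the RIFF spec asks non-PCM files for an 18-byte fmt chunk plus a fact chunk; the sketch assumes, as many hand-built writers do, that the consumer (here ffmpeg) accepts the short form.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Write v as 2 little-endian bytes; returns 0 on success, -1 on write error. */
static int put_u16(FILE *f, uint16_t v) {
    unsigned char b[2] = { (unsigned char)(v & 0xffu), (unsigned char)(v >> 8) };
    return fwrite(b, 1, sizeof b, f) == sizeof b ? 0 : -1;
}

/* Write v as 4 little-endian bytes; returns 0 on success, -1 on write error. */
static int put_u32(FILE *f, uint32_t v) {
    unsigned char b[4] = { (unsigned char)(v & 0xffu), (unsigned char)((v >> 8) & 0xffu),
                           (unsigned char)((v >> 16) & 0xffu), (unsigned char)(v >> 24) };
    return fwrite(b, 1, sizeof b, f) == sizeof b ? 0 : -1;
}

/* Write a 4-character chunk tag such as "RIFF". */
static int put_tag(FILE *f, const char *tag) {
    return fwrite(tag, 1, 4, f) == 4 ? 0 : -1;
}

/* Hypothetical helper. channels[c] points at num_samples floats for channel c
   (planar layout). Writes a float32 WAV, interleaving samples on the fly.
   Returns 0 on success, -1 on any write error. */
int write_float_wav(FILE *f, const float *const *channels, uint16_t num_channels,
                    uint32_t num_samples, uint32_t sample_rate) {
    const uint16_t block_align = (uint16_t)(num_channels * sizeof(float)); /* bytes per frame */
    const uint32_t data_bytes = num_samples * block_align;

    if (put_tag(f, "RIFF") || put_u32(f, 36 + data_bytes) || put_tag(f, "WAVE")) return -1;
    if (put_tag(f, "fmt ") || put_u32(f, 16)              /* short fmt chunk */
        || put_u16(f, 3)                                  /* WAVE_FORMAT_IEEE_FLOAT */
        || put_u16(f, num_channels) || put_u32(f, sample_rate)
        || put_u32(f, sample_rate * block_align)          /* byte rate */
        || put_u16(f, block_align) || put_u16(f, 32))     /* bits per sample */
        return -1;
    if (put_tag(f, "data") || put_u32(f, data_bytes)) return -1;

    for (uint32_t i = 0; i < num_samples; i++) {
        for (uint16_t c = 0; c < num_channels; c++) {
            uint32_t bits;
            memcpy(&bits, &channels[c][i], sizeof bits); /* reinterpret float as raw bits */
            if (put_u32(f, bits)) return -1;
        }
    }
    return 0;
}
```

Writing sample by sample avoids building an interleaved copy of the whole waveform, at the cost of many small fwrite calls that stdio buffering absorbs. The sketch does not guard against data sizes above 4 GiB, which RIFF's 32-bit size fields cannot represent.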

mudler merged commit de2ce74 into master May 25, 2026
63 checks passed

mudler deleted the fix/stablediffusion-ggml-ltx2-audio branch May 25, 2026 20:40

BrewTestBot mentioned this pull request May 27, 2026

localai 4.3.2 Homebrew/homebrew-core#285003

Merged


2 participants