
fix(stablediffusion-ggml): mux LTX-2 audio into output MP4 #9990

Merged: mudler merged 1 commit into master from fix/stablediffusion-ggml-ltx2-audio on May 25, 2026

@localai-bot (Collaborator)

Summary

  • sd.cpp's generate_video() returns audio alongside the frames for LTX-2.3 (its audio VAE produces a waveform), but our gosd wrapper was receiving the sd_audio_t* and freeing it immediately without muxing it, so every LTX-2 generation landed as a silent MP4.
  • Stage the planar float32 waveform to a temp WAV (IEEE-float header, samples interleaved on the fly), then add it as a second ffmpeg input with -c:a aac -b:a 192k -map 0:v:0 -map 1:a:0 -shortest.
  • Temp WAV is unlink()'d on all ffmpeg exit paths (success, write error, waitpid error).
  • Non-LTX models (Wan i2v / FLF2V, Flux video) are unaffected: audio == nullptr → have_audio stays false → no audio flags are added to the ffmpeg argv and no temp file is created.
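A minimal sketch of the WAV staging step described above, written in C++ to match the wrapper. It assumes a hypothetical PlanarAudio struct (sample rate, channel count, one float32 vector per channel) instead of the real sd_audio_t layout, which the PR does not show, and builds the file in memory so the byte layout is easy to check. The header uses format tag 3 (IEEE float) with a bare 16-byte fmt chunk and no fact chunk; ffmpeg's WAV demuxer should accept this minimal layout, but it is a simplification of what the WAVE spec asks for with non-PCM data.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the decoder's audio result (planar layout).
struct PlanarAudio {
    uint32_t sample_rate;
    uint16_t channels;
    uint32_t frames;                          // samples per channel
    std::vector<std::vector<float>> planes;   // planes[ch][frame]
};

// WAV fields are little-endian regardless of host byte order.
static void put_u16(std::vector<uint8_t>& out, uint16_t v) {
    out.push_back(uint8_t(v & 0xff));
    out.push_back(uint8_t(v >> 8));
}
static void put_u32(std::vector<uint8_t>& out, uint32_t v) {
    for (int i = 0; i < 4; ++i) out.push_back(uint8_t((v >> (8 * i)) & 0xff));
}
static void put_tag(std::vector<uint8_t>& out, const char* tag) {
    out.insert(out.end(), tag, tag + 4);      // four-character chunk id
}

// Builds a 32-bit IEEE-float WAV, interleaving the planar channels on the fly.
std::vector<uint8_t> build_float_wav(const PlanarAudio& a) {
    assert(a.channels > 0 && a.planes.size() == a.channels);
    for (const auto& p : a.planes) assert(p.size() >= a.frames);

    const uint32_t bytes_per_sample = 4;
    const uint32_t data_bytes = a.frames * a.channels * bytes_per_sample;
    std::vector<uint8_t> out;
    out.reserve(44 + data_bytes);

    put_tag(out, "RIFF");
    put_u32(out, 36 + data_bytes);            // RIFF size = file size - 8
    put_tag(out, "WAVE");
    put_tag(out, "fmt ");
    put_u32(out, 16);                         // fmt chunk size, no extension
    put_u16(out, 3);                          // WAVE_FORMAT_IEEE_FLOAT
    put_u16(out, a.channels);
    put_u32(out, a.sample_rate);
    put_u32(out, a.sample_rate * a.channels * bytes_per_sample);  // byte rate
    put_u16(out, uint16_t(a.channels * bytes_per_sample));        // block align
    put_u16(out, 32);                         // bits per sample
    put_tag(out, "data");
    put_u32(out, data_bytes);

    // Frame-major order: ch0[i], ch1[i], ... then frame i + 1.
    for (uint32_t i = 0; i < a.frames; ++i) {
        for (uint16_t ch = 0; ch < a.channels; ++ch) {
            uint32_t bits;
            std::memcpy(&bits, &a.planes[ch][i], sizeof bits);  // float -> raw bits
            put_u32(out, bits);
        }
    }
    return out;
}

// Writes the buffer to a file; false on any I/O error so the caller can bail out.
bool write_bytes(const char* path, const std::vector<uint8_t>& bytes) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    const bool ok = std::fwrite(bytes.data(), 1, bytes.size(), f) == bytes.size();
    return std::fclose(f) == 0 && ok;
}
```

For a stereo signal the data chunk therefore reads L0 R0 L1 R1 and so on, which is the interleaved layout WAV expects; the planar buffers never need to be copied into an intermediate array.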

Test plan

  • make libgosd-fallback.so builds cleanly with the change; gen_video and load_model symbols are still exported.
  • Verify on a live worker: install ltx-2.3-22b-distilled-ggml, POST /video with a non-trivial prompt and num_frames >= 25, download the resulting MP4, and confirm ffprobe reports an aac audio stream alongside the h264 video stream.
  • Verify no regression on Wan: same /video call against wan-2.1-i2v-14b-480p-ggml should still produce a single-stream h264 MP4 (no audio, no errors).

Assisted-by: Claude:claude-opus-4-7

sd.cpp's generate_video now returns an sd_audio_t* alongside the video
frames for models with an audio VAE (LTX-2.3). Our gosd wrapper was
already collecting that pointer but immediately freed it without ever
muxing it into the output, so LTX-2 generations landed as silent MP4s
even though the audio VAE decode succeeded.

Stage the planar float32 waveform to a temp WAV (IEEE float, header
hand-built; samples interleaved on the fly), then add it as a second
ffmpeg input with -c:a aac -map 0:v:0 -map 1:a:0 -shortest. The temp
WAV is cleaned up unconditionally after ffmpeg exits, including on
the write/waitpid error paths.
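One way to get the "cleaned up on every exit path" guarantee in C++ is a scope guard that calls unlink() from its destructor. The PR text does not say whether the wrapper does this or calls unlink() explicitly on each branch, so treat this as an illustration; TempFileGuard, make_temp_path and mux_with_audio are hypothetical names. POSIX mkstemp() requires the template to end in XXXXXX, so the staged file has no .wav extension; ffmpeg should still detect it by probing the RIFF header.

```cpp
#include <cassert>
#include <cstdlib>    // mkstemp
#include <string>
#include <unistd.h>   // unlink, close
#include <utility>

// Owns a temp-file path and removes the file when the guard leaves scope.
class TempFileGuard {
public:
    explicit TempFileGuard(std::string path) : path_(std::move(path)) {}
    ~TempFileGuard() {
        if (!path_.empty()) ::unlink(path_.c_str());  // best-effort cleanup
    }
    TempFileGuard(const TempFileGuard&) = delete;
    TempFileGuard& operator=(const TempFileGuard&) = delete;
    const std::string& path() const { return path_; }
private:
    std::string path_;
};

// Creates a unique, empty temp file; returns "" on failure.
std::string make_temp_path(const char* dir) {
    assert(dir != nullptr);
    std::string tmpl = std::string(dir) + "/gosd-audio-XXXXXX";
    int fd = ::mkstemp(tmpl.data());  // C++17: non-const data()
    if (fd < 0) return "";
    ::close(fd);
    return tmpl;
}

// Control-flow sketch: because the guard is declared before any early return,
// success, write errors and waitpid errors all end with the file unlinked.
int mux_with_audio(bool fail_midway) {
    TempFileGuard wav(make_temp_path("/tmp"));
    if (wav.path().empty()) return -1;
    // ... write the WAV, fork/exec ffmpeg, waitpid() ...
    if (fail_midway) return -1;       // stands in for a write or waitpid error
    return 0;
}
```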

Non-LTX models (Wan i2v / FLF2V) keep their current behaviour: audio
arg is nullptr, the audio-related ffmpeg flags are not added, and no
temp file is created.
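A sketch of adding the audio flags only when a waveform exists, so the video-only path builds the same argv it did before. The wrapper's existing video input and encoder flags are not shown in the PR, so they are passed in as opaque lists; build_ffmpeg_argv and to_exec_argv are hypothetical helpers. The audio flags are the ones listed in the summary; every -i input has to precede the output options, and the output path comes last.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> build_ffmpeg_argv(
        const std::vector<std::string>& video_input,   // options + "-i ..." for input 0
        const std::vector<std::string>& video_output,  // video encoder options
        const std::string& wav_path,                   // "" when there is no audio
        const std::string& out_path) {
    assert(!out_path.empty());
    const bool have_audio = !wav_path.empty();

    std::vector<std::string> argv = {"ffmpeg", "-y"};
    argv.insert(argv.end(), video_input.begin(), video_input.end());
    if (have_audio) {
        argv.push_back("-i");                 // input 1: the staged WAV
        argv.push_back(wav_path);
    }
    argv.insert(argv.end(), video_output.begin(), video_output.end());
    if (have_audio) {
        // First video stream of input 0 plus first audio stream of input 1,
        // AAC at 192 kb/s, stopping at the shorter of the two.
        const std::vector<std::string> audio_out = {
            "-c:a", "aac", "-b:a", "192k",
            "-map", "0:v:0", "-map", "1:a:0", "-shortest"};
        argv.insert(argv.end(), audio_out.begin(), audio_out.end());
    }
    argv.push_back(out_path);
    return argv;
}

// execvp() needs a null-terminated char* array; `args` must outlive the result.
std::vector<char*> to_exec_argv(std::vector<std::string>& args) {
    std::vector<char*> out;
    out.reserve(args.size() + 1);
    for (auto& a : args) out.push_back(a.data());
    out.push_back(nullptr);
    return out;
}
```

With an empty wav_path the function adds neither the second input nor any -map flags, which mirrors the no-regression expectation for Wan in the test plan.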

Assisted-by: Claude:claude-opus-4-7
@mudler merged commit de2ce74 into master on May 25, 2026
63 checks passed
@mudler deleted the fix/stablediffusion-ggml-ltx2-audio branch on May 25, 2026, 20:40