fix: unbreak master CI (docs, kokoros, vibevoice-cpp ABI) #9682
Merged
The Hugo build has been failing on master since the relevant pages landed:

- text-generation.md:720 referenced `/docs/features/distributed-mode`, but Hugo `relref` paths are relative to the content root, not the rendered URL. Drop the `/docs/` prefix so the lookup matches the existing `features/...` form used elsewhere in the file.
- audio-transform.md:144 referenced `tts.md`; the actual page is `text-to-audio.md`.

Assisted-by: Claude:claude-opus-4-7[1m]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
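The first kind of breakage (a `relref` written as a rendered URL) can be caught mechanically. Below is a minimal sketch of a hypothetical pre-build check, assuming `relref` arguments are always double-quoted and that `/docs/` is the only URL-style prefix to reject; `findURLStyleRelrefs` is an illustrative name, not part of LocalAI. It does not cover the second kind (a target page that doesn't exist, like `tts.md`), which still has to surface through the Hugo build itself.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// relrefArg captures the quoted path argument of a Hugo relref shortcode,
// e.g. {{< relref "features/distributed-mode" >}}. Only double quotes are handled.
var relrefArg = regexp.MustCompile(`relref\s+"([^"]+)"`)

// urlStylePrefix is the rendered-URL prefix that should not appear in a
// relref path (assumption: taken from the broken link in text-generation.md).
const urlStylePrefix = "/docs/"

// findURLStyleRelrefs returns every relref path in markdown that starts with
// the rendered-URL prefix instead of the content-root-relative form.
func findURLStyleRelrefs(markdown string) []string {
	var bad []string
	for _, m := range relrefArg.FindAllStringSubmatch(markdown, -1) {
		if strings.HasPrefix(m[1], urlStylePrefix) {
			bad = append(bad, m[1])
		}
	}
	return bad
}

func main() {
	page := `See [distributed mode]({{< relref "/docs/features/distributed-mode" >}}) ` +
		`or [the same page]({{< relref "features/distributed-mode" >}}).`
	for _, p := range findURLStyleRelrefs(page) {
		fmt.Println("URL-style relref:", p)
	}
}
```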
The recent backend.proto additions (Diarize, AudioTransform, AudioTransformStream) extended the gRPC Backend trait, breaking kokoros-grpc compilation with E0046 because the Rust implementation hadn't picked up the new methods. Add Unimplemented stubs matching the existing pattern for non-applicable RPCs in this TTS-only backend.

Assisted-by: Claude:claude-opus-4-7[1m]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Two recent commits in mudler/vibevoice.cpp reshaped the vv_capi_tts
signature without a corresponding bump on the LocalAI side:
3bd759c "1.5b: unify into a single tts entry point" inserted a
ref_audio_path parameter between voice_path and dst_wav_path.
ad856bd "1.5b: multi-speaker dialog support" promoted that to a
(const char* const* ref_audio_paths, int n_ref_audio_paths)
pair for per-speaker conditioning.
Because purego resolves symbols by name and not by signature, the
build kept linking; at runtime the misaligned arguments turned the
TTS->ASR closed-loop test into a SIGSEGV inside cgo. Track HEAD
explicitly and bring the bridge in line with it:
* Update the CppTTS purego binding to the 9-arg form. purego
marshals []*byte as a **char by handing the C side the underlying
array address; nil/empty maps to NULL, which matches the C
contract for "no reference audio" on the realtime-0.5B path.
* Add a `ref_audio` gallery option (comma-separated, repeatable)
that the 1.5B path consumes for runtime voice cloning. Multiple
entries are interpreted as one WAV per speaker (Speaker 0..n-1).
* TTSRequest.Voice now routes by extension/shape: `.wav` or a
comma-separated list goes to ref_audio_paths; anything else stays
on voice_path (realtime-0.5B's pre-baked voice gguf).
* Pin VIBEVOICE_CPP_VERSION to ad856bd and wire the Makefile into
the existing bump_deps matrix so future upstream rolls land as
reviewable PRs instead of a silent CI break.
Assisted-by: Claude:claude-opus-4-7[1m]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
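A minimal sketch of the Go-side plumbing the bullets above describe, using hypothetical helper names (`routeVoice`, `splitRefAudio`, `cStringArray`) rather than LocalAI's actual code. It routes `TTSRequest.Voice`-style input by extension/shape and builds the `[]*byte` that stands in for `const char* const*`, with an empty list mapping to nil (NULL). Assumptions: the `.wav` check is case-insensitive, and `n_ref_audio_paths` is passed as an `int32` on the assumption that C `int` is 32 bits on the target platforms. The full 9-argument `vv_capi_tts` signature is not reproduced here, and a real binding also has to keep the buffers reachable (for example with `runtime.KeepAlive`) until the C call returns.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// splitRefAudio turns a comma-separated list of WAV paths into a slice,
// trimming whitespace and dropping empty entries. Entry i is Speaker i's clip.
func splitRefAudio(s string) []string {
	var out []string
	for _, p := range strings.Split(s, ",") {
		if p = strings.TrimSpace(p); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// routeVoice sends a .wav path or a comma-separated list to the reference-audio
// slot (1.5B voice cloning); anything else is treated as a pre-baked voice gguf
// for the realtime-0.5B path. Case-insensitive extension matching is assumed.
func routeVoice(voice string) (voicePath string, refAudio []string) {
	if strings.Contains(voice, ",") || strings.EqualFold(filepath.Ext(voice), ".wav") {
		return "", splitRefAudio(voice)
	}
	return voice, nil
}

// cStringArray builds the []*byte handed to the C side as const char* const*:
// each element points at a NUL-terminated copy of a Go string. An empty input
// yields nil (NULL) and a count of 0, i.e. "no reference audio".
func cStringArray(paths []string) ([]*byte, int32) {
	if len(paths) == 0 {
		return nil, 0
	}
	ptrs := make([]*byte, len(paths))
	for i, p := range paths {
		buf := append([]byte(p), 0) // NUL terminator expected by C
		ptrs[i] = &buf[0]
	}
	return ptrs, int32(len(ptrs))
}

func main() {
	vp, refs := routeVoice("alice.wav, bob.wav") // hypothetical file names
	ptrs, n := cStringArray(refs)
	fmt.Println(vp == "", refs, n, ptrs != nil)

	vp, refs = routeVoice("voice.gguf") // hypothetical pre-baked voice
	ptrs, n = cStringArray(refs)
	fmt.Println(vp, refs == nil, n, ptrs == nil)
}
```

Splitting routing from marshaling keeps the 0.5B path untouched: it simply ends up with a nil array and a zero count.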
Use the existing audio_path field from ModelOptions (already plumbed through config_file's `audio_path:` YAML and consumed by other audio backends like kokoros) instead of inventing a custom `ref_audio:` Options[] string. Multi-speaker setups stay on a single comma-separated value. No behavior change beyond the gallery key name; per-call routing via TTSRequest.Voice is unchanged.

Assisted-by: Claude:claude-opus-4-7[1m]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
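With `audio_path` carrying the model-level reference audio, a multi-speaker setup is still one comma-separated value. A sketch of how such a value could be expanded into per-speaker references numbered Speaker 0..n-1 in order; `modelOptions`, `speakerRef`, and `speakerRefs` are hypothetical stand-ins, and the `"Speaker %d"` label format is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// modelOptions is a hypothetical stand-in for the ModelOptions field that
// config_file's `audio_path:` YAML key populates.
type modelOptions struct {
	AudioPath string
}

// speakerRef pairs a speaker label with its reference WAV.
type speakerRef struct {
	Speaker string
	WAV     string
}

// speakerRefs expands a single comma-separated audio_path into one reference
// clip per speaker, numbered in order (Speaker 0..n-1). Empty entries are skipped.
func speakerRefs(opts modelOptions) []speakerRef {
	var refs []speakerRef
	for _, p := range strings.Split(opts.AudioPath, ",") {
		p = strings.TrimSpace(p)
		if p == "" {
			continue
		}
		refs = append(refs, speakerRef{Speaker: fmt.Sprintf("Speaker %d", len(refs)), WAV: p})
	}
	return refs
}

func main() {
	for _, r := range speakerRefs(modelOptions{AudioPath: "host.wav,guest.wav"}) {
		fmt.Printf("%s -> %s\n", r.Speaker, r.WAV)
	}
}
```

For `audio_path: host.wav,guest.wav` this yields `Speaker 0 -> host.wav` and `Speaker 1 -> guest.wav`.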
Summary
Three independent fixes for failures observed on master after `06a15241`:

- Deploy docs to GitHub Pages ❌: the Hugo build aborts on two broken `relref`s (`/docs/features/distributed-mode` was using a URL-style path; `tts.md` doesn't exist, the page is `text-to-audio.md`).
- `tests-extras / tests-kokoros` ❌: recent `backend.proto` additions (`Diarize`, `AudioTransform`, `AudioTransformStream`) extended the gRPC `Backend` trait. kokoros-grpc didn't pick them up and failed compilation with E0046. Adds `Unimplemented` stubs matching the pattern used for the other non-applicable RPCs in this TTS-only backend.
- `tests-extras / tests-vibevoice-cpp` + `tests-vibevoice-cpp-grpc-tts` ❌: `mudler/vibevoice.cpp` reshaped `vv_capi_tts` twice in quick succession (`3bd759c` inserted `ref_audio_path`, `ad856bd` promoted it to `(const char* const* ref_audio_paths, int n_ref_audio_paths)` for multi-speaker). purego resolves by symbol name, so the build kept linking; at runtime the misaligned arguments turned the closed-loop TTS→ASR test into a SIGSEGV inside cgo.

The vibevoice fix wires the new ABI properly and uses the moment to expose voice-cloning support:
- `CppTTS` purego binding switched to the 9-arg form. `[]*byte` marshals as `**char` (nil/empty → NULL).
- `ref_audio` gallery option (comma-separated, repeatable): one WAV per speaker for the 1.5B path.
- `TTSRequest.Voice` routes by extension/shape: `.wav` or a comma-list goes to `ref_audio_paths`; anything else stays on `voice_path` for the 0.5B pre-baked voice gguf.
- `VIBEVOICE_CPP_VERSION` pinned to `ad856bda6b1311b7f3d7c4a667be43eeb8a8249a`. Floating on `master` is what allowed the silent ABI break to reach CI in the first place.
- `mudler/vibevoice.cpp` added to `.github/workflows/bump_deps.yaml` so future upstream rolls land as reviewable PRs alongside the other backends.

Test plan
- Deploy docs to GitHub Pages succeeds on this PR
- `tests-extras / tests-kokoros` compiles and passes
- `tests-extras / tests-vibevoice-cpp` closed-loop TTS→ASR test passes
- `tests-extras / tests-vibevoice-cpp-grpc-tts` and `…-grpc-transcription` pass
- `infer_schema` rejecting stringified `'torch.Tensor'` annotations; not in scope here)

Out of scope
- `mudler/vibevoice.cpp-models` doesn't ship a 1.5B GGUF yet. The backend already supports it via `ref_audio:`; the gallery row is a follow-up once the model file is published.
- `tests-rerankers` torch/transformers version drift: separate workstream.