docker-talkies v0.4.0 — Qwen3-TTS voice cloning + custom voices.
Second TTS engine (qwen3-tts-0.6b, CUDA-only) alongside Kokoro, with a
/data/custom-voices/ user-mount convention for voice cloning. Renames
the local host cache dir ~/.talkies-models → ~/.talkies-data.
Highlights:
- faster-qwen3-tts 0.2.6 backend, bfloat16 + SDPA. The first synthesis
call captures CUDA graphs (~30-60s); subsequent calls are sub-second.
- 3 builtin Qwen3 voices bundled (alloy/echo/fable as cloned samples)
plus user-mountable /data/custom-voices/. Nested subdirs preserved
in the voice name. Sibling <name>.txt (ref text) and <name>.lang
(language) honored.
- Path-traversal guard on voice resolution.
- /v1/audio/voices now reports origin: "builtin" | "custom".
- Qwen3 CUDA check deferred to load time so the server boots on CPU
hosts when qwen3-tts-0.6b is excluded via TALKIES_ENABLED_MODELS.
- Integration suite: 7 new qwen3 tests; transcribe loop skips TTS
slugs via a /v1/models-derived ASR-only list.
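For illustration only, the custom-voice resolution and path-traversal guard described in the highlights could look roughly like the sketch below. The function name `resolve_custom_voice`, the `ResolvedVoice` tuple, and the `.wav` extension are assumptions made for this example, not taken from the commit; only the /data/custom-voices/ layout and the sibling <name>.txt and <name>.lang files come from the notes above.

```python
from pathlib import Path
from typing import NamedTuple, Optional


class ResolvedVoice(NamedTuple):
    """Hypothetical result type: the reference audio plus optional metadata."""
    audio: Path
    ref_text: Optional[str]
    language: Optional[str]


def _read_optional(path: Path) -> Optional[str]:
    # Sibling files are optional; return None when absent.
    return path.read_text(encoding="utf-8").strip() if path.is_file() else None


def resolve_custom_voice(name: str, root: Path, ext: str = ".wav") -> ResolvedVoice:
    """Map a voice name such as 'narrators/alice' to a file under root.

    Assumption: reference clips use a single audio extension (default .wav).
    """
    root = root.resolve()
    candidate = (root / (name + ext)).resolve()

    # Path-traversal guard: after resolving '..' and symlinks, the candidate
    # must still live under root. relative_to raises ValueError otherwise.
    try:
        candidate.relative_to(root)
    except ValueError:
        raise ValueError(f"voice name escapes the voices directory: {name!r}")

    if not candidate.is_file():
        raise FileNotFoundError(f"no such custom voice: {name!r}")

    return ResolvedVoice(
        audio=candidate,
        ref_text=_read_optional(candidate.with_suffix(".txt")),
        language=_read_optional(candidate.with_suffix(".lang")),
    )
```

Under these assumptions, a clip mounted at /data/custom-voices/narrators/alice.wav would be addressed as the voice name narrators/alice, while a name such as ../outside would be rejected before any file is opened.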
Backwards-compatible: existing /v1/audio/speech against kokoro-82m,
/v1/audio/transcriptions, the MCP tool surface, and all model slugs
work identically.
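As a rough sketch of the deferred CUDA check mentioned in the highlights: the idea is that constructing the engine never touches the GPU, and the check only runs when the model is actually loaded. The class `LazyQwen3Engine`, the helper `parse_enabled_models`, and the comma-separated format of TALKIES_ENABLED_MODELS are assumptions for this example; the commit does not specify them. In a real server the probe would presumably be `torch.cuda.is_available`.

```python
from typing import Callable, List, Optional


def parse_enabled_models(raw: Optional[str], all_slugs: List[str]) -> List[str]:
    """Assumed format: comma-separated slugs; unset or empty enables everything."""
    if not raw or not raw.strip():
        return list(all_slugs)
    wanted = {part.strip() for part in raw.split(",") if part.strip()}
    return [slug for slug in all_slugs if slug in wanted]


class LazyQwen3Engine:
    """Hypothetical engine wrapper: the CUDA check happens in load(), not __init__."""

    def __init__(self, cuda_available: Callable[[], bool]):
        # Store the probe only; do not call it yet, so a CPU host can boot.
        self._cuda_available = cuda_available
        self._model = None

    def load(self):
        if self._model is None:
            if not self._cuda_available():
                raise RuntimeError("qwen3-tts-0.6b requires a CUDA device")
            self._model = object()  # placeholder for the real model handle
        return self._model
```

With this shape, a CPU-only host that sets TALKIES_ENABLED_MODELS to exclude qwen3-tts-0.6b never calls load() for that slug, so the missing GPU is never reported as an error.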