v0.4.0

@github-actions github-actions released this 28 May 17:08
· 16 commits to main since this release
docker-talkies v0.4.0 — Qwen3-TTS voice cloning + custom voices.

Adds a second TTS engine (qwen3-tts-0.6b, CUDA-only) alongside Kokoro, with
a /data/custom-voices/ user-mount convention for voice cloning. Renames
the local host cache dir from ~/.talkies-models to ~/.talkies-data.

Highlights:
- faster-qwen3-tts 0.2.6 backend, bfloat16 + SDPA. The first synthesis
  call captures CUDA graphs (~30-60s); subsequent calls are sub-second.
- 3 builtin Qwen3 voices bundled (alloy/echo/fable as cloned samples)
  plus user-mountable /data/custom-voices/. Nested subdirs are preserved
  in the voice name. Sibling <name>.txt (reference text) and <name>.lang
  (language) files are honored.
- Path-traversal guard on voice resolution.
- /v1/audio/voices now reports origin: "builtin" | "custom".
- Qwen3 CUDA check deferred to load time so the server boots on CPU
  hosts when qwen3-tts-0.6b is excluded via TALKIES_ENABLED_MODELS.
- Integration suite: 7 new qwen3 tests; transcribe loop skips TTS
  slugs via a /v1/models-derived ASR-only list.
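
The custom-voice convention and the path-traversal guard described above
can be illustrated with a minimal Python sketch. This is not the project's
actual code: the function names (list_voices, resolve_voice), the .wav
extension for reference samples, and the returned field names other than
origin are assumptions made for illustration only.

```python
from pathlib import Path


def list_voices(root: Path) -> list[dict]:
    """Scan a voice directory; nested subdirs become part of the voice name."""
    root = root.resolve()
    voices = []
    # Assumption: reference samples are .wav files.
    for wav in sorted(root.rglob("*.wav")):
        # "family/alice.wav" under root yields the voice name "family/alice".
        name = wav.relative_to(root).with_suffix("").as_posix()
        txt = wav.with_suffix(".txt")    # sibling reference text, optional
        lang = wav.with_suffix(".lang")  # sibling language tag, optional
        voices.append({
            "name": name,
            "origin": "custom",
            "ref_text": txt.read_text(encoding="utf-8").strip() if txt.is_file() else None,
            "language": lang.read_text(encoding="utf-8").strip() if lang.is_file() else None,
        })
    return voices


def resolve_voice(root: Path, name: str) -> Path:
    """Map a voice name to a file under root, rejecting names that escape it."""
    root = root.resolve()
    candidate = (root / f"{name}.wav").resolve()
    # After resolving ".." segments and symlinks, the path must still be
    # inside the voice directory (Path.is_relative_to needs Python 3.9+).
    if not candidate.is_relative_to(root):
        raise ValueError(f"voice name escapes voice directory: {name!r}")
    if not candidate.is_file():
        raise FileNotFoundError(f"no such voice: {name!r}")
    return candidate
```

Resolving the candidate path before the containment check means that both
"../" segments and absolute names such as "/etc/passwd" end up outside the
root and are rejected, rather than relying on string matching of the name.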

Backwards-compatible: existing /v1/audio/speech against kokoro-82m,
/v1/audio/transcriptions, the MCP tool surface, and all model slugs
work identically.