v0.6.0

github-actions released this 30 May 15:45
docker-talkies v0.6.0: kokoro-82m-nvidia ONNX backend, qwen3-tts
instructions wiring, self-spawning integration test harness.

Minor bump. Three additive threads, no breaking change.

1. New TTS slug `kokoro-82m-nvidia` (nvidia/kokoro-82M-onnx-opt,
   Apache-2.0). Same Kokoro-82M weights as `kokoro-82m`, same
   40-voice catalog, same wire shape, served via ONNXRuntime
   against NVIDIA's TensorRT-friendly ONNX export. No PyTorch on
   the inference hot path. G2P via espeak-ng. Pick this slug for
   ORT execution; pick `kokoro-82m` for misaki-driven G2P quality.

2. PR #1 (martincohen): qwen3-tts now honours the `instructions`
   request field — passed through to faster-qwen3-tts as the
   `instruct` parameter. Voices without a sibling `.txt` transcript
   now fall back to x-vector-only mode (with a warning log)
   instead of returning 400. Kokoro continues to accept and ignore
   `instructions` for OpenAI wire-shape parity.

3. Integration test harness refactor. Every `test_*.sh` / `e2e_*.sh`
   script spawns its own `--rm --gpus all` container on an
   ephemeral port, runs its checks, and tears the container down
   from an EXIT trap.
   `bash tests/integration/<file>` does the whole lifecycle
   without an external orchestrator. `run.sh` is now a dispatcher
   that runs each file as a subprocess.
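
The per-test lifecycle in item 3 can be sketched roughly as follows. The real harness is bash and relies on an EXIT trap; in this Python rendering `try`/`finally` plays that role. The image name, container name, container port 8000, and the helpers `free_port`, `docker_run_argv`, and `with_container` are all assumptions made for illustration, not names from the repository.

```python
import socket
import subprocess
from typing import Callable, List


def free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def docker_run_argv(
    image: str, name: str, host_port: int, container_port: int = 8000
) -> List[str]:
    """Build the argv for a detached, self-removing GPU container.

    container_port=8000 is an assumed default, not a documented one.
    """
    return [
        "docker", "run", "--rm", "--detach",
        "--gpus", "all",
        "--name", name,
        "--publish", f"127.0.0.1:{host_port}:{container_port}",
        image,
    ]


def with_container(image: str, name: str, checks: Callable[[int], None]) -> None:
    """Start a container, run checks against its port, always tear it down."""
    port = free_port()
    subprocess.run(docker_run_argv(image, name, port), check=True)
    try:
        checks(port)
    finally:
        # Mirrors the EXIT trap: stop the container whether or not the
        # checks passed. --rm then removes it.
        subprocess.run(["docker", "stop", name], check=False)


# Inspect the command without starting anything.
example_argv = docker_run_argv("talkies:test", "talkies-test", 18080)
```

A real harness would also poll the service until it is ready before running checks. The port returned by `free_port` could in principle be taken by another process before Docker binds it, which is the usual caveat with this approach.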

Round-trip verified: audio synthesized by kokoro-82m-nvidia is
transcribed by whisper-large-v3-turbo to the expected phrase,
which shows the ONNX backend produces intelligible English, not
just well-formed bytes. Results: test_speech.sh 15/15,
test_endpoints.sh 7/7, e2e_kokoro_nvidia.sh 7/7, 11 unit tests
green.
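
The comparison step of such a round-trip check might look like the sketch below. It covers only the final text comparison, not synthesis or transcription, and it assumes the check ignores case, punctuation, and extra whitespace. The helpers `normalize` and `round_trip_matches` are hypothetical; the actual scripts may compare differently.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    lowered = text.lower()
    no_punct = re.sub(r"[^a-z0-9\s]", "", lowered)
    return " ".join(no_punct.split())


def round_trip_matches(expected: str, transcript: str) -> bool:
    """True when the transcript says the same words as the expected phrase."""
    return normalize(expected) == normalize(transcript)


# A transcript that differs only in punctuation and casing still matches.
ok = round_trip_matches("Hello, world!", "hello world")
# A transcript with different words does not.
bad = round_trip_matches("Hello, world!", "yellow word")
```

Normalizing before comparing keeps the check focused on intelligibility, so differences in how the transcriber punctuates or capitalizes do not cause false failures.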

No breaking change. The new slug is additive, and every other slug
keeps the same wire shape. The one behavioural difference is that
Qwen3 now honours `instructions` instead of dropping it, which is
a behaviour upgrade, not a wire-shape change.
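
To illustrate what "same wire shape" means in practice, the sketch below builds two request bodies that differ only in `model`, `voice`, and the optional `instructions` field. It assumes an OpenAI-style `/v1/audio/speech` route served at `http://localhost:8000`; the route, host, port, the voice names `af_heart` and `my_voice`, and the helper `build_speech_request` are illustrative assumptions, not taken from the project.

```python
import json
import urllib.request
from typing import Optional


def build_speech_request(
    base_url: str,
    model: str,
    text: str,
    voice: str,
    instructions: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style speech request."""
    payload = {"model": model, "input": text, "voice": voice}
    if instructions is not None:
        # Honoured by qwen3-tts; accepted and ignored by the Kokoro slugs.
        payload["instructions"] = instructions
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",  # assumed route
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


BASE_URL = "http://localhost:8000"  # assumed host and port

# New ONNX-backed slug: only the model string changes for existing callers.
kokoro_req = build_speech_request(
    BASE_URL, "kokoro-82m-nvidia", "Hello from the ONNX backend.", "af_heart"
)

# Same shape for qwen3-tts, now with a style instruction.
qwen_req = build_speech_request(
    BASE_URL, "qwen3-tts", "Hello again.", "my_voice",
    instructions="Speak slowly and calmly.",
)
```

Passing either object to `urllib.request.urlopen` would send it. The sketch stops short of that so it can be read and run without a live server.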