docker-talkies v0.6.0: kokoro-82m-nvidia ONNX backend, qwen3-tts
`instructions` wiring, self-spawning integration test harness.
Minor bump. Three additive threads, no breaking change.
1. New TTS slug `kokoro-82m-nvidia` (nvidia/kokoro-82M-onnx-opt,
Apache-2.0). Same Kokoro-82M weights as `kokoro-82m`, same
40-voice catalog and same wire shape, but served by ONNX Runtime
from NVIDIA's TensorRT-friendly ONNX export, so there is no PyTorch
on the inference hot path. G2P is done with espeak-ng. Pick this
slug for ONNX Runtime (ORT) execution; pick `kokoro-82m` for
misaki-driven G2P quality.
2. PR #1 (martincohen): qwen3-tts now honours the `instructions`
request field — passed through to faster-qwen3-tts as the
`instruct` parameter. Voices without a sibling `.txt` transcript
now fall back to x-vector-only mode (logging a warning)
instead of returning HTTP 400. Kokoro continues to accept and ignore
`instructions` for OpenAI wire-shape parity.
3. Integration test harness refactor. Every test_*.sh / e2e_*.sh
self-spawns its own --rm --gpus all container on an ephemeral
port, runs its checks, and tears the container down from an EXIT
trap. `bash tests/integration/<file>` runs the whole lifecycle
without an external orchestrator. `run.sh` is now a dispatcher
that runs each file as a subprocess.
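Item 2's reference-voice fallback can be sketched as follows. This is a minimal Python illustration, assuming each voice is a `<voice>.wav` reference clip with an optional sibling `<voice>.txt` transcript; `resolve_voice_reference`, `build_synthesis_args`, `VoiceReference`, the `.wav` extension, and every argument key except `instruct` are hypothetical names, not faster-qwen3-tts's actual API.

```python
import logging
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

log = logging.getLogger("qwen3_tts")  # hypothetical logger name


@dataclass
class VoiceReference:
    audio_path: Path
    ref_text: Optional[str]  # None means no transcript was found
    xvector_only: bool       # True: clone from the speaker embedding alone


def resolve_voice_reference(voice_dir: Path, voice: str) -> VoiceReference:
    """Locate <voice>.wav and, if present, its sibling <voice>.txt transcript."""
    audio = voice_dir / f"{voice}.wav"
    if not audio.is_file():
        # An unknown voice is still a client error.
        raise FileNotFoundError(f"unknown voice: {voice}")
    transcript = audio.with_suffix(".txt")
    if transcript.is_file():
        text = transcript.read_text(encoding="utf-8").strip()
        return VoiceReference(audio, text, xvector_only=False)
    # Previously a 400; now degrade to x-vector-only mode and warn.
    log.warning("voice %r has no %s; falling back to x-vector-only mode",
                voice, transcript.name)
    return VoiceReference(audio, None, xvector_only=True)


def build_synthesis_args(ref: VoiceReference, text: str,
                         instructions: Optional[str]) -> dict:
    """Collect backend arguments. Only `instruct` comes from the release
    note; the other key names are placeholders, not faster-qwen3-tts's API."""
    args = {"text": text, "ref_audio": str(ref.audio_path),
            "xvector_only": ref.xvector_only}
    if ref.ref_text is not None:
        args["ref_text"] = ref.ref_text
    if instructions:
        # The OpenAI-shaped `instructions` field is forwarded as `instruct`.
        args["instruct"] = instructions
    return args
```

The missing transcript is treated as a degraded-but-valid case rather than an error, while an unknown voice still fails, which keeps the 400 path for genuinely bad requests.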
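Item 3's harness is written in bash around an EXIT trap. For consistency with the other sketches, here is the same lifecycle rendered in Python: pick an ephemeral port, start a `--rm --gpus all` container, and always tear it down. The image name, the container port 8000, and the helper names are hypothetical placeholders, and readiness polling is omitted.

```python
import socket
import subprocess
import uuid
from contextlib import contextmanager


def free_port() -> int:
    """Ask the kernel for an unused TCP port by binding to port 0.
    There is a small race: another process could grab it before Docker does."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def docker_run_argv(image: str, host_port: int, name: str,
                    container_port: int = 8000) -> list[str]:
    # --rm: Docker deletes the container when it stops; --gpus all: expose every GPU.
    return ["docker", "run", "--rm", "-d", "--gpus", "all", "--name", name,
            "-p", f"127.0.0.1:{host_port}:{container_port}", image]


@contextmanager
def spawned_container(image: str):
    """Start a throwaway container and stop it on exit even if checks fail,
    mirroring `trap ... EXIT` in the bash scripts."""
    port = free_port()
    name = f"talkies-test-{uuid.uuid4().hex[:8]}"
    subprocess.run(docker_run_argv(image, port, name), check=True)
    try:
        yield f"http://127.0.0.1:{port}"
    finally:
        # Because of --rm, stopping is enough to remove the container.
        subprocess.run(["docker", "stop", name], check=False)
```

A test would then read `with spawned_container("some/image:tag") as base_url: ...`, and a dispatcher in the spirit of `run.sh` only needs to launch each such test as its own subprocess, since every test owns its container.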
Round-trip verified: audio synthesized with kokoro-82m-nvidia is
transcribed by whisper-large-v3-turbo back to the expected phrase,
showing that the ONNX backend produces intelligible English, not
just well-formed bytes. test_speech.sh 15/15, test_endpoints.sh 7/7,
e2e_kokoro_nvidia.sh 7/7, 11 unit tests green.
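The round-trip check comes down to comparing Whisper's transcript with the expected phrase while tolerating differences in casing and punctuation. A minimal sketch, assuming a normalize-then-match-word-sequence rule; `normalize` and `transcript_matches` are hypothetical and not necessarily how the e2e test compares text.

```python
import re


def normalize(text: str) -> str:
    # Casefold, turn punctuation into spaces, collapse runs of whitespace.
    text = re.sub(r"[^\w\s]", " ", text.casefold())
    return " ".join(text.split())


def transcript_matches(transcript: str, expected: str) -> bool:
    """True if the expected words appear as a contiguous run in the transcript.
    Matching whole words avoids false hits like 'cat' inside 'concatenate'."""
    got = normalize(transcript).split()
    want = normalize(expected).split()
    n = len(want)
    return n > 0 and any(got[i:i + n] == want for i in range(len(got) - n + 1))
```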
No breaking change. The new slug is additive, and every other slug
behaves identically, except that Qwen3 now honours `instructions`
instead of dropping it: a behaviour upgrade, not a wire-shape change.
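Because every slug shares the OpenAI wire shape, switching backends is only a change of `model`. A standard-library sketch that builds (but does not send) such a request, assuming the server exposes OpenAI's `POST /v1/audio/speech` path; the base URL and the voice name `some-voice` are hypothetical placeholders.

```python
import json
import urllib.request
from typing import Optional


def speech_request(base_url: str, model: str, text: str, voice: str,
                   instructions: Optional[str] = None) -> urllib.request.Request:
    """Build an OpenAI-shaped speech request; send it with urllib.request.urlopen."""
    body = {"model": model, "input": text, "voice": voice}
    if instructions is not None:
        # Honoured by qwen3-tts; accepted and ignored by the Kokoro slugs.
        body["instructions"] = instructions
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, `speech_request("http://localhost:8000", "kokoro-82m-nvidia", "Hello world", "some-voice")` and the same call with `"kokoro-82m"` differ only in the `model` field, which is what makes the new slug a drop-in choice.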