Releases · psyb0t/docker-talkies

09 Jun 07:37

v0.9.0

a1b7d2a

v0.9.0 Latest

docker-talkies v0.9.0 — Nemotron-3.5-ASR (parakeet.cpp) + GPU drain b…

31 May 16:34

github-actions

v0.8.0

388b100

v0.8.0

docker-talkies v0.8.0 — Qwen3-TTS CustomVoice + VoiceDesign + 1.7B Ba…

31 May 10:17

github-actions

v0.7.0

de718ad

v0.7.0

docker-talkies v0.7.0 — Qwen3-TTS PCM streaming + supply-chain
bump-on-mutation Makefile workflow.

Minor release. Two user-visible threads.

1. PCM streaming for Qwen3-TTS. response_format="pcm" against a
   qwen3_tts model now streams the raw PCM body via HTTP/1.1
   chunked transfer-encoding instead of buffering the full
   utterance. First-audio latency drops from ~3-8 s (synthesise +
   buffer) to ~200-700 ms (time to first audio, TTFA, on the first
   decoded chunk). Marked WIP in the original development commit:
   the surface is live, but edge cases are still soaking. Other
   formats and the Kokoro backends are unchanged. New env var
   TALKIES_QWEN3_STREAM_CHUNK_SIZE (default 8) controls
   codec-steps-per-chunk.

2. pkg-* Makefile workflow. New make targets (pkg-lock / pkg-add /
   pkg-update / pkg-upgrade / pkg-remove) call
   scripts/bump_exclude_newer.sh before any uv operation so the
   [tool.uv] exclude-newer age gate is always anchored to the
   moment of the mutation. Closes the "silent drift forward" hole.
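
A minimal sketch of what "anchoring the age gate to the moment of the
mutation" can look like, written in Python rather than shell. This is
not the contents of scripts/bump_exclude_newer.sh. The function name
and the age_days offset are hypothetical, since these notes do not say
which offset the real script applies. It only assumes that
exclude-newer holds an RFC 3339 timestamp string.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches a line such as: exclude-newer = "2025-05-01T00:00:00Z"
_EXCLUDE_NEWER = re.compile(
    r'^([ \t]*exclude-newer[ \t]*=[ \t]*)"[^"]*"', re.MULTILINE
)


def bump_exclude_newer(pyproject_text: str, now: datetime, age_days: int = 7) -> str:
    """Return pyproject text with exclude-newer reset to `now - age_days`.

    age_days is a hypothetical knob, not a documented setting.
    """
    cutoff = (now - timedelta(days=age_days)).astimezone(timezone.utc)
    stamp = cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")
    new_text, count = _EXCLUDE_NEWER.subn(
        lambda m: f'{m.group(1)}"{stamp}"', pyproject_text
    )
    if count != 1:
        # Refuse to guess when the entry is missing or duplicated.
        raise ValueError(f"expected exactly one exclude-newer entry, found {count}")
    return new_text
```

Running a step like this before every lock/add/update means the cutoff
is recomputed at each mutation instead of staying at whatever value
was last committed.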

Plus housekeeping: .gitattributes enforces LF on shell scripts,
Dockerfile.cuda strips CRLF defensively, qwen3-tts xvec_only kwarg
fix landed (parallel patch — same content as v0.6.1's fix).

Chunked responses carry no Content-Length header, so caller code
that assumed one on /v1/audio/speech needs to adapt for the
qwen3_tts + response_format=pcm case. Every other code path is
wire-compatible with v0.6.1.
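
One way a caller might adapt, as a sketch: consume the body as an
iterator of byte chunks and re-align it to whole samples, since chunk
boundaries are arbitrary and can fall mid-sample. The helper name is
hypothetical, and the 2-byte sample width assumes 16-bit PCM, which
these notes do not confirm.

```python
from typing import Iterable, Iterator


def iter_whole_samples(chunks: Iterable[bytes], sample_width: int = 2) -> Iterator[bytes]:
    """Yield buffers that always hold a whole number of PCM samples.

    Leftover bytes from a chunk that ends mid-sample are carried over
    and prepended to the next chunk.
    """
    pending = b""
    for chunk in chunks:
        pending += chunk
        usable = len(pending) - (len(pending) % sample_width)
        if usable:
            yield pending[:usable]
            pending = pending[usable:]
    # A trailing partial sample, if any, is dropped: it cannot be played.
```

With the requests library, the chunk source could be
`resp.iter_content(chunk_size=None)` on a response opened with
`stream=True`. Each yielded buffer can go straight to an audio sink
without waiting for the end of the utterance.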

v0.6.2 was a local-only tag (never published) — this is the next
public release.

30 May 16:10

github-actions

v0.6.1

862050a

v0.6.1

docker-talkies v0.6.1 — fix qwen3-tts kwarg regression from v0.6.0.

Patch release. v0.6.0 shipped PR #1's Qwen3-TTS instructions
wiring + x-vector fallback with a wrong kwarg name on
`model.generate_voice_clone(...)` — every Qwen3 synth request
500'd with TypeError.

Fix: `x_vector_only_mode=` → `xvec_only=` (the correct name on
faster_qwen3_tts==0.2.6's higher-level voice-clone API).
`instruct=` was already right.

New tests guard the instructions field, the x-vector fallback,
and the Kokoro protocol-bump compatibility path.
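
A sketch of one kind of guard that catches a wrong kwarg name without
loading a model: bind the keyword names against the callable's
signature. This is not the project's actual test code.
`generate_voice_clone` below is a hypothetical stand-in; only the
keyword names `xvec_only`, `instruct` and `x_vector_only_mode` come
from these notes.

```python
import inspect


def accepts_kwargs(func, **kwargs) -> bool:
    """Return True if func can be called with these keyword names."""
    try:
        inspect.signature(func).bind_partial(**kwargs)
    except TypeError:
        # Same failure mode as the real call: unexpected keyword argument.
        return False
    return True


# Hypothetical stand-in for the voice-clone entry point.
def generate_voice_clone(text, xvec_only=False, instruct=None):
    return text
```

A unit test can then assert `accepts_kwargs(fn, xvec_only=True)` and
fail fast if a dependency bump renames the parameter again.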

Kokoro slugs (kokoro-82m, kokoro-82m-nvidia) were unaffected by
the v0.6.0 regression.

No breaking change. No new dependency.

30 May 15:45

github-actions

v0.6.0

a3b6c2f

v0.6.0

docker-talkies v0.6.0 — kokoro-82m-nvidia ONNX backend, qwen3-tts
instructions wiring, self-spawning integration test harness.

Minor bump. Three additive threads, no breaking change.

1. New TTS slug `kokoro-82m-nvidia` (nvidia/kokoro-82M-onnx-opt,
   Apache-2.0). Same Kokoro-82M weights as `kokoro-82m`, same
   40-voice catalog, same wire shape, served via ONNXRuntime
   against NVIDIA's TensorRT-friendly ONNX export. No PyTorch on
   the inference hot path. G2P via espeak-ng. Pick this slug for
   ORT execution; pick `kokoro-82m` for misaki-driven G2P quality.

2. PR #1 (martincohen): qwen3-tts now honours the `instructions`
   request field — passed through to faster-qwen3-tts as the
   `instruct` parameter. Voices without a sibling `.txt` transcript
   now fall back to x-vector-only mode (with a warning log)
   instead of returning 400. Kokoro continues to accept and ignore
   `instructions` for OpenAI wire-shape parity.

3. Integration test harness refactor. Every test_*.sh / e2e_*.sh
   self-spawns its own --rm --gpus all container on an ephemeral
   port, runs its checks, tears the container down on EXIT trap.
   `bash tests/integration/<file>` does the whole lifecycle
   without an external orchestrator. `run.sh` is now a dispatcher
   that runs each file as a subprocess.
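
The self-spawning pattern from item 3, sketched in Python instead of
bash. The function names, the image name and the container port 8000
are all hypothetical. Only the `--rm --gpus all` flags and the
ephemeral-port idea come from these notes.

```python
import socket


def free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def docker_run_argv(image: str, host_port: int, container_port: int = 8000) -> list[str]:
    """Build the argv for a throwaway, detached GPU container."""
    return [
        "docker", "run", "--rm", "--gpus", "all", "-d",
        "-p", f"127.0.0.1:{host_port}:{container_port}",
        image,
    ]
```

The port is released before docker binds it, so a small race remains.
The Python equivalent of the EXIT trap would be a try/finally around
the checks that always stops the container.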

Round-trip verified: kokoro-82m-nvidia synth →
whisper-large-v3-turbo transcribes to the expected phrase,
proving the ONNX backend produces intelligible English, not just
well-formed bytes. test_speech.sh 15/15, test_endpoints.sh 7/7,
e2e_kokoro_nvidia.sh 7/7, 11 unit tests green.

No breaking change. New slug is additive; every other slug
behaves identically (with Qwen3 `instructions` now honoured
instead of dropped — behaviour upgrade, not wire-shape change).

28 May 18:23

github-actions

v0.5.0

e6691ee

v0.5.0

docker-talkies v0.5.0 — drop distil-whisper-large-v3.

Minor bump (breaking pre-1.0). distil-whisper-large-v3 was English-
only and lived alongside the multilingual whisper-large-v3 (OG, max
accuracy) and whisper-large-v3-turbo (multilingual, 8× faster) —
redundant for the value it provided. Removing it.

CUDA registry now: 6 ASR (whisper×2, parakeet, canary×3) + 2 TTS
(kokoro, qwen3) = 8 models.
CPU registry now: 3 ASR (whisper×2, canary-180m) + 1 TTS (kokoro)
= 4 models.

Migration: TALKIES_ENABLED_MODELS=...distil-whisper-large-v3 → drop
the slug or replace with whisper-large-v3-turbo (multilingual) or
whisper-large-v3 (max accuracy). No API or wire-format change.
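
A small sketch of the migration for deployments that template
TALKIES_ENABLED_MODELS. It assumes the value is a comma-separated list
of slugs, which these notes imply but do not state. The helper name is
hypothetical.

```python
from typing import Optional

REMOVED = "distil-whisper-large-v3"


def migrate_enabled_models(
    value: str, replacement: Optional[str] = "whisper-large-v3-turbo"
) -> str:
    """Drop the removed slug, or swap it for `replacement` if one is given."""
    result = []
    for slug in (s.strip() for s in value.split(",")):
        if not slug:
            continue
        if slug == REMOVED:
            slug = replacement  # None means "just drop it"
        if slug and slug not in result:
            result.append(slug)  # keep order, avoid duplicates
    return ",".join(result)
```

Passing `replacement="whisper-large-v3"` picks the max-accuracy model
instead, and `replacement=None` only drops the slug.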

28 May 17:30

github-actions

v0.4.1

9bd71ec

v0.4.1

docker-talkies v0.4.1 — README rewrite for above-the-fold conversion.

Patch release. Pure docs refresh + a tiny .gitignore tweak. No
behavior change, no API change, no new models or endpoints.

README highlights:
- One-sentence tagline + Python drop-in snippet in the first 25 lines
  (was buried in 4 paragraphs of prose).
- 7 single-line feature bullets: ASR / TTS / voice cloning / hot
  swap / MCP / diarization / CPU+CUDA.
- Quick Start trimmed to 1 docker run + 1 curl above the fold; full
  examples + TOC folded into <details>.
- Dense "how it works" prose moved below the fold, unchanged.

.gitignore: about.txt added to local-tooling section.

28 May 17:08

github-actions

v0.4.0

9f7e34c

v0.4.0

docker-talkies v0.4.0 — Qwen3-TTS voice cloning + custom voices.

Second TTS engine (qwen3-tts-0.6b, CUDA-only) alongside Kokoro, with a
/data/custom-voices/ user-mount convention for voice cloning. Renames
the local host cache dir ~/.talkies-models → ~/.talkies-data.

Highlights:
- faster-qwen3-tts 0.2.6 backend, bfloat16 + SDPA. First synth captures
  CUDA graphs (~30-60s); subsequent calls sub-second.
- 3 builtin Qwen3 voices bundled (alloy/echo/fable as cloned samples)
  plus user-mountable /data/custom-voices/. Nested subdirs preserved
  in the voice name. Sibling <name>.txt (ref text) and <name>.lang
  (language) honored.
- Path-traversal guard on voice resolution.
- /v1/audio/voices now reports origin: "builtin" | "custom".
- Qwen3 CUDA check deferred to load time so the server boots on CPU
  hosts when qwen3-tts-0.6b is excluded via TALKIES_ENABLED_MODELS.
- Integration suite: 7 new qwen3 tests; transcribe loop skips TTS
  slugs via a /v1/models-derived ASR-only list.
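
A sketch of voice resolution with a path-traversal guard, covering the
nested-subdir names and the sibling <name>.txt / <name>.lang files
listed above. It is not the server's actual code. The function name,
the returned dict shape and the `.wav` extension for the reference
audio are assumptions.

```python
from pathlib import Path


def resolve_voice(root: Path, name: str) -> dict:
    """Map a voice name such as "family/alice" to files under root.

    Raises ValueError when the name resolves outside root.
    """
    root = root.resolve()
    base = (root / name).resolve()
    # Rejects "../x", absolute paths, and the empty name.
    if base == root or not base.is_relative_to(root):
        raise ValueError(f"invalid voice name: {name!r}")

    def sibling(ext: str):
        path = base.with_name(base.name + ext)
        return path if path.is_file() else None

    audio = sibling(".wav")  # assumed extension
    if audio is None:
        raise FileNotFoundError(name)
    txt, lang = sibling(".txt"), sibling(".lang")
    return {
        "audio": audio,
        "ref_text": txt.read_text().strip() if txt else None,
        "language": lang.read_text().strip() if lang else None,
    }
```

Resolving both paths before comparing them is what defeats `..`
segments and symlinks. A plain string-prefix check on the unresolved
name would not.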

Backwards-compatible: existing /v1/audio/speech against kokoro-82m,
/v1/audio/transcriptions, the MCP tool surface, and all model slugs
work identically.

28 May 14:32

github-actions

v0.3.0

c12eeb7

v0.3.0

docker-talkies v0.3.0 — Kokoro TTS.

Adds OpenAI-compatible /v1/audio/speech with mp3/opus/aac/flac/wav/pcm
output, /v1/audio/voices discovery, kokoro-82m in both CPU and CUDA
images. New backend protocol split (BackendBase / ASRBackend /
TTSBackend). Cross-modality eviction shares one VRAM pool between ASR
and TTS; idle TTL sweeper applies to both.
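
A toy model of a shared pool with an idle sweeper, to make the
cross-modality eviction idea concrete. Everything here is an
illustration: the class, the megabyte budget and the
least-recently-used policy are assumptions, and these notes do not
describe how the real eviction chooses its victim.

```python
import time
from collections import OrderedDict


class ModelPool:
    """One memory budget shared by every loaded model, ASR or TTS."""

    def __init__(self, capacity_mb: int, idle_ttl_s: float, clock=time.monotonic):
        self.capacity_mb = capacity_mb
        self.idle_ttl_s = idle_ttl_s
        self._clock = clock
        self._loaded = OrderedDict()  # slug -> (size_mb, last_used)

    def acquire(self, slug: str, size_mb: int) -> None:
        """Mark slug as in use, evicting least-recently-used models to fit."""
        if size_mb > self.capacity_mb:
            raise ValueError(f"{slug} does not fit in the pool")
        self._loaded.pop(slug, None)
        while self._used_mb() + size_mb > self.capacity_mb:
            self._loaded.popitem(last=False)  # oldest first, any modality
        self._loaded[slug] = (size_mb, self._clock())

    def sweep(self) -> list:
        """Unload models idle for longer than the TTL; return their slugs."""
        now = self._clock()
        stale = [s for s, (_, used) in self._loaded.items()
                 if now - used > self.idle_ttl_s]
        for slug in stale:
            del self._loaded[slug]
        return stale

    def loaded(self) -> list:
        return list(self._loaded)

    def _used_mb(self) -> int:
        return sum(size for size, _ in self._loaded.values())
```

Because the pool does not track modality, a TTS load can push out an
idle ASR model and the reverse, which is the behaviour the notes
describe.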

Both runtime images now bundle en_core_web_sm so Kokoro's English G2P
never tries to pip-download at first call (runtime has no pip).

Integration suite gains a cross-modality round-trip test (Kokoro synth
→ fast ASR → assert expected words) plus CPU/memory caps on the test
container to keep the host responsive while inference is running.
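
The "assert expected words" step can be sketched as a tolerant
comparison, since ASR output varies in case and punctuation. The
helper name is hypothetical and the integration suite itself is shell,
so this only illustrates the check.

```python
import re


def contains_words(transcript: str, expected: str) -> bool:
    """True if every expected word occurs in the transcript.

    Case and punctuation are ignored; word order is not checked.
    """
    def words(text: str) -> list:
        return re.findall(r"[a-z0-9']+", text.lower())

    heard = set(words(transcript))
    return all(word in heard for word in words(expected))
```

Checking for word presence rather than exact string equality keeps the
test stable across ASR models that punctuate differently.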

Backwards-compatible: all existing ASR endpoints, model slugs, MCP
tools, and response shapes work identically.

28 May 11:58

github-actions

v0.2.1

9b54c2c

v0.2.1

docker-talkies v0.2.1 — agent skill scaffolding + speaches credit.

Docs-only release. Adds .agents/.skills/talkies/ (SKILL.md +
references/setup.md + scripts/bulk_transcribe.sh) so AI agents can
discover and use the talkies API without re-reading the full README.
README gains a Credits section linking speaches as the inspiration
project. No runtime, API, config, or wire-format changes.
