Browser-native text-to-speech running 100% client-side via Rust/WASM.
Disclaimer: Experimental port. Model from Pocket TTS by Kyutai Labs.
- Modern browser with WebAssembly support
# 1. Build WASM package
wasm-pack build crates/tts-wasm --target web
# 2. Start dev server
bun web/serve.mjs

- TTS model (crates/tts-wasm/): Pocket TTS compiled to WebAssembly via Candle. Generates speech from text tokens using a voice embedding.
- mimi-rs: Shared Rust library for the Mimi audio codec (encoder + decoder + streaming transformer). Used by both tts-web and stt-web.
- Web UI (web/): Web Worker orchestrates model loading and generation, streams audio chunks back to the main thread for real-time playback.
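
The worker-to-main-thread streaming pattern can be modeled in plain Rust. This is only a sketch of the message flow, not the project's code: in the browser the worker is a Web Worker and the channel is `postMessage`, whereas here a `std::thread` and an `mpsc` channel stand in for them. The `WorkerMsg` type, the function names, and the chunk size of 1920 samples are all assumptions made for illustration.

```rust
use std::sync::mpsc;
use std::thread;

/// Messages the worker sends back (hypothetical names, for illustration only).
#[derive(Debug, PartialEq)]
enum WorkerMsg {
    Chunk(Vec<f32>), // one block of PCM samples
    Done,            // generation finished
}

/// Stand-in for the model worker: emits `n_chunks` chunks of `chunk_len`
/// silent samples, one per simulated generation step.
fn run_worker(n_chunks: usize, chunk_len: usize) -> mpsc::Receiver<WorkerMsg> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for _ in 0..n_chunks {
            // A real worker would run one model step and decode audio here.
            if tx.send(WorkerMsg::Chunk(vec![0.0; chunk_len])).is_err() {
                return; // receiver dropped, so stop generating
            }
        }
        let _ = tx.send(WorkerMsg::Done);
    });
    rx
}

/// Main-thread side: consume chunks as they arrive and return how many
/// samples were handed to "playback".
fn play_all(rx: mpsc::Receiver<WorkerMsg>) -> usize {
    let mut samples = 0;
    for msg in rx {
        match msg {
            WorkerMsg::Chunk(pcm) => samples += pcm.len(), // would be queued for audio output
            WorkerMsg::Done => break,
        }
    }
    samples
}

fn main() {
    let rx = run_worker(5, 1920); // chunk size is an assumed value
    println!("played {} samples", play_all(rx));
}
```

Because each chunk is sent as soon as it is produced, playback can begin before generation finishes, which is the point of streaming from the worker.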
The model ships as a GGUF Q8_0 file (~130MB). Weights are loaded directly as Q8_0 via Candle's QMatMul, so the ~97M quantized parameters stay quantized in memory (~103MB vs ~388MB in F32), which cuts memory bandwidth per inference step by roughly 4x.
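
The memory figures above can be checked with a little arithmetic. The sketch below assumes the standard GGML Q8_0 layout (blocks of 32 int8 weights plus one 2-byte f16 scale, so 34 bytes per 32 weights) and treats all ~97M parameters as one flat array. Real tensors are quantized individually, so the exact total may differ slightly. The helper names are illustrative.

```rust
/// Q8_0 block layout in GGML/GGUF: 32 int8 weights plus one f16 scale.
const Q8_0_BLOCK_WEIGHTS: u64 = 32;
const Q8_0_BLOCK_BYTES: u64 = 32 + 2;

/// Bytes needed to store `params` weights as Q8_0, rounded up to whole blocks.
fn q8_0_bytes(params: u64) -> u64 {
    let blocks = (params + Q8_0_BLOCK_WEIGHTS - 1) / Q8_0_BLOCK_WEIGHTS;
    blocks * Q8_0_BLOCK_BYTES
}

/// Bytes needed to store `params` weights as 32-bit floats.
fn f32_bytes(params: u64) -> u64 {
    params * 4
}

fn main() {
    let params: u64 = 97_000_000; // approximate count quoted in the text
    let q8 = q8_0_bytes(params);
    let full = f32_bytes(params);
    println!("Q8_0:  {:.1} MB", q8 as f64 / 1e6);
    println!("F32:   {:.1} MB", full as f64 / 1e6);
    println!("ratio: {:.2}x", full as f64 / q8 as f64);
}
```

Under these assumptions Q8_0 costs 8.5 bits per weight, which gives about 103MB against 388MB for F32, a ratio of about 3.8x.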