Pre-trained text-to-speech models for Aurlo - a privacy-first TTS desktop application.
This repository contains the Kokoro-82M ONNX models optimized for fast, high-quality speech synthesis.
| File | Size | SHA256 Checksum | Description |
|---|---|---|---|
| kokoro-v1.0.onnx | ~310 MB | 7d5df8ecf7d4b1878015a32686053fd0eebe2bc377234608764cc0ef3636a6c5 | Kokoro TTS model (ONNX format) |
| voices-v1.0.bin | ~27 MB | bca610b8308e8d99f32e6fe4197e7ec01679264efed0cac9140fe9c29f1fbf7d | Voice embedding data (54 voices) |
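
The checksums in the table above can also be checked from a script. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of` and `verify_models` are illustrative and are not part of Aurlo.

```python
import hashlib
from pathlib import Path

# Expected digests copied from the file table above.
EXPECTED_SHA256 = {
    "kokoro-v1.0.onnx": "7d5df8ecf7d4b1878015a32686053fd0eebe2bc377234608764cc0ef3636a6c5",
    "voices-v1.0.bin": "bca610b8308e8d99f32e6fe4197e7ec01679264efed0cac9140fe9c29f1fbf7d",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so the ~310 MB model is never held in memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_models(directory: Path) -> dict:
    """Return {filename: bool}; a missing file counts as a failed check."""
    results = {}
    for name, expected in EXPECTED_SHA256.items():
        path = Path(directory) / name
        results[name] = path.is_file() and sha256_of(path) == expected
    return results
```

Calling `verify_models(Path("."))` in the download directory returns a mapping such as `{"kokoro-v1.0.onnx": True, "voices-v1.0.bin": True}` when both files match.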
The voices-v1.0.bin file contains embeddings for 54 high-quality voices across multiple languages:
- American English - Multiple male and female voices
- British English - Various regional accents
- Japanese - Native speakers
- Mandarin Chinese - Multiple tones
- Spanish - International variations
- French - European French
- Hindi - Indian voices
- Italian - Native speakers
- Portuguese - Brazilian and European
- Base Model: Kokoro-82M (82 million parameters)
- Architecture: ONNX Runtime optimized
- Performance: 210× realtime speed on modern hardware
- Sample Rate: 24kHz
- Format: 16-bit PCM WAV output
- License: Apache 2.0 (see LICENSE file)
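
The list above does not say how the realtime figure was measured. A common definition, assumed here, is the duration of generated audio divided by the wall-clock time spent synthesizing it. The helper `realtime_factor` below is a hypothetical sketch of that arithmetic at the 24 kHz sample rate.

```python
SAMPLE_RATE_HZ = 24_000  # sample rate stated in the list above


def realtime_factor(num_samples: int, synthesis_seconds: float,
                    sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """Seconds of audio produced per second of compute (assumed definition)."""
    if synthesis_seconds <= 0:
        raise ValueError("synthesis_seconds must be positive")
    audio_seconds = num_samples / sample_rate
    return audio_seconds / synthesis_seconds
```

Under this definition, a factor of 210 means that one second of compute yields 210 seconds (3.5 minutes) of speech. Measured values depend on hardware and text length.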
These models are automatically downloaded by Aurlo on first launch. They are stored in the application's data directory:
- macOS: ~/Library/Application Support/com.aurlo.app/models/
- Windows: %APPDATA%\com.aurlo.app\models\
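
For scripts that need to locate the downloaded files, the directories above can be resolved as in the sketch below. The function `models_dir` is hypothetical and only mirrors the two documented paths; it does not read Aurlo's own configuration.

```python
import os
import sys
from pathlib import Path

APP_ID = "com.aurlo.app"  # directory name taken from the paths listed above


def models_dir(platform=sys.platform, env=None, home=None) -> Path:
    """Return the documented model directory for macOS or Windows.

    `platform`, `env` and `home` are parameters only so the function can be
    exercised on any machine; by default they come from the running system.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    if platform == "darwin":
        return home / "Library" / "Application Support" / APP_ID / "models"
    if platform == "win32":
        return Path(env["APPDATA"]) / APP_ID / "models"
    # No other platform has a documented location.
    raise NotImplementedError(f"no documented model directory for {platform!r}")
```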
You can download the models manually from the Releases page.
Always verify the SHA256 checksums after downloading:
# macOS/Linux
shasum -a 256 kokoro-v1.0.onnx
shasum -a 256 voices-v1.0.bin
# Windows (PowerShell)
Get-FileHash kokoro-v1.0.onnx -Algorithm SHA256
Get-FileHash voices-v1.0.bin -Algorithm SHA256

These models are based on the Kokoro-82M project and have been converted to ONNX format for optimal performance.
Original Model Credits:
- Kokoro-82M: hexgrad on Hugging Face
- ONNX Conversion: thewh1teagle/kokoro-onnx
These models are released under the Apache License 2.0 - see the LICENSE file for details.
The models maintain the same license as the original Kokoro project to ensure compatibility and proper attribution.
- Initial release
- Kokoro-82M base model
- 54 voice embeddings
- ONNX optimized for cross-platform deployment
- Input: Text (UTF-8 string)
- Output: 24kHz 16-bit mono PCM audio
- Inference Engine: ONNX Runtime
- Supported Platforms: macOS (ARM64, x86_64), Windows (x86_64)
- GPU Acceleration: Optional (CPU-only supported)
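
To illustrate the output format listed above, here is a minimal sketch that writes 24 kHz 16-bit mono PCM with Python's standard `wave` module. It assumes the synthesized audio arrives as float samples in the range [-1.0, 1.0], which this document does not confirm. The names `write_wav` and `sine_tone` are illustrative.

```python
import math
import struct
import wave

SAMPLE_RATE_HZ = 24_000   # from the specification list above
SAMPLE_WIDTH_BYTES = 2    # 16-bit PCM
CHANNELS = 1              # mono


def write_wav(path, samples):
    """Write float samples (assumed to lie in [-1.0, 1.0]) as 16-bit mono PCM."""
    pcm = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, s))  # guard against out-of-range values
        pcm += struct.pack("<h", int(round(clipped * 32767)))  # little-endian int16
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(bytes(pcm))


def sine_tone(freq_hz=440.0, seconds=1.0):
    """Generate a test tone as a stand-in for real model output."""
    n = int(SAMPLE_RATE_HZ * seconds)
    return [0.5 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ) for i in range(n)]
```

For example, `write_wav("tone.wav", sine_tone())` produces a one-second file that any player supporting 24 kHz PCM should open.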
For issues related to:
- Model performance: Open an issue in this repository
- Aurlo application: Open an issue in seed-blocks/aurlo
- Original Kokoro model: See hexgrad/Kokoro-82M
If you use these models in your research or application, please cite the original Kokoro project:
@misc{kokoro2024,
title={Kokoro-82M: Compact Text-to-Speech Model},
author={hexgrad},
year={2024},
publisher={Hugging Face},
howpublished={\url{https://huggingface.co/hexgrad/Kokoro-82M}}
}

- Repository: seed-blocks/aurlo-models
- Main Application: seed-blocks/aurlo
- License: Apache 2.0