Convert e-books (FB2, EPUB, TXT) to MP3 audiobooks using various Text-to-Speech technologies.
- Five TTS providers - RHVoice, Piper, Silero, Coqui XTTS-v2 and ElevenLabs
- 60+ voices - Multiple voices in Russian and English
- Format support - FB2, EPUB, TXT
- Speed control - from 0.5x to 2.0x
- Multiple themes - Light/dark/system theme support
- Multi-language UI - English and Russian interface
- Auto-splitting - large books split into parts
- GPU acceleration - CUDA support for faster generation
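To illustrate the auto-splitting feature, here is a minimal sketch of how book text could be grouped into parts on paragraph boundaries. It is not the application's actual implementation: the function name `splitIntoParts` and the `maxChars` budget are assumptions made for illustration only.

```typescript
// Hypothetical sketch: split book text into parts on paragraph boundaries.
// `maxChars` is an assumed per-part budget, not a documented setting.
function splitIntoParts(text: string, maxChars: number): string[] {
  if (maxChars <= 0) {
    throw new RangeError("maxChars must be positive");
  }
  // Paragraphs are separated by one or more blank lines.
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const parts: string[] = [];
  let current = "";
  for (const p of paragraphs) {
    const candidate = current === "" ? p : current + "\n\n" + p;
    if (current === "" || candidate.length <= maxChars) {
      // Either the paragraph fits, or the part is still empty
      // (an oversized paragraph gets a part of its own).
      current = candidate;
    } else {
      parts.push(current);
      current = p;
    }
  }
  if (current !== "") {
    parts.push(current);
  }
  return parts;
}
```

Splitting on paragraph boundaries keeps sentences intact, so each part can be synthesized independently and joined afterwards.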
Speed: Fast | Quality: Good | Offline
Lightweight offline engine exposed through Windows SAPI, with a minimal installation size (~15 MB per voice). Generates speech almost instantly with very low CPU usage, which makes it well suited to converting large books quickly.
- Aleksandr (Male)
- Irina, Anna, Elena (Female)
- Bdl, Alan (Male)
- Slt, Clb (Female)
Download: RHVoice releases
Speed: Fast | Quality: Good | Offline
Neural TTS engine powered by ONNX Runtime. Offers excellent voice quality with fast generation: it processes text 10-50x faster than real time on most CPUs.
- Denis, Dmitri, Ruslan (Male)
- Irina (Female)
US voices: Amy, Kathleen, Kristin, HFC Female, LJSpeech (Female) • Arctic, Bryce, Danny, HFC Male, Joe, John, Kusal, L2Arctic, Lessac, LibriTTS, Norman, Reza Ibrahim, Ryan, Sam (Male)
GB voices: Alba, Cori, Jenny Dioco, Southern English Female (Female) • Alan, Aru, Northern English Male, Semaine, VCTK (Male)
Download models: Piper Voices
Speed: Medium | Quality: Excellent | Offline
Advanced neural TTS engine built on PyTorch. Delivers natural, expressive speech with excellent prosody.
- Aidar, Eugene (Male)
- Baya, Kseniya, Xenia (Female)
- Male 1, Male 2 (Male)
- Female 1, Female 2 (Female)
Models download automatically on first use (~100-200 MB).
Speed: Slow | Quality: Premium | Offline
State-of-the-art multilingual model with 14 built-in speaker voices. Produces the most natural-sounding speech among local engines with exceptional emotional range and prosody.
Female: Claribel Dervla, Daisy Studious, Gracie Wise, Tammie Ema, Alison Dietlinde, Ana Florence, Annmarie Nele, Asya Anara
Male: Andrew Chipper, Badr Odhiambo, Dionisio Schuyler, Royston Min, Viktor Eka, Abrahan Mack
Supports 17 languages including Russian, English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Dutch, Czech, Arabic, Chinese, Japanese, Hungarian, Korean, and Hindi.
Speed: Fast | Quality: Premium | Online
Premium cloud-based TTS with cutting-edge AI voice synthesis. Offers studio-quality output with remarkable naturalness.
- Adam (Male)
- Rachel (Female)
- Adam, Josh, Sam (Male)
- Rachel, Domi, Bella (Female)
Setup:
- Create account at ElevenLabs
- Get API key from your profile
- Add to the .env file:
ELEVENLABS_API_KEY=your_api_key_here
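A quick way to confirm the key is set is to parse the `.env` contents and look for a non-placeholder value. The sketch below assumes simple `KEY=value` lines with optional quotes and `#` comments; `parseEnv` and `hasElevenLabsKey` are hypothetical helpers, not part of the application.

```typescript
// Minimal .env parser. Assumption: plain KEY=value lines,
// optional surrounding quotes, and full-line "#" comments.
function parseEnv(content: string): Map<string, string> {
  const vars = new Map<string, string>();
  for (const rawLine of content.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "=" or no "=" at all
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    const quote = value.charAt(0);
    // Strip one pair of matching quotes, if present.
    if (value.length >= 2 && (quote === '"' || quote === "'") && value.endsWith(quote)) {
      value = value.slice(1, -1);
    }
    vars.set(key, value);
  }
  return vars;
}

// Hypothetical check: true only if the key exists and is not the placeholder.
function hasElevenLabsKey(content: string): boolean {
  const key = parseEnv(content).get("ELEVENLABS_API_KEY");
  return key !== undefined && key !== "" && key !== "your_api_key_here";
}
```

In a real Electron main process the file contents would typically be read from disk first; the parsing step is kept separate here so it can be checked in isolation.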
- Node.js 18+
- Windows 10/11
- Python 3.9+ (for Silero and Coqui)
- Clone repository
git clone <repo-url>
cd voicecraft
- Install dependencies
npm install
- Set up TTS components
Run the universal setup script:
# Via npm (recommended)
npm run setup
# Or directly via PowerShell
powershell .\scripts\setup-all.ps1
This will install:
- Piper TTS
- FFmpeg
- Silero TTS
- Coqui XTTS-v2
- Download voice models
Download and install from RHVoice releases. Voices will be automatically detected after installation.
Download voices from Piper releases and extract to:
tts_resources/piper/voices/
Structure:
tts_resources/
  piper/
    voices/
      ru_RU/
        denis/
          medium/
            ru_RU-denis-medium.onnx
            ru_RU-denis-medium.onnx.json
      en_US/
        lessac/
          medium/
            en_US-lessac-medium.onnx
            en_US-lessac-medium.onnx.json
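The layout above follows a `locale/name/quality` pattern that mirrors the voice ID. As a sketch, the expected file locations can be derived from an ID such as `ru_RU-denis-medium`. The helper name `piperVoiceFiles` and the ID pattern it accepts are assumptions inferred from the two examples shown, not a documented API.

```typescript
// Hypothetical helper: derive the expected model and config paths
// for a Piper voice ID of the form "<locale>-<name>-<quality>".
interface PiperVoiceFiles {
  model: string;
  config: string;
}

function piperVoiceFiles(
  voiceId: string,
  root: string = "tts_resources/piper/voices"
): PiperVoiceFiles {
  // Assumed ID shape, e.g. "en_US-lessac-medium".
  const match = /^([a-z]{2}_[A-Z]{2})-([A-Za-z0-9_]+)-([a-z_]+)$/.exec(voiceId);
  if (match === null) {
    throw new Error(`Unexpected voice ID format: ${voiceId}`);
  }
  const [, locale, name, quality] = match;
  const dir = `${root}/${locale}/${name}/${quality}`;
  // Both files must sit in the same folder.
  return {
    model: `${dir}/${voiceId}.onnx`,
    config: `${dir}/${voiceId}.onnx.json`,
  };
}
```

Checking that both returned paths exist is a quick way to diagnose a voice that does not show up in the application.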
Models download automatically on first use (~100-200 MB).
# Install only Silero
npm run setup:silero
The Coqui XTTS-v2 model downloads automatically on first use (~2 GB). Requires Python 3.9+; a GPU is recommended for faster generation.
Add your API key to .env file:
ELEVENLABS_API_KEY=your_api_key_here
npm run dev
npm run build
npm run package
voicecraft/
├── electron/                 # Electron main process
│   ├── main.ts               # Main process entry point
│   ├── preload.ts            # IPC bridge (preload script)
│   ├── main/
│   │   ├── window.ts         # Window management
│   │   └── handlers/         # IPC handlers
│   └── services/
│       ├── parser.ts         # Book parsing
│       ├── setup/            # Dependency installation
│       └── tts/              # TTS services
├── src/                      # React frontend
│   ├── App.tsx               # Main component
│   ├── i18n/                 # Internationalization (EN/RU)
│   ├── components/           # UI components
│   ├── hooks/                # React hooks
│   ├── fsm/                  # State machine
│   └── utils/                # Utility functions
├── tts_resources/            # TTS resources
│   ├── piper/                # Piper TTS
│   ├── silero/               # Silero TTS
│   ├── coqui/                # Coqui XTTS-v2
│   ├── ffmpeg/               # FFmpeg for conversion
│   └── tts_server.py         # Python TTS server
├── scripts/
│   ├── setup-all.ps1         # Universal setup
│   ├── setup-silero.ps1      # Setup only Silero
│   └── release.cjs           # Release automation
└── .env                      # Environment variables (API keys)
| Provider | Speed | Quality | Model Size | Type | Recommendation |
|---|---|---|---|---|---|
| RHVoice | Fast | Good | ~15 MB | CPU | Quick processing |
| Piper | Fast | Good | ~50 MB | CPU | Balanced option |
| Silero | Medium | Excellent | ~100-200 MB | CPU/GPU | Natural Russian voices |
| Coqui | Slow | Premium | ~2 GB | CPU/GPU | Best offline quality |
| ElevenLabs | Fast | Premium | Cloud | API | Best overall quality |
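The comparison table can also be expressed as data, which makes the trade-offs easy to query. The sketch below encodes the speed, quality, and offline columns and picks the highest-quality provider that satisfies the given constraints. The type names and the `pickProvider` helper are hypothetical and only illustrate the idea.

```typescript
type ProviderId = "rhvoice" | "piper" | "silero" | "coqui" | "elevenlabs";
type Speed = "slow" | "medium" | "fast";
type Quality = "good" | "excellent" | "premium";

interface ProviderInfo {
  id: ProviderId;
  speed: Speed;
  quality: Quality;
  offline: boolean;
}

// Values copied from the comparison table.
const PROVIDERS: readonly ProviderInfo[] = [
  { id: "rhvoice", speed: "fast", quality: "good", offline: true },
  { id: "piper", speed: "fast", quality: "good", offline: true },
  { id: "silero", speed: "medium", quality: "excellent", offline: true },
  { id: "coqui", speed: "slow", quality: "premium", offline: true },
  { id: "elevenlabs", speed: "fast", quality: "premium", offline: false },
];

const SPEED_RANK: Record<Speed, number> = { slow: 0, medium: 1, fast: 2 };
const QUALITY_RANK: Record<Quality, number> = { good: 0, excellent: 1, premium: 2 };

// Hypothetical selector: best quality first, faster provider on a tie,
// table order if both are equal.
function pickProvider(offlineOnly: boolean, minSpeed: Speed = "slow"): ProviderInfo | undefined {
  let best: ProviderInfo | undefined;
  for (const p of PROVIDERS) {
    if (offlineOnly && !p.offline) continue;
    if (SPEED_RANK[p.speed] < SPEED_RANK[minSpeed]) continue;
    const better =
      best === undefined ||
      QUALITY_RANK[p.quality] > QUALITY_RANK[best.quality] ||
      (QUALITY_RANK[p.quality] === QUALITY_RANK[best.quality] &&
        SPEED_RANK[p.speed] > SPEED_RANK[best.speed]);
    if (better) best = p;
  }
  return best;
}
```

For example, `pickProvider(true)` selects Coqui (best offline quality), while `pickProvider(true, "fast")` falls back to RHVoice, the first of the two fast offline engines in the table.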
Silero and Coqui support hardware acceleration for faster speech generation:
| Accelerator | Supported GPUs | PyTorch Size | Speed Boost |
|---|---|---|---|
| CUDA | NVIDIA (GTX 10xx+, RTX series) | ~2.3 GB | 3-10x |
| Intel XPU | Intel Arc, Iris Xe, UHD 7xx+ | ~500 MB | 2-5x |
| CPU | Any | ~200 MB | Baseline |
- During initial setup: when installing Silero or Coqui, select your preferred accelerator (CUDA, Intel XPU, or CPU).
- To change the accelerator later: go to Settings → TTS Setup and click the "Reinstall" button next to Silero or Coqui.
For NVIDIA CUDA:
- NVIDIA GPU with CUDA support (GTX 10xx or newer recommended)
- Latest NVIDIA drivers installed
- Automatically detected via nvidia-smi
For Intel XPU:
- Intel Arc, Iris Xe, or UHD Graphics 7xx+
- Latest Intel GPU drivers
- Intel Extension for PyTorch (installed automatically)
The application automatically detects available GPUs:
- Priority order: CUDA → Intel XPU → CPU
- GPU name and VRAM are displayed in the setup dialog
- If no compatible GPU is found, CPU mode is used
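The detection rules above can be sketched as a small priority function plus a parser for one line of `nvidia-smi` output. This is an illustration under assumptions: the application's real detection code is not shown here, the names `pickAccelerator` and `parseNvidiaSmiLine` are hypothetical, and the parser assumes output from `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader`, which may not be the exact query the application runs.

```typescript
type Accelerator = "cuda" | "xpu" | "cpu";

interface DetectedHardware {
  hasCuda: boolean;
  hasIntelXpu: boolean;
}

// Priority order from the list above: CUDA, then Intel XPU, then CPU.
function pickAccelerator(hw: DetectedHardware): Accelerator {
  if (hw.hasCuda) return "cuda";
  if (hw.hasIntelXpu) return "xpu";
  return "cpu";
}

interface GpuInfo {
  name: string;
  vramMiB: number;
}

// Assumed input: one CSV line such as "NVIDIA GeForce RTX 3060, 12288 MiB".
function parseNvidiaSmiLine(line: string): GpuInfo | undefined {
  const match = /^(.+),\s*(\d+)\s*MiB\s*$/.exec(line.trim());
  if (match === null) return undefined;
  const name = match[1];
  const mem = match[2];
  if (name === undefined || mem === undefined) return undefined;
  return { name: name.trim(), vramMiB: Number(mem) };
}
```

Returning `undefined` for unparseable output lets the caller fall through to the next accelerator in the priority order instead of failing.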
- RHVoice: up to 30 parallel threads
- Piper: up to 10 parallel threads
- Silero: up to 5 parallel threads
- Coqui: 1 thread (sequential processing)
- ElevenLabs: up to 3 parallel requests
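One common way to enforce per-provider limits like these is a small promise pool that never runs more than a fixed number of tasks at once. The sketch below is an assumption about how such a limit could be applied, not the application's actual scheduler; `mapWithLimit`, `synthesizeAll`, and the `synth` callback are hypothetical names.

```typescript
type Provider = "rhvoice" | "piper" | "silero" | "coqui" | "elevenlabs";

// Limits taken from the list above.
const MAX_PARALLEL: Record<Provider, number> = {
  rhvoice: 30,
  piper: 10,
  silero: 5,
  coqui: 1,
  elevenlabs: 3,
};

// Run `worker` over all items with at most `limit` calls in flight.
// Results keep the order of the input items.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  let next = 0;

  async function runWorker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T, index);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => runWorker()));
  return results;
}

// Hypothetical entry point: `synth` stands in for a real TTS call.
function synthesizeAll<R>(
  provider: Provider,
  chunks: readonly string[],
  synth: (chunk: string) => Promise<R>
): Promise<R[]> {
  return mapWithLimit(chunks, MAX_PARALLEL[provider], (chunk) => synth(chunk));
}
```

With Coqui the pool degenerates to sequential processing because its limit is 1, which matches the behaviour described in the list.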
- Install RHVoice from official releases
- Restart application after installation
- Voices are detected automatically via Windows SAPI
- Make sure voice models are downloaded
- Check directory structure
- .onnx and .onnx.json files must be in the same folder
- Slower generation is normal: Silero uses PyTorch models
- First run downloads models (~100-200 MB)
- For large books consider Piper or RHVoice
- First run downloads ~2 GB model
- GPU recommended for faster generation
- Requires Python 3.9+
- Check that tts_resources/coqui/venv exists
- Check that the API key is set in the .env file
- Verify the API key is valid at elevenlabs.io
- Check internet connection
- Make sure FFmpeg is installed: npm run setup
- Check that tts_resources/ffmpeg/ffmpeg.exe exists
MIT