Voicebox

The open-source voice synthesis studio.
Clone voices. Generate speech. Build voice-powered apps.
All running locally on your machine.

voicebox.sh | Download | Features | API | Roadmap


[Screenshots: Voicebox app]

A demo video is available on voicebox.sh.


What is Voicebox?

Voicebox is a local-first voice cloning studio with DAW-like features for professional voice synthesis. Think of it as the Ollama for voice — download models, clone voices, and generate speech entirely on your machine.

Unlike cloud services that lock your voice data behind subscriptions, Voicebox gives you:

  • Complete privacy — models and voice data stay on your machine
  • Professional tools — multi-track timeline editor, audio trimming, conversation mixing
  • Model flexibility — currently powered by Qwen3-TTS, with support for XTTS, Bark, and other models coming soon
  • API-first — use the desktop app or integrate voice synthesis into your own projects
  • Native performance — built with Tauri (Rust), not Electron

Download a voice model, clone any voice from a few seconds of audio, and compose multi-voice projects with studio-grade editing tools. No Python install required, no cloud dependency, no limits.


Download

Voicebox is available now for macOS and Windows.

| Platform | Download |
|----------|----------|
| macOS (Apple Silicon) | voicebox_aarch64.app.tar.gz |
| macOS (Intel) | voicebox_x64.app.tar.gz |
| Windows (MSI) | voicebox_0.1.0_x64_en-US.msi |
| Windows (Setup) | voicebox_0.1.0_x64-setup.exe |

Linux builds are coming soon; they are currently blocked by GitHub runner disk space limitations.


Features

Voice Cloning with Qwen3-TTS

Powered by Alibaba's Qwen3-TTS — a breakthrough model that achieves near-perfect voice cloning from just a few seconds of audio.

  • Instant cloning — Upload a sample, get a voice profile
  • High fidelity — Natural prosody, emotion, and cadence
  • Multi-language — English, Chinese, and more coming

Voice Profile Management

  • Create profiles from audio files or record directly in-app
  • Import/Export profiles to share or back up
  • Multi-sample support — combine multiple samples for higher quality cloning
  • Organize with descriptions and language tags

Speech Generation

  • Text-to-speech with any cloned voice
  • Batch generation for long-form content
  • Smart caching — regenerate instantly with voice prompt caching
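
One way to picture the "smart caching" bullet is as memoization of the expensive step that turns reference audio into a reusable voice prompt. The sketch below illustrates that general technique only; it is not taken from the Voicebox codebase. `VoicePromptCache`, `compute_prompt`, and the SHA-256 cache key are all hypothetical.

```python
import hashlib
from typing import Callable, Dict


class VoicePromptCache:
    """Memoize voice prompts by the content hash of the reference audio.

    Hypothetical sketch: `compute_prompt` stands in for whatever expensive
    step derives a reusable voice prompt from sample audio.
    """

    def __init__(self, compute_prompt: Callable[[bytes], object]) -> None:
        self._compute_prompt = compute_prompt
        self._prompts: Dict[str, object] = {}
        self.misses = 0  # number of times the expensive step actually ran

    def get(self, sample_audio: bytes) -> object:
        # Identical audio bytes always hash to the same key.
        key = hashlib.sha256(sample_audio).hexdigest()
        if key not in self._prompts:
            self.misses += 1
            self._prompts[key] = self._compute_prompt(sample_audio)
        return self._prompts[key]
```

With a cache like this, regenerating text for an unchanged voice sample skips the prompt computation and only pays for synthesis.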

Stories Editor

Create multi-voice narratives, podcasts, and conversations with a timeline-based editor.

  • Multi-track composition — arrange multiple voice tracks in a single project
  • Inline audio editing — trim and split clips directly in the timeline
  • Auto-playback — preview stories with synchronized playhead
  • Voice mixing — build conversations with multiple participants
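
The trim and split operations listed above can be modeled as pure functions on a clip record. The following is a minimal sketch of such a data model, assuming clips are described by a timeline position, an offset into their source audio, and a duration. The `Clip` class and `story_length` helper are hypothetical and do not reflect how Voicebox stores its timeline.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Clip:
    """A span of generated audio placed on a timeline track (hypothetical model)."""

    voice: str
    start: float     # position on the timeline, in seconds
    offset: float    # where playback begins inside the source audio, in seconds
    duration: float  # how long the clip plays, in seconds

    @property
    def end(self) -> float:
        return self.start + self.duration

    def trim_start(self, seconds: float) -> "Clip":
        """Drop `seconds` from the front; the remaining audio keeps its timeline position."""
        if not 0 <= seconds < self.duration:
            raise ValueError("trim must be shorter than the clip")
        return Clip(self.voice, self.start + seconds, self.offset + seconds, self.duration - seconds)

    def split(self, at: float) -> Tuple["Clip", "Clip"]:
        """Split at timeline position `at` into two adjacent clips."""
        if not self.start < at < self.end:
            raise ValueError("split point must fall inside the clip")
        left_len = at - self.start
        left = Clip(self.voice, self.start, self.offset, left_len)
        right = Clip(self.voice, at, self.offset + left_len, self.duration - left_len)
        return left, right


def story_length(tracks: List[List[Clip]]) -> float:
    """Total story length: the latest clip end across all tracks."""
    return max((clip.end for track in tracks for clip in track), default=0.0)
```

Keeping clips immutable means each edit returns new clips and leaves the source audio untouched, which is one common way to make timeline edits non-destructive.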

Recording & Transcription

  • In-app recording with waveform visualization
  • System audio capture — record desktop audio on macOS and Windows
  • Automatic transcription powered by Whisper
  • Export recordings in multiple formats

Generation History

  • Full history of all generated audio
  • Search & filter by voice, text, or date
  • Re-generate any past generation with one click
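
Since the tech stack lists SQLite as the database, searching history by voice, text, or date could plausibly be a parameterized query with optional filters. The sketch below assumes a `generations` table with `profile_id`, `text`, and `created_at` columns. That schema and both function names are invented for illustration and may differ from the real one.

```python
import sqlite3
from typing import List, Optional, Tuple


def create_history_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `generations` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE generations (
            id INTEGER PRIMARY KEY,
            profile_id TEXT NOT NULL,
            text TEXT NOT NULL,
            created_at TEXT NOT NULL  -- ISO 8601 date, so string order is date order
        )
        """
    )
    return conn


def search_history(
    conn: sqlite3.Connection,
    profile_id: Optional[str] = None,
    text_contains: Optional[str] = None,
    since: Optional[str] = None,
) -> List[Tuple[int, str, str, str]]:
    """Filter by voice, text, or date; every filter is optional."""
    clauses, params = [], []
    if profile_id is not None:
        clauses.append("profile_id = ?")
        params.append(profile_id)
    if text_contains is not None:
        # LIKE is case-insensitive for ASCII in SQLite; % and _ act as wildcards.
        clauses.append("text LIKE ?")
        params.append(f"%{text_contains}%")
    if since is not None:
        clauses.append("created_at >= ?")
        params.append(since)
    where = f"WHERE {' AND '.join(clauses)}" if clauses else ""
    query = f"SELECT id, profile_id, text, created_at FROM generations {where} ORDER BY created_at DESC, id DESC"
    return conn.execute(query, params).fetchall()
```

Only the clause structure is built dynamically; user-supplied values always travel as bound parameters, which avoids SQL injection.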

Flexible Deployment

  • Local mode — Everything runs on your machine
  • Remote mode — Connect to a GPU server on your network
  • One-click server — Turn any machine into a Voicebox server

API

Voicebox exposes a full REST API, so you can integrate voice synthesis into your own apps.

# Generate speech
curl -X POST http://localhost:8000/api/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "profile_id": "abc123"}'

# List voice profiles
curl http://localhost:8000/api/profiles

# Create a profile from audio
curl -X POST http://localhost:8000/api/profiles \
  -F "audio=@voice-sample.wav" \
  -F "name=My Voice"
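
For callers who prefer Python to curl, the first two requests above can be sketched with only the standard library. The endpoint paths and JSON fields are copied from the curl examples; the function names are hypothetical. The response format is not described here, so `send` returns raw bytes rather than guessing at a structure.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl examples


def build_generate_request(text: str, profile_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /api/generate request shown in the curl example."""
    body = json.dumps({"text": text, "profile_id": profile_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_list_profiles_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the GET /api/profiles request shown in the curl example."""
    return urllib.request.Request(f"{base_url}/api/profiles", method="GET")


def send(request: urllib.request.Request, timeout: float = 60.0) -> bytes:
    """Send a request to a running server and return the raw response body.

    The response format is not documented in this README, so the bytes are
    returned undecoded.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

With a Voicebox server running, `send(build_generate_request("Hello world", "abc123"))` issues the same call as the first curl command. Passing a different `base_url` would target a remote server instead of localhost.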

Use cases:

  • Game dialogue systems
  • Podcast/video production pipelines
  • Accessibility tools
  • Voice assistants
  • Content creation automation

Full API documentation is available at http://localhost:8000/docs while the server is running.


Tech Stack

| Layer | Technology |
|-------|------------|
| Desktop App | Tauri (Rust) |
| Frontend | React, TypeScript, Tailwind CSS |
| State | Zustand, React Query |
| Backend | FastAPI (Python) |
| Voice Model | Qwen3-TTS |
| Transcription | Whisper |
| Database | SQLite |
| Audio | WaveSurfer.js, librosa |

Why this stack?

  • Tauri over Electron — 10x smaller bundle, native performance, lower memory
  • FastAPI — Async Python with automatic OpenAPI schema generation
  • Type-safe end-to-end — Generated TypeScript client from OpenAPI spec
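
To make the OpenAPI point concrete: a client generator starts from the list of operations in the schema document. The sketch below walks such a document and lists its operations. FastAPI serves its schema at `/openapi.json` by default, but whether Voicebox keeps that default is an assumption. `EXAMPLE_SCHEMA` is a hand-written stand-in built from the endpoints in the API section, not the real schema.

```python
import json
from typing import Dict, List, Tuple

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def list_operations(schema: Dict) -> List[Tuple[str, str]]:
    """Return (METHOD, path) pairs from an OpenAPI document, sorted by path."""
    operations = []
    for path, item in schema.get("paths", {}).items():
        for method in item:
            # Path items can also hold non-operation keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                operations.append((method.upper(), path))
    return sorted(operations, key=lambda op: (op[1], op[0]))


# Hand-written stand-in for a schema; a real one would come from the server.
EXAMPLE_SCHEMA = json.loads("""
{
  "openapi": "3.1.0",
  "paths": {
    "/api/generate": {"post": {}},
    "/api/profiles": {"get": {}, "post": {}, "parameters": []}
  }
}
""")
```

Calling `list_operations(EXAMPLE_SCHEMA)` yields one entry per endpoint and method, which is the raw material a TypeScript client generator turns into typed functions.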

Roadmap

Voicebox is the beginning of something bigger. Here's what's coming:

Coming Soon

| Feature | Description |
|---------|-------------|
| Real-time Synthesis | Stream audio as it generates, word by word |
| Conversation Mode | Multi-speaker dialogues with automatic turn-taking |
| Voice Effects | Pitch shift, reverb, M3GAN-style effects |
| Timeline Editor | Audio studio with word-level precision editing |
| More Models | XTTS, Bark, and other open-source voice models |

Future Vision

  • Voice Design — Create new voices from text descriptions
  • Project System — Save and load complex multi-voice sessions
  • Plugin Architecture — Extend with custom models and effects
  • Mobile Companion — Control Voicebox from your phone

Voicebox aims to be the one-stop shop for everything voice — cloning, synthesis, editing, effects, and beyond.


Development

See CONTRIBUTING.md for detailed setup and contribution guidelines.

Quick Start

# Clone the repo
git clone https://github.com/voicebox-sh/voicebox.git
cd voicebox

# Install dependencies
bun install

# Install Python dependencies
cd backend && pip install -r requirements.txt && cd ..

# Start development
bun run dev

Prerequisites: Bun, Rust, Python 3.11+. CUDA-capable GPU recommended (CPU inference supported but slower).
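
Before running the quick start, a short script can confirm that the prerequisites are present. This is an optional sketch, not part of the repository. It assumes Bun is on PATH as `bun` and detects the Rust toolchain through `cargo`; the function name is hypothetical.

```python
import shutil
import sys
from typing import List, Tuple

REQUIRED_TOOLS = ("bun", "cargo")  # assumption: Rust is detected via `cargo`
MIN_PYTHON = (3, 11)


def missing_prerequisites(
    python_version: Tuple[int, int] = sys.version_info[:2],
    which=shutil.which,
) -> List[str]:
    """Return human-readable problems; an empty list means everything was found."""
    problems = [f"{tool} not found on PATH" for tool in REQUIRED_TOOLS if which(tool) is None]
    if python_version < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {python_version[0]}.{python_version[1]}"
        )
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

The script does not check for a CUDA-capable GPU, since the GPU is recommended rather than required.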

Project Structure

voicebox/
├── app/              # Shared React frontend
├── tauri/            # Desktop app (Tauri + Rust)
├── web/              # Web deployment
├── backend/          # Python FastAPI server
├── landing/          # Marketing website
└── scripts/          # Build & release scripts

Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

  1. Fork the repo
  2. Create a feature branch
  3. Make your changes
  4. Submit a PR

Security

Found a security vulnerability? Please report it responsibly. See SECURITY.md for details.


License

MIT License — see LICENSE for details.

