Voicebox

Local-first AI voice studio — clone voices, synthesize speech, dictate into any application, and connect MCP agents to custom voices. All inference runs on your hardware; audio never leaves your machine unless you explicitly configure remote access.

Version 0.5.0 · Bun monorepo · TypeScript-first stack · Tauri desktop · React UI

Overview

Voicebox combines voice input (Whisper STT + global dictation hotkey) and voice output (seven TTS engines with cloning, effects, and chunking) in a single application. A bundled local LLM refines transcripts and drives per-profile personas.

Surface	Stack	Purpose
Desktop app	Tauri 2 + Rust	Native hotkeys, audio I/O, sidecar management
Shared UI	React 18 + TypeScript	Studio interface shared by desktop and web
API layer	TypeScript-first service interfaces	Voice and session integrations
Web deploy	Vite + optional persistence cache	Browser-accessible build for self-hosting
Marketing	Next.js	Public site at voicebox.sh

Architecture

System Context

flowchart TB
  subgraph Client["Client Layer"]
    Desktop["Tauri Desktop"]
    Browser["Web Browser"]
  end

  subgraph UI["Shared React UI (app/)"]
    Routes["TanStack Router"]
    Stores["Zustand Stores"]
    Platform["Platform Adapters"]
  end

  subgraph Backend["API Service Layer"]
    API["REST + SSE Routes"]
    SVC["Services + Task Queue"]
    ENG["TTS / STT Engines"]
    MCP["MCP Server /mcp"]
    DB[(SQLite)]
  end

  subgraph Persist["Optional Persistence"]
    Cache[(Persistence Cache)]
  end

  Desktop --> UI
  Browser --> UI
  UI -->|HTTP / SSE| API
  Desktop -->|Sidecar| Backend
  API --> SVC --> ENG
  SVC --> DB
  MCP --> SVC
  Browser -.->|Web deploy| Cache

Request Flow — Speech Generation

sequenceDiagram
  participant U as User
  participant R as React UI
  participant F as API Service
  participant Q as Task Queue
  participant E as TTS Engine

  U->>R: Enter text + select profile
  R->>F: POST /generations
  F->>Q: Enqueue synthesis job
  Q->>E: Run engine inference
  E-->>Q: Audio chunks + progress
  Q-->>F: SSE progress events
  F-->>R: Stream status
  R-->>U: Playback + download
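The progress leg of this flow arrives as Server-Sent Events. As a rough sketch of how a client might split an SSE text stream into frames, the helper below parses buffered text and returns any partial frame so it can be prepended to the next chunk. The `SseFrame` type and `parseSse` name are hypothetical, and the real event names and payload fields are not documented here. It assumes `\n` line endings and trims whitespace rather than following the SSE specification to the letter.

```typescript
// Hypothetical frame shape; Voicebox's actual event names and payloads
// are not specified in this README.
interface SseFrame {
  event: string;
  data: string;
}

// Split buffered SSE text into complete frames. The trailing partial
// frame (if any) is returned as `rest` so the caller can carry it over.
function parseSse(buffer: string): { events: SseFrame[]; rest: string } {
  const frames = buffer.split("\n\n");
  const rest = frames.pop() ?? "";
  const events: SseFrame[] = [];
  for (const frame of frames) {
    let event = "message"; // default event type when no "event:" line is sent
    const data: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data.push(line.slice(5).trim());
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return { events, rest };
}
```

A caller would append each network chunk to `rest`, call `parseSse` again, and update the UI for every returned frame.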

Platform Abstraction

Native capabilities (filesystem, updater, audio capture) are defined in app/src/platform/types.ts. Each deployment target provides its own implementation:

Desktop — tauri/src/platform/ (full native access)
Web — web/src/platform/ (browser-safe stubs)
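To illustrate the adapter pattern described above, here is a minimal sketch of a shared contract with one implementation per target. `Capability`, `PlatformAdapter`, and `createAdapter` are hypothetical names; the real interface in app/src/platform/types.ts may look quite different, and the web capability set shown is only an illustration of a stub.

```typescript
// Hypothetical capability names, taken from the three examples in the text.
type Capability = "filesystem" | "updater" | "audioCapture";

// Hypothetical contract that shared UI code would program against.
interface PlatformAdapter {
  readonly target: "desktop" | "web";
  supports(capability: Capability): boolean;
}

function createAdapter(
  target: PlatformAdapter["target"],
  supported: readonly Capability[],
): PlatformAdapter {
  const set = new Set<Capability>(supported);
  return { target, supports: (capability) => set.has(capability) };
}

// Desktop: full native access. Web: a stub that reports nothing (assumption).
const desktopAdapter = createAdapter("desktop", ["filesystem", "updater", "audioCapture"]);
const webAdapter = createAdapter("web", []);

// Shared UI code asks the adapter instead of checking the target directly.
function shouldShowUpdateButton(platform: PlatformAdapter): boolean {
  return platform.supports("updater");
}
```

Keeping the check behind `supports` means shared components never import Tauri or browser APIs directly.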

Feature Highlights

Seven TTS engines — Qwen3-TTS, Qwen CustomVoice, LuxTTS, Chatterbox Multilingual/Turbo, HumeAI TADA, Kokoro
Voice cloning — Zero-shot cloning from a short reference sample
23 languages — Multilingual synthesis across major language families
Global dictation — System-wide hotkey with paste-into-focused-field
Audio effects — Pitch, reverb, delay, chorus, compression, filters
MCP integration — HTTP MCP server for agent-driven voice workflows
Local LLM refinement — Bundled Qwen3 model for transcript cleanup
Optional persistence cache — Session and deployment persistence for web hosting

Project Structure

voicebox/
├── app/                 # Shared React application
├── tauri/               # Desktop shell (Vite + Rust)
├── web/                 # Browser deployment wrapper
│   └── src/server/      # Node persistence bootstrap
├── landing/             # Marketing site
├── src/                 # Root TypeScript utilities and runtime modules
│   └── redis/           # Persistence connection manager + cache API
├── scripts/             # Build and release automation
├── biome.json           # Lint + format (JS/TS)
└── justfile             # Primary dev orchestration

Installation

Prerequisites

Tool	Version	Notes
Bun	≥ 1.0	JavaScript runtime and package manager
Rust	stable	Required for Tauri desktop builds
FFmpeg	any recent	Audio processing
just	any recent	Command runner for the justfile recipes (just setup, just dev)

Quick Setup (recommended)

# Clone and enter the repository
git clone https://github.com/jamiepine/voicebox.git
cd voicebox

# Install JavaScript/TypeScript dependencies
just setup

# Start development mode
just dev

Local Development

bun run dev:web
# Web app available at http://127.0.0.1:5173

Desktop Release

Download pre-built binaries from GitHub Releases for macOS, Windows, and Linux.

Configuration

Application

Variable	Default	Description
`VOICEBOX_HOST`	`127.0.0.1`	API bind address
`VOICEBOX_PORT`	`17493`	API port
`VOICEBOX_CORS_ORIGINS`	—	Comma-separated extra CORS origins
`LOG_LEVEL`	`info`	Application logging level
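As a sketch of how these variables and defaults could be resolved, the function below reads them from an environment map. `AppConfig` and `loadAppConfig` are hypothetical names, not Voicebox's actual loader, and the port validation is an added assumption.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical config shape mirroring the table above.
interface AppConfig {
  host: string;
  port: number;
  corsOrigins: string[];
  logLevel: string;
}

function loadAppConfig(env: Env): AppConfig {
  const rawPort = env["VOICEBOX_PORT"] ?? "17493";
  const port = Number(rawPort);
  // Assumption: reject anything that is not a valid TCP port.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid VOICEBOX_PORT: ${rawPort}`);
  }
  return {
    host: env["VOICEBOX_HOST"] ?? "127.0.0.1",
    port,
    // Comma-separated list; blank entries are dropped.
    corsOrigins: (env["VOICEBOX_CORS_ORIGINS"] ?? "")
      .split(",")
      .map((origin) => origin.trim())
      .filter((origin) => origin.length > 0),
    logLevel: env["LOG_LEVEL"] ?? "info",
  };
}
```

In a Node or Bun process this would typically be called as `loadAppConfig(process.env)`.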

Persistence Cache (optional — web deployments)

Variable	Default	Description
`VOICEBOX_REDIS_ENABLED`	`false`	Enable Redis persistence
`VOICEBOX_REDIS_URL`	—	Full connection URL
`VOICEBOX_REDIS_HOST`	`127.0.0.1`	Host when URL not set
`VOICEBOX_REDIS_PORT`	`6379`	Port
`VOICEBOX_REDIS_KEY_PREFIX`	`voicebox:`	Key namespace

See .env.redis.example for the full list.
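The precedence in the table (URL first, then host and port) can be sketched as follows. `resolveRedisUrl` and `namespacedKey` are hypothetical helpers, and treating only the literal string `true` as enabled is an assumption; the real parser may accept other values.

```typescript
type Env = Record<string, string | undefined>;

// Returns null when the cache is disabled, otherwise a connection URL.
function resolveRedisUrl(env: Env): string | null {
  // Assumption: only the literal "true" enables the cache.
  if (env["VOICEBOX_REDIS_ENABLED"] !== "true") return null;
  const url = env["VOICEBOX_REDIS_URL"];
  if (url !== undefined && url !== "") return url;
  // Fall back to host and port when no full URL is set.
  const host = env["VOICEBOX_REDIS_HOST"] ?? "127.0.0.1";
  const port = env["VOICEBOX_REDIS_PORT"] ?? "6379";
  return `redis://${host}:${port}`;
}

// Prefix every key so Voicebox data stays in its own namespace.
function namespacedKey(env: Env, key: string): string {
  return `${env["VOICEBOX_REDIS_KEY_PREFIX"] ?? "voicebox:"}${key}`;
}
```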

When the persistence cache is disabled or unreachable, the connection manager transparently falls back to in-memory storage.
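The fallback behaviour can be pictured with a small sketch like the one below. `KeyValueStore`, `MemoryStore`, and `openStore` are hypothetical names, and the actual connection manager in src/redis/ may be structured differently. The `connect` callback stands in for whatever Redis client the project uses.

```typescript
// Hypothetical minimal store contract shared by both backends.
interface KeyValueStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
}

// In-memory backend used when the cache is disabled or unreachable.
class MemoryStore implements KeyValueStore {
  private readonly data = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }

  async set(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
}

// Try the remote backend first; on any failure, fall back to memory.
// Pass null for `connect` when the cache is disabled.
async function openStore(
  connect: (() => Promise<KeyValueStore>) | null,
): Promise<{ store: KeyValueStore; backend: "remote" | "memory" }> {
  if (connect !== null) {
    try {
      return { store: await connect(), backend: "remote" };
    } catch {
      // Backend unreachable: fall through to the in-memory store.
    }
  }
  return { store: new MemoryStore(), backend: "memory" };
}
```

Callers only see the `KeyValueStore` interface, so they do not need to know which backend is active. Note that in-memory state is per process and is lost on restart.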

MCP Client

Point your MCP client at http://127.0.0.1:17493/mcp. Example configuration is in .mcp.json.

Development

Common Commands

just dev              # Web development mode
bun run dev:web       # Web UI only
bun run typecheck     # TypeScript across all workspaces
bun run check         # Biome lint + format
bun run test          # TypeScript persistence unit tests
just check            # Repository checks

Adding a TTS Engine

Follow the agent skill at .agents/skills/add-tts-engine/SKILL.md and mirror existing patterns in the TypeScript app modules.

Testing

Layer	Command	Framework
TypeScript	`bun run test`	Vitest (TypeScript persistence tests)
TypeScript	`bun run typecheck`	tsc strict mode
Rust	`cargo test` (in `tauri/src-tauri`)	cargo test
CI	`bun run ci`	typecheck + biome + vitest + web build

# Run everything locally before opening a PR
bun run ci

Troubleshooting

Service won't start

Confirm dependencies are installed (just setup).
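Check whether something is already listening on port 17493: curl http://127.0.0.1:17493/health. A response means an instance is already running on that port; a refused connection means nothing is listening there.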
Review logs for startup failures and missing runtime dependencies.
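The health probe from the checklist above can also be scripted. This is a minimal sketch: `isServiceUp` and `FetchLike` are hypothetical names, and it assumes only that /health returns a successful status when the service is running.

```typescript
// Minimal fetch-compatible signature so the helper is easy to test.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>;

// Returns true when the health endpoint answers with a 2xx status,
// false on any error status or when the connection fails.
async function isServiceUp(
  fetchImpl: FetchLike,
  baseUrl = "http://127.0.0.1:17493",
): Promise<boolean> {
  try {
    const response = await fetchImpl(`${baseUrl}/health`);
    return response.ok;
  } catch {
    // Connection refused or network error: nothing usable is listening.
    return false;
  }
}
```

In Bun or a recent Node release, the global `fetch` can be passed directly: `isServiceUp(fetch)`.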

Tauri dev compile fails (missing sidecar)

Run bun run setup:dev to create placeholder sidecar binaries, then restart bun run dev.

Web build type errors

Run bun run typecheck from the repository root. All workspaces (app, web, tauri, landing) plus root src/ are included.

Persistence connection errors

Set VOICEBOX_REDIS_ENABLED=false to use in-memory fallback, or verify the cache backend is reachable:

docker run --rm -p 6379:6379 redis:7-alpine
export VOICEBOX_REDIS_ENABLED=true
bun run bootstrap:persistence

Model download failures

Ensure sufficient disk space and network access for Hugging Face downloads. Set HF_TOKEN for gated models.

Extended guides: docs.voicebox.sh → Troubleshooting.

Contributing

Fork the repository and create a feature branch from main.
Run just setup if this is your first contribution.
Follow existing conventions — Biome for TypeScript.
Ensure bun run ci passes before opening a PR.
Update documentation when changing user-facing behavior.

See CONTRIBUTING.md for coding standards and review expectations.

FAQ

Is my voice data sent to the cloud?
No. By default all models and audio remain on your machine. Remote mode requires explicit configuration.

Which GPUs are supported?
NVIDIA CUDA, Apple MLX (macOS), AMD ROCm (Windows/Linux overlay), and CPU fallback.

Can I use Voicebox as an API server?
Yes. The app exposes a REST API surface when running. Bind beyond localhost only on trusted networks.

Why is the persistence cache optional?
It mainly benefits self-hosted web deployments that need to share session or cache state across processes. Without it, Voicebox falls back to in-memory storage.

How does this differ from ElevenLabs or Wispr Flow?
Voicebox is open source, runs locally, and covers both speech synthesis and dictation in one stack with MCP agent integration.

License

See LICENSE for details.

Links: Website · Documentation · Releases · Security Policy
