A production-ready FastAPI server that exposes the VibeVoice TTS model as an OpenAI-compatible API, with Docker support and comprehensive voice management.
- Docker-First Deployment: Production-ready Docker setup with GPU support (recommended)
- OpenAI-Compatible API: Drop-in replacement for OpenAI's TTS API (`/v1/audio/speech`)
- Unlimited Custom Voices: Automatically load any voice from a directory - just drop in audio files and restart
- Multi-Format Support: MP3, WAV, FLAC, AAC, M4A, Opus, PCM
- Streaming Support: Real-time audio streaming for long-form content
- Voice Management: Dynamic voice loading, OpenAI voice mapping, and custom voice presets
- Production Ready: Health checks, error handling, CORS support, and comprehensive logging
Docker is the recommended deployment method - it handles all dependencies, ensures consistent environments, and is production-ready.
```bash
# Clone the repository
git clone https://github.com/ncoder-ai/VibeVoice-FastAPI.git
cd VibeVoice-FastAPI
# Copy and configure environment
cp docker-env.example .env
# Edit .env - set VOICES_DIR to your voice files path
# Build and run
docker-compose up -d
# Check logs
docker-compose logs -f
```

The API will be available at http://localhost:8001
See DOCKER_QUICKSTART.md for detailed Docker instructions.
For development or if you prefer bare-metal installation:
```bash
# Clone the repository
git clone https://github.com/ncoder-ai/VibeVoice-FastAPI.git
cd VibeVoice-FastAPI
# Run setup script
./setup.sh
# Configure environment
cp env.example .env
# Edit .env with your settings
# Start server
./start.sh
```

- API README - Complete API documentation with examples, voice management, and troubleshooting
- Docker Quickstart - Docker deployment quickstart guide
- `POST /v1/audio/speech` - Generate speech from text (OpenAI-compatible)
- `GET /v1/audio/voices` - List all available voices
- `POST /v1/vibevoice/generate` - Advanced generation with multi-speaker support
- `GET /v1/vibevoice/voices` - List all voices with detailed info
- `GET /v1/vibevoice/health` - Detailed health check
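Because `/v1/audio/speech` mirrors OpenAI's endpoint, the official `openai` Python package can be pointed at this server. A minimal sketch, assuming the server runs on `localhost:8001` and accepts any API key string (the server ships without authentication):

```python
from openai import OpenAI

# Point the OpenAI client at the local VibeVoice server instead of api.openai.com.
client = OpenAI(base_url="http://localhost:8001/v1", api_key="not-needed")

response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello, this is a test of the VibeVoice API",
    response_format="mp3",
)
response.write_to_file("speech.mp3")
```

The equivalent raw HTTP request with curl: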
```bash
curl -X POST http://localhost:8001/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "tts-1",
"input": "Hello, this is a test of the VibeVoice API",
"voice": "alloy",
"response_format": "mp3"
}' \
--output speech.mp3
```
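For long-form content, the same endpoint can be consumed as a stream. A sketch using `requests` (whether bytes arrive incrementally depends on the server's streaming implementation; `stream=True` simply avoids buffering the whole response in memory):

```python
import requests

payload = {
    "model": "tts-1",
    "input": "A much longer passage of text ...",
    "voice": "alloy",
    "response_format": "mp3",
}

# stream=True lets us write audio chunks to disk as they arrive.
with requests.post(
    "http://localhost:8001/v1/audio/speech", json=payload, stream=True, timeout=600
) as resp:
    resp.raise_for_status()
    with open("long_speech.mp3", "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)
```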
```bash
# List all available voices
curl http://localhost:8001/v1/audio/voices

# Pretty-print the voice list with jq
curl http://localhost:8001/v1/audio/voices | jq
```

- VibeVoice-Large: HuggingFace [rsxdalv/VibeVoice-Large](https://huggingface.co/rsxdalv/VibeVoice-Large)
- VibeVoice-1.5B: HuggingFace [microsoft/VibeVoice-1.5B](https://huggingface.co/microsoft/VibeVoice-1.5B)
The API includes 6 OpenAI-compatible voice mappings:
`alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`
Simply place audio files (`.wav`, `.mp3`, `.flac`, `.m4a`, etc.) in your `VOICES_DIR` and restart the server. All files are automatically loaded as voice presets!
```bash
# Add a custom voice
cp my_voice.wav /path/to/voices/custom_voice.wav
# Restart the server - the voice is now available!
```

You can use any voice name directly in API requests:
```bash
curl -X POST http://localhost:8001/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "tts-1",
"input": "Testing custom voice",
"voice": "custom_voice",
"response_format": "wav"
}'
```

Key environment variables (see `env.example` for the full list):
```bash
# Model Configuration
VIBEVOICE_MODEL_PATH=microsoft/VibeVoice-1.5B # or local path
VIBEVOICE_DEVICE=cuda # cuda, cpu, or mps
VIBEVOICE_INFERENCE_STEPS=10 # 5-50, higher = better quality
# Voice Configuration
VOICES_DIR=demo/voices # Directory with voice files
# API Configuration
API_PORT=8001
API_CORS_ORIGINS=*
# Generation Defaults
DEFAULT_CFG_SCALE=1.3 # 1.0-3.0
DEFAULT_RESPONSE_FORMAT=mp3
```

Docker is the recommended deployment method. It provides:
- ✅ Consistent environment across all systems
- ✅ No dependency conflicts
- ✅ Easy GPU configuration
- ✅ Production-ready setup
- ✅ Simplified updates and maintenance
- Docker and Docker Compose
- NVIDIA Container Toolkit (for GPU support)
- NVIDIA GPU with 8GB+ VRAM (for 1.5B model) or 16GB+ (for Large model)
```bash
# Copy and configure environment
cp docker-env.example .env
# Edit .env - set VOICES_DIR to your voice files path
# Build and run
docker-compose up -d
# Check logs
docker-compose logs -f
```

See DOCKER_QUICKSTART.md for the complete Docker deployment guide.
```bash
# Create virtual environment
python3.12 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -e .
pip install -r requirements-api.txt
# Install PyTorch (CUDA)
pip install torch --index-url https://download.pytorch.org/whl/cu128
# Optional: Install flash-attn for faster inference
# See setup.sh for pre-built wheel installation
```

```bash
# Start server
./start.sh
# Test API
curl http://localhost:8001/health
curl http://localhost:8001/v1/audio/voices
```

| Model | Size | Context | Max Length | VRAM Required |
|---|---|---|---|---|
| VibeVoice-1.5B | 1.5B | 64K | ~90 min | 8GB+ |
| VibeVoice-Large | 7B | 32K | ~45 min | 16GB+ |
Models are automatically downloaded from HuggingFace on first use.
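If you would rather not pay the download cost on the first request, the weights can be pre-fetched into the local HuggingFace cache. A sketch using `huggingface_hub`:

```python
from huggingface_hub import snapshot_download

# Downloads (or resumes) the model into the local HuggingFace cache,
# so the server's first request does not block on a multi-GB download.
snapshot_download(repo_id="microsoft/VibeVoice-1.5B")
```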
For Docker Deployment (Recommended):
- Docker: Docker and Docker Compose installed
- GPU: NVIDIA GPU with 8GB+ VRAM (for 1.5B model) or 16GB+ (for Large model)
- NVIDIA Container Toolkit: Required for GPU support
- RAM: 16GB minimum, 32GB recommended
- Storage: 10GB minimum, 50GB recommended (with model cache)
- OS: Linux (recommended), macOS, or Windows (with WSL2)
For Local Installation:
- Python: 3.12
- GPU: NVIDIA GPU with 8GB+ VRAM
- RAM: 16GB minimum, 32GB recommended
- Storage: 10GB minimum, 50GB recommended
- OS: Linux, macOS, or Windows (with WSL2)
- The API does not include authentication by default. For production use, add authentication middleware or deploy behind a reverse proxy with authentication (see the sketch after this list).
- Voice files are loaded from the configured directory - ensure proper file permissions.
- Model weights are downloaded from HuggingFace - verify model integrity in production.
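One possible shape for such middleware, shown against a bare FastAPI app rather than this project's actual application object (the header name and key handling here are illustrative assumptions):

```python
import os

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
API_KEY = os.environ.get("API_KEY", "change-me")  # illustrative; keep real keys out of code

@app.middleware("http")
async def require_api_key(request: Request, call_next):
    # Reject any request that does not carry the expected bearer token.
    if request.headers.get("Authorization") != f"Bearer {API_KEY}":
        return JSONResponse(status_code=401, content={"error": "unauthorized"})
    return await call_next(request)
```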
This project maintains the original VibeVoice model codebase. Please refer to the original VibeVoice license for model usage terms.
Contributions are welcome! Please feel free to submit a Pull Request.
- VibeVoice Team at Microsoft for the original model
- shijincai for maintaining a backup of the original VibeVoice codebase
- FastAPI for the excellent web framework
- HuggingFace for model hosting and transformers library
- Language Support: Primarily English and Chinese. Other languages may produce unexpected results.
- Non-Speech Audio: The model focuses on speech synthesis and may generate background music or sounds spontaneously.
- Commercial Use: This model is intended for research and development. Test thoroughly before production use.
- Check GPU availability: `nvidia-smi`
- Verify Python version: `python3 --version` (should be 3.12)
- Check dependencies: `pip list | grep torch`
- Use a smaller model: `VIBEVOICE_MODEL_PATH=microsoft/VibeVoice-1.5B`
- Reduce inference steps: `VIBEVOICE_INFERENCE_STEPS=5`
- Use CPU mode: `VIBEVOICE_DEVICE=cpu` (much slower)
- Verify the `VOICES_DIR` path in `.env`
- Check file permissions
- Ensure audio files are in supported formats
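As a quick GPU sanity check from Python (assumes PyTorch is already installed):

```python
import torch

# Confirm PyTorch can see the GPU the server will use.
print(torch.__version__)
print(torch.cuda.is_available())  # should print True on a working CUDA setup
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(props.name, f"{props.total_memory / 1024**3:.1f} GiB VRAM")
```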
For more help, see the API README or open an issue.
Made with ❤️ for the VibeVoice community