Hey — I built a Docker image that packages s2.cpp with the Q8_0 GGUF model for easy deployment on GPUs with 12GB VRAM.
Image: ghcr.io/orrinwitt/s2-tts:latest
Repo: https://github.com/orrinwitt/s2-tts
Features:
- Q8_0 model and tokenizer baked into the image (no external downloads)
- CUDA support via NVIDIA Container Toolkit
- HTTP server mode on port 3030 (/generate endpoint; see the request sketch after the quick start below)
- Voice cloning via multipart form data
- S2-Pro [bracket] emotion tag syntax supported
- Runs on RTX 3060 12GB with room to spare
Quick start:
```yaml
services:
  s2-tts:
    image: ghcr.io/orrinwitt/s2-tts:latest
    ports:
      - "3030:3030"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['1']
              capabilities: [gpu]
```
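Once the container is up, here's a minimal sketch of hitting the /generate endpoint from Python. The JSON field name ("text"), the specific emotion tag, and the WAV response are assumptions on my part; check the README for the actual request schema and supported tags.

```python
# Minimal sketch of a /generate call, assuming the endpoint accepts JSON with a
# "text" field and returns WAV audio bytes. Field names, the "[excited]" tag,
# and the response format are guesses -- see the repo README for the real API.
import requests

resp = requests.post(
    "http://localhost:3030/generate",
    json={"text": "[excited] Hello from s2.cpp running in Docker!"},
    timeout=120,
)
resp.raise_for_status()

with open("out.wav", "wb") as f:
    f.write(resp.content)
```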
API-only (no WebUI) — designed for programmatic use. See the README for full docs including emotion tags, voice cloning, and text formatting rules.
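For voice cloning, a rough multipart sketch might look like the following; the form field names ("text", "reference_audio") are placeholders I made up, so grab the real ones from the README.

```python
# Rough sketch of a voice-cloning request via multipart form data. The form
# field names ("text", "reference_audio") are hypothetical placeholders;
# consult the README for the fields the server actually expects.
import requests

with open("reference_voice.wav", "rb") as ref:
    resp = requests.post(
        "http://localhost:3030/generate",
        data={"text": "This should come out in the cloned voice."},
        files={"reference_audio": ("reference_voice.wav", ref, "audio/wav")},
        timeout=300,
    )
resp.raise_for_status()

with open("cloned.wav", "wb") as f:
    f.write(resp.content)
```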
Thanks for the great work on s2.cpp!