Self-hosted enterprise voice AI — neural TTS, voice cloning, conversational agents, and workflow automation, in one stack.
Built as the foundation for proprietary Voice Agent IP and Rapid Routing IP. Replaces ElevenLabs (TTS + cloning), n8n (workflow orchestration), and external agent platforms (conversational AI) with a single, fully-owned system.
| Capability | What it replaces | Status |
|---|---|---|
| Neural Text-to-Speech | ElevenLabs, OpenAI TTS | ✅ Working — Piper (CPU, 7 languages incl. Arabic) |
| Speech-to-Text | Deepgram, AssemblyAI | ✅ Working — faster-whisper (CPU/GPU) |
| Voice Cloning | ElevenLabs cloning | ✅ Endpoint live — XTTS-v2 ready for GPU |
| Conversational Agents | Vapi, Retell, custom builds | ✅ Working — pluggable LLM, sealed IP layer |
| Workflow Automation | n8n, Zapier | ✅ Working — 14 step types, webhook triggers |
| Multi-Channel Inbox | Intercom, custom CRM glue | ✅ Foundation — Voice/WhatsApp/Email/SMS/Web/IG |
| Sector Personas | Hand-built per client | ✅ 5 pre-built (Insurance, Auto, Edu, Telecom, FinServ) |
| Credentials Vault | n8n credentials | ✅ Encrypted at rest, 9 providers |
All open-source models. No third-party calls unless explicitly configured. Deployable on a single VPS or scaled with Kubernetes.
┌──────────────────────────────────────────────────────────────────────────┐
│                              Voice Platform                              │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌──────────────────┐       ┌────────────────────────────────────────┐   │
│  │  Dashboard       │◄────► │  FastAPI                               │   │
│  │  Next.js 14      │       │                                        │   │
│  └──────────────────┘       │  /tts  /stt  /voices  /clone           │   │
│                             │  /agents  /agents/personas             │   │
│  ┌──────────────────┐       │  /workflows  + /webhooks/{slug}        │   │
│  │  External:       │◄────► │  /channels  /contacts  /credentials    │   │
│  │  Twilio · SIP    │       │  /conversations  /ws/voice/{id}        │   │
│  │  WhatsApp · SMS  │       │                                        │   │
│  └──────────────────┘       └─┬──────────────────┬────────────────┬──┘   │
│                               ▼                  ▼                ▼      │
│                       ┌─────────────────┐  ┌──────────────┐  ┌──────────┐│
│                       │ Engines         │  │ Plugins      │  │ Stores   ││
│                       │                 │  │              │  │          ││
│                       │ Piper (TTS)     │  │ voice_       │  │ Postgres ││
│                       │ Whisper (STT)   │  │ agent_ip  ◄──┼──┤ Redis    ││
│                       │ Claude · GPT    │  │ rapid_       │  │ MinIO    ││
│                       │                 │  │ routing_ip   │  │          ││
│                       └─────────────────┘  └──────────────┘  └──────────┘│
│                                                   ▲                      │
│                               (sealed surfaces ───┘                      │
│                                for proprietary IP)                       │
└──────────────────────────────────────────────────────────────────────────┘
Five pre-built agent identities — pattern-matched to Nineteen58 case studies — installable in one click.
| Persona | Industry | Channels | Use cases |
|---|---|---|---|
| Gabby | Insurance | Voice + WhatsApp | Free-consult activation, will drafting, renewal nudges |
| Hannah | Automotive | WhatsApp + SMS | Service reminders, booking, vehicle sales nurture |
| Beth | Higher Education | Voice + WhatsApp | Cold lead recovery, eligibility screening, hand-off |
| Mira | Financial Services | Voice + SMS + WhatsApp | First-stage collections, promise-to-pay, payment plans |
| Smiley | Telecom | Voice + SMS | First-line support, outbound upsell, account self-service |
Each persona ships with: tuned system prompt, default routing rules, recommended tools, KPI definitions, and compliance defaults.
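
As a rough illustration of what such a bundle could look like in code, the sketch below models a persona as a Python dataclass. The class, its field names, and the placeholder values are assumptions made for illustration and are not the platform's actual schema; only the persona name, industry, and channels are taken from the table above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Persona:
    """Hypothetical persona bundle; field names are illustrative only."""

    name: str
    industry: str
    channels: tuple[str, ...]
    system_prompt: str
    routing_rules: dict[str, str] = field(default_factory=dict)
    tools: tuple[str, ...] = ()
    kpis: tuple[str, ...] = ()
    compliance: dict[str, bool] = field(default_factory=dict)

    def supports(self, channel: str) -> bool:
        """Return True if the persona is configured for the given channel."""
        return channel.lower() in {c.lower() for c in self.channels}


# Name, industry, and channels come from the persona table; every other
# value is a placeholder, not the shipped configuration.
gabby = Persona(
    name="Gabby",
    industry="Insurance",
    channels=("voice", "whatsapp"),
    system_prompt="(placeholder prompt)",
    routing_rules={"renewal": "renewal_nudge"},
    kpis=("placeholder_kpi",),
    compliance={"record_consent": True},
)
```

Keeping the bundle immutable (`frozen=True`) is one possible design choice: it makes a pre-built persona safe to share between agents without accidental edits.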
Workflows are an n8n-equivalent for voice/AI pipelines. Definitions are JSON and can be triggered by a REST call, a webhook, or a schedule.
14 step types: tts · stt · agent_chat · http · anthropic · openai · voice_clone · audio_concat · twilio_sms · set_var · conditional · parallel · delay · log
Templating: {{input.x}} · {{vars.x}} · {{steps.<id>.output.<field>}}
Triggers: manual · webhook (POST /webhooks/<slug>) · schedule (cron)
9 pre-built templates, including: TTS, STT→Agent→TTS, voicemail summarization, article→podcast, bilingual greeting, IVR with SMS fallback, voice clone from URL, daily status broadcast.
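
To make the templating syntax concrete, here is a minimal sketch of how `{{input.x}}`, `{{vars.x}}`, and `{{steps.<id>.output.<field>}}` placeholders could be resolved against a run context. The `render` and `resolve` helpers and the JSON field names (`slug`, `message`, `text`) are assumptions for illustration; only the step types and the placeholder syntax come from the description above, and the real runner may behave differently.

```python
import re
from typing import Any

# Matches {{ dotted.path }} placeholders, allowing optional inner whitespace.
_PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def resolve(path: str, context: dict[str, Any]) -> Any:
    """Walk a dotted path such as 'steps.reply.output.text' through nested dicts."""
    value: Any = context
    for key in path.split("."):
        value = value[key]
    return value


def render(template: str, context: dict[str, Any]) -> str:
    """Replace every placeholder in the template with its value from the context."""
    return _PLACEHOLDER.sub(lambda m: str(resolve(m.group(1), context)), template)


# Hypothetical two-step workflow definition (field names are assumed).
workflow = {
    "trigger": {"type": "webhook", "slug": "greet"},
    "steps": [
        {"id": "reply", "type": "agent_chat",
         "message": "Greet {{input.name}} in {{vars.lang}}"},
        {"id": "speak", "type": "tts",
         "text": "{{steps.reply.output.text}}"},
    ],
}

# Run context as it might look after the first step has produced output.
context = {
    "input": {"name": "Ada"},
    "vars": {"lang": "English"},
    "steps": {"reply": {"output": {"text": "Hello Ada"}}},
}

rendered = [render(step.get("message") or step["text"], context)
            for step in workflow["steps"]]
```

With this context, the first step renders to "Greet Ada in English" and the second to "Hello Ada".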
This is the most important section for the joint venture conversation.
The platform has exactly two surfaces where proprietary IP plugs in. Everything else is generic, open-source, and replaceable.
voice_agent_ip: the agent reasoning module. Override reason() to replace stock LLM behavior with proprietary planning, multi-step logic, or custom retrieval.
rapid_routing_ip: the intent classifier. Override route() to replace the default keyword routing with low-latency proprietary classification.
Both surfaces have:
- Stable, documented contracts (see docs/ip-integration.md)
- Working fallback implementations so the platform runs without IP loaded
- Three deployment modes: git submodule, private pip package, or remote gRPC service
The IP never enters this repository. It is loaded as a sealed dependency, preserving the boundary for licensing and valuation.
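
The fallback behavior described above can be sketched roughly as follows. This is not the contract from docs/ip-integration.md: the `Router` protocol, the `KeywordRouter` class, the `load_router` helper, and the assumption that the sealed package exposes a `Router` class are all hypothetical. Only the `route()` method name and the `rapid_routing_ip` plugin name come from this document.

```python
import importlib
from typing import Protocol


class Router(Protocol):
    """Assumed shape of the routing surface: one method, text in, intent out."""

    def route(self, utterance: str) -> str: ...


class KeywordRouter:
    """Stand-in for a default keyword-routing fallback."""

    def __init__(self, keywords: dict[str, str], default: str = "general") -> None:
        self.keywords = keywords
        self.default = default

    def route(self, utterance: str) -> str:
        text = utterance.lower()
        for keyword, intent in self.keywords.items():
            if keyword in text:
                return intent
        return self.default


def load_router(fallback: Router, module_name: str = "rapid_routing_ip") -> Router:
    """Use the sealed package if it is installed, otherwise the fallback."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return fallback
    # Assumption: the sealed package exposes a Router class with route().
    return module.Router()


router = load_router(KeywordRouter({"bill": "billing", "cancel": "retention"}))
intent = router.route("I have a question about my bill")
```

Loading by module name keeps the proprietary code out of the repository: the platform only depends on the method signature, so the same call site works whether the sealed package is present or not.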
Requires Docker Desktop.
git clone https://github.com/smizzle8/voice-platform.git
cd voice-platform
cp .env.example .env
docker compose up

- Dashboard → http://localhost:3000
- API → http://localhost:8000/docs
- MinIO → http://localhost:9001 (admin/admin1234)
The first start downloads ~100MB of voice models (English + Arabic). Once ready, hit the Speech Studio and generate audio.
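
Audio can presumably also be requested straight from the API. The sketch below only builds the HTTP request and does not send it. The `/tts` route and port 8000 come from this document, but the JSON payload fields (`text`, `voice`) and the voice identifier are assumptions; check http://localhost:8000/docs for the actual request schema.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"


def build_tts_request(text: str, voice: str = "en-default") -> urllib.request.Request:
    """Build a POST request for the /tts route (payload shape is assumed)."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_tts_request("Hello from the voice platform")

# To actually send it once the stack is running:
# with urllib.request.urlopen(request) as response:
#     audio_bytes = response.read()
```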
# .env
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...

Then run docker compose up -d api.
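
For readers wondering how such a setting might be consumed, here is a minimal sketch of provider selection from environment variables. The `select_llm_provider` function and the `OPENAI_API_KEY` variable name are assumptions; `LLM_PROVIDER` and `ANTHROPIC_API_KEY` come from the `.env` snippet above, and the mock default mirrors the offline-dev default listed in the tech stack.

```python
import os
from typing import Mapping

# Assumed mapping from provider name to the variable holding its API key.
KEY_VARS = {"anthropic": "ANTHROPIC_API_KEY", "openai": "OPENAI_API_KEY"}


def select_llm_provider(env: Mapping[str, str] = os.environ) -> str:
    """Return the configured provider, falling back to 'mock' when unset."""
    provider = env.get("LLM_PROVIDER", "mock").strip().lower()
    if provider == "mock":
        return "mock"
    if provider not in KEY_VARS:
        raise ValueError(f"Unknown LLM_PROVIDER: {provider!r}")
    if not env.get(KEY_VARS[provider]):
        raise RuntimeError(f"{KEY_VARS[provider]} must be set for {provider}")
    return provider
```

Failing fast on a missing key is one reasonable choice here: it surfaces a misconfigured `.env` at startup rather than on the first agent call.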
docker compose -f docker-compose.yml -f docker-compose.gpu.yml up
# .env: TTS_ENGINE=xtts STT_ENGINE=whisper WHISPER_DEVICE=cuda

- Backend: FastAPI · SQLAlchemy 2 · Pydantic 2 · Postgres · Redis · MinIO
- Engines: Piper TTS · faster-whisper · XTTS-v2 (GPU)
- LLM: Anthropic · OpenAI · pluggable (defaults to mock for offline dev)
- Frontend: Next.js 14 · React 18 · Tailwind · TypeScript · Lucide icons
- Telephony: Twilio Programmable Voice (TwiML stream → WebSocket)
- Infra: Docker Compose for dev, Kubernetes-ready
- GDPR + UAE PDPL ready (consent metadata + region-locked deployment)
- Cloned voices carry consent records and audio watermarking
- Encryption at rest for all credential secrets
- No telephony data leaves your infrastructure unless you configure a third-party LLM
- Configurable retention windows for transcripts, recordings, samples
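
As an illustration of the retention windows mentioned above, the sketch below checks whether a stored artifact has outlived its window. The `RETENTION` table, the window lengths, and the `is_expired` helper are hypothetical; the platform's actual configuration keys and defaults are not stated in this document.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical retention windows per artifact type (values are placeholders).
RETENTION = {
    "transcript": timedelta(days=90),
    "recording": timedelta(days=30),
    "sample": timedelta(days=365),
}


def is_expired(kind: str, created_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True once the artifact is older than its retention window."""
    current = now if now is not None else datetime.now(timezone.utc)
    return current - created_at > RETENTION[kind]


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
check_time = datetime(2024, 2, 15, tzinfo=timezone.utc)  # 45 days later

recording_expired = is_expired("recording", created, now=check_time)
transcript_expired = is_expired("transcript", created, now=check_time)
```

With these placeholder windows, a 45-day-old recording is past its 30-day window while a transcript of the same age is still inside its 90-day window.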
Phase 1 (now): TTS, STT, agents, workflows, channels, contacts, credentials, dashboard

Phase 2: Real Twilio inbound (TwiML stream live), WhatsApp Cloud API, scheduled workers, multi-tenant billing

Phase 3: SIP trunk, fine-tuned voice cloning, Voice Agent IP integration, Rapid Routing IP integration, GCC-localized models, marketplace
voice-platform/
├── api/                    FastAPI backend
│   └── app/
│       ├── routes/         /tts, /stt, /agents, /workflows, /channels, /contacts, /credentials...
│       ├── models/         SQLAlchemy: Workspace, Agent, Voice, Workflow, Channel, Contact...
│       ├── engines/        Pluggable TTS/STT engines (mock + real)
│       ├── agents/         Runtime, LLM clients, sector personas
│       ├── workflows/      Workflow runner + templates
│       ├── plugins/        ⬅ proprietary IP boundaries
│       └── security/       Symmetric encryption for credentials
├── dashboard/              Next.js 14 dashboard
│   └── app/                Pages: Overview, Speech, Voices, Clone, Agents, Workflows, Channels, Contacts, Credentials, Settings
├── docs/                   Architecture, IP integration, compliance
├── bruno/                  API collection
├── scripts/                E2E demo + utilities
├── docker-compose.yml      Full stack
└── Makefile                dev / test / seed / logs
Proprietary — internal use. Not for redistribution.