Voice Platform

Self-hosted enterprise voice AI — neural TTS, voice cloning, conversational agents, and workflow automation, in one stack.

Built as the foundation for proprietary Voice Agent IP and Rapid Routing IP. Replaces ElevenLabs (TTS + cloning), n8n (workflow orchestration), and external agent platforms (conversational AI) with a single, fully-owned system.

What it does

| Capability | What it replaces | Status |
| --- | --- | --- |
| Neural Text-to-Speech | ElevenLabs, OpenAI TTS | ✅ Working: Piper (CPU, 7 languages incl. Arabic) |
| Speech-to-Text | Deepgram, AssemblyAI | ✅ Working: faster-whisper (CPU/GPU) |
| Voice Cloning | ElevenLabs cloning | ✅ Endpoint live: XTTS-v2 ready for GPU |
| Conversational Agents | Vapi, Retell, custom builds | ✅ Working: pluggable LLM, sealed IP layer |
| Workflow Automation | n8n, Zapier | ✅ Working: 14 step types, webhook triggers |
| Multi-Channel Inbox | Intercom, custom CRM glue | ✅ Foundation: Voice/WhatsApp/Email/SMS/Web/IG |
| Sector Personas | Hand-built per client | ✅ 5 pre-built (Insurance, Auto, Edu, Telecom, FinServ) |
| Credentials Vault | n8n credentials | ✅ Encrypted at rest, 9 providers |

All speech models are open source. No third-party calls are made unless explicitly configured (for example, an external LLM provider). Deployable on a single VPS or scaled with Kubernetes.

Architecture

┌──────────────────────────────────────────────────────────────────────────┐
│                          Voice Platform                                  │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   ┌──────────────────┐       ┌────────────────────────────────────────┐  │
│   │   Dashboard      │◄────► │  FastAPI                               │  │
│   │   Next.js 14     │       │                                        │  │
│   └──────────────────┘       │  /tts /stt /voices /clone              │  │
│                              │  /agents /agents/personas              │  │
│   ┌──────────────────┐       │  /workflows  +  /webhooks/{slug}       │  │
│   │   External:      │◄────► │  /channels /contacts /credentials      │  │
│   │   Twilio · SIP   │       │  /conversations  /ws/voice/{id}        │  │
│   │   WhatsApp · SMS │       │                                        │  │
│   └──────────────────┘       └─┬──────────────────┬────────────────┬──┘  │
│                                ▼                  ▼                ▼      │
│                       ┌─────────────────┐  ┌──────────────┐  ┌──────────┐│
│                       │   Engines       │  │   Plugins    │  │  Stores  ││
│                       │                 │  │              │  │          ││
│                       │   Piper (TTS)   │  │  voice_      │  │ Postgres ││
│                       │   Whisper (STT) │  │  agent_ip ◄──┼──┤  Redis   ││
│                       │   Claude · GPT  │  │  rapid_      │  │  MinIO   ││
│                       │                 │  │  routing_ip  │  │          ││
│                       └─────────────────┘  └──────────────┘  └──────────┘│
│                                              ▲                           │
│                          (sealed surfaces ───┘                           │
│                           for proprietary IP)                            │
└──────────────────────────────────────────────────────────────────────────┘

Sector Personas

Five pre-built agent identities — pattern-matched to Nineteen58 case studies — installable in one click.

| Persona | Industry | Channels | Use cases |
| --- | --- | --- | --- |
| Gabby | Insurance | Voice + WhatsApp | Free-consult activation, will drafting, renewal nudges |
| Hannah | Automotive | WhatsApp + SMS | Service reminders, booking, vehicle sales nurture |
| Beth | Higher Education | Voice + WhatsApp | Cold lead recovery, eligibility screening, hand-off |
| Mira | Financial Services | Voice + SMS + WhatsApp | First-stage collections, promise-to-pay, payment plans |
| Smiley | Telecom | Voice + SMS | First-line support, outbound upsell, account self-service |

Each persona ships with: tuned system prompt, default routing rules, recommended tools, KPI definitions, and compliance defaults.

Workflow Engine

n8n-equivalent for voice/AI pipelines. Definitions are JSON; triggered by REST, webhook, or schedule.

14 step types: tts · stt · agent_chat · http · anthropic · openai · voice_clone · audio_concat · twilio_sms · set_var · conditional · parallel · delay · log

Templating: {{input.x}} · {{vars.x}} · {{steps.<id>.output.<field>}}
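
The placeholder syntax above can be illustrated with a small resolver. This is a minimal sketch, not the platform's actual implementation: the `lookup` and `render` helpers, the regular expression, and the shape of the `context` dictionary are all assumptions made for illustration.

```python
import re
from typing import Any

# Matches {{ dotted.path }} placeholders, tolerating inner whitespace.
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z0-9_.\-]+)\s*\}\}")


def lookup(context: dict[str, Any], path: str) -> Any:
    """Walk a dotted path such as 'steps.greet.output.text' through nested dicts."""
    value: Any = context
    for key in path.split("."):
        value = value[key]  # raises KeyError on an unknown reference
    return value


def render(template: str, context: dict[str, Any]) -> str:
    """Replace every placeholder in `template` with its value from `context`."""
    return _PLACEHOLDER.sub(lambda m: str(lookup(context, m.group(1))), template)


# Hypothetical run context: trigger input, workflow variables, prior step outputs.
context = {
    "input": {"name": "Aisha"},
    "vars": {"lang": "ar"},
    "steps": {"greet": {"output": {"text": "Hello"}}},
}
print(render("{{steps.greet.output.text}}, {{input.name}} ({{vars.lang}})", context))
# prints: Hello, Aisha (ar)
```

Failing loudly on an unknown reference (a `KeyError` here) is one possible design choice; a real engine might instead substitute an empty string or report a validation error before the run starts.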

Triggers: manual · webhook (POST /webhooks/<slug>) · schedule (cron)
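
As a rough illustration of the webhook trigger, the sketch below prepares a `POST /webhooks/<slug>` request using only the Python standard library. The slug `voicemail-summary`, the payload fields, and the assumption that the JSON body becomes the workflow's `input` are hypothetical; only the route shape comes from this README.

```python
import json
import urllib.request


def build_webhook_request(base_url: str, slug: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /webhooks/<slug> with a JSON body."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/webhooks/{slug}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical slug and payload, for illustration only.
req = build_webhook_request(
    "http://localhost:8000", "voicemail-summary", {"caller": "+971500000000"}
)
print(req.get_method(), req.full_url)

# To fire the trigger against a running stack, uncomment:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```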

9 pre-built templates, including: TTS, STT→Agent→TTS, voicemail summarization, article→podcast, bilingual greeting, IVR with SMS fallback, voice clone from URL, daily status broadcast.

Proprietary IP Integration

This is the most important section for the joint venture conversation.

The platform has exactly two surfaces where proprietary IP plugs in. Everything else is generic, open-source, and replaceable.

Surface 1 — `api/app/plugins/voice_agent_ip.py`

The agent reasoning module. Override reason() to replace stock LLM behavior with proprietary planning, multi-step logic, custom retrieval, etc.
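
The override pattern might look roughly like the following. The real contract lives in docs/ip-integration.md and is not reproduced here, so every name in this sketch (`AgentTurn`, `StockReasoner`, `ProprietaryReasoner`, and the signature of `reason()`) is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AgentTurn:
    """Hypothetical input to reason(): the user's message plus prior turns."""
    user_text: str
    history: list[str] = field(default_factory=list)


class StockReasoner:
    """Stand-in for the fallback implementation (assumed shape)."""

    def reason(self, turn: AgentTurn) -> str:
        return f"(stock reply to: {turn.user_text})"


class ProprietaryReasoner(StockReasoner):
    """Overrides reason() with custom logic and defers to the stock path otherwise."""

    def reason(self, turn: AgentTurn) -> str:
        # Placeholder for proprietary planning or retrieval.
        if "renewal" in turn.user_text.lower():
            return "Let me check your renewal date."
        return super().reason(turn)


reasoner = ProprietaryReasoner()
print(reasoner.reason(AgentTurn("When is my renewal due?")))
print(reasoner.reason(AgentTurn("Hello")))
```

Deferring to the parent class keeps the agent usable for any input the custom logic does not handle.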

Surface 2 — `api/app/plugins/rapid_routing_ip.py`

The intent classifier. Override route() to replace default keyword routing with low-latency proprietary classification.
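
A minimal sketch of how a keyword fallback and an overriding classifier could fit together is shown below. The class names, the rule format, and the `classify` callable are hypothetical; the documented contract in docs/ip-integration.md takes precedence over anything here.

```python
from typing import Callable, Optional


class KeywordRouter:
    """Assumed shape of default keyword routing: the first matching rule wins."""

    def __init__(self, rules: dict[str, list[str]], default: str = "general"):
        self.rules = rules
        self.default = default

    def route(self, text: str) -> str:
        lowered = text.lower()
        for intent, keywords in self.rules.items():
            if any(keyword in lowered for keyword in keywords):
                return intent
        return self.default


class ClassifierRouter(KeywordRouter):
    """Overrides route() with a custom classifier, keeping keywords as a safety net."""

    def __init__(self, rules: dict[str, list[str]],
                 classify: Callable[[str], Optional[str]], default: str = "general"):
        super().__init__(rules, default)
        self.classify = classify  # hypothetical proprietary classifier

    def route(self, text: str) -> str:
        intent = self.classify(text)
        return intent if intent is not None else super().route(text)


rules = {"billing": ["invoice", "payment"], "support": ["broken", "help"]}
router = ClassifierRouter(rules, classify=lambda t: "sales" if "upgrade" in t.lower() else None)
print(router.route("I want to upgrade my plan"))
print(router.route("My payment failed"))
```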

Both surfaces have:

Stable, documented contracts (see docs/ip-integration.md)
Working fallback implementations so the platform runs without IP loaded
Three deployment modes: git submodule, private pip package, or remote gRPC service

The IP never enters this repository. It is loaded as a sealed dependency, preserving the boundary for licensing and valuation.
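
For the private pip package mode, loading with a fallback could be sketched as follows. The package name `voice_agent_ip_private`, the `Reasoner` attribute, and the `FallbackReasoner` class are illustrative assumptions, not names taken from this repository.

```python
import importlib
from types import ModuleType
from typing import Optional


def load_optional(module_name: str) -> Optional[ModuleType]:
    """Import a sealed package if it is installed; return None otherwise."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


class FallbackReasoner:
    """Stand-in for the working fallback that ships with the platform."""

    def reason(self, text: str) -> str:
        return f"(fallback) {text}"


def get_reasoner(module_name: str = "voice_agent_ip_private"):
    """Prefer the sealed implementation; otherwise run on the fallback."""
    module = load_optional(module_name)
    if module is not None and hasattr(module, "Reasoner"):
        return module.Reasoner()
    return FallbackReasoner()


reasoner = get_reasoner()
print(type(reasoner).__name__)
```

Because the import is attempted at runtime, the open repository never needs to reference the proprietary source directly, which is consistent with the boundary described above.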

Quick start

Requires Docker Desktop.

git clone https://github.com/smizzle8/voice-platform.git
cd voice-platform
cp .env.example .env
docker compose up

Dashboard → http://localhost:3000
API → http://localhost:8000/docs
MinIO → http://localhost:9001 (admin/admin1234)

The first start downloads about 100 MB of voice models (English + Arabic). Once the download finishes, open the Speech Studio in the dashboard and generate audio.

Real LLM responses

# .env
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...

Then apply the change by running docker compose up -d api.
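
One way the backend could interpret these settings is sketched below. `LLM_PROVIDER` and `ANTHROPIC_API_KEY` come from the snippet above and the `mock` default from the tech stack notes; the `OPENAI_API_KEY` variable name, the function, and its error handling are assumptions for illustration.

```python
import os
from typing import Mapping

SUPPORTED_PROVIDERS = {"mock", "anthropic", "openai"}

# Which environment variable each hosted provider needs.
# OPENAI_API_KEY is an assumed name; only ANTHROPIC_API_KEY appears in this README.
KEY_VARS = {"anthropic": "ANTHROPIC_API_KEY", "openai": "OPENAI_API_KEY"}


def resolve_llm_provider(env: Mapping[str, str] = os.environ) -> str:
    """Return the configured provider, defaulting to 'mock' for offline development."""
    provider = env.get("LLM_PROVIDER", "mock").strip().lower()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider!r}")
    key_var = KEY_VARS.get(provider)
    if key_var and not env.get(key_var):
        raise RuntimeError(f"{key_var} must be set when LLM_PROVIDER={provider}")
    return provider


print(resolve_llm_provider({}))
print(resolve_llm_provider({"LLM_PROVIDER": "anthropic", "ANTHROPIC_API_KEY": "sk-ant-..."}))
```

Validating the key at startup surfaces a misconfigured .env immediately instead of on the first agent call.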

GPU acceleration

docker compose -f docker-compose.yml -f docker-compose.gpu.yml up
# .env: TTS_ENGINE=xtts STT_ENGINE=whisper WHISPER_DEVICE=cuda

Tech stack

Backend: FastAPI · SQLAlchemy 2 · Pydantic 2 · Postgres · Redis · MinIO
Engines: Piper TTS · faster-whisper · XTTS-v2 (GPU)
LLM: Anthropic · OpenAI · pluggable (defaults to mock for offline dev)
Frontend: Next.js 14 · React 18 · Tailwind · TypeScript · Lucide icons
Telephony: Twilio Programmable Voice (TwiML stream → WebSocket)
Infra: Docker Compose for dev, Kubernetes-ready

Compliance

GDPR + UAE PDPL ready (consent metadata + region-locked deployment)
Cloned voices carry consent records and audio watermarking
Encryption at rest for all credential secrets
No telephony data leaves your infrastructure unless you configure a third-party LLM
Configurable retention windows for transcripts, recordings, samples

Roadmap

Phase 1 (now): TTS, STT, agents, workflows, channels, contacts, credentials, dashboard
Phase 2: Real Twilio inbound (TwiML stream live), WhatsApp Cloud API, scheduled workers, multi-tenant billing
Phase 3: SIP trunk, fine-tuned voice cloning, Voice Agent IP integration, Rapid Routing IP integration, GCC-localized models, marketplace

Repository structure

voice-platform/
├── api/                  FastAPI backend
│   └── app/
│       ├── routes/       /tts, /stt, /agents, /workflows, /channels, /contacts, /credentials...
│       ├── models/       SQLAlchemy: Workspace, Agent, Voice, Workflow, Channel, Contact...
│       ├── engines/      Pluggable TTS/STT engines (mock + real)
│       ├── agents/       Runtime, LLM clients, sector personas
│       ├── workflows/    Workflow runner + templates
│       ├── plugins/      ⬅ proprietary IP boundaries
│       └── security/     Symmetric encryption for credentials
├── dashboard/            Next.js 14 dashboard
│   └── app/              Pages: Overview, Speech, Voices, Clone, Agents, Workflows, Channels, Contacts, Credentials, Settings
├── docs/                 Architecture, IP integration, compliance
├── bruno/                API collection
├── scripts/              E2E demo + utilities
├── docker-compose.yml    Full stack
└── Makefile              dev / test / seed / logs

License

Proprietary — internal use. Not for redistribution.
