Skip to content

smizzle8/voice-platform

Repository files navigation

Voice Platform

Self-hosted enterprise voice AI — neural TTS, voice cloning, conversational agents, and workflow automation, in one stack.

Built as the foundation for proprietary Voice Agent IP and Rapid Routing IP. Replaces ElevenLabs (TTS + cloning), n8n (workflow orchestration), and external agent platforms (conversational AI) with a single, fully-owned system.


What it does

Capability What it replaces Status
Neural Text-to-Speech ElevenLabs, OpenAI TTS ✅ Working — Piper (CPU, 7 languages incl. Arabic)
Speech-to-Text Deepgram, AssemblyAI ✅ Working — faster-whisper (CPU/GPU)
Voice Cloning ElevenLabs cloning ✅ Endpoint live — XTTS-v2 ready for GPU
Conversational Agents Vapi, Retell, custom builds ✅ Working — pluggable LLM, sealed IP layer
Workflow Automation n8n, Zapier ✅ Working — 14 step types, webhook triggers
Multi-Channel Inbox Intercom, custom CRM glue ✅ Foundation — Voice/WhatsApp/Email/SMS/Web/IG
Sector Personas Hand-built per client ✅ 5 pre-built (Insurance, Auto, Edu, Telecom, FinServ)
Credentials Vault n8n credentials ✅ Encrypted at rest, 9 providers

All open-source models. No third-party calls unless explicitly configured. Deployable on a single VPS or scaled with Kubernetes.


Architecture

┌──────────────────────────────────────────────────────────────────────────┐
│                          Voice Platform                                  │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   ┌──────────────────┐       ┌────────────────────────────────────────┐  │
│   │   Dashboard      │◄────► │  FastAPI                               │  │
│   │   Next.js 14     │       │                                        │  │
│   └──────────────────┘       │  /tts /stt /voices /clone              │  │
│                              │  /agents /agents/personas              │  │
│   ┌──────────────────┐       │  /workflows  +  /webhooks/{slug}       │  │
│   │   External:      │◄────► │  /channels /contacts /credentials      │  │
│   │   Twilio · SIP   │       │  /conversations  /ws/voice/{id}        │  │
│   │   WhatsApp · SMS │       │                                        │  │
│   └──────────────────┘       └─┬──────────────────┬────────────────┬──┘  │
│                                ▼                  ▼                ▼      │
│                       ┌─────────────────┐  ┌──────────────┐  ┌──────────┐│
│                       │   Engines       │  │   Plugins    │  │  Stores  ││
│                       │                 │  │              │  │          ││
│                       │   Piper (TTS)   │  │  voice_      │  │ Postgres ││
│                       │   Whisper (STT) │  │  agent_ip ◄──┼──┤  Redis   ││
│                       │   Claude · GPT  │  │  rapid_      │  │  MinIO   ││
│                       │                 │  │  routing_ip  │  │          ││
│                       └─────────────────┘  └──────────────┘  └──────────┘│
│                                              ▲                           │
│                          (sealed surfaces ───┘                           │
│                           for proprietary IP)                            │
└──────────────────────────────────────────────────────────────────────────┘

Sector Personas

Five pre-built agent identities — pattern-matched to Nineteen58 case studies — installable in one click.

Persona Industry Channels Use cases
Gabby Insurance Voice + WhatsApp Free-consult activation, will drafting, renewal nudges
Hannah Automotive WhatsApp + SMS Service reminders, booking, vehicle sales nurture
Beth Higher Education Voice + WhatsApp Cold lead recovery, eligibility screening, hand-off
Mira Financial Services Voice + SMS + WhatsApp First-stage collections, promise-to-pay, payment plans
Smiley Telecom Voice + SMS First-line support, outbound upsell, account self-service

Each persona ships with: tuned system prompt, default routing rules, recommended tools, KPI definitions, and compliance defaults.


Workflow Engine

n8n-equivalent for voice/AI pipelines. Definitions are JSON; triggered by REST, webhook, or schedule.

14 step types: tts · stt · agent_chat · http · anthropic · openai · voice_clone · audio_concat · twilio_sms · set_var · conditional · parallel · delay · log

Templating: {{input.x}} · {{vars.x}} · {{steps.<id>.output.<field>}}

Triggers: manual · webhook (POST /webhooks/<slug>) · schedule (cron)

9 pre-built templates: TTS, STT→Agent→TTS, voicemail summarization, article→podcast, bilingual greeting, IVR with SMS fallback, voice clone from URL, daily status broadcast.


Proprietary IP Integration

This is the most important section for the joint venture conversation.

The platform has exactly two surfaces where proprietary IP plugs in. Everything else is generic, open-source, and replaceable.

Surface 1 — api/app/plugins/voice_agent_ip.py

The agent reasoning module. Override reason() to replace stock LLM behavior with proprietary planning, multi-step logic, custom retrieval, etc.

Surface 2 — api/app/plugins/rapid_routing_ip.py

The intent classifier. Override route() to replace default keyword routing with low-latency proprietary classification.

Both surfaces have:

  • Stable, documented contracts (see docs/ip-integration.md)
  • Working fallback implementations so the platform runs without IP loaded
  • Three deployment modes: git submodule, private pip package, or remote gRPC service

The IP never enters this repository. It is loaded as a sealed dependency, preserving the boundary for licensing and valuation.


Quick start

Requires Docker Desktop.

git clone https://github.com/smizzle8/voice-platform.git
cd voice-platform
cp .env.example .env
docker compose up

The first start downloads ~100MB of voice models (English + Arabic). Once ready, hit the Speech Studio and generate audio.

Real LLM responses

# .env
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...

Then docker compose up -d api.

GPU acceleration

docker compose -f docker-compose.yml -f docker-compose.gpu.yml up
# .env: TTS_ENGINE=xtts STT_ENGINE=whisper WHISPER_DEVICE=cuda

Tech stack

  • Backend: FastAPI · SQLAlchemy 2 · Pydantic 2 · Postgres · Redis · MinIO
  • Engines: Piper TTS · faster-whisper · XTTS-v2 (GPU)
  • LLM: Anthropic · OpenAI · pluggable (defaults to mock for offline dev)
  • Frontend: Next.js 14 · React 18 · Tailwind · TypeScript · Lucide icons
  • Telephony: Twilio Programmable Voice (TwiML stream → WebSocket)
  • Infra: Docker Compose for dev, Kubernetes-ready

Compliance

  • GDPR + UAE PDPL ready (consent metadata + region-locked deployment)
  • Cloned voices carry consent records and audio watermarking
  • Encryption at rest for all credential secrets
  • No telephony data leaves your infrastructure unless you configure a third-party LLM
  • Configurable retention windows for transcripts, recordings, samples

Roadmap

Phase 1 (now) — TTS, STT, agents, workflows, channels, contacts, credentials, dashboard Phase 2 — Real Twilio inbound (TwiML stream live), WhatsApp Cloud API, scheduled workers, multi-tenant billing Phase 3 — SIP trunk, fine-tuned voice cloning, Voice Agent IP integration, Rapid Routing IP integration, GCC-localized models, marketplace


Repository structure

voice-platform/
├── api/                  FastAPI backend
│   └── app/
│       ├── routes/       /tts, /stt, /agents, /workflows, /channels, /contacts, /credentials...
│       ├── models/       SQLAlchemy: Workspace, Agent, Voice, Workflow, Channel, Contact...
│       ├── engines/      Pluggable TTS/STT engines (mock + real)
│       ├── agents/       Runtime, LLM clients, sector personas
│       ├── workflows/    Workflow runner + templates
│       ├── plugins/      ⬅ proprietary IP boundaries
│       └── security/     Symmetric encryption for credentials
├── dashboard/            Next.js 14 dashboard
│   └── app/              Pages: Overview, Speech, Voices, Clone, Agents, Workflows, Channels, Contacts, Credentials, Settings
├── docs/                 Architecture, IP integration, compliance
├── bruno/                API collection
├── scripts/              E2E demo + utilities
├── docker-compose.yml    Full stack
└── Makefile              dev / test / seed / logs

License

Proprietary — internal use. Not for redistribution.

About

Self-hosted voice AI — TTS, STT, voice cloning, conversational agents, workflows. Independent of ElevenLabs and n8n.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors