Roadmap

sarmakska edited this page Jun 7, 2026 · 3 revisions

Shipped (1.1.0)

  • Full-duplex orchestrator with an IDLE / LISTEN / THINK / SPEAK state machine
  • Stateful VAD with hysteresis, hangover, and a trailing-silence flush
  • Streaming STT, LLM, and TTS pipeline with token-to-audio handoff on the first token
  • Default stack built on free-tier or self-hosted components: Groq Llama 4, Whisper.cpp, OpenTTS Coqui XTTS v2
  • Pluggable adapters per layer (Groq, SarmaLink-AI, OpenAI, Whisper.cpp, Deepgram, OpenAI Whisper, OpenTTS, Cartesia, ElevenLabs)
  • Barge-in handling that aborts the in-flight LLM and TTS streams and resets the adapters
  • Function-call passthrough with a server-side tool registry
  • Conversation history retained across turns, bounded to the recent window
  • End-to-end tests with fixtures, plus per-adapter unit tests
  • Lint, typecheck, build, and test in CI
  • Next.js web client
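
The state machine and barge-in items above can be pictured with a minimal sketch. It is not the project's actual orchestrator: the event names, the `TurnMachine` class, and the `abortInFlight` callback are all hypothetical, and the transition table is one plausible reading of IDLE / LISTEN / THINK / SPEAK. In a real pipeline the callback would presumably abort the LLM and TTS streams and reset the adapters.

```typescript
// Illustrative sketch only: names and transitions are assumptions, not the project's API.
type State = "IDLE" | "LISTEN" | "THINK" | "SPEAK";
type TurnEvent = "speechStart" | "speechEnd" | "firstToken" | "playbackDone";

// Allowed transitions per state. Events missing from a row are ignored.
const TRANSITIONS: Record<State, Partial<Record<TurnEvent, State>>> = {
  IDLE: { speechStart: "LISTEN" },
  LISTEN: { speechEnd: "THINK" },
  THINK: { firstToken: "SPEAK", speechStart: "LISTEN" },
  SPEAK: { playbackDone: "IDLE", speechStart: "LISTEN" },
};

class TurnMachine {
  state: State = "IDLE";
  private readonly abortInFlight: () => void;

  // abortInFlight is a hypothetical hook that cancels in-flight LLM and TTS work.
  constructor(abortInFlight: () => void) {
    this.abortInFlight = abortInFlight;
  }

  dispatch(event: TurnEvent): State {
    const next = TRANSITIONS[this.state][event];
    if (next === undefined) return this.state; // event does not apply in this state

    // Barge-in: the user starts speaking while the assistant is thinking or speaking.
    const bargeIn =
      event === "speechStart" && (this.state === "THINK" || this.state === "SPEAK");
    if (bargeIn) this.abortInFlight();

    this.state = next;
    return next;
  }
}
```

Keeping the transitions in a table means an out-of-order event (for example `playbackDone` arriving after a barge-in) is dropped instead of corrupting the state.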

Planned

  • Production VAD (silero-vad-onnx) behind the existing frameRms seam, keeping the hysteresis and hangover layer on top
  • Word-level interim results via the Deepgram streaming WebSocket SDK
  • Multi-language hot swap mid-call
  • Per-adapter latency dashboards
  • mediasoup or LiveKit transport adapter at the edge for SFU fan-out
  • Native iOS and Android clients
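
The first Planned item implies a split between a per-frame scorer and the hysteresis and hangover logic on top. The sketch below shows one way such a seam could look. Only the name `frameRms` comes from this page; its signature, the `FrameScorer` type, the `HysteresisVad` class, and every option name are assumptions made for illustration. A model-based scorer would be passed in as `score` in place of `frameRms`; no real silero-vad API is used or implied here.

```typescript
// Illustrative sketch only: types, class, and option names are assumptions.
// A scorer maps one audio frame to a speech-likelihood or energy value.
type FrameScorer = (frame: Float32Array) => number;

// Root-mean-square energy of a frame (assumed shape of the frameRms seam).
const frameRms: FrameScorer = (frame) => {
  if (frame.length === 0) return 0;
  const sumSquares = frame.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumSquares / frame.length);
};

interface VadOptions {
  score: FrameScorer;     // swap point: energy today, a model score later
  onThreshold: number;    // enter speech at or above this score
  offThreshold: number;   // a frame counts as quiet below this score (keep < onThreshold)
  hangoverFrames: number; // consecutive quiet frames tolerated before speech ends
}

class HysteresisVad {
  private speaking = false;
  private quietFrames = 0;
  private readonly opts: VadOptions;

  constructor(opts: VadOptions) {
    this.opts = opts;
  }

  // Returns true while the frame is treated as part of a speech segment.
  push(frame: Float32Array): boolean {
    const level = this.opts.score(frame);
    if (!this.speaking) {
      // Hysteresis: only the higher threshold can start a segment.
      if (level >= this.opts.onThreshold) {
        this.speaking = true;
        this.quietFrames = 0;
      }
      return this.speaking;
    }
    if (level < this.opts.offThreshold) {
      // Hangover: short pauses do not end the segment.
      this.quietFrames += 1;
      if (this.quietFrames > this.opts.hangoverFrames) {
        this.speaking = false;
        this.quietFrames = 0;
      }
    } else {
      this.quietFrames = 0;
    }
    return this.speaking;
  }
}
```

Because the state lives in `HysteresisVad` and the scorer is a plain function, replacing the scorer should leave the thresholds and hangover behaviour untouched, although the threshold values themselves would need retuning for a scorer with a different output range.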

Will not ship

  • Voice cloning marketplace (use the TTS provider directly)
  • Hosted SaaS layer (this is open-source infrastructure)
  • A LiveKit replacement (this works alongside LiveKit, not against it)

Contribute

Pick an item from Planned and open an issue, then fork, branch, push, and open a PR. Keep changes small and conventional.

I will not merge:

  • Framework swaps (Fastify and Next.js stay)
  • Sync handlers (everything is async and streaming)
  • Adapters for paid-only providers with no free-tier or self-hosted path

Releases: see GitHub Releases.