Roadmap
sarmakska edited this page Jun 7, 2026 · 3 revisions
- Full-duplex orchestrator with an IDLE / LISTEN / THINK / SPEAK state machine
- Stateful VAD with hysteresis and hangover and a trailing-silence flush
- Streaming STT, LLM, and TTS pipeline with token-to-audio handoff on the first token
- Self-hosted default stack: Groq Llama 4, Whisper.cpp, OpenTTS Coqui XTTS v2
- Pluggable adapters per layer (Groq, SarmaLink-AI, OpenAI, Whisper.cpp, Deepgram, OpenAI Whisper, OpenTTS, Cartesia, ElevenLabs)
- Barge-in handling that aborts the in-flight LLM and TTS streams and resets the adapters
- Function-call passthrough with a server-side tool registry
- Conversation history retained across turns, bounded to the recent window
- End-to-end tests with fixtures, plus per-adapter unit tests
- Lint, typecheck, build, and test in CI
- Next.js web client
Planned
- Production VAD (silero-vad-onnx) behind the existing frameRms seam, keeping the hysteresis and hangover layer on top
- Word-level interim results via the Deepgram streaming WebSocket SDK
- Multi-language hot swap mid-call
- Per-adapter latency dashboards
- mediasoup or LiveKit transport adapter at the edge for SFU fan-out
- Native iOS and Android clients
Out of scope
- Voice cloning marketplace (use the TTS provider directly)
- Hosted SaaS layer (this is open-source infrastructure)
- A LiveKit replacement (this works alongside LiveKit, not against it)
To contribute, pick an item from Planned and open an issue, then fork, branch, push, and open a PR. Keep changes small and conventional.
I will not merge:
- Framework swaps (Fastify and Next.js stay)
- Sync handlers (everything is async and streaming)
- Adapters for paid-only providers with no free-tier or self-hosted path
Releases: see GitHub Releases.
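For readers unfamiliar with the "stateful VAD with hysteresis and hangover" item in the feature list, here is a minimal sketch of the idea. It is an illustration only, not the project's implementation. The name frameRms comes from the roadmap, but its signature here is an assumption. The HysteresisVad class, the option names, and the thresholds are all hypothetical. The trailing-silence flush is not covered.

```typescript
// Hypothetical sketch: class name, option names, and thresholds are assumptions.
type VadEvent = "speech_start" | "speech_end" | null;

interface VadOptions {
  onThreshold: number;    // RMS at or above this starts speech
  offThreshold: number;   // RMS below this counts as silence (set lower than onThreshold)
  hangoverFrames: number; // consecutive silent frames tolerated before speech ends
}

// Root-mean-square energy of one PCM frame with samples in [-1, 1].
// Assumed signature; a model-based detector could replace this function.
function frameRms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  const sumSquares = frame.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumSquares / frame.length);
}

class HysteresisVad {
  private speaking = false;
  private silentFrames = 0;
  private readonly opts: VadOptions;

  constructor(opts: VadOptions) {
    this.opts = opts;
  }

  // Feed one frame; returns an event only when the speech state changes.
  push(frame: Float32Array): VadEvent {
    const rms = frameRms(frame);

    if (!this.speaking) {
      // Hysteresis: starting speech needs the higher threshold.
      if (rms >= this.opts.onThreshold) {
        this.speaking = true;
        this.silentFrames = 0;
        return "speech_start";
      }
      return null;
    }

    if (rms < this.opts.offThreshold) {
      // Hangover: short pauses do not end the utterance.
      this.silentFrames += 1;
      if (this.silentFrames > this.opts.hangoverFrames) {
        this.speaking = false;
        this.silentFrames = 0;
        return "speech_end";
      }
    } else {
      // Energy between the two thresholds keeps an active utterance alive.
      this.silentFrames = 0;
    }
    return null;
  }
}
```

The two thresholds stop the detector from flapping when energy hovers near a single cut-off. The hangover counter keeps brief pauses between words from splitting one utterance into several.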