Quick Start
git clone https://github.com/sarmakska/voice-agent-starter.git
cd voice-agent-starter
pnpm install
cp .env.example .env
pnpm dev
This starts the server on :3001 and the web client on :3000. Open http://localhost:3000, click Start, grant microphone access, and talk.
The defaults run an open-source stack with no per-minute provider fees. STT and TTS are self-hosted; the LLM runs on Groq's hosted free tier:
- LLM: Groq Llama 4. Set GROQ_API_KEY from the Groq console. Groq is a hosted API but has a generous free tier and is the fastest path to a sub-300ms first token.
- STT: Whisper.cpp. Run a whisper-server and point WHISPERCPP_URL at it (default http://localhost:8090).
- TTS: OpenTTS Coqui XTTS v2. Run an OpenTTS server and point OPENTTS_URL at it (default http://localhost:5500). The default voice is coqui-tts:en_vctk#xtts_v2.
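As a rough illustration of how these defaults could be resolved, the sketch below reads the documented variables and falls back to the documented ports. The function and type names (`resolveSelfHostedConfig`, `SelfHostedConfig`) are hypothetical and not taken from the repository; only the variable names, URLs, and voice come from the list above.

```typescript
// Hypothetical helper: names are illustrative, not the project's actual code.
type Env = Record<string, string | undefined>;

interface SelfHostedConfig {
  whisperUrl: string;
  openTtsUrl: string;
  voice: string;
  hasGroqKey: boolean;
}

function resolveSelfHostedConfig(env: Env): SelfHostedConfig {
  return {
    // Fall back to the documented default when the variable is unset or empty.
    whisperUrl: env["WHISPERCPP_URL"] || "http://localhost:8090",
    openTtsUrl: env["OPENTTS_URL"] || "http://localhost:5500",
    // Documented default voice; how it is overridden is not covered here.
    voice: "coqui-tts:en_vctk#xtts_v2",
    // The LLM layer needs a Groq key; the other two layers only need URLs.
    hasGroqKey: Boolean(env["GROQ_API_KEY"]),
  };
}
```

In a Node process this would typically be called as `resolveSelfHostedConfig(process.env)`.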
The quickest way to stand up the two self-hosted servers is their official containers: ghcr.io/ggml-org/whisper.cpp for Whisper.cpp and synesthesiam/opentts for OpenTTS. Expose them on the ports above and the defaults work with no further configuration.
Each layer is swapped with a single provider variable, plus that provider's API key. To run entirely on hosted APIs:
STT_PROVIDER=deepgram
LLM_PROVIDER=openai
TTS_PROVIDER=elevenlabs
DEEPGRAM_API_KEY=...
OPENAI_API_KEY=...
ELEVENLABS_API_KEY=...
See .env.example for the full list.
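A small pre-flight check can catch a provider that is selected without its key. The sketch below is an assumption about how such a check might look, not the project's own validation; `missingKeys` and the lookup tables are hypothetical names, and only the variables shown in the example above are covered.

```typescript
// Hypothetical pre-flight check; not part of the repository.
type EnvMap = Record<string, string | undefined>;

// One selector variable per layer, as in the example above.
const PROVIDER_VARS: string[] = ["STT_PROVIDER", "LLM_PROVIDER", "TTS_PROVIDER"];

// API key variable required by each hosted provider from the example.
const REQUIRED_KEYS: Record<string, string> = {
  deepgram: "DEEPGRAM_API_KEY",
  openai: "OPENAI_API_KEY",
  elevenlabs: "ELEVENLABS_API_KEY",
};

// Returns the key variables that are required by the selected providers
// but are unset or empty. Unknown or unset providers are skipped.
function missingKeys(env: EnvMap): string[] {
  const missing: string[] = [];
  for (const selector of PROVIDER_VARS) {
    const provider = env[selector];
    if (!provider) continue;
    const keyVar = REQUIRED_KEYS[provider];
    if (keyVar && !env[keyVar]) missing.push(keyVar);
  }
  return missing;
}
```

An empty result means every selected hosted provider in the table has its key set; it says nothing about whether the key is valid.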
If you have not configured any providers yet, the pipeline still runs end-to-end. The LLM adapter yields a single configuration message and the STT and TTS adapters return nothing, but the IDLE/LISTEN/THINK/SPEAK transitions, barge-in, and tool-call routing all work. This is useful for verifying the transport and state machine before standing up the servers, and it is exactly what the end-to-end test suite drives.
- Browser status flips to listening when you start talking.
- Partial transcripts appear as you speak.
- THINK starts when you stop (trailing-silence flush produces the final transcript).
- SPEAK plays back the response, starting from the first sentence.
- Interrupting the response cancels it and returns to LISTEN.
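The transitions in this checklist can be summarised as a small state machine. The sketch below is a simplified model for reasoning about the flow, not the server's implementation: the event names are hypothetical, and the SPEAK to IDLE transition when playback finishes is an assumption that the checklist does not state.

```typescript
// Simplified model of the IDLE/LISTEN/THINK/SPEAK flow. Event names are hypothetical.
type State = "IDLE" | "LISTEN" | "THINK" | "SPEAK";
type AgentEvent =
  | "speech_start"         // user starts talking
  | "final_transcript"     // trailing-silence flush produced the final transcript
  | "first_sentence_ready" // first sentence of the response is ready to play
  | "playback_done";       // response finished playing

function nextState(state: State, event: AgentEvent): State {
  if (state === "IDLE" && event === "speech_start") return "LISTEN";
  if (state === "LISTEN" && event === "final_transcript") return "THINK";
  if (state === "THINK" && event === "first_sentence_ready") return "SPEAK";
  // Barge-in: speech during playback cancels the response and listens again.
  if (state === "SPEAK" && event === "speech_start") return "LISTEN";
  // Assumption: finishing playback returns to IDLE.
  if (state === "SPEAK" && event === "playback_done") return "IDLE";
  // Any other combination leaves the state unchanged.
  return state;
}

// Replays a sequence of events from IDLE and returns every state visited.
function replay(events: AgentEvent[]): State[] {
  const visited: State[] = ["IDLE"];
  for (const event of events) {
    visited.push(nextState(visited[visited.length - 1], event));
  }
  return visited;
}
```

For example, `replay(["speech_start", "final_transcript", "first_sentence_ready", "speech_start"])` walks one turn and ends in LISTEN after the interruption.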
pnpm lint
pnpm typecheck
pnpm build
pnpm test
The test suite runs the full pipeline through fake adapters, so it passes with no provider keys.
See the Troubleshooting table on the Home page for the common symptoms and fixes.