voice-agent

A real-time spoken assistant shell: mic → speech-to-text → brain → text-to-speech, with voice-activity detection, turn-taking, wake-word gating, and (optionally) voice-driven machine control. Local-first; built on Pipecat.
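The mic → STT → brain → TTS flow can be pictured as three swappable callables wired in sequence. This is a plain-Python sketch, not Pipecat code. VoiceTurn and the stage type aliases are hypothetical. In the real shell, Pipecat's VAD and SmartTurn decide when a user turn has ended.

```python
from collections.abc import Callable, Iterable, Iterator
from dataclasses import dataclass

# Hypothetical stage signatures standing in for real services
# (e.g. Whisper for STT, Kokoro for TTS); not the project's API.
SpeechToText = Callable[[bytes], str]
Brain = Callable[[str], Iterable[str]]  # yields reply text chunks
TextToSpeech = Callable[[str], bytes]


@dataclass
class VoiceTurn:
    stt: SpeechToText
    brain: Brain
    tts: TextToSpeech

    def run(self, utterance_audio: bytes) -> Iterator[bytes]:
        """Handle one finished user turn, yielding audio clips to play."""
        transcript = self.stt(utterance_audio)
        if not transcript.strip():
            return  # nothing intelligible: don't call the brain at all
        # Synthesize per chunk so a caller can start playback
        # before the brain has finished its whole reply.
        for chunk in self.brain(transcript):
            yield self.tts(chunk)
```

Swapping Whisper for Deepgram, or one brain for another, then amounts to passing a different callable for that one stage.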

It's a pluggable voice shell — it owns the audio loop and turn-taking and delegates cognition to a swappable "brain" over a small HTTP/SSE protocol. Point it at a raw LLM (BRAIN=local) or at a full tool-using agent. There is no code dependency on any particular brain.
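The brain seam is described here only as a small HTTP/SSE protocol. Its endpoints, event names, and payloads are specified in gabagent's docs/VOICE_PROTOCOL.md and are not reproduced. As a hedged illustration of the transport half alone, the sketch below is a minimal Server-Sent Events parser a shell could use to consume a streamed reply. SseEvent and parse_sse are hypothetical names. The parser follows a subset of the WHATWG SSE parsing rules.

```python
from collections.abc import Iterable, Iterator
from dataclasses import dataclass


@dataclass
class SseEvent:
    event: str
    data: str


def parse_sse(lines: Iterable[str]) -> Iterator[SseEvent]:
    """Minimal Server-Sent Events parser (a subset of the WHATWG rules).

    Handles `event:` and `data:` fields, `:` comment lines, and blank-line
    dispatch. Ignores `id:`, `retry:`, and unknown fields.
    """
    event_type = "message"
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the event; dispatch only if it carried data.
            if data_lines:
                yield SseEvent(event_type, "\n".join(data_lines))
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips exactly one leading space
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
        # An event still open at end of stream is discarded, as in the spec.
```

For example, feeding it the lines `event: text`, `data: Hello`, and a blank line yields one SseEvent with event "text" and data "Hello". The "text" event name is made up for illustration.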

Companion project

gabagent is the reference brain — a tool-using coding/desktop agent with an escalating-tier safety model. The two are loosely coupled — docs and protocol only, no code dependency in either direction. The brain↔shell contract lives in gabagent's docs/VOICE_PROTOCOL.md. Run voice-agent with BRAIN=local and never touch gabagent, or wire them together for a full voice-driven agent.

Brain-agnostic, with known rough edges. The design is brain-agnostic (the brains/ seam, BRAIN=local default), but some gabagent-specific naming has crept in (e.g. a gabagent.duck_exclude output-stream property, the /media/* duck contract). Renaming these to neutral terms is tracked for a later pass.

Stack

Audio / pipeline: Pipecat 1.3.x — local audio transport, VAD (Silero), turn-taking (SmartTurn v3), half-duplex with optional barge-in
STT: Whisper (local) — swappable (e.g. Deepgram) via .env
TTS: Kokoro (local) — swappable
LLM (BRAIN=local): Claude (claude-sonnet-4-6), or any OpenAI-compatible / local Ollama endpoint
Wake word: openWakeWord / nanowakeword / Porcupine, behind one gate
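The "half-duplex with optional barge-in" behaviour in the Audio / pipeline entry reduces to a small decision each time VAD detects user speech. The sketch below is minimal. The HalfDuplexGate class and its string actions are hypothetical. Pipecat handles this with its own frames and interruption machinery, which is not shown here.

```python
from dataclasses import dataclass


@dataclass
class HalfDuplexGate:
    """Decide what to do with user speech, given whether the bot is talking.

    Hypothetical sketch, not Pipecat's API. In half-duplex mode, user speech
    is ignored during bot playback. With barge-in enabled, detected speech
    interrupts the bot instead.
    """
    allow_barge_in: bool = False
    bot_speaking: bool = False

    def on_user_speech(self) -> str:
        if not self.bot_speaking:
            return "listen"      # normal turn: forward to STT
        if self.allow_barge_in:
            self.bot_speaking = False
            return "interrupt"   # stop TTS playback, then listen
        return "ignore"          # half-duplex: drop speech over the bot
```

Keeping barge-in off by default trades responsiveness for robustness: the bot's own voice leaking into the mic cannot cut it off mid-sentence.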

Everything is selected by environment variables — see .env.example.

Quick start

Requires Python 3.12 (via uv), system portaudio and espeak-ng, and an ANTHROPIC_API_KEY for the default brain.

cp .env.example .env        # set ANTHROPIC_API_KEY, then pick STT / TTS / LLM / brain
uv sync
./run.sh                    # or: uv run python main.py

./run.sh modes: with no argument it uses the brain set in .env; ./run.sh local uses the raw LLM; ./run.sh gab uses the gabagent brain.

Wake word

While media is playing, the agent requires a wake word before commands reach STT (sidestepping mis-transcription of speech over music) and pre-ducks the audio on wake. A bare openWakeWord model, wakewords/aria.onnx, ships as a starting point; train your own (e.g. "hey aria") per wakewords/README.md and the wake-train/ recipe. Speaker-specific voice models are kept local (not committed), so train one for your own voice.
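The gating rule can be sketched as a small state machine. WakeGate, its fields, and the one-command-per-wake-word policy are assumptions for illustration, not the project's implementation. The duck_media callback stands in for whatever actually lowers the media volume.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class WakeGate:
    """Hypothetical wake-word gate.

    While media plays, speech reaches STT only after a wake word, and the
    media is ducked as soon as the wake word fires. With no media playing,
    speech passes straight through.
    """
    duck_media: Callable[[], None]
    media_playing: bool = False
    armed: bool = False  # True between a wake word and the next command

    def on_wake_word(self) -> None:
        if self.media_playing and not self.armed:
            self.armed = True
            self.duck_media()  # pre-duck so the command isn't spoken over music

    def allow_to_stt(self) -> bool:
        """Should the current utterance be forwarded to speech-to-text?"""
        if not self.media_playing:
            return True
        if self.armed:
            self.armed = False  # one command per wake word
            return True
        return False
```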

Safety

When driven by a tool-using brain, machine control sits behind a 3-tier guardrail: hard denylist → verbal-confirmation gate → read-only auto-run. The guardrail is brain-owned — review the brain's denylist before the first "full control" run.
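The three tiers can be read as a classifier that runs before any command executes. This is an illustrative sketch only. The guardrail lives in the brain, and the Verdict enum, classify function, DENYLIST patterns, and READ_ONLY allowlist here are hypothetical, not gabagent's actual rules.

```python
import re
import shlex
from enum import Enum


class Verdict(Enum):
    DENY = "deny"        # tier 1: hard denylist, never run
    CONFIRM = "confirm"  # tier 2: run only after a spoken "yes"
    AUTO = "auto"        # tier 3: read-only, run without asking


# Illustrative rules only; a real denylist needs far more care.
DENYLIST = [re.compile(p) for p in (r"\brm\s+-rf\s+/(\s|$)", r"\bmkfs\b", r"\bdd\s+if=")]
READ_ONLY = {"ls", "cat", "pwd", "whoami", "df", "uptime"}
SHELL_META = "|;&><`$"  # pipes, chaining, redirects, substitution


def classify(command: str) -> Verdict:
    if any(p.search(command) for p in DENYLIST):
        return Verdict.DENY
    try:
        argv = shlex.split(command)
    except ValueError:
        return Verdict.CONFIRM  # unparseable: let the user decide
    # Shell metacharacters could hide a write, so only plain single
    # commands from the allowlist are auto-run.
    if argv and argv[0] in READ_ONLY and not any(c in command for c in SHELL_META):
        return Verdict.AUTO
    return Verdict.CONFIRM
```

Anything neither denied nor clearly read-only falls through to the confirmation tier, so unknown commands default to asking rather than running.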

Status

Active development — the APIs and the brain protocol may still change. See PLAN.md for the architecture and roadmap.

License

MIT
