Speech-MCP

A modern multi-provider speech gateway featuring Gemini Live real-time voice chat, Gemini 3.1 Flash TTS, Hume AI Octave, and ElevenLabs voice cloning.

The Dual-Core Experience

MCP Server — Advanced speech, RAG, and state management for agents and IDEs (Claude Desktop, Cursor, Windsurf).

Modern Webapp — A browser-based cockpit for real-time voice conversations, Creative Labs polyglot synthesis, voice clone management, and system monitoring.

Providers

Provider	Mode	Quality	Key
`gemini_live`	Real-time conversation	Very good	`GOOGLE_API_KEY`
`gemini`	Batch TTS	Highest	`GOOGLE_API_KEY`
`hume`	Batch TTS (Octave)	High	`HUME_API_KEY`
`elevenlabs`	Batch TTS + voice cloning	High	`ELEVENLABS_API_KEY`
`windows`	Batch TTS (SAPI5)	Low	None
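As a rough sketch of how the Key column might be used, the snippet below reports which providers have their API key set in the environment. The `PROVIDER_KEYS` mapping mirrors the table above; the `available_providers` helper is hypothetical and not part of the speech-mcp API.

```python
import os

# Mirrors the Providers table: provider name -> required env var (None = no key).
PROVIDER_KEYS = {
    "gemini_live": "GOOGLE_API_KEY",
    "gemini": "GOOGLE_API_KEY",
    "hume": "HUME_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
    "windows": None,
}


def available_providers(env=None):
    """Return providers whose key is present (hypothetical helper).

    Only the key is checked; platform requirements (for example SAPI5
    needing Windows) are deliberately ignored in this sketch.
    """
    env = os.environ if env is None else env
    return [
        name
        for name, key in PROVIDER_KEYS.items()
        if key is None or env.get(key)
    ]
```

Calling `available_providers()` with no arguments reads `os.environ`; passing a plain dict makes the check easy to try without touching real keys.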

Key Features

Gemini Live Real-Time Voice Chat — Full-duplex WebSocket session with gemini-3.1-flash-live-preview. Sub-second latency, barge-in interruption, affective dialog, input/output transcripts. Designed for robot control (Yahboom) and conversational agents.

Gemini 3.1 Flash TTS — Highest-quality batch synthesis (gemini-3.1-flash-tts-preview, released 2026-04-15). 31 prebuilt voices, 100+ languages, expressive audio tags ([whispers], [excited], [dramatically], etc.).
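To illustrate how audio tags sit inline in a script, here is a minimal sketch that separates the tags from the spoken text. It assumes tags are lowercase words in square brackets, as in the examples above; the `split_audio_tags` helper is hypothetical, and the sketch does not validate tags against any official list.

```python
import re

# Assumption: a tag is one or more lowercase words inside square brackets.
TAG_RE = re.compile(r"\[([a-z][a-z ]*)\]")


def split_audio_tags(text):
    """Return (tags, plain_text) for a tagged TTS script (hypothetical helper)."""
    tags = TAG_RE.findall(text)
    # Replace each tag with a space, then collapse runs of whitespace.
    plain = " ".join(TAG_RE.sub(" ", text).split())
    return tags, plain
```

A helper like this can be handy for previewing the plain transcript or for stripping tags before sending the same text to a provider that does not understand them.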

Creative Labs — Polyglot synthesis demo with 19 languages (European, Slavic, Classical, Experimental, Global), literary samples, voice selection, prosody slider, and tongue-twister panel.

Voice Cloning — ElevenLabs Instant Voice Clone (IVC) via file upload. 5-second minimum audio sample. Cloned voices appear in the voice library immediately.
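The 5-second minimum can be checked locally before uploading. The sketch below measures a WAV file with Python's standard `wave` module. It is an illustration only: the helper names are hypothetical, it handles uncompressed WAV input only, and how speech-mcp itself validates uploads is not shown here.

```python
import io
import wave

MIN_SECONDS = 5.0  # minimum sample length stated above


def wav_duration_seconds(source):
    """Duration of a WAV given a str path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(source, minimum=MIN_SECONDS):
    """True if the sample meets the minimum duration (hypothetical helper)."""
    return wav_duration_seconds(source) >= minimum


def silent_wav(seconds, rate=16000):
    """Build an in-memory mono 16-bit WAV of silence, for trying the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
    buf.seek(0)
    return buf
```

For a real recording, pass the file path directly, for example `is_long_enough("sample.wav")`, where `sample.wav` is a placeholder name.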

Offline Wake-Word — Privacy-first detection using openWakeWord (fully offline, Apache 2.0, no API key).

RAG / Semantic Search — LanceDB + FastEmbed knowledge base over project docs. ask_docs tool uses Claude sampling for grounded Q&A.

Local AI — Ollama and LM Studio model discovery and grounded generation.

Documentation

Installation
Configuration reference
Local voice alternatives ← kyutai-mcp / offline
Gemini Live voice chat ← new
Architecture
openWakeWord
Yahboom robot integration
RAG technical overview
Modern speech AI

Quick Start

# Clone and install
git clone https://github.com/sandraschi/speech-mcp
cd speech-mcp
uv sync

# Configure keys
cp .env.example .env
# Edit .env — add GOOGLE_API_KEY at minimum

# Start backend
uv run python -m speech_mcp.webapp

# Start frontend (separate terminal)
cd web && npm install && npm run dev

Backend: http://localhost:10918 — Frontend: http://localhost:10917
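If either URL does not respond, a quick check that something is listening on the ports can help. This is a small sketch using only the standard library; the `PORTS` mapping copies the numbers above, and `is_listening` is a hypothetical helper, not a project command.

```python
import socket

# Ports taken from the line above.
PORTS = {"backend": 10918, "frontend": 10917}


def is_listening(port, host="localhost", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def report(ports=PORTS):
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(port) for name, port in ports.items()}
```

Running `print(report())` after starting both processes shows which of the two is reachable. A successful TCP connection only shows that a process is bound to the port, not that the application behind it is healthy.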

For Claude Desktop MCP integration see docs/configuration.md.

License

MIT — see LICENSE.

Contributors: @sandraschi. PRs welcome.
