Python 3.13.1 app that:
- Continuously records microphone audio to a WAV file
- Streams audio to Deepgram for real-time speech-to-text
- Broadcasts each transcript segment over a local WebSocket server
- Includes a minimal browser client that prints incoming text
Concurrency model: asyncio + threads
- Thread (microphone): captures raw PCM (linear16) chunks from the microphone into queues
- Thread (WAV writer): writes audio to disk continuously (no interruptions)
- Async task (Deepgram): streams audio to Deepgram over WebSocket + parses transcript messages
- Async task (WS server): broadcasts transcript events to all connected clients
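The hand-off between the threads and the async tasks can be sketched with the standard library alone. This is a minimal sketch, not the project's code. The hypothetical `fake_microphone` emits silent linear16 chunks instead of reading a device, and `streamer` only counts bytes where the real Deepgram task would send them. The WAV writer targets an in-memory buffer (a real writer would pass a file path to `wave.open`). The sample rate, chunk size, and all function names are assumptions. The key point is that a plain thread must not call `asyncio.Queue.put_nowait` directly, so it schedules the put on the event loop with `loop.call_soon_threadsafe`.

```python
import asyncio
import io
import queue
import threading
import wave

SAMPLE_RATE = 16_000   # assumed sample rate (Hz)
CHUNK_FRAMES = 1_600   # assumed chunk size: 100 ms of mono audio
SAMPLE_WIDTH = 2       # linear16 = 2 bytes per sample


def fake_microphone(n_chunks, wav_q, loop, async_q):
    """Hypothetical stand-in for the capture thread: emits silent PCM chunks."""
    for _ in range(n_chunks):
        chunk = bytes(CHUNK_FRAMES * SAMPLE_WIDTH)  # silence
        wav_q.put(chunk)  # queue.Queue is thread-safe: goes to the WAV writer
        # asyncio.Queue is not thread-safe, so hand the chunk to the loop thread.
        loop.call_soon_threadsafe(async_q.put_nowait, chunk)
    wav_q.put(None)  # sentinel: capture finished
    loop.call_soon_threadsafe(async_q.put_nowait, None)


def wav_writer(wav_q, sink):
    """Drain the queue into a mono linear16 WAV stream until the sentinel arrives."""
    with wave.open(sink, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        while (chunk := wav_q.get()) is not None:
            wav.writeframes(chunk)


async def streamer(async_q):
    """Stand-in for the Deepgram task: counts the bytes it would send."""
    sent = 0
    while (chunk := await async_q.get()) is not None:
        sent += len(chunk)  # a real task would send this over the WebSocket
    return sent


async def main(n_chunks=10):
    loop = asyncio.get_running_loop()
    wav_q = queue.Queue()
    async_q = asyncio.Queue()
    sink = io.BytesIO()
    writer = threading.Thread(target=wav_writer, args=(wav_q, sink))
    mic = threading.Thread(target=fake_microphone, args=(n_chunks, wav_q, loop, async_q))
    writer.start()
    mic.start()
    sent = await streamer(async_q)
    # join() blocks, so run it off the event loop thread
    await asyncio.to_thread(mic.join)
    await asyncio.to_thread(writer.join)
    return sent, sink.getvalue()
```

`asyncio.run(main())` returns the number of bytes the async side received and the finished WAV bytes. Because the WAV writer has its own thread-safe queue, a stalled network task does not stop the recording.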
Setup (Windows PowerShell):
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install -U pip
pip install -e ".[dev]"
Create a .env file (auto-loaded) from the template:
Copy-Item .env.example .env
Then edit .env and set DEEPGRAM_API_KEY.
Alternatively, set the key directly in the current PowerShell session:
$env:DEEPGRAM_API_KEY="YOUR_DEEPGRAM_API_KEY"
Common optional settings:
- WS_HOST (default 0.0.0.0)
- WS_PORT (default 8765)
- WS_PATH (default /ws)
- OUT_DIR (default recordings)
- AUDIO_INPUT_DEVICE (optional; integer device index or device name)
- SEND_INTERIM (default false): set to true to stream interim results to clients
See .env.example for the full list.
Start the app:
speech-stream-processor
Then open web/client.html in a browser; it connects to ws://localhost:8765/ws.
Each message broadcast to clients is a JSON object:
{"type":"transcript","text":"hello world","is_final":true,"confidence":0.98,"start":0.0,"duration":1.2,"received_at":"2025-01-01T00:00:00+00:00"}
To run in Docker:
docker build -t speech-stream-processor .
docker run -e DEEPGRAM_API_KEY="your_key" -p 8765:8765 speech-stream-processor
Note: microphone access from Docker containers varies by OS.
Run the tests:
python -m pytest
Project layout:
src/speech_stream_processor/
  audio/      # Microphone capture + WAV recording
  broadcast/  # Fan-out broadcaster
  deepgram/   # Deepgram streaming client
  server/     # Local WebSocket server
  app.py      # Orchestrator
  config.py   # Environment configuration
web/client.html # Browser client
tests/ # Unit tests
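The layout lists a fan-out broadcaster under broadcast/. One common way to build one is a bounded asyncio.Queue per subscriber, so that a slow WebSocket client cannot block the transcript pipeline or the other clients. This is a minimal sketch, not the project's implementation. The class and method names (`Broadcaster`, `subscribe`, `unsubscribe`, `publish`) are hypothetical, and so is the policy of dropping a subscriber's oldest message when its queue is full.

```python
import asyncio


class Broadcaster:
    """Fan one stream of messages out to many subscribers (hypothetical sketch)."""

    def __init__(self, maxsize: int = 100) -> None:
        self._maxsize = maxsize
        self._subscribers: set[asyncio.Queue[str]] = set()

    def subscribe(self) -> asyncio.Queue[str]:
        # Each connected client gets its own bounded queue.
        q: asyncio.Queue[str] = asyncio.Queue(maxsize=self._maxsize)
        self._subscribers.add(q)
        return q

    def unsubscribe(self, q: asyncio.Queue[str]) -> None:
        self._subscribers.discard(q)

    def publish(self, message: str) -> None:
        # Never awaits, so one slow client cannot stall the publisher.
        for q in self._subscribers:
            if q.full():
                q.get_nowait()  # drop this subscriber's oldest message
            q.put_nowait(message)
```

In a WebSocket connection handler, the server would call `subscribe()` when a client connects, loop on `await q.get()` and send each message, then call `unsubscribe(q)` in a `finally` block when the client disconnects.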
