A Chrome extension + FastAPI backend that translates live audio from web chat apps (Google Meet, Zoom, Discord, etc.) in real time. Hear everything in your native language, or route translated audio into the call.
```mermaid
flowchart LR
    subgraph ext ["Chrome Extension (MV3)"]
        A["Tab Audio\n(tabCapture)"] --> B["Offscreen Doc\n(PCM extract)"]
        B --> C["Service Worker\n(orchestrator)"]
        G["Translated Audio"] --> C
        C --> H["Offscreen Playback\n(selected output device)"]
        H --> I["BlackHole / Speakers"]
    end
    subgraph web ["Web Dashboard"]
        J["Sign In\n(Clerk Auth)"] --> K["Profile Setup\n(language, name)"]
        K --> L["Voice Warmup\n(record & clone)"]
    end
    C -- WebSocket --> D["FastAPI Backend"]
    D --> E["Speechmatics STT\n+ RT Translation"]
    E --> F["TTS Provider\n(MiniMax / Speechmatics)"]
    F --> G
    L -- "Voice profile" --> M["Convex DB"]
    D -- "Lookup voice profile" --> M
```
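The extension-to-backend link in the diagram is a single WebSocket that carries binary PCM upstream and JSON events (plus synthesized audio) downstream. A minimal sketch of that framing, where `make_event`, `route_frame`, and the message shapes are illustrative assumptions rather than the project's actual protocol:

```python
import json

def make_event(kind: str, **fields) -> str:
    """Encode a downstream JSON event (e.g. partial/final translated text)."""
    return json.dumps({"type": kind, **fields})

def route_frame(frame) -> str:
    """Classify an incoming WebSocket frame the way the backend might:
    binary frames are raw PCM audio, text frames are JSON control messages."""
    if isinstance(frame, (bytes, bytearray)):
        return "pcm"                   # feed to the STT stream
    msg = json.loads(frame)
    return msg.get("type", "unknown")  # e.g. "start", "stop", "config"
```

Keeping audio binary and control messages textual lets one connection multiplex both without a custom envelope around every PCM chunk.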
Audio routing flow:
| Route | What happens |
|---|---|
| Speakers (default) | You hear the translated audio locally |
| BlackHole → Meet | Translated audio is routed into the call as your "microphone" input |
```bash
cd backend
cp .env.example .env   # Add your API keys
uv sync
uv run uvicorn main:app --reload --port 8000
```

```bash
cd extension
bun install
bun run dev
```

Then load in Chrome:
- Go to `chrome://extensions`
- Enable Developer Mode
- Click Load unpacked → select the `extension/dist` folder
- Allow microphone permission when prompted (needed for device enumeration)
```bash
cd web
npm install
npx convex dev   # Start Convex dev server (needs CONVEX_DEPLOYMENT in .env.local)
npm run dev      # Starts on http://localhost:5174
```

The web dashboard requires a `.env.local` file with:
```
VITE_CONVEX_URL=<your Convex deployment URL>
VITE_CLERK_PUBLISHABLE_KEY=<your Clerk publishable key>
VITE_SERVER_URL=http://localhost:8000
VITE_CONVEX_SITE_URL=<your Convex site URL>
CONVEX_DEPLOYMENT=<your Convex deployment>
```

- Open any web chat (Google Meet, YouTube, etc.)
- Click the Interpreter extension icon
- Pick source + target language
- Select an output device (speakers or BlackHole)
- Hit Start Translation
- Hear translated audio live 🎧
Notes:
- Original tab audio passthrough is disabled in offscreen capture, so the extension will not play the untranslated and translated audio at the same time.
- If output is set to BlackHole 2ch, local speakers are silent by design unless you monitor with a Multi-Output device.
The web dashboard (web/) is a React app for managing user profiles and voice cloning. It lets users:
- Sign in via Clerk authentication
- Set a display name and preferred language
- Record a voice sample and create a MiniMax voice clone
When voice cloning is enabled, the backend looks up the speaker's voice profile from Convex during translation so listeners hear translated speech rendered in the original speaker's cloned voice.
Stack: React 18, Vite, Convex (database + file storage), Clerk (auth)
How it connects:
- User signs in and creates a profile on the web dashboard.
- User records a voice sample; the app uploads it to Convex storage and sends it to MiniMax to create a voice clone (`voiceProfileId`).
- During a live call, the backend queries the Convex HTTP endpoint (`GET /api/voice-profile?userId=...`) to fetch the speaker's `voiceProfileId`.
- If a valid profile exists, TTS renders in the cloned voice; otherwise it falls back to a standard voice.
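The profile lookup with fallback can be sketched as a small backend helper. The endpoint path matches the one above; the injectable `fetch` parameter, the `<convex-site>` placeholder, and the `FALLBACK_VOICE` name are illustrative assumptions:

```python
import json
import urllib.parse
import urllib.request

FALLBACK_VOICE = "standard"  # hypothetical default voice id

def resolve_voice(user_id: str, fetch=None) -> str:
    """Fetch the speaker's cloned voiceProfileId from the Convex HTTP
    endpoint; fall back to a standard voice on any miss or error."""
    url = "https://<convex-site>/api/voice-profile?" + urllib.parse.urlencode(
        {"userId": user_id}
    )
    fetch = fetch or (lambda u: urllib.request.urlopen(u).read())
    try:
        profile = json.loads(fetch(url))
        return profile.get("voiceProfileId") or FALLBACK_VOICE
    except Exception:
        return FALLBACK_VOICE  # missing profile, bad JSON, or network error
```

Swallowing lookup errors is deliberate here: a failed profile fetch should degrade to a standard voice, never stall the live translation stream.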
To let other meeting participants hear the translated audio:
```bash
brew install blackhole-2ch
```

Or download from existential.audio/blackhole.
- Extension: In the popup, select BlackHole 2ch as the Translation Output device
- Google Meet: Go to Meet Settings → Audio → set Microphone to BlackHole 2ch
Now when you start translation, the translated audio plays into BlackHole, which Meet picks up as your microphone input. Other participants hear the translation.
Tip: To hear the call yourself while routing audio into Meet, create a macOS Multi-Output Device in Audio MIDI Setup that combines your speakers + BlackHole.
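The device choice above boils down to a simple preference rule: pick BlackHole when routing into the call, otherwise the default speakers. A sketch of that rule (the extension actually does this in its popup UI; the function name is hypothetical):

```python
def pick_output(devices: list[str], route_into_call: bool) -> str:
    """Choose the playback device: BlackHole when translated audio should
    feed the call as a virtual microphone, otherwise the first (default)
    device. Illustrative only, not the extension's actual code."""
    if route_into_call:
        for name in devices:
            if "BlackHole" in name:
                return name
    return devices[0] if devices else "default"
```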
| Service | Credit | How to Get |
|---|---|---|
| Speechmatics | $200 (code: VOICEAGENT200) | portal.speechmatics.com |
| MiniMax | $20 | minimax.io |
You can tune backend chunking and partial-update behavior in `backend/.env`:

```bash
# Translation chunking
TRANSLATION_TRIGGER_CHAR_THRESHOLD=24

# Partial translated-text UI throttling
TRANSLATION_PARTIAL_MIN_DELTA_CHARS=12
TRANSLATION_PARTIAL_MIN_INTERVAL_MS=300

# Speechmatics finalization speed
SPEECHMATICS_MAX_DELAY=1.0
SPEECHMATICS_RT_WS_URL=wss://eu.rt.speechmatics.com/v2/

# Translation provider mode
USE_SPEECHMATICS_TRANSLATION=1
SPEECHMATICS_TRANSLATION_ENABLE_PARTIALS=1

# TTS provider mode
TTS_PROVIDER=speechmatics
# TTS_PROVIDER=minimax
# SPEECHMATICS_TTS_OUTPUT_FORMAT=wav_16000
# SPEECHMATICS_TTS_VOICE_ID=sarah
```

Guidance:
- Keep `USE_SPEECHMATICS_TRANSLATION=1` for the lowest end-to-end delay. `TTS_PROVIDER=speechmatics` is now the default.
- Switch to `TTS_PROVIDER=minimax` for broader multilingual voice coverage.
- Lower `TRANSLATION_TRIGGER_CHAR_THRESHOLD` for faster response; raise it for fewer, larger chunks.
- Lower `SPEECHMATICS_MAX_DELAY` for faster final transcripts.
- Raise the partial throttles if live text appears to constantly rewrite.
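The two partial throttles work together: an update is dropped unless it adds enough characters *and* enough time has passed since the last emit. A sketch of that logic under the default values above (the `PartialThrottle` class is illustrative, not the backend's actual implementation):

```python
import time

MIN_DELTA_CHARS = 12   # TRANSLATION_PARTIAL_MIN_DELTA_CHARS
MIN_INTERVAL_MS = 300  # TRANSLATION_PARTIAL_MIN_INTERVAL_MS

class PartialThrottle:
    """Suppress partial-text UI updates that are too small or too frequent."""

    def __init__(self, now_ms=lambda: time.monotonic() * 1000):
        self.now_ms = now_ms          # injectable clock for testing
        self.last_text = ""
        self.last_emit_ms = float("-inf")

    def should_emit(self, text: str) -> bool:
        now = self.now_ms()
        if abs(len(text) - len(self.last_text)) < MIN_DELTA_CHARS:
            return False              # change too small to bother the UI
        if now - self.last_emit_ms < MIN_INTERVAL_MS:
            return False              # too soon since the last update
        self.last_text, self.last_emit_ms = text, now
        return True
```

Raising either constant trades live-text responsiveness for fewer visible rewrites, which is exactly the trade-off the guidance bullets describe.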
- Extension: React, TypeScript, Vite, CRXJS, Chrome MV3
- Backend: Python, FastAPI, WebSocket, uv
- Web Dashboard: React 18, TypeScript, Vite, Convex, Clerk
- STT: Speechmatics Real-time API
- Translation: Speechmatics RT Translation (recommended low-latency mode) or MiniMax M2 fallback
- TTS: Speechmatics preview TTS (default) or MiniMax Speech 2.8 Turbo
- Voice Cloning: MiniMax Voice Clone API (via web dashboard voice warmup flow)
- Audio Routing: BlackHole (macOS virtual audio loopback)