Real-time speech-to-text translation overlay for live events. Captures audio from a microphone, transcribes via ElevenLabs Scribe, translates to a target language, and displays subtitles as an OBS browser source overlay.
Built for meetups and presentations where the audience speaks a different language than the presenter.

    Mic Audio ──▶ Browser (getUserMedia)
                            │
                            ▼
                  ElevenLabs Scribe v2
                (WebSocket, realtime STT)
                            │
                            ▼
                      English Text
                            │
                  ┌─────────┴─────────┐
                  ▼                   ▼
            Google Cloud            DeepL
             Translation           (REST)
              (v2 / v3)
                  │                   │
                  ▼                   ▼
           Translated Text     Translated Text
                  │                   │
                  └─────────┬─────────┘
                            ▼
                  Subtitle / Comparison
                            │
                            ▼
                   OBS Browser Source

To run it locally:

    git clone https://github.com/nfrith/live-translation.git
    cd live-translation
    cp .env.example .env
    # Fill in your API keys in .env

Then start the server:

    ELEVENLABS_API_KEY=your_key DEEPL_API_KEY=your_key GOOGLE_API_KEY=your_key node token-server.js --server

Navigate to http://localhost:3847. Click Start to begin transcription.

Subtitle mode: a single translated subtitle bar at the bottom of the screen, designed as an OBS browser source overlay on top of a presentation.

Comparison mode: a side-by-side view of the translation providers (Google v2, Google v3, DeepL). Each utterance shows both translations with per-provider latency, which is useful for judging translation quality before an event.

Switch between the two modes with the Toggle Compare button or the `?compare=1` URL parameter.
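
The comparison view boils down to sending the same utterance to every provider and timing each call. The sketch below shows one way to do that; `compareProviders` and the shape of the `providers` map are hypothetical names chosen for illustration, not code taken from `index.html`.

```javascript
// Hypothetical helper: runs every provider on the same text in parallel and
// records how long each call took. `providers` maps a label to an async
// function (text) => translated string.
async function compareProviders(text, providers) {
  const entries = Object.entries(providers);
  return Promise.all(
    entries.map(async ([name, translate]) => {
      const started = performance.now();
      try {
        const translated = await translate(text);
        return { name, ok: true, translated, latencyMs: performance.now() - started };
      } catch (err) {
        // A failing provider should not hide the result of the other one.
        return { name, ok: false, error: String(err), latencyMs: performance.now() - started };
      }
    })
  );
}
```

Each provider is caught individually, so a timeout or quota error from one service still leaves the other translation and its latency available for display.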
Add as Browser Source:
- URL: `http://localhost:3847?obs=1`
- Width: 1920
- Height: 1080
The ?obs=1 parameter hides controls and makes the background transparent.
| Parameter | Effect |
|---|---|
| `?obs=1` | Hide controls, transparent background (OBS mode) |
| `?compare=1` | Start in comparison mode |
| `?autostart=1` | Auto-start transcription |
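
A minimal sketch of how the page could read these parameters, assuming each flag is enabled only by the literal value `1`. The function name `parseOverlayFlags` is hypothetical; `URLSearchParams` is the standard browser and Node API.

```javascript
// Hypothetical helper: turns a query string such as "?obs=1&autostart=1"
// into the three boolean mode flags from the table above.
function parseOverlayFlags(search) {
  const params = new URLSearchParams(search);
  const isOn = (name) => params.get(name) === "1";
  return {
    obs: isOn("obs"),             // hide controls, transparent background
    compare: isOn("compare"),     // start in comparison mode
    autostart: isOn("autostart"), // begin transcription without a click
  };
}
```

In the browser this would typically be called as `parseOverlayFlags(window.location.search)`, so an OBS source URL like `http://localhost:3847?obs=1&autostart=1` yields `obs` and `autostart` set and `compare` unset.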

    Presenter ──▶ Wireless Mic ──▶ Mixer ──▶ USB Audio Interface ──▶ Your Laptop
                                                                          │
                                                                   Translation App
                                                                          │
                            Presenter Laptop ──▶ Capture Card ──▶ OBS (your laptop) ──▶ Projector
                                                                          │
                                                          Subtitle overlay composited here

The presenter doesn't need to run any software. Their laptop goes through a capture card into OBS on the operator's machine, where the subtitle overlay is composited on top.
| Service | Purpose | Cost |
|---|---|---|
| ElevenLabs | Speech-to-text (Scribe v2 Realtime) | ~$0.28/hour |
| Google Cloud Translation | Translation (v2 REST / v3 TLLM) | ~$20/1M chars |
| DeepL | Translation (comparison/alternative) | Free tier available |
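
As a rough worked example of these rates, the sketch below estimates what one event might cost. The function name, the two-hour duration, and the 60,000-character transcript are illustrative assumptions; the rates are the approximate figures from the table, and DeepL is left out because only its free tier is listed.

```javascript
// Approximate rates copied from the table above; check current pricing
// before relying on them.
const STT_USD_PER_HOUR = 0.28;           // ElevenLabs Scribe v2 Realtime
const GOOGLE_USD_PER_MILLION_CHARS = 20; // Google Cloud Translation

// Hypothetical helper: estimated cost in USD, rounded to cents.
function estimateEventCost(hours, translatedChars) {
  const roundCents = (usd) => Math.round(usd * 100) / 100;
  const stt = hours * STT_USD_PER_HOUR;
  const translation = (translatedChars / 1e6) * GOOGLE_USD_PER_MILLION_CHARS;
  return {
    stt: roundCents(stt),
    translation: roundCents(translation),
    total: roundCents(stt + translation),
  };
}

// Example: a 2-hour meetup with an assumed 60,000 characters of transcript.
console.log(estimateEventCost(2, 60000));
```

Under those assumptions the estimate is about $0.56 for speech-to-text plus $1.20 for translation, roughly $1.76 in total. Running two providers in comparison mode would add the second provider's charges on top.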
| File | Purpose |
|---|---|
| `index.html` | Main app (all CSS and JS inline) |
| `token-server.js` | Serves the app + generates ElevenLabs tokens + proxies translation APIs |
| `.env.example` | API key configuration template |
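
To illustrate what proxying the translation APIs involves, here is a hedged sketch of the two upstream requests a server could build. It is not taken from `token-server.js`: the helper names are hypothetical, the source language is assumed to be English, and the DeepL host shown is the free-tier endpoint (paid plans use `api.deepl.com`).

```javascript
// Hypothetical helper: fetch() arguments for Google Cloud Translation v2 (REST).
function googleV2Request(text, target, apiKey) {
  return {
    url: `https://translation.googleapis.com/language/translate/v2?key=${encodeURIComponent(apiKey)}`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ q: text, source: "en", target, format: "text" }),
    },
  };
}

// Hypothetical helper: fetch() arguments for the DeepL v2 translate endpoint.
function deeplRequest(text, target, apiKey) {
  return {
    url: "https://api-free.deepl.com/v2/translate", // assumption: free-tier host
    init: {
      method: "POST",
      headers: {
        Authorization: `DeepL-Auth-Key ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ text: [text], source_lang: "EN", target_lang: target.toUpperCase() }),
    },
  };
}

// The translated string sits at a different path in each response body.
const readGoogleV2 = (json) => json.data.translations[0].translatedText;
const readDeepl = (json) => json.translations[0].text;
```

Keeping these calls on the server means the API keys never reach the browser; the page only talks to the local proxy, for example with `const { url, init } = deeplRequest(text, "de", key); const res = await fetch(url, init);` on the server side.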
The server reads API keys from environment variables. See .env.example for the shape.
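
A startup check along these lines can catch a missing key before the event starts. This is a sketch only: the key names come from the run command in this README, while the helper name and the assumption that all three keys are required (DeepL may be optional if only Google is used) are mine.

```javascript
// Key names as they appear in the run command; whether every one of them is
// mandatory is an assumption.
const REQUIRED_KEYS = ["ELEVENLABS_API_KEY", "DEEPL_API_KEY", "GOOGLE_API_KEY"];

// Hypothetical helper: returns the names of keys that are unset or blank.
function missingApiKeys(env = process.env) {
  return REQUIRED_KEYS.filter((name) => !env[name] || env[name].trim() === "");
}

const missing = missingApiKeys();
if (missing.length > 0) {
  console.warn(`Missing API keys: ${missing.join(", ")}`);
}
```

The check only warns rather than exiting, so the app can still be started with a single translation provider configured.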
For Google Cloud Translation v3 (Gemini-powered TLLM), you also need:

    gcloud auth application-default login

License: MIT