Live Translation Overlay

Real-time speech-to-text translation overlay for live events. The app captures microphone audio in the browser, transcribes it with ElevenLabs Scribe, translates the transcript into a target language, and displays the result as subtitles in an OBS browser source overlay.

Built for meetups and presentations where the audience speaks a different language than the presenter.

How It Works

Mic Audio ──▶ Browser (getUserMedia)
                    │
                    ▼
             ElevenLabs Scribe v2
             (WebSocket, realtime STT)
                    │
                    ▼
              English Text
                    │
          ┌─────────┴─────────┐
          ▼                   ▼
    Google Cloud          DeepL
    Translation           (REST)
    (v2 / v3)
          │                   │
          ▼                   ▼
     Translated Text     Translated Text
          │                   │
          └─────────┬─────────┘
                    ▼
          Subtitle / Comparison
                    │
                    ▼
          OBS Browser Source

Quick Start

1. Clone and configure

git clone https://github.com/nfrith/live-translation.git
cd live-translation
cp .env.example .env
# Fill in your API keys in .env

2. Start the server

ELEVENLABS_API_KEY=your_key DEEPL_API_KEY=your_key GOOGLE_API_KEY=your_key node token-server.js --server

3. Open the app

Navigate to http://localhost:3847. Click Start to begin transcription.

Modes

Subtitle Mode (default)

Single translated subtitle bar at the bottom of the screen. Designed as an OBS browser source overlay on top of a presentation.

Comparison Mode

Side-by-side evaluation of translation providers (Google v2, Google v3, DeepL). Each utterance shows every provider's translation along with that provider's latency. Useful for evaluating translation quality before an event.

Switch modes with the Toggle Compare button, or start in comparison mode with the ?compare=1 URL parameter.
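
The per-provider latency measurement described above could be sketched roughly as follows. This is not the code in index.html. The function name compareProviders and the shape of the providers map are assumptions made for illustration: each provider is assumed to be an async function that takes text and returns its translation.

```javascript
// Hypothetical helper: sends the same text to every provider in parallel
// and records how long each one took. `providers` maps a label to an
// async function (text) => translated string.
async function compareProviders(text, providers) {
  const entries = Object.entries(providers);
  return Promise.all(
    entries.map(async ([name, translate]) => {
      const started = performance.now();
      try {
        const translation = await translate(text);
        return { name, ok: true, translation, latencyMs: performance.now() - started };
      } catch (err) {
        // A failing provider is reported in its own row so the others still render.
        return { name, ok: false, error: String(err), latencyMs: performance.now() - started };
      }
    })
  );
}
```

Catching errors per provider, instead of letting Promise.all reject, keeps one slow or failing service from hiding the other results during an evaluation.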

OBS Setup

Add the app to OBS as a Browser Source:

  • URL: http://localhost:3847?obs=1
  • Width: 1920
  • Height: 1080

The ?obs=1 parameter hides controls and makes the background transparent.

URL Parameters

Parameter      Effect
?obs=1         Hide controls, transparent background (OBS mode)
?compare=1     Start in comparison mode
?autostart=1   Auto-start transcription
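
As an illustration of how these flags might be read in the page, here is a minimal sketch using the standard URLSearchParams API. Only the three parameter names come from this README. The function name readFlags and the returned property names are assumptions, and the actual handling in index.html may differ.

```javascript
// Sketch: turn the query string into boolean UI flags.
// A flag counts as enabled only when its value is exactly "1".
function readFlags(search) {
  const params = new URLSearchParams(search);
  const isOn = (name) => params.get(name) === "1";
  return {
    obs: isOn("obs"),             // hide controls, transparent background
    compare: isOn("compare"),     // start in comparison mode
    autostart: isOn("autostart"), // begin transcription without a click
  };
}
```

In a browser this would typically be called as readFlags(window.location.search). Parameters can be combined, for example ?obs=1&autostart=1.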

Physical Setup for Events

Presenter ──▶ Wireless Mic ──▶ Mixer ──▶ USB Audio Interface ──▶ Your Laptop
                                                                       │
                                                            Translation App
                                                                       │
Presenter Laptop ──▶ Capture Card ──▶ OBS (your laptop) ──▶ Projector
                                           │
                                    Subtitle overlay composited here

The presenter doesn't need to run any software. Their laptop goes through a capture card into OBS on the operator's machine, where the subtitle overlay is composited on top.

API Keys Required

Service                    Purpose                                 Cost
ElevenLabs                 Speech-to-text (Scribe v2 Realtime)     ~$0.28/hour
Google Cloud Translation   Translation (v2 REST / v3 TLLM)         ~$20/1M chars
DeepL                      Translation (comparison/alternative)    Free tier available

Files

File              Purpose
index.html        Main app (all CSS and JS inline)
token-server.js   Serves the app, generates ElevenLabs tokens, and proxies translation APIs
.env.example      API key configuration template
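
To make the "proxies translation APIs" role concrete, here is a rough sketch of what a server-side call to DeepL's public /v2/translate endpoint can look like. It is not taken from token-server.js. The function names buildDeepLRequest and translateWithDeepL are hypothetical, and the sketch assumes Node 18+ for the global fetch.

```javascript
// Builds the HTTP request for DeepL's /v2/translate endpoint.
// Free-plan keys end in ":fx" and use the api-free host; other keys use api.deepl.com.
function buildDeepLRequest(text, targetLang, apiKey) {
  const host = apiKey.endsWith(":fx") ? "api-free.deepl.com" : "api.deepl.com";
  return {
    url: `https://${host}/v2/translate`,
    options: {
      method: "POST",
      headers: {
        Authorization: `DeepL-Auth-Key ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ text: [text], target_lang: targetLang }),
    },
  };
}

// Sends the request and returns the first translated string.
async function translateWithDeepL(text, targetLang, apiKey) {
  const { url, options } = buildDeepLRequest(text, targetLang, apiKey);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`DeepL responded with HTTP ${res.status}`);
  const data = await res.json();
  return data.translations[0].text;
}
```

Making this call on the server instead of in the page keeps the API key out of the browser, which is presumably why the app routes translation through a proxy.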

Configuration

The server reads API keys from environment variables. See .env.example for the expected variable names.

For Google Cloud Translation v3 (Gemini-powered TLLM), you also need Google Cloud Application Default Credentials, which you can set up with:

gcloud auth application-default login
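
Since the keys arrive through environment variables, a startup check can catch a missing key before an event starts. The sketch below is an assumption, not the server's actual behaviour: the variable names are the ones used in the Quick Start command, the function name missingKeys is hypothetical, and whether every key is strictly required depends on which providers you use.

```javascript
// Variable names as they appear in the Quick Start command.
const REQUIRED_KEYS = ["ELEVENLABS_API_KEY", "GOOGLE_API_KEY", "DEEPL_API_KEY"];

// Returns the names of keys that are unset or blank in the given environment.
function missingKeys(env = process.env, names = REQUIRED_KEYS) {
  return names.filter((name) => !env[name] || env[name].trim() === "");
}
```

A server could call missingKeys() at startup and log the returned names, or pass a shorter list when only one translation provider is in use.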

License

MIT
