Speech to text using Deepgram API with speaker diarization.
pip install -r requirements.txt
Create a .env file with your Deepgram API key:
DEEPGRAM_API_KEY=your_api_key_here
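As an illustration of how a script might pick up this key at startup, here is a minimal standard-library sketch. The helpers `load_env_file` and `get_api_key` are hypothetical names used for this example only; the actual scripts may read the file differently (for instance with a package such as python-dotenv).

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without an '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(path=".env"):
    """Prefer a real environment variable, then fall back to the .env file."""
    key = os.environ.get("DEEPGRAM_API_KEY") or load_env_file(path).get("DEEPGRAM_API_KEY")
    if not key:
        raise RuntimeError("DEEPGRAM_API_KEY is not set")
    return key
```

Checking the environment variable first means a key exported in the shell overrides the file, which is a design choice of this sketch rather than documented behavior.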
python speech_to_text.py <audio_file> [-l <language>]
Examples:
# Transcribe English audio (default)
python speech_to_text.py interview.wav
# Transcribe Russian audio
python speech_to_text.py podcast.mp3 -l ru
Output: Creates an output/<filename>/ folder with:
- <filename>.json - Raw API response
- <filename>_speakers.txt - Speaker-labeled transcript
- <filename>_youtube.txt - YouTube timestamp format
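The naming scheme above can be sketched as follows. Both functions are hypothetical helpers written for illustration, not code from the repository, and the exact layout of the YouTube file is not specified here: the sketch assumes the usual M:SS form, switching to H:MM:SS after one hour.

```python
from pathlib import Path


def output_paths(audio_file, root="output"):
    """Return the three output paths described above for one audio file."""
    stem = Path(audio_file).stem  # "interview.wav" -> "interview"
    folder = Path(root) / stem
    return {
        "json": folder / f"{stem}.json",
        "speakers": folder / f"{stem}_speakers.txt",
        "youtube": folder / f"{stem}_youtube.txt",
    }


def youtube_timestamp(seconds):
    """Format seconds as M:SS, or H:MM:SS once past an hour (assumed format)."""
    total = int(seconds)  # fractional seconds are truncated
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"
```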
Merge consecutive same-speaker lines into cleaner format:
python clean_transcript.py <folder>
Example:
# Clean transcript for interview.wav (uses output/interview/)
python clean_transcript.py interview
Output: Creates output/<folder>/<folder>_clean.txt
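To make the merging step concrete, here is a rough sketch of the idea. It assumes each transcript line looks like `Speaker N: text`, which is a guess at the file format, and `merge_speaker_lines` is a hypothetical function, not the one in clean_transcript.py.

```python
def merge_speaker_lines(lines):
    """Join consecutive lines from the same speaker into a single line."""
    merged = []  # list of [speaker_label, text] pairs, in order
    for line in lines:
        line = line.strip()
        if not line:
            continue
        speaker, sep, text = line.partition(": ")
        if not sep:
            # No "Speaker N: " prefix: treat as a continuation of the
            # previous turn (dropped if there is no previous turn).
            if merged:
                merged[-1][1] += " " + line
            continue
        if merged and merged[-1][0] == speaker:
            # Same speaker as the previous line: extend that turn.
            merged[-1][1] += " " + text.strip()
        else:
            merged.append([speaker, text.strip()])
    return [f"{speaker}: {text}" for speaker, text in merged]
```

Only adjacent lines are merged, so a speaker who talks again after someone else starts a new line.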
Supported languages:
- en - English (default)
- uk - Ukrainian
- ru - Russian
- de - German
- fr - French
- es - Spanish
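A command-line interface matching the usage line and language list could be declared roughly like this with argparse. This is a sketch under assumptions: the long option `--language` and the names `SUPPORTED_LANGUAGES` and `build_parser` are invented for the example, and the real script may not restrict the codes it accepts.

```python
import argparse

# Language codes listed above; "en" is the documented default.
SUPPORTED_LANGUAGES = {
    "en": "English",
    "uk": "Ukrainian",
    "ru": "Russian",
    "de": "German",
    "fr": "French",
    "es": "Spanish",
}


def build_parser():
    """Build a parser for: speech_to_text.py <audio_file> [-l <language>]."""
    parser = argparse.ArgumentParser(
        prog="speech_to_text.py",
        description="Transcribe an audio file with speaker diarization.",
    )
    parser.add_argument("audio_file", help="path to the audio file")
    parser.add_argument(
        "-l",
        "--language",  # long form is an assumption of this sketch
        default="en",
        choices=sorted(SUPPORTED_LANGUAGES),
        help="language code (default: en)",
    )
    return parser
```

With `choices` set, argparse rejects an unlisted code with a usage error before any API call is made.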
Real-time speech-to-text transcription with speaker diarization using Deepgram's WebSocket API.
- Real-time microphone transcription
- Speaker diarization (identifies different speakers)
- Color-coded speaker labels in terminal
- Interim results (see words as you speak)
- Automatic transcript saving
- Install dependencies:
pip install -r requirements.txt
- Create a .env file with your Deepgram API key:
DEEPGRAM_API_KEY=your_api_key_here
Or copy from the deepgram-stt folder:
cp ../deepgram-stt/.env .
Basic usage (English):
python realtime_stt.py
With a different language:
python realtime_stt.py -l ru # Russian
python realtime_stt.py -l de # German
python realtime_stt.py -l fr # French
Without saving the transcript:
python realtime_stt.py --no-save
List available audio devices:
python realtime_stt.py --list-devices
Use a specific audio device:
python realtime_stt.py -d 2 # Use device ID 2
Transcripts are saved to the output/ folder:
- YYYYMMDD_HHMMSS_realtime.txt - Speaker-labeled transcript
- YYYYMMDD_HHMMSS_realtime.json - Full metadata with timestamps
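The timestamped file names can be produced with `datetime.strftime`. The sketch below shows one way to do it; `realtime_output_paths` is a hypothetical helper, and using local time for the stamp is an assumption.

```python
from datetime import datetime
from pathlib import Path


def realtime_output_paths(now=None, root="output"):
    """Return (txt_path, json_path) named YYYYMMDD_HHMMSS_realtime.*."""
    # Use the supplied time if given, otherwise the current local time.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    folder = Path(root)
    return folder / f"{stamp}_realtime.txt", folder / f"{stamp}_realtime.json"
```

Because the stamp sorts lexicographically in time order, a plain directory listing shows sessions chronologically.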
- Ctrl+C - Stop recording and save transcript
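In Python, Ctrl+C raises `KeyboardInterrupt`, so stop-and-save behavior is typically built around catching it. The following is only a sketch of that pattern: `record_until_interrupt`, `next_line`, and `save` are hypothetical names, and the real script's streaming loop will look different.

```python
def record_until_interrupt(next_line, save):
    """Collect lines until Ctrl+C, then hand them to `save`.

    next_line: callable returning the next finalized transcript line.
    save: callable that receives the list of collected lines.
    """
    lines = []
    try:
        while True:
            lines.append(next_line())
    except KeyboardInterrupt:
        # Ctrl+C lands here: persist what was captured so far.
        save(lines)
    return lines
```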
Each speaker is shown with a different color:
- Speaker 0: Cyan
- Speaker 1: Green
- Speaker 2: Yellow
- Speaker 3: Magenta
- etc.
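Terminal colors like these are usually produced with ANSI escape codes. As a sketch, the mapping could look like the following; the colors for speakers beyond the four listed are not specified, so cycling back to cyan is an assumption, and `colorize_speaker` is a hypothetical helper.

```python
# Standard ANSI foreground codes: cyan, green, yellow, magenta.
SPEAKER_COLORS = ["\033[36m", "\033[32m", "\033[33m", "\033[35m"]
RESET = "\033[0m"


def colorize_speaker(speaker_id, text):
    """Prefix text with a colored 'Speaker N:' label."""
    # Assumption: speakers past the listed four reuse colors in order.
    color = SPEAKER_COLORS[speaker_id % len(SPEAKER_COLORS)]
    return f"{color}Speaker {speaker_id}:{RESET} {text}"
```

For example, `print(colorize_speaker(1, "Hello"))` shows a green label on terminals that support ANSI colors.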
Interim results are shown with a "..." suffix and update in place.
Final results are printed on a new line.
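One common way to get this in-place behavior is to rewrite the current terminal line with a carriage return. The sketch below shows that approach under assumptions: `render_result` is a hypothetical function, and replacing the interim text with the final text before the newline is one interpretation of the behavior described.

```python
import sys

CLEAR_LINE = "\r\033[K"  # return to line start, then erase to end of line


def render_result(text, is_final, out=sys.stdout):
    """Draw an interim result in place, or commit a final one with a newline."""
    if is_final:
        # Overwrite any interim text, then move to the next line.
        out.write(f"{CLEAR_LINE}{text}\n")
    else:
        # No newline: the next call overwrites this line.
        out.write(f"{CLEAR_LINE}{text}...")
    out.flush()
```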