deepgram-stt

Speech-to-text transcription using the Deepgram API, with speaker diarization.

Installation

pip install -r requirements.txt

Configuration

Create a .env file with your Deepgram API key:

DEEPGRAM_API_KEY=your_api_key_here
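The key in this file has to reach the process environment before the scripts call Deepgram. Packages such as python-dotenv are commonly used for this. The stdlib-only sketch below shows the same idea and assumes simple `KEY=value` lines. The helpers `load_env_file` and `get_api_key` are hypothetical and are not taken from the repository.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Hypothetical helper: copy KEY=value lines from a .env file into os.environ.

    Sketch only: assumes one KEY=value per line, no multi-line values.
    Variables already set in the environment are not overwritten.
    """
    env_path = Path(path)
    if not env_path.exists():
        return
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything without '='
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop optional surrounding quotes around the value
        value = value.strip().strip('"').strip("'")
        os.environ.setdefault(key.strip(), value)


def get_api_key() -> str:
    """Hypothetical helper: load .env, then return DEEPGRAM_API_KEY or exit."""
    load_env_file()
    key = os.environ.get("DEEPGRAM_API_KEY")
    if not key:
        raise SystemExit("DEEPGRAM_API_KEY is not set; add it to .env")
    return key
```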

Usage

1. Transcribe audio

python speech_to_text.py <audio_file> [-l <language>]

Examples:

# Transcribe English audio (default)
python speech_to_text.py interview.wav

# Transcribe Russian audio
python speech_to_text.py podcast.mp3 -l ru
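Under the hood, a pre-recorded transcription is an HTTP POST of the audio bytes to Deepgram's `/v1/listen` endpoint, authenticated with a `Token` header. The sketch below builds that request without sending it. The exact query parameters `speech_to_text.py` sends are an assumption here: `diarize=true` enables speaker labels and `language` carries the `-l` value. The function name `build_listen_request` is hypothetical.

```python
import mimetypes
import urllib.parse
import urllib.request

LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_request(audio_path: str, api_key: str, language: str = "en") -> urllib.request.Request:
    """Build (but do not send) a pre-recorded transcription request.

    Sketch only: the parameter set is assumed, not read from speech_to_text.py.
    """
    params = {
        "language": language,  # value of -l, e.g. "ru"
        "diarize": "true",     # ask Deepgram to label speakers per word
        "punctuate": "true",   # punctuated transcript and punctuated_word per word
    }
    url = f"{LISTEN_URL}?{urllib.parse.urlencode(params)}"
    # Guess the Content-Type from the file extension (.wav, .mp3, ...)
    content_type = mimetypes.guess_type(audio_path)[0] or "application/octet-stream"
    with open(audio_path, "rb") as f:
        body = f.read()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Authorization": f"Token {api_key}", "Content-Type": content_type},
    )
```

Sending the request with `urllib.request.urlopen(...)` and passing the response to `json.load` would yield a raw response like the one saved as `<filename>.json`.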

Output: Creates an output/<filename>/ folder (named after the audio file without its extension) containing:

<filename>.json - Raw API response
<filename>_speakers.txt - Speaker-labeled transcript
<filename>_youtube.txt - YouTube timestamp format
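With diarization enabled, Deepgram's pre-recorded response lists words under `results.channels[0].alternatives[0].words`, each with `start`, `speaker` and (when punctuation is on) `punctuated_word`. A minimal sketch of deriving the two text files from that JSON is to group consecutive words by speaker and prefix each group with an `M:SS` / `H:MM:SS` timestamp, which YouTube turns into clickable links in descriptions. The line layouts and the helper names below are assumptions, not the repository's actual output format.

```python
from __future__ import annotations


def youtube_timestamp(seconds: float) -> str:
    """Format seconds as M:SS, or H:MM:SS from one hour on."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def speaker_segments(response: dict) -> list[tuple[int, float, str]]:
    """Group consecutive words by speaker into (speaker, start_seconds, text)."""
    words = response["results"]["channels"][0]["alternatives"][0]["words"]
    segments: list[tuple[int, float, list[str]]] = []
    for w in words:
        # punctuated_word is present when punctuation is enabled; fall back to word
        token = w.get("punctuated_word", w["word"])
        speaker = w.get("speaker", 0)
        if segments and segments[-1][0] == speaker:
            segments[-1][2].append(token)  # same speaker: extend the current segment
        else:
            segments.append((speaker, w["start"], [token]))  # speaker change: new segment
    return [(spk, start, " ".join(tokens)) for spk, start, tokens in segments]


def format_outputs(response: dict) -> tuple[str, str]:
    """Return hypothetical contents for _speakers.txt and _youtube.txt."""
    segments = speaker_segments(response)
    speakers_txt = "\n".join(f"Speaker {spk}: {text}" for spk, _, text in segments)
    youtube_txt = "\n".join(
        f"{youtube_timestamp(start)} Speaker {spk}: {text}" for spk, start, text in segments
    )
    return speakers_txt, youtube_txt
```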

2. Clean transcript

Merges consecutive lines from the same speaker into a cleaner format:

python clean_transcript.py <folder>

Example:

# Clean transcript for interview.wav (uses output/interview/)
python clean_transcript.py interview

Output: Creates output/<folder>/<folder>_clean.txt
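The merge step can be sketched as a single pass that appends each line's text to the previous entry while the speaker label stays the same. This sketch assumes `clean_transcript.py` reads `<folder>_speakers.txt` and that its lines look like `Speaker N: text`; both are assumptions, as are the helper names `merge_speaker_lines` and `clean_folder`.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed line format of <folder>_speakers.txt: "Speaker N: text"
LINE_RE = re.compile(r"^(Speaker \d+):\s*(.*)$")


def merge_speaker_lines(lines: list[str]) -> list[str]:
    """Join consecutive lines that carry the same 'Speaker N' label."""
    merged: list[list] = []  # [label or None, text] pairs
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # drop blank lines
        match = LINE_RE.match(line)
        if match is None:
            merged.append([None, line])  # keep unlabeled lines unchanged
            continue
        label, text = match.groups()
        if merged and merged[-1][0] == label:
            merged[-1][1] = f"{merged[-1][1]} {text}".strip()
        else:
            merged.append([label, text])
    return [text if label is None else f"{label}: {text}" for label, text in merged]


def clean_folder(folder: str, root: str = "output") -> Path:
    """Read <folder>_speakers.txt and write <folder>_clean.txt next to it."""
    base = Path(root) / folder
    source = base / f"{folder}_speakers.txt"
    target = base / f"{folder}_clean.txt"
    lines = source.read_text(encoding="utf-8").splitlines()
    target.write_text("\n".join(merge_speaker_lines(lines)) + "\n", encoding="utf-8")
    return target
```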

Language codes

en - English (default)
uk - Ukrainian
ru - Russian
de - German
fr - French
es - Spanish

Deepgram Real-time STT with Diarization

Real-time speech-to-text transcription with speaker diarization using Deepgram's WebSocket API.

Features

Real-time microphone transcription
Speaker diarization (identifies different speakers)
Color-coded speaker labels in terminal
Interim results (see words as you speak)
Automatic transcript saving
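The features above map onto query parameters of Deepgram's streaming endpoint (`wss://api.deepgram.com/v1/listen`): `diarize` for speaker labels and `interim_results` for partial hypotheses. The microphone audio format is described with `encoding`, `sample_rate` and `channels`. The sketch below only builds the URL and auth header. The 16 kHz mono 16-bit PCM format is an assumption about how `realtime_stt.py` captures audio, and the actual WebSocket connection is left out.

```python
from __future__ import annotations

from urllib.parse import urlencode

STREAM_URL = "wss://api.deepgram.com/v1/listen"


def build_stream_url(language: str = "en", sample_rate: int = 16000) -> str:
    """Hypothetical helper: streaming URL with diarization and interim results."""
    params = {
        "encoding": "linear16",       # raw 16-bit PCM (assumed capture format)
        "sample_rate": sample_rate,   # must match the microphone capture rate
        "channels": 1,                # mono input (assumed)
        "language": language,         # value of -l
        "diarize": "true",            # speaker label on each word
        "interim_results": "true",    # partial results while someone is speaking
        "punctuate": "true",
    }
    return f"{STREAM_URL}?{urlencode(params)}"


def auth_headers(api_key: str) -> dict[str, str]:
    """Header sent when opening the WebSocket connection."""
    return {"Authorization": f"Token {api_key}"}
```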

Setup

Install dependencies:

pip install -r requirements.txt

Create a .env file with your Deepgram API key:

DEEPGRAM_API_KEY=your_api_key_here

Or copy the existing one from the deepgram-stt folder:

cp ../deepgram-stt/.env .

Usage

Basic usage (English):

python realtime_stt.py

With a different language:

python realtime_stt.py -l ru    # Russian
python realtime_stt.py -l de    # German
python realtime_stt.py -l fr    # French

Without saving the transcript:

python realtime_stt.py --no-save

List available audio devices:

python realtime_stt.py --list-devices

Use a specific audio device:

python realtime_stt.py -d 2     # Use device ID 2
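The documented options can be mirrored with `argparse` as in the sketch below. Only `-l`, `-d`, `--no-save` and `--list-devices` come from this README; the long forms `--language` and `--device` and the help texts are assumptions. Device listing itself depends on whichever audio library the script uses, which is not named here, so it is left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented for realtime_stt.py."""
    p = argparse.ArgumentParser(description="Real-time Deepgram STT with diarization")
    p.add_argument("-l", "--language", default="en", help="Language code, e.g. en, ru, de")
    p.add_argument("-d", "--device", type=int, default=None, help="Input audio device ID")
    p.add_argument("--no-save", action="store_true", help="Do not write transcript files")
    p.add_argument("--list-devices", action="store_true", help="List audio devices and exit")
    return p
```

For example, `build_parser().parse_args(["-l", "ru", "-d", "2"])` gives `language="ru"` and `device=2`, with `no_save` left `False`.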

Output

Transcripts are saved to the output/ folder:

YYYYMMDD_HHMMSS_realtime.txt - Speaker-labeled transcript
YYYYMMDD_HHMMSS_realtime.json - Full metadata with timestamps
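The `YYYYMMDD_HHMMSS` prefix corresponds to `strftime("%Y%m%d_%H%M%S")`. A minimal sketch of writing both files follows. It assumes each final segment is a dict with `speaker`, `start`, `end` and `text` keys, which may differ from the real JSON layout; `save_transcript` is a hypothetical name.

```python
from __future__ import annotations

import json
from datetime import datetime
from pathlib import Path


def save_transcript(segments: list[dict], out_dir: str = "output",
                    now: datetime | None = None) -> tuple[Path, Path]:
    """Write <timestamp>_realtime.txt and <timestamp>_realtime.json."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")  # e.g. 20240102_030405
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    txt_path = out / f"{stamp}_realtime.txt"
    json_path = out / f"{stamp}_realtime.json"
    lines = [f"Speaker {s['speaker']}: {s['text']}" for s in segments]
    txt_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    # ensure_ascii=False keeps Cyrillic and other non-ASCII text readable in the file
    payload = {"created": now.isoformat(), "segments": segments}
    json_path.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
    return txt_path, json_path
```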

Controls

Ctrl+C - Stop recording and save transcript
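Ctrl+C raises `KeyboardInterrupt` in Python, so "stop and save" can be expressed as a `try`/`except`/`finally` around the receive loop. In the sketch below, `receive_next` and `on_stop` are hypothetical callables standing in for the WebSocket read step and the saving step; they are not names from the script.

```python
def run_until_interrupted(receive_next, on_stop) -> None:
    """Consume results until Ctrl+C, then always run on_stop (e.g. save).

    on_stop runs in `finally`, so it also runs if receive_next fails
    with another exception (which is then re-raised).
    """
    try:
        while True:
            receive_next()
    except KeyboardInterrupt:
        print("\nStopping...")
    finally:
        on_stop()
```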

Terminal Display

Each speaker is shown with a different color:

Speaker 0: Cyan
Speaker 1: Green
Speaker 2: Yellow
Speaker 3: Magenta
etc.
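These colors correspond to the standard ANSI foreground codes 36 (cyan), 32 (green), 33 (yellow) and 35 (magenta). The sketch below assumes that "etc." means the palette cycles once there are more speakers than colors; the script may instead use a longer palette.

```python
# ANSI foreground color codes in the order listed above
SPEAKER_COLORS = ["\033[36m", "\033[32m", "\033[33m", "\033[35m"]  # cyan, green, yellow, magenta
RESET = "\033[0m"


def color_for_speaker(speaker: int) -> str:
    """Return the color for a speaker ID, cycling through the palette (assumed)."""
    return SPEAKER_COLORS[speaker % len(SPEAKER_COLORS)]


def colored_label(speaker: int, text: str) -> str:
    """Color only the 'Speaker N:' label, leaving the text in the default color."""
    return f"{color_for_speaker(speaker)}Speaker {speaker}:{RESET} {text}"
```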

Interim results are shown with a ... suffix and are updated in place. Final results are printed on a new line.
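Deepgram streaming messages of type `Results` carry an `is_final` flag and the text under `channel.alternatives[0]`. One way to get the in-place behavior is to start every write with `\r` (return to column 0) and end it with `\033[K` (erase the rest of the line), adding a newline only for final results. The sketch below returns the string to write instead of printing it. Labeling a whole line by its first word's speaker is a simplification assumed here, and `render_message` is a hypothetical name.

```python
from __future__ import annotations

import json


def render_message(raw: str) -> tuple[str, bool] | None:
    """Turn one streaming message into (text_to_write, is_final), or None."""
    msg = json.loads(raw)
    if msg.get("type") != "Results":
        return None  # ignore metadata and other message types
    alt = msg["channel"]["alternatives"][0]
    transcript = alt.get("transcript", "").strip()
    if not transcript:
        return None  # nothing recognized yet
    words = alt.get("words", [])
    speaker = words[0].get("speaker", 0) if words else 0
    line = f"Speaker {speaker}: {transcript}"
    if msg.get("is_final"):
        # Overwrite the interim line, then move to a new line
        return f"\r{line}\033[K\n", True
    # Interim: '...' suffix, stay on the same line so the next update overwrites it
    return f"\r{line}...\033[K", False
```

Writing the returned text with `sys.stdout.write(...)` followed by `sys.stdout.flush()` keeps interim updates visible immediately.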
