format37/deepgram-stt

deepgram-stt

Speech-to-text using the Deepgram API, with speaker diarization.

Installation

pip install -r requirements.txt

Configuration

Create a .env file with your Deepgram API key:

DEEPGRAM_API_KEY=your_api_key_here
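
For illustration, here is a minimal standard-library sketch of how a script could read the key from .env. The helper names parse_env_text and load_api_key are hypothetical; the scripts in this repository may well use a package such as python-dotenv instead.

```python
import os
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_api_key(path: str = ".env") -> str:
    """Return DEEPGRAM_API_KEY from the environment, falling back to .env."""
    env_file = Path(path)
    file_values = (
        parse_env_text(env_file.read_text(encoding="utf-8"))
        if env_file.exists()
        else {}
    )
    key = os.environ.get("DEEPGRAM_API_KEY") or file_values.get("DEEPGRAM_API_KEY", "")
    # Treat the placeholder from the example above as "not configured".
    if not key or key == "your_api_key_here":
        raise RuntimeError("Set DEEPGRAM_API_KEY in .env or in the environment")
    return key
```

In this sketch a variable already exported in the shell takes precedence over the file, which makes it easy to override the key for a single run.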

Usage

1. Transcribe audio

python speech_to_text.py <audio_file> [-l <language>]

Examples:

# Transcribe English audio (default)
python speech_to_text.py interview.wav

# Transcribe Russian audio
python speech_to_text.py podcast.mp3 -l ru

Output: Creates an output/<filename>/ folder containing:

  • <filename>.json - Raw API response
  • <filename>_speakers.txt - Speaker-labeled transcript
  • <filename>_youtube.txt - YouTube timestamp format
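
To show how the raw JSON relates to the speaker-labeled transcript, the sketch below groups consecutive words by speaker. It assumes the response follows Deepgram's pre-recorded layout with diarization enabled (results.channels[0].alternatives[0].words, each word carrying a speaker index); the function name speaker_lines and the "Speaker N:" line format are illustrative assumptions, not necessarily what speech_to_text.py does.

```python
from itertools import groupby


def speaker_lines(response: dict) -> list[str]:
    """Turn a diarized word list into one 'Speaker N: text' line per turn."""
    # Assumed layout of the raw API response.
    words = response["results"]["channels"][0]["alternatives"][0]["words"]
    lines = []
    # groupby only groups adjacent items, so each group is one speaker turn.
    for speaker, group in groupby(words, key=lambda w: w.get("speaker", 0)):
        # Prefer the punctuated form when the response includes it.
        text = " ".join(w.get("punctuated_word", w["word"]) for w in group)
        lines.append(f"Speaker {speaker}: {text}")
    return lines
```

Calling speaker_lines on the parsed contents of output/<filename>/<filename>.json would yield one line per speaker turn.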

2. Clean transcript

Merge consecutive lines from the same speaker into a cleaner format:

python clean_transcript.py <folder>

Example:

# Clean transcript for interview.wav (uses output/interview/)
python clean_transcript.py interview

Output: Creates output/<folder>/<folder>_clean.txt
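
The merge step can be sketched as follows. This is an illustrative version that assumes each transcript line looks like "Speaker N: text"; the real format of <filename>_speakers.txt and the logic in clean_transcript.py may differ, and merge_same_speaker is a hypothetical name.

```python
import re

# Assumed line format: "Speaker 0: some text"
LINE_RE = re.compile(r"^(Speaker \d+):\s*(.*)$")


def merge_same_speaker(lines: list[str]) -> list[str]:
    """Join consecutive lines that carry the same speaker label."""
    merged: list[list[str]] = []  # each entry is [label, text]
    for line in lines:
        stripped = line.strip()
        match = LINE_RE.match(stripped)
        if not match:
            # Unlabeled text is treated as a continuation of the previous turn.
            if merged and stripped:
                merged[-1][1] += " " + stripped
            continue
        label, text = match.groups()
        if merged and merged[-1][0] == label:
            merged[-1][1] += " " + text
        else:
            merged.append([label, text])
    return [f"{label}: {text}" for label, text in merged]
```

Only adjacent lines are merged, so a speaker who returns after someone else has spoken starts a new line.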

Language codes

  • en - English (default)
  • uk - Ukrainian
  • ru - Russian
  • de - German
  • fr - French
  • es - Spanish
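
As a rough sketch of how the command line above could be parsed, the snippet below uses argparse with the codes from this list. The long option --language and the restriction to these six codes are assumptions made for the example; the actual script may accept any language code that Deepgram supports.

```python
import argparse

# Codes listed in this README; not necessarily the full set the script accepts.
LANGUAGES = {
    "en": "English",
    "uk": "Ukrainian",
    "ru": "Russian",
    "de": "German",
    "fr": "French",
    "es": "Spanish",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="speech_to_text.py",
        description="Transcribe an audio file with speaker diarization.",
    )
    parser.add_argument("audio_file", help="path to the audio file")
    parser.add_argument(
        "-l",
        "--language",  # long form is an assumption
        default="en",
        choices=sorted(LANGUAGES),
        help="language code (default: en)",
    )
    return parser
```

With this parser, omitting -l falls back to English, which matches the default described above.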

Deepgram Real-time STT with Diarization

Real-time speech-to-text transcription with speaker diarization using Deepgram's WebSocket API.

Features

  • Real-time microphone transcription
  • Speaker diarization (identifies different speakers)
  • Color-coded speaker labels in terminal
  • Interim results (see words as you speak)
  • Automatic transcript saving

Setup

  1. Install dependencies:
pip install -r requirements.txt
  2. Create a .env file with your Deepgram API key:
DEEPGRAM_API_KEY=your_api_key_here

Or copy from the deepgram-stt folder:

cp ../deepgram-stt/.env .

Usage

Basic usage (English):

python realtime_stt.py

With a different language:

python realtime_stt.py -l ru    # Russian
python realtime_stt.py -l de    # German
python realtime_stt.py -l fr    # French

Without saving transcript:

python realtime_stt.py --no-save

List available audio devices:

python realtime_stt.py --list-devices

Use a specific audio device:

python realtime_stt.py -d 2     # Use device ID 2

Output

Transcripts are saved to the output/ folder:

  • YYYYMMDD_HHMMSS_realtime.txt - Speaker-labeled transcript
  • YYYYMMDD_HHMMSS_realtime.json - Full metadata with timestamps
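
The naming pattern above can be reproduced with datetime.strftime. The helper below is a hypothetical sketch, not code taken from realtime_stt.py.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def realtime_output_paths(
    now: Optional[datetime] = None, output_dir: str = "output"
) -> tuple[Path, Path]:
    """Return (txt_path, json_path) named YYYYMMDD_HHMMSS_realtime.*"""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    base = Path(output_dir)
    return base / f"{stamp}_realtime.txt", base / f"{stamp}_realtime.json"
```

Because the timestamp runs from largest to smallest unit, sorting the files by name also sorts them chronologically.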

Controls

  • Ctrl+C - Stop recording and save transcript

Terminal Display

Each speaker is shown with a different color:

  • Speaker 0: Cyan
  • Speaker 1: Green
  • Speaker 2: Yellow
  • Speaker 3: Magenta
  • etc.

Interim results are shown with ... suffix and update in place. Final results are printed on a new line.
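
One way to get this behavior in a terminal is with ANSI escape codes and a carriage return. The sketch below is illustrative only: the function names are hypothetical, and cycling back to cyan after the fourth speaker is an assumption about what "etc." means.

```python
import sys

# ANSI foreground colors: cyan, green, yellow, magenta.
COLORS = ["\033[36m", "\033[32m", "\033[33m", "\033[35m"]
RESET = "\033[0m"
CLEAR_LINE = "\r\033[K"  # return to column 0 and erase to end of line


def speaker_label(speaker: int) -> str:
    """Colored 'Speaker N:' label; colors repeat after the fourth speaker."""
    color = COLORS[speaker % len(COLORS)]
    return f"{color}Speaker {speaker}:{RESET}"


def render(speaker: int, text: str, is_final: bool) -> str:
    """Build the string to write for an interim or final result."""
    line = f"{CLEAR_LINE}{speaker_label(speaker)} {text}"
    # Interim results stay on the same line; final results end it.
    return line + ("\n" if is_final else "...")


def show(speaker: int, text: str, is_final: bool) -> None:
    sys.stdout.write(render(speaker, text, is_final))
    sys.stdout.flush()
```

Each interim call overwrites the previous interim text, and the final call replaces it once more before moving to the next line.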
