eszpee/transcribe

Transcribe

A privacy-focused audio transcription toolkit that converts video recordings into searchable, summarized notes in Obsidian. Runs entirely locally with optional cloud summarization.

What It Does

Video Recording → Audio Extraction → Transcription → AI Summary → Obsidian Note
    (.mov)           (.mp3)            (.md)         (Claude/   (Person note
                                                      Ollama)    updated)

Key Features:

  • Local transcription - Uses faster-whisper (large-v3 model) for high-quality, offline speech-to-text
  • Multi-track support - Handles separate mic/system audio tracks from screen recordings
  • Speaker identification - Merges tracks into a speaker-tagged conversation
  • AI summarization - Generates structured summaries using Claude API or local Ollama
  • Obsidian integration - Automatically updates person notes with dated summaries
  • Hungarian language - Optimized for Hungarian transcription and summaries
  • Fully airgapped option - Can run 100% offline with Ollama

Quick Start

# Process all new recordings interactively
./process_calls.js

Or step by step:

./1_video2audio.sh recording.mov      # Extract audio tracks
node 2_audio2text.js recording        # Transcribe to text
node 3_text2notes.js recording.md     # Summarize and add to Obsidian

Installation

Prerequisites

  • macOS (tested on Apple Silicon)
  • Node.js 18+
  • Python 3
  • ffmpeg (brew install ffmpeg)

Setup

  1. Clone and install dependencies:

    git clone <repo-url>
    cd transcribe
    npm install
    pip3 install faster-whisper
  2. Configure environment:

    cp .env.example .env

    Edit .env with your settings (see Configuration below).

  3. First run - The Whisper model (~3GB) downloads automatically on first transcription.

Configuration

Edit .env to configure the toolkit:

LLM Backend (for summarization)

Choose between cloud or local summarization:

# Option 1: Claude API (default, requires internet)
LLM_BACKEND=claude
ANTHROPIC_API_KEY=sk-ant-...

# Option 2: Ollama (local, airgapped)
LLM_BACKEND=ollama
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=qwen2.5:32b

Ollama setup (for airgapped operation):

brew install ollama
ollama serve               # Start the server (runs in foreground)
# In a new terminal:
ollama pull qwen2.5:32b    # Downloads ~20GB, requires ~20GB RAM to run

For better quality with more RAM: ollama pull qwen2.5:72b (~40GB+ RAM)

Note: ollama serve must be running whenever you use the Ollama backend. Alternatively, install the Ollama app from ollama.ai, which runs automatically in the menu bar.

Obsidian Integration

OBSIDIAN_VAULT_PATH=/path/to/your/vault/Call - meet notes
TRANSCRIPTS_FOLDER=Transcripts

Customizing Prompts

Edit prompts.yaml to customize summarization scenarios:

  • leadership_mentoring - Engineering leadership discussions
  • first_meeting - Initial introductory calls
  • technical_discussion - Architecture and tech decisions
  • career_planning - Career development sessions
  • general - Catch-all for other conversations
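
Each scenario pairs a label with the prompt text the summarizer sends to the LLM. A hypothetical entry might look like this (the field names below are illustrative only; check prompts.yaml for the file's actual schema):

```yaml
# Illustrative shape only -- prompts.yaml defines the real structure.
leadership_mentoring:
  description: Engineering leadership discussions
  prompt: |
    Summarize this mentoring conversation in Hungarian.
    Focus on decisions, action items, and coaching themes.
```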

Usage

Automated Pipeline (Recommended)

./process_calls.js

Scans recordings/ for .mov files and processes them interactively:

  1. Asks which files to process
  2. Runs the full pipeline
  3. Moves completed files to Trash
  4. Shows summary at the end
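
The scan step can be sketched roughly like this (a simplified illustration, assuming a recording counts as processed once a matching .md transcript exists; process_calls.js's actual logic may differ):

```javascript
// Sketch: given the file names found in recordings/, pick the .mov files
// that don't yet have a merged transcript (.md with the same base name).
function pickUnprocessed(fileNames) {
  const transcripts = new Set(
    fileNames
      .filter((f) => f.endsWith(".md"))
      .map((f) => f.slice(0, -".md".length))
  );
  return fileNames
    .filter((f) => f.endsWith(".mov"))
    .filter((f) => !transcripts.has(f.slice(0, -".mov".length)));
}

console.log(pickUnprocessed(["a.mov", "a.md", "b.mov"])); // → [ 'b.mov' ]
```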

Individual Scripts

Extract audio from video:

./1_video2audio.sh recording.mov
# Creates: recording-mic.mp3, recording-mac.mp3

Transcribe audio:

node 2_audio2text.js recording
# Creates: recording.json, recording.md (merged transcript)

Summarize and add to Obsidian:

node 3_text2notes.js recordings/recording.md
# Interactive: select person, scenario, descriptor
# Updates person note and moves transcript to vault

Options

All scripts support:

  • -v, --verbose - Show detailed output
  • -f, --force - Overwrite existing files without prompting
  • -h, --help - Show help

How It Works

1. Video to Audio (1_video2audio.sh)

Extracts audio tracks from video files using ffmpeg:

  • First track → -mic.mp3 (usually your microphone)
  • Second track → -mac.mp3 (system audio / other participant)
  • Additional tracks → -mac-2.mp3, -mac-3.mp3, etc.
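
The naming scheme above can be sketched as follows (illustrative only; the actual mapping lives in 1_video2audio.sh as shell/ffmpeg invocations):

```javascript
// Sketch of the track-index-to-filename mapping described above.
function trackFileName(base, trackIndex) {
  if (trackIndex === 0) return `${base}-mic.mp3`; // first track: your microphone
  if (trackIndex === 1) return `${base}-mac.mp3`; // second track: system audio
  return `${base}-mac-${trackIndex}.mp3`;         // extras: -mac-2, -mac-3, ...
}

console.log(trackFileName("recording", 2)); // → recording-mac-2.mp3
```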

2. Audio to Text (2_audio2text.js)

Transcribes using faster-whisper with the large-v3 model:

  • Runs locally on CPU (no GPU required, but slower)
  • Processing takes ~2-3× the audio length on Apple Silicon (1 hour of audio ≈ 2-3 hours)
  • Anti-hallucination parameters for reliable output
  • Prompts for speaker names to label the conversation
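
Since each track belongs to one known speaker, merging them into a speaker-tagged conversation is just an interleave by timestamp. A minimal sketch (the field names are illustrative, not the actual JSON schema 2_audio2text.js produces):

```javascript
// Sketch: merge per-track transcript segments into one speaker-tagged
// conversation, ordered by segment start time.
function mergeTracks(tracks) {
  // tracks: [{ speaker: "Anna", segments: [{ start, text }, ...] }, ...]
  return tracks
    .flatMap(({ speaker, segments }) =>
      segments.map((s) => ({ ...s, speaker }))
    )
    .sort((a, b) => a.start - b.start)
    .map((s) => `**${s.speaker}:** ${s.text}`)
    .join("\n");
}
```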

3. Text to Notes (3_text2notes.js)

Summarizes transcripts and integrates with Obsidian:

  • Generates context-aware summaries based on conversation type
  • Creates/updates person notes with dated sections
  • Links to full transcript for reference
  • All output in Hungarian
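
The dated-section update can be sketched like this (the heading format and function are illustrative; 3_text2notes.js defines the real note layout):

```javascript
// Sketch: append a dated summary section to a person note, creating the
// note body from scratch if it doesn't exist yet.
function addDatedSection(noteBody, date, descriptor, summary) {
  const section = `## ${date} - ${descriptor}\n\n${summary}\n`;
  return noteBody ? `${noteBody.trimEnd()}\n\n${section}` : section;
}

console.log(addDatedSection("# Anna\n", "2024-05-01", "mentoring", "Summary here."));
```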

Recording with OBS Studio

The toolkit expects MOV files with two separate audio tracks for speaker separation:

  • Track 1: Your microphone (you)
  • Track 2: System/desktop audio (other participants on the call)

This simple setup provides effective speaker identification without complex diarization - each track becomes a separate speaker in the transcript.

OBS Configuration

OBS Studio is recommended for recording. An example profile is provided in examples/obs-profile-basic.ini.

Quick setup:

  1. Install the profile:

    • Copy examples/obs-profile-basic.ini to your OBS profiles folder:
      • macOS: ~/Library/Application Support/obs-studio/basic/profiles/CallRecording/basic.ini
    • Edit the file and update FilePath to point to your recordings/ folder
  2. Configure audio routing in OBS:

    • Go to Settings > Audio
    • Set "Mic/Auxiliary Audio" to your microphone
    • Set "Desktop Audio" to capture system sound
  3. Assign tracks to sources:

    • Edit > Advanced Audio Properties
    • Your mic: Enable Track 1 only
    • Desktop Audio: Enable Track 2 only
  4. Add a video source (required by OBS, but we only use the audio):

    • Add a Display Capture or Window Capture source
    • The example profile uses low video settings (720p, 30fps, 2Mbps) to minimize file size

Profile Optimizations

The example profile is optimized for this audio-focused workflow:

| Setting       | Value        | Why                                  |
| ------------- | ------------ | ------------------------------------ |
| Resolution    | 720p         | Minimum needed; video is incidental  |
| Frame rate    | 30 fps       | Sufficient for screen content        |
| Video bitrate | 2 Mbps       | Low since video isn't used           |
| Audio bitrate | 160 kbps     | Good quality for voice               |
| Format        | MOV (hybrid) | Supports multiple audio tracks       |
| Audio tracks  | 1 + 2        | Mic and system audio separated       |

A 1-hour call produces roughly 1-1.5 GB with these settings (mostly video). The extracted audio is ~15 MB per track.
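
A quick back-of-the-envelope check of that estimate from the profile's bitrates (assuming both audio tracks record at 160 kbps):

```javascript
// Rough size estimate for a 1-hour recording with the example profile:
// 2 Mbps video plus two 160 kbps audio tracks.
const seconds = 3600;
const videoBits = 2_000_000 * seconds;      // 2 Mbps video
const audioBits = 2 * 160_000 * seconds;    // two 160 kbps audio tracks
const totalGB = (videoBits + audioBits) / 8 / 1e9;
console.log(totalGB.toFixed(2)); // ≈ 1.04 GB, dominated by the video stream
```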

Directory Structure

transcribe/
├── recordings/           # Input/output directory (gitignored)
│   ├── *.mov            # Input video files
│   ├── *-mic.mp3        # Extracted mic audio
│   ├── *-mac.mp3        # Extracted system audio
│   ├── *.json           # Transcription data
│   └── *.md             # Merged transcript
├── examples/
│   └── obs-profile-basic.ini  # OBS Studio profile for recording
├── 1_video2audio.sh     # Audio extraction script
├── 2_audio2text.js      # Transcription script
├── 2_transcribe.py      # Python transcription engine
├── 3_text2notes.js      # Summarization script
├── process_calls.js     # Pipeline orchestrator
├── prompts.yaml         # Summarization prompts
├── .env.example         # Environment template
└── .env                 # Your configuration (gitignored)

Performance

On Apple Silicon M-series with large-v3 model:

  • Transcription: ~2-3× the audio length (1 hour of audio takes 2-3 hours)
  • Summarization: ~10-30 seconds with Claude, longer with local Ollama

For faster transcription, you can switch to medium or small models in 2_audio2text.js, though with reduced accuracy.

Troubleshooting

"Cannot connect to Ollama"

  • Start the server: ollama serve
  • Check it's running: curl http://localhost:11434/api/tags

"Model not found"

  • Pull the model first: ollama pull qwen2.5:32b

Transcription is slow

  • This is expected with large-v3 on CPU
  • Consider using a smaller model for drafts

ffmpeg not found

  • Install with: brew install ffmpeg

License

MIT
