A privacy-focused audio transcription toolkit that converts video recordings into searchable, summarized notes in Obsidian. Runs entirely locally with optional cloud summarization.
```
Video Recording → Audio Extraction → Transcription → AI Summary → Obsidian Note
     (.mov)            (.mp3)            (.md)      (Claude/Ollama) (Person note updated)
```
Key Features:
- Local transcription - Uses faster-whisper (large-v3 model) for high-quality, offline speech-to-text
- Multi-track support - Handles separate mic/system audio tracks from screen recordings
- Speaker identification - Merges tracks into a speaker-tagged conversation
- AI summarization - Generates structured summaries using Claude API or local Ollama
- Obsidian integration - Automatically updates person notes with dated summaries
- Hungarian language - Optimized for Hungarian transcription and summaries
- Fully airgapped option - Can run 100% offline with Ollama
```bash
# Process all new recordings interactively
./process_calls.js
```

Or step by step:

```bash
./1_video2audio.sh recording.mov     # Extract audio tracks
node 2_audio2text.js recording       # Transcribe to text
node 3_text2notes.js recording.md    # Summarize and add to Obsidian
```

Requirements:

- macOS (tested on Apple Silicon)
- Node.js 18+
- Python 3
- ffmpeg (`brew install ffmpeg`)
Installation:

1. Clone and install dependencies:

   ```bash
   git clone <repo-url>
   cd transcribe
   npm install
   pip3 install faster-whisper
   ```

2. Configure environment:

   ```bash
   cp .env.example .env
   ```

   Edit `.env` with your settings (see Configuration below).

3. First run: the Whisper model (~3GB) downloads automatically on the first transcription.
Edit `.env` to configure the toolkit:
Choose between cloud or local summarization:
```bash
# Option 1: Claude API (default, requires internet)
LLM_BACKEND=claude
ANTHROPIC_API_KEY=sk-ant-...

# Option 2: Ollama (local, airgapped)
LLM_BACKEND=ollama
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=qwen2.5:32b
```

Ollama setup (for airgapped operation):
```bash
brew install ollama
ollama serve                 # Start the server (runs in foreground)

# In a new terminal:
ollama pull qwen2.5:32b      # Downloads ~20GB, requires ~20GB RAM to run
```

For better quality with more RAM: `ollama pull qwen2.5:72b` (~40GB+ RAM).
Note: `ollama serve` must be running whenever you use the Ollama backend. Alternatively, install the Ollama app from ollama.ai, which runs automatically in the menu bar.
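Before summarizing a long transcript, it can help to confirm that the server is up and the configured model is actually pulled. Below is a minimal sketch using Ollama's `GET /api/tags` endpoint (the same one the troubleshooting section curls); `checkOllama` is an illustrative helper, not part of the toolkit:

```javascript
// Sketch: verify the configured Ollama server is reachable and the model is pulled.
// checkOllama is a hypothetical helper; env var names match .env above.
async function checkOllama(
  host = process.env.OLLAMA_HOST ?? 'http://localhost:11434',
  model = process.env.OLLAMA_MODEL ?? 'qwen2.5:32b'
) {
  try {
    const res = await fetch(`${host}/api/tags`); // global fetch requires Node 18+
    if (!res.ok) return { ok: false, reason: `HTTP ${res.status}` };
    const { models = [] } = await res.json();
    return models.some(m => m.name === model)
      ? { ok: true }
      : { ok: false, reason: `model ${model} not pulled` };
  } catch {
    return { ok: false, reason: 'server not reachable (is `ollama serve` running?)' };
  }
}
```

A pipeline script could call this once at startup and print the `reason` instead of failing mid-run.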
```bash
OBSIDIAN_VAULT_PATH=/path/to/your/vault/Call - meet notes
TRANSCRIPTS_FOLDER=Transcripts
```

Edit `prompts.yaml` to customize summarization scenarios:
- `leadership_mentoring` - Engineering leadership discussions
- `first_meeting` - Initial introductory calls
- `technical_discussion` - Architecture and tech decisions
- `career_planning` - Career development sessions
- `general` - Catch-all for other conversations
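Each scenario pairs a key with the prompt text used for that conversation type. The exact schema of `prompts.yaml` isn't shown here, so the entry below is only a hypothetical illustration of the idea:

```yaml
# Hypothetical entry; field names are illustrative - check prompts.yaml for the real schema.
technical_discussion:
  description: Architecture and tech decisions
  prompt: |
    Summarize the conversation in Hungarian. Focus on the
    architectural options discussed, the decision reached,
    and any open follow-up questions.
```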
```bash
./process_calls.js
```

Scans `recordings/` for `.mov` files and processes them interactively:
- Asks which files to process
- Runs the full pipeline
- Moves completed files to Trash
- Shows summary at the end
Extract audio from video:

```bash
./1_video2audio.sh recording.mov
# Creates: recording-mic.mp3, recording-mac.mp3
```

Transcribe audio:

```bash
node 2_audio2text.js recording
# Creates: recording.json, recording.md (merged transcript)
```

Summarize and add to Obsidian:

```bash
node 3_text2notes.js recordings/recording.md
# Interactive: select person, scenario, descriptor
# Updates person note and moves transcript to vault
```

All scripts support:
- `-v, --verbose` - Show detailed output
- `-f, --force` - Overwrite existing files without prompting
- `-h, --help` - Show help
`1_video2audio.sh` extracts audio tracks from video files using ffmpeg:

- First track → `-mic.mp3` (usually your microphone)
- Second track → `-mac.mp3` (system audio / other participant)
- Additional tracks → `-mac-2.mp3`, `-mac-3.mp3`, etc.
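Because this track-to-filename mapping is deterministic, downstream scripts can predict output names from the input file alone. A sketch of that rule (a hypothetical helper, not the script's actual code):

```javascript
// Sketch of the track → filename rule described above (hypothetical helper).
function audioName(videoFile, trackIndex) {
  const base = videoFile.replace(/\.mov$/i, ''); // strip the .mov extension
  if (trackIndex === 0) return `${base}-mic.mp3`;    // first track: microphone
  if (trackIndex === 1) return `${base}-mac.mp3`;    // second track: system audio
  return `${base}-mac-${trackIndex}.mp3`;           // extras: -mac-2, -mac-3, ...
}
```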
`2_audio2text.js` transcribes using faster-whisper with the large-v3 model:
- Runs locally on CPU (no GPU required, though slower than GPU)
- Processing takes roughly 2-3x the audio's duration on Apple Silicon (1 hour of audio ≈ 2-3 hours)
- Anti-hallucination parameters for reliable output
- Prompts for speaker names to label the conversation
`3_text2notes.js` summarizes transcripts and integrates with Obsidian:
- Generates context-aware summaries based on conversation type
- Creates/updates person notes with dated sections
- Links to full transcript for reference
- All output in Hungarian
The toolkit expects MOV files with two separate audio tracks for speaker separation:
- Track 1: Your microphone (you)
- Track 2: System/desktop audio (other participants on the call)
This simple setup provides effective speaker identification without complex diarization - each track becomes a separate speaker in the transcript.
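The per-track approach means no diarization model is needed: tag every segment of a track with that track's speaker, then interleave the segments by start time. A sketch of the idea, assuming segments of the form `{ start, text }` (the toolkit's real segment format may differ):

```javascript
// Sketch: merge two tracks' segments into one speaker-tagged transcript.
// The { start, text } segment shape is an assumption for illustration.
function mergeTracks(micSegments, macSegments, names = { mic: 'Me', mac: 'Them' }) {
  const tagged = [
    ...micSegments.map(s => ({ ...s, speaker: names.mic })),
    ...macSegments.map(s => ({ ...s, speaker: names.mac })),
  ];
  tagged.sort((a, b) => a.start - b.start); // interleave by start time
  return tagged.map(s => `${s.speaker}: ${s.text}`).join('\n');
}
```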
OBS Studio is recommended for recording. An example profile is provided in `examples/obs-profile-basic.ini`.
Quick setup:
1. Install the profile:
   - Copy `examples/obs-profile-basic.ini` to your OBS profiles folder:
     - macOS: `~/Library/Application Support/obs-studio/basic/profiles/CallRecording/basic.ini`
   - Edit the file and update `FilePath` to point to your `recordings/` folder
2. Configure audio routing in OBS:
   - Go to Settings > Audio
   - Set "Mic/Auxiliary Audio" to your microphone
   - Set "Desktop Audio" to capture system sound
3. Assign tracks to sources:
   - Edit > Advanced Audio Properties
   - Your mic: Enable Track 1 only
   - Desktop Audio: Enable Track 2 only
4. Add a video source (required by OBS, but we only use the audio):
   - Add a Display Capture or Window Capture source
   - The example profile uses low video settings (720p, 30fps, 2Mbps) to minimize file size
The example profile is optimized for this audio-focused workflow:
| Setting | Value | Why |
|---|---|---|
| Resolution | 720p | Minimum needed; video is incidental |
| Frame rate | 30 fps | Sufficient for screen content |
| Video bitrate | 2 Mbps | Low since video isn't used |
| Audio bitrate | 160 kbps | Good quality for voice |
| Format | MOV (hybrid) | Supports multiple audio tracks |
| Audio tracks | 1 + 2 | Mic and system audio separated |
A 1-hour call produces roughly 1-1.5 GB with these settings (mostly video). The extracted audio is ~15 MB per track.
```
transcribe/
├── recordings/               # Input/output directory (gitignored)
│   ├── *.mov                 # Input video files
│   ├── *-mic.mp3             # Extracted mic audio
│   ├── *-mac.mp3             # Extracted system audio
│   ├── *.json                # Transcription data
│   └── *.md                  # Merged transcript
├── examples/
│   └── obs-profile-basic.ini # OBS Studio profile for recording
├── 1_video2audio.sh          # Audio extraction script
├── 2_audio2text.js           # Transcription script
├── 2_transcribe.py           # Python transcription engine
├── 3_text2notes.js           # Summarization script
├── process_calls.js          # Pipeline orchestrator
├── prompts.yaml              # Summarization prompts
├── .env.example              # Environment template
└── .env                      # Your configuration (gitignored)
```
On Apple Silicon M-series with large-v3 model:
- Transcription: roughly 2-3x the audio's duration (1 hour of audio takes 2-3 hours)
- Summarization: ~10-30 seconds with Claude, longer with local Ollama
For faster transcription, you can switch to the medium or small model in `2_audio2text.js`, at some cost in accuracy.
"Cannot connect to Ollama"

- Start the server: `ollama serve`
- Check it's running: `curl http://localhost:11434/api/tags`

"Model not found"

- Pull the model first: `ollama pull qwen2.5:32b`

Transcription is slow

- This is expected with large-v3 on CPU
- Consider using a smaller model for drafts

ffmpeg not found

- Install with: `brew install ffmpeg`
MIT