Local audio/video transcription with speaker diarization. No API keys. No cloud. One command.
npm (Node.js API + CLI):

```shell
npm install transcribe-cli
```

Shell (standalone CLI):

```shell
curl -sSL https://raw.githubusercontent.com/robit-man/transcribe-cli/main/install.sh | bash
```

Then:

```shell
transcribe audio.mp3
transcribe meeting.wav --model medium --diarize --format json
transcribe batch ./recordings --recursive --format srt
```

- 100% Local — Runs on your machine via faster-whisper (CTranslate2). No API keys, no cloud, no data leaves your system.
- Speaker Diarization — Identify who said what with `--diarize` (via pyannote.audio)
- Word-Level Timestamps — Precise per-word timing with `--word-timestamps`
- Live Audio Streaming — Real-time transcription via Node.js streams
- 4 Output Formats — `txt`, `srt` (with speaker labels), `vtt` (with W3C voice tags), `json` (full metadata)
- Audio + Video — MP3, WAV, FLAC, AAC, M4A, OGG, WMA, MP4, MKV, AVI, MOV, WebM, FLV
- Batch Processing — Process entire directories with configurable concurrency
- 5 Model Sizes — `tiny`, `base`, `small`, `medium`, `large-v3` (auto-downloads on first use)
- Auto Audio Extraction — Videos are automatically handled via FFmpeg
- Dual Interface — Use as a CLI tool or Node.js API with full TypeScript types
- Cross-Platform — Linux and macOS
Requirements:

- Python 3.9+
- FFmpeg 4.0+
- ~1 GB disk (for the `base` model; `large-v3` needs ~3 GB)
The install script handles all dependencies automatically.
```shell
curl -sSL https://raw.githubusercontent.com/robit-man/transcribe-cli/main/install.sh | bash
```

This will:

- Install system dependencies (Python, FFmpeg, git) if missing
- Clone the repository to `~/.local/share/transcribe-cli`
- Create a Python virtual environment with all packages
- Pre-download the default Whisper model (`base`)
- Create `transcribe` and `transcribe-cli` commands in `~/.local/bin`
- Add `~/.local/bin` to your PATH if needed

Environment variables (optional):

- `TRANSCRIBE_INSTALL_DIR` — Custom install location (default: `~/.local/share/transcribe-cli`)
- `TRANSCRIBE_MODEL` — Model to pre-download (default: `base`)
```shell
git clone https://github.com/robit-man/transcribe-cli.git
cd transcribe-cli
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
```

For speaker diarization support:

```shell
pip install -e ".[diarization]"
```

Basic transcription:

```shell
transcribe audio.mp3
transcribe video.mkv --format srt
transcribe recording.wav --output-dir ./transcripts
transcribe lecture.mp3 --model medium --language en
```

Speaker diarization:

```shell
transcribe meeting.wav --diarize --format srt
transcribe interview.mp3 --diarize --format json
transcribe podcast.mp3 --diarize --word-timestamps --format vtt
```

Batch processing:

```shell
transcribe batch ./recordings
transcribe batch ./videos --format srt --concurrency 3
transcribe batch ./media --recursive --dry-run
transcribe batch ./meetings --model medium --diarize --format json
```

Audio extraction:

```shell
transcribe extract video.mkv
transcribe extract video.mp4 --output audio.mp3
transcribe extract video.avi --format wav
```

Configuration:

```shell
transcribe config --show       # Show current settings
transcribe config --init       # Create transcribe.toml in current directory
transcribe config --locations  # Show config file search paths
```

Verify the installation:

```shell
transcribe setup --check
```

Or install the npm package (Node.js API + CLI):

```shell
npm install transcribe-cli
```

On install, the package automatically:
- Creates a Python virtual environment
- Installs faster-whisper and all Python dependencies
- Downloads the default Whisper model (`base`)
Set `TRANSCRIBE_VERBOSE=1` to see setup progress; set `TRANSCRIBE_MODEL=medium` to pre-download a different model.
```javascript
const { transcribe, transcribeBatch, shutdownBridge } = require('transcribe-cli');

// Transcribe a single file
const result = await transcribe('meeting.mp3', {
  model: 'base',        // tiny, base, small, medium, large-v3
  diarize: true,        // speaker identification
  wordTimestamps: true, // per-word timing
  format: 'json',       // txt, srt, vtt, json
  language: 'auto',     // or 'en', 'es', etc.
});

console.log(result.text);
console.log(result.speakers); // ['SPEAKER_00', 'SPEAKER_01']
console.log(result.segments); // [{id, start, end, text, speaker, words}]

// Batch transcribe a directory
const batch = await transcribeBatch('./recordings', {
  recursive: true,
  concurrency: 3,
  format: 'srt',
});
console.log(`${batch.successful}/${batch.totalFiles} files transcribed`);

// Clean up when done
await shutdownBridge();
```

Live streaming:

```javascript
const { TranscribeLive } = require('transcribe-cli');

const live = new TranscribeLive({
  model: 'base',
  sampleRate: 16000,   // Hz
  channels: 1,         // mono
  sampleWidth: 2,      // 16-bit
  chunkDuration: 5,    // seconds per chunk
  wordTimestamps: true,
});

live.on('ready', () => {
  console.log('Model loaded, streaming...');
});

live.on('transcript', (event) => {
  console.log(`[${event.isFinal ? 'FINAL' : 'partial'}] ${event.text}`);
  // event.segments has full timing + speaker info
});

// Feed raw PCM audio buffers
// (with these settings, each chunk is 16000 Hz * 1 channel * 2 bytes * 5 s = 160,000 bytes)
live.write(pcmBuffer);

// Or pipe from any readable stream (microphone, file, etc.)
audioSource.pipe(live.stream);

// Finish and flush remaining audio
await live.finish();
```

Full type definitions included:
```typescript
import { transcribe, TranscribeLive, TranscriptionResult, LiveTranscriptEvent } from 'transcribe-cli';

const result: TranscriptionResult = await transcribe('audio.mp3', { diarize: true });
```

`transcribe` options:

| Option | Short | Description | Default |
|---|---|---|---|
| `--output-dir` | `-o` | Output directory | Current dir |
| `--format` | `-f` | Output format: `txt`, `srt`, `vtt`, `json` | `txt` |
| `--language` | `-l` | Language code or `auto` | `auto` |
| `--model` | `-m` | Model: `tiny`, `base`, `small`, `medium`, `large-v3` | `base` |
| `--diarize` | | Enable speaker diarization | Off |
| `--word-timestamps` | | Enable word-level timestamps | Off |
| `--verbose` | | Verbose output | Off |
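For reference, these flags correspond to the Node.js API options used earlier. The mapping below is inferred from the API example in this README, not an exhaustive list:

```javascript
// CLI:  transcribe lecture.mp3 --model medium --language en --diarize --word-timestamps --format json
// Equivalent API options (names as shown in the API example above):
const options = {
  model: 'medium',
  language: 'en',
  diarize: true,
  wordTimestamps: true,
  format: 'json',
};
console.log(options.model); // medium
```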
All options from `transcribe` plus:

| Option | Short | Description | Default |
|---|---|---|---|
| `--concurrency` | `-c` | Max concurrent jobs (1-20) | 5 |
| `--recursive` | `-r` | Scan subdirectories | Off |
| `--dry-run` | | Preview files without processing | Off |
`transcribe extract` options:

| Option | Short | Description | Default |
|---|---|---|---|
| `--output` | `-o` | Output file path | Auto-generated |
| `--format` | `-f` | Audio format: `mp3`, `wav` | `mp3` |
`txt`:

```
Hello, welcome to the meeting. Today we'll discuss the quarterly results.
```

`srt` (with `--diarize`):

```
1
00:00:00,000 --> 00:00:03,500
[SPEAKER_00] Hello, welcome to the meeting.

2
00:00:03,500 --> 00:00:07,200
[SPEAKER_01] Thanks for having me.
```
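Note that SRT cue times use a comma before the milliseconds, while WebVTT uses a dot. A minimal sketch of producing either form from a segment time in seconds (this helper is illustrative, not part of the package):

```javascript
// Format a time in seconds as an SRT ("00:00:03,500") or VTT ("00:00:03.500") timestamp.
function formatTimestamp(seconds, style = 'srt') {
  const ms = Math.round(seconds * 1000);
  const h = String(Math.floor(ms / 3600000)).padStart(2, '0');
  const m = String(Math.floor((ms % 3600000) / 60000)).padStart(2, '0');
  const s = String(Math.floor((ms % 60000) / 1000)).padStart(2, '0');
  const frac = String(ms % 1000).padStart(3, '0');
  return `${h}:${m}:${s}${style === 'srt' ? ',' : '.'}${frac}`;
}

console.log(formatTimestamp(3.5));        // 00:00:03,500
console.log(formatTimestamp(3.5, 'vtt')); // 00:00:03.500
```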
`vtt`:

```
WEBVTT

00:00:00.000 --> 00:00:03.500
<v SPEAKER_00>Hello, welcome to the meeting.</v>

00:00:03.500 --> 00:00:07.200
<v SPEAKER_01>Thanks for having me.</v>
```
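The `json` format (shown next) is the easiest to post-process. As an illustrative sketch, totalling speaking time per speaker from `segments` shaped like the sample below (`speakerTotals` is a hypothetical helper, not part of the package):

```javascript
// Sum per-speaker speaking time from `segments` as produced by --format json.
// The segment shape (start, end, speaker) matches the JSON sample in this README.
function speakerTotals(segments) {
  const totals = {};
  for (const { start, end, speaker } of segments) {
    totals[speaker] = (totals[speaker] ?? 0) + (end - start);
  }
  return totals;
}

const segments = [
  { id: 0, start: 0.0, end: 3.5, speaker: 'SPEAKER_00' },
  { id: 1, start: 3.5, end: 7.2, speaker: 'SPEAKER_01' },
];
console.log(speakerTotals(segments));
```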
`json`:

```json
{
  "text": "Hello, welcome to the meeting...",
  "language": "en",
  "duration": 120.5,
  "speakers": ["SPEAKER_00", "SPEAKER_01"],
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 3.5,
      "text": "Hello, welcome to the meeting.",
      "speaker": "SPEAKER_00",
      "words": [
        {"word": "Hello,", "start": 0.1, "end": 0.5},
        {"word": "welcome", "start": 0.6, "end": 1.0}
      ]
    }
  ]
}
```

Create with `transcribe config --init`:
```toml
[output]
format = "txt"

[processing]
concurrency = 5
language = "auto"
recursive = false

[model]
size = "base"
device = "auto"
compute_type = "auto"

[features]
diarize = false
word_timestamps = false
```

Config files are searched in order:

1. `./transcribe.toml`
2. `./.transcriberc`
3. `~/.config/transcribe/config.toml`
4. `~/.transcriberc`
| Variable | Description | Default |
|---|---|---|
| `TRANSCRIBE_MODEL_SIZE` | Whisper model size | `base` |
| `TRANSCRIBE_DEVICE` | Compute device (`auto`/`cpu`/`cuda`) | `auto` |
| `TRANSCRIBE_COMPUTE_TYPE` | Compute type (`auto`/`int8`/`float16`/`float32`) | `auto` |
| `TRANSCRIBE_CONCURRENCY` | Max concurrent batch jobs | 5 |
| `TRANSCRIBE_LANGUAGE` | Default language | `auto` |
| `TRANSCRIBE_DIARIZE` | Enable diarization by default | `false` |
| `TRANSCRIBE_WORD_TIMESTAMPS` | Enable word timestamps by default | `false` |
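Each variable falls back to its listed default when unset. The resolution sketch below is an assumption for illustration (`envOr` is hypothetical; the package resolves these internally):

```javascript
// Hypothetical illustration of environment-variable defaulting.
function envOr(name, fallback) {
  const value = process.env[name];
  return value === undefined || value === '' ? fallback : value;
}

const modelSize = envOr('TRANSCRIBE_MODEL_SIZE', 'base'); // 'base' unless overridden
const device = envOr('TRANSCRIBE_DEVICE', 'auto');
console.log(modelSize, device);
```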
| Model | Size | English | Multilingual | Speed |
|---|---|---|---|---|
| `tiny` | ~75 MB | Good | Fair | Fastest |
| `base` | ~150 MB | Better | Good | Fast |
| `small` | ~500 MB | Great | Great | Moderate |
| `medium` | ~1.5 GB | Excellent | Excellent | Slower |
| `large-v3` | ~3 GB | Best | Best | Slowest |
Models are auto-downloaded on first use and cached locally.
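Using the approximate sizes above, you could pick the largest model that fits a disk budget. This helper and its thresholds are illustrative assumptions, not part of the CLI:

```javascript
// Approximate on-disk model sizes in MB, taken from the table above.
const MODEL_SIZES_MB = { tiny: 75, base: 150, small: 500, medium: 1500, 'large-v3': 3000 };

// Return the name of the largest model that fits within budgetMb, or null.
function largestModelUnder(budgetMb) {
  const fitting = Object.entries(MODEL_SIZES_MB)
    .filter(([, size]) => size <= budgetMb)
    .sort((a, b) => b[1] - a[1]);
  return fitting.length ? fitting[0][0] : null;
}

console.log(largestModelUnder(1000)); // small
```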
```shell
git clone https://github.com/robit-man/transcribe-cli.git
cd transcribe-cli
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# Run tests
pytest

# Run tests without coverage
pytest tests/unit/ -v --no-cov
```

Uninstall:

```shell
rm -rf ~/.local/share/transcribe-cli
rm -f ~/.local/bin/transcribe ~/.local/bin/transcribe-cli
```

License: MIT
- faster-whisper — CTranslate2 Whisper implementation
- pyannote.audio — Speaker diarization
- FFmpeg — Audio/video processing
- Typer — CLI framework
- Rich — Terminal formatting