Local-first video editing toolkit for Apple Silicon Macs. It wraps ffmpeg, whisper-cpp, and auto-editor behind a single CLI, with rewrite-edit as the first-class agent workflow and word-editor as the main manual fallback for talking-head cleanup.
If you want Codex or Claude Code to take a video, work from a cleaned transcript, make the cuts, and deliver the export, start here:
toolkit rewrite-edit /path/to/input.mp4 --transcript-file final_draft.txt
Also supported:
toolkit rewrite-edit /path/to/input.mp4 --transcript "clean final draft transcript"
cat final_draft.txt | toolkit rewrite-edit /path/to/input.mp4 --stdin
Preview the edit without rendering:
toolkit plan-edit /path/to/input.mp4 --transcript-file final_draft.txt --preset tight-social-clip
plan-edit writes edit_plan.json and decision_report.html in the run folder. The JSON includes proposed word keep/cut decisions, detected silences, boundary scores, raw ranges, and padded clip ranges, so an agent can explain or adjust the edit before spending time on ffmpeg.
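Before rendering, an agent may want a quick look at what plan-edit produced. The following is a minimal sketch, assuming only that edit_plan.json is a JSON object at the top level. The key names inside the plan are not documented here, so the hypothetical helper summarize_plan reports whatever top-level keys it finds instead of assuming a schema:

```python
import json
from pathlib import Path


def summarize_plan(plan_path):
    """Describe each top-level entry of a plan file without assuming its schema.

    Assumption: the file holds a JSON object. Nothing else about the layout
    of edit_plan.json is taken for granted.
    """
    data = json.loads(Path(plan_path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")

    summary = {}
    for key, value in data.items():
        if isinstance(value, (list, dict)):
            # Containers are reported by size so large plans stay readable.
            summary[key] = f"{type(value).__name__} with {len(value)} entries"
        else:
            summary[key] = f"{type(value).__name__}: {value!r}"
    return summary
```

Calling summarize_plan on the edit_plan.json in a run folder and printing the result gives a compact overview that can be pasted into an agent's explanation of the proposed edit.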
The rewrite-edit flow will:
- ensure word-level transcription exists
- match the final-draft transcript against the spoken source with the shared Python rewrite matcher
- convert the kept words into exportable ranges
- render a final transcript_edit.mp4
- write machine-friendly metadata into run.json
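The step that converts kept words into exportable ranges can be pictured roughly as follows. This is an illustrative sketch, not the toolkit's implementation: the word dictionaries with start, end, and keep fields, the function name words_to_ranges, and the default values are all assumptions. The padding and merge_gap parameters only loosely mirror the ideas behind the --padding and --merge-gap flags.

```python
def words_to_ranges(words, padding=0.1, merge_gap=0.3, duration=None):
    """Collapse kept words into padded, merged (start, end) ranges in seconds.

    words: list of dicts with "start" and "end" in seconds and a boolean
    "keep". This shape is assumed for illustration only.
    """
    ranges = []
    for word in sorted(words, key=lambda w: w["start"]):
        if not word["keep"]:
            continue
        # Pad each kept word, clamping to the start (and end) of the media.
        start = max(0.0, word["start"] - padding)
        end = word["end"] + padding
        if duration is not None:
            end = min(end, duration)

        if ranges and start - ranges[-1][1] <= merge_gap:
            # Close enough to the previous padded range: extend it.
            ranges[-1][1] = max(ranges[-1][1], end)
        else:
            ranges.append([start, end])
    return [(round(s, 3), round(e, 3)) for s, e in ranges]
```

In this sketch the gap is measured between already padded ranges, so a larger padding also makes merging more likely. A real implementation might measure the gap before padding instead.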
Named presets are available for common edit styles:
toolkit presets
toolkit rewrite-edit /path/to/input.mp4 --transcript-file final_draft.txt --preset sermon-excerpt
Presets can still be overridden with explicit flags like --padding, --max-silence, --merge-gap, --silence-threshold-db, --min-silence-duration, and --weak-boundary-score.
For agent handoffs, use a structured JSON request:
toolkit request-schema
toolkit edit-request request.json
Example request:
{
  "source_path": "/path/to/input.mp4",
  "workflow": "rewrite-edit",
  "target_transcript_path": "final_draft.txt",
  "preset": "gentle-talking-head-cleanup",
  "output_style": "plan",
  "notes": "Inspect before rendering."
}
Use rewrite-edit when:
- Codex or Claude Code is acting as the editor
- you already have a final-draft transcript
- you want a CLI or API-shaped workflow instead of a browser review loop
Use word-editor when:
- you want to inspect cuts manually
- you want to click-drag transcript words in the browser
- you want one-click cleanup before export
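For agent handoffs, the structured request shown earlier can be generated and submitted from a script. This is a sketch under stated assumptions: the field names are copied from the example request, the helper names build_request and submit_request are hypothetical, and which other fields or values are accepted should be checked against the output of toolkit request-schema.

```python
import json
import subprocess
from pathlib import Path


def build_request(source_path, transcript_path, preset, notes=""):
    """Build a request dict using the fields from the README's example."""
    return {
        "source_path": str(source_path),
        "workflow": "rewrite-edit",
        "target_transcript_path": str(transcript_path),
        "preset": preset,
        "output_style": "plan",
        "notes": notes,
    }


def submit_request(request, request_file="request.json", run=True):
    """Write the request to disk and optionally hand it to the CLI.

    Returns the command list so callers can log or inspect it.
    """
    path = Path(request_file)
    path.write_text(json.dumps(request, indent=2) + "\n", encoding="utf-8")
    command = ["toolkit", "edit-request", str(path)]
    if run:
        # check=True raises CalledProcessError if the CLI exits non-zero.
        subprocess.run(command, check=True)
    return command
```

Passing run=False writes the file without invoking the CLI, which is convenient when an agent wants a human to review the request first.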
If you want the closest thing to "open transcript, click words, remove the bad parts, export", start here:
toolkit word-editor /path/to/input.mp4
That flow will:
- extract audio
- transcribe at word level
- open a browser editor over the source video
- let you cut words, sentences, fillers, and long pauses
- export a cleaned transcript_edit.mp4
The word editor is the preferred workflow when you want:
- fast talking-head cleanup
- transcript-driven cutting instead of timeline trimming
- a lightweight local alternative to CapCut-style transcript editing
- a manual fallback after an agent-driven rewrite-edit pass
Inside word-editor you can:
- click-drag words to select and cut them
- cut a whole sentence/card in one click
- split cards into smaller chunks
- reorder cards
- search the transcript
- bulk cut obvious filler words
- bulk cut long pauses conservatively
- rewrite the transcript and apply approximate matching
Edits are undoable before export, and every run writes artifacts to a fresh output directory.
cd /Users/michaelpierre/Documents/coding-projects/video-cli-toolkit
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install .
toolkit setup
make install-global
toolkit setup will:
- check for Homebrew
- install ffmpeg and whisper-cpp if missing
- install Python packages into .venv
- warm the auto-editor binary inside the local venv
- print model download instructions if no Whisper model file exists yet
Health check:
toolkit doctor
Open the main editor workflow:
toolkit word-editor /path/to/input.mp4
Run the agent-first transcript rewrite workflow:
toolkit rewrite-edit /path/to/input.mp4 --transcript-file final_draft.txt
Use a larger Whisper model for better word alignment:
toolkit word-editor /path/to/input.mp4 --model small.en
If you are using Claude Code in this repo, see CLAUDE.md for project-specific guidance. If you are using Codex, see AGENTS.md.
This repo also includes local skills written to work cleanly with both Codex and Claude Code:
- skills/guided-video-editor/SKILL.md: front-door skill for raw video in, transcript conversation, approved final draft, and rendered cut out
- skills/video-cli-toolkit/SKILL.md: general repo skill for setup, health checks, transcription, and core toolkit commands
- skills/rewrite-edit-video/SKILL.md: preferred skill when Claude should turn a final-draft transcript into the finished cut
- skills/word-editor-fallback/SKILL.md: manual browser-editor fallback when a human wants to inspect or adjust cuts
- skills/transcript-video-edit/SKILL.md: transcript editing driven by queries, fuzzy queries, or manual ranges
- skills/transcript-review-cut/SKILL.md: older review-sheet flow for human-in-the-loop transcript review
- skills/social-clip-cutter/SKILL.md: short social-ready clip extraction
For agentic editing, the intended order is:
- guided-video-editor for raw-video-to-final-cut collaboration
- rewrite-edit for first-class CLI/API use when the final transcript already exists
- word-editor for manual fallback
- older review-sheet or query-first flows only when they are specifically the right tool
whisper-cpp requires GGML model files. Place your chosen model in:
/Users/michaelpierre/Documents/coding-projects/video-cli-toolkit/models/
Example default filename:
ggml-base.bin
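A quick way to confirm that a model file is in place before transcribing is to list the directory contents. A minimal sketch, assuming models live directly inside the models folder and end in .bin as in the example filename; find_models is a hypothetical helper, not part of the toolkit:

```python
from pathlib import Path


def find_models(models_dir="models"):
    """Return sorted .bin files found directly inside models_dir.

    Returns an empty list when the directory does not exist.
    """
    directory = Path(models_dir)
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.glob("*.bin") if p.is_file())
```

An empty result suggests that no model has been downloaded yet, in which case the instructions printed by toolkit setup apply.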
The commands below still work, but they are now secondary to rewrite-edit for agent-driven automation and secondary to word-editor for manual transcript editing.
Silence-cut edit:
toolkit edit /path/to/input.mp4
toolkit edit /path/to/input.mp4 --margin 0.2s,1.0s
Transcript query editing:
toolkit transcript-edit /path/to/input.mp4 --query "chapter 16"
toolkit transcript-edit /path/to/input.mp4 --query "chapter" --query "david" --padding 0.4,0.8
toolkit transcript-edit /path/to/input.mp4 --fuzzy-query "chapter sixteen david" --fuzzy-threshold 0.55
toolkit transcript-edit /path/to/input.mp4 --ranges ranges.json
Older review-sheet flow:
toolkit review-sheet /path/to/input.mp4
toolkit ranges-from-review /path/to/input.mp4 --instructions "keep 3-6, 9, 12-14"
toolkit ranges-from-review /path/to/input.mp4 --instructions "cut 0-2, 5"
Captions and transcription:
toolkit transcribe /path/to/input.mp4
toolkit captions /path/to/input.mp4
toolkit pipeline /path/to/input.mp4
Matching notes for transcript-edit:
- exact and fuzzy matching normalize punctuation and simple number words like sixteen -> 16
- transcript matching can span adjacent transcript segments instead of only matching one segment at a time
- fuzzy matching combines sequence similarity, token overlap, and normalized contains checks
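The matching notes above can be illustrated with a small scoring function. This is a sketch of the general idea, not the toolkit's matcher: the number-word table, the ASCII-only tokenizer, the equal weighting of the two similarity signals, and the names normalize and fuzzy_score are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Small illustrative table; a real matcher would likely cover more words.
NUMBER_WORDS = {
    "one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
    "six": "6", "seven": "7", "eight": "8", "nine": "9", "ten": "10",
    "sixteen": "16", "twenty": "20",
}


def normalize(text):
    """Lowercase, drop punctuation, and map simple number words to digits."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [NUMBER_WORDS.get(token, token) for token in tokens]


def fuzzy_score(query, segment):
    """Score in [0, 1] from sequence similarity, token overlap, and containment."""
    q_tokens, s_tokens = normalize(query), normalize(segment)
    if not q_tokens or not s_tokens:
        return 0.0
    q_text, s_text = " ".join(q_tokens), " ".join(s_tokens)

    sequence = SequenceMatcher(None, q_text, s_text).ratio()
    overlap = len(set(q_tokens) & set(s_tokens)) / len(set(q_tokens))
    # Surrounding spaces keep the containment check aligned to whole tokens.
    contains = 1.0 if f" {q_text} " in f" {s_text} " else 0.0

    return max(contains, (sequence + overlap) / 2)
```

With this scoring, a query such as "chapter sixteen" scores 1.0 against a segment containing "Chapter 16", while unrelated text stays well below a threshold like the 0.55 used in the --fuzzy-threshold example.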
Manual ranges JSON format:
[
  { "start": 12.4, "end": 18.9, "label": "hook" },
  { "start": 44.1, "end": 58.0, "label": "cta" }
]
Each run writes to:
outputs/<source-stem>/<run-id>/
Typical files:
- audio.wav
- transcript.txt
- segments.json
- word_segments.json
- ranges.json
- captions.srt
- edited.mp4
- transcript_edit.mp4
- selected_segments.json
- clip_ranges.json
- run.json
Some older flows also write review_sheet.txt.
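Because every run gets its own folder, scripts often need to locate the most recent one. A minimal sketch, assuming the outputs/<source-stem>/<run-id>/ layout described above. Since the format of the run id is not specified here, the hypothetical helper latest_run_dir picks the folder with the newest modification time and does not try to parse ids:

```python
from pathlib import Path


def latest_run_dir(source_stem, outputs_root="outputs"):
    """Return the most recently modified run folder for a source, or None."""
    base = Path(outputs_root) / source_stem
    if not base.is_dir():
        return None
    runs = [p for p in base.iterdir() if p.is_dir()]
    # Modification time is used because the run-id format is not assumed.
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)


def list_artifacts(run_dir):
    """Return the sorted file names written into a run folder."""
    return sorted(p.name for p in Path(run_dir).iterdir() if p.is_file())
```

For an input named input.mp4, latest_run_dir("input") followed by list_artifacts shows which of the typical files a given workflow actually produced.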
Install a wrapper into /Users/michaelpierre/.local/bin/toolkit:
cd /Users/michaelpierre/Documents/coding-projects/video-cli-toolkit
make install-global
After that, toolkit will work from anywhere in your shell.
- config.toml: project defaults
- src/video_cli_toolkit/: CLI implementation
- tests/: unit tests
- models/: Whisper model files
- outputs/: generated workflow artifacts
toolkit doctor checks for VideoToolbox encoder support in your local ffmpeg build. The default edit output is configured for h264_videotoolbox.
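The same kind of check can be reproduced by hand when debugging an ffmpeg build. The sketch below is not how toolkit doctor is implemented; it simply parses the output of ffmpeg -hide_banner -encoders, and the helper names has_encoder and ffmpeg_supports are hypothetical.

```python
import shutil
import subprocess


def has_encoder(encoders_output, name="h264_videotoolbox"):
    """Return True if `name` is listed as an encoder in `ffmpeg -encoders` output."""
    for line in encoders_output.splitlines():
        parts = line.split()
        # Encoder rows start with a flags column, then the encoder name.
        if len(parts) >= 2 and parts[1] == name:
            return True
    return False


def ffmpeg_supports(name="h264_videotoolbox"):
    """Ask the ffmpeg on PATH whether it was built with the given encoder."""
    ffmpeg = shutil.which("ffmpeg")
    if ffmpeg is None:
        return False
    result = subprocess.run(
        [ffmpeg, "-hide_banner", "-encoders"],
        capture_output=True,
        text=True,
        check=False,
    )
    return has_encoder(result.stdout, name)
```

If ffmpeg_supports() returns False on an Apple Silicon Mac, the installed ffmpeg was probably built without VideoToolbox, and the default h264_videotoolbox output would not be expected to work with that build.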