
 ██╗   ██╗██╗██████╗ ██████╗ ██╗██████╗ ███████╗
 ██║   ██║██║██╔══██╗██╔══██╗██║██╔══██╗██╔════╝
 ██║   ██║██║██║  ██║██████╔╝██║██████╔╝█████╗
 ╚██╗ ██╔╝██║██║  ██║██╔═══╝ ██║██╔═══╝ ██╔══╝
  ╚████╔╝ ██║██████╔╝██║     ██║██║     ███████╗
   ╚═══╝  ╚═╝╚═════╝ ╚═╝     ╚═╝╚═╝     ╚══════╝

Your AI video editor: turn raw recordings into shorts, reels, captions, social posts, and blog posts. Record once, publish everywhere.

An agentic video editor that watches for new recordings and edits them into social-media-ready content (shorts, reels, captions, blog posts, and platform-tailored social posts) using GitHub Copilot SDK AI agents and OpenAI Whisper.


npm install -g vidpipe

✨ Features

VidPipe Features: Input → AI Processing → Outputs


🎙️ Whisper Transcription - Word-level timestamps
📐 Split-Screen Layouts - Portrait, square, and feed
🔇 AI Silence Removal - Context-aware, capped at 20%
💬 Karaoke Captions - Word-by-word highlighting
✂️ Short Clips - Best 15–60s moments, multi-segment
🎞️ Medium Clips - 1–3 min with crossfade transitions
📑 Chapter Detection - JSON, Markdown, YouTube, FFmeta
📱 Social Posts - TikTok, YouTube, Instagram, LinkedIn, X
📰 Blog Post - Dev.to style with web-sourced links
🎨 Brand Voice - Custom tone, hashtags via brand.json
🔍 Face Detection - ONNX-based webcam cropping
🚀 Auto-Publish - Scheduled posting to TikTok, YouTube, Instagram, LinkedIn, X

🚀 Quick Start

# Install globally
npm install -g vidpipe

# Set up your environment
# Unix/Mac
cp .env.example .env
# Windows (PowerShell)
Copy-Item .env.example .env

# Then edit .env and add your OpenAI API key (REQUIRED):
#   OPENAI_API_KEY=sk-your-key-here

# Verify all prerequisites are met
vidpipe --doctor

# Process a single video
vidpipe /path/to/video.mp4

# Watch a folder for new recordings
vidpipe --watch-dir ~/Videos/Recordings

# Full example with options
vidpipe \
  --watch-dir ~/Videos/Recordings \
  --output-dir ~/Content/processed \
  --openai-key sk-... \
  --brand ./brand.json \
  --verbose

Prerequisites:

  • Node.js 20+
  • FFmpeg 6.0+ - Auto-bundled on common platforms (Windows x64, macOS, Linux x64) via ffmpeg-static. On other architectures, install system FFmpeg (see Troubleshooting). Override with the FFMPEG_PATH env var if you need a specific build.
  • OpenAI API key (required) - Get one at platform.openai.com/api-keys. Needed for Whisper transcription and all AI features.
  • GitHub Copilot subscription - Required for AI agent features (shorts generation, social media posts, summaries, blog posts). See GitHub Copilot.

See Getting Started for full setup instructions.


🎮 CLI Usage

vidpipe [options] [video-path]
vidpipe init              # Interactive setup wizard
vidpipe review            # Open post review web app
vidpipe schedule          # View posting schedule
Option Description
--doctor Check that all prerequisites (FFmpeg, API keys, etc.) are installed and configured
[video-path] Process a specific video file (implies --once)
--watch-dir <path> Folder to watch for new recordings
--output-dir <path> Output directory (default: ./recordings)
--openai-key <key> OpenAI API key
--exa-key <key> Exa AI key for web search in social posts
--brand <path> Path to brand.json (default: ./brand.json)
--once Process next video and exit
--no-silence-removal Skip silence removal
--no-shorts Skip short clip extraction
--no-medium-clips Skip medium clip generation
--no-social Skip social media posts
--no-social-publish Skip social media queue-build stage
--late-api-key <key> Override Late API key
--no-captions Skip caption generation/burning
--no-git Skip git commit/push
-v, --verbose Debug-level logging
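
For example, to process a single file once while skipping the clip and social stages, the flags above can be combined like this (one possible combination, not a required one):

vidpipe /path/to/video.mp4 \
  --no-shorts \
  --no-medium-clips \
  --no-social \
  --verbose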

📁 Output Structure

recordings/
└── my-awesome-demo/
    ├── my-awesome-demo.mp4                  # Original video
    ├── my-awesome-demo-edited.mp4           # Silence-removed
    ├── my-awesome-demo-captioned.mp4        # With burned-in captions
    ├── transcript.json                      # Word-level transcript
    ├── transcript-edited.json               # Timestamps adjusted for silence removal
    ├── README.md                            # AI-generated summary with screenshots
    ├── captions/
    │   ├── captions.srt                     # SubRip subtitles
    │   ├── captions.vtt                     # WebVTT subtitles
    │   └── captions.ass                     # Advanced SSA (karaoke-style)
    ├── shorts/
    │   ├── catchy-title.mp4                 # Landscape base clip
    │   ├── catchy-title-captioned.mp4       # Landscape + burned captions
    │   ├── catchy-title-portrait.mp4        # 9:16 split-screen
    │   ├── catchy-title-portrait-captioned.mp4  # Portrait + captions + hook overlay
    │   ├── catchy-title-feed.mp4            # 4:5 split-screen
    │   ├── catchy-title-square.mp4          # 1:1 split-screen
    │   ├── catchy-title.md                  # Clip metadata
    │   └── catchy-title/
    │       └── posts/                       # Per-short social posts (5 platforms)
    ├── medium-clips/
    │   ├── deep-dive-topic.mp4              # Landscape base clip
    │   ├── deep-dive-topic-captioned.mp4    # With burned captions
    │   ├── deep-dive-topic.md               # Clip metadata
    │   └── deep-dive-topic/
    │       └── posts/                       # Per-clip social posts (5 platforms)
    ├── chapters/
    │   ├── chapters.json                    # Structured chapter data
    │   ├── chapters.md                      # Markdown table
    │   ├── chapters.ffmetadata              # FFmpeg metadata format
    │   └── chapters-youtube.txt             # YouTube description timestamps
    └── social-posts/
        ├── tiktok.md                        # Full-video social posts
        ├── youtube.md
        ├── instagram.md
        ├── linkedin.md
        ├── x.md
        └── devto.md                         # Dev.to blog post
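
Because chapters.ffmetadata uses FFmpeg's FFMETADATA format, the chapters can be copied back into a finished video with FFmpeg alone. A minimal sketch, assuming it is run from inside the my-awesome-demo/ folder (the output filename is illustrative):

# Mux the detected chapters into the captioned video without re-encoding
ffmpeg -i my-awesome-demo-captioned.mp4 \
  -i chapters/chapters.ffmetadata \
  -map_metadata 1 -map_chapters 1 \
  -codec copy my-awesome-demo-chaptered.mp4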

📺 Review App

VidPipe includes a built-in web app for reviewing, editing, and scheduling social media posts before publishing.

VidPipe Review UI
Review and approve posts across YouTube, TikTok, Instagram, LinkedIn, and X/Twitter
# Launch the review app
vidpipe review
  • Platform tabs - Filter posts by platform (YouTube, TikTok, Instagram, LinkedIn, X)
  • Video preview - See the video thumbnail and content before approving
  • Keyboard shortcuts - Arrow keys to navigate, Enter to approve, Backspace to reject
  • Smart scheduling - Posts are queued with optimal timing per platform
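
Once posts are approved, the resulting queue can be checked from the same CLI:

# View the scheduled posting slots
vidpipe schedule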

🔄 Pipeline

graph LR
    A[📥 Ingest] --> B[🎙️ Transcribe]
    B --> C[🔇 Silence Removal]
    C --> D[💬 Captions]
    D --> E[🔥 Caption Burn]
    E --> F[✂️ Shorts]
    F --> G[🎞️ Medium Clips]
    G --> H[📑 Chapters]
    H --> I[📝 Summary]
    I --> J[📱 Social Media]
    J --> K[📱 Short Posts]
    K --> L[📱 Medium Posts]
    L --> M[📰 Blog]
    M --> N[📦 Queue Build]
    N --> O[🔄 Git Push]

    style A fill:#2d5a27,stroke:#4ade80
    style B fill:#1e3a5f,stroke:#60a5fa
    style E fill:#5a2d27,stroke:#f87171
    style F fill:#5a4d27,stroke:#fbbf24
    style O fill:#2d5a27,stroke:#4ade80
# Stage Description
1 Ingestion Copies video, extracts metadata with FFprobe
2 Transcription Extracts audio → OpenAI Whisper for word-level transcription
3 Silence Removal AI detects dead-air segments; context-aware removals capped at 20%
4 Captions Generates .srt, .vtt, and .ass subtitle files with karaoke word highlighting
5 Caption Burn Burns ASS captions into video (single-pass encode when silence was also removed)
6 Shorts AI identifies best 15–60s moments; extracts single and composite clips with 6 variants per short
7 Medium Clips AI identifies 1–3 min standalone segments with crossfade transitions
8 Chapters AI detects topic boundaries; outputs JSON, Markdown, FFmetadata, and YouTube timestamps
9 Summary AI writes a Markdown README with captured screenshots
10 Social Media Platform-tailored posts for TikTok, YouTube, Instagram, LinkedIn, and X
11 Short Posts Per-short social media posts for all 5 platforms
12 Medium Clip Posts Per-medium-clip social media posts for all 5 platforms
13 Blog Dev.to blog post with frontmatter, web-sourced links via Exa
14 Queue Build Builds publish queue from social posts with scheduled slots
15 Git Push Auto-commits and pushes to origin main

Each stage can be independently skipped with --no-* flags. A stage failure does not abort the pipeline; subsequent stages proceed with whatever data is available.


🤖 LLM Providers

VidPipe supports multiple LLM providers:

Provider Env Var Default Model Notes
copilot (default) (none) Claude Opus 4.6 Uses GitHub Copilot auth
openai OPENAI_API_KEY gpt-4o Direct OpenAI API
claude ANTHROPIC_API_KEY claude-opus-4.6 Direct Anthropic API

Set LLM_PROVIDER in your .env or pass via CLI. Override model with LLM_MODEL.
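
For example, a minimal .env fragment that switches to the direct OpenAI provider might look like this (the model value simply restates the default from the table above):

# .env
LLM_PROVIDER=openai
LLM_MODEL=gpt-4o          # optional override
OPENAI_API_KEY=sk-your-key-here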

The pipeline tracks token usage and estimated cost across all providers, displaying a summary at the end of each run.


⚙️ Configuration

Configuration is loaded from CLI flags → environment variables → .env file → defaults.

# .env
OPENAI_API_KEY=sk-your-key-here
WATCH_FOLDER=/path/to/recordings
OUTPUT_DIR=/path/to/output
# EXA_API_KEY=your-exa-key       # Optional: enables web search in social/blog posts
# BRAND_PATH=./brand.json         # Optional: path to brand voice config
# FFMPEG_PATH=/usr/local/bin/ffmpeg
# FFPROBE_PATH=/usr/local/bin/ffprobe
# LATE_API_KEY=sk_your_key_here   # Optional: Late API for social publishing
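
Because CLI flags sit at the top of that chain, a value passed on the command line wins over the same setting in .env. A quick sketch (paths are illustrative):

# OUTPUT_DIR in .env is ignored for this run; the flag takes precedence
vidpipe ~/Videos/demo.mp4 --output-dir ~/Content/one-off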

Social media publishing is configured via schedule.json and the Late API. See Social Publishing Guide for details.


📚 Documentation

Guide Description
Getting Started Prerequisites, installation, and first run
Configuration All CLI flags, env vars, skip options, and examples
FFmpeg Setup Platform-specific install (Windows, macOS, Linux, ARM64)
Brand Customization Customize AI voice, vocabulary, hashtags, and content style
Social Publishing Review, schedule, and publish social posts via Late API

🏗️ Architecture

Agentic architecture built on the GitHub Copilot SDK, where each editing task is handled by a specialized AI agent:

graph TD
    BP[🧠 BaseAgent] --> SRA[SilenceRemovalAgent]
    BP --> SA[SummaryAgent]
    BP --> SHA[ShortsAgent]
    BP --> MVA[MediumVideoAgent]
    BP --> CA[ChapterAgent]
    BP --> SMA[SocialMediaAgent]
    BP --> BA[BlogAgent]

    SRA -->|tools| T1[detect_silence, decide_removals]
    SHA -->|tools| T2[plan_shorts]
    MVA -->|tools| T3[plan_medium_clips]
    CA -->|tools| T4[generate_chapters]
    SA -->|tools| T5[capture_frame, write_summary]
    SMA -->|tools| T6[search_links, create_posts]
    BA -->|tools| T7[search_web, write_blog]

    style BP fill:#1e3a5f,stroke:#60a5fa,color:#fff

Each agent communicates with the LLM through structured tool calls, ensuring reliable, parseable outputs.


🛠️ Tech Stack

Technology Purpose
TypeScript Language (ES2022, ESM)
GitHub Copilot SDK AI agent framework
OpenAI Whisper Speech-to-text
FFmpeg Video/audio processing
Sharp Image analysis (webcam detection)
Commander.js CLI framework
Chokidar File system watching
Winston Logging
Exa AI Web search for social posts and blog

🗺️ Roadmap

  • Automated social posting - Publish directly to platforms via Late API
  • Multi-language support - Transcription and summaries in multiple languages
  • Custom templates - User-defined Markdown & social post templates
  • Web dashboard - Browser UI for reviewing and editing outputs
  • Batch processing - Process an entire folder of existing videos
  • Custom short criteria - Configure what makes a "good" short for your content
  • Thumbnail generation - Auto-generate branded thumbnails for shorts

🔧 Troubleshooting

"No binary found for architecture" during install

ffmpeg-static (an optional dependency) bundles FFmpeg for common platforms. On unsupported architectures, it skips gracefully and vidpipe falls back to your system FFmpeg.

Fix: Install FFmpeg on your system:

  • Windows: winget install Gyan.FFmpeg
  • macOS: brew install ffmpeg
  • Linux: sudo apt install ffmpeg (Debian/Ubuntu) or sudo dnf install ffmpeg (Fedora)

You can also point to a custom binary: export FFMPEG_PATH=/path/to/ffmpeg
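
On Windows (PowerShell), the equivalent is to set the variable for the current session (the path shown is illustrative):

# Windows (PowerShell)
$env:FFMPEG_PATH = "C:\ffmpeg\bin\ffmpeg.exe"
$env:FFPROBE_PATH = "C:\ffmpeg\bin\ffprobe.exe"   # optional, matching probe binary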

Run vidpipe --doctor to verify your setup.


📄 License

ISC © htekdev
