Subtitle Learning Lab: extract, merge, and prepare bilingual subtitles (English/Chinese and more) for language learning.

mxggle/subtitle-pipeline

🎬 Subtitle Pipeline

A practical subtitle engine for local media: list, extract, transcribe, translate, and merge subtitle tracks.

Core Workflow

flowchart TD
    A[Video/Audio File] --> B{Subtitles present?}
    B -->|Yes, multiple| C[Merge selected tracks]
    B -->|Yes, one| D[Translate if needed]
    B -->|No| E[Transcribe with Whisper]
    E --> D
    D --> F[Optional merge]
    C --> G[Output SRT]
    F --> G
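The branching in the flowchart can be sketched in a few lines of Python. This is only an illustration of the decision logic; `plan_steps` and its parameters are hypothetical names, not functions from `scripts/pipeline.py`.

```python
from typing import List


def plan_steps(
    subtitle_languages: List[str],
    need_translation: bool = False,
    want_merge: bool = False,
) -> List[str]:
    """Return the ordered pipeline steps for one media file (hypothetical helper)."""
    if len(subtitle_languages) > 1:
        # Several tracks already exist: merge the selected ones directly.
        return ["merge", "output_srt"]

    steps: List[str] = []
    if not subtitle_languages:
        # No subtitles at all: transcribe first, then continue as for one track.
        steps.append("transcribe")
    if need_translation:
        steps.append("translate")
    if want_merge:
        steps.append("merge")
    steps.append("output_srt")
    return steps
```

For example, a file with a single English track that should become bilingual would yield `["translate", "merge", "output_srt"]`.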

Features

  • Inspect subtitle streams in MKV/MP4/etc.
  • Extract subtitle tracks by index or language
  • Convert subtitle tracks to SRT
  • Transcribe audio/video to SRT via Whisper
  • Translate subtitle files with OpenAI-compatible APIs
  • Merge tracks using overlap-based alignment
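To make "overlap-based alignment" concrete, here is a minimal sketch of one way to pair cues from two tracks by shared time span. It is an assumption about the general technique, not the project's actual implementation; `Cue`, `overlap`, `merge_by_overlap`, and the `min_ratio` threshold are all illustrative names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def overlap(a: Cue, b: Cue) -> float:
    """Length in seconds of the interval shared by two cues (0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def merge_by_overlap(
    primary: List[Cue], secondary: List[Cue], min_ratio: float = 0.5
) -> List[Cue]:
    """Attach to each primary cue the secondary cue that overlaps it most.

    A secondary cue is accepted only if the overlap covers at least
    min_ratio of the shorter of the two cues; otherwise the primary
    cue is kept on its own. Timing always follows the primary track.
    """
    merged = []
    for p in primary:
        best, best_len = None, 0.0
        for s in secondary:
            shared = overlap(p, s)
            if shared > best_len:
                best, best_len = s, shared
        text = p.text
        if best is not None:
            shorter = min(p.end - p.start, best.end - best.start)
            if shorter > 0 and best_len / shorter >= min_ratio:
                text = f"{p.text}\n{best.text}"
        merged.append(Cue(p.start, p.end, text))
    return merged
```

The nested loop is quadratic in the number of cues, which is acceptable for a sketch; a real implementation could walk both sorted tracks in a single pass.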

Prerequisites

  • Python 3.9+
  • ffmpeg + ffprobe
  • Optional for translate: openai, python-dotenv
  • Optional for transcribe: whisper CLI

Quick Start

# 1) List subtitle tracks
python scripts/pipeline.py list movie.mkv

# 2) Extract English subtitle track to SRT
python scripts/pipeline.py extract movie.mkv --language eng --to-srt

# 3) Translate SRT to Chinese
python scripts/pipeline.py translate movie.eng.srt --target-language "Chinese"

# 4) Merge English + Chinese tracks from container
python scripts/pipeline.py merge movie.mkv --languages eng chi

# 5) If no subtitles exist, transcribe first
python scripts/pipeline.py transcribe movie.mkv --model turbo

Install (Universal Agent Skills CLI)

Works across supported agents (OpenClaw, Codex, Claude Code, Cursor, etc.).

# Let the skills CLI prompt for target agents interactively
npx skills add https://github.com/mxggle/subtitle-pipeline --skill subtitle-pipeline --yes

Non-interactive examples:

# Install to specific agents
npx skills add https://github.com/mxggle/subtitle-pipeline --skill subtitle-pipeline --agent openclaw --agent codex --yes

# Install globally (all projects)
npx skills add https://github.com/mxggle/subtitle-pipeline --skill subtitle-pipeline --global --yes

# Install to all supported agents
npx skills add https://github.com/mxggle/subtitle-pipeline --skill subtitle-pipeline --agent '*' --yes

Publish & Discovery Essentials

If you want people to find and use this skill:

  1. Keep the repo public.
  2. Keep SKILL.md clean (name + clear description).
  3. Put the universal install snippet above in the README and share it in communities.
  4. Submit it to curated skill lists and directories (skills ecosystem repos, community lists) so people can discover it.

CLI Commands

list

python scripts/pipeline.py list <video>
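As a rough idea of what listing involves, the sketch below asks `ffprobe` for subtitle streams as JSON and flattens the result. The function names are hypothetical and the real `list` command may use different options or output fields.

```python
import json
import subprocess
from typing import List


def ffprobe_command(video: str) -> List[str]:
    """Build an ffprobe call that reports only subtitle streams as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "s",
        "-show_entries", "stream=index,codec_name:stream_tags=language,title",
        "-of", "json",
        video,
    ]


def parse_streams(ffprobe_json: str) -> List[dict]:
    """Flatten ffprobe's JSON into one small dict per subtitle stream."""
    rows = []
    for stream in json.loads(ffprobe_json).get("streams", []):
        tags = stream.get("tags", {})
        rows.append({
            "index": stream["index"],
            "codec": stream.get("codec_name", "unknown"),
            # "und" (undetermined) is used here when no language tag is present.
            "language": tags.get("language", "und"),
            "title": tags.get("title", ""),
        })
    return rows


def list_subtitles(video: str) -> List[dict]:
    """Run ffprobe on the file and return its subtitle streams."""
    result = subprocess.run(
        ffprobe_command(video), capture_output=True, text=True, check=True
    )
    return parse_streams(result.stdout)
```

Keeping the parsing separate from the subprocess call makes it testable without a media file.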

extract

python scripts/pipeline.py extract <video> [--index N | --language CODE] [--to-srt] [--output PATH]
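The `--index` and `--language` options suggest a selection step followed by an `ffmpeg` call. The following is a sketch under that assumption; `pick_stream` and `ffmpeg_extract_command` are illustrative helpers, and the stream dicts are assumed to carry `index` and `language` keys.

```python
from typing import List, Optional


def pick_stream(
    streams: List[dict],
    index: Optional[int] = None,
    language: Optional[str] = None,
) -> dict:
    """Return the first stream matching the index, or else the language."""
    for stream in streams:
        if index is not None:
            if stream["index"] == index:
                return stream
        elif language is not None and stream.get("language") == language:
            return stream
    raise ValueError("no subtitle stream matches the given index or language")


def ffmpeg_extract_command(video: str, stream_index: int, output: str) -> List[str]:
    """Build an ffmpeg call that writes one subtitle stream as SRT.

    "-map 0:N" selects stream N of the first input; "-c:s srt" converts
    text-based subtitles to SubRip. Image-based tracks (for example PGS)
    cannot be converted to text this way.
    """
    return [
        "ffmpeg", "-y", "-i", video,
        "-map", f"0:{stream_index}",
        "-c:s", "srt",
        output,
    ]
```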

merge

python scripts/pipeline.py merge <video> [--indices N N | --languages CODE CODE] [--output PATH]

translate

python scripts/pipeline.py translate <srt-or-video> --target-language "Chinese" [--api-key ...] [--base-url ...] [--model ...] [--output ...]
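Translating an SRT file usually means keeping the timing lines untouched and sending only the text to the model in batches. Here is a minimal sketch of that idea. The helper names and `batch_size` are assumptions, and `translate_batch` stands in for whatever call is made to the OpenAI-compatible API.

```python
import re
from typing import Callable, List, Tuple

Block = Tuple[str, str]  # (timing line, cue text)


def parse_srt(content: str) -> List[Block]:
    """Split SRT text into (timing, text) pairs, dropping the cue numbers."""
    blocks = []
    for chunk in re.split(r"\n\s*\n", content.strip()):
        lines = chunk.splitlines()
        # lines[0] is the cue number, lines[1] the timing line, the rest is text.
        if len(lines) >= 3 and "-->" in lines[1]:
            blocks.append((lines[1], "\n".join(lines[2:])))
    return blocks


def format_srt(blocks: List[Block]) -> str:
    """Rebuild SRT text, renumbering cues from 1."""
    parts = [
        f"{number}\n{timing}\n{text}"
        for number, (timing, text) in enumerate(blocks, start=1)
    ]
    return "\n\n".join(parts) + "\n"


def translate_srt(
    content: str,
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 20,
) -> str:
    """Translate cue text batch by batch while preserving every timing line."""
    blocks = parse_srt(content)
    translated: List[Block] = []
    for start in range(0, len(blocks), batch_size):
        batch = blocks[start:start + batch_size]
        texts = translate_batch([text for _, text in batch])
        if len(texts) != len(batch):
            # Guard against the model merging or dropping lines.
            raise ValueError("translator returned a different number of lines")
        translated.extend((timing, new) for (timing, _), new in zip(batch, texts))
    return format_srt(translated)
```

Passing the translator in as a callable keeps the SRT handling testable offline; checking the returned line count catches a common failure mode where a model merges adjacent cues.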

transcribe

python scripts/pipeline.py transcribe <video-or-audio> [--model turbo] [--language en] [--output PATH]
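Since transcription relies on the `whisper` CLI, the command presumably ends up as a subprocess call along these lines. This is a sketch only: `whisper_command` is a hypothetical helper, and the defaults simply mirror the options shown above.

```python
from pathlib import Path
from typing import List, Optional


def whisper_command(
    media: str,
    model: str = "turbo",
    language: Optional[str] = None,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Build a whisper CLI call that writes an SRT file for the given media."""
    # Default to writing next to the input file.
    out_dir = output_dir or str(Path(media).parent)
    cmd = [
        "whisper", media,
        "--model", model,
        "--output_format", "srt",
        "--output_dir", out_dir,
    ]
    if language:
        # Without --language, whisper detects the spoken language itself.
        cmd += ["--language", language]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`.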

Output Convention

  • movie.eng.srt
  • movie.zho.srt
  • movie.eng-chi.merged.srt
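The naming pattern above can be expressed as a small helper. This is an illustration of the convention only; `output_name` is a hypothetical function and the script's own naming logic may differ in detail.

```python
from pathlib import Path
from typing import Sequence


def output_name(video: str, languages: Sequence[str]) -> str:
    """Derive an output file name from the video name and language codes."""
    stem = Path(video).stem
    if len(languages) == 1:
        # Single track: <stem>.<lang>.srt
        return f"{stem}.{languages[0]}.srt"
    # Merged tracks: <stem>.<lang1>-<lang2>.merged.srt
    joined = "-".join(languages)
    return f"{stem}.{joined}.merged.srt"
```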

Example Output Files

Running the skill on the sample video in the tests/ directory produces the following files:

  • The best way to become good at something might surprise you - David Epstein.srt (generated by transcription)
  • The best way to become good at something might surprise you - David Epstein.chinese.srt (generated by translation)

Project Structure

subtitle-pipeline/
├── SKILL.md
├── README.md
├── scripts/
│   └── pipeline.py
├── tests/
│   ├── test_cli.py
│   ├── test_extract.py
│   ├── test_helpers.py
│   ├── test_merge.py
│   ├── test_merge_streams.py
│   ├── test_pick_stream.py
│   ├── test_probe_and_list.py
│   ├── test_transcribe.py
│   └── test_translation.py
└── references/
    └── subtitle-notes.md

Development

Run tests:

python3 -m pytest tests -v

Roadmap

  • list / extract / merge
  • OpenAI-compatible translation
  • Whisper transcription
  • API-backed transcription provider interface (optional backend)
  • Chunked long-video pipeline for reliability

License

Internal skill / private workflow project.
