- Introduction
- Features
- Installation
- Quick Start
- Usage
- Configuration
- Project Structure
- Development
- System Requirements
- FAQ
- Contributing
- License
Video Subtitle Generator is a professional tool for automatically generating subtitles for videos using advanced AI speech recognition technology. Built on the state-of-the-art faster-whisper engine, it provides high-accuracy transcription with an intuitive interface for both beginners and professionals.
- 🚀 Lightning Fast: GPU-accelerated processing with optimized performance
- 🎯 High Accuracy: Powered by Whisper models for superior recognition
- 🛠️ Professional Features: Advanced audio enhancement and VAD support
- 💻 Dual Interface: Both CLI and GUI for different workflows
- 🌍 Multilingual: Supports multiple languages with auto-detection
- ⚡ High-Efficiency Processing: Batch process multiple video files with a single command
- 🎙️ Advanced ASR Engine: Powered by faster-whisper for exceptional accuracy
- 📝 Multiple Formats: Export to SRT (universal) or ASS (styled subtitles)
- 🎨 Quality Presets: Four optimization modes (pro, quality, balanced, speed)
- 🔊 Audio Enhancement: Voice optimization and noise reduction profiles
- 🎚️ VAD Support: Voice Activity Detection for cleaner subtitles
- 🖥️ Dual Interface: Command-line and graphical interfaces
- 🌐 Multi-language: Automatic language detection and support
| Mode | Use Case | Description |
|---|---|---|
| pro | Professional Editing | Highest accuracy + advanced post-processing |
| quality | Final Production | High-quality recognition + strict deduplication |
| balanced | Default | Optimal balance between quality and speed |
| speed | Quick Drafts | Faster processing with acceptable accuracy |
- SRT: Universal compatibility, simple format
- ASS: Advanced styling, positioning, and effects
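One concrete difference between the two formats is how they write timestamps: SRT uses `HH:MM:SS,mmm` (milliseconds after a comma), while ASS uses `H:MM:SS.cc` (centiseconds after a period). The sketch below illustrates this with standalone helpers. The function names are hypothetical and are not taken from this project's code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def ass_timestamp(seconds: float) -> str:
    """Format seconds as an ASS timestamp: H:MM:SS.cc (centiseconds)."""
    total_cs = round(seconds * 100)
    hours, rem = divmod(total_cs, 360_000)
    minutes, rem = divmod(rem, 6_000)
    secs, cs = divmod(rem, 100)
    return f"{hours}:{minutes:02d}:{secs:02d}.{cs:02d}"


def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Build one SRT cue: index line, time range line, then the text."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
```

In an SRT file, cues built this way are separated by a blank line. ASS carries the same timing inside `Dialogue:` lines, alongside style and positioning fields.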
- Python 3.9 or higher
- FFmpeg (required for audio extraction)
- CUDA 11.8+ (optional, for GPU acceleration)
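The prerequisites above can be checked from Python before installing. This is a minimal sketch using only the standard library. The function name `check_prerequisites` is hypothetical, and the check only verifies the Python version and that an `ffmpeg` executable is on `PATH`. It does not test CUDA, which is optional.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 9), which=shutil.which):
    """Return a list of problems found; an empty list means ready to install."""
    problems = []
    if sys.version_info < min_python:
        required = ".".join(str(part) for part in min_python)
        problems.append(f"Python {required} or higher is required")
    # shutil.which returns None when the executable is not on PATH.
    if which("ffmpeg") is None:
        problems.append("FFmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```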
Windows:
# Using Chocolatey
choco install ffmpeg
# Or download from https://ffmpeg.org/download.html

macOS:
brew install ffmpeg

Linux:
# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg
# CentOS/RHEL
sudo yum install ffmpeg

Basic Installation:
pip install -e .

Development Installation (includes GUI and dev tools):
pip install -e ".[dev,gui]"

# Check CLI
video-subtitle --help
# Launch GUI (if installed)
video-subtitle-gui

# Basic usage - process a single video
video-subtitle video.mp4
# Specify output format (SRT or ASS)
video-subtitle video.mp4 -f ass
# Use high-quality mode with voice enhancement
video-subtitle video.mp4 -q pro --audio-enhance strong
# Batch process multiple videos
video-subtitle video1.mp4 video2.mp4 video3.mp4 -o ./subtitles
# Use voice priority template (recommended for dialogue)
video-subtitle video.mp4 -q pro --audio-enhance strong --vad-profile voice_focus

# Launch the graphical interface
video-subtitle-gui

The GUI provides:
- 📁 Drag-and-drop file management
- ⚙️ Visual configuration panel
- ▶️ Real-time processing progress
- 📊 Step-by-step timing statistics
video-subtitle <video_files> [options]
Positional Arguments:
video_files One or more video files to process
Options:
-h, --help Show help message and exit
-f, --format FORMAT Subtitle format: srt, ass (default: srt)
-o, --output DIR Output directory (default: same as video)
-q, --quality MODE Quality mode: pro, quality, balanced, speed (default: balanced)
--audio-enhance MODE Audio enhancement: off, voice, strong (default: off)
--vad-profile PROFILE VAD profile: voice_focus, balanced, noise_robust, fast
--overwrite Overwrite existing subtitle files
--device DEVICE Processing device: cuda, cpu (default: auto-detect)
--language LANG Source language: auto-detect, en, zh, ja, ko, etc.
--model MODEL Model size: large_v3_turbo, large_v3, medium, small, base, tiny
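When driving the tool from a script, the options above can be assembled into an argument list. The sketch below uses only flags and values documented in this section. The helper name `build_command` and its keyword arguments are hypothetical and are not part of the project's API.

```python
FORMATS = {"srt", "ass"}
QUALITY_MODES = {"pro", "quality", "balanced", "speed"}
AUDIO_MODES = {"off", "voice", "strong"}
VAD_PROFILES = {"voice_focus", "balanced", "noise_robust", "fast"}


def build_command(videos, fmt="srt", quality="balanced", output_dir=None,
                  audio_enhance=None, vad_profile=None, overwrite=False):
    """Build a video-subtitle argument list from the documented options."""
    if not videos:
        raise ValueError("at least one video file is required")
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    if quality not in QUALITY_MODES:
        raise ValueError(f"unknown quality mode: {quality}")
    if audio_enhance is not None and audio_enhance not in AUDIO_MODES:
        raise ValueError(f"unknown audio enhancement: {audio_enhance}")
    if vad_profile is not None and vad_profile not in VAD_PROFILES:
        raise ValueError(f"unknown VAD profile: {vad_profile}")

    cmd = ["video-subtitle", *videos, "-f", fmt, "-q", quality]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    if audio_enhance is not None:
        cmd += ["--audio-enhance", audio_enhance]
    if vad_profile is not None:
        cmd += ["--vad-profile", vad_profile]
    if overwrite:
        cmd.append("--overwrite")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a single string avoids shell quoting problems with file names that contain spaces.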
1. Process with default settings:
video-subtitle my_video.mp4

2. Generate ASS format with styling:
video-subtitle lecture.mp4 -f ass -o ./subtitles

3. Maximum quality for professional use:
video-subtitle documentary.mp4 -q pro --audio-enhance strong

4. Fast processing for quick preview:
video-subtitle webinar.mp4 -q speed

5. Process all MP4 files in the current directory:
video-subtitle *.mp4 -o ./output

| Mode | Accuracy | Speed | Best For |
|---|---|---|---|
| pro | ★★★★★ | ★★☆☆☆ | Professional subtitles, final releases |
| quality | ★★★★☆ | ★★★☆☆ | High-quality productions |
| balanced | ★★★☆☆ | ★★★★☆ | General use (recommended default) |
| speed | ★★☆☆☆ | ★★★★★ | Quick drafts, long videos |
Audio enhancement modes (--audio-enhance):
- off: No enhancement (fastest)
- voice: Voice optimization (light processing)
- strong: Aggressive voice enhancement (best for noisy audio)

VAD profiles (--vad-profile):
- voice_focus: High sensitivity, prioritizes human voice
- balanced: Standard detection (default)
- noise_robust: Low sensitivity, filters background noise
- fast: Quick detection for real-time processing
| Model | Size | Speed | Accuracy | VRAM Usage |
|---|---|---|---|---|
| large_v3_turbo | 1.5 GB | ⚡⚡⚡ | ★★★★★ | ~8 GB |
| large_v3 | 3.1 GB | ⚡⚡ | ★★★★★+ | ~12 GB |
| medium | 1.5 GB | ⚡⚡⚡ | ★★★★☆ | ~6 GB |
| small | 480 MB | ⚡⚡⚡⚡ | ★★★☆☆ | ~3 GB |
| base | 140 MB | ⚡⚡⚡⚡⚡ | ★★☆☆☆ | ~1 GB |
| tiny | 75 MB | ⚡⚡⚡⚡⚡ | ★☆☆☆☆ | ~500 MB |
Recommendation: Use large_v3_turbo for the best balance of speed and accuracy.
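As a rough illustration of how the VRAM column might guide the choice, the sketch below picks the largest model that fits a given amount of GPU memory. It prefers large_v3_turbo in line with the recommendation and picks large_v3 only on request. The function `pick_model` is hypothetical, the thresholds are the approximate figures from the table, and falling back to tiny when nothing fits is an assumption rather than documented behavior.

```python
# Approximate VRAM needs in GB, copied from the model table above.
# Ordered from most to least accurate, with large_v3 handled separately.
MODEL_VRAM_GB = [
    ("large_v3_turbo", 8.0),
    ("medium", 6.0),
    ("small", 3.0),
    ("base", 1.0),
    ("tiny", 0.5),
]
LARGE_V3_VRAM_GB = 12.0


def pick_model(vram_gb: float, max_accuracy: bool = False) -> str:
    """Suggest a value for --model given the available VRAM in GB."""
    if max_accuracy and vram_gb >= LARGE_V3_VRAM_GB:
        return "large_v3"
    for name, needed in MODEL_VRAM_GB:
        if vram_gb >= needed:
            return name
    # Assumption: use the smallest model when nothing fits.
    return "tiny"
```

For example, the result can be passed on the command line as `--model <name>`.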
video-subtitle-generator/
├── src/
│ └── video_subtitle/
│ ├── __init__.py # Package initialization
│ ├── config.py # Configuration definitions
│ ├── asr.py # ASR engine wrapper
│ ├── subtitle.py # Subtitle generation logic
│ ├── audio.py # Audio extraction/enhancement
│ ├── processor.py # Main processing pipeline
│ ├── cache.py # Model caching utilities
│ ├── config_manager.py # Configuration persistence
│ ├── cli.py # Command-line interface
│ ├── gui.py # Graphical user interface
│ ├── i18n.py # Internationalization system
│ └── locales/
│ ├── en_US.json # English translations
│ └── zh_CN.json # Chinese translations
├── tests/
│ ├── test_config.py
│ ├── test_subtitle.py
│ ├── test_asr.py
│ ├── test_audio.py
│ ├── test_processor.py
│ ├── test_cli.py
│ └── test_integration.py
├── docs/ # Documentation
├── pyproject.toml # Project configuration
├── requirements.txt # Dependencies
├── start.bat # Windows launcher
└── README.md # This file
# Clone the repository
git clone https://github.com/yourusername/video-subtitle-generator.git
cd video-subtitle-generator
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install development dependencies
pip install -e ".[dev,gui]"

# Run all tests
pytest
# Run with coverage report
pytest --cov=src/video_subtitle --cov-report=html
# Run specific test file
pytest tests/test_asr.py -v

# Format code
black src tests
# Lint code
ruff check src tests
# Type checking
mypy src/video_subtitle

# Generate API documentation
# (Add your preferred documentation tool here)

- OS: Windows 10 / macOS 11 / Linux (Ubuntu 20.04+)
- CPU: Dual-core processor (Quad-core recommended)
- RAM: 4 GB (8 GB recommended)
- Storage: 2 GB free space
- Python: 3.9 or higher
- GPU: NVIDIA GPU with 4 GB+ VRAM
- CUDA: 11.8 or higher
- Driver: Latest NVIDIA driver
- FFmpeg: Required for audio extraction
- CUDA Toolkit: For GPU acceleration (included with PyTorch)
A: Processing speed depends on several factors:
- Hardware: GPU acceleration can provide a 10-50x speedup over CPU processing
- Model size: Smaller models (tiny, base) are faster
- Video length: Longer videos naturally take more time
- Quality mode: speed mode is significantly faster

Solutions:
- Enable GPU acceleration (ensure CUDA is installed)
- Use a smaller model: --model base
- Switch to speed mode: -q speed
A: This is a common network issue. Try these solutions:
Option 1: Manual Download
# Download from Hugging Face
# https://huggingface.co/guillaumekln/faster-whisper-large-v3-turbo
# Place in cache directory:
# Windows: C:\Users\<user>\.cache\huggingface\hub\
# macOS/Linux: ~/.cache/huggingface/hub/

Option 2: Use Mirror
# Set HF_ENDPOINT environment variable
export HF_ENDPOINT=https://hf-mirror.com

A: Accuracy varies by language and audio quality:
- English: ~95%+ with large_v3_turbo
- Chinese: ~90%+ with proper language setting
- Other languages: 85-95% depending on training data
For best results:
- Use pro or quality mode
- Enable audio enhancement for noisy sources
- Specify the correct language if auto-detection fails
A: Yes, when using ASS format:
- Edit the generated .ass file in a text editor
- Modify the [V4+ Styles] section
- Use tools like Aegisub for advanced styling
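Editing the styles section can also be scripted. The sketch below changes one field of a named style by reading the `Format:` line of `[V4+ Styles]` to find the column position. The function name `set_style_field` is hypothetical, and it assumes the generated file follows the standard ASS layout with a `Format:` line preceding the `Style:` lines. The exact fields this tool writes are not documented here.

```python
def set_style_field(ass_text: str, style_name: str, field: str, value) -> str:
    """Return ass_text with one field of one [V4+ Styles] entry replaced."""
    lines = ass_text.splitlines()
    in_styles = False
    fields = []
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("["):
            # Track whether we are inside the styles section.
            in_styles = stripped.lower() == "[v4+ styles]"
            continue
        if not in_styles:
            continue
        if stripped.startswith("Format:"):
            fields = [f.strip() for f in stripped[len("Format:"):].split(",")]
        elif stripped.startswith("Style:") and field in fields:
            values = [v.strip() for v in stripped[len("Style:"):].split(",")]
            index = fields.index(field)
            if values and values[0] == style_name and index < len(values):
                values[index] = str(value)
                lines[i] = "Style: " + ",".join(values)
    return "\n".join(lines) + "\n"
```

A typical use would be reading the `.ass` file as UTF-8 text, calling `set_style_field(text, "Default", "Fontsize", 36)`, and writing the result back. Style names other than `Default` depend on what the file actually contains.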
A: Currently, it's designed for batch processing. Real-time processing is planned for future releases.
We welcome contributions from the community! Here's how you can help:
- 🐛 Report Bugs: Submit issues with reproduction steps
- 💡 Suggest Features: Share your ideas for improvements
- 📝 Fix Typos: Correct documentation or comments
- 🔧 Submit PRs: Implement features or fix bugs
- 🌍 Translations: Help localize the application
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Follow PEP 8 style guidelines
- Write tests for new features
- Document public APIs
- Keep commits atomic and well-described
This project is licensed under the MIT License - see the LICENSE file for details.
- ✅ Free to use for personal and commercial projects
- ✅ Modify and distribute
- ✅ No warranty provided
- ✅ Include license notice in distributions
- faster-whisper: High-performance Whisper inference
- OpenAI Whisper: Revolutionary speech recognition model
- FFmpeg: Multimedia processing toolkit
- All Contributors: Thank you for your support!
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: tech@example.com
Made with ❤️ by the Video Subtitle Generator Team
⭐ If you find this project helpful, please give it a star!