
Subtitle

Open-source subtitle generation for seamless content translation.

Key Features:

  • Open-source: Freely available for use, modification, and distribution.
  • Self-hosted: Run the tool on your own servers for enhanced control and privacy.
  • AI-powered: Leverage advanced machine learning for accurate and natural-sounding subtitles.
  • Multilingual support: Generate subtitles for videos in a wide range of languages.
  • Easy integration: Seamlessly integrates into your existing workflow.

I made this project for fun, but I think it could also be useful for other people.

Installation

FFmpeg

First, you need to install FFmpeg. Here's how you can do it:

# On Debian/Ubuntu Linux
sudo apt install ffmpeg
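
If you are on macOS, FFmpeg is also available through Homebrew (this assumes Homebrew is already installed); afterwards you can verify the installation on either platform:

# On macOS (assumes Homebrew is installed)
brew install ffmpeg

# Verify the installation
ffmpeg -version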

Run

You can run the script from the command line using the following command:

python subtitle.py <filepath | video_url> [--model <modelname>]

Replace <filepath | video_url> with the path to your video file or the URL of the video. The --model argument is optional; if it is not provided, the script uses 'base' as the default model.

For example:

python subtitle.py /path/to/your/video.mp4 --model base

This runs the script on the video at /path/to/your/video.mp4 using the base model. Replace /path/to/your/video.mp4 with the actual path to your video file.
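
The script also accepts a video URL in place of a file path. A run against a remote video might look like the following; the URL below is only a placeholder, so substitute one the script can actually download:

python subtitle.py https://example.com/video.mp4 --model small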

Models

Here are the models you can use (see the example after this list). Note: use the .en models only when the video is in English.

  • tiny.en
  • tiny
  • tiny-q5_1
  • tiny.en-q5_1
  • base.en
  • base
  • base-q5_1
  • base.en-q5_1
  • small.en
  • small.en-tdrz
  • small
  • small-q5_1
  • small.en-q5_1
  • medium
  • medium.en
  • medium-q5_0
  • medium.en-q5_0
  • large-v1
  • large-v2
  • large
  • large-q5_0

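Any of these names can be passed to --model exactly as in the Run example above. For instance, a quantized English-only model keeps memory usage low for English videos; a sketch that reuses the placeholder path from earlier:

python subtitle.py /path/to/your/video.mp4 --model small.en-q5_1
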
Advanced

You can modify the behaviour of the whisper binary by passing additional parameters:

./whisper [options] file0.wav file1.wav ...

Options

Here are the options you can use with the whisper binary:

Option | Default | Description
-h, --help | | Show help message and exit
-t N, --threads N | 4 | Number of threads to use during computation
-p N, --processors N | 1 | Number of processors to use during computation
-ot N, --offset-t N | 0 | Time offset in milliseconds
-on N, --offset-n N | 0 | Segment index offset
-d N, --duration N | 0 | Duration of audio to process in milliseconds
-mc N, --max-context N | -1 | Maximum number of text context tokens to store
-ml N, --max-len N | 0 | Maximum segment length in characters
-sow, --split-on-word | false | Split on word rather than on token
-bo N, --best-of N | 2 | Number of best candidates to keep
-bs N, --beam-size N | -1 | Beam size for beam search
-wt N, --word-thold N | 0.01 | Word timestamp probability threshold
-et N, --entropy-thold N | 2.40 | Entropy threshold for decoder fail
-lpt N, --logprob-thold N | -1.00 | Log probability threshold for decoder fail
-debug, --debug-mode | false | Enable debug mode (e.g. dump log_mel)
-tr, --translate | false | Translate from source language to English
-di, --diarize | false | Stereo audio diarization
-tdrz, --tinydiarize | false | Enable tinydiarize (requires a tdrz model)
-nf, --no-fallback | false | Do not use temperature fallback while decoding
-otxt, --output-txt | true | Output result in a text file
-ovtt, --output-vtt | false | Output result in a vtt file
-osrt, --output-srt | false | Output result in a srt file
-olrc, --output-lrc | false | Output result in a lrc file
-owts, --output-words | false | Output script for generating karaoke video
-fp, --font-path | /System/Library/Fonts/Supplemental/Courier New Bold.ttf | Path to a monospace font for karaoke video
-ocsv, --output-csv | false | Output result in a CSV file
-oj, --output-json | false | Output result in a JSON file
-ojf, --output-json-full | false | Include more information in the JSON file
-of FNAME, --output-file FNAME | | Output file path (without file extension)
-ps, --print-special | false | Print special tokens
-pc, --print-colors | false | Print colors
-pp, --print-progress | false | Print progress
-nt, --no-timestamps | false | Do not print timestamps
-l LANG, --language LANG | en | Spoken language ('auto' for auto-detect)
-dl, --detect-language | false | Exit after automatically detecting language
--prompt PROMPT | | Initial prompt
-m FNAME, --model FNAME | models/ggml-base.en.bin | Model path
-f FNAME, --file FNAME | | Input WAV file path
-oved D, --ov-e-device DNAME | CPU | The OpenVINO device used for encode inference
-ls, --log-score | false | Log best decoder scores of tokens
-ng, --no-gpu | false | Disable GPU
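
For instance, to auto-detect the spoken language, translate the result to English, and write an SRT file, several of these options can be combined. This is only a sketch; it assumes a 16 kHz WAV input named out.wav and the multilingual base model at models/ggml-base.bin:

./whisper -m models/ggml-base.bin -f out.wav -l auto -tr -osrt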

Example of running the binary

Here's an example of how to use the whisper binary:

./whisper -m models/ggml-tiny.en.bin -f out.wav -nt --output-vtt
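
The whisper binary works on 16-bit, 16 kHz WAV audio, so a source file such as Rev.mp3 typically needs to be converted with FFmpeg first; a minimal sketch of that step:

# convert the source audio to 16-bit, 16 kHz mono WAV
ffmpeg -i Rev.mp3 -ar 16000 -ac 1 -c:a pcm_s16le out.wav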

License

MIT

Reference & Credits

Authors

🚀 About Me

Just trying to be a Developer!

Support

For support, email vedgupta@protonmail.com
