Command-line program to download and segment YouTube videos automatically.
Follow these instructions.
Store your API key as YT_API_KEY in a .env file.
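As a rough illustration of the expected file shape, the sketch below parses .env-style text into a dictionary. The helper name `parse_env` and the placeholder key value are hypothetical, and this is not necessarily how ytcompdl reads the file; only the variable name YT_API_KEY comes from this README.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Placeholder contents of a .env file; substitute your real key.
example_env = 'YT_API_KEY="your-api-key-here"\n'
config = parse_env(example_env)
print("YT_API_KEY" in config)
```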
# Make sure ffmpeg is installed.
sudo apt install ffmpeg
virtualenv venv
source venv/bin/activate
pip install -r envs/requirements.txt
ytcompdl -h
# Set up env.
conda env create -f envs/env.yaml -n ytcompdl
conda activate ytcompdl
ytcompdl -h
ffmpeg comes installed with the Docker image.
Arguments are passed after the image name.
# Image wd set to /ytcompdl
docker run --rm -v /$PWD:/ytcompdl koisland/ytcompdl:latest -h
To build the image locally:
docker build . -t ytcompdl:latest
# Download audio of video.
ytcompdl -u "https://www.youtube.com/watch?v=gIsHl7swEgk" -k .env -o "audio" -x config/config_regex.yaml
# Download split audio of video and save comment/desc used to timestamp.
ytcompdl -u "https://www.youtube.com/watch?v=gIsHl7swEgk" \
-k .env \
-o "audio" \
-x config/config_regex.yaml \
-t -s
usage: ytcompdl [-h] -k KEY -u URL -o OUTPUT_TYPE -x REGEX_CFG [-d DIRECTORY] [-n N_CORES] [-r RESOLUTION] [-m METADATA] [-c] [-t] [-s] [-f FADE] [-ft FADE_TIME] [-rm]
Command-line program to download and segment Youtube videos.
optional arguments:
-h, --help show this help message and exit
-k KEY, --key KEY Youtube API key as .env file.
-u URL, --url URL Youtube URL
-o OUTPUT_TYPE, --output_type OUTPUT_TYPE
Desired output (audio/video)
-x REGEX_CFG, --regex_cfg REGEX_CFG
Path to regex config file (.yaml)
-d DIRECTORY, --directory DIRECTORY
Output directory.
-n N_CORES, --n_cores N_CORES
Use n cores to process tracks in parallel.
-r RESOLUTION, --resolution RESOLUTION
Desired resolution (video only)
-m METADATA, --metadata METADATA
Path to optional metadata (.json)
-c, --comment Select comment.
-t, --timestamps Save timestamps as .txt file.
-s, --slice Slice output.
-f FADE, --fade FADE Fade (in/out/both/none)
-ft FADE_TIME, --fade_time FADE_TIME
Fade time in seconds.
-rm, --rm_src Remove downloaded source file after processing.
To set your own regular expressions to search for in video comments/descriptions, modify config/config_regex.yaml.
config/config_regex.yaml
ignored_spacers: # Optional
- "―"
- "―"
- "-"
- "\\s"
- "["
- "]"
time: "\\d{1,2}:?\\d*:\\d{2}" # Optional
# Required
start_timestamp: "(.*?)(?:{ignored_spacers})*({time})(?:{ignored_spacers})*(.*)"
duration_timestamp: "(.*?)(?:{ignored_spacers})*({time})(?:{ignored_spacers})*({time})(?:{ignored_spacers})*(.*)"
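To see how a template like start_timestamp might behave once its placeholders are filled in, here is a minimal Python sketch. It assumes the placeholders are expanded by plain string substitution, with the spacers joined into a single alternation; the `build_pattern` helper is hypothetical, and the brackets are escaped by hand because this README does not show how ytcompdl handles them.

```python
import re

# Values mirroring config/config_regex.yaml. The brackets are escaped by hand
# here (an assumption); ytcompdl's own escaping is not documented above.
ignored_spacers = ["―", "-", r"\s", r"\[", r"\]"]
time = r"\d{1,2}:?\d*:\d{2}"
start_timestamp = "(.*?)(?:{ignored_spacers})*({time})(?:{ignored_spacers})*(.*)"


def build_pattern(template: str) -> re.Pattern:
    """Fill {ignored_spacers} and {time} into a template and compile it."""
    spacers = "|".join(ignored_spacers)
    filled = template.replace("{ignored_spacers}", spacers).replace("{time}", time)
    return re.compile(filled)


pattern = build_pattern(start_timestamp)

# A comment line with the timestamp first, then a spacer, then the title.
match = pattern.match("01:23 - Track title")
before, timestamp, after = match.groups()
print(repr(before), repr(timestamp), repr(after))
```

The three capture groups are the text before the timestamp, the timestamp itself, and the text after it, so a title can appear on either side of the time.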
For some examples, check these patterns below:
- Query YouTube's Data API for selected video.
- Search description and comments for timestamps ranked by similarity to video duration.
- Parse timestamps with regular expressions.
- Download video and/or audio streams from YouTube.
- Process streams.
- Merge or convert streams.
- Slice by found timestamps.
- Apply file metadata.
- Add audio and/or video fade.
- Cleanup
- Remove intermediate outputs.
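The slicing step above can be pictured with a short sketch that turns start timestamps into (start, end) intervals in seconds. It assumes each track ends where the next one begins and that the last track runs to the end of the video; the function names `to_seconds` and `segments` are hypothetical and are not taken from ytcompdl's code.

```python
def to_seconds(timestamp: str) -> int:
    """Convert "M:SS" or "H:MM:SS" into a number of seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def segments(starts: list, duration: int) -> list:
    """Pair each start time with the next start, or with the video duration."""
    start_secs = [to_seconds(s) for s in starts]
    end_secs = start_secs[1:] + [duration]
    return list(zip(start_secs, end_secs))


# Three tracks in a video that is 4000 seconds long.
print(segments(["0:00", "3:15", "1:02:03"], 4000))
```

Each interval could then be handed to ffmpeg as a start and end offset for one output file.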
virtualenv venv && source venv/bin/activate
python setup.py sdist bdist_wheel
ytcompdl -h
- Testing
- Add unit tests.