<a href="https://colab.research.google.com/github/intellectuellthinkingbeing/thinking-writing.md/blob/main/yt_dlp.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# yt-dlp Demonstration Notebook

In this Jupyter notebook, we explore yt-dlp, a command-line tool forked from youtube-dl that downloads video and audio from many online sources and adds extra features and options.

See [here](https://github.com/yt-dlp/yt-dlp) for more information.

In this notebook, we shall:

1. Install yt-dlp.
2. Showcase how to download videos from different sources.
3. Demonstrate how to fetch subtitles.
4. Discuss some other helpful commands.

## 1. Installation

In [None]:
!pip install yt-dlp

Collecting yt-dlp
  Downloading yt_dlp-2025.3.31-py3-none-any.whl.metadata (172 kB)
Downloading yt_dlp-2025.3.31-py3-none-any.whl (3.2 MB)
Installing collected packages: yt-dlp
Successfully installed yt-dlp-2025.3.31


## 2. Downloading videos from different sources

The downloaded video file is saved to your session storage if you run this notebook in Google Colab, or to the notebook's working directory if you run it elsewhere.

The basic usage of the command is:

```
yt-dlp [OPTIONS] [--] URL [URL...]
```

You can explore options [here](https://pypi.org/project/yt-dlp/#video-selection).
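
Colab's `!` prefix hands the line to a shell, so characters such as `&` in a URL need quoting. As a minimal sketch (the helper name `build_ytdlp_command` is ours, not part of yt-dlp), the usage pattern above can be assembled as an argument list in Python and quoted with the standard library's `shlex`:

```python
import shlex


def build_ytdlp_command(urls, options=()):
    """Return an argv list following `yt-dlp [OPTIONS] [--] URL [URL...]`."""
    if not urls:
        raise ValueError("at least one URL is required")
    # "--" marks the end of the options, so a URL or ID that starts
    # with "-" is not mistaken for a flag.
    return ["yt-dlp", *options, "--", *urls]


argv = build_ytdlp_command(
    ["https://www.youtube.com/watch?v=FdpueKtaBEc&t=27505s"],
    options=["-f", "mp4"],
)

# shlex.join quotes each argument so the line is safe to paste into a POSIX shell.
command_line = shlex.join(argv)
print(command_line)
# yt-dlp -f mp4 -- 'https://www.youtube.com/watch?v=FdpueKtaBEc&t=27505s'
```

The list in `argv` could also be passed straight to `subprocess.run(argv)`, which bypasses the shell and so needs no quoting at all.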

### Basic YouTube Example


In [None]:
!yt-dlp -f mp4  "https://www.youtube.com/watch?v=FdpueKtaBEc&t=27505s"


[youtube] Extracting URL: https://www.youtube.com/watch?v=FdpueKtaBEc&t=27505s
[youtube] FdpueKtaBEc: Downloading webpage
[youtube] FdpueKtaBEc: Downloading tv client config
[youtube] FdpueKtaBEc: Downloading player 6450230e-main
[youtube] FdpueKtaBEc: Downloading tv player API JSON
[youtube] FdpueKtaBEc: Downloading ios player API JSON
[youtube] FdpueKtaBEc: Downloading m3u8 information
[info] FdpueKtaBEc: Downloading 1 format(s): 18
[download] Destination: Aidan Andrews ｜ Special Relativity & Neural Nets from scratch ｜ Deep Work Study Stream [FdpueKtaBEc].mp4
[download] 100% of  722.59MiB in 00:03:40 at 3.28MiB/s


In [None]:
from google.colab import drive
drive.mount('/content/drive')

### Downloading from Vimeo, and also setting the format to MP4.

In [None]:
!yt-dlp --format mp4 https://vimeo.com/794492622

## 3. Fetching Subtitles
This example downloads a video from YouTube and, if a subtitle track is available, downloads that as well. Depending on availability, you may also be able to specify the subtitle language with `--sub-langs`. Note that we also use `--embed-subs` to embed the subtitles into the MP4 file.

You can explore options [here](https://pypi.org/project/yt-dlp/#video-selection).

In [None]:
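# The URL is quoted because it contains "&", which the shell would otherwise treat as a control operator
!yt-dlp --write-subs --embed-subs --format mp4 "https://www.youtube.com/watch?v=020g-0hhCAU&ab_channel=Cocomelon-NurseryRhymes"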

If you used the example video above, you can run the following cell to display the subtitles in the notebook. Otherwise, update `subtitle_file_path` to match the name of your subtitle file (.vtt).

In [None]:

subtitle_file_path = "Baby Shark ｜ @CoComelon Nursery Rhymes & Kids Songs [020g-0hhCAU].en.vtt"

with open(subtitle_file_path, "r", encoding="utf-8") as file:
    subtitle_text = file.read()

print(subtitle_text)
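
If you downloaded a different video, a hardcoded file name is easy to get wrong. The sketch below assumes yt-dlp's default naming, where the video ID appears in square brackets in the file name (as in the example above), and looks the file up by ID instead. It also shows one rough way to reduce WebVTT to plain text. Both helper names and the sample cues are illustrative only, and the parser handles just the simple cases.

```python
import glob
import os
import re


def find_subtitle_files(video_id, directory="."):
    """Return .vtt files in `directory` whose names contain "[<video_id>]"."""
    # glob.escape stops "[" from being read as the start of a character class.
    name_pattern = "*" + glob.escape(f"[{video_id}]") + "*.vtt"
    return sorted(glob.glob(os.path.join(glob.escape(directory), name_pattern)))


def vtt_to_text(vtt):
    """Return the cue text lines of a WebVTT string, without timings or inline tags."""
    lines = []
    # Blocks are separated by blank lines; only blocks with a "-->" timing line are cues.
    for block in re.split(r"\n\s*\n", vtt.strip()):
        rows = block.splitlines()
        for i, row in enumerate(rows):
            if "-->" in row:
                for text_row in rows[i + 1:]:
                    cleaned = re.sub(r"<[^>]+>", "", text_row).strip()
                    if cleaned:
                        lines.append(cleaned)
                break
    return lines


# Hand-written sample cues, not taken from any real video.
sample_vtt = """WEBVTT
Kind: captions
Language: en

00:00:01.000 --> 00:00:03.000
Hello <c>world</c>

00:00:03.500 --> 00:00:05.000
Second cue
"""

print(vtt_to_text(sample_vtt))              # ['Hello world', 'Second cue']
print(find_subtitle_files("020g-0hhCAU"))   # empty list if nothing has been downloaded here
```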


## 4. Other helpful commands
These are **advanced** examples. The first uses the command line; the rest use yt-dlp's Python library.

### Modifying Metadata

In [None]:
# Set the title as "Series name S01E05"
!yt-dlp --parse-metadata "%(series)s S%(season_number)02dE%(episode_number)02d:%(title)s" https://www.youtube.com/watch?v=BaW_jenozKc
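
The argument to `--parse-metadata` has the form `FROM:TO`. When `FROM` contains `%(...)` fields, as it does here, yt-dlp evaluates it as an output template and interprets the result according to `TO` (here, the whole string becomes `title`). yt-dlp's templates build on Python's `%`-style formatting, so plain Python gives a rough preview of the result. The field values below are made up for illustration, and yt-dlp's own template engine supports more than this approximation shows.

```python
# Hypothetical metadata values, chosen only to illustrate the template.
fields = {"series": "Example Show", "season_number": 1, "episode_number": 5}

# The FROM part of the --parse-metadata argument used above.
template = "%(series)s S%(season_number)02dE%(episode_number)02d"

# "%02d" pads the number with zeros to two digits.
new_title = template % fields
print(new_title)  # Example Show S01E05
```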

### Extract information to JSON with Python

In [None]:
import json
import yt_dlp

URL = 'https://www.youtube.com/watch?v=BaW_jenozKc'

# ℹ️ See help(yt_dlp.YoutubeDL) for a list of available options and public functions
ydl_opts = {}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    info = ydl.extract_info(URL, download=False)

    # ℹ️ ydl.sanitize_info makes the info json-serializable
    print(json.dumps(ydl.sanitize_info(info)))
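
The full info dict is large, so it often helps to pick out a few fields. As a small sketch, the helper below (a hypothetical name, `summarize_info`) does that for `id`, `title`, `duration`, and `formats`. These are keys yt-dlp commonly provides, but any of them may be missing for a given site, so the code uses `.get`. The sample dict is hand-written stand-in data, not real output.

```python
import json


def summarize_info(info):
    """Pick a few fields from an info dict such as the one extract_info returns."""
    formats = info.get("formats") or []
    return {
        "id": info.get("id"),
        "title": info.get("title"),
        "duration": info.get("duration"),
        # Audio-only formats have no height, so they are skipped here.
        "heights": sorted({f["height"] for f in formats if f.get("height")}),
    }


# Hypothetical stand-in for the dict returned by ydl.extract_info(URL, download=False).
sample_info = {
    "id": "example_id",
    "title": "Example title",
    "duration": 10,
    "formats": [
        {"format_id": "a", "ext": "m4a", "height": None},
        {"format_id": "b", "ext": "mp4", "height": 360},
        {"format_id": "c", "ext": "mp4", "height": 720},
    ],
}

summary = summarize_info(sample_info)
print(json.dumps(summary, indent=2))
```

In the cell above, `summarize_info(ydl.sanitize_info(info))` would apply the same selection to a real result.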

### Extract audio with Python

In [None]:
import yt_dlp

URLS = ['https://www.youtube.com/watch?v=BaW_jenozKc']

ydl_opts = {
    'format': 'm4a/bestaudio/best',
    # ℹ️ See help(yt_dlp.postprocessor) for a list of available Postprocessors and their arguments
    'postprocessors': [{  # Extract audio using ffmpeg
        'key': 'FFmpegExtractAudio',
        'preferredcodec': 'm4a',
    }]
}

with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    error_code = ydl.download(URLS)
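
To target another codec, the same options dict can be generated by a small function. This is only a sketch: `audio_options` is a hypothetical helper, and it assumes the `FFmpegExtractAudio` postprocessor accepts `preferredquality` alongside `preferredcodec`, with the quality given as a string.

```python
def audio_options(codec="m4a", quality=None):
    """Build YoutubeDL options that extract audio in the given codec."""
    postprocessor = {
        "key": "FFmpegExtractAudio",
        "preferredcodec": codec,
    }
    if quality is not None:
        # Assumed to be passed on to ffmpeg as the target quality or bitrate.
        postprocessor["preferredquality"] = str(quality)
    return {
        # Prefer a source already in the wanted container, else the best audio.
        "format": f"{codec}/bestaudio/best",
        "postprocessors": [postprocessor],
    }


m4a_opts = audio_options()
mp3_opts = audio_options("mp3", quality=192)
print(m4a_opts)
print(mp3_opts)
```

With the defaults, the result equals the `ydl_opts` dict used in the cell above, so it can be passed to `yt_dlp.YoutubeDL` in the same way.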

### Filter video with Python

In [None]:
import yt_dlp

URLS = ['https://www.youtube.com/watch?v=BaW_jenozKc']

def longer_than_a_minute(info, *, incomplete):
    """Download only videos longer than a minute (or with unknown duration)"""
    duration = info.get('duration')
    if duration and duration < 60:
        return 'The video is too short'

ydl_opts = {
    'match_filter': longer_than_a_minute,
}

with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    error_code = ydl.download(URLS)
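
A match filter returns `None` to allow a download and a message string to skip it. Because it is an ordinary function, it can be checked offline against plain dicts before any download runs. The sketch below generalizes the filter above into a factory; `min_duration_filter` is a hypothetical name and the sample dicts are made up.

```python
def min_duration_filter(min_seconds):
    """Return a match filter that skips videos shorter than `min_seconds`."""
    def match_filter(info, *, incomplete):
        duration = info.get("duration")
        # An unknown duration (None) is allowed through, as in the filter above.
        if duration and duration < min_seconds:
            return f"The video is shorter than {min_seconds} seconds"
        return None
    return match_filter


longer_than_a_minute = min_duration_filter(60)

print(longer_than_a_minute({"duration": 10}, incomplete=False))   # skip message
print(longer_than_a_minute({"duration": 600}, incomplete=False))  # None, so download
print(longer_than_a_minute({}, incomplete=True))                  # None, duration unknown
```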

### Adding logger and progress hook

In [None]:
import yt_dlp

URLS = ['https://www.youtube.com/watch?v=BaW_jenozKc']

class MyLogger:
    def debug(self, msg):
        # For compatibility with youtube-dl, both debug and info are passed into debug
        # You can distinguish them by the prefix '[debug] '
        if msg.startswith('[debug] '):
            pass
        else:
            self.info(msg)

    def info(self, msg):
        pass

    def warning(self, msg):
        pass

    def error(self, msg):
        print(msg)


# ℹ️ See "progress_hooks" in help(yt_dlp.YoutubeDL)
def my_hook(d):
    if d['status'] == 'finished':
        print('Done downloading, now post-processing ...')


ydl_opts = {
    'logger': MyLogger(),
    'progress_hooks': [my_hook],
}

with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download(URLS)
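
Since `MyLogger` silences the normal output, the hook is the only place left to report progress. As a sketch, the function below turns a hook dict into a one-line message. It assumes the dict carries `downloaded_bytes` and either `total_bytes` or `total_bytes_estimate` while `status` is `'downloading'`. The name `format_progress` and the sample dicts are illustrative.

```python
def format_progress(d):
    """Return a short, human-readable message for a progress-hook dict."""
    status = d.get("status")
    if status == "finished":
        return "Done downloading, now post-processing ..."
    if status == "downloading":
        done = d.get("downloaded_bytes") or 0
        # The exact size is not always known, so fall back to the estimate.
        total = d.get("total_bytes") or d.get("total_bytes_estimate")
        if total:
            return f"{100 * done / total:.1f}% of {total / 2**20:.2f}MiB"
        return f"{done / 2**20:.2f}MiB downloaded"
    return f"status: {status}"


# Made-up hook dicts for illustration.
print(format_progress({"status": "downloading", "downloaded_bytes": 524288, "total_bytes": 1048576}))
print(format_progress({"status": "downloading", "downloaded_bytes": 2097152}))
print(format_progress({"status": "finished"}))
```

It could be wired in with `'progress_hooks': [lambda d: print(format_progress(d))]`.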

### Add a custom PostProcessor

In [None]:
import yt_dlp

URLS = ['https://www.youtube.com/watch?v=BaW_jenozKc']

# ℹ️ See help(yt_dlp.postprocessor.PostProcessor)
class MyCustomPP(yt_dlp.postprocessor.PostProcessor):
    def run(self, info):
        self.to_screen('Doing stuff')
        return [], info


with yt_dlp.YoutubeDL() as ydl:
    # ℹ️ "when" can take any value in yt_dlp.utils.POSTPROCESS_WHEN
    ydl.add_post_processor(MyCustomPP(), when='pre_process')
    ydl.download(URLS)