## Download the video

We're going to use [yt-dlp](https://github.com/yt-dlp/yt-dlp) to download a video. It's not very fun to use, so I recommend taking any troubles or questions to ChatGPT. It's a popular piece of software, so ChatGPT usually has good answers.

First we'll install it...

In [None]:
%pip install --quiet --upgrade "yt-dlp[default]"

...then we'll use it to download [this video](https://www.youtube.com/shorts/rDXubdQdJYs). If you want a different video, you just change the URL inside of the quotes. It also works for TikTok!

In [None]:
!yt-dlp "https://www.youtube.com/shorts/rDXubdQdJYs"

## Split the scenes

Our question this time is: who got more screen time in this video, Joe Biden or Donald Trump? We're going to measure it by adding up the length of the scenes that show Biden and the length of the scenes that show Trump.

We're going to use [PySceneDetect](https://www.scenedetect.com/) to split our scenes.

We'll first download it...

In [None]:
%pip install --upgrade --quiet scenedetect

...then use it to split the video into separate scenes (and a few other things, too!).

When running the code below, pay attention to the **Merging formats into...** line in the yt-dlp output above. That's how you know what filename to use! Sometimes the video is an mp4, but sometimes it's a webm or another format.
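If you'd rather not copy the filename by hand, a small helper can look for it. This is a minimal sketch that assumes yt-dlp's default naming, `title [VIDEO_ID].extension` (which is what produced the filename used below); `find_downloaded_video` is a hypothetical helper name. The square brackets matter: `[` has a special meaning in `glob` patterns, so the ID has to be escaped with `glob.escape`.

```python
import glob
import os


def find_downloaded_video(video_id, folder="."):
    """Find the file yt-dlp saved for `video_id`, whatever its extension.

    Assumes yt-dlp's default output template, "%(title)s [%(id)s].%(ext)s".
    """
    # "[" starts a character class in glob patterns, so escape the "[ID]" part
    pattern = os.path.join(glob.escape(folder), "*" + glob.escape(f"[{video_id}]") + ".*")
    # Skip unfinished downloads; PySceneDetect's "-Scenes.csv" never matches
    # because its name has "-Scenes" right after the "]"
    matches = sorted(
        path for path in glob.glob(pattern)
        if not path.endswith((".part", ".ytdl"))
    )
    if not matches:
        raise FileNotFoundError(f"No downloaded video for {video_id!r} in {folder!r}")
    return matches[0]
```

In the notebook you could run `video_path = find_downloaded_video("rDXubdQdJYs")` and then use `!scenedetect -i "{video_path}" ...`, since IPython fills in Python variables written in `{...}` inside `!` commands.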

The command below uses four PySceneDetect commands:

- `detect-content` will split the scenes in a flexible way (there are other options, too)
- `save-images` will save five images for each scene. They'll be in the `output` folder and be 300 pixels wide.
- `export-html` will save an HTML file that we can use to see an overview of each scene
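- `list-scenes` will save a CSV file that lists each scene, along with details like its start time, end time and length. `--skip-cuts` leaves out the extra row of cut timecodes at the top, so the file is a plain CSV.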

If you want separate video files for each scene, you can also add `split-video` at the end.
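To get a feel for what `detect-content` does, here's a heavily simplified sketch: compare each frame to the one before it, and start a new scene whenever the picture changes a lot. PySceneDetect's real `ContentDetector` compares colors (hue, saturation and brightness) and has more options, so this is an illustration, not its actual code. The `find_cuts` name, the tiny four-pixel "frames" and the threshold of 30 are all made up for this example; on the real command you'd adjust sensitivity with `detect-content --threshold`.

```python
def find_cuts(frames, threshold=30.0):
    """Return the indexes of frames that start a new scene.

    Toy content-based cut detection: each frame is a flat list of pixel
    brightness values (0-255), and we cut whenever the average change
    from the previous frame is larger than `threshold`.
    """
    cuts = []
    for i in range(1, len(frames)):
        previous, current = frames[i - 1], frames[i]
        # Average absolute difference between matching pixels
        change = sum(abs(a - b) for a, b in zip(previous, current)) / len(current)
        if change > threshold:
            cuts.append(i)
    return cuts


# Hypothetical 4-pixel "frames": two dark frames, then two bright ones
frames = [
    [10, 12, 11, 10],
    [11, 12, 10, 10],      # tiny change: same scene
    [200, 210, 205, 199],  # big jump: a new scene starts here
    [201, 209, 204, 200],
]
print(find_cuts(frames))  # [2]
```

A lower threshold finds more cuts (including false ones during fast motion); a higher one misses gentle cuts. That trade-off is why it's worth skimming the HTML overview after detection.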

In [None]:
!scenedetect -i "The CNN Presidential debate in 60 seconds [rDXubdQdJYs].webm" \
    detect-content \
    save-images --output output --width 300 --num-images 5 \
    export-html --image-width 300 \
    list-scenes --skip-cuts

## Optional: Download the images and HTML if you're on Google Colab

If you're running this notebook online, you might want to download the output so you can play with it on your own computer.

Honestly, you can also give up coding at this point! It's 100% possible to open up the image files directly and edit the CSV file in Excel, instead of doing all of the AI stuff we're about to do.

In [None]:
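try:
    from google.colab import files

    # Bundle the whole output folder into a single file for downloading
    !zip -r --quiet output.zip output

    files.download("output.zip")
except ImportError:
    # google.colab only exists on Colab
    print("Not on Google Colab! Not downloading")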
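`!zip` only works where the `zip` command is installed. If you'd rather stay in Python, the standard library's `shutil.make_archive` can build the same `output.zip`. A minimal sketch; `zip_folder` is a hypothetical helper name, not part of Colab or any library.

```python
import os
import shutil


def zip_folder(folder):
    """Create `<folder>.zip` next to `folder`, like `zip -r output.zip output`.

    Returns the path of the new zip file.
    """
    folder = os.path.abspath(folder)
    parent, name = os.path.split(folder)
    # base_dir keeps the "output/" prefix on every file inside the zip;
    # make_archive adds the ".zip" extension itself
    return shutil.make_archive(os.path.join(parent, name), "zip", root_dir=parent, base_dir=name)
```

On Colab you could then run `files.download(zip_folder("output"))`.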

## Looking at one image

We'll start by inspecting **a single image**. You will need to change the filename if you're using a different video than me.

In [None]:
%pip install -q "transformers[torch]" pillow

In [None]:
from IPython.display import Image

filename = "output/The CNN Presidential debate in 60 seconds [rDXubdQdJYs]-Scene-002-04.jpg"
Image(filename=filename) 

We want to know who's in the image, but we can't use Claude or GPT: they care too much about privacy and won't identify people!

Luckily, [this smaller, open-source model](https://huggingface.co/openai/clip-vit-large-patch14) can do the detecting for us. You can test it out on the right-hand side of that page under **Inference API**.

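This model is a "zero-shot classifier," which means we don't need to teach it what we're looking for; it already knows what (many) things in the world look like. We just hand it an image plus a list of possible labels, and it scores how well each label matches the image.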
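Under the hood, CLIP turns the image and each candidate label into a list of numbers (an embedding), measures how similar the image is to each label, and then uses a softmax to turn those similarities into scores that add up to 1. Here's a toy version of that last step. The three-number vectors are made up (real CLIP embeddings have hundreds of numbers), the scale of 100 is an assumption, and this is an illustration rather than the transformers library's code.

```python
import math


def cosine_similarity(a, b):
    """How closely two vectors point in the same direction (1.0 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_scores(image_vector, label_vectors, scale=100.0):
    """Turn image/label similarities into scores that sum to 1 (a softmax)."""
    logits = {
        label: scale * cosine_similarity(image_vector, vector)
        for label, vector in label_vectors.items()
    }
    # Subtract the largest logit first so math.exp doesn't overflow
    top = max(logits.values())
    exps = {label: math.exp(logit - top) for label, logit in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Made-up "embeddings" for one image and two text labels
image_vector = [0.9, 0.1, 0.2]
label_vectors = {
    "donald trump": [0.1, 0.9, 0.3],
    "joe biden": [0.8, 0.2, 0.1],
}
print(zero_shot_scores(image_vector, label_vectors))
```

Because of the softmax, the scores are only relative to the labels you pass in: the winner can score near 1.0 even if it's a poor match, as long as it's a better match than the alternatives.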

> There are a lot of different ways of analyzing images, including:
>
> - **Classification:** Put this image into a category
> - **Object detection:** Find specific objects in the image
> - **Semantic segmentation:** See which pixels belong to what (cars, people, the sky, etc.)
>
> You can see a few examples at [normalai.org](https://normalai.org/), but it also might be useful to look at the [Hugging Face tasks page](https://huggingface.co/tasks). You can also email me, I'm happy to chat! You can find me at [js4571@columbia.edu](mailto:js4571@columbia.edu)

It's good to test the model first, because it might actually *not* understand what we want it to identify. Most models are good at things like cats, dogs and boats, but our specific use case might be outside of what it knows.

In [None]:
from transformers import pipeline
from PIL import Image

image = Image.open(filename)

detector = pipeline("zero-shot-image-classification", model="openai/clip-vit-large-patch14") 
results = detector(image, candidate_labels=["donald trump", "joe biden"])
results

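According to the model, it's almost 100% certain that the image is of Joe Biden. Keep in mind that the scores are split between only the labels we gave it, so they always add up to 1: even a photo of a moderator would get labeled as one of the two.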
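Because the model's scores only compare the labels you give it, one option is to add a third, catch-all label so it has somewhere to put its confidence when neither candidate is on screen. A minimal sketch: `label_image` and the `"someone else"` label are hypothetical, and you'd want to try a few wordings on your own images. It expects `detector` to behave like the pipeline above, returning a list of `{'score': ..., 'label': ...}` dictionaries.

```python
def label_image(image, detector, people, threshold=0.9):
    """Label an image as one of `people`, or "unknown".

    Adds a hypothetical catch-all label so the model isn't forced to
    pick one of `people` when neither is in the frame.
    """
    catch_all = "someone else"
    results = detector(image, candidate_labels=people + [catch_all])
    top = max(results, key=lambda result: result["score"])
    # Treat the catch-all and low-confidence answers the same way
    if top["label"] == catch_all or top["score"] < threshold:
        return "unknown"
    return top["label"]
```

With the cell above, that would be `label_image(image, detector, ["donald trump", "joe biden"])`.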

## Looking at all of the images

Right now we're creating five images for each scene, `Scene-XXX-01.jpg` through `Scene-XXX-05.jpg`. I'm going to take the middle image, `Scene-XXX-03.jpg`, and treat it as representative of the whole scene.

If we were doing this "correctly," we'd probably look at all five images and pick the most popular label... but this is easier!
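Here's a sketch of that more careful version. It assumes you've already run the model on every image (not just the `-03` ones) and stored each label in a dictionary keyed by filename; `scene_labels` and `labels_by_filename` are hypothetical names. The scene number comes from the `-Scene-XXX-YY.jpg` ending that `save-images` puts on each filename.

```python
import re
from collections import Counter, defaultdict

# Matches the end of names like "...-Scene-002-04.jpg"
SCENE_PATTERN = re.compile(r"-Scene-(\d+)-(\d+)\.jpg$")


def scene_labels(labels_by_filename):
    """Pick one label per scene by majority vote over all of its images."""
    votes = defaultdict(list)
    for filename, label in labels_by_filename.items():
        match = SCENE_PATTERN.search(filename)
        if match:
            votes[int(match.group(1))].append(label)
    # most_common breaks ties in favor of the label seen first
    return {
        scene: Counter(labels).most_common(1)[0][0]
        for scene, labels in sorted(votes.items())
    }
```

This runs the model five times as often, so expect the labeling step to take about five times longer.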

In [None]:
import glob

filenames = glob.glob("output/*-Scene-*-03.jpg")
filenames.sort()
filenames

In [None]:
from transformers import pipeline
from PIL import Image

# Load the model once, outside the loop; reloading it for every image is slow
detector = pipeline("zero-shot-image-classification", model="openai/clip-vit-large-patch14")

answers = []
for filename in filenames:
    image = Image.open(filename)

    results = detector(image, candidate_labels=["donald trump", "joe biden"])
    # Results come back sorted from highest to lowest score
    top_result = results[0]

    # Only trust confident answers
    if top_result['score'] > 0.9:
        label = top_result['label']
    else:
        label = 'unknown'

    answers.append({
        'filename': filename,
        'label': label
    })

    print(filename, label)

## Combining our data

Let's turn those answers into a **dataframe**, the Python equivalent of a spreadsheet or CSV file.

In [None]:
import pandas as pd

pd.options.display.max_colwidth = 200

results_df = pd.DataFrame(answers)
results_df

Remember when we first analyzed our video, and it gave us a CSV with information about each scene? It included start times, end times, length in seconds, etc. We can read that into a dataframe, too, and combine it with our labels.

In [None]:
df = pd.read_csv("The CNN Presidential debate in 60 seconds [rDXubdQdJYs]-Scenes.csv")
df

In [None]:
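# join lines the rows up by position (row 0 with row 0, and so on). This works
# because the filenames are sorted by scene number and every scene has a -03 image.
merged = results_df.join(df)
merged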
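Because `join` matches rows by position, it would quietly pair the wrong rows if the two tables ever got out of step (for example, if one scene were missing its `-03` image). A sturdier sketch pulls the scene number out of each filename and merges on it. It assumes the scene CSV has a `Scene Number` column, as PySceneDetect's `list-scenes` output does; `merge_on_scene` is a hypothetical helper.

```python
import pandas as pd


def merge_on_scene(results_df, scenes_df):
    """Attach each image's label to its scene's row, matching by scene number."""
    labeled = results_df.copy()
    # "...-Scene-002-03.jpg" -> 2
    labeled["Scene Number"] = (
        labeled["filename"]
        .str.extract(r"-Scene-(\d+)-\d+\.jpg$", expand=False)
        .astype(int)
    )
    # validate raises an error if a scene number appears twice on either side
    return labeled.merge(scenes_df, on="Scene Number", how="left", validate="one_to_one")
```

In the notebook that would be `merged = merge_on_scene(results_df, df)`.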

We'll now save it to a CSV in case we want to look at it in Excel or Google Sheets.

In [None]:
merged.to_csv("merged.csv", index=False)

And if we are currently on Google Colab, it might make sense to download it to our own computer.

In [None]:
try:
    from google.colab import files

    files.download("merged.csv")
except ImportError:
    # google.colab only exists on Colab
    print("Not on Google Colab! Not downloading")

## Do our final analysis

Now let's finally get an answer to the question: who got more screen time, Joe Biden or Donald Trump?

In [None]:
merged.groupby('label')['Length (seconds)'].sum().reset_index()
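The total seconds answer the question, but it can help to also see the number of scenes and each label's share of the total, especially how much time ended up as `unknown`. A sketch using pandas named aggregation; `screen_time_summary` is a hypothetical helper, and it assumes the `label` and `Length (seconds)` columns from `merged`.

```python
import pandas as pd


def screen_time_summary(merged):
    """Total seconds, number of scenes, and share of total time per label."""
    summary = merged.groupby("label").agg(
        seconds=("Length (seconds)", "sum"),
        scenes=("Length (seconds)", "count"),
    )
    summary["share"] = summary["seconds"] / summary["seconds"].sum()
    # Biggest screen time first
    return summary.sort_values("seconds", ascending=False).reset_index()
```

Running `screen_time_summary(merged)` gives one row per label. If `unknown` has a large share, the answer could flip depending on how those scenes get labeled, so it's worth looking at those images by hand.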