# Explore and Run Shot Detection

Running facial recognition and embedding on every single frame is expensive. On top of that, querying a database, or even just storing results, for every frame would produce far more data than we need.

Can we use shot detection/boundaries to determine when sequential frames show significant variation? If yes, we can run facial detection for key frames from these shots, which will *hopefully* contain all individuals appearing in a given shot.

In [1]:
import sys 

sys.path.append("/Users/srmarshall/Desktop/code/pbs/pbs-passthrough/")

In [2]:
# set mp4 filepath - in this case a path pointing to a Grantchester episode
mp4_filepath = "/Users/srmarshall/Desktop/data-dump/dynamic_recaps/video_assets/full_length/30d5ccbd-f2ce-4fd7-99d9-ee28236d9af9.mp4"

## Shot-Based Detection

Try to find shot boundaries using different per-frame metrics.

### Process MP4 File 

To perform shot/scene detection, we need to attach some information to each frame that helps us quantify its content.

Using these metrics, we can calculate when a significant shift occurs between consecutive frames and mark it as a shot boundary.
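
As a rough illustration of one such metric, here is a minimal sketch of a histogram-based Bhattacharyya distance between two frames. It assumes grayscale frames held as NumPy arrays and uses the normalized form that OpenCV documents for `HISTCMP_BHATTACHARYYA`; the helper names and bin count are made up for this example, and the actual `bhattacharyya_distance` column produced by the pipeline may be computed differently.

```python
import numpy as np

def gray_histogram(frame, bins=32):
    """Normalized intensity histogram of a grayscale frame (sums to 1)."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()

def bhattacharyya_distance(p, q):
    """Distance in [0, 1]: 0 for identical histograms, 1 for no overlap."""
    bc = np.sum(np.sqrt(p * q))  # Bhattacharyya coefficient
    return float(np.sqrt(max(0.0, 1.0 - bc)))

# synthetic stand-ins for frames: one dark, one bright (hypothetical data)
rng = np.random.default_rng(0)
dark = rng.integers(0, 64, size=(48, 64))
bright = rng.integers(192, 256, size=(48, 64))

# same content gives a distance near 0; a hard cut gives a distance near 1
same = bhattacharyya_distance(gray_histogram(dark), gray_histogram(dark))
cut = bhattacharyya_distance(gray_histogram(dark), gray_histogram(bright))
```

A large jump in this value between consecutive frames is the kind of "significant shift" we want to mark as a shot boundary.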

In [3]:
from utils.helpers import video_procesisng_pipeline

# run video processing pipeline
df = video_procesisng_pipeline(mp4_filepath=mp4_filepath, load_previous=True)

Processing MP4 Frames: 100%|██████████| 95025/95025 [01:42<00:00, 924.36frames/s]
Extracting Features:   0%|          | 16/95025 [00:00<10:19, 153.24 frames/s]

Error extracting features: cannot access local variable 'edge' where it is not associated with a value


Extracting Features: 95026 frames [10:15, 154.34 frames/s]                      


In [4]:
## write df to csv to avoid reprocessing
# df.to_csv("../assets/grantchester_sample_features.csv", index=False) ## SM: only needs to be done once

In [5]:
df.head()

Unnamed: 0,frame_number,timestamp,edges,pixel_diffs,bhattacharyya_distance
0,1,0.0,0,0.0,0.0
1,2,33.366667,0,0.0,0.0
2,3,66.733333,0,0.0,0.0
3,4,100.1,0,0.0,0.0
4,5,133.466667,0,0.0,0.0


In [6]:
df.tail()

Unnamed: 0,frame_number,timestamp,edges,pixel_diffs,bhattacharyya_distance
95020,95021,3170501.0,0,0.0,0.0
95021,95022,3170534.0,0,0.0,0.0
95022,95023,3170567.0,0,0.0,0.0
95023,95024,3170601.0,0,0.0,0.0
95024,95025,3170634.0,0,,


### Detect Key Frames

Use the metrics available to us (`edges`, `pixel_diffs`, `bhattacharyya_distance`) to determine when a frame marks a shot boundary.

For now, let's use the `bhattacharyya_distance`.

In [2]:
import pandas as pd 

# read features dataset
df = pd.read_csv("../assets/grantchester_sample_features.csv")

In [3]:
# summary stats
df.describe()

Unnamed: 0,frame_number,timestamp,edges,pixel_diffs,bhattacharyya_distance
count,95025.0,95025.0,95025.0,95024.0,93983.0
mean,47513.0,1585317.0,12918.538374,18438.00222,0.021828
std,27431.499002,915297.7,16596.805285,56571.436779,0.040937
min,1.0,0.0,0.0,0.0,0.0
25%,23757.0,792658.5,4045.0,3.0,0.009606
50%,47513.0,1585317.0,7504.0,1522.0,0.013762
75%,71269.0,2377976.0,14369.0,12630.25,0.020771
max,95025.0,3170634.0,143585.0,830705.0,0.994448


In [4]:
# pull mean and std 
mean = df["bhattacharyya_distance"].mean()
std = df["bhattacharyya_distance"].std()

# set threshold to 2 std above the mean
threshold = mean + 2*std

In [5]:
import numpy as np 

# mark shot boundaries using the threshold 
df["is_keyframe"] = np.where(df["bhattacharyya_distance"] > threshold, 1, 0)

In [6]:
# how many frames were flagged as shot boundaries?
df.value_counts("is_keyframe")

is_keyframe
0    93307
1     1718
Name: count, dtype: int64
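
To get a sense of scale, we can plug the rounded summary statistics from `df.describe()` into the same rule. This is only a worked check of the arithmetic using the numbers printed above, not a rerun of the notebook.

```python
# rounded values copied from the df.describe() output
mean = 0.021828
std = 0.040937

# same rule as above: 2 standard deviations above the mean
threshold = mean + 2 * std
print(round(threshold, 6))

# share of frames flagged, using the value_counts output (about 1.8%)
flagged_share = 1718 / 95025
print(f"{flagged_share:.2%}")
```

Note that frames with a missing `bhattacharyya_distance` are never flagged, because a comparison against `NaN` evaluates to `False`.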

In [7]:
# extract keyframes for analysis
keyframe_df = df[df["is_keyframe"] == 1]
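
A threshold like this may flag several consecutive frames around a single transition (for example during a fade), so the number of flagged frames is an upper bound on the number of shot boundaries. Below is a minimal sketch for collapsing each run of consecutive flagged frames into its first frame. The helper name `collapse_runs` is hypothetical and not part of the project code.

```python
def collapse_runs(frame_numbers):
    """Keep only the first frame of each run of consecutive frame numbers."""
    starts = []
    previous = None
    for number in sorted(frame_numbers):
        # a new run starts whenever the frame does not directly follow the previous one
        if previous is None or number != previous + 1:
            starts.append(number)
        previous = number
    return starts

# frames 3-5 form one run, frame 8 stands alone, frames 20-21 form another run
boundaries = collapse_runs([3, 4, 5, 8, 20, 21])
```

In this notebook it could be called as `collapse_runs(keyframe_df["frame_number"])` to see how many distinct boundaries the flagged frames represent.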

## Embed Key Frames

For each key frame, we want to know who (if anyone) appears in the frame.

Run facial recognition on the frame. If at least one face is found:
- For each face in the image, query the database and find the top 3 matches for the face 
    - Show as a dictionary `{id: similarity_score, id2: similarity_score, id3: similarity_score}`
- For each frame in `keyframe_df` insert this dictionary into a `matches` column

In [4]:
from utils.facial_recognition import FacialRecognition

# create facial recognition instance
facial_recognition = FacialRecognition()

In [3]:
from utils.pg_client import PGClient
import os 

# instantiate pg client
db = PGClient(
    host=os.getenv("PG_HOST"),
    db_name=os.getenv("PG_DB"),
    user=os.getenv("PG_USER"),
    password=os.getenv("PG_PASSWORD")
)

In [34]:
for number in range(1, 500 + 1, 100):
    print(number)

1
101
201
301
401


Right now we're sampling every 100th frame from the video.

A future iteration would refine this to capture informationally dense scenes:
- We tried histogram-based shot extraction but got some odd results, so we reverted to a fixed increment.
- We might want to use edge detection and capture edge-dense frames!
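
One way the edge-density idea might look: instead of always taking the first frame of each 100-frame window, take the frame with the most edges in that window. This is only a sketch; the helper name `densest_frame_per_window` is hypothetical, and it assumes a list of per-frame edge counts (such as the `edges` column) ordered by frame, with frame numbers starting at 1.

```python
def densest_frame_per_window(edge_counts, window=100):
    """Return the 1-based frame number with the most edges in each window."""
    picks = []
    for start in range(0, len(edge_counts), window):
        chunk = edge_counts[start:start + window]
        # index of the largest edge count inside this window (first one on ties)
        offset = max(range(len(chunk)), key=chunk.__getitem__)
        picks.append(start + offset + 1)
    return picks

# tiny example with a window of 3 frames
picks = densest_frame_per_window([0, 5, 2, 9, 1, 1, 3], window=3)
```

With the features dataset loaded, something like `densest_frame_per_window(df["edges"].tolist())` would give a candidate list to use in place of the fixed increment.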

In [39]:
import cv2 

# start a video capture 
cap = cv2.VideoCapture(mp4_filepath)

# initialize a list to hold our information 
data = []

# generate frame numbers list 
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
frame_numbers = list(range(1, total_frames, 100))

# process every 100th frame to get some actor/actress information
for frame_number in frame_numbers:

    # set to frame number
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_number)

    # read frame
    read, frame = cap.read()

    # skip frames that could not be decoded
    if not read:
        continue

    # write frame to the processing_image slot 
    cv2.imwrite("../assets/processing_image.jpg", frame)

    # run facial embedding pipeline
    boxed_image, faces, embeddings = facial_recognition.embedding_pipeline("../assets/processing_image.jpg")

    # skip over the frame if no faces were found
    if len(embeddings) == 0:
        continue

    # if there are embeddings, process them
    for embedding in embeddings:

        # convert the embedding to a string so we can query
        embedding_str = ", ".join(map(str, embedding))

        # fill in the query with our string embedding
        query = f"SELECT id, actor_actress, about, 1-(facial_embedding <=>'[{embedding_str}]') AS score FROM masterpiece_grantchester ORDER BY facial_embedding <=> '[{embedding_str}]' LIMIT 1;"
        
        # parse query results 
        result = db.execute_query(query)
        match_id = result[0][0]  # avoid shadowing the built-in id()
        actor_actress = result[0][1]
        about = result[0][2]
        score = result[0][3]

        # add dictionary of information to data list 
        data.append({
            "frame_number": frame_number,
            "id": match_id, 
            "actor_actress": actor_actress, 
            "about": about, 
            "score": score
        })

# release the video capture once all frames are processed
cap.release()



In [41]:
import pandas as pd 

# convert data to a data frame
frame_metadata  = pd.DataFrame(data)

In [46]:
# subset to facial matches with a similarity score above 0.60
confident_metadata = frame_metadata[frame_metadata["score"] > 0.60]

# write to csv 
# confident_metadata.to_csv("../flask_video_player/data/confident_metadata.csv", index=False)

In [50]:
confident_metadata

Unnamed: 0,frame_number,id,actor_actress,about,score
14,2001,f9400dff-116f-419d-8788-dd92172e3872,Morven Christie,"{""About the Character: Amanda is the beautiful...",0.663547
16,2501,cd10ef26-e72e-4664-a747-a76f4c2ef89b,James Norton,"{""About the Character: Sidney is a young man ...",0.621140
20,3201,cd10ef26-e72e-4664-a747-a76f4c2ef89b,James Norton,"{""About the Character: Sidney is a young man ...",0.632106
34,4301,cd10ef26-e72e-4664-a747-a76f4c2ef89b,James Norton,"{""About the Character: Sidney is a young man ...",0.603185
43,5601,cd10ef26-e72e-4664-a747-a76f4c2ef89b,James Norton,"{""About the Character: Sidney is a young man ...",0.689416
...,...,...,...,...,...
731,92301,cd10ef26-e72e-4664-a747-a76f4c2ef89b,James Norton,"{""About the Character: Sidney is a young man ...",0.666566
732,92401,f4da19ce-8add-4c5f-a9fd-7606ae4ba279,Robson Green,"{""About the Character: Geordie is the lovably ...",0.762335
733,92401,cd10ef26-e72e-4664-a747-a76f4c2ef89b,James Norton,"{""About the Character: Sidney is a young man ...",0.638086
734,92501,f4da19ce-8add-4c5f-a9fd-7606ae4ba279,Robson Green,"{""About the Character: Geordie is the lovably ...",0.734837


In [62]:
texts = []

for item in confident_metadata["about"]:
    new_text = ""
    # split on double quotes and keep only the substantive fragments (10+ characters)
    for subset in item.split('"'):
        if len(subset) >= 10:
            new_text += subset.replace(",", " ")
    texts.append(new_text)

In [63]:
# work on an explicit copy so the assignment does not trigger a SettingWithCopyWarning
confident_metadata = confident_metadata.copy()
confident_metadata["about"] = texts


In [64]:
confident_metadata.to_csv("../flask_video_player/data/confident_metadata.csv", index=False)