## Processing the videos

This notebook covers reading the video files, preprocessing them, feeding the frames to the trained network, and returning the timestamp (in milliseconds) and frame ID at which the credits start rolling.

In [9]:
# import sys
# sys.path.append('./Lib/site-packages')
# !{sys.executable} -m pip install -r requirements.txt

In [10]:
import cv2
import math
import numpy as np
from keras import models

**Loading the ResNet50 model created in the previous notebook**

In [2]:
model = models.load_model('model/closing_credits_Resnet50.h5')


### Opening the video file and initializing the required variables
Since the model was trained on square images, we only want to extract the center square of each frame. The _cutoff_ variable therefore holds the number of columns to skip on the left and right of the frame later on. To test this pipeline, we use the open-source movie _Sintel_, which can be downloaded from https://durian.blender.org/download/ (specifically, the 2048 x 872 (270 Mb, mp4, 5.1) version was used here).

In [3]:
metadata = [] # Contains the timestamp (in milliseconds) and frame ID of all frames fed into the model
frames = [] # Contains the frames themselves

video_file = '../data/sintel-2048-surround.mp4'
capture = cv2.VideoCapture(video_file)

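# Named property constants replace the raw indices 3, 4, 5 and 7
width = capture.get(cv2.CAP_PROP_FRAME_WIDTH)
height = capture.get(cv2.CAP_PROP_FRAME_HEIGHT)
cutoff = int((width - height) / 2)  # columns to drop on each side to get a centered square
frame_rate = capture.get(cv2.CAP_PROP_FPS)
total_frames = capture.get(cv2.CAP_PROP_FRAME_COUNT)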
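
The center-square extraction described above can be sketched on a dummy array with the same dimensions as the Sintel frames. `center_square` is a hypothetical helper, not part of the notebook; unlike slicing with a fixed _cutoff_, it also handles frames that are already square or taller than they are wide.

```python
import numpy as np

def center_square(frame):
    """Return the largest centered square region of an H x W x C frame."""
    height, width = frame.shape[:2]
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return frame[top:top + side, left:left + side]

# Dummy frame with the Sintel resolution used in this notebook (872 rows, 2048 columns).
dummy = np.zeros((872, 2048, 3), dtype=np.uint8)
square = center_square(dummy)
print(square.shape)
```

For a 2048 x 872 frame this drops (2048 - 872) / 2 = 588 columns on each side, which matches the value of _cutoff_.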

### Reading and formatting the frames

Here we read the frames one by one and only store frames from the last 25% of the video, because credits do not normally start any earlier than that in series and movies. Within that range we keep only every `floor(frame_rate / 10)`-th frame, which gives roughly 10 frames per second of video. In addition, we crop a square at the center of each frame and resize it to 224 x 224 so that it can be fed into ResNet50.

In [4]:
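sample_step = max(1, math.floor(frame_rate / 10))  # sampling interval in frames; never 0
while capture.isOpened():
    # Query the position before read(): it describes the frame about to be decoded
    frame_info = {"time_progress": capture.get(cv2.CAP_PROP_POS_MSEC),
                  "frame_id": capture.get(cv2.CAP_PROP_POS_FRAMES)}
    ret, frame = capture.read()
    if not ret:
        break
    # Keep only frames from the last 25% of the video, one every sample_step frames
    if frame_info['frame_id'] / total_frames > 0.75 and frame_info['frame_id'] % sample_step == 0:
        metadata.append(frame_info)
        frame = frame[:, cutoff:-cutoff, :]  # center square; assumes width > height
        frame = cv2.resize(frame, (224, 224)) / 255.0  # ResNet50 input size, pixels scaled to [0, 1]
        frames.append(frame)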

frames = np.array(frames)
capture.release()
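
To make the frame selection rule easier to inspect, here is a minimal sketch that lists which frame IDs would be kept, without opening a video. `sampled_frame_ids` and its parameter names are hypothetical, and the 100-frame, 24 fps input is made up for illustration.

```python
import math

def sampled_frame_ids(total_frames, frame_rate, tail_fraction=0.25, samples_per_second=10):
    """Frame IDs in the last `tail_fraction` of the video, one every `step` frames."""
    step = max(1, math.floor(frame_rate / samples_per_second))
    return [i for i in range(total_frames)
            if i / total_frames > 1 - tail_fraction and i % step == 0]

# Made-up clip: 100 frames at 24 fps gives a step of floor(24 / 10) = 2.
ids = sampled_frame_ids(total_frames=100, frame_rate=24)
print(ids)
```

Because the step is rounded down, a 24 fps video is sampled every 2 frames, so about 12 frames per second are kept rather than exactly 10.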

### Predicting the classes

In [5]:
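# predict_classes returns one row per frame with a single class label in it
prediction_classes = model.predict_classes(frames)
# Flatten to a 1-D array holding one label per frame
estimates = np.array([x[0] for x in prediction_classes])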
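
Newer Keras releases no longer ship `predict_classes`, so the same labels may need to be derived from `model.predict` by thresholding. The sketch below shows that step on made-up probabilities. It assumes the model ends in a single sigmoid unit, which the `x[0]` indexing above suggests; `probabilities_to_classes` is a hypothetical helper.

```python
import numpy as np

def probabilities_to_classes(probabilities, threshold=0.5):
    """Turn an (n_frames, 1) array of sigmoid outputs into a 1-D array of 0/1 labels."""
    probabilities = np.asarray(probabilities)
    return (probabilities > threshold).astype("int32")[:, 0]

# Made-up probabilities standing in for the output of model.predict(frames).
probs = np.array([[0.02], [0.40], [0.97]])
estimates_demo = probabilities_to_classes(probs)
print(estimates_demo)
```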

### Extracting the frame ID where the credits start rolling

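The following function takes the predictions and slides a window of 50 frames (`window_size`) over them. A window qualifies when more than 95% of its frames are classified as credits (class 0). The function returns the start index of the first run of 10 consecutive qualifying windows, or `None` if there is no such run.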

In [6]:
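def get_starting_index(estimates, window_size=50):
    count = 0     # number of consecutive qualifying windows seen so far
    index = None  # start of the current run of qualifying windows
    for i in range(estimates.shape[0] - window_size + 1):
        # A window qualifies when more than 95% of its frames are class 0 (credits)
        if np.mean(estimates[i:(i + window_size)] == 0) > 0.95:
            if count == 0:
                index = i
            count += 1
            # Check right after counting so a run ending at the last window is not missed
            if count == 10:
                return index
        else:
            count = 0
            index = None
    return None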
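
The per-window fraction at the heart of this function can also be computed for all windows at once with a moving average. The following is a minimal sketch on synthetic labels, assuming class 0 means credits as in the function above; `credit_window_fractions` is a hypothetical name and the labels are made up.

```python
import numpy as np

def credit_window_fractions(estimates, window_size=50):
    """Fraction of frames labelled 0 (credits) in every full window of the sequence."""
    is_credit = (np.asarray(estimates) == 0).astype(float)
    kernel = np.ones(window_size) / window_size
    # mode="valid" keeps only windows that lie entirely inside the sequence
    return np.convolve(is_credit, kernel, mode="valid")

# Synthetic labels: 200 non-credit frames (1) followed by 100 credit frames (0).
labels = np.concatenate([np.ones(200, dtype=int), np.zeros(100, dtype=int)])
fractions = credit_window_fractions(labels)
first_window = int(np.argmax(fractions > 0.95))
print(first_window)  # 198: the first window containing at least 48 credit frames
```

Note that the detected index sits slightly before the true transition at 200, because a window already qualifies once fewer than 5% of its frames are non-credit frames.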

### Extracting the metadata

Finally, given the index, we can return the timestamp at which the credits start rolling, along with the corresponding frame number. In this case, the credits start 743916 milliseconds into the movie, at frame 17854, which is accurate! :)

In [7]:
metadata[get_starting_index(estimates)]

{'time_progress': 743916.6666666666, 'frame_id': 17854.0}
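
For readability, the millisecond value can be converted into a clock-style timestamp. This is a small sketch; `format_timestamp` is a hypothetical helper, and the 24 fps used in the consistency check is an assumption that happens to agree with the reported values.

```python
def format_timestamp(milliseconds):
    """Format a duration in milliseconds as HH:MM:SS.mmm."""
    total_ms = int(round(milliseconds))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{ms:03d}"

result = {'time_progress': 743916.6666666666, 'frame_id': 17854.0}
print(format_timestamp(result['time_progress']))  # 00:12:23.917

# Consistency check, assuming 24 fps: frame ID divided by frame rate gives the same time.
expected_ms = result['frame_id'] / 24 * 1000
```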