# Session 16: Moving Images with the Distant Viewing Toolkit

We now extend our techniques to moving images.

In [None]:
%pylab inline

import numpy as np
import scipy as sp
import pandas as pd
import sklearn
from sklearn import linear_model
import urllib
import keras
import dvt

import os
from os.path import join

In [None]:
import matplotlib.pyplot as plt
import matplotlib.patches as patches

plt.rcParams["figure.figsize"] = (12,12)

In [None]:
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

## DVT Demo

We are going to look at a very short clip of an episode of Friends. Let's
load in the functions that we will use.

In [None]:
from dvt.annotate.core import FrameProcessor, FrameInput
from dvt.annotate.diff import DiffAnnotator
from dvt.annotate.face import FaceAnnotator, FaceDetectDlib, FaceEmbedVgg2
from dvt.annotate.meta import MetaAnnotator
from dvt.annotate.png import PngAnnotator
from dvt.aggregate.cut import CutAggregator

import logging
logging.basicConfig(level='INFO')

Start by constructing a frame input object attached to the video file. The bsize argument indicates that we will work with the video by looking through batches of 128 frames.

In [None]:
finput = FrameInput(join("..", "data", "video-clip.mp4"), bsize=128)

Now, create a frame processor and add four annotators: (i) metadata, (ii) png files, (iii) differences between successive frames, and (iv) faces. The quantiles argument to the DiffAnnotator indicates that we want to compute the 40th percentile of the differences between successive frames. The face detector takes a long time to run when not on a GPU, so we restrict it to running only every 64 frames.

In [None]:
fpobj = FrameProcessor()
fpobj.load_annotator(PngAnnotator(output_dir=join("..", "video-clip-frames")))
fpobj.load_annotator(MetaAnnotator())
fpobj.load_annotator(DiffAnnotator(quantiles=[40]))
fpobj.load_annotator(FaceAnnotator(detector=FaceDetectDlib(), freq=64))

Now, we can run the pipeline of annotators over the input object. Because we set the logging level to INFO above, we will see output as Python runs each annotator over a batch of frames. The max_batch argument restricts the number of batches for testing purposes; set it to None (the default) to process the entire video file.

In [None]:
fpobj.process(finput, max_batch=2)
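As a quick sanity check on how much of the clip these settings cover, the arithmetic can be sketched in plain Python. This is only an illustration, not part of DVT: the helper name `frames_covered` is hypothetical, and it assumes that batches are consecutive and that an annotator with `freq=64` runs on frames whose index is a multiple of 64.

```python
def frames_covered(bsize, max_batch, freq):
    """Return the number of frames processed and the frames a freq-limited annotator visits."""
    # Assumption: batches are consecutive, so the total is simply bsize * max_batch.
    total = bsize * max_batch
    # Assumption: a freq-limited annotator runs on frames divisible by freq.
    sampled = [f for f in range(total) if f % freq == 0]
    return total, sampled


total, sampled = frames_covered(bsize=128, max_batch=2, freq=64)
print(total)    # 256
print(sampled)  # [0, 64, 128, 192]
```

Under these assumptions, two batches of 128 frames cover frames 0 through 255, and the face detector visits four of them.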

The output is now stored in the fpobj object. To access it, we call its collect_all method. This method returns a dictionary of custom objects (DictFrame, an extension of an ordered dictionary). Each can be converted to a pandas data frame, which makes it easy to view the output or save it as a CSV file.

In [None]:
obj = fpobj.collect_all()
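If you want to keep the results on disk, one option is to write each table to its own CSV file. The following is a minimal sketch using only pandas; the helper `save_tables` is a hypothetical name, and the demo table is stand-in data rather than real annotator output.

```python
import os
import tempfile

import pandas as pd


def save_tables(tables, out_dir):
    """Write each DataFrame in `tables` to <out_dir>/<name>.csv and return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    for name, df in tables.items():
        path = os.path.join(out_dir, name + ".csv")
        df.to_csv(path, index=False)  # drop the row index to keep the file tidy
        paths[name] = path
    return paths


# Stand-in table (hypothetical values, not real DVT output).
demo = {"example": pd.DataFrame({"frame": [0, 1], "value": [1.5, 2.5]})}
out = save_tables(demo, tempfile.mkdtemp())
print(out["example"])
```

With the real output, something like `save_tables({k: v.todf() for k, v in obj.items()}, "output")` should work, assuming every entry of `obj` supports `todf()` as the cells in this notebook suggest.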

We will now look at each output type.

### Metadata

The metadata is not very exciting, but is useful for downstream tasks:

In [None]:
obj['meta'].todf()

### Png

The png annotator does not return any data:

In [None]:
obj['png'].todf()

Instead, it is used for its side effects. You will see that there are individual frames from the video now saved in the directory "video-clip-frames".

### Difference

The difference annotator indicates the differences between successive frames, as well as information about the average value (brightness) of each frame.

In [None]:
obj['diff'].todf().head()
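To build some intuition for what a value like q40 measures, here is a minimal NumPy sketch of a frame-difference statistic. It is not DVT's implementation (which may, for example, resize frames or work in another color space); the function name `frame_diff_stats` and the synthetic frames are assumptions made for illustration.

```python
import numpy as np


def frame_diff_stats(prev_frame, frame, quantile=40):
    """Return (quantile of absolute pixel differences, average pixel value of `frame`)."""
    # Cast to a signed type first so that uint8 subtraction cannot wrap around.
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    return float(np.percentile(diff, quantile)), float(frame.mean())


# Two synthetic 4x4 grayscale frames: one black, one uniformly brighter.
dark = np.zeros((4, 4), dtype=np.uint8)
bright = np.full((4, 4), 10, dtype=np.uint8)

q40_same, _ = frame_diff_stats(dark, dark)
q40_changed, avg = frame_diff_stats(dark, bright)
print(q40_same, q40_changed, avg)  # 0.0 10.0 10.0
```

Identical frames give a difference of zero, while a change that affects most pixels pushes the 40th percentile up. Using a lower quantile rather than the maximum makes the statistic less sensitive to motion in a small part of the image.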

What if we want to find video cuts using these values? To aggregate the values into cuts, use the CutAggregator class. Here we have configured it to start a new cut whenever the q40 value is at least 3.

In [None]:
cagg = CutAggregator(cut_vals={'q40': 3})
cagg.aggregate(obj).todf()

If you look at the constructed frames in "video-clip-frames", you should see that there are in fact breaks at frames 75 and 155.
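The thresholding idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the CutAggregator's actual logic: the function name `split_into_shots` is hypothetical, the input values are made up, and the frame whose difference crosses the threshold is assumed to be the first frame of the new shot.

```python
def split_into_shots(q40, threshold=3):
    """Return (start, end) frame ranges, starting a new shot where a value is >= threshold."""
    if not q40:
        return []
    shots = []
    start = 0
    for frame, value in enumerate(q40):
        # A large difference from the previous frame marks the start of a new shot.
        if frame > 0 and value >= threshold:
            shots.append((start, frame - 1))
            start = frame
    shots.append((start, len(q40) - 1))  # close the final shot
    return shots


# Made-up difference values with spikes at frames 2 and 5.
values = [0.5, 0.2, 7.0, 0.1, 0.3, 4.2, 0.4]
shots = split_into_shots(values)
print(shots)  # [(0, 1), (2, 4), (5, 6)]
```

Note that several consecutive frames above the threshold would each open a new one-frame shot in this sketch; a more careful version might merge them.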

### Face

The face annotator detects faces in the frames. We configured it to only run every 64 frames, so there is only output in frames 0, 64, 128, and 192.

In [None]:
obj['face'].todf()

Notice that there are two faces in frames 0, 64, and 192, but four faces detected in frame 128. In fact, all six of the main cast members are in frame 128, but two are too small and obscured to be found by the dlib algorithm.
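Counting detections per frame is a one-line pandas operation. The sketch below uses a stand-in table that mirrors the counts described above; the column name `frame` is an assumption, so check the actual column names in `obj['face'].todf()` before adapting it.

```python
import pandas as pd

# Stand-in table: one row per detected face, with a hypothetical `frame` column.
faces = pd.DataFrame({"frame": [0, 0, 64, 64, 128, 128, 128, 128, 192, 192]})

# Number of faces detected in each sampled frame.
counts = faces.groupby("frame").size()
print(counts.to_dict())  # {0: 2, 64: 2, 128: 4, 192: 2}
```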