<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc" style="margin-top: 1em;"><ul class="toc-item"><li><span><a href="#Supercuts" data-toc-modified-id="Supercuts-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Supercuts</a></span><ul class="toc-item"><li><span><a href="#Get-all-intervals-of-person-P" data-toc-modified-id="Get-all-intervals-of-person-P-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Get all intervals of person P</a></span></li><li><span><a href="#For-each-word-W-in-sentence,-create-list-of-intervals-for-W" data-toc-modified-id="For-each-word-W-in-sentence,-create-list-of-intervals-for-W-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>For each word W in sentence, create list of intervals for W</a></span></li><li><span><a href="#For-each-word-W,-intersect-its-interval-list-with-person-P-intervals-to-get-P-+-W-intervals" data-toc-modified-id="For-each-word-W,-intersect-its-interval-list-with-person-P-intervals-to-get-P-+-W-intervals-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>For each word W, intersect its interval list with person P intervals to get P + W intervals</a></span></li><li><span><a href="#Get-all-intervals-where-there-is-exactly-one-face-on-screen" data-toc-modified-id="Get-all-intervals-where-there-is-exactly-one-face-on-screen-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Get all intervals where there is exactly one face on screen</a></span></li><li><span><a href="#For-each-word-W-in-sentence,-intersect-P-with-word-intervals-with-one-face-intervals" data-toc-modified-id="For-each-word-W-in-sentence,-intersect-P-with-word-intervals-with-one-face-intervals-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>For each word W in sentence, intersect P with word intervals with one face intervals</a></span></li><li><span><a href="#Random-sample-one-element-from-each-P-+-W-alone-interval-list" data-toc-modified-id="Random-sample-one-element-from-each-P-+-W-alone-interval-list-1.6"><span class="toc-item-num">1.6&nbsp;&nbsp;</span>Random sample one element from each P + W alone interval list</a></span></li></ul></li></ul></div>

# Supercuts

Our goal is to get the intervals for a short supercut video of a certain person (e.g., Anderson Cooper) saying a funny sentence, like:

```
P = a person in the dataset
sentence = "Intel is great because they fund Stanford."
```

We'll use `rekall` to get the candidate intervals and the caption index to get caption intervals. Make sure the caption index and `rekall` are installed in your Esper instance before running this notebook. If they aren't, the imports will fail.
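
One way to confirm that the dependencies can be found before running the rest of the notebook is sketched below. The helper name `missing_modules` is hypothetical; the module names passed to it are taken from this notebook's import cell.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names in module_names that cannot be located for import."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised, for example, when a parent package such as `esper` is missing.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# Module names taken from the import cell in this notebook.
print(missing_modules(["rekall", "esper.captions", "esper.rekall"]))
```

An empty list means all three modules were found. Note that looking up a dotted name imports its parent package as a side effect.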

Strategy:
1. Get all intervals where person P is on screen
2. For each word W in sentence, create list of intervals for W 
3. For each word W, intersect its interval list with person P intervals to get P + W intervals
4. Get all intervals where there is exactly one face on screen
5. For each word W, intersect P + W intervals with one face intervals to get P + W alone intervals
6. Randomly sample one element from each P + W alone interval list

In [None]:
# import rekall
from esper.rekall import *
from rekall.video_interval_collection import VideoIntervalCollection
from rekall.interval_list import Interval, IntervalList
from rekall.temporal_predicates import *
from rekall.spatial_predicates import *
from esper.utility import *
# import caption search
from esper.captions import *

# import face identities for person search
from query.models import Video, Face, FaceIdentity
# F expressions are used below to annotate querysets
from django.db.models import F

# import esper widget for debugging
from esper.prelude import esper_widget

import random
import os
import pickle
import tempfile
from multiprocessing import Pool

In [None]:
# Set these parameters for the notebook.
person_name = "Anderson Cooper"
sentence = "Make america great again"
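# Optional: uncomment to restrict the notebook to a 100-video sample.
# The next cell requires `video_list` to be defined.
# video_list = pickle.load(open('/app/data/tvnews_std_sample.pkl', 'rb'))['sample_100']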

In [None]:
videos = [Video.objects.filter(path__contains=video_name)[0] for video_name in video_list]
video_ids = [video.id for video in videos if video.threeyears_dataset]
print(len(video_ids))

## Get all intervals of person P
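
The query in the next cell is turned into one interval list per video by `qs_to_intrvllists`. As a rough, library-free picture of that grouping (not Esper's actual implementation), the sketch below maps plain dict rows onto `(start, end, payload)` tuples using the same schema keys. The function `rows_to_intervals` and the sample rows are hypothetical.

```python
from collections import defaultdict


def rows_to_intervals(rows, schema):
    """Group rows by video_id into sorted lists of (start, end, payload)."""
    by_video = defaultdict(list)
    for row in rows:
        by_video[row["video_id"]].append(
            (row[schema["start"]], row[schema["end"]], row[schema["payload"]])
        )
    return {video_id: sorted(ivs) for video_id, ivs in by_video.items()}


# Made-up rows standing in for annotated FaceIdentity results.
rows = [
    {"video_id": 7, "min_frame": 300, "max_frame": 420, "shot_id": 12},
    {"video_id": 7, "min_frame": 0, "max_frame": 90, "shot_id": 11},
    {"video_id": 9, "min_frame": 50, "max_frame": 80, "shot_id": 30},
]
schema = {"start": "min_frame", "end": "max_frame", "payload": "shot_id"}
person_intervals = rows_to_intervals(rows, schema)
print(person_intervals)
```

Carrying the shot ID as the payload is what lets a later step look up all faces in the shots where P appears.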

In [None]:
person_intrvllists = qs_to_intrvllists(
    FaceIdentity.objects
#         .filter(face__shot__video_id__in=video_ids)
        .filter(identity__name=person_name.lower())
        .filter(probability__gt=0.99)
        .annotate(video_id=F("face__shot__video_id"))
        .annotate(shot_id=F("face__shot_id"))
        .annotate(min_frame=F("face__shot__min_frame"))
        .annotate(max_frame=F("face__shot__max_frame")),
    schema={
        'start': 'min_frame',
        'end': 'max_frame',
        'payload': 'shot_id'
    })
person_intrvlcol = VideoIntervalCollection(person_intrvllists)
print("Got all occurrences of {}".format(person_name))

## For each word W in sentence, create list of intervals for W 
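
Caption hits come back as times in seconds, while the face intervals are in frames, so each hit has to be converted using the video's frame rate. The sketch below shows that conversion with an optional dilation (padding in seconds). `caption_hits_to_frames` and the sample values are hypothetical, and clamping the padded start at zero is an addition that the notebook's helpers do not perform.

```python
def caption_hits_to_frames(hits, fps, dilation=0.0):
    """Convert (start_sec, end_sec) caption hits to (start_frame, end_frame, payload)."""
    intervals = []
    for start, end in hits:
        # Pad the hit on both sides, but never start before the video does.
        start_frame = int(max(0.0, start - dilation) * fps)
        end_frame = int((end + dilation) * fps)
        intervals.append((start_frame, end_frame, 0))
    return intervals


print(caption_hits_to_frames([(0.5, 1.0), (10.0, 10.5)], fps=30, dilation=1.0))
```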

In [None]:
# Helper for step 2: convert a caption search into a dict mapping video ID to IntervalList
def caption_to_intrvllists(search_term, dilation=0, video_ids=None):
    results = topic_search([search_term], dilation)
    if video_ids is None:
        videos = {v.id: v for v in Video.objects.all()}
    else:
        videos = {v.id: v for v in Video.objects.filter(id__in=video_ids).all()}
    
    def convert_time(k, t):
        return int(t * videos[k].fps)
    
    segments_by_video = {}
    flattened = [
        (v.id, convert_time(v.id, l.start), convert_time(v.id, l.end)) 
        for v in results.documents if v.id in videos
        for l in v.locations
    ]
    
    for video_id, t1, t2 in flattened:
        segments_by_video.setdefault(video_id, []).append((t1, t2, 0))
    
    for video in segments_by_video:
        segments_by_video[video] = IntervalList(segments_by_video[video])
        
    print("Got all occurrences of the word {} by searching".format(search_term))
    
    return segments_by_video

# scans for search terms across videos in parallel
def scan_for_search_terms_intrvllist(search_terms, video_ids, dilation=0):
    results = scan_for_ngrams_in_parallel(search_terms, video_ids)
    
    search_terms_intrvllists = [{} for term in search_terms]
    videos = {v.id: v for v in Video.objects.filter(id__in=video_ids).all()}
    def convert_time(k, t):
        return int(t * videos[k].fps)
    
    for video_id, result in results:
        if result == []:
            continue
        for i, term in enumerate(search_terms):
            term_result = result[i]
            interval_list = IntervalList([
                (convert_time(video_id, start - dilation),
                convert_time(video_id, end + dilation),
                0)
                for start, end in term_result
            ])
            if interval_list.size() > 0:
                search_terms_intrvllists[i][video_id] = interval_list
        
    print("Got all occurrences of the words {} by scanning".format(search_terms))
    
    return search_terms_intrvllists

import pysrt

# Scan word-level aligned transcripts (.word.srt) for each search term
def scan_aligned_transcript_intrvllist(search_terms, video_ids):
    word_intrvllists = {term: {} for term in search_terms}
    for video_id in video_ids:
        video = Video.objects.filter(id=video_id)[0]
        video_name = os.path.splitext(os.path.basename(video.path))[0]
        transcript_path = os.path.join('/app/result/aligned_transcript_100/', video_name + '.word.srt')
        if not os.path.exists(transcript_path):
            continue
        word_lists = {term: [] for term in search_terms}
        for sub in pysrt.open(transcript_path):
            # Convert the subtitle's (h, m, s, ms) timestamps to frame numbers
            start = int(time2second(tuple(sub.start)[:4]) * video.fps)
            end = int(time2second(tuple(sub.end)[:4]) * video.fps)
            for term in search_terms:
                # NOTE: substring match, so "GREAT" also matches "GREATER"
                if term in sub.text:
                    word_lists[term].append((start, end, 0))
        for term, value in word_lists.items():
            if len(value) > 0:
                word_intrvllists[term][video_id] = IntervalList(value)

    # One collection per word of the sentence, in order (repeated words included)
    return [VideoIntervalCollection(word_intrvllists[term]) for term in search_terms]

In [None]:
# search words from caption index

# Get extremely frequent words
EXTREMELY_FREQUENT_WORDS = {
    w.token for w in caption_util.frequent_words(LEXICON, 99.997)
}

# Split words into words to search by index and words to scan through documents for
words = [word.upper() for word in sentence.split()]
words_to_scan = set()
words_to_search_by_index = set()
for word in words:
    if word in EXTREMELY_FREQUENT_WORDS:
        words_to_scan.add(word)
    else:
        words_to_search_by_index.add(word)
words_to_scan = list(words_to_scan)
words_to_search_by_index = list(words_to_search_by_index)

video_ids = list(person_intrvllists.keys())

scanned_words = caption_scan_to_intrvllists(
    scan_for_ngrams_in_parallel(words_to_scan, video_ids),
    words_to_scan,
    video_ids)
searched_words = [
    topic_search_to_intrvllists(topic_search([word], 0), video_ids)
    for word in words_to_search_by_index 
]

sentence_intrvllists = [
    scanned_words[words_to_scan.index(word)]
    if word in words_to_scan else
    searched_words[words_to_search_by_index.index(word)]
    for word in words
]
sentence_intrvlcol = [VideoIntervalCollection(intrvllist) for intrvllist in sentence_intrvllists]

In [None]:
# search words from aligned transcript

words = [word.upper() for word in sentence.split()]

sentence_intrvlcol = scan_aligned_transcript_intrvllist(words, video_ids)

## For each word W, intersect its interval list with person P intervals to get P + W intervals
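
Intersecting two interval lists keeps only the frame ranges where both hold. A minimal sketch on plain tuples follows. It keeps the payload of the first list, so a shot ID attached to a person interval survives the intersection. This illustrates the idea rather than rekall's `overlaps` implementation, and `intersect_intervals` is a hypothetical name.

```python
def intersect_intervals(first, second):
    """Return overlaps between two lists of (start, end, payload) tuples.

    The payload of the interval from `first` is kept.
    """
    result = []
    for s1, e1, payload in first:
        for s2, e2, _ in second:
            start, end = max(s1, s2), min(e1, e2)
            if start < end:  # skip empty or merely touching ranges
                result.append((start, end, payload))
    return sorted(result)


person = [(0, 90, 11), (300, 420, 12)]  # shots where P is on screen
word = [(60, 75, 0), (400, 450, 0)]     # frames where W is spoken
print(intersect_intervals(person, word))
```

The nested loop is quadratic in the number of intervals. That is fine for a toy example, but it is not how one would process a large collection.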

In [None]:
person_with_sentence_intrvlcol = []
for i, word_intrvlcol in enumerate(sentence_intrvlcol):
    # The payload of each result should be the shot ID from person_intrvlcol,
    # which the next section reads to find the relevant shots
    person_with_word_intrvlcol = person_intrvlcol.overlaps(word_intrvlcol)
    num_videos = len(person_with_word_intrvlcol.get_allintervals())
    print(num_videos)
    if num_videos == 0:
        print("Could not find instance of person {} with word {}".format(person_name, words[i]))
    else:
        person_with_sentence_intrvlcol.append(person_with_word_intrvlcol)

## Get all intervals where there is exactly one face on screen
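
The cell below loads every face in the relevant shots, merges faces that share an interval into one payload list, and keeps the intervals whose list has length one. The same idea on plain Python data might look like the sketch below. `one_face_intervals` and the sample faces are hypothetical, and grouping by identical `(start, end)` is a simplification of rekall's `coalesce`.

```python
from collections import defaultdict


def one_face_intervals(faces):
    """faces: iterable of (start, end, bbox). Keep intervals with exactly one face."""
    grouped = defaultdict(list)
    for start, end, bbox in faces:
        grouped[(start, end)].append(bbox)
    return sorted(
        (start, end, bboxes)
        for (start, end), bboxes in grouped.items()
        if len(bboxes) == 1
    )


faces = [
    (0, 90, "bbox_a"),     # a shot with a single face
    (300, 420, "bbox_b"),  # a shot with two faces
    (300, 420, "bbox_c"),
]
print(one_face_intervals(faces))
```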

In [None]:
from rekall.parsers import in_array, bbox_payload_parser
from rekall.merge_ops import payload_plus
from rekall.payload_predicates import payload_satisfies
from rekall.list_predicates import length_exactly

# Collect the shot IDs stored as payloads on the P + W intervals
relevant_shots = set()
for person_with_word_intrvlcol in person_with_sentence_intrvlcol:
    for intrvllist in person_with_word_intrvlcol.get_allintervals().values():
        for interval in intrvllist.get_intervals():
            relevant_shots.add(interval.get_payload())
print(len(relevant_shots))
            
faces = Face.objects.filter(shot__in=list(relevant_shots)) \
        .annotate(video_id=F('shot__video_id')) \
        .annotate(min_frame=F('shot__min_frame')) \
        .annotate(max_frame=F('shot__max_frame'))

# Materialize all the faces and load them into rekall with bounding box payloads
# Then coalesce them so that all faces in the same frame are in the same interval
# NOTE that this is slow right now since we're loading all faces!
oneface_intrvlcol = VideoIntervalCollection.from_django_qs(
    faces,
    with_payload=in_array(
        bbox_payload_parser(VideoIntervalCollection.django_accessor))
    ).coalesce(payload_merge_op=payload_plus).filter(payload_satisfies(length_exactly(1)))


In [None]:
len(oneface_intrvlcol.get_allintervals())

## For each word W in sentence, intersect P with word intervals with one face intervals

In [None]:
person_with_sentence_alone_intrvlcol = []
for i, person_with_word_intrvlcol in enumerate(person_with_sentence_intrvlcol):
    person_alone_intrvlcol = person_with_word_intrvlcol.overlaps(oneface_intrvlcol)
    print(len(person_alone_intrvlcol.get_allintervals()))
    if len(person_alone_intrvlcol.get_allintervals()) == 0:
        print("Could not find instance of person {} alone with word {}".format(person_name, words[i]))
    else:
        person_with_sentence_alone_intrvlcol.append(person_alone_intrvlcol)

In [None]:
# Flatten each P + W alone collection into a list of (video, start, end) candidates
supercut_intervals_all = []
for i, person_with_word_alone_intrvlcol in enumerate(person_with_sentence_alone_intrvlcol):
    supercut_intervals = []
    for video, intrvllist in person_with_word_alone_intrvlcol.get_allintervals().items():
        for interval in intrvllist.get_intervals():
            supercut_intervals.append((video, interval.get_start(), interval.get_end()))
    supercut_intervals_all.append(supercut_intervals)



## Random sample one element from each P + W alone interval list
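
`random.choice` raises `IndexError` on an empty list, so a word with no surviving intervals would stop the sampling cell. A hedged sketch of a more defensive version, which reports the missing words instead, is shown here. `sample_one_per_word` is a hypothetical helper, and the `rng` parameter exists only to make the example reproducible.

```python
import random


def sample_one_per_word(words, intervals_per_word, rng=random):
    """Pick one (video, start, end) per word; collect words with no candidates."""
    chosen, missing = [], []
    for word, candidates in zip(words, intervals_per_word):
        if candidates:
            chosen.append(rng.choice(candidates))
        else:
            missing.append(word)
    return chosen, missing


words = ["MAKE", "AMERICA"]
candidates = [[(7, 60, 75), (9, 10, 20)], []]  # no clip found for "AMERICA"
chosen, missing = sample_one_per_word(words, candidates, rng=random.Random(0))
print(chosen, missing)
```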

In [None]:
supercut_intervals = [random.choice(intervals) for intervals in supercut_intervals_all]
print("Supercut intervals: ", supercut_intervals)

In [None]:
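# Display the supercut intervals in Esper widget for debugging
# Accumulate per video so that two words sampled from the same video are both kept
supercut_tuples = {}
for video, start, end in supercut_intervals:
    supercut_tuples.setdefault(video, []).append((start, end, 0))
supercut_intrvllists = {video: IntervalList(tuples) for video, tuples in supercut_tuples.items()}
esper_widget(intrvllists_to_result(supercut_intrvllists,
                                   video_order = [video for video, start, end in supercut_intervals]))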

In [None]:
# make supercut video
supercut_path = '/app/result/supercut.mp4'
local_cut_list = []
# Reuse a unique temp-file name, but place the file under /app/result/
local_cut_list_path = tempfile.NamedTemporaryFile(suffix='.txt').name.replace('tmp/', 'app/result/')
with open(local_cut_list_path, 'w') as flist:
    for video_id, sfid, efid in supercut_intervals:
        video = Video.objects.filter(id=video_id)[0]
        filename = tempfile.NamedTemporaryFile(suffix='.mp4').name.replace('tmp/', 'app/result/')

        # Cut frames sfid..efid out of the source video
        cmd = 'ffmpeg -y -i "' + video.url() + '" -async 1 '
        cmd += '-ss {:s} -t {:s} '.format(second2time(sfid/video.fps, '.'), second2time((efid-sfid)/video.fps, '.'))
        cmd += filename
        print(cmd)
        os.system(cmd)

        local_cut_list.append(filename)
        # One line per clip, in the format expected by ffmpeg's concat demuxer
        flist.write('file ' + filename + '\n')

# Concatenate the clips without re-encoding
os.system('ffmpeg -y -f concat -safe 0 -i ' + local_cut_list_path + ' -c copy ' + supercut_path)
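
Building the ffmpeg invocation as an argument list and passing it to `subprocess.run` avoids the shell-quoting pitfalls of string concatenation. The sketch below only constructs the arguments, using the same flags as the cell above. `frames_to_timestamp` and `build_cut_command` are hypothetical helpers, and the URL and output path are placeholders.

```python
def frames_to_timestamp(frame, fps):
    """Format a frame index as HH:MM:SS.mmm for ffmpeg's -ss / -t options."""
    total_ms = round(frame / fps * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, ms = divmod(rem, 1000)
    return "{:02d}:{:02d}:{:02d}.{:03d}".format(hours, minutes, seconds, ms)


def build_cut_command(video_url, start_frame, end_frame, fps, out_path):
    """Return the ffmpeg argument list that cuts frames start_frame..end_frame."""
    return [
        "ffmpeg", "-y", "-i", video_url, "-async", "1",
        "-ss", frames_to_timestamp(start_frame, fps),
        "-t", frames_to_timestamp(end_frame - start_frame, fps),
        out_path,
    ]


# Placeholder URL and output path, for illustration only.
cmd = build_cut_command("http://example.com/video.mp4", 2700, 2745, 30, "/tmp/clip.mp4")
print(cmd)
# To execute: subprocess.run(cmd, check=True)  (needs `import subprocess` and ffmpeg installed)
```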