# Hands-On Time: Writing New Queries

Now we've reached the hands-on portion of the workshop. In this section, you'll spend some time writing new queries.

This notebook loads our TV news subset, along with several primitive annotations over it:
* Face detections and identities (same as from the interview example)
* Object detections out of Mask R-CNN (same as the parking spaces example)
* Closed captions
* Commercial segments

Here are some suggested queries that you may want to write - pick one or two and get started!
* Find commercials for drugs
* Find political commercials
* Find commercials about phones or carriers (Verizon, Sprint, etc.)
* Find instances of panels (multiple people brought on to talk about one subject)
* Find segments about guns

# Load Data

In [1]:
from vgrid import VGridSpec, VideoMetadata, VideoBlockFormat, SpatialType_Bbox, SpatialType_Caption
from vgrid_jupyter import VGridWidget
from rekall import Interval, IntervalSet, IntervalSetMapping, Bounds3D
from rekall.stdlib import ingest
from rekall.predicates import *
import urllib3, requests, os
urllib3.disable_warnings()

VIDEO_COLLECTION_BASEURL = "http://olimar.stanford.edu/hdd/rekall_tutorials/workshop"
VIDEO_METADATA_FILENAME = "data/video_meta.json"
VIDEO_ENDPOINT = "http://olimar.stanford.edu/hdd/rekall_tutorials/workshop/videos"

req = requests.get(os.path.join(VIDEO_COLLECTION_BASEURL, VIDEO_METADATA_FILENAME), verify=False)
video_collection = req.json()

video_metadata = [
    VideoMetadata(v["path"], v["id"], v["fps"], int(v["num_frames"]), v["width"], v["height"])
    for v in video_collection
]

# Load the JSON file from the server
FACES_JSON = "data/faces.json"
req = requests.get(os.path.join(VIDEO_COLLECTION_BASEURL, FACES_JSON), verify=False)
faces_json = req.json()

# Load the face bounding boxes into Rekall
faces_ism = ingest.ism_from_iterable_with_schema_bounds3D(
    faces_json,
    ingest.getter_accessor,
    {
        'key': 'video_id',
        't1': 'frame_number', # NOTE: the JSON stores frame numbers, not seconds (converted below)
        't2': 'frame_number',
        'x1': 'x1',
        'x2': 'x2',
        'y1': 'y1',
        'y2': 'y2'
    },
    with_payload = lambda json_obj: json_obj,
    progress = True
)

# Convert from frames to seconds
video_meta_by_id = {
    vm.id: vm
    for vm in video_metadata
}

faces_ism = faces_ism.map(
    lambda face: Interval(
        Bounds3D(
            # We convert from frames to seconds, and account for temporal downsampling
            face['t1'] / video_meta_by_id[face['payload']['video_id']].fps - 1.5,
            face['t2'] / video_meta_by_id[face['payload']['video_id']].fps + 1.5,
            face['x1'], face['x2'], face['y1'], face['y2']
        ),
        face['payload']
    )
)

# Load objects
OBJECTS_JSON = "data/objects.json"
req = requests.get(os.path.join(VIDEO_COLLECTION_BASEURL, OBJECTS_JSON), verify=False)
objects_json = req.json()

# Load the object bounding boxes into Rekall
objects_ism = ingest.ism_from_iterable_with_schema_bounds3D(
    objects_json,
    ingest.getter_accessor,
    {
        'key': 'video_id',
        't1': 'frame_number', # NOTE: the JSON stores frame numbers, not seconds (converted below)
        't2': 'frame_number',
        'x1': 'x1',
        'x2': 'x2',
        'y1': 'y1',
        'y2': 'y2'
    },
    with_payload = lambda json_obj: json_obj,
    progress = True
).map(
    lambda obj: Interval(
        Bounds3D(
            # We convert from frames to seconds, and account for temporal downsampling
            obj['t1'] / video_meta_by_id[obj['payload']['video_id']].fps - 1.5,
            obj['t2'] / video_meta_by_id[obj['payload']['video_id']].fps + 1.5,
            obj['x1'], obj['x2'], obj['y1'], obj['y2']
        ),
        obj['payload']
    )
)

# Load captions
CAPTIONS_JSON = "data/captions.json"
req = requests.get(os.path.join(VIDEO_COLLECTION_BASEURL, CAPTIONS_JSON), verify=False)
captions_json = req.json()

captions_ism = ingest.ism_from_iterable_with_schema_bounds3D(
    captions_json,
    ingest.getter_accessor,
    {
        'key': 'video_id',
        't1': 'start',
        't2': 'end'
    },
    with_payload = lambda item: item,
    progress = True
).map(
    lambda caption: Interval(caption['bounds'], caption['payload']['caption'])
).coalesce(
    ('t1', 't2'),
    Bounds3D.span,
    lambda p1, p2: p1 + ' ' + p2,
    predicate = lambda i1, i2: '>>' not in i2['payload'],
    epsilon = 1.0
)

def vgrid_captions(caption_ism):
    from vgrid import SpatialType_Caption
    
    return caption_ism.map(
        lambda caption: Interval(
            caption['bounds'],
            {
                'spatial_type': SpatialType_Caption(caption['payload']),
                'metadata': {}
            }
        )
    )

# Load commercial annotations
COMMERCIAL_JSON = "data/commercials.json"
req = requests.get(os.path.join(VIDEO_COLLECTION_BASEURL, COMMERCIAL_JSON), verify=False)
commercials_json = req.json()

commercials_ism = ingest.ism_from_iterable_with_schema_bounds3D(
    commercials_json,
    ingest.getter_accessor,
    {
        'key': 'video_id',
        't1': 'start',
        't2': 'end'
    },
    with_payload = lambda item: item,
    progress = True
)

def generate_spec(isms):
    return VGridSpec(
        video_meta = video_metadata,
        vis_format = VideoBlockFormat(imaps = [
            (str(i), ism)
            for i, ism in enumerate(isms)
        ]),
        video_endpoint = VIDEO_ENDPOINT
    ).to_json_compressed()

100%|██████████| 12619/12619 [00:00<00:00, 211550.06it/s]
100%|██████████| 46632/46632 [00:00<00:00, 207607.60it/s]
100%|██████████| 69076/69076 [00:00<00:00, 232810.55it/s]
100%|██████████| 49/49 [00:00<00:00, 94405.56it/s]
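
The face and object loaders above turn each single-frame detection into a short time interval: the frame number is divided by the video's frame rate, then padded by 1.5 seconds on each side. Here is a minimal plain-Python sketch of that arithmetic, independent of Rekall. The function name `frame_to_padded_seconds` and the 30 fps value are illustrative assumptions, not part of the notebook.

```python
def frame_to_padded_seconds(frame_number, fps, pad=1.5):
    """Convert a single-frame detection to a (t1, t2) interval in seconds.

    Mirrors the arithmetic in the loading cell: frame / fps, widened by
    `pad` seconds on each side to account for temporal downsampling.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    center = frame_number / fps
    return center - pad, center + pad


# Hypothetical example: frame 90 of a 30 fps video sits at the 3 second mark.
print(frame_to_padded_seconds(90, 30))  # (1.5, 4.5)
```

One consequence to keep in mind when writing queries: a detection on frame 0 gets a start time of -1.5 seconds, so intervals near the start of a video can begin before zero.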


# Primitives

The above code has generated a number of Rekall objects that you can play around with:
* `faces_ism`: one Interval for each face
* `objects_ism`: one Interval for each object
* `captions_ism`: one Interval for each block of caption text (adjacent caption lines are coalesced until the next line containing `>>`)
* `commercials_ism`: one Interval for each commercial break

Let's inspect the payloads of some of these sets and visualize them.

### Faces and objects

In [2]:
keys = sorted(list(faces_ism.keys()))

In [3]:
faces_ism[keys[0]].get_intervals()[0]['payload']

{'video_id': 17458,
 'x1': 0.532769680023193,
 'y1': 0.080485574901104,
 'x2': 0.723754584789276,
 'y2': 0.450597137212753,
 'score': 1.0,
 'frame_number': 716,
 'gender': 'M',
 'gender_score': 1.0,
 'identity': 'tony popovic',
 'identity_score': 0.74}

In [4]:
objects_ism[keys[0]].get_intervals()[0]['payload']

{'x1': 0.08963130712509156,
 'y1': 0.40898751152886287,
 'x2': 0.4166428565979004,
 'y2': 0.9777291191948785,
 'class': 'person',
 'score': 0.9880803823471069,
 'frame_number': 0,
 'video_id': 17458}
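
Payloads like the two above are ordinary dictionaries, so you can prototype on them with plain Python before composing a Rekall query. As a rough sketch toward the panels query, the helper below counts confident face detections per frame. The name `frames_with_many_faces`, both thresholds, and the sample payloads are made up for illustration.

```python
from collections import Counter


def frames_with_many_faces(face_payloads, min_faces=3, min_score=0.5):
    """Return sorted (video_id, frame_number) pairs with at least `min_faces` faces.

    `face_payloads` is any iterable of dicts shaped like the face payload
    shown above (only 'video_id', 'frame_number' and 'score' are read).
    """
    counts = Counter(
        (payload['video_id'], payload['frame_number'])
        for payload in face_payloads
        if payload['score'] >= min_score
    )
    return sorted(key for key, count in counts.items() if count >= min_faces)


# Hypothetical payloads: three confident faces on frame 716, one on frame 806.
sample = [
    {'video_id': 17458, 'frame_number': 716, 'score': 1.0},
    {'video_id': 17458, 'frame_number': 716, 'score': 0.9},
    {'video_id': 17458, 'frame_number': 716, 'score': 0.8},
    {'video_id': 17458, 'frame_number': 806, 'score': 1.0},
]
print(frames_with_many_faces(sample))  # [(17458, 716)]
```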

In [5]:
VGridWidget(vgrid_spec = generate_spec([
    faces_ism,
    objects_ism.map(lambda interval: Interval(
        interval['bounds'],
        { 'spatial_type': SpatialType_Bbox(text=interval['payload']['class']) }
    ))
]))


### Captions

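Notice the spatial type in the visualization cell: captions are drawn with `SpatialType_Caption` rather than `SpatialType_Bbox`.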

In [6]:
captions_ism[keys[0]].get_intervals()[0]['payload']

'>> HAPPENING NOW IN THE NEWSROOM .'
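
The payload reads as a full sentence because the loading cell coalesces adjacent caption lines: a line is appended to the previous one when it starts within one second of it and does not contain `>>`. The sketch below approximates that merging rule in plain Python on `(start, end, text)` tuples. It is a simplification for intuition, not Rekall's `coalesce` implementation, and the timestamps and line splits in the example are invented.

```python
def coalesce_captions(captions, epsilon=1.0):
    """Merge consecutive caption lines until the next line containing '>>'.

    `captions` is a list of (start, end, text) tuples. A line joins the
    previous block if it begins within `epsilon` seconds of that block's
    end and does not contain the '>>' marker.
    """
    merged = []
    for start, end, text in sorted(captions):
        if merged:
            prev_start, prev_end, prev_text = merged[-1]
            close_enough = start - prev_end <= epsilon
            no_marker = '>>' not in text
            if close_enough and no_marker:
                merged[-1] = (prev_start, max(prev_end, end), prev_text + ' ' + text)
                continue
        merged.append((start, end, text))
    return merged


# Hypothetical raw caption lines with made-up timestamps.
raw_lines = [
    (0.0, 2.0, '>> HAPPENING NOW'),
    (2.0, 4.0, 'IN THE NEWSROOM .'),
    (4.5, 6.0, '>> THANKS FOR JOINING US .'),
]
for block in coalesce_captions(raw_lines):
    print(block)
```

The first two lines collapse into one block, and the third starts a new block because it contains `>>`.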

In [7]:
VGridWidget(vgrid_spec = generate_spec([
    captions_ism.map(lambda interval: Interval(
        interval['bounds'],
        { 'spatial_type': SpatialType_Caption(text=interval['payload']) }
    ))
]))


### Commercials

This one doesn't have anything interesting in the payload, so let's just look at the Interval:

In [8]:
commercials_ism[keys[0]].get_intervals()[0]

<Interval t1:0.0 t2:42.0 x1:0.0 x2:1.0 y1:0.0 y2:1.0 payload:{'video_id': 17458, 'start': 0.0, 'end': 42.0}>

In [9]:
VGridWidget(vgrid_spec = generate_spec([commercials_ism]))


# Visualize them all at once

And of course, we can visualize all these datasets at the same time!

In [10]:
VGridWidget(vgrid_spec = generate_spec([
    faces_ism,
    objects_ism.map(lambda interval: Interval(
        interval['bounds'],
        { 'spatial_type': SpatialType_Bbox(text=interval['payload']['class']) }
    )),
    captions_ism.map(lambda interval: Interval(
        interval['bounds'],
        { 'spatial_type': SpatialType_Caption(text=interval['payload']) }
    )),
    commercials_ism
]))


# Start Writing!

Go ahead and start writing queries! Once you've decided what to query for, we recommend clicking through the videos to find a few examples, or writing some simple exploratory queries to narrow down your search space (similar to how we started the interviews query by searching for every time Bernie Sanders appeared on screen).

And don't hesitate to ask us if you need help or advice!
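
If you want a starting point for an exploratory query, here is one possible sketch for the drug commercials idea: keep captions that mention a keyword and overlap a commercial in time. It is written against plain dictionaries with `t1`, `t2`, and `payload` keys so it runs without any data loaded. The keyword list, function names, and sample intervals are all hypothetical.

```python
# Hypothetical keyword list; tune it after looking at real captions.
DRUG_KEYWORDS = ("side effects", "ask your doctor", "prescription")


def mentions_any(interval, keywords):
    """True if the interval's caption text contains any keyword (case-insensitive)."""
    text = interval['payload'].lower()
    return any(keyword in text for keyword in keywords)


def overlaps_in_time(a, b):
    """True if the two intervals share some span of time."""
    return a['t1'] < b['t2'] and b['t1'] < a['t2']


def keyword_captions_in_commercials(captions, commercials, keywords):
    """Captions that mention a keyword and overlap at least one commercial."""
    return [
        caption for caption in captions
        if mentions_any(caption, keywords)
        and any(overlaps_in_time(caption, commercial) for commercial in commercials)
    ]


# Made-up intervals for one video: one commercial break and three captions.
commercials = [{'t1': 0.0, 't2': 42.0, 'payload': {}}]
captions = [
    {'t1': 10.0, 't2': 12.0, 'payload': 'ASK YOUR DOCTOR ABOUT SIDE EFFECTS .'},
    {'t1': 20.0, 't2': 22.0, 'payload': 'SWITCH CARRIERS TODAY .'},
    {'t1': 50.0, 't2': 52.0, 'payload': '>> ASK YOUR DOCTOR , HE SAID .'},
]
hits = keyword_captions_in_commercials(captions, commercials, DRUG_KEYWORDS)
print([hit['t1'] for hit in hits])  # [10.0]
```

The loading code reads Rekall Intervals with the same `['t1']` and `['payload']` lookups, so predicates like `mentions_any` may carry over to the real `captions_ism` with little change. Treat that as an assumption to check against the Rekall documentation rather than a guarantee.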

In [None]:
# Your Rekall queries go here!