# Your First (Complex) Rekall Query: Bernie Sanders Interviews

Now that you've loaded in some data and written a first simple query, let's kick it up a notch and write a query to detect all the interviews with Bernie Sanders in a small dataset. If you've read the [Rekall tech report](https://arxiv.org/abs/1910.02993), you'll already know what the end query looks like, but in this notebook we'll walk you through the whole process to arrive at that query.

We'll also take this query as an opportunity to teach you common concepts and operations that will be useful to you as you write new Rekall queries!

It may be useful to have a reference to the documentation as you go through this tutorial. Check it out [here](https://rekallpy.readthedocs.io/en/latest/)!

## Load up the data

First, let's load up data, like we did in the data loading tutorial. Run the next cell to load up face detections and visualize them.

In [1]:
from vgrid import VGridSpec, VideoMetadata, VideoBlockFormat
from vgrid_jupyter import VGridWidget
from rekall import Interval, IntervalSet, IntervalSetMapping, Bounds3D
from rekall.stdlib import ingest
from rekall.predicates import *
import urllib3, requests, os
urllib3.disable_warnings()

VIDEO_COLLECTION_BASEURL = "http://olimar.stanford.edu/hdd/rekall_tutorials/workshop"
VIDEO_METADATA_FILENAME = "data/video_meta.json"
VIDEO_ENDPOINT = "http://olimar.stanford.edu/hdd/rekall_tutorials/workshop/videos"

# Build the URL with "/" explicitly; os.path.join would use "\" on Windows
req = requests.get(f"{VIDEO_COLLECTION_BASEURL}/{VIDEO_METADATA_FILENAME}", verify=False)
video_collection = req.json()

video_metadata = [
    VideoMetadata(v["path"], v["id"], v["fps"], int(v["num_frames"]), v["width"], v["height"])
    for v in video_collection
]

# Load the JSON file from the server
FACES_JSON = "data/faces.json"
req = requests.get(f"{VIDEO_COLLECTION_BASEURL}/{FACES_JSON}", verify=False)
faces_json = req.json()

# Load the face bounding boxes into Rekall
faces_ism = ingest.ism_from_iterable_with_schema_bounds3D(
    faces_json,
    ingest.getter_accessor,
    {
        'key': 'video_id',
        't1': 'frame_number', # NOTE: the JSON stores frame numbers, not seconds (converted below)
        't2': 'frame_number',
        'x1': 'x1',
        'x2': 'x2',
        'y1': 'y1',
        'y2': 'y2'
    },
    with_payload = lambda json_obj: json_obj
)

# Convert from frames to seconds
video_meta_by_id = {
    vm.id: vm
    for vm in video_metadata
}

faces_ism = faces_ism.map(
    lambda face: Interval(
        Bounds3D(
            # We convert from frames to seconds, and account for temporal downsampling
            face['t1'] / video_meta_by_id[face['payload']['video_id']].fps - 1.5,
            face['t2'] / video_meta_by_id[face['payload']['video_id']].fps + 1.5,
            face['x1'], face['x2'], face['y1'], face['y2']
        ),
        face['payload']
    )
)

# Display in VGrid
vgrid_spec = VGridSpec(
    video_meta = video_metadata,
    vis_format = VideoBlockFormat(imaps = [
        ('faces', faces_ism)
    ]),
    video_endpoint = VIDEO_ENDPOINT
)
VGridWidget(vgrid_spec = vgrid_spec.to_json_compressed())

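
To make the frame-to-seconds step concrete, here's a tiny standalone sketch of the same arithmetic. The helper name `frame_to_padded_seconds` and the sample values (30 fps, frames 90 and 180) are made up for illustration; only the 1.5-second padding comes from the cell above.

```python
def frame_to_padded_seconds(frame_number, fps, pad=1.5):
    """Convert a frame index to a (start, end) time range in seconds.

    The detection time is frame_number / fps; we widen it by `pad`
    seconds on each side, mirroring the -1.5 / +1.5 in the cell above.
    """
    t = frame_number / fps
    return (t - pad, t + pad)

# Hypothetical example values: a 30 fps video, faces found at frames 90 and 180
first = frame_to_padded_seconds(90, 30.0)
second = frame_to_padded_seconds(180, 30.0)
print(first)   # (1.5, 4.5)
print(second)  # (4.5, 7.5)
```

In this made-up example the two detections are three seconds apart, so their padded ranges meet end to end. Whether the real dataset is sampled at exactly that rate is an assumption, not something this notebook states.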

# Exploratory Queries

The first step in writing a good query is often to play around with the data a little bit - write down some simple queries, and click around to see what the data look like.

If we're looking for interviews with Bernie Sanders, a decent place to start would be to look at every time that Bernie Sanders appears:

In [2]:
# Filter for all the faces where the identity is 'bernie sanders'
bernie = faces_ism.filter(
    lambda interval: interval['payload']['identity'] == 'bernie sanders'
)

# Display in VGrid
vgrid_spec = VGridSpec(
    video_meta = video_metadata,
    vis_format = VideoBlockFormat(imaps = [
        ('all_faces', faces_ism),
        ('bernie', bernie)
    ]),
    video_endpoint = VIDEO_ENDPOINT
)
VGridWidget(vgrid_spec = vgrid_spec.to_json_compressed())


Let's take a quick look at that `filter` function to make sure that we know what's going on:

```Python
bernie = faces_ism.filter(
    lambda interval: interval['payload']['identity'] == 'bernie sanders'
)
```

`faces_ism` contains one `Interval` for every face bounding box. Each `Interval` has `bounds` (x1, x2, y1, y2, t1, t2) and a `payload`. The `filter` function runs every `Interval` in `faces_ism` through a function that returns `True` if the identity in the payload is equal to `"bernie sanders"`, and keeps only the `Interval`s for which it does.
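
If it helps, here's a plain-Python sketch of what that filter is doing, without using Rekall at all. The dictionaries, `faces_by_video`, and `is_bernie` are made-up stand-ins for illustration; the real `faces_ism` holds `Interval` objects keyed by video ID.

```python
# Hypothetical stand-in for an IntervalSetMapping: video ID -> list of faces
faces_by_video = {
    1: [
        {"t1": 0.0, "t2": 3.0, "payload": {"identity": "bernie sanders"}},
        {"t1": 0.0, "t2": 3.0, "payload": {"identity": "jake tapper"}},
        {"t1": 3.0, "t2": 6.0, "payload": {"identity": "bernie sanders"}},
    ],
    2: [
        {"t1": 0.0, "t2": 3.0, "payload": {"identity": "rachel maddow"}},
    ],
}

def is_bernie(interval):
    """Same test as the lambda passed to `filter` above."""
    return interval["payload"]["identity"] == "bernie sanders"

# Keep only the faces that pass the test, video by video
bernie_by_video = {
    video_id: [face for face in faces if is_bernie(face)]
    for video_id, faces in faces_by_video.items()
}

print({video_id: len(faces) for video_id, faces in bernie_by_video.items()})
# {1: 2, 2: 0}
```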

While we're at it, we may as well write a function to get rid of some of that boiler-plate visualization code:

In [3]:
def generate_spec(isms):
    return VGridSpec(
        video_meta = video_metadata,
        vis_format = VideoBlockFormat(imaps = [
            (str(i), ism)
            for i, ism in enumerate(isms)
        ]),
        video_endpoint = VIDEO_ENDPOINT
    ).to_json_compressed()

In [4]:
VGridWidget(vgrid_spec = generate_spec([faces_ism, bernie]))


# Bernie with Host

There appear to be four interviews in our dataset. Let's see if we can write a query to just return the interviews.

If you click through some of the interviews, you'll notice that during interviews, the guest is often shown split-screen with the host:

![](https://olimar.stanford.edu/hdd/rekall_tutorials/workshop/bernie_jake.png)

We can use Rekall's `join` function to find all the times when two events happen at the same time:

![](https://olimar.stanford.edu/hdd/rekall_tutorials/simple_join.png)

Go ahead and run the cell below to a) find all the faces that have been classified as Jake Tapper, and b) find all the times when Bernie Sanders and Jake Tapper are on screen together.

In [5]:
jake = faces_ism.filter(
    lambda interval: interval['payload']['identity'] == 'jake tapper'
)

bernie_with_jake = bernie.join(
    jake,
    predicate = Bounds3D.T(overlaps()),
    merge_op = lambda bernie_interval, jake_interval: Interval(
        Bounds3D.span(bernie_interval['bounds'], jake_interval['bounds'])
    )
)

In [6]:
VGridWidget(vgrid_spec = generate_spec([faces_ism, bernie, jake, bernie_with_jake]))


Let's take a look at that join function to understand what's going on:

```Python
bernie_with_jake = bernie.join(
    jake,
    predicate = Bounds3D.T(overlaps()),
    merge_op = lambda bernie_interval, jake_interval: Interval(
        Bounds3D.span(bernie_interval['bounds'], jake_interval['bounds'])
    )
)
```

Conceptually, the `join` function is computing a cross product between every `Interval` in `bernie` and `jake`, filtering the pairs by a `predicate`, and merging the pairs back into a single `Interval` with a `merge_op`.

The predicate that we've written is filtering pairs of `Interval`s to those that overlap in time (`Bounds3D.T(overlaps())`), and we are merging the intervals together to cover the `span` of both intervals in the pair. That's why the teal boxes now cover both faces.
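
Here's a minimal plain-Python sketch of that "cross product, filter, merge" idea, using `(start, end)` tuples in place of Rekall `Interval`s. The names `naive_join`, `overlaps_in_time`, and `span` are illustrative, not Rekall's API, and treating intervals that merely touch at an endpoint as non-overlapping is a choice made for this sketch rather than a statement about Rekall's `overlaps()`.

```python
from itertools import product

def overlaps_in_time(a, b):
    """True if two (start, end) ranges share more than a single endpoint."""
    return a[0] < b[1] and b[0] < a[1]

def span(a, b):
    """Smallest range that covers both inputs."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def naive_join(left, right, predicate, merge_op):
    """Try every (left, right) pair, keep those passing the predicate, merge them."""
    return [merge_op(a, b) for a, b in product(left, right) if predicate(a, b)]

# Made-up times (in seconds) when each person is on screen
bernie_times = [(0, 4), (10, 14)]
jake_times = [(2, 6), (20, 24)]

print(naive_join(bernie_times, jake_times, overlaps_in_time, span))
# [(0, 6)]
```

Only the first pair overlaps in time, so the result is a single range spanning both of its inputs. The sketch handles time only; the real query also spans the spatial bounds.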

# Don't Forget Rachel Maddow!

You'll notice that the teal segments are no longer covering the interview with Rachel Maddow - since we wrote the query ourselves in a completely interpretable way, we know exactly why this is happening. Let's include Rachel Maddow in our join as well.

In [7]:
rachel = faces_ism.filter(
    lambda interval: interval['payload']['identity'] == 'rachel maddow'
)

hosts = rachel.union(jake)

bernie_with_host = bernie.join(
    hosts,
    predicate = Bounds3D.T(overlaps()),
    merge_op = lambda bernie_interval, host_interval: Interval(
        Bounds3D.span(bernie_interval['bounds'], host_interval['bounds'])
    ),
    window = 0
)

In [8]:
VGridWidget(vgrid_spec = generate_spec([faces_ism, bernie, hosts, bernie_with_host]))


### Join Optimizations: The `window` parameter

Notice that this time we also passed an extra parameter to the `join` function: `window = 0`. We're taking advantage of the fact that no pair of Intervals is going to pass the predicate if they don't overlap in time.

**The `window` parameter is an optimization over the join cross product: it ignores any pair of Intervals that differ in time by more than `window`.** This allows us to compute joins much more efficiently (which is particularly useful for large video collections).
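
To see why this is safe, here's a plain-Python sketch of a join that skips far-apart pairs. It's one plausible reading of the `window` idea, not Rekall's implementation: `time_gap`, `windowed_join`, and the sample ranges are all made up, and the sketch still loops over every pair (it only skips the predicate), whereas a real implementation would presumably organize the data so that distant pairs are never visited.

```python
def time_gap(a, b):
    """Seconds separating two (start, end) ranges; 0 if they overlap or touch."""
    return max(0, b[0] - a[1], a[0] - b[1])

def overlaps_in_time(a, b):
    """True if two (start, end) ranges share more than a single endpoint."""
    return a[0] < b[1] and b[0] < a[1]

def windowed_join(left_ranges, right_ranges, predicate, window):
    """Join that never evaluates the predicate on pairs more than `window` apart."""
    results, checked = [], 0
    for a in left_ranges:
        for b in right_ranges:
            if time_gap(a, b) > window:
                continue  # too far apart to be considered at all
            checked += 1
            if predicate(a, b):
                results.append((min(a[0], b[0]), max(a[1], b[1])))
    return results, checked

left_ranges = [(0, 4), (100, 104), (200, 204)]
right_ranges = [(2, 6), (101, 103), (500, 504)]

pairs, checked = windowed_join(left_ranges, right_ranges, overlaps_in_time, window=0)
print(pairs)    # [(0, 6), (100, 104)]
print(checked)  # 2
```

Of the nine possible pairs, only two are close enough to need the predicate, and the result is the same as checking all nine, because a pair with a gap in time can never overlap.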

# Coalesce: Merge Nearby or Overlapping Intervals

Take a look at some of the teal intervals. You'll notice that there appear to be overlapping bounding boxes - and in fact, there are! That's because, at any given time, each face bounding box just barely overlaps in time with the end of the face bounding box before it, and the face bounding box after it. We can use the `coalesce` function to smooth these boxes together into continuous segments:

![](https://olimar.stanford.edu/hdd/rekall_tutorials/simple_coalesce.png)

In [9]:
bernie_with_host_segments = bernie_with_host.coalesce(
    ('t1', 't2'),
    bounds_merge_op = Bounds3D.span
)
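
Here's a plain-Python sketch of the coalescing idea on `(start, end)` tuples. `coalesce_times` is a made-up helper, not Rekall's `coalesce`; its `epsilon` argument is modeled loosely on the parameter of the same name that appears later in this tutorial, and the sketch merges time only, whereas `bounds_merge_op = Bounds3D.span` above also merges the spatial bounds.

```python
def coalesce_times(ranges, epsilon=0):
    """Merge ranges that overlap, touch, or sit within `epsilon` of each other."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= epsilon:
            # Close enough to the previous segment: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Made-up face ranges: three back-to-back boxes, then two isolated ones
boxes = [(0, 3), (3, 6), (6, 9), (20, 23), (30, 33)]

print(coalesce_times(boxes))
# [(0, 9), (20, 23), (30, 33)]
print(coalesce_times(boxes, epsilon=10))
# [(0, 9), (20, 33)]
```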

In [10]:
VGridWidget(vgrid_spec = generate_spec([
    faces_ism, bernie, hosts, bernie_with_host_segments]))


# Minus: Subtracting Two Sets

Let's get rid of some of those pesky gaps in the teal sets. Interviews tend to cut between the host and the guest together, and the guest alone, so let's use a `minus` operation to find all the times when Bernie Sanders appears without one of the hosts:

![](https://olimar.stanford.edu/hdd/rekall_tutorials/simple_minus.png)

In [11]:
bernie_alone = bernie.minus(
    bernie_with_host
)
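
As a rough mental model, here's a plain-Python sketch of subtracting one set of time ranges from another. `minus_times` and the sample ranges are made up for illustration; this is a simplified, time-only picture and not Rekall's `minus` implementation.

```python
def minus_times(a, others):
    """Return the parts of range `a` not covered by any range in `others`."""
    pieces = [a]
    for o_start, o_end in others:
        next_pieces = []
        for start, end in pieces:
            if o_end <= start or o_start >= end:
                # No overlap with this piece: keep it whole
                next_pieces.append((start, end))
                continue
            if start < o_start:
                next_pieces.append((start, o_start))  # part before the overlap
            if o_end < end:
                next_pieces.append((o_end, end))      # part after the overlap
        pieces = next_pieces
    return pieces

# Made-up times: when Bernie is on screen, and when he's on screen with a host
bernie_times = [(0, 10), (20, 30)]
with_host_times = [(3, 5), (8, 12)]

bernie_alone_times = [
    piece for b in bernie_times for piece in minus_times(b, with_host_times)
]
print(bernie_alone_times)
# [(0, 3), (5, 8), (20, 30)]
```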

In [12]:
VGridWidget(vgrid_spec = generate_spec([
    faces_ism, bernie, hosts, bernie_with_host_segments, bernie_alone]))


# The final query

Now let's put this all together and write our final query. Notice that the interviews tend to cut between segments of Bernie with a host and segments of Bernie alone. Let's capture this temporal pattern by looking for times when Bernie is with a host for some amount of time, and then he's alone for some amount of time (or vice versa)!

We'll first use `coalesce` to create segments when Bernie is alone:

In [13]:
bernie_alone_segments = bernie_alone.coalesce(
    ('t1', 't2'),
    bounds_merge_op = Bounds3D.span
)

And then we'll use a `join` operation to get all the times when Bernie is alone, before or after Bernie is with a host (notice that we look with a five-second window this time):

In [14]:
interview_candidates = bernie_with_host_segments.join(
    bernie_alone_segments,
    predicate = or_pred(
        before(max_dist=5),
        after(max_dist=5)
    ),
    merge_op = lambda with_host_interval, alone_interval: Interval(
        Bounds3D.span(with_host_interval['bounds'], alone_interval['bounds'])
    ),
    window = 5
)
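
Here's a plain-Python sketch of what the combined predicate is checking. The helpers `before_within`, `after_within`, and `either` are made-up names (chosen so they don't shadow the Rekall predicates imported above), and the exact boundary behavior shown here is an assumption about how `before`, `after`, and `or_pred` behave, not a copy of their implementation.

```python
def before_within(max_dist):
    """a ends no later than b starts, with at most max_dist seconds in between."""
    return lambda a, b: 0 <= b[0] - a[1] <= max_dist

def after_within(max_dist):
    """a starts no earlier than b ends, with at most max_dist seconds in between."""
    return lambda a, b: 0 <= a[0] - b[1] <= max_dist

def either(*predicates):
    """True if any of the given predicates accepts the pair."""
    return lambda a, b: any(p(a, b) for p in predicates)

adjacent = either(before_within(max_dist=5), after_within(max_dist=5))

with_host = (10, 20)  # made-up segment, in seconds
print(adjacent(with_host, (22, 30)))  # True: alone segment starts 2s after
print(adjacent(with_host, (0, 8)))    # True: alone segment ends 2s before
print(adjacent(with_host, (40, 50)))  # False: 20s away, outside the 5s limit
```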

In [15]:
VGridWidget(vgrid_spec = generate_spec([
    faces_ism, bernie, hosts, bernie_with_host_segments, bernie_alone_segments, 
    interview_candidates]))


There are some small gaps in the yellow segments (and they're overlapping), so let's smooth over those gaps with `coalesce`.

In [16]:
interview_segments = interview_candidates.coalesce(
    ('t1', 't2'),
    bounds_merge_op = Bounds3D.span,
    epsilon = 15
)

In [17]:
VGridWidget(vgrid_spec = generate_spec([
    faces_ism, bernie, hosts, bernie_with_host_segments, bernie_alone_segments, 
    interview_segments]))


The `epsilon` parameter will merge segments across small gaps in time (in this case, 15 seconds).

Finally, we can use a `filter_size` function (just syntactic sugar over `filter`) to get rid of the remaining short segments.

In [18]:
interviews = interview_segments.filter_size(min_size=45)
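
In plain Python, a size filter along the time axis amounts to something like the sketch below. `filter_by_duration` and the sample segments are made up, and treating `min_size` as an inclusive bound is an assumption for this sketch.

```python
def filter_by_duration(ranges, min_size=0, max_size=float("inf")):
    """Keep (start, end) ranges whose duration is within [min_size, max_size]."""
    return [
        (start, end)
        for start, end in ranges
        if min_size <= end - start <= max_size
    ]

# Made-up candidate segments, in seconds
segments = [(0, 12), (100, 400), (500, 544), (600, 645)]

print(filter_by_duration(segments, min_size=45))
# [(100, 400), (600, 645)]
```

The 12-second and 44-second segments are dropped, which is the kind of cleanup we want for stray short matches that aren't real interviews.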

In [19]:
VGridWidget(vgrid_spec = generate_spec([
    faces_ism, bernie, hosts, bernie_with_host_segments, bernie_alone_segments, 
    interviews]))


# Congratulations!

You've now written a pretty complex Rekall query to detect interviews with Bernie Sanders - and if you switched out his name with someone else, you could detect their interviews too! Next we'll move on to the empty parking space tutorial.