# Basics Tutorial

The purpose of this tutorial is to introduce the basics of the Rekall API and
show how Rekall queries, together with the Vgrid visualization interface, can
be used for video analysis.

This is part two of the tutorial.
If you haven't already, follow the instructions in `Basics.md` to download a
small dataset with some simple annotations.

This tutorial consists of 3 steps:
1. Import visual metadata into Rekall
2. Visualize metadata using Vgrid
3. Use Rekall Interval operations to query for particular events in the video dataset

### Important Concepts will be in bold throughout this notebook.

### Setup
We first need to make sure that `rekall`, `vgrid`, and `vgrid_jupyter` are installed properly. If the following cell runs without error, you're all set. If not, make sure that you've installed `rekallpy` and `vgridpy` with `pip` or that you've activated our Anaconda environment.

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
from rekall import Interval, IntervalSet, IntervalSetMapping, Bounds3D
from vgrid import vblocks_builder
from vgrid_jupyter import VGridWidget

# Step 1: Load Data

The first step in analyzing videos with Rekall is loading visual metadata into Rekall. If you followed the instructions in `Basics.md`, you should have these videos in this directory:
* `driving1.mp4`
* `driving2.mp4`
* `driving3.mp4`
* `driving4.mp4`

And these JSON files:
* `driving1.json`
* `driving2.json`
* `driving3.json`
* `driving4.json`

We'll start by loading these JSON files into Python.

In [3]:
video_files = [
    'driving1.mp4',
    'driving2.mp4',
    'driving3.mp4',
    'driving4.mp4'
]
metadata_files = [
    'driving1.json',
    'driving2.json',
    'driving3.json',
    'driving4.json'
]

In [4]:
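import json
driving_metadata = []
for metadata_file in metadata_files:
    # Use a context manager so each file handle is closed promptly
    with open(metadata_file, 'r') as f:
        driving_metadata.append(json.load(f))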

If we inspect each JSON object we just loaded, we'll see that it has the following structure:
```
[
    {
        'video': string,
        'frame': int,
        'bboxes': [
            {
                'class': string,
                'score': float,
                'x1': float,
                'x2': float,
                'y1': float,
                'y2': float
            },
            ...
        ]
    },
    ...
]
```
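
To get a feel for this structure, the following sketch walks a small hand-written list of records with the same shape and counts detections per class. The `sample_metadata` values and the `count_boxes_per_class` helper are invented for illustration; with the real data you could pass `driving_metadata[0]` instead.

```python
from collections import Counter

# Hypothetical records that mimic the structure above (values are invented).
sample_metadata = [
    {'video': 'driving1.mp4', 'frame': 0, 'bboxes': [
        {'class': 'car', 'score': 0.99, 'x1': 0.0, 'x2': 0.06, 'y1': 0.47, 'y2': 0.51},
        {'class': 'person', 'score': 0.17, 'x1': 0.48, 'x2': 0.49, 'y1': 0.48, 'y2': 0.51},
    ]},
    {'video': 'driving1.mp4', 'frame': 1, 'bboxes': [
        {'class': 'car', 'score': 0.98, 'x1': 0.01, 'x2': 0.07, 'y1': 0.47, 'y2': 0.51},
    ]},
]

def count_boxes_per_class(frames):
    """Count how many bounding boxes of each class appear across all frames."""
    return Counter(bbox['class'] for frame in frames for bbox in frame['bboxes'])

class_counts = count_boxes_per_class(sample_metadata)
print(class_counts)
```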

### Concept 1: Bounds represent spatiotemporal volumes
Each bounding box represents a 3D spatiotemporal volume in the video. We can represent such a volume by using a Rekall `Bounds3D` object.

![videovolume](https://olimar.stanford.edu/hdd/rekall_tutorials/basics/videovolume.png)

In [5]:
sample_bbox = driving_metadata[0][10]['bboxes'][0]
sample_frame = driving_metadata[0][10]['frame']

In [6]:
sample_bbox

{'class': 'person',
 'score': 0.17135410010814667,
 'x1': 0.4788206100463867,
 'x2': 0.4907509803771973,
 'y1': 0.4826868693033854,
 'y2': 0.5067381964789497}

In [7]:
sample_frame

10

In [8]:
sample_bbox_as_bounds = Bounds3D(
    t1 = sample_frame,
    t2 = sample_frame,
    x1 = sample_bbox['x1'],
    x2 = sample_bbox['x2'],
    y1 = sample_bbox['y1'],
    y2 = sample_bbox['y2']
)

In [9]:
sample_bbox_as_bounds

t1:10 t2:10 x1:0.4788206100463867 x2:0.4907509803771973 y1:0.4826868693033854 y2:0.5067381964789497

We can access the co-ordinates of the bounds as follows:

In [10]:
print(sample_bbox_as_bounds['t1'])
print(sample_bbox_as_bounds['t2'])
print(sample_bbox_as_bounds['x1'])
print(sample_bbox_as_bounds['x2'])
print(sample_bbox_as_bounds['y1'])
print(sample_bbox_as_bounds['y2'])

10
10
0.4788206100463867
0.4907509803771973
0.4826868693033854
0.5067381964789497
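
As a rough illustration of what these coordinates describe, the sketch below computes the extent of the sample box along each axis, using a plain dictionary rather than a `Bounds3D`. It assumes that the `x`/`y` values are fractions of the frame width and height (the values between 0 and 1 suggest this, but it is an assumption), and it borrows the 1280x720 frame size that `ffprobe` reports for these videos in Step 2. The `extent` helper is hypothetical.

```python
# Coordinates copied from the sample box above, stored in a plain dict.
box = {
    't1': 10, 't2': 10,
    'x1': 0.4788206100463867, 'x2': 0.4907509803771973,
    'y1': 0.4826868693033854, 'y2': 0.5067381964789497,
}

# Assumed frame size in pixels (see the ffprobe output in Step 2).
FRAME_WIDTH, FRAME_HEIGHT = 1280, 720

def extent(bounds, lo, hi):
    """Length of the box along one axis."""
    return bounds[hi] - bounds[lo]

width_px = extent(box, 'x1', 'x2') * FRAME_WIDTH
height_px = extent(box, 'y1', 'y2') * FRAME_HEIGHT
duration_frames = extent(box, 't1', 't2')

print(round(width_px, 1), round(height_px, 1), duration_frames)
```

Note the zero extent in time: a box on a single frame has `t1 == t2`, which is why Step 2 adds one frame to `t2` before converting to seconds.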


### Concept 2: `Interval`s wrap spatiotemporal volumes with a payload
But often we will need to represent other metadata on spatiotemporal volumes. In this case, we have the class name as well as the score. For this, Rekall has the `Interval` abstraction:

In [11]:
sample_bbox_interval = Interval(
    sample_bbox_as_bounds,
    payload = { 'class': sample_bbox['class'], 'score': sample_bbox['score'] })

In [12]:
sample_bbox_interval

<Interval t1:10 t2:10 x1:0.4788206100463867 x2:0.4907509803771973 y1:0.4826868693033854 y2:0.5067381964789497 payload:{'score': 0.17135410010814667, 'class': 'person'}>

`Interval`s expose the same interface to the co-ordinates as Bounds.

In [13]:
print(sample_bbox_interval['t1'])
print(sample_bbox_interval['t2'])
print(sample_bbox_interval['x1'])
print(sample_bbox_interval['x2'])
print(sample_bbox_interval['y1'])
print(sample_bbox_interval['y2'])

10
10
0.4788206100463867
0.4907509803771973
0.4826868693033854
0.5067381964789497


But they also expose interfaces to the entire Bounds object and the payload.

In [14]:
print(sample_bbox_interval['bounds'])
print(sample_bbox_interval['payload'])

t1:10 t2:10 x1:0.4788206100463867 x2:0.4907509803771973 y1:0.4826868693033854 y2:0.5067381964789497
{'score': 0.17135410010814667, 'class': 'person'}


### Concept 3: `IntervalSet`s organize many `Interval`s
There's not much you can do with single `Interval`s, so we use `IntervalSet`s to represent collections of many `Interval`s.

![set_operations](https://olimar.stanford.edu/hdd/rekall_tutorials/basics/set_operations.png)

`IntervalSet`s provide a number of useful set operations (see figure above), but for now, you can just think of them as wrappers around lists of `Interval`s.

`IntervalSet`s are constructed from lists of `Interval`s:

In [15]:
driving1_interval_set = IntervalSet([
    Interval(Bounds3D(
        t1=f['frame'],
        t2=f['frame'],
        x1=bbox['x1'],
        x2=bbox['x2'],
        y1=bbox['y1'],
        y2=bbox['y2']
    ), payload = {
        'class': bbox['class'],
        'score': bbox['score']
    })
    for f in driving_metadata[0]
    for bbox in f['bboxes']
])

In [16]:
print(driving1_interval_set.size())

96585


We can recover the original (sorted by bounds) list of Intervals with `get_intervals`:

In [17]:
print(len(driving1_interval_set.get_intervals()))
print(driving1_interval_set.get_intervals()[0])

96585
<Interval t1:0 t2:0 x1:0.0 x2:0.060274869203567505 y1:0.4683924780951606 y2:0.5129659440782335 payload:{'score': 0.997351884841919, 'class': 'car'}>


### Concept 4: `IntervalSetMapping`s organize `IntervalSet`s from different domains
Finally, we often deal with multiple videos at once, and usually don't want operations to mix `Interval`s from different domains. `IntervalSetMapping` is a wrapper around a mapping from keys to `IntervalSet`s. Often the keys will correspond to video IDs or path names.

We construct an `IntervalSetMapping` with a dictionary:

In [18]:
bbox_interval_set_mapping = IntervalSetMapping({
    video_file: IntervalSet([
        Interval(Bounds3D(
            t1=f['frame'],
            t2=f['frame'],
            x1=bbox['x1'],
            x2=bbox['x2'],
            y1=bbox['y1'],
            y2=bbox['y2']
        ), payload = {
            'class': bbox['class'],
            'score': bbox['score']
        })
        for f in metadata
        for bbox in f['bboxes']
    ])
    for video_file, metadata in zip(video_files, driving_metadata)
})

`IntervalSetMapping` reflects operations on `IntervalSet`s:

In [19]:
bbox_interval_set_mapping.size()

{'driving1.mp4': 96585,
 'driving2.mp4': 90570,
 'driving3.mp4': 211999,
 'driving4.mp4': 17983}

We can recover the underlying dictionary of `IntervalSet`s using `get_grouped_intervals`:

In [20]:
bbox_interval_set_mapping.get_grouped_intervals().keys()

dict_keys(['driving3.mp4', 'driving1.mp4', 'driving2.mp4', 'driving4.mp4'])
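
The idea of reflecting an operation across keys can be sketched in plain Python. This is not how Rekall implements it; the `reflect` helper and the toy `grouped` data are hypothetical and only show that each key's collection is processed independently of the others.

```python
# Plain-Python stand-in: keys are video names, values are lists of "intervals".
grouped = {
    'a.mp4': [{'score': 0.9}, {'score': 0.4}],
    'b.mp4': [{'score': 0.8}],
}

def reflect(mapping, operation):
    """Apply `operation` to each key's collection independently."""
    return {key: operation(intervals) for key, intervals in mapping.items()}

# Per-key sizes, analogous in spirit to calling size() on the mapping.
sizes = reflect(grouped, len)

# Per-key filtering; intervals from different videos are never mixed.
confident = reflect(
    grouped,
    lambda intervals: [iv for iv in intervals if iv['score'] > 0.75],
)

print(sizes)
print(confident)
```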

# Step 2: Visualize Metadata Using Vgrid

Next, we'll use `Vgrid` to visualize bounding boxes on our videos.

First, we need to load up some metadata about the videos that we'd like to display. This code will use `ffprobe` to get the width, height, and fps from the local videos on disk.

In [21]:
video_metadata = [
    vblocks_builder.VideoMetadata(path=video_file, id=video_file)
    for video_file in video_files
]

In [22]:
for vm in video_metadata:
    print(vm.fps, vm.width, vm.height, vm.id)

29.97002997002997 1280 720 driving1.mp4
29.97002997002997 1280 720 driving2.mp4
59.94005994005994 1280 720 driving3.mp4
29.97002997002997 1280 720 driving4.mp4


Next, we'll need to convert the time dimension of our `Interval`s from frames to seconds.

First, we add the video file to the payload of each `Interval`:

In [23]:
bbox_interval_set_mapping_with_paths = bbox_interval_set_mapping.add_key_to_payload()

Note that the payload is now a tuple of `({'class': string, 'score': float}, path)`

In [24]:
print(bbox_interval_set_mapping_with_paths.get_grouped_intervals()['driving1.mp4'].get_intervals()[0])

<Interval t1:0 t2:0 x1:0.0 x2:0.060274869203567505 y1:0.4683924780951606 y2:0.5129659440782335 payload:({'score': 0.997351884841919, 'class': 'car'}, 'driving1.mp4')>


Next, we use `IntervalSet`'s `map` method (reflected in `IntervalSetMapping`) to map from frame numbers to seconds. Note that we add one frame to the end frame so that the intervals have non-zero time extent.

In [25]:
path_to_fps_mapping = {
    vm.id: vm.fps
    for vm in video_metadata
}
bbox_ism_seconds = bbox_interval_set_mapping_with_paths.map(
    lambda interval: Interval(
        Bounds3D(
            t1 = interval['t1'] / path_to_fps_mapping[interval['payload'][1]],
            t2 = (interval['t2'] + 1) / path_to_fps_mapping[interval['payload'][1]],
            x1 = interval['x1'],
            x2 = interval['x2'],
            y1 = interval['y1'],
            y2 = interval['y2']
        ),
        payload = interval['payload'][0] # just have the class and score
    )
)
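
The arithmetic in the cell above can be checked on a single box without Rekall. The sketch below uses a plain tuple to mimic the `(payload, key)` shape produced by `add_key_to_payload`; the `frames_to_seconds` helper and the payload values are hypothetical.

```python
# Frame rate as reported for driving1.mp4 in this tutorial.
FPS = 29.97002997002997

# After add_key_to_payload, the payload is a (original_payload, key) tuple.
# This plain tuple mimics that shape; the values are illustrative.
payload_with_key = ({'class': 'car', 'score': 0.99}, 'driving1.mp4')
fps_by_video = {'driving1.mp4': FPS}

def frames_to_seconds(t1_frame, t2_frame, fps):
    """Convert an inclusive frame range into a time range in seconds.

    One frame is added to the end so that a single-frame box lasts 1 / fps
    seconds instead of zero.
    """
    return t1_frame / fps, (t2_frame + 1) / fps

original_payload, video_key = payload_with_key
start, end = frames_to_seconds(10, 10, fps_by_video[video_key])
print(start, end, end - start)
```

A box on frame 10 therefore starts at about 0.334 seconds and lasts exactly one frame period.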

### Concept 5: Vgrid visualizes Vblocks.

Vgrid visualizes "Vblocks" in a grid. We'll see what that looks like in a minute, and you can go to the [Vgrid documentation](https://scanner-research.github.io/vgrid/) for more details. For now, we just need to create some Vblocks for Vgrid to visualize.

First, we initialize our vblocks builder:

In [26]:
builder = vblocks_builder.VideoVBlocksBuilder()

Next, we add the metadata that we just created. The first argument is the URL of the file server that you started in part one of this tutorial (`Basics.md`). The second argument is the list of video metadata.

In [27]:
builder.add_video_metadata(
    'http://dawn10.stanford.edu:8080', video_metadata
)

<vgrid.vblocks_builder.VideoVBlocksBuilder at 0x7f2e43d54cc0>

Next, we add bounding boxes to our builder using `add_track` and `VideoTrackBuilder`. We'll also filter our bounding boxes to only the ones that the detector had greater than 75% confidence in. Finally, we'll set the draw type of this track to `bbox`.

In [28]:
builder.add_track(
    vblocks_builder.VideoTrackBuilder(
        'bounding_boxes',
        bbox_ism_seconds.filter(lambda interval: interval['payload']['score'] > 0.75)
    ).set_draw_type(vblocks_builder.DrawType_Bbox())
)

<vgrid.vblocks_builder.VideoVBlocksBuilder at 0x7f2e43d54cc0>

Next, we'll turn this into a JSON-serializable format for the Javascript widget.

In [29]:
json_for_vgrid = builder.build()

Finally, we'll display this metadata in Vgrid. To play the videos:
* Click "Disable Jupyter keyboard". Make sure to enable the keyboard again when you're done!
* Hover over the video you want to play and press "f" to expand it. Hover over it and press "f" again to contract it.
* Hover over the video while expanded and press "p" to play it.

If you notice lagging bounding boxes, it's probably just a rendering issue; if you pause the video, the boxes will be in the right place.

In [30]:
VGridWidget(vgrid_json=json_for_vgrid)

VGridWidget(vgrid_json={'interval_blocks': [{'video_id': 'driving3.mp4', 'interval_dict': {'bounding_boxes': […

### Concept 6: Vgrid can visualize multiple different tracks at once.
With our current visualization, there's no differentiation between different objects. It would be nice to color-code different objects by different colors. We can do this by adding more tracks to Vgrid.

Let's start by having multiple `IntervalSetMapping`s, each corresponding to different objects, specified in `object_names`.

In [50]:
object_names = [
    'person',
    'bicycle',
    'car',
    'truck',
    'traffic light'
]
object_isms = [
    bbox_ism_seconds.filter(lambda interval: interval['payload']['class'] == object_name)
    for object_name in object_names
]

Now we have a list of `IntervalSetMapping`s, one per name in `object_names`, each of which only contains objects from a single class.
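
One Python detail is worth keeping in mind when building predicates in a comprehension like this: a `lambda` looks up `object_name` when it is called, not when it is created. The pattern is safe as long as each predicate is applied before the comprehension moves on to the next name (which we assume `filter` does here), but predicates that are stored and called later would all see the last name. The sketch below shows the difference in plain Python, with no Rekall involved.

```python
names = ['person', 'bicycle', 'car']

# Late binding: every stored lambda sees the final value of `name` ('car').
late_bound = [lambda cls: cls == name for name in names]

# Binding `name` as a default argument freezes the value for each lambda.
early_bound = [lambda cls, name=name: cls == name for name in names]

late_results = [pred('person') for pred in late_bound]
early_results = [pred('person') for pred in early_bound]

print(late_results)   # [False, False, False]
print(early_results)  # [True, False, False]
```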

We'll build a new Vgrid visualization, with one track per object. We start by creating a new builder and adding our videos to it:

In [51]:
builder = vblocks_builder.VideoVBlocksBuilder()
builder.add_video_metadata(
    'http://dawn10.stanford.edu:8080', video_metadata
)

<vgrid.vblocks_builder.VideoVBlocksBuilder at 0x7f2e2b7c3e80>

Next we'll add a new track for every object:

In [52]:
for ism, object_name in zip(object_isms, object_names):
    builder.add_track(
        vblocks_builder.VideoTrackBuilder(
            object_name,
            ism.filter(lambda interval: interval['payload']['score'] > 0.75)
        ).set_draw_type(vblocks_builder.DrawType_Bbox())
    )

Finally, we'll build a JSON-serializable object for Vgrid and visualize it.

In [53]:
json_for_vgrid = builder.build()
VGridWidget(vgrid_json=json_for_vgrid)

VGridWidget(vgrid_json={'interval_blocks': [{'video_id': 'driving3.mp4', 'interval_dict': {'person': [{'payloa…

# Step 3: Use Rekall interval operations to query for particular events in the dataset

Now that we are visualizing our data with Rekall and Vgrid, let's use some of Rekall's more powerful Interval operations to look for new things in the dataset.

For example, our annotations don't include a bicyclist class (it is not a COCO image category), but we might still want to find examples of bicyclists. We can describe this compositionally as a person bounding box above a bicycle bounding box.

First, we create `IntervalSetMapping`s corresponding to person and bicycle:

In [55]:
person_ism = bbox_ism_seconds.filter(lambda interval: interval['payload']['class'] == 'person')
bicycle_ism = bbox_ism_seconds.filter(lambda interval: interval['payload']['class'] == 'bicycle')

### Concept 7: Join operations can be used to express relationships between concepts
The `join` operation computes a cross product between two `IntervalSet(Mapping)`s, filters the pairs of Intervals by some predicate, and then merges the pairs of Intervals back into a single Interval with a merge operation. This figure demonstrates the three stages of a join:

![simple_join](https://olimar.stanford.edu/hdd/rekall_tutorials/basics/simple_code.png)
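
The three stages can be sketched in a few lines of plain Python. This is a conceptual illustration on lists of dictionaries, not Rekall's implementation; `simple_join`, `overlaps_in_time`, and `merge_span` are hypothetical helpers.

```python
from itertools import product

def simple_join(left, right, predicate, merge_op):
    """Cross product, then filter pairs by predicate, then merge each pair."""
    return [merge_op(a, b) for a, b in product(left, right) if predicate(a, b)]

left = [{'t1': 0, 't2': 2, 'label': 'A'}, {'t1': 5, 't2': 6, 'label': 'B'}]
right = [{'t1': 1, 't2': 3, 'label': 'C'}, {'t1': 8, 't2': 9, 'label': 'D'}]

def overlaps_in_time(a, b):
    """True when the two time ranges share some interior."""
    return a['t1'] < b['t2'] and b['t1'] < a['t2']

def merge_span(a, b):
    """Merge a pair into one item covering both time ranges."""
    return {
        't1': min(a['t1'], b['t1']),
        't2': max(a['t2'], b['t2']),
        'label': a['label'] + b['label'],
    }

joined = simple_join(left, right, overlaps_in_time, merge_span)
print(joined)
```

Of the four candidate pairs, only `A` and `C` overlap in time, so the result contains a single merged item.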

We first import a collection of built-in predicates from Rekall.

In [56]:
from rekall.predicates import *

We get an `IntervalSetMapping` of people on bicycles by joining `person_ism` with `bicycle_ism` and keeping only the pairs that have equal time bounds, overlapping bounding boxes, and a `person` box whose `y1` is above the `bicycle` box's `y1`.

We merge the two bounding boxes together so that the bounding box of the "bicyclist" covers both the person and the bicycle.

In [58]:
person_on_bicycle = person_ism.join(
    bicycle_ism,
    predicate = and_pred(
        Bounds3D.T(equal()), # The pair have to be equal along the time dimension
        Bounds3D.XY(lambda person, bicycle: person['y1'] > bicycle['y1']), # Person above bicycle
        Bounds3D.X(overlaps()), # The boxes overlap in the X dimension
        Bounds3D.Y(overlaps()) # The boxes overlap in the Y dimension
    ),
    merge_op = lambda person, bicycle: Interval(
        person['bounds'].span(bicycle['bounds']), # We use the "span" method of Bounds3D to get a spanning bound
        payload = {
            'class': 'bicyclist',
            'score': person['payload']['score'] * bicycle['payload']['score']
        }
    ),
    window = 0.5 # Only look at pairs that differ by less than half a second from each other
)

This code is worth breaking down further. Let's go line by line:

Lines 1 and 2 establish that we are joining `person_ism` to `bicycle_ism`. This will join every `IntervalSet` in `person_ism` to the right `IntervalSet` in `bicycle_ism` (by the mapping key).
```
1   person_on_bicycle = person_ism.join(
2       bicycle_ism,
```

Lines 3-8 establish the predicate on pairs of joined `Interval`s.
``` 
3   predicate = and_pred(
4       Bounds3D.T(equal()), # The pair have to be equal along the time dimension
5       Bounds3D.XY(lambda person, bicycle: person['y1'] > bicycle['y1']), # Person above bicycle
6       Bounds3D.X(overlaps()), # The boxes overlap in the X dimension
7       Bounds3D.Y(overlaps()) # The boxes overlap in the Y dimension
8   ),
```

`and_pred` is a predicate that computes a logical `and` between many predicates.

`Bounds3D.T(equal())` is a one-dimensional predicate that says that a certain dimension (by default, `(t1, t2)`, but we explicitly state it here for clarity) has to be equal between the two Intervals in the pair.

`Bounds3D.XY(lambda person, bicycle: person['y1'] > bicycle['y1'])` wraps a custom predicate that compares the `y1` values of the two `Interval`s and makes sure that the person is above the bicycle. `Bounds3D.XY` makes it clear that we are operating on the `X/Y` axes. By default, two-dimensional predicates are expected to operate on `(x1, x2)` and `(y1, y2)`.

`Bounds3D.X(overlaps())` is a one-dimensional predicate cast to the `X` dimension (default is the time dimension) and says that the two `Interval`s have to overlap in that dimension.

`Bounds3D.Y(overlaps())` is a one-dimensional predicate cast to the `Y` dimension (default is the time dimension) and says that the two `Interval`s have to overlap in that dimension.

See the [Rekall documentation](https://rekallpy.readthedocs.io/en/latest/source/rekall.predicates.html) for more information about predicates.
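
To make the composition concrete, here is a plain-Python sketch of the same idea: one-dimensional range predicates, a helper that casts them onto a chosen axis, and a combinator that requires all of them to hold. The names `all_of`, `on_axis`, `ranges_equal`, and `ranges_overlap` are hypothetical stand-ins, and their handling of touching boundaries may differ from Rekall's built-in predicates.

```python
def all_of(*predicates):
    """Combine pairwise predicates with a logical AND."""
    return lambda a, b: all(p(a, b) for p in predicates)

def on_axis(lo, hi, range_predicate):
    """Cast a predicate on (start, end) ranges onto one axis of two boxes."""
    return lambda a, b: range_predicate((a[lo], a[hi]), (b[lo], b[hi]))

def ranges_equal(r1, r2):
    return r1 == r2

def ranges_overlap(r1, r2):
    # Strict overlap: ranges that merely touch do not count here.
    return r1[0] < r2[1] and r2[0] < r1[1]

same_time_and_overlapping = all_of(
    on_axis('t1', 't2', ranges_equal),
    on_axis('x1', 'x2', ranges_overlap),
    on_axis('y1', 'y2', ranges_overlap),
)

# Invented boxes for illustration.
person = {'t1': 1.0, 't2': 1.5, 'x1': 0.40, 'x2': 0.50, 'y1': 0.30, 'y2': 0.60}
bicycle = {'t1': 1.0, 't2': 1.5, 'x1': 0.42, 'x2': 0.55, 'y1': 0.45, 'y2': 0.70}
far_bicycle = dict(bicycle, x1=0.80, x2=0.90)

near_match = same_time_and_overlapping(person, bicycle)
far_match = same_time_and_overlapping(person, far_bicycle)
print(near_match, far_match)
```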

Lines 9-15 establish the `merge_op` to merge the pair of Intervals back to a single Interval.
```
9   merge_op = lambda person, bicycle: Interval(
10      person['bounds'].span(bicycle['bounds']), # We use the "span" method of Bounds3D to get a spanning bound
11      payload = {
12          'class': 'bicyclist',
13          'score': person['payload']['score'] * bicycle['payload']['score']
14      }
15  ),
```

Line 9 establishes the arguments to the merge op: we take in two `Interval`s, coming from the `IntervalSet` on the left and the `IntervalSet` on the right of the `join`, respectively.
```
9   merge_op = lambda person, bicycle: Interval(
```

Line 10 merges the bounds of the two `Interval`s:
```
10      person['bounds'].span(bicycle['bounds']), # We use the "span" method of Bounds3D to get a spanning bound
```
We use the [`span`](https://rekallpy.readthedocs.io/en/latest/source/rekall.bounds.html#rekall.bounds.Bounds3D.span) method of `Bounds3D` to get the minimum `Bounds` spanning both `Interval`s.
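
Conceptually, a spanning bound takes the minimum of each lower coordinate and the maximum of each upper coordinate. The sketch below shows that computation on plain dictionaries; the `span` function here is a hypothetical stand-in for illustration, not Rekall's own method, and the box values are invented.

```python
def span(a, b):
    """Smallest box that contains both a and b along t, x, and y."""
    spanned = {}
    for lo, hi in (('t1', 't2'), ('x1', 'x2'), ('y1', 'y2')):
        spanned[lo] = min(a[lo], b[lo])
        spanned[hi] = max(a[hi], b[hi])
    return spanned

person = {'t1': 1.0, 't2': 1.5, 'x1': 0.40, 'x2': 0.50, 'y1': 0.30, 'y2': 0.60}
bicycle = {'t1': 1.0, 't2': 1.5, 'x1': 0.42, 'x2': 0.55, 'y1': 0.45, 'y2': 0.70}

bicyclist_box = span(person, bicycle)
print(bicyclist_box)
```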

Lines 11-14 establish the new payload - a new class, whose score is the product of the constituent scores.
```
11      payload = {
12          'class': 'bicyclist',
13          'score': person['payload']['score'] * bicycle['payload']['score']
14      }
```

Finally, line 16 specifies that we should only look at pairs of `Interval`s that are overlapping or less than `0.5` seconds apart from each other in the time axis.
```
16  window = 0.5 # Only look at pairs that differ by less than half a second from each other
```
For more details, see the [`IntervalSet` documentation](https://rekallpy.readthedocs.io/en/latest/index.html#rekall.IntervalSet).
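
The effect of the window can be sketched as a check on the temporal gap between two intervals. The helpers below are hypothetical and follow the description above (overlapping, or less than `window` seconds apart); how Rekall treats pairs that are exactly `window` apart is not covered here.

```python
def temporal_gap(a, b):
    """Seconds separating two time ranges; 0.0 if they overlap or touch."""
    return max(0.0, max(a['t1'], b['t1']) - min(a['t2'], b['t2']))

def within_window(a, b, window):
    """True if the ranges overlap or are less than `window` seconds apart."""
    gap = temporal_gap(a, b)
    return gap == 0.0 or gap < window

# Invented time ranges in seconds.
first = {'t1': 1.0, 't2': 1.5}
nearby = {'t1': 1.75, 't2': 2.0}    # starts 0.25 s after `first` ends
distant = {'t1': 3.0, 't2': 3.5}    # starts 1.5 s after `first` ends

nearby_ok = within_window(first, nearby, window=0.5)
distant_ok = within_window(first, distant, window=0.5)
print(nearby_ok, distant_ok)
```

Pairs that fail this check are never passed to the predicate, which keeps the join from comparing every person box against every bicycle box in the video.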

And now let's visualize this new concept using Vgrid.

In [65]:
builder = vblocks_builder.VideoVBlocksBuilder()
builder.add_video_metadata(
    'http://dawn10.stanford.edu:8080', video_metadata
)
builder.add_track(
    vblocks_builder.VideoTrackBuilder(
        'bicyclists',
        person_on_bicycle
    ).set_draw_type(vblocks_builder.DrawType_Bbox())
)
vgrid_json = builder.build()
VGridWidget(vgrid_json=vgrid_json)

VGridWidget(vgrid_json={'interval_blocks': [{'video_id': 'driving3.mp4', 'interval_dict': {'bicyclists': [{'pa…

This is pretty hard to visualize, since it appears so rarely, unlike our other annotations. However, we can visualize it another way by having one vblock for every instance using `vblocks_builder.IntervalVBlocksBuilder`.

### Concept 8: Visualize individual instances using `IntervalVBlocksBuilder`.
This code looks fairly similar, but this time the build function takes in a single `IntervalSetMapping` and builds a new `VBlock` for every `Interval` in it.

In [68]:
builder = vblocks_builder.IntervalVBlocksBuilder()
builder.add_video_metadata(
    'http://dawn10.stanford.edu:8080', video_metadata
)
builder.add_track(vblocks_builder.IntervalTrackBuilder('bicyclist'))
vgrid_json = builder.build(person_on_bicycle)
VGridWidget(vgrid_json=vgrid_json)

VGridWidget(vgrid_json={'interval_blocks': [{'video_id': 'driving3.mp4', 'interval_dict': {'bicyclist': [{'pay…