# Basics Tutorial

The purpose of this tutorial is to introduce the basics of the Rekall API and
show how Rekall queries, together with the Vgrid visualization interface, can
be used for video analysis.

This is part two of the tutorial.
If you haven't already, follow the instructions in `Basics.md` to download a
small dataset with some simple annotations.

This tutorial consists of 3 steps:
1. Import visual metadata into Rekall
2. Visualize metadata using Vgrid
3. Use Rekall Interval operations to query for particular events in the video dataset

### Important Concepts will be in bold throughout this notebook.

### Setup
We first need to make sure that `rekall`, `vgrid`, and `vgrid_jupyter` are installed properly. If the following cell runs without error, you're all set. If not, make sure that you've installed `rekallpy` and `vgridpy` with `pip` or that you've activated our Anaconda environment.

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
from rekall import Interval, IntervalSet, IntervalSetMapping, Bounds3D
from vgrid import vblocks_builder
from vgrid_jupyter import VGridWidget

# Step 1: Load Data

The first step in analyzing videos with Rekall is loading visual metadata into Rekall. If you followed the instructions in `Basics.md`, you should have these videos in this directory:
* `driving1.mp4`
* `driving2.mp4`
* `driving3.mp4`
* `driving4.mp4`

And these JSON files:
* `driving1.json`
* `driving2.json`
* `driving3.json`
* `driving4.json`

We'll start by loading these JSON files into Python.

In [3]:
video_files = [
    'driving1.mp4',
    'driving2.mp4',
    'driving3.mp4',
    'driving4.mp4'
]
metadata_files = [
    'driving1.json',
    'driving2.json',
    'driving3.json',
    'driving4.json'
]

In [4]:
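import json

def load_json(path):
    # Use a context manager so each file is closed after it is read.
    with open(path, 'r') as f:
        return json.load(f)

driving_metadata = [load_json(metadata_file) for metadata_file in metadata_files]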

If we inspect each JSON object we just loaded, we'll see that it has the following structure:
```
[
    {
        'video': string,
        'frame': int,
        'bboxes': [
            {
                'class': string,
                'score': float,
                'x1': float,
                'x2': float,
                'y1': float,
                'y2': float
            },
            ...
        ]
    },
    ...
]
```
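To get a feel for this structure, here is a minimal sketch that walks records of the same shape and counts bounding boxes per class. The records are made up for illustration (they are not taken from the tutorial dataset), and `count_boxes_per_class` is a hypothetical helper, not part of Rekall.

```python
from collections import Counter

# Made-up records in the same shape as the tutorial's JSON files.
sample_records = [
    {
        'video': 'example.mp4',
        'frame': 0,
        'bboxes': [
            {'class': 'car', 'score': 0.9, 'x1': 0.1, 'x2': 0.3, 'y1': 0.4, 'y2': 0.6},
            {'class': 'truck', 'score': 0.4, 'x1': 0.5, 'x2': 0.8, 'y1': 0.3, 'y2': 0.7},
        ],
    },
    {
        'video': 'example.mp4',
        'frame': 1,
        'bboxes': [
            {'class': 'car', 'score': 0.8, 'x1': 0.12, 'x2': 0.32, 'y1': 0.4, 'y2': 0.6},
        ],
    },
]

def count_boxes_per_class(records):
    """Count bounding boxes per class label across all frames."""
    return Counter(
        bbox['class']
        for record in records
        for bbox in record['bboxes']
    )

class_counts = count_boxes_per_class(sample_records)
print(class_counts)
```

The same helper should work on any one of the loaded metadata lists, for example `count_boxes_per_class(driving_metadata[0])`, assuming the files follow the structure shown above.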

### Concept 1: Bounds represent spatiotemporal volumes
Each bounding box represents a 3D spatiotemporal volume in the video. We can represent such a volume by using a Rekall `Bounds3D` object.

![title](https://olimar.stanford.edu/hdd/rekall_tutorials/basics/videovolume.png)

In [5]:
sample_bbox = driving_metadata[0][10]['bboxes'][0]
sample_frame = driving_metadata[0][10]['frame']

In [6]:
sample_bbox

{'class': 'bicycle',
 'score': 0.1713541,
 'x1': 0.4788206100463867,
 'x2': 0.4907509803771973,
 'y1': 0.4826868693033854,
 'y2': 0.5067381964789497}

In [7]:
sample_frame

10

In [8]:
sample_bbox_as_bounds = Bounds3D(
    t1 = sample_frame,
    t2 = sample_frame,
    x1 = sample_bbox['x1'],
    x2 = sample_bbox['x2'],
    y1 = sample_bbox['y1'],
    y2 = sample_bbox['y2']
)

In [9]:
sample_bbox_as_bounds

t1:10 t2:10 x1:0.4788206100463867 x2:0.4907509803771973 y1:0.4826868693033854 y2:0.5067381964789497

We can access the co-ordinates of the bounds as follows:

In [10]:
print(sample_bbox_as_bounds['t1'])
print(sample_bbox_as_bounds['t2'])
print(sample_bbox_as_bounds['x1'])
print(sample_bbox_as_bounds['x2'])
print(sample_bbox_as_bounds['y1'])
print(sample_bbox_as_bounds['y2'])

10
10
0.4788206100463867
0.4907509803771973
0.4826868693033854
0.5067381964789497


### Concept 2: `Interval`s wrap spatiotemporal volumes with a payload
But often we will need to represent other metadata on spatiotemporal volumes. In this case, we have the class name as well as the score. For this, Rekall has the `Interval` abstraction:

In [11]:
sample_bbox_interval = Interval(
    sample_bbox_as_bounds,
    payload = { 'class': sample_bbox['class'], 'score': sample_bbox['score'] })

In [12]:
sample_bbox_interval

<Interval t1:10 t2:10 x1:0.4788206100463867 x2:0.4907509803771973 y1:0.4826868693033854 y2:0.5067381964789497 payload:{'score': 0.1713541, 'class': 'bicycle'}>

`Interval`s expose the same interface to the co-ordinates as Bounds.

In [13]:
print(sample_bbox_interval['t1'])
print(sample_bbox_interval['t2'])
print(sample_bbox_interval['x1'])
print(sample_bbox_interval['x2'])
print(sample_bbox_interval['y1'])
print(sample_bbox_interval['y2'])

10
10
0.4788206100463867
0.4907509803771973
0.4826868693033854
0.5067381964789497


But they also expose interfaces to the entire Bounds object and the payload.

In [14]:
print(sample_bbox_interval['bounds'])
print(sample_bbox_interval['payload'])

t1:10 t2:10 x1:0.4788206100463867 x2:0.4907509803771973 y1:0.4826868693033854 y2:0.5067381964789497
{'score': 0.1713541, 'class': 'bicycle'}


### Concept 3: `IntervalSet`s organize many `Interval`s
There's not much you can do with single `Interval`s, so we use `IntervalSet`s to represent collections of many `Interval`s.

![title](https://olimar.stanford.edu/hdd/rekall_tutorials/basics/set_operations.png)

`IntervalSet`s provide a number of useful set operations (see figure above), but for now, you can just think of them as wrappers around lists of `Interval`s.

`IntervalSet`s are constructed from lists of `Interval`s:

In [15]:
driving1_interval_set = IntervalSet([
    Interval(Bounds3D(
        t1=f['frame'],
        t2=f['frame'],
        x1=bbox['x1'],
        x2=bbox['x2'],
        y1=bbox['y1'],
        y2=bbox['y2']
    ), payload = {
        'class': bbox['class'],
        'score': bbox['score']
    })
    for f in driving_metadata[0]
    for bbox in f['bboxes']
])

In [16]:
print(driving1_interval_set.size())

96585


We can recover the original (sorted by bounds) list of Intervals with `get_intervals`:

In [17]:
print(len(driving1_interval_set.get_intervals()))
print(driving1_interval_set.get_intervals()[0])

96585
<Interval t1:0 t2:0 x1:0.0 x2:0.060274869203567505 y1:0.4683924780951606 y2:0.5129659440782335 payload:{'score': 0.9973519, 'class': 'motorcycle'}>


### Concept 4: `IntervalSetMapping`s organize `IntervalSet`s from different domains
Finally, we often work with multiple videos at once, and we usually don't want operations to mix `IntervalSet`s from different domains. `IntervalSetMapping` is a wrapper around a mapping from keys to `IntervalSet`s. Often the keys will correspond to video IDs or path names.

We construct an `IntervalSetMapping` with a dictionary:

In [18]:
bbox_interval_set_mapping = IntervalSetMapping({
    video_file: IntervalSet([
        Interval(Bounds3D(
            t1=f['frame'],
            t2=f['frame'],
            x1=bbox['x1'],
            x2=bbox['x2'],
            y1=bbox['y1'],
            y2=bbox['y2']
        ), payload = {
            'class': bbox['class'],
            'score': bbox['score']
        })
        for f in metadata
        for bbox in f['bboxes']
    ])
    for video_file, metadata in zip(video_files, driving_metadata)
})

`IntervalSetMapping` reflects operations on `IntervalSet`s:

In [19]:
bbox_interval_set_mapping.size()

{'driving1.mp4': 96585,
 'driving2.mp4': 90570,
 'driving3.mp4': 211999,
 'driving4.mp4': 17983}

We can also recover the original dictionary using `get_grouped_intervals`:

In [20]:
bbox_interval_set_mapping.get_grouped_intervals().keys()

dict_keys(['driving3.mp4', 'driving1.mp4', 'driving4.mp4', 'driving2.mp4'])
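As a mental model only, the way `IntervalSetMapping` reflects per-set operations can be sketched with a plain dictionary. This is not how Rekall is implemented; `reflect` is a hypothetical helper and the detections are made up.

```python
# Made-up per-video lists of (frame, class) detections.
detections_by_video = {
    'a.mp4': [(0, 'car'), (1, 'car'), (1, 'truck')],
    'b.mp4': [(0, 'bicycle')],
}

def reflect(operation, mapping):
    """Apply `operation` to each value independently and keep the keys."""
    return {key: operation(value) for key, value in mapping.items()}

# A "size"-like operation, reported once per video.
sizes = reflect(len, detections_by_video)

# A "filter"-like operation: each video is filtered on its own,
# so detections from different videos never mix.
cars_only = reflect(
    lambda dets: [d for d in dets if d[1] == 'car'],
    detections_by_video,
)

print(sizes)
print(cars_only)
```

The point of the sketch is that every operation runs once per key, which is why `size()` above returns one count per video rather than a single total.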

# Step 2: Visualize Metadata Using Vgrid

Next, we'll use `Vgrid` to visualize bounding boxes on our videos.

First, we need to load up some metadata about the videos that we'd like to display. This code will use `ffprobe` to get the width, height, and fps from the local videos on disk.

In [21]:
video_metadata = [
    vblocks_builder.VideoMetadata(path=video_file, id=video_file)
    for video_file in video_files
]

In [22]:
for vm in video_metadata:
    print(vm.fps, vm.width, vm.height, vm.id)

29.97002997002997 1280 720 driving1.mp4
29.97002997002997 1280 720 driving2.mp4
59.94005994005994 1280 720 driving3.mp4
29.97002997002997 1280 720 driving4.mp4


Next, we'll need to convert the time dimension of our `Interval`s from frames to seconds.

First, we add the video file to the payload of each `Interval`:

In [23]:
bbox_interval_set_mapping_with_paths = bbox_interval_set_mapping.add_key_to_payload()

Note that the payload is now a tuple of the form `({'class': string, 'score': float}, path)`:

In [24]:
print(bbox_interval_set_mapping_with_paths.get_grouped_intervals()['driving1.mp4'].get_intervals()[0])

<Interval t1:0 t2:0 x1:0.0 x2:0.060274869203567505 y1:0.4683924780951606 y2:0.5129659440782335 payload:({'score': 0.9973519, 'class': 'motorcycle'}, 'driving1.mp4')>


Next, we use `IntervalSet`'s `map` method (reflected in `IntervalSetMapping`) to map from frame numbers to seconds. Note that we add one frame to the end frame so that the intervals have non-zero time extent.

In [25]:
path_to_fps_mapping = {
    vm.id: vm.fps
    for vm in video_metadata
}
bbox_ism_seconds = bbox_interval_set_mapping_with_paths.map(
    lambda interval: Interval(
        Bounds3D(
            t1 = interval['t1'] / path_to_fps_mapping[interval['payload'][1]],
            t2 = (interval['t2'] + 1) / path_to_fps_mapping[interval['payload'][1]],
            x1 = interval['x1'],
            x2 = interval['x2'],
            y1 = interval['y1'],
            y2 = interval['y2']
        ),
        payload = interval['payload'][0] # just have the class and score
    )
)
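To make the frame-to-seconds arithmetic concrete, here is a small standalone sketch of the same conversion for a single-frame interval. `frame_span_to_seconds` is a hypothetical helper, not a Rekall function; the frame rate is the value printed for `driving1.mp4` above.

```python
def frame_span_to_seconds(first_frame, last_frame, fps):
    """Convert an inclusive frame range to a (start, end) span in seconds.

    One frame is added to the end so that a single-frame interval
    still covers the duration of that frame.
    """
    start = first_frame / fps
    end = (last_frame + 1) / fps
    return start, end

fps = 29.97002997002997  # frame rate printed for driving1.mp4
start, end = frame_span_to_seconds(10, 10, fps)

print(start, end)
print(end - start)  # duration of exactly one frame
```

Without the extra frame, `start` and `end` would be equal and the interval would have zero time extent.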

### Concept 5: Vgrid visualizes Vblocks.

Vgrid visualizes "Vblocks" in a grid. We'll see what that looks like in a minute, and you can go to the [Vgrid documentation](https://scanner-research.github.io/vgrid/) for more details. For now, we just need to create some Vblocks for Vgrid to visualize.

First, we initialize our vblocks builder:

In [26]:
builder = vblocks_builder.VideoVBlocksBuilder()

Next, we add the metadata that we just created. The first argument is the URL of the file server that you started in part one of this tutorial (`Basics.md`). The second argument is the list of video metadata.

In [27]:
builder.add_video_metadata(
    'http://dawn10.stanford.edu:8080', video_metadata
)

<vgrid.vblocks_builder.VideoVBlocksBuilder at 0x7fa926cacbe0>

Next, we add bounding boxes to our builder using `add_track` and `VideoTrackBuilder`. We'll also filter our bounding boxes to only the ones that the detector had greater than 75% confidence in. Finally, we'll set the draw type of this track to `bbox`.

In [28]:
builder.add_track(
    vblocks_builder.VideoTrackBuilder(
        'bounding_boxes',
        bbox_ism_seconds.filter(lambda interval: interval['payload']['score'] > 0.75)
    ).set_draw_type(vblocks_builder.DrawType_Bbox())
)

<vgrid.vblocks_builder.VideoVBlocksBuilder at 0x7fa926cacbe0>
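The score predicate used above can be tried out on plain payload dictionaries. The sketch below uses made-up payloads and a hypothetical `is_confident` helper; it only illustrates the comparison and does not call Rekall.

```python
# Made-up payloads in the same shape as the Interval payloads above.
detections = [
    {'class': 'car', 'score': 0.92},
    {'class': 'bicycle', 'score': 0.17},
    {'class': 'truck', 'score': 0.75},
]

SCORE_THRESHOLD = 0.75

def is_confident(payload, threshold=SCORE_THRESHOLD):
    """Return True when the detector score is strictly above the threshold."""
    return payload['score'] > threshold

confident = [d for d in detections if is_confident(d)]
print([d['class'] for d in confident])
```

Because the comparison is strict, a detection scored at exactly 0.75 is dropped; use `>=` if you want to keep it.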

Next, we'll turn this into a JSON-serializable format for the Javascript widget.

In [29]:
json_for_vgrid = builder.build()

Finally, we'll display this metadata in Vgrid. To play the videos:
* Click "Disable Jupyter keyboard". Make sure to enable the keyboard again when you're done!
* Hover over the video you want to play and press "f" to expand it. Hover over it and press "f" again to contract it.
* Hover over the video while expanded and press "p" to play it.

If you notice lagging bounding boxes, it's probably just a rendering issue; if you pause the video, the boxes will be in the right place.

In [30]:
VGridWidget(vgrid_json=json_for_vgrid)

VGridWidget(vgrid_json={'interval_blocks': [{'interval_dict': {'bounding_boxes': [{'y': (0.3326997545030382, 0…

### Concept 6: Vgrid can visualize multiple different tracks at once.
With our current visualization, there's no differentiation between different objects. It would be nice to color-code the objects by class. We can do this by adding more tracks to Vgrid.

Let's start by having multiple `IntervalSetMapping`s, each corresponding to different objects, specified in `object_names`.

In [38]:
object_names = [
    'car',
    'truck',
    'motorcycle',
    'bicycle'
]
object_isms = [
    bbox_ism_seconds.filter(lambda interval: interval['payload']['class'] == object_name)
    for object_name in object_names
]

Now we have a list of `IntervalSetMapping`s, one per entry in `object_names`, each of which only has objects from a single class.
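A general Python point when building one predicate per class in a comprehension: a lambda looks up the loop variable when it is called, not when it is created. That does not matter if the predicate is applied right away inside the same iteration, but it does if the predicates are stored and called later. The sketch below is plain Python with no Rekall calls; the variable names are illustrative.

```python
object_names = ['car', 'truck', 'motorcycle', 'bicycle']

# Each lambda looks up `name` when it is called, so once the
# comprehension has finished they all compare against the last name.
late_bound = [lambda cls: cls == name for name in object_names]

# A default argument is evaluated when the lambda is created,
# so each predicate keeps its own class name.
early_bound = [lambda cls, name=name: cls == name for name in object_names]

late_results = [pred('car') for pred in late_bound]
early_results = [pred('car') for pred in early_bound]

print(late_results)
print(early_results)
```

If you adapt the tutorial code so that predicates are collected first and applied afterwards, binding the class name with a default argument avoids this pitfall.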

We'll build a new Vgrid visualization, with one track per object. We start by creating a new builder and adding our videos to it:

In [39]:
builder = vblocks_builder.VideoVBlocksBuilder()
builder.add_video_metadata(
    'http://dawn10.stanford.edu:8080', video_metadata
)

<vgrid.vblocks_builder.VideoVBlocksBuilder at 0x7fa914806908>

Next we'll add a new track for every object:

In [40]:
for ism, object_name in zip(object_isms, object_names):
    builder.add_track(
        vblocks_builder.VideoTrackBuilder(
            object_name,
            ism.filter(lambda interval: interval['payload']['score'] > 0.75)
        ).set_draw_type(vblocks_builder.DrawType_Bbox())
    )

Finally, we'll build a JSON-serializable object for Vgrid and visualize it.

In [41]:
json_for_vgrid = builder.build()
VGridWidget(vgrid_json=json_for_vgrid)

VGridWidget(vgrid_json={'interval_blocks': [{'interval_dict': {'car': [], 'truck': [{'y': (0.4764163123236762,…