# Basics Tutorial

The purpose of this tutorial is to introduce the basics of the Rekall API and
show how Rekall queries, together with the Vgrid visualization interface, can
be used for video analysis.

### Setup
We first need to make sure that `rekall`, `vgrid`, and `vgrid_jupyter` are installed properly. If the following cell runs without error, you're all set. If not, make sure that you've followed the install instructions for [`rekall`](https://github.com/scanner-research/rekall), [`vgrid`](https://github.com/scanner-research/vgrid), and [`vgrid_jupyter`](https://github.com/scanner-research/vgrid_jupyter).

In [1]:
%load_ext autoreload
%autoreload 2
from rekall import Interval, IntervalSet, IntervalSetMapping, Bounds3D
from rekall.predicates import *
from vgrid import VGridSpec, VideoMetadata, VideoBlockFormat, FlatFormat
from vgrid_jupyter import VGridWidget
import urllib3, requests, os

# Preview

Let's first take a look at what the end product will look like (this cell will take about **20 seconds** to load all the data from olimar.stanford.edu and visualize it).

You should see something like this:

![vgrid_preview](https://olimar.stanford.edu/hdd/rekall_tutorials/basics/vgrid_preview.png)

### Once the visualization is up, hover over a cell and press `=` to expand it. Hover and use `Shift+p` or `;` to play/pause the video.

In [2]:
# Hack to disable warnings about olimar's certificate
urllib3.disable_warnings()
VIDEO_COLLECTION_BASEURL = "https://olimar.stanford.edu/hdd/rekall_tutorials/basics/"
VIDEO_METADATA_FILENAME = 'video_metadata.json'

# Load video file metadata
video_metadata = [ VideoMetadata(v['filename'], id=v['filename'], fps=v['fps'],
                                 num_frames=v['num_frames'], width=v['width'], height=v['height'])
                  for v in requests.get(os.path.join(VIDEO_COLLECTION_BASEURL, VIDEO_METADATA_FILENAME),
                                        verify=False).json() ]

# Load bounding boxes from JSON
metadata_files = [ 'driving1.json', 'driving2.json', 'driving3.json', 'driving4.json' ]
driving_metadata = [ requests.get(os.path.join(VIDEO_COLLECTION_BASEURL, metadata_file),
                                  verify=False).json()
                    for metadata_file in metadata_files ]

# Load bounding boxes into Rekall
path2fps = { vm.id: vm.fps for vm in video_metadata }
bbox_ism = IntervalSetMapping({
    video_file: IntervalSet([
        Interval(Bounds3D(t1=f['frame'] / path2fps[video_file], t2=(f['frame'] + 1) / path2fps[video_file],
                          x1=bbox['x1'], x2=bbox['x2'], y1=bbox['y1'], y2=bbox['y2']),
                 payload = { 'class': bbox['class'], 'score': bbox['score'] })
        for f in metadata
        for bbox in f['bboxes']
    ])
    for video_file, metadata in zip([ 'driving1.mp4', 'driving2.mp4', 'driving3.mp4', 'driving4.mp4' ],
                                    driving_metadata)
})

# Visualize bounding boxes with Vgrid
vgrid_spec = VGridSpec(
    video_meta = video_metadata,
    vis_format = VideoBlockFormat(imaps = [
        ('bounding_boxes', bbox_ism.filter(lambda interval: interval['payload']['score'] > 0.9))
    ]),
    video_endpoint = 'https://olimar.stanford.edu/hdd/rekall_tutorials/basics/'
)
VGridWidget(vgrid_spec = vgrid_spec.to_json_compressed())


# Code Walkthrough
Now let's walk through the above code bit by bit to get an idea of what's going on.

## Load Video metadata

First we need to get some metadata about the individual videos that we're visualizing. In particular, we need to know the FPS, duration, width, and height of each video in order to display them using Vgrid. In our case, we've already computed these things for our driving videos, but you can also use [this script](https://github.com/scanner-research/esperlight/blob/master/create_video_metadata.py) to compute them for you (`ffmpeg` and `ffprobe` are dependencies).

### This code loads pre-computed FPS, duration, width, and height of each video and puts them into `VideoMetadata` objects:

In [3]:
# Hack to disable warnings about olimar's certificate
urllib3.disable_warnings()
VIDEO_COLLECTION_BASEURL = "https://olimar.stanford.edu/hdd/rekall_tutorials/basics/"
VIDEO_METADATA_FILENAME = 'video_metadata.json'

metadata_json = requests.get(os.path.join(VIDEO_COLLECTION_BASEURL, VIDEO_METADATA_FILENAME),
                             verify=False).json()

# Load video file metadata
video_metadata = [
    VideoMetadata(
        v['filename'], id=v['filename'], fps=v['fps'],
        num_frames=int(v['num_frames']), width=v['width'], height=v['height']
    )
    for v in metadata_json
]

Let's go line by line:

Lines 2-4 specify the location of the metadata. You can look at the JSON file yourself by going to https://olimar.stanford.edu/hdd/rekall_tutorials/basics/.

    urllib3.disable_warnings()
    VIDEO_COLLECTION_BASEURL = "https://olimar.stanford.edu/hdd/rekall_tutorials/basics/"
    VIDEO_METADATA_FILENAME = 'video_metadata.json'
    
Lines 6-7 get the data with an HTTP request and parse it into JSON:

    metadata_json = requests.get(os.path.join(VIDEO_COLLECTION_BASEURL, VIDEO_METADATA_FILENAME),
                                 verify=False).json()

At this point, `metadata_json` is a list of Python objects, with information about each video's filename, FPS, width, height, and number of frames. We can loop through this list and construct a list of `VideoMetadata` objects with that information:

    video_metadata = [
        VideoMetadata(
            v['filename'], id=v['filename'], fps=v['fps'],
            num_frames=int(v['num_frames']), width=v['width'], height=v['height']
        )
        for v in metadata_json
    ]
    
`VideoMetadata` objects are constructed by passing in `path`, `id`, `fps`, `num_frames`, `width`, and `height`. The `path` is used by Vgrid and a fileserver to serve the video, and `id` is a key that links visual bounding box metadata to the videos. In this case, we use the path for both.
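
Since every one of these fields is needed, it can help to check the parsed records before constructing the objects. The following is a minimal sketch in plain Python; the helper `missing_fields` and the sample record values are hypothetical and are not part of Rekall or Vgrid.

```python
# Field names taken from the keys read in the cell above.
REQUIRED_FIELDS = ('filename', 'fps', 'num_frames', 'width', 'height')

def missing_fields(record):
    """Return the required keys that are absent from one metadata record."""
    return [field for field in REQUIRED_FIELDS if field not in record]

# Hypothetical records shaped like the entries of metadata_json.
sample_records = [
    {'filename': 'driving1.mp4', 'fps': 30.0, 'num_frames': 900,
     'width': 1280, 'height': 720},
    {'filename': 'broken.mp4', 'fps': 30.0},
]

for record in sample_records:
    problems = missing_fields(record)
    if problems:
        print(record['filename'], 'is missing', problems)
```

With the real data, the same check could be run over `metadata_json` before building `video_metadata`.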

### Aside: Visualizing videos directly.

Now that we've loaded in video-level metadata, we can visualize the videos in Vgrid directly.

#### Again, hover over videos and use `=` to expand the thumbnails. Then use `Shift-P` or `;` to play the videos.

In [4]:
vgrid_spec = VGridSpec(
    video_meta = video_metadata,
    vis_format = VideoBlockFormat(imaps = None, video_meta = video_metadata),
    video_endpoint = 'https://olimar.stanford.edu/hdd/rekall_tutorials/basics/'
)
VGridWidget(vgrid_spec = vgrid_spec.to_json_compressed())


Going line by line:

Lines 1-5 specify a Vgrid spec for the widget:

    vgrid_spec = VGridSpec(
        video_meta = video_metadata,
        vis_format = VideoBlockFormat(imaps = None, video_meta = video_metadata),
        video_endpoint = 'https://olimar.stanford.edu/hdd/rekall_tutorials/basics/'
    )
    
The `video_meta` option takes in a list of per-video metadata. `vis_format` specifies how the individual blocks in Vgrid should be drawn, along with what to draw on them. In this case, we are using `VideoBlockFormat` and passing in `None` for `imaps` and `video_metadata` for `video_meta`. This will automatically create one block for each video in `video_meta`. Later on, we'll see how we can use it to draw the spatial metadata as well. Finally, `video_endpoint` specifies that we should look for the videos on the `olimar` server.

Finally, line 6 creates the widget and displays it in our Jupyter environment:

    VGridWidget(vgrid_spec = vgrid_spec.to_json_compressed())
    
We pass the spec to the Vgrid widget as compressed JSON. Since it's the last line of the cell, it gets displayed below the cell automatically.

## Load Bounding Boxes from JSON

But as we saw earlier, we can do a lot more than just look at videos if we have some spatial metadata to associate with the videos.

### This code loads bounding box data associated with each video from olimar:

In [5]:
metadata_files = [ 'driving1.json', 'driving2.json', 'driving3.json', 'driving4.json' ]
driving_metadata = [
    requests.get(os.path.join(VIDEO_COLLECTION_BASEURL, metadata_file),
                 verify=False).json()
    for metadata_file in metadata_files
]

Going line by line:

Line 1 specifies the names of the metadata files on the server:

    metadata_files = [ 'driving1.json', 'driving2.json', 'driving3.json', 'driving4.json' ]
    
In this case, we've pre-loaded bounding box metadata into `driving1.json`, `driving2.json`, etc. Each JSON file contains all the bounding boxes for the corresponding video (`driving1.mp4`, `driving2.mp4`, etc).

Lines 2-6 make an HTTP request to the server and parse the JSON files:

    driving_metadata = [
        requests.get(os.path.join(VIDEO_COLLECTION_BASEURL, metadata_file),
                     verify=False).json()
        for metadata_file in metadata_files
    ]

Line 3 specifies the path (join the base URL to the specific metadata file), and line 4 specifies that we should parse the file as JSON.

If you manually inspect the parsed objects, you'll see that each one has the following format:

    [
        {
            'video': string,
            'frame': int,
            'bboxes': [
                {
                    'class': string,
                    'score': float,
                    'x1': float,
                    'x2': float,
                    'y1': float,
                    'y2': float
                },
                ...
            ]
        },
        ...
    ]
    
In other words, `driving_metadata[0][frame]['bboxes'][i]` contains Bbox `i` from frame `frame` in `driving1.mp4`.

In [6]:
driving_metadata[0][10]['bboxes'][0]

{'class': 'person',
 'score': 0.17135410010814667,
 'x1': 0.4788206100463867,
 'x2': 0.4907509803771973,
 'y1': 0.4826868693033854,
 'y2': 0.5067381964789497}
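
To get a feel for the data, one option is to tally how many boxes of each class a video contains. This sketch uses only the standard library; the `sample_frames` values are made up for illustration and follow the format shown above, and `count_classes` is a hypothetical helper.

```python
from collections import Counter

def count_classes(frames):
    """Count bounding boxes per class across a list of per-frame records."""
    return Counter(bbox['class'] for f in frames for bbox in f['bboxes'])

# Made-up data in the same shape as one entry of driving_metadata.
sample_frames = [
    {'video': 'example.mp4', 'frame': 0, 'bboxes': [
        {'class': 'car', 'score': 0.95, 'x1': 0.1, 'x2': 0.3, 'y1': 0.5, 'y2': 0.7},
        {'class': 'person', 'score': 0.40, 'x1': 0.4, 'x2': 0.5, 'y1': 0.4, 'y2': 0.8},
    ]},
    {'video': 'example.mp4', 'frame': 1, 'bboxes': [
        {'class': 'car', 'score': 0.97, 'x1': 0.1, 'x2': 0.3, 'y1': 0.5, 'y2': 0.7},
    ]},
]

class_counts = count_classes(sample_frames)
print(class_counts.most_common())
```

With the real data, `count_classes(driving_metadata[0])` would give the tally for `driving1.mp4`.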

## Load Bounding Boxes into Rekall

Now that we've loaded our bounding boxes from JSON, we can load them into Rekall.

In [7]:
path2fps = { vm.id: vm.fps for vm in video_metadata }
bbox_ism = IntervalSetMapping({
    video_file: IntervalSet(
        [
            Interval(
                Bounds3D(
                    t1=f['frame'] / path2fps[video_file],
                    t2=(f['frame'] + 1) / path2fps[video_file],
                    x1=bbox['x1'], x2=bbox['x2'], y1=bbox['y1'], y2=bbox['y2']
                ),
                payload = { 'class': bbox['class'], 'score': bbox['score'] }
            )
            for f in metadata
            for bbox in f['bboxes']
        ]
    )
    for video_file, metadata in zip(
        [ 'driving1.mp4', 'driving2.mp4', 'driving3.mp4', 'driving4.mp4' ],
        driving_metadata
    )
})

This cell contains Rekall's core abstractions, so let's go line by line again.

We loaded bounding boxes per-frame in JSON, but we'll need to convert their timestamps to seconds for Vgrid. Line 1 creates a mapping from video name to the FPS of the video:

    path2fps = { vm.id: vm.fps for vm in video_metadata }
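
The frame-to-seconds conversion used in the next lines can be sketched on its own. This is a small worked example in plain Python; the helper name `frame_to_seconds` and the 30 FPS value are assumptions chosen for illustration.

```python
def frame_to_seconds(frame, fps):
    """Return (t1, t2): the start and end time, in seconds, of one frame."""
    return frame / fps, (frame + 1) / fps

# Hypothetical example: frame 10 of a 30 FPS video.
t1, t2 = frame_to_seconds(10, 30.0)
print(t1, t2)
```

Each frame therefore covers a time range of `1 / fps` seconds, starting at `frame / fps`.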

Lines 2-21 create an `IntervalSetMapping` containing all the bounding boxes:

    bbox_ism = IntervalSetMapping({
        video_file: IntervalSet(
            [
                Interval(
                    Bounds3D(
                        t1=f['frame'] / path2fps[video_file],
                        t2=(f['frame'] + 1) / path2fps[video_file],
                        x1=bbox['x1'], x2=bbox['x2'], y1=bbox['y1'], y2=bbox['y2']
                    ),
                    payload = { 'class': bbox['class'], 'score': bbox['score'] }
                )
                for f in metadata
                for bbox in f['bboxes']
            ]
        )
        for video_file, metadata in zip(
            [ 'driving1.mp4', 'driving2.mp4', 'driving3.mp4', 'driving4.mp4' ],
            driving_metadata
        )
    })
    
### The core abstraction of Rekall is an `Interval`, which contains a `Bounds` and a payload

![videovolume](https://olimar.stanford.edu/hdd/rekall_tutorials/basics/videovolume.png)

The `Bounds` contains spatiotemporal co-ordinates (here a `Bounds3D`, with one time dimension and two spatial dimensions), while the payload contains other metadata about each `Interval`. In this case, each `Interval` corresponds to a bounding box (`t1` and `t2` in seconds, `x` and `y` co-ordinates relative to the frame size), and the payload contains the class and the confidence score for each bounding box:

    Interval(
        Bounds3D(
            t1=f['frame'] / path2fps[video_file],
            t2=(f['frame'] + 1) / path2fps[video_file],
            x1=bbox['x1'], x2=bbox['x2'], y1=bbox['y1'], y2=bbox['y2']
        ),
        payload = { 'class': bbox['class'], 'score': bbox['score'] }
    )
    
### An `IntervalSet` is a collection of related `Interval`s

We create an `IntervalSet` by passing in a list of `Interval`s. In this case, we create an `IntervalSet` for each video by looping through each frame, and through all the bounding boxes for the frame:

    IntervalSet(
        [
            Interval(
                ...
            )
            for f in metadata
            for bbox in f['bboxes']
        ]
    )
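
The two `for` clauses behave like nested loops, with the outer loop written first. Here is a minimal sketch in plain Python that flattens made-up per-frame records the same way; tuples stand in for `Interval`s, so nothing here depends on Rekall.

```python
# Made-up per-frame records in the format shown earlier.
frames = [
    {'frame': 0, 'bboxes': [{'class': 'car'}, {'class': 'person'}]},
    {'frame': 1, 'bboxes': []},
    {'frame': 2, 'bboxes': [{'class': 'truck'}]},
]

# Comprehension form: the outer loop comes first, as in the cell above.
flat = [(f['frame'], bbox['class']) for f in frames for bbox in f['bboxes']]

# Equivalent explicit loops.
flat_loops = []
for f in frames:
    for bbox in f['bboxes']:
        flat_loops.append((f['frame'], bbox['class']))

print(flat)
```

Frames without any bounding boxes simply contribute nothing to the result.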

### An `IntervalSetMapping` organizes `IntervalSet`s from different domains

Finally, we organize `IntervalSet`s from different videos using an `IntervalSetMapping`, which maps from keys to `IntervalSet`. We create one by passing in a `dict`:

    bbox_ism = IntervalSetMapping({
        video_file: IntervalSet(
            [
                Interval(...)
                for f in metadata
                for bbox in f['bboxes']
            ]
        )
        for video_file, metadata in zip(
            [ 'driving1.mp4', 'driving2.mp4', 'driving3.mp4', 'driving4.mp4' ],
            driving_metadata
        )
    })

In this case, we are mapping from video name to the `IntervalSet` with that video's bounding boxes.

#### Note that we are looping through the video names and JSON metadata together. This way, the keys of `bbox_ism` correspond to the IDs we set when we created `video_metadata`.

## Display videos with bounding box metadata in Vgrid

Finally, we can display the bounding box metadata drawn over the video metadata using Vgrid. To see more documentation about the Vgrid API, check out the [Vgrid documentation](https://github.com/scanner-research/vgrid#javascript-and-python).

In [8]:
# Visualize bounding boxes with Vgrid
vgrid_spec = VGridSpec(
    video_meta = video_metadata,
    vis_format = VideoBlockFormat(imaps = [
        ('bounding_boxes', bbox_ism.filter(lambda interval: interval['payload']['score'] > 0.9))
    ]),
    video_endpoint = 'https://olimar.stanford.edu/hdd/rekall_tutorials/basics/'
)
VGridWidget(vgrid_spec = vgrid_spec.to_json_compressed())


Again, line by line:

Lines 2-8 specify a VGridSpec for the widget:

    vgrid_spec = VGridSpec(
        video_meta = video_metadata,
        vis_format = VideoBlockFormat(imaps = [
            ('bounding_boxes', bbox_ism.filter(lambda interval: interval['payload']['score'] > 0.9))
        ]),
        video_endpoint = 'https://olimar.stanford.edu/hdd/rekall_tutorials/basics/'
    )
    
This is the same as above, except this time we are passing `imaps` into `VideoBlockFormat`. This argument expects a list of pairs of `(String, IntervalSetMapping)`. In this case, we are only passing in a single `IntervalSetMapping` (and filtering out any bounding boxes whose score isn't high enough).

Note that the IDs in `video_metadata` match the keys in the `bbox_ism` `IntervalSetMapping` that we created earlier. Vgrid uses this mapping to know which bounding boxes to draw on which videos.

Line 9 passes the spec (with data) as compressed JSON and creates a VGrid Jupyter widget with the data. Since it's the last line of the cell, it gets displayed below the cell automatically.
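
Because the drawing relies on video IDs and mapping keys matching, a quick consistency check can catch a mismatch early. This is a sketch in plain Python with hypothetical IDs and keys; `unmatched_keys` is an illustrative helper, not a Rekall or Vgrid function.

```python
def unmatched_keys(video_ids, mapping_keys):
    """Return mapping keys with no matching video ID, sorted for stable output."""
    return sorted(set(mapping_keys) - set(video_ids))

# Hypothetical IDs and keys; 'driving5.mp4' has no metadata entry.
video_ids = ['driving1.mp4', 'driving2.mp4']
mapping_keys = ['driving1.mp4', 'driving2.mp4', 'driving5.mp4']

print(unmatched_keys(video_ids, mapping_keys))
```

In the notebook, the IDs would come from `[vm.id for vm in video_metadata]`, and the keys from the video names used to build `bbox_ism`.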

# Visualizing Multiple Tracks

Now that we have a handle on visualizing `IntervalSetMapping`s with Vgrid, let's use some of Rekall's functionality to display a more meaningful visualization.

## `IntervalSet` and `IntervalSetMapping` come with a number of useful set operations

![set_operations](https://olimar.stanford.edu/hdd/rekall_tutorials/basics/set_operations.png)

We've already used the `filter` operation above to filter out any bounding boxes whose score isn't high enough. We can use it again to get a number of different `IntervalSetMapping`s, each of which corresponds to different object categories:

In [9]:
object_names = [
    'person',
    'car',
    'truck',
    'traffic light'
]
object_isms = [
    bbox_ism.filter(lambda interval: interval['payload']['class'] == object_name)
    for object_name in object_names
]

The above code creates a list of `IntervalSetMapping`s, each of which corresponds to a different object. The first `IntervalSetMapping` of `object_isms` contains all the bounding boxes with the `person` class, the second one contains all the bounding boxes with the `car` class, etc.
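
The effect of filtering once per class can be illustrated without Rekall. This sketch uses plain lists of made-up boxes in place of `IntervalSetMapping`s; `split_by_class` is a hypothetical helper.

```python
def split_by_class(boxes, class_names):
    """Return one list of boxes per requested class, in the order given."""
    return [[b for b in boxes if b['class'] == name] for name in class_names]

# Made-up boxes standing in for the intervals of one video.
boxes = [
    {'class': 'person', 'score': 0.99},
    {'class': 'car', 'score': 0.95},
    {'class': 'car', 'score': 0.50},
    {'class': 'bicycle', 'score': 0.80},
]

per_class = split_by_class(boxes, ['person', 'car', 'truck'])
print([len(group) for group in per_class])
```

Boxes of a class that was not requested are dropped, and a requested class with no boxes yields an empty group.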

This code visualizes the different `IntervalSetMapping`s. Each `IntervalSetMapping` will have a different track on the timeline and be visualized with a different color. Notice that we pass more items into the `imaps` argument of `VideoBlockFormat`.

In [10]:
vgrid_spec = VGridSpec(
    video_meta = video_metadata,
    vis_format = VideoBlockFormat(imaps = [
        (
            object_name,
            ism.filter(lambda interval: interval['payload']['score'] > 0.9)
        )
        for ism, object_name in zip(object_isms, object_names)
    ]),
    video_endpoint = 'https://olimar.stanford.edu/hdd/rekall_tutorials/basics/'
)
VGridWidget(vgrid_spec = vgrid_spec.to_json_compressed())


# Using `IntervalSetMapping` operations to query for events

Finally, we can use `IntervalSetMapping`'s built-in functions to query for events in our data. One particularly useful operation is joining two sets together to find interesting relationships between them:

![simple_join](https://olimar.stanford.edu/hdd/rekall_tutorials/basics/simple_code.png)

## Find people standing in front of cars

We can use the `join` operation with some of Rekall's built-in predicates to find examples of people standing in front of cars:

In [11]:
from rekall.predicates import *
person_ism = object_isms[0].filter(lambda interval: interval['payload']['score'] > 0.99)
car_ism = object_isms[1].filter(lambda interval: interval['payload']['score'] > 0.99)
person_in_front_of_cars = person_ism.join(
    car_ism,
    predicate = and_pred(
        Bounds3D.T(equal()), # The pair have to be equal along the time dimension
        Bounds3D.X(overlaps()), # The boxes overlap in the X dimension
        Bounds3D.Y(overlaps()) # The boxes overlap in the Y dimension
    ),
    merge_op = lambda person, car: Interval(
        person['bounds'].span(car['bounds']), # We use the "span" method of Bounds3D to get a spanning bound
        payload = {
            'class': 'person_overlap_car',
            'score': person['payload']['score'] * car['payload']['score']
        }
    ),
    window = 0.5 # Only look at pairs that differ by less than half a second from each other
)

This code is worth breaking down further. Let's go line by line:

Lines 4 and 5 establish that we are joining `person_ism` to `car_ism`. This joins every `IntervalSet` in `person_ism` to the `IntervalSet` in `car_ism` that has the same mapping key.

    4   person_in_front_of_cars = person_ism.join(
    5       car_ism,

Lines 6-10 establish the predicate on pairs of joined Intervals.

     6   predicate = and_pred(
     7       Bounds3D.T(equal()), # The pair have to be equal along the time dimension
     8       Bounds3D.X(overlaps()), # The boxes overlap in the X dimension
     9       Bounds3D.Y(overlaps()) # The boxes overlap in the Y dimension
    10   ),
    
`and_pred` is a predicate that computes a logical and between many predicates.

`Bounds3D.T(equal())` is a one-dimensional predicate cast to the time dimension `(t1, t2)` and says that the two Intervals in the pair must be equal in that dimension. Time is the default dimension, but we state it explicitly here for clarity.

`Bounds3D.X(overlaps())` is a one-dimensional predicate cast to the X dimension (default is the time dimension) and says that the two Intervals have to overlap in that dimension.

`Bounds3D.Y(overlaps())` is a one-dimensional predicate cast to the Y dimension (default is the time dimension) and says that the two Intervals have to overlap in that dimension.

See the Rekall documentation for more information about predicates.
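
To make the idea of composing one-dimensional predicates concrete, here is a minimal sketch in plain Python. The functions `ranges_equal`, `ranges_overlap`, `all_of`, and `on_axis` are stand-ins written for this example; they are not Rekall's implementations, and their edge-case behavior (for instance, ranges that only touch at an endpoint) may differ from Rekall's predicates.

```python
def ranges_equal(a, b):
    """True when two (start, end) ranges are identical."""
    return a == b

def ranges_overlap(a, b):
    """True when two (start, end) ranges share more than a single point."""
    return a[0] < b[1] and b[0] < a[1]

def all_of(*preds):
    """Combine predicates so that all of them must hold."""
    return lambda left, right: all(p(left, right) for p in preds)

def on_axis(axis, pred):
    """Apply a 1D range predicate to one axis ('t', 'x' or 'y') of two boxes."""
    return lambda left, right: pred(left[axis], right[axis])

# Made-up boxes: each axis maps to a (start, end) range.
person = {'t': (1.0, 1.5), 'x': (0.40, 0.50), 'y': (0.30, 0.80)}
car = {'t': (1.0, 1.5), 'x': (0.45, 0.90), 'y': (0.50, 0.90)}
far_car = {'t': (1.0, 1.5), 'x': (0.60, 0.90), 'y': (0.50, 0.90)}

in_front = all_of(
    on_axis('t', ranges_equal),
    on_axis('x', ranges_overlap),
    on_axis('y', ranges_overlap),
)
print(in_front(person, car), in_front(person, far_car))
```

The second pair fails only because the boxes do not overlap along the X axis, which shows how each axis contributes independently to the combined predicate.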

Lines 11-17 establish the merge_op to merge the pair of Intervals back to a single Interval.

    11  merge_op = lambda person, car: Interval(
    12      person['bounds'].span(car['bounds']), # We use the "span" method of Bounds3D to get a spanning bound
    13      payload = {
    14          'class': 'person_overlap_car',
    15          'score': person['payload']['score'] * car['payload']['score']
    16      }
    17  ),
    
Line 11 establishes the arguments to the merge op - we take in two Intervals, coming from the IntervalSet on the left and the IntervalSet on the right of the join, respectively.

    11   merge_op = lambda person, car: Interval(

Line 12 merges the bounds of the two Intervals:

    12      person['bounds'].span(car['bounds']), # We use the "span" method of Bounds3D to get a spanning bound

We use the span method of Bounds3D to get the minimum Bounds spanning both Intervals.
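
A spanning bound can be computed per axis by taking the smaller start and the larger end. The sketch below shows that arithmetic in plain Python on made-up boxes; `span_range` and `span_box` are illustrative helpers and are not the `Bounds3D.span` implementation.

```python
def span_range(a, b):
    """Smallest (start, end) range that covers both input ranges."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def span_box(left, right):
    """Apply span_range along every axis of two boxes with the same axes."""
    return {axis: span_range(left[axis], right[axis]) for axis in left}

# Made-up boxes: each axis maps to a (start, end) range.
person = {'t': (1.0, 1.5), 'x': (0.40, 0.50), 'y': (0.30, 0.80)}
car = {'t': (1.0, 1.5), 'x': (0.45, 0.90), 'y': (0.50, 0.90)}

merged = span_box(person, car)
print(merged)
```

The merged box covers both inputs, so it is at least as large as either one along every axis.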

Lines 13-16 establish the new payload - a new class, whose score is the product of the constituent scores.

    13      payload = {
    14          'class': 'person_overlap_car',
    15          'score': person['payload']['score'] * car['payload']['score']
    16      }

Finally, line 18 specifies that we should only look at pairs of Intervals that are overlapping or less than 0.5 seconds apart from each other in the time axis.

    18  window = 0.5 # Only look at pairs that differ by less than half a second from each other
    
For more details, see the `IntervalSet` [documentation](https://rekallpy.readthedocs.io/en/latest/index.html#rekall.IntervalSet).
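
Putting the pieces together, the overall shape of a windowed join can be sketched as a nested loop in plain Python. This is an illustration under stated assumptions, not Rekall's `join`: `time_gap` and `windowed_join` are hypothetical helpers, the data is made up, the window boundary is treated as inclusive, and the sketch still compares every pair rather than using the window to skip work.

```python
def time_gap(a, b):
    """Gap in seconds between two (start, end) ranges; 0 if they touch or overlap."""
    return max(0.0, max(a[0], b[0]) - min(a[1], b[1]))

def windowed_join(left, right, predicate, merge_op, window):
    """Merge every left/right pair within `window` seconds that passes `predicate`."""
    return [
        merge_op(a, b)
        for a in left
        for b in right
        if time_gap(a['t'], b['t']) <= window and predicate(a, b)
    ]

# Made-up detections: a time range plus a confidence score.
people = [{'t': (0.0, 1.0), 'score': 0.995}, {'t': (5.0, 6.0), 'score': 0.992}]
cars = [{'t': (0.0, 1.0), 'score': 0.999}, {'t': (9.0, 10.0), 'score': 0.991}]

pairs = windowed_join(
    people, cars,
    predicate=lambda p, c: p['t'] == c['t'],
    merge_op=lambda p, c: {'t': p['t'], 'score': p['score'] * c['score']},
    window=0.5,
)
print(len(pairs))
```

Only the first person and the first car are close enough in time and pass the predicate, so a single merged result comes out, with the product of the two scores as its score.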

In [12]:
vgrid_spec = VGridSpec(
    video_meta = video_metadata,
    vis_format = VideoBlockFormat(imaps = [
        ('person_car_overlap', person_in_front_of_cars)
    ]),
    video_endpoint = 'https://olimar.stanford.edu/hdd/rekall_tutorials/basics/'
)
VGridWidget(vgrid_spec = vgrid_spec.to_json_compressed())


## Visualize using `FlatFormat`

### Note: the results below are not great because of a Vgrid bug (they look better with a frameserver)

Now we'll visualize our results using Vgrid again. Instead of visualizing each video in its own block, we can visualize each Interval in its own block.

In [13]:
vgrid_spec = VGridSpec(
    video_meta = video_metadata,
    vis_format = FlatFormat(person_in_front_of_cars),
    video_endpoint = 'https://olimar.stanford.edu/hdd/rekall_tutorials/basics/'
)
VGridWidget(vgrid_spec = vgrid_spec.to_json_compressed())
