# Your First Dataset

In this tutorial, you'll load your first dataset and write your first query. The end of this notebook also has some tips for setting up your own dataset.

## The Dataset: Cable TV News

We'll be using a few cable TV news videos for the workshop. You can find them at https://olimar.stanford.edu/hdd/rekall_tutorials/workshop/videos/. Go ahead and [check one out](https://olimar.stanford.edu/hdd/rekall_tutorials/workshop/videos/CNNW_20150408_200000_The_Lead_With_Jake_Tapper.mp4) now.

## Step 1: Load the videos in a Jupyter notebook

Loading up videos in a browser works fine for casual viewing, but we'll want to actually load the data in a Jupyter notebook to write queries and do analysis. We can do this using Vgrid, the visualization side of the Rekall ecosystem.

Go ahead and run the next cell to get started - if you installed everything correctly, you should see nine thumbnails. **You can click on a thumbnail to expand the video and play it.**

In [1]:
from vgrid import VGridSpec, VideoMetadata, VideoBlockFormat
from vgrid_jupyter import VGridWidget
import urllib3, requests, os
urllib3.disable_warnings()

VIDEO_COLLECTION_BASEURL = "http://olimar.stanford.edu/hdd/rekall_tutorials/workshop"
VIDEO_METADATA_FILENAME = "data/video_meta.json"

req = requests.get(os.path.join(VIDEO_COLLECTION_BASEURL, VIDEO_METADATA_FILENAME), verify=False)
video_collection = req.json()

video_metadata = [
    VideoMetadata(v["path"], v["id"], v["fps"], int(v["num_frames"]), v["width"], v["height"])
    for v in video_collection
]

VIDEO_ENDPOINT = "http://olimar.stanford.edu/hdd/rekall_tutorials/workshop/videos"
vgrid_spec = VGridSpec(
    video_meta = video_metadata,
    vis_format = VideoBlockFormat(imaps = None, video_meta = video_metadata),
    video_endpoint = VIDEO_ENDPOINT
)
VGridWidget(vgrid_spec = vgrid_spec.to_json_compressed())

### What did we do?

Let's briefly walk through what our code did.

To visualize the videos, we need some **basic metadata** about them: their width, height, fps, and duration (as a number of frames). These were pre-computed for you and are stored on our server at http://olimar.stanford.edu/hdd/rekall_tutorials/workshop/data/video_meta.json. **If you're setting up your own dataset, you'll need to compute this information for your videos -- more about that below.**

All we do is load this JSON file into Python:

```Python
req = requests.get(os.path.join(VIDEO_COLLECTION_BASEURL, VIDEO_METADATA_FILENAME), verify=False)
video_collection = req.json()
```
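
A note on portability: `os.path.join` produces the right URL on Linux and macOS, but it uses the platform's path separator, so on Windows it would put a backslash into the URL. Below is a minimal sketch of a more portable way to build the same URL. `join_url` is a hypothetical helper written for this example, not part of Rekall or Vgrid.

```python
def join_url(base, *parts):
    """Join URL pieces with single forward slashes, regardless of the OS."""
    pieces = [base.rstrip("/")] + [part.strip("/") for part in parts]
    return "/".join(pieces)


VIDEO_COLLECTION_BASEURL = "http://olimar.stanford.edu/hdd/rekall_tutorials/workshop"
VIDEO_METADATA_FILENAME = "data/video_meta.json"

# Same URL the tutorial requests, built without os.path.join.
metadata_url = join_url(VIDEO_COLLECTION_BASEURL, VIDEO_METADATA_FILENAME)
print(metadata_url)
```

The result can be passed to `requests.get` exactly as in the cell above.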

You can inspect the contents of `video_collection` to see what the metadata look like:

In [2]:
video_collection[0]

{'id': 17458,
 'path': 'CNNW_20161218_210000_CNN_Newsroom_With_Fredricka_Whitfield.mp4',
 'num_frames': 219440,
 'height': 360,
 'fps': 59.94,
 'width': 640}
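
As a quick sanity check on these fields, the duration of a video in seconds is `num_frames / fps`. The sketch below works this out for the record shown above; it uses only the values printed in that output.

```python
# The metadata record printed above, copied by hand.
video = {
    'id': 17458,
    'path': 'CNNW_20161218_210000_CNN_Newsroom_With_Fredricka_Whitfield.mp4',
    'num_frames': 219440,
    'height': 360,
    'fps': 59.94,
    'width': 640,
}

# Duration in seconds is the frame count divided by the frame rate.
duration_seconds = video['num_frames'] / video['fps']
duration_minutes = duration_seconds / 60

print(f"{duration_seconds:.1f} seconds, about {duration_minutes:.0f} minutes")
```

This comes out to roughly an hour, which is what you would expect for a cable news broadcast.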

Then we created `VideoMetadata` objects from the JSON file, and visualized them using a `VGridWidget`:

```Python
video_metadata = [
    VideoMetadata(v["path"], v["id"], v["fps"], int(v["num_frames"]), v["width"], v["height"])
    for v in video_collection
]

VIDEO_ENDPOINT = "http://olimar.stanford.edu/hdd/rekall_tutorials/workshop/videos"
vgrid_spec = VGridSpec(
    video_meta = video_metadata,
    vis_format = VideoBlockFormat(imaps = None, video_meta = video_metadata),
    video_endpoint = VIDEO_ENDPOINT
)
VGridWidget(vgrid_spec = vgrid_spec.to_json_compressed())
```

**Notice that we specify the `video_endpoint` when instantiating the VGridSpec.** This tells our web browser where to look for videos. If you're using your own dataset, you may need to use a different video endpoint -- this can be a localhost server on your laptop, or an actual server that you run. Some tips on setting that up are at the bottom of this notebook.

## Step 2: Load in the outputs of some off-the-shelf networks

Now that we have videos loaded in, let's load in the outputs of a face detector and identity classifier. We already ran a face detector and identity classifier over these videos (one frame every three seconds), and have stored the results in a [JSON file](https://olimar.stanford.edu/hdd/rekall_tutorials/workshop/data/faces.json) on our server.

Go ahead and run the next cell - this code will:
* Load the JSON file from the server
* Load the face bounding boxes and identities **into Rekall**
* Convert the time units from frame numbers to seconds
* Display the face bounding boxes in Vgrid

In [3]:
# Rekall imports
from rekall import Interval, IntervalSet, IntervalSetMapping, Bounds3D
from rekall.stdlib import ingest

# Load the JSON file from the server
FACES_JSON = "data/faces.json"
req = requests.get(os.path.join(VIDEO_COLLECTION_BASEURL, FACES_JSON), verify=False)
faces_json = req.json()

# Load the face bounding boxes into Rekall
faces_ism = ingest.ism_from_iterable_with_schema_bounds3D(
    faces_json,
    ingest.getter_accessor,
    {
        'key': 'video_id',
        't1': 'frame_number', # NOTE that the JSON format has frame timestamps!
        't2': 'frame_number',
        'x1': 'x1',
        'x2': 'x2',
        'y1': 'y1',
        'y2': 'y2'
    },
    with_payload = lambda json_obj: json_obj
)

# Convert from frames to seconds
video_meta_by_id = {
    vm.id: vm
    for vm in video_metadata
}

faces_ism = faces_ism.map(
    lambda face: Interval(
        Bounds3D(
            # We convert from frames to seconds, and account for temporal downsampling
            face['t1'] / video_meta_by_id[face['payload']['video_id']].fps - 1.5,
            face['t2'] / video_meta_by_id[face['payload']['video_id']].fps + 1.5,
            face['x1'], face['x2'], face['y1'], face['y2']
        ),
        face['payload']
    )
)

# Display in VGrid
vgrid_spec = VGridSpec(
    video_meta = video_metadata,
    vis_format = VideoBlockFormat(imaps = [
        ('faces', faces_ism)
    ]),
    video_endpoint = VIDEO_ENDPOINT
)
VGridWidget(vgrid_spec = vgrid_spec.to_json_compressed())

### What did we do?

Let's look through what we did again.

#### Load faces JSON file
First, we loaded the faces JSON file:
```Python
FACES_JSON = "data/faces.json"
req = requests.get(os.path.join(VIDEO_COLLECTION_BASEURL, FACES_JSON), verify=False)
faces_json = req.json()
```

You can inspect the results to see what this looks like:

In [4]:
faces_json[0]

{'video_id': 19882,
 'x1': 0.0,
 'y1': 0.0606050528585911,
 'x2': 0.158529967069626,
 'y2': 0.388492971658707,
 'score': 1.0,
 'frame_number': 44055,
 'gender': 'M',
 'gender_score': 1.0,
 'identity': 'david m. rodriguez',
 'identity_score': 1.0}

Notice that the X and Y coordinates are frame-relative: they are fractions of the frame's width and height (between 0 and 1), not pixel values.
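
If you ever need pixel coordinates (for example, to crop a face out of a frame), you can scale the relative values by the frame size. The sketch below assumes a 640x360 frame, which is the size reported for the video in `video_collection[0]`; the video this face belongs to may have a different size, so look up the real width and height in the metadata. `to_pixels` is a hypothetical helper, not a Rekall function.

```python
def to_pixels(face, width, height):
    """Scale frame-relative box coordinates to whole pixel coordinates."""
    return {
        'x1': round(face['x1'] * width),
        'y1': round(face['y1'] * height),
        'x2': round(face['x2'] * width),
        'y2': round(face['y2'] * height),
    }


# Bounding box fields copied from faces_json[0] above.
face = {
    'x1': 0.0,
    'y1': 0.0606050528585911,
    'x2': 0.158529967069626,
    'y2': 0.388492971658707,
}

# ASSUMPTION: a 640x360 frame; check the metadata for the actual video.
pixel_box = to_pixels(face, width=640, height=360)
print(pixel_box)
```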

#### Load Faces into Rekall

Then, we loaded the faces into Rekall:

```Python
# Load the face bounding boxes into Rekall
faces_ism = ingest.ism_from_iterable_with_schema_bounds3D(
    faces_json,
    ingest.getter_accessor,
    {
        'key': 'video_id',
        't1': 'frame_number', # NOTE that the JSON format has frame timestamps!
        't2': 'frame_number',
        'x1': 'x1',
        'x2': 'x2',
        'y1': 'y1',
        'y2': 'y2'
    },
    with_payload = lambda json_obj: json_obj
)
```

This function iterated through our `faces_json` object, and loaded in bounding boxes with times `t1` to `t2` (in frame-time), and X and Y coordinates of `x1`, `x2`, `y1`, `y2`. It also associated each bounding box with the right video using `video_id`, and gave each bounding box a "payload" of the JSON object it came from.
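
To build some intuition for what this ingest step produces, here is a plain-Python analogue that groups records by video and attaches the source record as a payload. This is only an illustration of the idea using dictionaries and lists; it is not how Rekall implements `ism_from_iterable_with_schema_bounds3D`, and the records are made up.

```python
from collections import defaultdict


def group_faces_by_video(records):
    """Group flat detection records by video, one bounding box per record."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record['video_id']].append({
            # A detection covers a single frame, so t1 == t2 at this stage.
            't1': record['frame_number'],
            't2': record['frame_number'],
            'x1': record['x1'], 'x2': record['x2'],
            'y1': record['y1'], 'y2': record['y2'],
            # Keep the whole source record so later queries can read it.
            'payload': record,
        })
    return dict(grouped)


# Made-up records in the same shape as the entries in faces.json.
records = [
    {'video_id': 1, 'frame_number': 0, 'x1': 0.1, 'x2': 0.3, 'y1': 0.2, 'y2': 0.5},
    {'video_id': 1, 'frame_number': 90, 'x1': 0.4, 'x2': 0.6, 'y1': 0.1, 'y2': 0.4},
    {'video_id': 2, 'frame_number': 45, 'x1': 0.0, 'x2': 0.2, 'y1': 0.0, 'y2': 0.3},
]

faces_by_video = group_faces_by_video(records)
print({video_id: len(boxes) for video_id, boxes in faces_by_video.items()})
```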

#### Convert to Seconds

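Vgrid expects time coordinates in seconds, so next we converted the frame numbers to seconds by dividing by each video's fps. We also accounted for the fact that we only ran face detection once every three seconds: each detection is widened by 1.5 seconds on either side, so that it covers the full three-second window around its frame.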

```Python
# Convert from frames to seconds
video_meta_by_id = {
    vm.id: vm
    for vm in video_metadata
}

faces_ism = faces_ism.map(
    lambda face: Interval(
        Bounds3D(
            # We convert from frames to seconds, and account for temporal downsampling
            face['t1'] / video_meta_by_id[face['payload']['video_id']].fps - 1.5,
            face['t2'] / video_meta_by_id[face['payload']['video_id']].fps + 1.5,
            face['x1'], face['x2'], face['y1'], face['y2']
        ),
        face['payload']
    )
)
```
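
The arithmetic inside the `map` call can be checked on its own, without Rekall. The sketch below pulls it out into a hypothetical helper, `frame_to_window`. The frame number comes from `faces_json[0]`, but the fps of 29.97 is an assumption for illustration; the real value comes from that video's metadata.

```python
SAMPLE_PERIOD_SECONDS = 3.0  # the detector ran on one frame every three seconds


def frame_to_window(frame_number, fps, sample_period=SAMPLE_PERIOD_SECONDS):
    """Return the (start, end) time in seconds that one sampled frame stands for."""
    center = frame_number / fps          # frame number -> seconds
    half_window = sample_period / 2      # 1.5 seconds on either side
    return (center - half_window, center + half_window)


# ASSUMPTION: fps=29.97 is illustrative, not the real fps of video 19882.
start, end = frame_to_window(44055, fps=29.97)
print(f"frame 44055 covers {start:.2f}s to {end:.2f}s")
```

One consequence of this padding is that a detection in the first 1.5 seconds of a video gets a negative start time. If that matters for your queries, you could clamp the start with `max(0.0, start)`.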

#### Display in VGrid

Finally, we displayed these bounding boxes in VGrid:

```Python
# Display in VGrid
vgrid_spec = VGridSpec(
    video_meta = video_metadata,
    vis_format = VideoBlockFormat(imaps = [
        ('faces', faces_ism)
    ]),
    video_endpoint = VIDEO_ENDPOINT
)
VGridWidget(vgrid_spec = vgrid_spec.to_json_compressed())
```

Notice that we pass in a list of tuples to the `imaps` argument of `VideoBlockFormat`. Each tuple pairs a display name (here `'faces'`) with the interval set mapping to draw.

## Step 2.5: Display More Data

We can also use a bit more of VGrid's visualization capabilities to display the predicted identity of each face (notice that not all faces have identities):

In [5]:
from vgrid import SpatialType_Bbox

vgrid_spec = VGridSpec(
    video_meta = video_metadata,
    vis_format = VideoBlockFormat(imaps = [
        ('faces_with_identities', faces_ism.map(
            lambda face: Interval(
                face['bounds'],
                {
                    'spatial_type': SpatialType_Bbox(
                        text = (
                            face['payload']['identity']
                            if face['payload']['identity_score'] > 0.9
                            else ''
                        )
                ), }
            )
        ))
    ]),
    video_endpoint = VIDEO_ENDPOINT
)
VGridWidget(vgrid_spec = vgrid_spec.to_json_compressed())

This visualization code is the same as before, except now we change the payload of our bounding boxes to display some text if the `identity_score` is high enough:

```Python
VideoBlockFormat(imaps = [
    ('faces_with_identities', faces_ism.map(
        lambda face: Interval(
            face['bounds'],
            {
                'spatial_type': SpatialType_Bbox(
                    text = (
                        face['payload']['identity']
                        if face['payload']['identity_score'] > 0.9
                        else ''
                    )
            ), }
        )
    ))
])
```
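
The labeling rule itself is easy to test in isolation. Here is a small sketch of it as a standalone function. It also tolerates payloads where `identity` or `identity_score` is missing or `None`; that is an assumption about how faces without identities might be stored, so check your own data. `identity_label` is a hypothetical helper, not part of Vgrid.

```python
def identity_label(payload, threshold=0.9):
    """Return the identity to display, or '' if it is missing or low-confidence."""
    identity = payload.get('identity')
    score = payload.get('identity_score')
    # ASSUMPTION: unidentified faces have these fields missing or set to None.
    if identity is None or score is None:
        return ''
    return identity if score > threshold else ''


confident = {'identity': 'david m. rodriguez', 'identity_score': 1.0}
uncertain = {'identity': 'david m. rodriguez', 'identity_score': 0.5}
unidentified = {'video_id': 19882}

for payload in (confident, uncertain, unidentified):
    print(repr(identity_label(payload)))
```

The returned string could then be passed as the `text` argument of `SpatialType_Bbox`, as in the cell above.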


## Step 3: Your First Query!

Now let's write your first query - let's look for every detected instance of Jake Tapper's face. With Rekall, this is a simple filter function:

In [6]:
# Load up Rekall predicates
from rekall.predicates import *

jake_tapper = faces_ism.filter(
    lambda face: face['payload']['identity'] == 'jake tapper'
)

# Display in VGrid
vgrid_spec = VGridSpec(
    video_meta = video_metadata,
    vis_format = VideoBlockFormat(imaps = [
        ('jake tapper', jake_tapper)
    ]),
    video_endpoint = VIDEO_ENDPOINT
)
VGridWidget(vgrid_spec = vgrid_spec.to_json_compressed())

### What did we do?

All we did was filter for face bounding boxes where the detected identity was Jake Tapper:

```Python
jake_tapper = faces_ism.filter(
    lambda face: face['payload']['identity'] == 'jake tapper'
)
```

We'll be writing many queries like this (and much more complex ones) during the workshop!
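
As a first step toward those richer queries, here is a sketch of a slightly stricter predicate that also requires a minimum `identity_score` and ignores letter case. It runs on plain dictionaries with made-up payloads so you can try it without the dataset. `is_person` is a hypothetical helper, not a Rekall predicate.

```python
def is_person(payload, name, min_score=0.0):
    """True if the payload's identity matches `name` with enough confidence."""
    identity = payload.get('identity') or ''
    score = payload.get('identity_score') or 0.0
    return identity.lower() == name.lower() and score >= min_score


# Made-up payloads in the same shape as the entries in faces.json.
payloads = [
    {'identity': 'jake tapper', 'identity_score': 0.98},
    {'identity': 'jake tapper', 'identity_score': 0.40},
    {'identity': 'david m. rodriguez', 'identity_score': 1.0},
    {'video_id': 1},  # a face with no identity fields at all
]

matches = [p for p in payloads if is_person(p, 'Jake Tapper', min_score=0.9)]
print(len(matches))
```

To use it on the real data, the same function can be wrapped in the lambda passed to `filter`, for example `lambda face: is_person(face['payload'], 'jake tapper', min_score=0.9)`.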

# Next Steps: Setting up your own dataset

If you're planning on bringing your own dataset to the workshop, you'll want to re-create these steps before the workshop. Here's what you'll need to do:
1. Make sure your videos are in MP4 format (necessary to display them in the browser - [ffmpeg](https://askubuntu.com/questions/396883/how-to-simply-convert-video-files-i-e-mkv-to-mp4/396906#396906) is a good tool for this)
2. Get the initial metadata - width, height, fps, num_frames for your videos. If you know this already, you can just create the `VideoMetadata` objects yourself! Otherwise, [this script](https://github.com/scanner-research/esperlight/blob/master/create_video_metadata.py) is the one that we use to generate metadata. It relies on `ffmpeg`.
3. Generate primitives that you can use to write queries over. There are many options for this - object detections, face detections (like we showed here), pose estimations. As you saw in this tutorial, we have a lot of experience visualizing bounding boxes, but we can also visualize [text data](https://github.com/scanner-research/vgrid/blob/master/examples/05_text_data.py) (designed with caption/event tags in mind) and [poses](https://github.com/scanner-research/vgrid/blob/master/examples/06_keypoints.py) (example uses OpenPose).
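
For step 2, if you would rather not use the linked script, the sketch below shows one possible way to get the same four fields from `ffprobe` (which ships with `ffmpeg`). This is not the linked script, just an illustration under a few assumptions: `ffprobe` is on your `PATH`, the container reports `nb_frames` (MP4 files usually do, but some formats report `N/A`), and `r_frame_rate` is a good enough fps for your videos. The sample JSON is hand-written to mimic `ffprobe` output, and both function names are hypothetical.

```python
import json
import subprocess
from fractions import Fraction

FFPROBE_ARGS = [
    "ffprobe", "-v", "error",
    "-select_streams", "v:0",  # first video stream only
    "-show_entries", "stream=width,height,r_frame_rate,nb_frames",
    "-of", "json",
]


def parse_ffprobe_json(text, path, video_id):
    """Turn ffprobe's JSON output into a record shaped like video_meta.json."""
    stream = json.loads(text)["streams"][0]
    return {
        "id": video_id,
        "path": path,
        "num_frames": int(stream["nb_frames"]),
        "height": int(stream["height"]),
        # r_frame_rate is a fraction string such as "60000/1001".
        "fps": float(Fraction(stream["r_frame_rate"])),
        "width": int(stream["width"]),
    }


def probe_video(path, video_id):
    """Run ffprobe on one file (requires ffprobe to be installed)."""
    result = subprocess.run(
        FFPROBE_ARGS + [path], capture_output=True, text=True, check=True
    )
    return parse_ffprobe_json(result.stdout, path, video_id)


# Hand-written sample in the shape ffprobe prints; no video file needed.
sample_output = (
    '{"streams": [{"width": 640, "height": 360, '
    '"r_frame_rate": "60000/1001", "nb_frames": "219440"}]}'
)
sample_meta = parse_ffprobe_json(sample_output, "example.mp4", 0)
print(sample_meta)
```

Each resulting record has the same keys as the entries in `video_collection`, so it can be turned into a `VideoMetadata` object the same way as in Step 1.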

## Displaying videos in your browser

To actually display the videos in your browser, you'll need a simple server to host your videos (and then point Vgrid to that endpoint). The simplest option is just to spin up the server on your host machine.

For example, you can navigate to the folder where you have stored the videos, and run a command like this:
```bash
python -m http.server
```
You should then be able to navigate to http://localhost:8000 (the default port for `http.server`) and see all the files in that folder, including your videos. After that, you just need to change the `video_endpoint` in your `VGridSpec`:
```Python
vgrid_spec = VGridSpec(
    video_meta = ...,
    vis_format = ...,
    video_endpoint = 'http://localhost:8000'
)
VGridWidget(vgrid_spec = vgrid_spec.to_json_compressed())
```
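
If the thumbnails stay blank, it can help to check that the server really serves your files before debugging anything else. The sketch below does that with the standard library only. It assumes that each video's URL is the endpoint followed by the `path` from its metadata, which matches the URLs used in this tutorial. Both helper names are hypothetical.

```python
import urllib.error
import urllib.request


def video_url(endpoint, path):
    """Build the URL for one video from the endpoint and its metadata path."""
    return endpoint.rstrip("/") + "/" + path.lstrip("/")


def is_reachable(url, timeout=5.0):
    """Return True if a HEAD request to `url` succeeds with status 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP errors such as 404.
        return False


# 'my_video.mp4' is a placeholder; use a path from your own metadata.
url = video_url("http://localhost:8000", "my_video.mp4")
print(url)
```

With your local server running, `is_reachable(url)` should return `True` for every video you expect Vgrid to display.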