# Visualizing Bounding Boxes

This notebook demonstrates how to use Rekall and VGrid to visualize bounding boxes on a video dataset. It is an abbreviated version of the [Rekall basics tutorial](https://github.com/scanner-research/rekall/blob/master/tutorials/Basics.ipynb), applied to a different dataset (the Intel cyclist detection dataset). We'll visualize Mask R-CNN object detections alongside manually annotated cyclist labels.

In [11]:
import urllib3, requests, json, os, pickle

# HACK: silence the urllib3 certificate warnings that the verify=False requests
# to olimar.stanford.edu would otherwise print
urllib3.disable_warnings()

In [2]:
# Base URL of the video metadata file.
# The bounding box files and videos are assumed to be located relative to this URL.
VIDEO_COLLECTION_BASEURL = "http://olimar.stanford.edu/hdd/intel_self_driving/"
VIDEO_METADATA_FILENAME = "intel_metadata.json"

In [18]:
# Grab the metadata (width, height, number of frames, FPS) of my video collection from olimar
req = requests.get(os.path.join(VIDEO_COLLECTION_BASEURL, VIDEO_METADATA_FILENAME), verify=False)
video_collection = sorted(req.json(), key=lambda vm: vm['filename'])
print("The video collection has %d videos." % len(video_collection))

The video collection has 5 videos.
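
The later cells read five fields from each metadata record: `filename`, `fps`, `num_frames`, `width`, and `height`. As a rough sketch, a check like the following could catch a malformed record before it reaches `VideoMetadata`. The helper name `missing_fields` and the sample values are made up for illustration; they are not the real contents of `intel_metadata.json`.

```python
# Fields that the cells below read from each metadata record.
REQUIRED_FIELDS = ("filename", "fps", "num_frames", "width", "height")


def missing_fields(record):
    """Return the required fields that are absent from one metadata record."""
    return [field for field in REQUIRED_FIELDS if field not in record]


# Hypothetical records, NOT the real contents of intel_metadata.json.
sample_collection = [
    {"filename": "0002.mp4", "fps": 30.0, "num_frames": 900, "width": 1920, "height": 1080},
    {"filename": "0001.mp4", "fps": 30.0, "num_frames": 600, "width": 1920},  # no height
]

# Same ordering rule as the notebook: sort by filename.
sample_collection = sorted(sample_collection, key=lambda vm: vm["filename"])
for record in sample_collection:
    print(record["filename"], "missing:", missing_fields(record))
```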


In [7]:
# Names of the maskrcnn files
maskrcnn_bbox_files = [ 'maskrcnn_bboxes_0001.pkl', 'maskrcnn_bboxes_0002.pkl', 'maskrcnn_bboxes_0003.pkl',
                  'maskrcnn_bboxes_0004.pkl', 'maskrcnn_bboxes_0005.pkl' ]

# Names of the cyclist files
cyclist_bbox_files = [ 'cyclist_labels_0001.pkl', 'cyclist_labels_0002.pkl', 'cyclist_labels_0003.pkl',
                 'cyclist_labels_0004.pkl', 'cyclist_labels_0005.pkl' ]

In [29]:
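# Now, load the bounding boxes from olimar.
# Each pickle file holds one video's bboxes as a list of per-frame lists. The file
# names are listed in sorted order so that they line up with the sorted video collection.
# NOTE: pickle.loads can execute arbitrary code; only unpickle data from a source you trust.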
maskrcnn_bboxes = []
for bbox_file in maskrcnn_bbox_files:
    req = requests.get(os.path.join(VIDEO_COLLECTION_BASEURL, bbox_file), verify=False)
    maskrcnn_bboxes.append(pickle.loads(req.content))
    
cyclist_bboxes = []
for bbox_file in cyclist_bbox_files:
    req = requests.get(os.path.join(VIDEO_COLLECTION_BASEURL, bbox_file), verify=False)
    cyclist_bboxes.append(pickle.loads(req.content))

In [31]:
from rekall import Interval, IntervalSet, IntervalSetMapping, Bounds3D
from vgrid import VGridSpec, VideoMetadata, VideoBlockFormat
from vgrid_jupyter import VGridWidget

# Load the video metadata into VideoMetadata objects, using filename for the id
video_metadata = [
    VideoMetadata(v["filename"], v["filename"], v["fps"], v["num_frames"], v["width"], v["height"])
    for v in video_collection
]

# Load the maskrcnn bboxes into Rekall, using video id as key
# Bounds3D uses seconds for time and frame-relative units (pixels divided by
# the frame width or height) for X and Y
maskrcnn_bboxes_ism = IntervalSetMapping({
    vm.id: IntervalSet([
        Interval(
            Bounds3D(
                t1 = frame_num / vm.fps,
                t2 = (frame_num + 1) / vm.fps,
                x1 = bbox[0] / vm.width,
                x2 = bbox[2] / vm.width,
                y1 = bbox[1] / vm.height,
                y2 = bbox[3] / vm.height
            ),
            payload = {
                'class': bbox[4],
                'score': bbox[5]
            }
        )
        for frame_num, bboxes_in_frame in enumerate(maskrcnn_frame_list)
        for bbox in bboxes_in_frame
    ])
    for vm, maskrcnn_frame_list in zip(video_metadata, maskrcnn_bboxes)
})

# Load the cyclist bboxes into Rekall, using video id as key
# Bounds3D uses seconds for time and frame-relative units (pixels divided by
# the frame width or height) for X and Y
cyclist_bboxes_ism = IntervalSetMapping({
    vm.id: IntervalSet([
        Interval(
            Bounds3D(
                t1 = frame_num / vm.fps,
                t2 = (frame_num + 1) / vm.fps,
                x1 = bbox[0] / vm.width,
                x2 = bbox[2] / vm.width,
                y1 = bbox[1] / vm.height,
                y2 = bbox[3] / vm.height
            ),
            payload = {
                'class': bbox[4],
                'score': bbox[5]
            }
        )
        for frame_num, bboxes_in_frame in enumerate(cyclist_frame_list)
        for bbox in bboxes_in_frame
    ])
    for vm, cyclist_frame_list in zip(video_metadata, cyclist_bboxes)
})
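
To make the unit conversion in the comprehensions above concrete, here is a small standalone sketch of the same arithmetic without Rekall. It assumes the box layout `[x1, y1, x2, y2, class, score]` in pixels, as implied by the indexing above. The function name `to_relative_bounds` and the sample numbers are illustrative only.

```python
def to_relative_bounds(bbox, frame_num, fps, width, height):
    """Convert one pixel-space bbox on a given frame to seconds and relative X/Y."""
    x1, y1, x2, y2 = bbox[:4]
    return {
        # A box on frame N covers the time span of that single frame.
        "t1": frame_num / fps,
        "t2": (frame_num + 1) / fps,
        # Divide pixel coordinates by the frame size to get relative units.
        "x1": x1 / width,
        "x2": x2 / width,
        "y1": y1 / height,
        "y2": y2 / height,
    }


# Hypothetical detection on frame 30 of a 30 fps, 1920x1080 video.
bounds = to_relative_bounds(
    [480, 270, 960, 540, "person", 0.9],
    frame_num=30, fps=30.0, width=1920, height=1080,
)
print(bounds)
```

With these sample values the box starts one second into the video and spans the region from 0.25 to 0.5 of the frame on both axes.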

In [34]:
# Visualize Mask-RCNN bboxes, and cyclist bboxes
vgrid_spec = VGridSpec(
    video_meta = video_metadata,
    vis_format = VideoBlockFormat(imaps = [
        ('mask_rcnn_bounding_boxes', maskrcnn_bboxes_ism),
        ('cyclist_bounding_boxes', cyclist_bboxes_ism)
    ]),
    video_endpoint = VIDEO_COLLECTION_BASEURL
)
VGridWidget(vgrid_spec = vgrid_spec.to_json_compressed())


# If the videos above do not display, download them locally

In [36]:
for video in video_metadata:
    print("Downloading {}".format(os.path.join(VIDEO_COLLECTION_BASEURL, video.path)))
    req = requests.get(os.path.join(VIDEO_COLLECTION_BASEURL, video.path), verify=False)
    with open(video.path, 'wb') as f:
        f.write(req.content)

Downloading http://olimar.stanford.edu/hdd/intel_self_driving/0001.mp4
Downloading http://olimar.stanford.edu/hdd/intel_self_driving/0002.mp4
Downloading http://olimar.stanford.edu/hdd/intel_self_driving/0003.mp4
Downloading http://olimar.stanford.edu/hdd/intel_self_driving/0004.mp4
Downloading http://olimar.stanford.edu/hdd/intel_self_driving/0005.mp4


Alternatively, navigate to a folder of your choosing and run the following command:
```
wget --no-check-certificate https://olimar.stanford.edu/hdd/intel_self_driving/0001.mp4 \
    https://olimar.stanford.edu/hdd/intel_self_driving/0002.mp4 \
    https://olimar.stanford.edu/hdd/intel_self_driving/0003.mp4 \
    https://olimar.stanford.edu/hdd/intel_self_driving/0004.mp4 \
    https://olimar.stanford.edu/hdd/intel_self_driving/0005.mp4
```

# You'll need to start up a local file server to serve the videos.

Navigate to your esperlight folder (or to the folder where you downloaded the videos) and run:

`python3 -m http.server [PORTNUMBER]`

Here `[PORTNUMBER]` is a port of your choosing. Make sure it matches the port in the `video_endpoint` argument below (the next cell uses `8080`).
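
Before pointing the widget at the local server, it can help to confirm that the videos are actually reachable. The following is a minimal sketch using only the standard library. `check_video_endpoint` is a hypothetical helper, not part of VGrid or Rekall, and it only tests that each file answers an HTTP `HEAD` request with status 200.

```python
import urllib.parse
import urllib.request


def check_video_endpoint(endpoint, filenames, timeout=5.0):
    """Return {filename: True/False} for whether each file is served at endpoint."""
    results = {}
    for name in filenames:
        url = endpoint.rstrip("/") + "/" + urllib.parse.quote(name)
        request = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                results[name] = response.status == 200
        except OSError:
            # Covers refused connections, timeouts, and HTTP errors such as 404
            # (urllib.error.HTTPError is a subclass of OSError).
            results[name] = False
    return results
```

In this notebook, a call such as `check_video_endpoint('http://localhost:8080', [vm.path for vm in video_metadata])` should return `True` for every video once the file server is running in the right folder.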

In [41]:
# Visualize Mask-RCNN bboxes and cyclist bboxes, with videos served from the local file server
vgrid_spec = VGridSpec(
    video_meta = video_metadata,
    vis_format = VideoBlockFormat(imaps = [
        ('mask_rcnn_bounding_boxes', maskrcnn_bboxes_ism),
        ('cyclist_bounding_boxes', cyclist_bboxes_ism)
    ]),
    video_endpoint = 'http://localhost:8080'
)
VGridWidget(vgrid_spec = vgrid_spec.to_json_compressed())
