In [1]:
%load_ext autoreload
%autoreload 2
from user_study_helpers import *

# Load Dev Set and primitive data

Expand a video by clicking on it; then use `;` to play the video.

In [2]:
bounding_boxes = get_bboxes('dev')
parking_space_gt = get_gt()
visualize_boxes([bounding_boxes, parking_space_gt])

[Output: VGridWidget video timelines]

# Task: detect all empty parking spaces

Your goal is to write a Rekall program to detect all empty parking spaces (visualized in the second timeline above).

You're given a Rekall `IntervalSetMapping` object, `bounding_boxes`, that contains detections from Mask R-CNN. The Intervals contain 3D bounds, and the payloads contain the class and the class score:
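
As a rough illustration of that structure, the sketch below uses plain dictionaries in place of Rekall Intervals. The field names `t1`, `t2`, `x1`, `x2`, `y1`, `y2`, and `payload['class']` match the lookups used in this notebook; the `score` key and all of the values are made-up placeholders, not real data.

```python
# Plain-dict stand-ins for Rekall Intervals (hypothetical values).
detections = [
    {'t1': 0, 't2': 30, 'x1': 0.10, 'x2': 0.20, 'y1': 0.40, 'y2': 0.55,
     'payload': {'class': 'car', 'score': 0.98}},
    {'t1': 0, 't2': 30, 'x1': 0.60, 'x2': 0.65, 'y1': 0.30, 'y2': 0.50,
     'payload': {'class': 'person', 'score': 0.91}},
    {'t1': 30, 't2': 60, 'x1': 0.10, 'x2': 0.20, 'y1': 0.40, 'y2': 0.55,
     'payload': {'class': 'car', 'score': 0.97}},
]

# Same condition as the filter lambda used in the cells of this notebook:
# keep car detections from the first sampled window.
cars_at_start = [
    d for d in detections
    if d['t1'] == 0 and d['payload']['class'] == 'car'
]
```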

In [59]:
# Step 1: Get all parking spaces, duplicate across entire video
cars_at_start = bounding_boxes[0].filter(
    lambda interval: interval['t1'] == 0 and interval['payload']['class'] == 'car'
).get_intervals()

video_lengths = {
    key: bounding_boxes[key].get_intervals()[-1]['t2']
    for key in bounding_boxes
}

cars_duplicated = IntervalSetMapping({
    key: IntervalSet([
        Interval(Bounds3D(
            t1 = t,
            t2 = t + 30,
            x1 = interval['x1'],
            x2 = interval['x2'],
            y1 = interval['y1'],
            y2 = interval['y2']
        ))
        for t in range(0, int(video_lengths[key]), 30)
        for interval in cars_at_start
    ])
    for key in bounding_boxes
})

# Step 2: Keep only car detections with y2 < 0.8
cars = bounding_boxes.filter(lambda interval: interval['payload']['class'] == 'car'
                            ).filter(lambda interval: interval['y2'] < 0.8)

# Step 3: Subtract car detections from the duplicated reference parking spots
empty_spaces = cars_duplicated.minus(
    cars,
    predicate = and_pred(
#        Bounds3D.T(equal()),
        Bounds3D.X(overlaps()),
        Bounds3D.Y(overlaps()),
        iou_at_least(0.5)
    ),
    window = 0.0,
    progress_bar = True
)




In [58]:
# Step 3: Subtract car detections from the duplicated reference parking spots
empty_spaces = cars_duplicated.minus(
    cars,
    predicate = and_pred(
        Bounds3D.X(overlaps()),
        Bounds3D.Y(overlaps()),
        iou_at_least(0.5)
    ),
    window = 0.0,
    progress_bar = True
)



In [60]:
visualize_boxes([empty_spaces, parking_space_gt])

[Output: VGridWidget video timelines]

The bounding boxes are sampled every thirty seconds (which is why the Interval above has time bounds of 0 to 30), and so are the ground truth annotations.
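
To make the thirty-second sampling concrete, here is a small sketch of how the `(0, 30), (30, 60), ...` windows can be generated. The helper name `time_windows` is hypothetical; it mirrors the `range(0, int(video_length), 30)` loop used in this notebook.

```python
def time_windows(video_length, step=30):
    """Return (t1, t2) pairs that cover [0, video_length) in fixed steps."""
    return [(t, t + step) for t in range(0, int(video_length), step)]

# A 90-second video yields three windows.
windows = time_windows(90)
```

Note that when the length is not a multiple of the step, the last window extends past the end of the video (for example, a length of 100 produces a final window of `(90, 120)`).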

Your goal is to fill in the following function:

```Python
def detect_empty_parking_spaces(bounding_boxes, reference_video):
    """
    Function to detect empty parking spaces.
    
    bounding_boxes is a Rekall IntervalSetMapping object whose intervals are
    bounding box outputs from Mask R-CNN.
    
    reference_video is the video ID of the reference video. We guarantee that
    at time 0 in this video, all the car detections are parking spots.
    
    This function needs to return a Rekall IntervalSetMapping object whose
    Intervals are empty parking spots in the video.
    
    The output Intervals need to have time extent (0, 30), (30, 60), etc.
    Each Interval should have the spatial extent of a single empty parking
    spot.
    """
    
    # Your code here
```

This function takes in bounding box outputs from Mask R-CNN and detects empty parking spaces. We also pass in `reference_video`, the video ID of the first video; we guarantee that at time zero of this video, all the car detections will be parked cars.

This function will be evaluated against an unseen test set at the end of the user study. We'll be using the Average Precision metric.
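
The exact definition used by the study's `compute_ap` helper is not shown here. As a point of reference only, one common formulation of Average Precision ranks the predictions, then averages the precision measured at each true positive over the number of ground truth items. The function name and inputs below are assumptions made for illustration.

```python
def average_precision(ranked_hits, num_ground_truth):
    """
    ranked_hits: list of booleans, one per prediction in ranked order,
                 True when the prediction matches a ground truth item.
    num_ground_truth: total number of ground truth items.
    """
    if num_ground_truth == 0:
        return 0.0
    true_positives = 0
    precision_sum = 0.0
    for rank, is_hit in enumerate(ranked_hits, start=1):
        if is_hit:
            true_positives += 1
            # Precision over the top `rank` predictions.
            precision_sum += true_positives / rank
    return precision_sum / num_ground_truth

# Three predictions, the second one is a false positive, two ground truth items.
ap_example = average_precision([True, False, True], 2)
```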

We provide some helper functions to evaluate your programs on the dev set below!

This task is inspired by [this Medium blog post](https://medium.com/@ageitgey/snagging-parking-spaces-with-mask-r-cnn-and-python-955f2231c400):
* They use an off-the-shelf object detector to detect cars (like what you have in `bounding_boxes`)
* They take a timestamp where all the parking spots are full, and use car detections to get parking spots (like time 0 of `reference_video`)
* Then empty parking spots are just parking spots where there are no cars
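
The three steps above can be sketched in plain Python, independent of Rekall. This is a minimal illustration under assumed conventions: boxes are `(x1, y1, x2, y2)` tuples, and the names `iou` and `empty_spots` are hypothetical helpers, not part of the study code.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def empty_spots(spots, detections, threshold=0.5):
    """Keep the reference spots that no detection overlaps by at least `threshold` IoU."""
    return [
        spot for spot in spots
        if all(iou(spot, det) < threshold for det in detections)
    ]

# Reference spots taken from a frame where every spot is occupied.
spots = [(0.0, 0.0, 2.0, 2.0), (5.0, 0.0, 7.0, 2.0)]
# Detections in a later frame: only the first spot still has a car.
later_detections = [(0.1, 0.0, 2.1, 2.0)]
free = empty_spots(spots, later_detections)
```

The IoU threshold plays the same role as `iou_at_least(...)` in the Rekall version: a lower threshold treats looser overlaps as "occupied".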

In [108]:
def detect_empty_parking_spaces(bounding_boxes, reference_video):
    """
    Function to detect empty parking spaces.
    
    bounding_boxes is a Rekall IntervalSetMapping object whose intervals are
    bounding box outputs from Mask R-CNN.
    
    reference_video is the video ID of the reference video. We guarantee that
    at time 0 in this video, all the car detections are parking spots.
    
    This function needs to return a Rekall IntervalSetMapping object whose
    Intervals are empty parking spots in the video.
    
    The output Intervals need to have time extent (0, 30), (30, 60), etc.
    Each Interval should have the spatial extent of a single empty parking
    spot.
    """
    
    # Step 1: Get all parking spaces from the reference video at time 0
    cars_at_start = bounding_boxes[reference_video].filter(
        lambda interval: interval['t1'] == 0 and interval['payload']['class'] == 'car'
    ).get_intervals()

    video_lengths = {
        key: bounding_boxes[key].get_intervals()[-1]['t2']
        for key in bounding_boxes
    }

    cars_duplicated = IntervalSetMapping({
        key: IntervalSet([
            Interval(Bounds3D(
                t1 = t,
                t2 = t + 30,
                x1 = interval['x1'],
                x2 = interval['x2'],
                y1 = interval['y1'],
                y2 = interval['y2']
            ))
            for t in range(0, int(video_lengths[key]), 30)
            for interval in cars_at_start
        ])
        for key in bounding_boxes
    })

    # Step 2: Keep only car detections with y2 < 0.9
    cars = bounding_boxes.filter(lambda interval: interval['payload']['class'] == 'car'
                                ).filter(lambda interval: interval['y2'] < 0.9)

    # Step 3: Subtract car detections from the duplicated reference parking spots
    empty_spaces = cars_duplicated.minus(
        cars,
        predicate = and_pred(
            Bounds3D.T(equal()),
            Bounds3D.X(overlaps()),
            Bounds3D.Y(overlaps()),
            iou_at_least(0.4)
        ),
        window = 0.0,
        progress_bar = True
    )

    # What remains of the reference spots is the predicted set of empty spaces
    return empty_spaces

## Helper functions

`evaluate_on_dev` runs `detect_empty_parking_spaces` on `bounding_boxes`, and visualizes the results along with the ground truth in the dev set. Then it computes the AP score on the dev set.

Of course, you can also run these functions yourself!

`compute_ap` takes the predicted parking spaces and the ground truth set, and computes the AP score. As we saw before, `visualize_boxes` takes a list of `IntervalSetMapping` objects and visualizes them.

In [109]:
def evaluate_on_dev():
    empty_parking_spaces = detect_empty_parking_spaces(bounding_boxes, 0)
    
    # Visualize the predictions: the first row will be your predictions, the
    # second row will be the ground truth
    widget = visualize_boxes([empty_parking_spaces, parking_space_gt])
    display(widget)
    
    # Compute average precision on the dev set
    ap = compute_ap(empty_parking_spaces, parking_space_gt)
    
    print('Average precision: ', ap)

In [110]:
evaluate_on_dev()


[Output: VGridWidget video timelines]

Average precision:  0.7075471698113207


In [111]:
evaluate_on_test(detect_empty_parking_spaces)



Average precision:  0.6591928251121076


# Post-hoc Analysis

In [119]:
def detect_empty_parking_spaces_modified(bounding_boxes, reference_video):
    """
    Function to detect empty parking spaces.
    
    bounding_boxes is a Rekall IntervalSetMapping object whose intervals are
    bounding box outputs from Mask R-CNN.
    
    reference_video is the video ID of the reference video. We guarantee that
    at time 0 in this video, all the car detections are parking spots.
    
    This function needs to return a Rekall IntervalSetMapping object whose
    Intervals are empty parking spots in the video.
    
    The output Intervals need to have time extent (0, 30), (30, 60), etc.
    Each Interval should have the spatial extent of a single empty parking
    spot.
    """
    
    # Step 1: Get all parking spaces from the reference video at time 0
    cars_at_start = bounding_boxes[reference_video].filter(
        lambda interval: interval['t1'] == 0 and interval['payload']['class'] == 'car'
    ).get_intervals()

    video_lengths = {
        key: bounding_boxes[key].get_intervals()[-1]['t2']
        for key in bounding_boxes
    }

    cars_duplicated = IntervalSetMapping({
        key: IntervalSet([
            Interval(Bounds3D(
                t1 = t,
                t2 = t + 30,
                x1 = interval['x1'],
                x2 = interval['x2'],
                y1 = interval['y1'],
                y2 = interval['y2']
            ))
            for t in range(0, int(video_lengths[key]), 30)
            for interval in cars_at_start
        ])
        for key in bounding_boxes
    })

    # Step 2 (modified): keep detections of every class, not just cars,
    # with y2 < 0.9
    cars = bounding_boxes.filter(lambda interval: interval['y2'] < 0.9)

    # Step 3: Subtract the detections from the duplicated reference parking spots
    empty_spaces = cars_duplicated.minus(
        cars,
        predicate = and_pred(
            Bounds3D.T(equal()),
            Bounds3D.X(overlaps()),
            Bounds3D.Y(overlaps()),
            iou_at_least(0.4)
        ),
        window = 0.0,
        progress_bar = True
    )

    # What remains of the reference spots is the predicted set of empty spaces
    return empty_spaces

In [120]:
def evaluate_on_dev_modified():
    empty_parking_spaces = detect_empty_parking_spaces_modified(bounding_boxes, 0)
    
    # Visualize the predictions: the first row will be your predictions, the
    # second row will be the ground truth
    widget = visualize_boxes([empty_parking_spaces, parking_space_gt])
    display(widget)
    
    # Compute average precision on the dev set
    ap = compute_ap(empty_parking_spaces, parking_space_gt)
    
    print('Average precision: ', ap)

In [121]:
evaluate_on_dev_modified()


[Output: VGridWidget video timelines]

Average precision:  0.9259259259259259


In [122]:
evaluate_on_test(detect_empty_parking_spaces_modified)



Average precision:  0.9423076923076923
