In [1]:
%load_ext autoreload
%autoreload 2
from user_study_helpers import *

# Load Dev Set and primitive data

Expand a video by clicking on it; then use `;` to play the video.

In [2]:
bounding_boxes = get_bboxes('dev')
parking_space_gt = get_gt()
visualize_boxes([bounding_boxes, parking_space_gt])
# red - all Mask R-CNN detections
# green - ground truth empty parking spaces

100%|██████████| 2/2 [00:00<00:00, 68.94it/s]
100%|██████████| 2/2 [00:00<00:00, 422.17it/s]


[VGridWidget output: interactive video timelines; first row shows all Mask R-CNN detections, second row shows ground truth empty parking spaces]

# Task: detect all empty parking spaces

Your goal is to write a Rekall program to detect all empty parking spaces (visualized in the second timeline above).

You're given a Rekall `IntervalSetMapping` object, `bounding_boxes`, that contains detections from Mask R-CNN. The Intervals contain 3D bounds, and the payloads contain the class and the class score:

In [52]:
#bounding_boxes[0].get_intervals()[0]
starting_cars = bounding_boxes[0].filter(
        lambda interval: interval['payload']['class'] == 'car' and interval['t1'] == 0
    ).get_intervals()
starting_cars

[<Interval t1:0.0 t2:30.0 x1:0.0 x2:0.08424050211906434 y1:0.5207680172390408 y2:0.6528446621365017 payload:{'class': 'car', 'score': 0.9638893008232117}>,
 <Interval t1:0.0 t2:30.0 x1:0.0770739495754242 x2:0.16108734607696534 y1:0.5190534379747179 y2:0.6503598531087239 payload:{'class': 'car', 'score': 0.9346588253974915}>,
 <Interval t1:0.0 t2:30.0 x1:0.16353940963745117 x2:0.2407153844833374 y1:0.5117196400960287 y2:0.636061520046658 payload:{'class': 'car', 'score': 0.9411571621894836}>,
 <Interval t1:0.0 t2:30.0 x1:0.25799615383148194 x2:0.31947376728057864 y1:0.5125064849853516 y2:0.6232725355360244 payload:{'class': 'car', 'score': 0.8447434306144714}>,
 <Interval t1:0.0 t2:30.0 x1:0.340884804725647 x2:0.40604052543640134 y1:0.4878042432996962 y2:0.6180920071072049 payload:{'class': 'car', 'score': 0.8292120695114136}>,
 <Interval t1:0.0 t2:30.0 x1:0.41316685676574705 x2:0.4804346561431885 y1:0.4915266672770182 y2:0.6030947791205512 payload:{'class': 'car', 'score': 0.9915500879

The bounding boxes are sampled every thirty seconds, which is why the Intervals above have time bounds of 0 to 30. The ground truth annotations are sampled at the same rate.
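Because the data is sampled every thirty seconds, output Intervals sit on a fixed grid of time windows. The following is a minimal plain-Python sketch of that grid, not Rekall code; `time_windows` is a hypothetical helper name, and the 90-second video length is made up for illustration.

```python
def time_windows(video_length, step=30):
    """Return (t1, t2) windows of `step` seconds starting at 0.

    A window is emitted for every start time below video_length,
    so the last window may extend past the end of the video.
    """
    return [(t, t + step) for t in range(0, int(video_length), step)]


# Hypothetical 90-second video
print(time_windows(90))  # [(0, 30), (30, 60), (60, 90)]
```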

Your goal is to fill in the following function:

```Python
def detect_empty_parking_spaces(bounding_boxes, reference_video):
    """
    Function to detect empty parking spaces.
    
    bounding_boxes is a Rekall IntervalSetMapping object whose intervals are
    bounding box outputs from Mask R-CNN.
    
    reference_video is the video ID of the first video. We guarantee that at
    time 0 in this video, all the car detections are parking spots.
    
    This function needs to return a Rekall IntervalSetMapping object whose
    Intervals are empty parking spots in the video.
    
    The output Intervals need to have time extent (0, 30), (30, 60), etc.
    Each Interval should have the spatial extent of a single empty parking
    spot.
    """
    
    # Your code here
```

This function takes in bounding box outputs from Mask R-CNN and detects empty parking spaces. We also pass in the video ID of the first video as `reference_video`; we guarantee that at time zero of this video, all the car detections are parked cars.

This function will be evaluated against an unseen test set at the end of the user study. We'll be using the Average Precision metric.
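For intuition about the metric, here is a minimal sketch of one common way to compute Average Precision over a ranked list of predictions. It is an assumption about the general form of the metric, not the implementation of the study's `compute_ap` helper, which may match and rank predictions differently. The name `average_precision` and its arguments are hypothetical.

```python
def average_precision(ranked_hits, num_ground_truth):
    """Average Precision for predictions already sorted by confidence.

    ranked_hits: list of bools, True where a prediction matches a
        ground-truth item (how matches are decided is up to the caller).
    num_ground_truth: total number of ground-truth items.
    """
    if num_ground_truth == 0:
        return 0.0
    true_positives = 0
    precision_sum = 0.0
    for rank, is_hit in enumerate(ranked_hits, start=1):
        if is_hit:
            true_positives += 1
            # Precision measured at the rank of each correct prediction
            precision_sum += true_positives / rank
    return precision_sum / num_ground_truth


# Three predictions, the second one wrong, two ground-truth spots
print(round(average_precision([True, False, True], 2), 3))  # 0.833
```

Under this definition, both false positives (which lower precision) and missed ground-truth spots (which increase the denominator) reduce the score.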

We provide some helper functions to evaluate your programs on the dev set below!

This task is inspired by [this Medium blog post](https://medium.com/@ageitgey/snagging-parking-spaces-with-mask-r-cnn-and-python-955f2231c400):
* They use an off-the-shelf object detector to detect cars (like what you have in `bounding_boxes`)
* They take a timestamp where all the parking spots are full, and use car detections to get parking spots (like time 0 of `reference_video`)
* Then empty parking spots are just parking spots where there are no cars
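The three steps above can be sketched in plain Python for a single time window. This is an illustration under stated assumptions, not the blog post's code or a Rekall program: `Box`, `iou`, and `empty_spots` are hypothetical names, the boxes are made-up values, and the 0.3 IoU threshold is an arbitrary choice for the example.

```python
from collections import namedtuple

# Axis-aligned box in normalized image coordinates (hypothetical type)
Box = namedtuple("Box", ["x1", "x2", "y1", "y2"])


def iou(a, b):
    """Intersection over union of two boxes."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def empty_spots(parking_spots, car_boxes, iou_threshold=0.3):
    """Return the spots that no car box covers with IoU >= iou_threshold."""
    return [
        spot
        for spot in parking_spots
        if not any(iou(spot, car) >= iou_threshold for car in car_boxes)
    ]


# Parking spots taken from car detections at the reference time
spots = [Box(0.0, 0.1, 0.5, 0.6), Box(0.2, 0.3, 0.5, 0.6)]
# Car detections at a later time: only the first spot is still occupied
cars_now = [Box(0.0, 0.1, 0.5, 0.6)]

print(empty_spots(spots, cars_now))  # [Box(x1=0.2, x2=0.3, y1=0.5, y2=0.6)]
```

A Rekall version would apply the same idea to every time window and every video at once.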

In [106]:
def detect_empty_parking_spaces(bounding_boxes, reference_video):
    """
    Function to detect empty parking spaces.
    
    bounding_boxes is a Rekall IntervalSetMapping object whose intervals are
    bounding box outputs from Mask R-CNN.
    
    reference_video is the video ID of the first video. We guarantee that at
    time 0 in this video, all the car detections are parking spots.
    
    This function needs to return a Rekall IntervalSetMapping object whose
    Intervals are empty parking spots in the video.
    
    The output Intervals need to have time extent (0, 30), (30, 60), etc.
    Each Interval should have the spatial extent of a single empty parking
    spot.
    """
    
    # Your code here
    
    cars = bounding_boxes.filter(
        lambda interval: interval['payload']['class'] == 'car'
    )
    # Car detections at time 0 of the reference video define the parking spots
    starting_cars = bounding_boxes[reference_video].filter(
        lambda interval: interval['payload']['class'] == 'car' and interval['t1'] == 0
    ).get_intervals()
    
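    # End time of the last detection in each video
    video_lengths = {
        key: bounding_boxes[key].get_intervals()[-1]['t2']
        for key in bounding_boxes
    }
    
    # Sampling period of the detections, in seconds
    t_incr = 30
    
    # Repeat every parking spot in each 30-second window of each video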
    
    starting_cars_duplicated = IntervalSetMapping({
        key: IntervalSet([
            Interval(Bounds3D(
                t1 = t,
                t2 = t + t_incr,
                x1 = interval['x1'],
                x2 = interval['x2'],
                y1 = interval['y1'],
                y2 = interval['y2']
            ))
            
            for t in range(0, int(video_lengths[key]), t_incr)
            for interval in starting_cars
        ])
        
        for key in bounding_boxes
    })
    
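    # Widen each detection in time, merge boxes that overlap in time and have
    # IoU >= 0.5, then shrink the merged intervals back
    cars_dilated = cars.dilate(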
        50
    ).map(lambda interval: Interval(interval['bounds'], payload=[interval])
    ).coalesce(
        ('t1','t2'),
        bounds_merge_op = lambda bounds1, bounds2: bounds1.span(bounds2),
        payload_merge_op = lambda p1, p2: p1 + p2,
        predicate = iou_at_least(0.5)
    ).dilate(-50)
    
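    # Subtract from each parking spot the times at which a detection overlaps
    # it in time and space with IoU >= 0.3
    not_there = starting_cars_duplicated.minus(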
            cars_dilated,
            predicate = and_pred(
                Bounds3D.T(overlaps()),
                Bounds3D.X(overlaps()),
                Bounds3D.Y(overlaps()),
                iou_at_least(0.3)
            ),
            window = 0.0,
            progress_bar = True
        )
    
    # Drop spots narrower than 0.05 in normalized x coordinates
    not_there_min_size = not_there.filter(lambda interval: interval['x2'] - interval['x1'] > 0.05)
    
    # Note: this coalesced version is computed but not returned below
    not_there_dilated = not_there_min_size.dilate(
        50
    ).map(lambda interval: Interval(interval['bounds'], payload=[interval])
    ).coalesce(
        ('t1','t2'),
        bounds_merge_op = lambda bounds1, bounds2: bounds1.span(bounds2),
        payload_merge_op = lambda p1, p2: p1 + p2,
        predicate = iou_at_least(0.5)
    ).dilate(-50)
        
    return not_there_min_size

## Helper functions

`evaluate_on_dev` runs `detect_empty_parking_spaces` on `bounding_boxes`, and visualizes the results along with the ground truth in the dev set. Then it computes the AP score on the dev set.

Of course, you can also run these functions yourself!

`compute_ap` takes the predicted parking spaces and the ground truth set, and computes the AP score. As we saw before, `visualize_boxes` takes a list of `IntervalSetMapping` objects and visualizes them.

In [34]:
def evaluate_on_dev():
    empty_parking_spaces = detect_empty_parking_spaces(bounding_boxes, 0)
    
    # Visualize the predictions: the first row will be your predictions, the
    # second row will be the ground truth
    widget = visualize_boxes([empty_parking_spaces, parking_space_gt])
    display(widget)
    
    # Compute average precision on the dev set
    ap = compute_ap(empty_parking_spaces, parking_space_gt)
    
    print('Average precision: ', ap)

In [107]:
evaluate_on_dev()

100%|██████████| 2/2 [00:00<00:00,  2.44it/s]


[VGridWidget output: interactive video timelines; first row shows predictions, second row shows ground truth]

100%|██████████| 2/2 [00:00<00:00, 38.38it/s]
100%|██████████| 1/1 [00:00<00:00,  1.68it/s]
100%|██████████| 2/2 [00:00<00:00,  3.79it/s]

Average precision:  0.9





In [108]:
evaluate_on_test(detect_empty_parking_spaces)

100%|██████████| 2/2 [00:00<00:00, 718.51it/s]
100%|██████████| 2/2 [00:00<00:00, 71.94it/s]
100%|██████████| 2/2 [00:01<00:00,  2.17it/s]
100%|██████████| 2/2 [00:00<00:00, 81.16it/s]
100%|██████████| 2/2 [00:00<00:00,  7.04it/s]
100%|██████████| 2/2 [00:00<00:00,  8.39it/s]

Average precision:  0.7819148936170213





# Post-hoc analysis

In [116]:
def detect_empty_parking_spaces_modified(bounding_boxes, reference_video):
    """
    Function to detect empty parking spaces.
    
    bounding_boxes is a Rekall IntervalSetMapping object whose intervals are
    bounding box outputs from Mask R-CNN.
    
    reference_video is the video ID of the first video. We guarantee that at
    time 0 in this video, all the car detections are parking spots.
    
    This function needs to return a Rekall IntervalSetMapping object whose
    Intervals are empty parking spots in the video.
    
    The output Intervals need to have time extent (0, 30), (30, 60), etc.
    Each Interval should have the spatial extent of a single empty parking
    spot.
    """
    
    # Your code here
    
    # Modification: keep detections of every class, not only 'car'
    cars = bounding_boxes.filter(
        lambda interval: True
    )
    # Car detections at time 0 of the reference video define the parking spots
    starting_cars = bounding_boxes[reference_video].filter(
        lambda interval: interval['payload']['class'] == 'car' and interval['t1'] == 0
    ).get_intervals()
    
    video_lengths = {
        key: bounding_boxes[key].get_intervals()[-1]['t2']
        for key in bounding_boxes
    }
    
    t_incr = 30
    
    starting_cars_duplicated = IntervalSetMapping({
        key: IntervalSet([
            Interval(Bounds3D(
                t1 = t,
                t2 = t + t_incr,
                x1 = interval['x1'],
                x2 = interval['x2'],
                y1 = interval['y1'],
                y2 = interval['y2']
            ))
            
            for t in range(0, int(video_lengths[key]), t_incr)
            for interval in starting_cars
        ])
        
        for key in bounding_boxes
    })
    
    cars_dilated = cars.dilate(
        50
    ).map(lambda interval: Interval(interval['bounds'], payload=[interval])
    ).coalesce(
        ('t1','t2'),
        bounds_merge_op = lambda bounds1, bounds2: bounds1.span(bounds2),
        payload_merge_op = lambda p1, p2: p1 + p2,
        predicate = iou_at_least(0.5)
    ).dilate(-50)
    
    not_there = starting_cars_duplicated.minus(
            cars_dilated,
            predicate = and_pred(
                Bounds3D.T(overlaps()),
                Bounds3D.X(overlaps()),
                Bounds3D.Y(overlaps()),
                iou_at_least(0.3)
            ),
            window = 0.0,
            progress_bar = True
        )
    
    not_there_min_size = not_there.filter(lambda interval: interval['x2'] - interval['x1'] > 0.05)
    
    # Note: this coalesced version is computed but not returned below
    not_there_dilated = not_there_min_size.dilate(
        50
    ).map(lambda interval: Interval(interval['bounds'], payload=[interval])
    ).coalesce(
        ('t1','t2'),
        bounds_merge_op = lambda bounds1, bounds2: bounds1.span(bounds2),
        payload_merge_op = lambda p1, p2: p1 + p2,
        predicate = iou_at_least(0.5)
    ).dilate(-50)
        
    return not_there_min_size

In [117]:
def evaluate_on_dev_modified():
    empty_parking_spaces = detect_empty_parking_spaces_modified(bounding_boxes, 0)
    
    # Visualize the predictions: the first row will be your predictions, the
    # second row will be the ground truth
    widget = visualize_boxes([empty_parking_spaces, parking_space_gt])
    display(widget)
    
    # Compute average precision on the dev set
    ap = compute_ap(empty_parking_spaces, parking_space_gt)
    
    print('Average precision: ', ap)

In [118]:
evaluate_on_dev_modified()

100%|██████████| 2/2 [00:01<00:00,  1.99it/s]


[VGridWidget output: interactive video timelines; first row shows predictions, second row shows ground truth]

100%|██████████| 2/2 [00:00<00:00, 40.34it/s]
100%|██████████| 1/1 [00:00<00:00,  1.97it/s]
100%|██████████| 2/2 [00:00<00:00,  3.21it/s]

Average precision:  0.995575221238938





In [119]:
evaluate_on_test(detect_empty_parking_spaces_modified)

100%|██████████| 2/2 [00:00<00:00, 699.05it/s]
100%|██████████| 2/2 [00:00<00:00, 68.94it/s]
100%|██████████| 2/2 [00:01<00:00,  1.49it/s]
100%|██████████| 2/2 [00:00<00:00, 85.02it/s]
100%|██████████| 2/2 [00:00<00:00,  8.93it/s]
100%|██████████| 2/2 [00:00<00:00,  9.14it/s]

Average precision:  0.9865771812080537



