In [1]:
%load_ext autoreload
%autoreload 2
from user_study_helpers import *

# Load Dev Set and primitive data

Expand a video by clicking on it; then use `;` to play the video.

In [2]:
bounding_boxes = get_bboxes('dev')
parking_space_gt = get_gt()
visualize_boxes([bounding_boxes, parking_space_gt])

[VGridWidget output: interactive timelines of `bounding_boxes` (first) and `parking_space_gt` (second)]

# Task: detect all empty parking spaces

Your goal is to write a Rekall program to detect all empty parking spaces (visualized in the second timeline above).

You're given a Rekall `IntervalSetMapping` object, `bounding_boxes`, that contains detections from Mask R-CNN. Each Interval has 3D bounds (time, x, and y), and its payload holds the detected class and the class score:

In [3]:
bounding_boxes[0].get_intervals()[0]

<Interval t1:0.0 t2:30.0 x1:0.0 x2:0.08424050211906434 y1:0.5207680172390408 y2:0.6528446621365017 payload:{'class': 'car', 'score': 0.9638893008232117}>

Both the bounding boxes and the ground truth annotations are sampled every thirty seconds, which is why the Interval above has time bounds of 0 to 30.
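To make the sampling concrete, here is a minimal sketch in plain Python (no Rekall required) of the 30-second time windows that the detections and the ground truth share. The helper name `window_bounds` is hypothetical and is not part of the study helpers.

```python
def window_bounds(video_length, step=30):
    """Return (t1, t2) pairs that tile [0, video_length) in `step`-second windows.

    Hypothetical helper for illustration; not part of user_study_helpers.
    """
    return [(t, t + step) for t in range(0, int(video_length), step)]


# A 90-second video is covered by three windows.
print(window_bounds(90))  # [(0, 30), (30, 60), (60, 90)]
```

Output Intervals for the task are expected to line up with windows of this shape.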

Your goal is to fill in the following function:

```Python
def detect_empty_parking_spaces(bounding_boxes, reference_video):
    """
    Function to detect empty parking spaces.
    
    bounding_boxes is a Rekall IntervalSetMapping object whose intervals are
    bounding box outputs from Mask R-CNN.
    
    reference_video is the video ID of the first video. We guarantee that at
    time 0 in this video, all the car detections are parking spots.
    
    This function needs to return a Rekall IntervalSetMapping object whose
    Intervals are empty parking spots in the video.
    
    The output Intervals need to have time extent (0, 30), (30, 60), etc.
    Each Interval should have the spatial extent of a single empty parking
    spot.
    """
    
    # Your code here
```

This function takes in bounding box outputs from Mask R-CNN and detects empty parking spaces. We also pass in the video ID of the first video as `reference_video`; we guarantee that at time zero of this video, all the car detections will be parked cars.

This function will be evaluated against an unseen test set at the end of the user study. We'll be using the Average Precision metric.
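The exact scoring code is not shown in this notebook. As a rough illustration only, one common way to compute average precision over a ranked list of predictions is sketched below. The function name `average_precision` and its inputs (hit/miss flags in ranked order, plus the number of ground-truth items) are assumptions, not the study's implementation.

```python
def average_precision(ranked_hits, num_ground_truth):
    """Average of precision@k over the ranks k where a prediction is a hit.

    ranked_hits: list of booleans, best-ranked prediction first (assumed input).
    num_ground_truth: total number of ground-truth items (assumed input).
    """
    if num_ground_truth == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_hit in enumerate(ranked_hits, start=1):
        if is_hit:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / num_ground_truth


# Hits at ranks 1 and 3, two ground-truth items: (1/1 + 2/3) / 2
print(average_precision([True, False, True], 2))
```

Under this definition, false positives ranked early lower the score more than false positives ranked late.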

We provide some helper functions to evaluate your programs on the dev set below!

This task is inspired by [this Medium blog post](https://medium.com/@ageitgey/snagging-parking-spaces-with-mask-r-cnn-and-python-955f2231c400):
* They use an off-the-shelf object detector to detect cars (like what you have in `bounding_boxes`)
* They take a timestamp where all the parking spots are full, and use car detections to get parking spots (like time 0 of `reference_video`)
* Then empty parking spots are just parking spots where there are no cars
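The three steps above can be sketched in plain Python, without Rekall, for a single time window. This is a simplified illustration under stated assumptions: boxes are `(x1, y1, x2, y2)` tuples, the helper names `iou` and `empty_spots` are hypothetical, and the 0.1 overlap threshold is an illustrative choice.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def empty_spots(parking_spots, car_boxes, min_iou=0.1):
    """Keep the parking spots that no car box overlaps by at least min_iou."""
    return [
        spot for spot in parking_spots
        if all(iou(spot, car) < min_iou for car in car_boxes)
    ]


# Parking spots taken from the reference frame (hypothetical coordinates).
spots = [(0.0, 0.0, 0.2, 0.2), (0.5, 0.5, 0.7, 0.7)]
# Car detections in a later window: only the first spot is occupied.
cars = [(0.01, 0.0, 0.21, 0.2)]
print(empty_spots(spots, cars))  # [(0.5, 0.5, 0.7, 0.7)]
```

A Rekall program does the same thing across all time windows at once by subtracting car Intervals from parking-spot Intervals.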

In [93]:
def detect_empty_parking_spaces(bounding_boxes, reference_video):
    """
    Function to detect empty parking spaces.
    
    bounding_boxes is a Rekall IntervalSetMapping object whose intervals are
    bounding box outputs from Mask R-CNN.
    
    reference_video is the video ID of the first video. We guarantee that at
    time 0 in this video, all the car detections are parking spots.
    
    This function needs to return a Rekall IntervalSetMapping object whose
    Intervals are empty parking spots in the video.
    
    The output Intervals need to have time extent (0, 30), (30, 60), etc.
    Each Interval should have the spatial extent of a single empty parking
    spot.
    """
    
    
    # get video lengths
    video_lengths = {
        key: bounding_boxes[key].get_intervals()[-1]['t2']
        for key in bounding_boxes
    }
    
    # From time 0 of the reference video, get all bounding boxes for cars. 
    reference_start_parked_cars = bounding_boxes[reference_video].filter(
        lambda interval: interval['payload']['class'] == 'car' and interval['t1']==0
    ).get_intervals()
    
    # Copy these to extend throughout the entire video, with the same X/Y extents. These are all the parking spots. 
    parking_spots = IntervalSetMapping({
        key: IntervalSet([
            Interval(Bounds3D(
                t1 = t,
                t2 = t + 1,
                x1 = interval['x1'],
                x2 = interval['x2'],
                y1 = interval['y1'],
                y2 = interval['y2']
            ))
            for t in range(0, int(video_lengths[key]), 10)
            for interval in reference_start_parked_cars
        ])
        for key in bounding_boxes
    })
    
    # get bounding boxes for cars
    parked_cars = bounding_boxes.filter(
        lambda interval: interval['payload']['class'] == 'car')
    

    # Use the minus function to catch the set of parking bounding boxes which don't overlap with cars!
    
    empty_spots = parking_spots.minus(
        parked_cars,
        predicate = and_pred(
            Bounds3D.T(overlaps()),
            Bounds3D.X(overlaps()),
            Bounds3D.Y(overlaps()),
            iou_at_least(0.10)
        ),
        window = 0.0,
        progress_bar = True
    )
    
    empty_spots_coalesced = empty_spots.dilate(
        15
    ).map(
        lambda interval: Interval(interval['bounds'], payload=[interval])
    ).coalesce(
        ('t1', 't2'),
        bounds_merge_op = lambda bounds1, bounds2: bounds1.span(bounds2),
        payload_merge_op = lambda p1, p2: p1 + p2,
        predicate = iou_at_least(0.5)
    ).dilate(-15)

    
    empty_spots_30_seconds = empty_spots_coalesced.filter_size(
        min_size=30
    ).split(
        lambda interval: IntervalSet(interval['payload']).dilate(-15)
    )

    
    return empty_spots_30_seconds
        
    # BAD example - return all the car detections (this is obviously incorrect...)
    #return bounding_boxes.filter(lambda interval: interval['payload']['class'] == 'car')

In [94]:
visualize_boxes([detect_empty_parking_spaces(bounding_boxes, 0)])


[VGridWidget output: interactive timeline of the detected empty parking spaces]

## Helper functions

`evaluate_on_dev` runs `detect_empty_parking_spaces` on `bounding_boxes`, and visualizes the results along with the ground truth in the dev set. Then it computes the AP score on the dev set.

Of course, you can also run these functions yourself!

`compute_ap` takes the predicted parking spaces and the ground truth set, and computes the AP score. As we saw before, `visualize_boxes` takes a list of `IntervalSetMapping` objects and visualizes them.

In [68]:
def evaluate_on_dev():
    empty_parking_spaces = detect_empty_parking_spaces(bounding_boxes, 0)
    
    # Visualize the predictions: the first row will be your predictions, and the
    # second row will be the ground truth
    widget = visualize_boxes([empty_parking_spaces, parking_space_gt])
    display(widget)
    
    # Compute average precision on the dev set
    ap = compute_ap(empty_parking_spaces, parking_space_gt)
    
    print('Average precision: ', ap)

In [95]:
evaluate_on_dev()


[VGridWidget output: predictions (first row) and ground truth (second row)]

Average precision:  0.2650618046983637


In [96]:
evaluate_on_test(detect_empty_parking_spaces)



Average precision:  0.2512820512820513


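# Post-hoc analysis

The modified function below differs from the submission above in two ways. First, the parking-spot Intervals span 30 seconds and are generated every 30 seconds, rather than 1-second Intervals every 10 seconds. Second, all detections, not only those classed as `car`, are subtracted from the parking spots.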

In [110]:
def detect_empty_parking_spaces_modified(bounding_boxes, reference_video):
    """
    Function to detect empty parking spaces.
    
    bounding_boxes is a Rekall IntervalSetMapping object whose intervals are
    bounding box outputs from Mask R-CNN.
    
    reference_video is the video ID of the first video. We guarantee that at
    time 0 in this video, all the car detections are parking spots.
    
    This function needs to return a Rekall IntervalSetMapping object whose
    Intervals are empty parking spots in the video.
    
    The output Intervals need to have time extent (0, 30), (30, 60), etc.
    Each Interval should have the spatial extent of a single empty parking
    spot.
    """
    
    
    # get video lengths
    video_lengths = {
        key: bounding_boxes[key].get_intervals()[-1]['t2']
        for key in bounding_boxes
    }
    
    # From time 0 of the reference video, get all bounding boxes for cars. 
    reference_start_parked_cars = bounding_boxes[reference_video].filter(
        lambda interval: interval['payload']['class'] == 'car' and interval['t1']==0
    ).get_intervals()
    
    # Copy these to extend throughout the entire video, with the same X/Y extents. These are all the parking spots. 
    parking_spots = IntervalSetMapping({
        key: IntervalSet([
            Interval(Bounds3D(
                t1 = t,
                t2 = t + 30,
                x1 = interval['x1'],
                x2 = interval['x2'],
                y1 = interval['y1'],
                y2 = interval['y2']
            ))
            for t in range(0, int(video_lengths[key]), 30)
            for interval in reference_start_parked_cars
        ])
        for key in bounding_boxes
    })
    
    # use every detection, not only cars (the filter below keeps all intervals)
    parked_cars = bounding_boxes.filter(
        lambda interval: True)
    

    # Use the minus function to catch the set of parking bounding boxes which don't overlap with cars!
    
    empty_spots = parking_spots.minus(
        parked_cars,
        predicate = and_pred(
            Bounds3D.T(overlaps()),
            Bounds3D.X(overlaps()),
            Bounds3D.Y(overlaps()),
            iou_at_least(0.10)
        ),
        window = 0.0,
        progress_bar = True
    )
    
    empty_spots_coalesced = empty_spots.dilate(
        15
    ).map(
        lambda interval: Interval(interval['bounds'], payload=[interval])
    ).coalesce(
        ('t1', 't2'),
        bounds_merge_op = lambda bounds1, bounds2: bounds1.span(bounds2),
        payload_merge_op = lambda p1, p2: p1 + p2,
        predicate = iou_at_least(0.5)
    ).dilate(-15)

    
    empty_spots_30_seconds = empty_spots_coalesced.filter_size(
        min_size=30
    ).split(
        lambda interval: IntervalSet(interval['payload']).dilate(-15)
    )

    
    return empty_spots_30_seconds
        
    # BAD example - return all the car detections (this is obviously incorrect...)
    #return bounding_boxes.filter(lambda interval: interval['payload']['class'] == 'car')

In [111]:
def evaluate_on_dev_modified():
    empty_parking_spaces = detect_empty_parking_spaces_modified(bounding_boxes, 0)
    
    # Visualize the predictions: the first row will be your predictions, and the
    # second row will be the ground truth
    widget = visualize_boxes([empty_parking_spaces, parking_space_gt])
    display(widget)
    
    # Compute average precision on the dev set
    ap = compute_ap(empty_parking_spaces, parking_space_gt)
    
    print('Average precision: ', ap)

In [112]:
evaluate_on_dev_modified()


[VGridWidget output: predictions (first row) and ground truth (second row)]

Average precision:  0.9120812507408258


In [113]:
evaluate_on_test(detect_empty_parking_spaces_modified)



Average precision:  0.9545454545454546
