<img align="center" src="img/course.png" width="800">

# 16720 (B)  Object Tracking in Videos - Assignment 6 - Q5
    Instructor: Kris                          TAs: Arka, Rohan, Rawal, Sheng-Yu, Jinkun

In [6]:
# Libraries

import numpy as np
from scipy.interpolate import RectBivariateSpline
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.patches as patches
%matplotlib inline

from scipy.optimize import linear_sum_assignment

## Q5: Multi-Object Tracking by Detection (EC, 45 PT)

### Overview

In this extra credit problem, you will be introduced to a more modern perspective on tracking. In the previous problems, you implemented single-object tracking with a classical method, the LK tracker. Multi-object tracking (MOT), on the other hand, is a richer and more useful problem to attack. One approach to this problem is called tracking by detection. In this paradigm, for each frame of a video, we produce object detections (typically from a learned object detection neural net). These are called proposals, and are often bounding boxes. In the next step, we associate those boxes with any existing tracked objects. For a more in-depth overview, please see the following [link](https://arshren.medium.com/an-introduction-to-object-tracking-9fd6249a76b6).

### Q5.1 Loading the bounding boxes, video and visualization (5 pts)

In the spirit of the World Cup, we will evaluate your method on a short excerpt from a [famous soccer clip](https://youtu.be/uBa8dYlqv8Y). We'll begin by loading and visualizing the input. We have already computed bounding boxes for you, which are available in ```soccer_boxes.json```. The images for the video are available in ```soccer_images.npy```. Both files can be downloaded at this [link](https://www.dropbox.com/sh/uovrr3cgtehhtuc/AAATS-GtGEwfS-z2MXRNku7Ea?dl=0).

Fill in the functions below. For testing, your visualization of the 124th frame should look something like this:

<img align="center" src="img/sample_bbox_img.jpg" width="800">

Please submit an image of the bounding boxes rendered on the 1st frame of the sequence along with your code for this question to the writeup PDF.
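The tracker code later in this notebook reads each frame's detections as `boxes[i]["boxes"]`, which suggests the JSON file is a list with one object per frame. The sketch below assumes that layout (the toy file and its values are made up for illustration) and turns each frame's entry into an `Nx4` array of `[x1, y1, x2, y2]` corners. Using `reshape(-1, 4)` keeps a frame with no detections as a `(0, 4)` array instead of a 1-D empty array.

```python
import json
import os
import tempfile

import numpy as np

# Hypothetical toy data in the assumed layout: one dict per frame, each with a "boxes" list
toy_boxes = [
    {"boxes": [[10.0, 20.0, 50.0, 80.0], [100.0, 40.0, 130.0, 95.0]]},
    {"boxes": [[12.0, 21.0, 52.0, 82.0]]},
]

# Round-trip the toy data through a temporary JSON file, as load_images_and_boxes would read it
with tempfile.TemporaryDirectory() as tmp_dir:
    box_path = os.path.join(tmp_dir, "toy_boxes.json")
    with open(box_path, "w") as f:
        json.dump(toy_boxes, f)
    with open(box_path, "r") as f:
        loaded = json.load(f)

# Convert each frame's list of boxes into an (N, 4) float array
frame_arrays = [np.asarray(frame["boxes"], dtype=float).reshape(-1, 4) for frame in loaded]
for i, arr in enumerate(frame_arrays):
    print(f"frame {i}: array shape {arr.shape}")
```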

In [7]:
import cv2

def load_images_and_boxes(img_path, box_path):
    """
    Given paths to the images (.npy) and bounding boxes (.json), loads them
    and returns them for later use.
    """
    import json
    boxes = None
    with open(box_path, "r") as f:
        boxes = json.load(f)        
    imgs = np.load(img_path)
    return imgs, boxes

def render_single_frame(image, bboxes, colors=None):
    """
    Given an RGB image and [x1, y1, x2, y2] bounding boxes, draws the boxes on top of
    the image and returns the result in BGR order (as expected by cv2.imshow/cv2.imwrite).
    Also takes in an optional list of colors to apply to the boxes.
    """
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    for idx, bbox in enumerate(bboxes):
        if colors is not None and len(colors) > 0:
            # Cycle through the colors if there are fewer colors than boxes
            color = colors[idx % len(colors)]
        else:
            # Default to green when no colors are given
            color = (0, 255, 0)
        x1, y1, x2, y2 = bbox
        image = cv2.rectangle(image, (int(x1), int(y1)), (int(x2), int(y2)), (int(color[0]), int(color[1]), int(color[2])))
    return image
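
If OpenCV is unavailable, a rough stand-in for `cv2.rectangle` can be written with NumPy slicing alone, which is also handy for checking that box coordinates land where you expect. The helper name `draw_boxes_numpy` is hypothetical; it assumes `HxWx3` `uint8` images and `[x1, y1, x2, y2]` corners, and draws in the image's own channel order (no RGB/BGR conversion).

```python
import numpy as np


def _to_pixel(v, size):
    # Round a coordinate to the nearest pixel index inside [0, size - 1]
    return int(min(max(round(float(v)), 0), size - 1))


def draw_boxes_numpy(image, bboxes, color=(0, 255, 0), thickness=1):
    """Hypothetical helper: draws rectangle outlines on a copy of an HxWx3 image."""
    out = image.copy()
    h, w = out.shape[:2]
    t = thickness
    for x1, y1, x2, y2 in bboxes:
        # Convert to clipped integer pixel indices, ordered so that x1 <= x2 and y1 <= y2
        x1, x2 = sorted((_to_pixel(x1, w), _to_pixel(x2, w)))
        y1, y2 = sorted((_to_pixel(y1, h), _to_pixel(y2, h)))
        out[y1:y1 + t, x1:x2 + 1] = color                    # top edge
        out[max(y2 - t + 1, 0):y2 + 1, x1:x2 + 1] = color    # bottom edge
        out[y1:y2 + 1, x1:x1 + t] = color                    # left edge
        out[y1:y2 + 1, max(x2 - t + 1, 0):x2 + 1] = color    # right edge
    return out
```

For example, `draw_boxes_numpy(images[0], boxes[0]["boxes"])` would outline the first frame's detections, assuming the loaded images are `uint8` RGB arrays.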


### Q5.2 Implementing a similarity metric (10 pts)

In order to do track association, we need a way to measure how similar two bounding boxes are. One common choice is intersection-over-union (IoU): the area of the two boxes' overlap divided by the area of their union. An overview of how to compute IoU is provided [here](https://pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/). Below, you will implement a function to compute IoU. The input is a set of N bounding boxes (the boxes in the reference frame) and a set of M bounding boxes (the boxes in the next frame), and the output is an NxM matrix whose ```[i, j]```th entry is the IoU between bounding box ```i``` in the reference frame and bounding box ```j``` in the next frame.

For this question, please submit the matrix of IoUs between the boxes on the 124th and 125th frames, as well as your code, to the writeup PDF.
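
As a worked check of the formula that `iou_box_to_box` implements (the two example boxes are made up), IoU is the overlap area divided by the union area:

$$
\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}
$$

$$
|A \cap B| = \max\big(0,\ \min(x_2^A, x_2^B) - \max(x_1^A, x_1^B)\big) \cdot \max\big(0,\ \min(y_2^A, y_2^B) - \max(y_1^A, y_1^B)\big)
$$

For $A = [0, 0, 2, 2]$ and $B = [1, 1, 3, 3]$ in continuous coordinates:

$$
|A \cap B| = 1 \cdot 1 = 1, \quad |A| = |B| = 2 \cdot 2 = 4, \quad \mathrm{IoU} = \frac{1}{4 + 4 - 1} = \frac{1}{7} \approx 0.143
$$

With the pixel-inclusive convention (adding 1 to every width and height, as `iou_box_to_box` does), the same boxes give

$$
|A \cap B| = 2 \cdot 2 = 4, \quad |A| = |B| = 3 \cdot 3 = 9, \quad \mathrm{IoU} = \frac{4}{9 + 9 - 4} = \frac{2}{7} \approx 0.286
$$

so the choice of convention matters mostly for small boxes.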

In [8]:
def compute_iou(boxes1, boxes2):
    """
    Input: 
        boxes1: Nx4 ndarray, representing N bounding boxes as [x1, y1, x2, y2]
        boxes2: Mx4 ndarray, representing M bounding boxes as [x1, y1, x2, y2]
    Output: 
        iou_mat: NxM ndarray, with iou_mat[i, j] = iou(boxes1[i], boxes2[j])
    """
    # Placeholder for output ious
    out_iou = np.zeros((boxes1.shape[0], boxes2.shape[0]))

    # IoU of a single pair of boxes, using the pixel-inclusive (+1) convention
    def iou_box_to_box(box1, box2):

        # Get the coordinates of each box
        b1_x1, b1_y1, b1_x2, b1_y2 = box1
        b2_x1, b2_y1, b2_x2, b2_y2 = box2

        # Get the area of the intersection
        xA = max(b1_x1, b2_x1)
        xB = min(b1_x2, b2_x2)
        yA = max(b1_y1, b2_y1)
        yB = min(b1_y2, b2_y2)
        inter_area = max(0, xB - xA + 1) * max(0, yB - yA + 1)

        # Calculate the union
        b1_area = (b1_x2 - b1_x1 + 1) * (b1_y2 - b1_y1 + 1)
        b2_area = (b2_x2 - b2_x1 + 1) * (b2_y2 - b2_y1 + 1)
        union   = b1_area + b2_area - inter_area

        return inter_area / union

    # For loop implementation
    for i in range(boxes1.shape[0]):
        for j in range(boxes2.shape[0]):
            out_iou[i, j] = iou_box_to_box(boxes1[i], boxes2[j])


    return out_iou
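
The double loop is fine for a few dozen boxes per frame. A vectorized alternative computes all N x M pairs at once with NumPy broadcasting. This sketch (the name `compute_iou_vectorized` is hypothetical) uses continuous coordinates without the +1 pixel convention of `iou_box_to_box`, so its values differ slightly for small boxes.

```python
import numpy as np


def compute_iou_vectorized(boxes1, boxes2):
    """Pairwise IoU of (N, 4) and (M, 4) [x1, y1, x2, y2] boxes, returned as an (N, M) array.

    Uses continuous coordinates (no +1), so a box's area is (x2 - x1) * (y2 - y1).
    """
    b1 = np.asarray(boxes1, dtype=float).reshape(-1, 4)[:, None, :]  # (N, 1, 4)
    b2 = np.asarray(boxes2, dtype=float).reshape(-1, 4)[None, :, :]  # (1, M, 4)

    # Corners of every pairwise intersection, broadcast to (N, M)
    x_a = np.maximum(b1[..., 0], b2[..., 0])
    y_a = np.maximum(b1[..., 1], b2[..., 1])
    x_b = np.minimum(b1[..., 2], b2[..., 2])
    y_b = np.minimum(b1[..., 3], b2[..., 3])
    inter = np.clip(x_b - x_a, 0, None) * np.clip(y_b - y_a, 0, None)

    area1 = (b1[..., 2] - b1[..., 0]) * (b1[..., 3] - b1[..., 1])  # (N, 1)
    area2 = (b2[..., 2] - b2[..., 0]) * (b2[..., 3] - b2[..., 1])  # (1, M)
    union = area1 + area2 - inter

    # Return 0 for pairs of zero-area boxes instead of dividing by zero
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)


a = np.array([[0, 0, 2, 2], [10, 10, 12, 12]])
b = np.array([[1, 1, 3, 3]])
print(compute_iou_vectorized(a, b))  # the first entry is 1/7, the second is 0
```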

### Q5.3 Matching with the Hungarian Algorithm (10 pts)

Given a matrix of similarities, the next step in tracking by detection is to decide which box in the next frame corresponds to each box in the reference frame. Intuitively, each box should be paired with the box it overlaps most (highest IoU). The challenging part is that once a box is matched, it must be removed from contention for other matches, so every box is used at most once. Finding the set of one-to-one pairs with the best total score is known as the linear assignment problem. The Hungarian algorithm is the classic method for solving it, and a solver is implemented in ```scipy.optimize.linear_sum_assignment``` (recent SciPy versions use a modified Jonker-Volgenant algorithm internally, which solves the same problem). Below, you will implement a function ```compute_assignment``` that produces such a matching.
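
Why not simply match each box to its highest-IoU partner? A greedy pass that takes the best remaining pair first can lock in a choice that forces a poor match later. The toy matrix below (values made up) shows the difference: `greedy_assignment` is a hypothetical baseline written for this comparison, while `linear_sum_assignment` is SciPy's solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def greedy_assignment(sim):
    """Hypothetical baseline: repeatedly take the highest remaining similarity."""
    sim = np.asarray(sim, dtype=float).copy()
    rows, cols = [], []
    for _ in range(min(sim.shape)):
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        rows.append(int(i))
        cols.append(int(j))
        # Remove the matched row and column from contention
        sim[i, :] = -np.inf
        sim[:, j] = -np.inf
    return np.array(rows), np.array(cols)


iou = np.array([[0.9, 0.8],
                [0.7, 0.0]])

g_rows, g_cols = greedy_assignment(iou)
o_rows, o_cols = linear_sum_assignment(-iou)  # negate: the solver minimizes cost

print("greedy:", list(zip(g_rows.tolist(), g_cols.tolist())), "total IoU", iou[g_rows, g_cols].sum())
print("optimal:", list(zip(o_rows.tolist(), o_cols.tolist())), "total IoU", iou[o_rows, o_cols].sum())
```

Greedy grabs the 0.9 pair and is left with a zero-overlap match (total 0.9), while the optimal assignment pairs the boxes crosswise for a total of 1.5.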

Some notes to keep in mind: ```scipy```'s implementation minimizes costs, so you will need to pass the negative of the similarities you computed in the previous part. Additionally, the diagonal entries only compare a box with itself when both inputs come from the same frame; in that case the IoU is 1, and you should set those entries of the cost matrix to a high value so they are not picked. Between two different frames, box ```i``` of one frame and box ```i``` of the next are unrelated, so the diagonal needs no special treatment. Finally, the solver always returns a full matching, so some matches will likely have very low IoU. Please use the ```threshold``` parameter to filter out any matches whose IoU is below ```threshold```.
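
The thresholding step has to happen after solving, because `linear_sum_assignment` returns `min(N, M)` pairs no matter how weak they are. The sketch below uses a made-up IoU matrix and threshold, and passes `maximize=True` (accepted by recent SciPy versions) as an alternative to negating the matrix.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Made-up IoUs: rows are boxes in the reference frame, columns are boxes in the next frame
iou = np.array([[0.80, 0.05, 0.00],
                [0.10, 0.02, 0.00],
                [0.00, 0.00, 0.60]])
threshold = 0.3

# maximize=True is equivalent to minimizing the negated IoUs
rows, cols = linear_sum_assignment(iou, maximize=True)

# The solver pairs every row here, even row 1 whose best option barely overlaps,
# so drop the pairs whose IoU falls below the threshold
keep = iou[rows, cols] >= threshold
print("all pairs:", list(zip(rows.tolist(), cols.tolist())))
print("kept pairs:", list(zip(rows[keep].tolist(), cols[keep].tolist())))
```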

Please submit your code for this section to the writeup PDF, along with its output on the IoU matrix computed between the 124th and 125th frames.

In [9]:
from scipy.optimize import linear_sum_assignment

def compute_assignment(iou_matrix, threshold=0.0):
    """
    Given an input matrix of IoUs, uses the Hungarian algorithm to compute a matching.
    
    Args:
        iou_matrix: NxM matrix of IoUs of bounding boxes between two frames.
        threshold: float value representing minimum value of IoU to be considered as a candidate for matching.

    Returns:
        box1_ind, box2_ind : a set of indices into box1 (bboxes of ref frame) and corresponding 
        indices into box2 representing the optimal assignment.
    """
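    # The solver minimizes total cost, so negate the IoUs to maximize total overlap
    # (this also leaves the caller's iou_matrix unmodified)
    box1_ind, box2_ind = linear_sum_assignment(-iou_matrix)

    # linear_sum_assignment always returns min(N, M) pairs, so drop the weak ones
    keep = iou_matrix[box1_ind, box2_ind] >= threshold

    return box1_ind[keep], box2_ind[keep]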

### Q5.4 Putting it all together (20 pts)

Now, you will put all the pieces together that you implemented above to create a full tracking system. You will maintain a set of tracks throughout the video. At the beginning, no tracks will exist, only detections. In each successive frame, you will read in the detections for that frame and associate the new detections to the previous ones, and create candidate tracks. If a candidate track has persisted for P frames, you will add it to the list of tracks. If a track has not had a match in K frames, you will remove it from the list of tracks. For each frame, you will render the bounding boxes corresponding to every track. Once a track is removed from the list, do not render its bounding box. 

The hyperparameters you will use for the tracker are:

- ```P```: number of frames a candidate track must exist before it is added to the list of tracks.
- ```K```: number of frames a track must have no match for before it is removed from the list of tracks.
- ```iou_thresh```: minimum IoU a match must reach to be added to a track.
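
The ```P```/```K``` bookkeeping can be explored in isolation with a simplified per-object counter. This is a hypothetical sketch, not the tracker's implementation: it assumes one boolean per frame saying whether the object was matched, counts ```P``` as matched frames, and counts ```K``` as consecutive misses.

```python
def simulate_lifecycle(matched, P, K):
    """Hypothetical, simplified lifecycle of one object.

    matched[t] is True if the object was linked to a detection at frame t.
    Returns the state after each frame: "candidate", "track", or "deleted".
    """
    state, hits, misses, states = "candidate", 0, 0, []
    for m in matched:
        if state != "deleted":
            if m:
                hits += 1
                misses = 0  # a match resets the miss counter
            else:
                misses += 1
            if misses >= K:
                state = "deleted"  # stale for K frames in a row
            elif state == "candidate" and hits >= P:
                state = "track"  # persisted long enough to be promoted
        states.append(state)
    return states


print(simulate_lifecycle([True, True, True, False, False], P=3, K=2))
```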


For your submission for this part, please submit 10 images (with bounding boxes) from successive frames at any point in the video. Bounding boxes belonging to the same track should be the same color. Please also submit your code to the writeup PDF. 

Please note that the output will NOT be perfect! Expect ID switches: even a correctly implemented tracker will likely fail in several frames. This should hopefully demonstrate to you that tracking is a tough problem and show why it is a hot research area today :) 

In [16]:
import numpy as np
import cv2


class Trk:
    """
    A (candidate) track: a chain of box indices linked from frame to frame by the
    assignments, where history[k] indexes a box in frame last_frame - (len(history) - 1 - k).
    """

    def __init__(self, trk_id, ind1, ind2, frame_idx):
        # History of associations as box indices (ind1 in frame_idx, ind2 in frame_idx + 1)
        self.history = [ind1, ind2]
        self.id = trk_id
        # Frame that history[-1] indexes into
        self.last_frame = frame_idx + 1
        # Number of consecutive frames without a match
        self.missed_frames = 0
        self.color = np.random.randint(0, 255, size=3)

    def get_missed_frames(self):
        return self.missed_frames

    def get_persistence(self):
        return len(self.history)

    def get_box_id(self, frame_idx):
        """Returns the index of this track's box in frame frame_idx, or None if it has none there."""
        offset = self.last_frame - frame_idx
        if 0 <= offset < len(self.history):
            return self.history[-1 - offset]
        return None

    def get_color(self):
        return self.color

    def update(self, update_dict, frame_idx):
        """
        Extends the track using update_dict, which maps box indices in frame frame_idx to
        box indices in frame frame_idx + 1. Returns the index of the frame_idx box that this
        track claimed, or None if it had no match.
        """
        last_idx = self.history[-1]
        # After a miss, history[-1] indexes an older frame, so looking it up in this frame's
        # mapping would link it to an unrelated box; keep counting it as missed instead
        if self.last_frame == frame_idx and last_idx in update_dict:
            self.history.append(update_dict[last_idx])
            self.last_frame = frame_idx + 1
            self.missed_frames = 0
            # Let the caller know that we did associate this detection
            return last_idx
        self.missed_frames += 1
        return None


def run_tracker(images, boxes, P, K, iou_thresh):
    """
    Runs the entire tracking pipeline.

    Args:
        images: numpy array of N images.
        boxes: list of N per-frame entries, where boxes[i]["boxes"] holds the
            [x1, y1, x2, y2] detections of frame i.
        P: number of frames a candidate track must exist before it becomes a track.
        K: number of consecutive frames without a match after which a (candidate) track is removed.
        iou_thresh: minimum IoU for two detections in consecutive frames to be linked.

    Returns:
        List of the N rendered frames (BGR, as returned by render_single_frame).
    """
    trks = []
    candidates = []
    # Counter that gives every new candidate track a unique id
    trk_id = 0
    rendered = []

    for frame_idx, img in enumerate(images):
        # List of bounding boxes for the current frame
        boxes_frame = np.array(boxes[frame_idx]["boxes"])

        # Match this frame's boxes to the next frame's (the last frame has no next frame)
        if frame_idx + 1 < len(images):
            boxes_next = np.array(boxes[frame_idx + 1]["boxes"])
            iou_mat = compute_iou(boxes_frame, boxes_next)
            box1_ind, box2_ind = compute_assignment(iou_mat, iou_thresh)
            idx_mapping = dict(zip(box1_ind.tolist(), box2_ind.tolist()))
        else:
            idx_mapping = {}

        # Marks the boxes of this frame that already belong to a track or candidate
        associated = np.zeros(boxes_frame.shape[0], dtype=bool)

        # Update the full tracks first, so candidates promoted below are not updated twice
        for trk in trks:
            assoc_idx = trk.update(idx_mapping, frame_idx)
            # Make a mark that we associated this detection (None means the track missed)
            if assoc_idx is not None:
                associated[assoc_idx] = True
        # Delete tracks if they are stale
        trks = [trk for trk in trks if trk.get_missed_frames() < K]

        # Update the candidate tracks, then promote or delete them
        kept_candidates = []
        for candidate in candidates:
            assoc_idx = candidate.update(idx_mapping, frame_idx)
            if assoc_idx is not None:
                associated[assoc_idx] = True
            if candidate.get_persistence() >= P:
                # Promote to full track (and drop it from the candidates list)
                trks.append(candidate)
            elif candidate.get_missed_frames() < K:
                kept_candidates.append(candidate)
        candidates = kept_candidates

        # For all boxes that were not associated but have a match in the next frame,
        # create new candidates
        for box_idx in np.flatnonzero(~associated):
            box_idx = int(box_idx)
            if box_idx in idx_mapping:
                candidates.append(Trk(trk_id, box_idx, idx_mapping[box_idx], frame_idx))
                trk_id += 1

        # Display the full tracks that have a box in this frame
        trk_boxes = []
        colors = []
        for trk in trks:
            bbox_id = trk.get_box_id(frame_idx)
            if bbox_id is not None:
                trk_boxes.append(boxes_frame[bbox_id])
                colors.append(tuple(int(c) for c in trk.get_color()))
        render_img = render_single_frame(img, trk_boxes, colors)
        rendered.append(render_img)
        # Opens a separate OpenCV window rather than displaying inline in the notebook
        cv2.imshow("Rendered Image", render_img)
        cv2.waitKey(60)

    cv2.destroyAllWindows()
    return rendered


# Part 1: Load the images and the boxes
images, boxes = load_images_and_boxes("data/soccer_images.npy", "data/soccer_boxes.json")
rendered_frames = run_tracker(images, boxes, P=5, K=5, iou_thresh=0.5)
