## Reference

* [Simple Object Tracking with OpenCV](https://pyimagesearch.com/2018/07/23/simple-object-tracking-with-opencv/)
* [Running SSD - OpenCV DNN module](https://junha1125.github.io/blog/artificial-intelligence/2020-08-16-SSD2OpenCV/)

## Introduction

* Phase 1. Detecting (with a MobileNet Single Shot Detector (SSD))
    - During the detection phase we run the computationally more expensive object detector to (1) detect whether new objects have entered our view, and (2) see whether we can find objects that were “lost” during the tracking phase. For each detected object we create or update an object tracker with the new bounding box coordinates. Because the object detector is expensive, we only run this phase once every N frames.

* Phase 2. Tracking (with the Centroid Tracking Algorithm)
    - When we are not in the “detecting” phase we are in the “tracking” phase. For each detected object, we create an object tracker to follow the object as it moves around the frame. The object tracker should be faster and more efficient than the object detector. We continue tracking until we reach the N-th frame and then re-run the object detector. The entire process then repeats.
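
The alternation between the two phases can be sketched as a simple frame counter. This is a minimal sketch, not the notebook's implementation: `phase_for_frame` and `skip_frames` (the N above) are hypothetical names introduced here for illustration.

```python
def phase_for_frame(frame_index, skip_frames):
    """Return "detect" on every skip_frames-th frame, otherwise "track".

    Both names are hypothetical; skip_frames plays the role of N in the text.
    """
    if skip_frames < 1:
        raise ValueError("skip_frames must be at least 1")
    return "detect" if frame_index % skip_frames == 0 else "track"


# with N = 3 the detector runs on frames 0, 3, 6, ...
schedule = [phase_for_frame(i, 3) for i in range(7)]
print(schedule)
```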

<table>
    <tr>
        <td><img src="https://pyimagesearch.com/wp-content/uploads/2018/07/simple_object_tracking_step1.png" width="300"></td>
        <td><img src="https://pyimagesearch.com/wp-content/uploads/2018/07/simple_object_tracking_step2.png" width="300"></td>
    </tr>
    <tr>
        <td><img src="https://pyimagesearch.com/wp-content/uploads/2018/07/simple_object_tracking_step3.png" width="300"></td>
        <td><img src="https://pyimagesearch.com/wp-content/uploads/2018/07/simple_object_tracking_step4.png" width="300"></td>
    </tr>
</table>

In [1]:
import numpy as np
import time
import cv2

from scipy.spatial import distance as dist

import easydict
from collections import OrderedDict

## Centroid Tracker

### Motivating Example

In [3]:
np.random.seed(123)

In [4]:
# old: there are two existing objects
objectCentroids = np.random.uniform(size=(2,2))
objectCentroids

array([[0.69646919, 0.28613933],
       [0.22685145, 0.55131477]])

In [5]:
# new: three objects are detected
inputCentroids = np.random.uniform(size=(3,2))
inputCentroids

array([[0.71946897, 0.42310646],
       [0.9807642 , 0.68482974],
       [0.4809319 , 0.39211752]])

In [6]:
D = dist.cdist(objectCentroids, inputCentroids)
D

array([[0.13888478, 0.489671  , 0.24018263],
       [0.50902789, 0.76564396, 0.29983435]])

In [7]:
rows = D.min(axis=1).argsort()
rows

array([0, 1])

In [8]:
cols = D.argmin(axis=1)[rows]
cols

array([0, 2])

In [9]:
list(zip(rows, cols))

[(0, 0), (1, 2)]

- D[0,0] implies that the first existing object will be matched with the first input centroid.
- D[1,2] implies that the second existing object will be matched with the third input centroid.
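
The second input centroid (column 1) is not paired with any existing object, so the tracker would treat it as a new object. A minimal sketch of that bookkeeping, with the distance matrix from the example hard-coded so the snippet runs on its own:

```python
import numpy as np

# distance matrix from the example above (2 existing objects x 3 input centroids)
D = np.array([[0.13888478, 0.489671, 0.24018263],
              [0.50902789, 0.76564396, 0.29983435]])

# visit rows in order of their smallest distance, each with its closest column
rows = D.min(axis=1).argsort()
cols = D.argmin(axis=1)[rows]

usedRows, usedCols = set(), set()
matches = []
for row, col in zip(rows, cols):
    # each existing object and each input centroid may be used only once
    if row in usedRows or col in usedCols:
        continue
    matches.append((int(row), int(col)))
    usedRows.add(int(row))
    usedCols.add(int(col))

# whatever is left over was not matched
unusedRows = set(range(D.shape[0])) - usedRows
unusedCols = set(range(D.shape[1])) - usedCols

print(matches)
print(unusedRows, unusedCols)
```

An unused row corresponds to an existing object that was not seen in this frame, and an unused column to a detection with no existing object to attach to.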

### Implementation

In [2]:
class CentroidTracker():
    def __init__(self, maxDisappeared=30):
        self.nextObjectID = 0
        self.objects = OrderedDict() # centroid
        self.disappeared = OrderedDict() # number of consecutive frames marked as disappeared
        
        # store the number of maximum consecutive frames a given object is allowed
        # to be marked as "disappeared" -> deregister the object from tracking
        self.maxDisappeared = maxDisappeared
        
    def register(self, centroid):
        self.objects[self.nextObjectID] = centroid
        self.disappeared[self.nextObjectID] = 0
        self.nextObjectID += 1
        
    def deregister(self, objectID):
        del self.objects[objectID]
        del self.disappeared[objectID]
        
    def update(self, rects):
        # if the list of input bounding box rectangles is empty
        if len(rects) == 0:
            for objectID in list(self.disappeared.keys()):
                self.disappeared[objectID] += 1
                
                if self.disappeared[objectID] > self.maxDisappeared:
                    self.deregister(objectID)
                    
            return self.objects
            
        # initialize an array of input centroids for the current frame
        # and loop over the bounding box rectangles
        inputCentroids = np.zeros((len(rects), 2), dtype="int")
        for (i, (startX, startY, endX, endY)) in enumerate(rects):
            cX = int((startX + endX) / 2.0)
            cY = int((startY + endY) / 2.0)
            inputCentroids[i] = (cX, cY)
            
        # when currently not tracking any objects -> register
        if len(self.objects) == 0:
            for i in range(0, len(inputCentroids)):
                self.register(inputCentroids[i])
        # otherwise
        else:
            objectIDs = list(self.objects.keys())
            objectCentroids = list(self.objects.values())
            
            # compute the distance between each pair of object centroids and input centroids;
            # rows are then visited in order of their smallest distance
            D = dist.cdist(np.array(objectCentroids), inputCentroids)
            rows = D.min(axis=1).argsort()
            cols = D.argmin(axis=1)[rows]
            
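            # greedily assign each existing object to its closest input centroid,
            # skipping any row or column that has already been used
            usedRows, usedCols = set(), set()
            for (row, col) in zip(rows, cols):
                if row in usedRows or col in usedCols:
                    continue
                
                objectID = objectIDs[row]
                self.objects[objectID] = inputCentroids[col]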
                self.disappeared[objectID] = 0
                
                usedRows.add(row)
                usedCols.add(col)
            
            # collect the rows (existing objects) and columns (input centroids) left unmatched
            unusedRows = set(range(0, D.shape[0])).difference(usedRows)
            unusedCols = set(range(0, D.shape[1])).difference(usedCols)
            
            # if there are at least as many existing objects as input centroids,
            # check whether some of these objects have potentially disappeared
            if D.shape[0] >= D.shape[1]:
                for row in unusedRows:
                    objectID = objectIDs[row]
                    self.disappeared[objectID] += 1
                    
                    if self.disappeared[objectID] > self.maxDisappeared:
                        self.deregister(objectID)
            
            else:
                for col in unusedCols:
                    self.register(inputCentroids[col])
                    
        return self.objects

In [3]:
ct = CentroidTracker()
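
The first step of `update` turns each bounding box into its centroid. As a sketch, the same computation can be vectorized with NumPy; `rects_to_centroids` is a hypothetical helper, not part of the class above.

```python
import numpy as np


def rects_to_centroids(rects):
    """Convert (startX, startY, endX, endY) boxes to integer (cX, cY) centroids."""
    boxes = np.asarray(rects, dtype=float).reshape(-1, 4)
    # midpoint of the two corners, truncated toward zero like int() in the loop version
    return ((boxes[:, :2] + boxes[:, 2:]) / 2.0).astype(int)


centroids = rects_to_centroids([(10, 20, 30, 40), (0, 0, 5, 5)])
print(centroids)
```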

* MobileNet SSD (https://github.com/opencv/opencv/wiki/TensorFlow-Object-Detection-API)

In [31]:
net = cv2.dnn.readNetFromTensorflow("./MobileNetSSD_v3_large_coco/frozen_inference_graph.pb",
                                    "./MobileNetSSD_v3_large_coco/ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt")

In [None]:
# cap = cv2.VideoCapture("../video_input.mp4")
cap = cv2.VideoCapture("../video_input_02.mp4")
(H, W) = (None, None)
confidence = 0.4

# loop over the frames from the video
while True:
    ret, frame = cap.read()
    if not ret or cv2.waitKey(1)>=0:
        break
    
    frame = cv2.resize(frame, (416,416))
    
    if W is None or H is None:
        (H, W) = frame.shape[:2]
    
    blob = cv2.dnn.blobFromImage(frame, 0.8, (W, H), (104.0, 177.0, 123.0), swapRB=True, crop=False) # mean-R, mean-G, mean-B
#     blob = cv2.dnn.blobFromImage(frame, 0.9, (W, H), (0, 0, 0), swapRB=True, crop=False) # mean-R, mean-G, mean-B
    net.setInput(blob)
    detections = net.forward()
    rects = []
    
    # loop over the detections
    for i in range(0, detections.shape[2]):
        # keep only detections above the confidence threshold whose class ID is 1 (person)
        if detections[0, 0, i, 2] > confidence and detections[0, 0, i, 1] == 1:
#         if detections[0, 0, i, 2] > confidence:
            box = detections[0, 0, i, 3:7] * np.array([W, H, W, H])
            rects.append(box.astype("int"))
            # draw a bounding box surrounding the object
            (startX, startY, endX, endY) = box.astype("int")
            cv2.rectangle(frame, (startX, startY), (endX, endY), (0, 255, 0), thickness=2)
    
#     objects = ct.update(rects)
    
    cv2.imshow("frame", frame)

cap.release()
cv2.destroyAllWindows()
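
The filtering step in the loop above can be tried without a model or a video. The sketch below assumes the `(1, 1, N, 7)` output layout that the loop indexes, with each row holding `[image_id, class_id, score, x1, y1, x2, y2]` in normalized coordinates. The detections are made-up values and `filter_detections` is a hypothetical helper.

```python
import numpy as np


def filter_detections(detections, W, H, min_confidence=0.4, class_id=1):
    """Return pixel boxes (startX, startY, endX, endY) for one class above a score threshold."""
    rects = []
    for i in range(detections.shape[2]):
        score = detections[0, 0, i, 2]
        if score > min_confidence and int(detections[0, 0, i, 1]) == class_id:
            # scale the normalized corners to pixel coordinates
            box = detections[0, 0, i, 3:7] * np.array([W, H, W, H])
            rects.append(box.astype("int"))
    return rects


# made-up detections: a confident person, a low-score person, and a non-person class
fake = np.array([[[[0, 1, 0.90, 0.25, 0.25, 0.50, 0.75],
                   [0, 1, 0.10, 0.00, 0.00, 0.50, 0.50],
                   [0, 3, 0.95, 0.50, 0.50, 1.00, 1.00]]]])
rects = filter_detections(fake, W=416, H=416)
print(rects)
```

Only the first row survives: the second falls below the score threshold and the third belongs to a different class.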

* YOLO (https://github.com/arunponnusamy/object-detection-opencv)

In [38]:
net = cv2.dnn.readNet("./yolo_v3/yolov3.weights","./yolo_v3/yolov3.cfg.txt")
def get_output_layers(net):
    layer_names = net.getLayerNames()
    try:
        output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]
    except (TypeError, IndexError):
        # older OpenCV versions return each index wrapped in a 1-element array
        output_layers = [layer_names[i[0] - 1] for i in net.getUnconnectedOutLayers()]
    return output_layers

In [59]:
# cap = cv2.VideoCapture("../video_input.mp4")
cap = cv2.VideoCapture("../video_input_02.mp4")
(H, W) = (None, None)
confidence = 0.3

# loop over the frames from the video
while True:
    ret, frame = cap.read()
    if not ret or cv2.waitKey(1)>=0:
        break
    
    frame = cv2.resize(frame, (416,416))
    
    if W is None or H is None:
        (H, W) = frame.shape[:2]
    
#     blob = cv2.dnn.blobFromImage(frame, 0.8, (W, H), (104.0, 177.0, 123.0), swapRB=True, crop=False) # mean-R, mean-G, mean-B
    blob = cv2.dnn.blobFromImage(frame, 0.04, (W, H), (0, 0, 0), swapRB=True, crop=False)
    net.setInput(blob)
    detections = net.forward(get_output_layers(net))
    rects = []
    confidences = []
    
    for out in detections:
        for detection in out:
            scores = detection[5:]
            class_id = np.argmax(scores)
            if scores[class_id] > confidence:
                center_x = int(detection[0] * W)
                center_y = int(detection[1] * H)
                w = int(detection[2] * W)
                h = int(detection[3] * H)
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
#                 box = np.array([x,y,w,h])
#                 rects.append(box.astype("int"))
                rects.append([x,y,w,h])
                confidences.append(float(scores[class_id]))
#                 cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), thickness=2)
    
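    # run non-maximum suppression once per frame (score threshold 0.4, NMS threshold 0.4)
    # and draw only the boxes it keeps
    keep = cv2.dnn.NMSBoxes(rects, confidences, 0.4, 0.4) if rects else []
    for i in np.array(keep).flatten():
        x, y, w, h = rects[i]
        cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), thickness=2)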
    
    
#     objects = ct.update(rects)
    
    cv2.imshow("frame", frame)

cap.release()
cv2.destroyAllWindows()
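
Note that the YOLO cell stores boxes as `[x, y, w, h]`, which is the layout `cv2.dnn.NMSBoxes` takes, while `CentroidTracker.update` unpacks `(startX, startY, endX, endY)`. If the tracker call is re-enabled, the boxes would need converting first. A minimal sketch, where `yolo_to_corners` is a hypothetical helper:

```python
def yolo_to_corners(detection, W, H):
    """Convert a normalized YOLO (center_x, center_y, w, h) row to pixel corners."""
    center_x = int(detection[0] * W)
    center_y = int(detection[1] * H)
    w = int(detection[2] * W)
    h = int(detection[3] * H)
    # top-left corner, computed the same way as in the loop above
    startX = int(center_x - w / 2)
    startY = int(center_y - h / 2)
    return (startX, startY, startX + w, startY + h)


corners = yolo_to_corners([0.5, 0.5, 0.25, 0.5], W=416, H=416)
print(corners)
```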

In [40]:
old = [np.array([316,130]), np.array([315,146])]
old

[array([316, 130]), array([315, 146])]

In [52]:
new = np.array([[316,130], [315,146]])
new

array([[316, 130],
       [315, 146]])

In [53]:
dist.cdist(np.array(old), new)

array([[ 0.        , 16.03121954],
       [16.03121954,  0.        ]])

In [57]:
old = [np.array([[316,130], [315,146]]), np.array([[316,130], [315,146]])]
np.array(old)

array([[[316, 130],
        [315, 146]],

       [[316, 130],
        [315, 146]]])

In [58]:
new = np.array([[314,129]])
new

array([[314, 129]])

In [None]:
# raises ValueError: cdist expects 2-D inputs, but np.array(old) is 3-D here
dist.cdist(np.array(old), new)

## Creating a trackable object

In [1]:
class TrackableObject:
    def __init__(self, objectID, centroid):
        # store the object ID and initialize a list of centroid location history
        self.objectID = objectID
        self.centroids = [centroid]
        
        # flag indicating whether the object has already been counted
        self.counted = False
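
A sketch of how such objects might be kept in sync with the tracker output, assuming `objects` is the `objectID -> centroid` mapping returned by `CentroidTracker.update`. The `trackableObjects` dictionary, the `update_trackables` helper, and the sample centroids are hypothetical; the class is repeated so the snippet runs on its own.

```python
class TrackableObject:
    def __init__(self, objectID, centroid):
        self.objectID = objectID
        self.centroids = [centroid]
        self.counted = False


def update_trackables(trackableObjects, objects):
    """Create a TrackableObject for each new ID and extend the history of known IDs."""
    for objectID, centroid in objects.items():
        to = trackableObjects.get(objectID)
        if to is None:
            trackableObjects[objectID] = TrackableObject(objectID, centroid)
        else:
            to.centroids.append(centroid)
    return trackableObjects


trackableObjects = {}
# two hypothetical frames of tracker output
update_trackables(trackableObjects, {0: (100, 50)})
update_trackables(trackableObjects, {0: (102, 60), 1: (300, 200)})
print({oid: to.centroids for oid, to in trackableObjects.items()})
```

Keeping the centroid history per ID is what makes it possible to reason about an object's movement later, and the `counted` flag lets downstream code avoid counting the same object twice.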