# <strong style="color: tomato;">Object Tracking</strong>
---

Object Tracking Section Goals
- Learn basic object tracking techniques
    - Optical Flow
    - MeanShift and CamShift
- Understand more advanced tracking
    - Review Built-in Tracking APIs


## <span style="color: yellowgreen;">1. </span>Optical flow.

### <span style="color: royalblue;">a) </span>Introduction

Let’s begin discussing object tracking by learning about optical flow. Optical flow is the pattern of apparent motion of image objects between two consecutive frames, caused by the movement of the object or the camera.

Optical Flow Analysis has a few assumptions:
- The pixel intensities of an object do not change between consecutive frames.
- Neighbouring pixels have similar motion.
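
These two assumptions are usually written as the optical flow constraint below. This is a sketch in common textbook notation that this notebook does not define elsewhere: $I$ is the image intensity, $I_x, I_y, I_t$ are its partial derivatives, and $(u, v)$ is the velocity of a pixel.

$$
I(x, y, t) = I(x + \Delta x,\; y + \Delta y,\; t + \Delta t)
\quad\Longrightarrow\quad
I_x u + I_y v + I_t = 0
$$

The right-hand form follows from a first-order Taylor expansion of the left-hand side. It is one equation with two unknowns per pixel, which is why the second assumption is needed: neighbouring pixels share the same $(u, v)$ and so contribute additional equations.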

The sparse optical flow method in OpenCV first takes in a given set of points and a frame, then attempts to find those points in the next frame. It is up to the user to supply the points to track.

Consider a five-frame clip of a ball moving up and towards the right. Note that given just this clip, we cannot determine if the ball is moving, or if the camera moved down and to the left! Using OpenCV we pass in the previous frame, the previous points and the current frame to the **Lucas-Kanade function**. The function then attempts to locate the points in the current frame.
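
To make the idea concrete, here is a minimal NumPy sketch of a single-window Lucas-Kanade estimate on a synthetic pattern. It is not OpenCV's implementation (there are no pyramids and no iterations), and the helper name `lucas_kanade_window`, the test pattern and the shift values are assumptions chosen purely for illustration.

```python
import numpy as np

def lucas_kanade_window(img1, img2, rows, cols):
    """Estimate one (u, v) displacement for the window img[rows, cols]."""
    # Spatial gradients of the first frame; np.gradient returns (d/dy, d/dx).
    Iy, Ix = np.gradient(img1)
    # Temporal gradient between the two frames.
    It = img2 - img1
    # One brightness-constancy equation per pixel: Ix*u + Iy*v = -It
    A = np.stack([Ix[rows, cols].ravel(), Iy[rows, cols].ravel()], axis=1)
    b = -It[rows, cols].ravel()
    # Least-squares solution over all pixels in the window.
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

def pattern(x, y):
    """Smooth synthetic image (hypothetical test pattern)."""
    return np.sin(0.2 * x) + np.cos(0.15 * y)

# Two frames of the same pattern, shifted by a known sub-pixel amount.
ys, xs = np.mgrid[0:64, 0:64].astype(float)
true_u, true_v = 0.3, -0.2
frame1 = pattern(xs, ys)
frame2 = pattern(xs - true_u, ys - true_v)

# Use an interior window so the one-sided border gradients are excluded.
u, v = lucas_kanade_window(frame1, frame2, slice(12, 52), slice(12, 52))
print(f"estimated flow: ({u:.2f}, {v:.2f}), true flow: ({true_u}, {true_v})")
```

The estimate is only approximate because the constraint is a linearisation; it holds best for small motions, which is one reason the pyramidal version exists.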

The Lucas-Kanade method computes optical flow for a **sparse** feature set, meaning only the points it was told to track. But what if we wanted to track all the points in a video? We can use **Gunnar Farnebäck’s algorithm** (also built into OpenCV) to calculate **dense** optical flow. This **dense** optical flow is calculated for all points in an image. Points where no flow (no movement) is detected are displayed in black.

Check out the [resource links](https://en.wikipedia.org/wiki/Optical_flow) for full descriptions and publication links for these two algorithms!
- Note: Requires strong linear algebra skills to understand the math behind the methods.

### <span style="color: royalblue;">b) </span>Part two - Coding Lucas-Kanade function 

<sup>(tracking sparse points - pick a few points and track them throughout the video):</sup>

In [None]:
import cv2
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Setting up parameters for Shi-Tomasi Corner Detection:

In [None]:
# Parameters for ShiTomasi corner detection (good features to track paper)
corner_track_params = dict(maxCorners = 10,
                           qualityLevel = 0.3,
                           minDistance = 7,
                           blockSize = 7)

**Parameters for Lucas Kanade Optical Flow**:

Detect the motion of specific points or the aggregated motion of regions by modifying the winSize argument. This determines the integration window size. Small windows are more sensitive to noise and may miss larger motions. Large windows will “survive” an occlusion.

The integration appears smoother with the larger window size.

criteria has two parts here: the maximum number of iterations (10 below) and epsilon (0.03 below). More iterations mean a more exhaustive search, and a larger epsilon lets the search finish earlier. These mainly trade speed against accuracy and can usually be left unchanged.
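
The behaviour of a combined count/epsilon stopping rule can be illustrated with a toy fixed-point iteration. This is not the Lucas-Kanade update itself; `iterate_until` and the `x <- cos(x)` iteration are stand-ins assumed here only to show how the two limits interact.

```python
import math

def iterate_until(criteria_count, criteria_eps, x0=0.0):
    """Run x <- cos(x) until the update is below eps or the count is reached."""
    x = x0
    for iteration in range(1, criteria_count + 1):
        x_new = math.cos(x)
        step = abs(x_new - x)
        x = x_new
        if step < criteria_eps:
            break  # epsilon criterion met before the iteration limit
    return x, iteration

# Same iteration limit, two different epsilons.
_, n_loose = iterate_until(10, 0.03)    # larger epsilon: stops earlier
_, n_tight = iterate_until(10, 0.001)   # smaller epsilon: runs into the count limit
print(n_loose, n_tight)
```

With the smaller epsilon the loop is cut off by the iteration count instead, which is why both limits are normally supplied together.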

When maxLevel is 0, it is the same algorithm without using pyramids (i.e., calcOpticalFlowLK). Pyramids allow finding optical flow at various resolutions of the image.
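
As a rough sketch of what the pyramid levels mean in pixels, the snippet below lists the image size at each level. The helper `pyramid_shapes` is hypothetical, and the rounding rule `(n + 1) // 2` is an assumption mirroring the default output size of `cv2.pyrDown`.

```python
def pyramid_shapes(height, width, max_level):
    """Image sizes at pyramid levels 0..max_level, halving at each level."""
    shapes = [(height, width)]
    for _ in range(max_level):
        height, width = (height + 1) // 2, (width + 1) // 2
        shapes.append((height, width))
    return shapes

shapes = pyramid_shapes(480, 640, max_level=2)
print(shapes)

# A 20-pixel motion at full resolution spans only 20 / 2**2 pixels at level 2,
# so it is easier to capture there with the same window size.
motion_at_top = 20 / 2 ** 2
print(motion_at_top)
```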

In [None]:
# initialize Lucas-Kanade parameters dictionary
# winSize: small windows are sensitive to noise and may miss large motions; large windows smooth over a region
# maxLevel: 2 => the coarsest pyramid level is at 1/4 resolution
# criteria: stop after 10 iterations (COUNT) or when the update falls below epsilon = 0.03 (EPS);
#           a larger epsilon finishes earlier, trading accuracy for speed
lk_params = dict(winSize=(200, 200),
                 maxLevel=2,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

Grab the image from the camera:

In [None]:
cap = cv2.VideoCapture(0)

# Grab the very first frame of the stream
ret, previous_frame = cap.read()
prev_gray = cv2.cvtColor(previous_frame, cv2.COLOR_BGR2GRAY)

# which points we want to track using a goodFeaturesToTrack and the corner[...] dictionary
prevPts = cv2.goodFeaturesToTrack(prev_gray, mask=None, **corner_track_params)

# mask for displaying the actual points and drawing the lines, just for visualisation
mask = np.zeros_like(previous_frame) # create the array of zeros with the same size as the given image

while True:
    ret, frame = cap.read()
    if not ret:
        break  # stop when no frame could be read
    frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # calculate the optical flow on the grayscale frames
    # arguments: previous frame, current frame, points from the previous frame to find in the current frame,
    # None for nextPts (they are what we want to compute), then the lk params
    nextPts, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, frame_gray, prevPts, None, **lk_params)

    # status holds 1 for each point whose flow was found and 0 otherwise; keep only the found points
    good_new = nextPts[status == 1]
    good_prev = prevPts[status == 1]

    # Use ravel to flatten each point, then convert to int:
    # the drawing functions expect integer pixel coordinates
    for i, (new, prev) in enumerate(zip(good_new, good_prev)):
        x_new, y_new = map(int, new.ravel())
        x_prev, y_prev = map(int, prev.ravel())

        mask = cv2.line(mask, (x_new, y_new), (x_prev, y_prev), (0, 255, 0), 3)

        frame = cv2.circle(frame, (x_new, y_new), 8, (0, 0, 255), -1)

    # Display the image along with the mask we drew the line on.
    img = cv2.add(frame, mask)
    cv2.imshow('tracking', img)

    k = cv2.waitKey(30) & 0xFF
    if k == 27:
        break
    
    # Now update the previous frame and previous points
    prev_gray = frame_gray.copy()
    prevPts = good_new.reshape(-1, 1, 2)

cap.release()
cv2.destroyAllWindows()
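
The boolean indexing with `status` in the loop above can be checked in isolation. The arrays below are made-up values whose shapes mimic what `calcOpticalFlowPyrLK` works with (points as `(N, 1, 2)`, status as `(N, 1)`); no camera is needed.

```python
import numpy as np

# Three tracked points; the second one is marked as lost (status 0).
prev_pts = np.array([[[10., 20.]], [[30., 40.]], [[50., 60.]]], dtype=np.float32)
next_pts = prev_pts + np.array([1.5, -0.5], dtype=np.float32)
status = np.array([[1], [0], [1]], dtype=np.uint8)

# The (N, 1) mask selects whole (x, y) pairs and drops the middle axis.
good_new = next_pts[status == 1]
good_prev = prev_pts[status == 1]
print(good_new.shape)  # (K, 2), where K is the number of points found

# Restore the (K, 1, 2) layout before passing the points to the next call.
next_input = good_new.reshape(-1, 1, 2)
print(next_input.shape)
```

This is why the loop ends with `good_new.reshape(-1, 1, 2)`: the filtered array is flat, while the tracking call expects the extra middle axis.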

### <span style="color: royalblue;">c) </span>Part three - dense optical flow

The flow object returned by the function described below contains the flow vectors as Cartesian (x, y) components. To visualise it, we convert these components into polar coordinates: magnitude and angle.

calcOpticalFlowFarneback(prev, next, flow, pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags) -> flow

This function computes dense optical flow using Gunnar Farnebäck's algorithm.

Here are the parameters for the function and what they represent:
* prev: first 8-bit single-channel input image.
* next: second input image of the same size and the same type as prev.
* flow: computed flow image that has the same size as prev and type CV_32FC2.
* pyr_scale: parameter specifying the image scale (\<1) used to build pyramids for each image.
    * pyr_scale=0.5 means a classical pyramid, where each next layer is half the size of the previous one.
* levels: number of pyramid layers including the initial image; levels=1 means that no extra layers are created and only the original images are used.
* winsize: averaging window size.
    * larger values increase the algorithm's robustness to image noise and give more chances for fast motion detection, but yield a more blurred motion field.
* iterations: number of iterations the algorithm does at each pyramid level.
* poly_n: size of the pixel neighborhood used to find the polynomial expansion in each pixel.
    * larger values mean that the image will be approximated with smoother surfaces, yielding a more robust algorithm and a more blurred motion field; typically poly_n=5 or 7.
* poly_sigma: standard deviation of the Gaussian that is used to smooth derivatives used as a basis for the polynomial expansion; for poly_n=5, you can set poly_sigma=1.1, for poly_n=7, a good value would be poly_sigma=1.5.
* flags: operation flags; passing 0 applies no optional flags.

In [None]:
import cv2
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
cap = cv2.VideoCapture(0)

# Grab the very first frame of the stream
ret, frame1 = cap.read()
prvsImg = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)

# hsv mask
hsv_mask = np.zeros_like(frame1)
hsv_mask[:, :, 1] = 255

while True:
    ret, frame2 = cap.read()
    if not ret:
        break  # stop when no frame could be read

    nextImg = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)

    # positional arguments: pyr_scale=0.5, levels=3, winsize=15, iterations=3, poly_n=5, poly_sigma=1.2, flags=0
    flow = cv2.calcOpticalFlowFarneback(prvsImg, nextImg, None, 0.5, 3, 15, 3, 5, 1.2, 0)

    # Color the channels based on the angle of travel
    # Pay close attention to your video: the direction of flow determines the color!
    # flow[:, :, 0] and flow[:, :, 1] are the x and y components of the flow vector at every pixel
    mag, ang = cv2.cartToPolar(flow[:, :, 0], flow[:, :, 1], angleInDegrees = True)
    hsv_mask[:, :, 0] = ang / 2  # halve 0-360 degrees to fit OpenCV's 0-179 hue range
    hsv_mask[:, :, 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)

    # Convert back to BGR to show with imshow from cv
    bgr = cv2.cvtColor(hsv_mask, cv2.COLOR_HSV2BGR)
    cv2.imshow('frame', bgr)

    k = cv2.waitKey(30) & 0xff
    if k == 27:
        break
    
    # Set the previous image to the current image for the next loop iteration
    prvsImg = nextImg

cap.release()
cv2.destroyAllWindows()
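
The Cartesian-to-polar step and the HSV colouring used in the loop above can be reproduced with plain NumPy on a tiny hand-made flow field. This is a sketch of the same arithmetic, not a drop-in replacement for `cv2.cartToPolar` or `cv2.normalize`; the helper name `flow_to_hsv_channels` is hypothetical.

```python
import numpy as np

def flow_to_hsv_channels(flow):
    """Convert an (H, W, 2) flow field to magnitude, angle, hue and value."""
    dx, dy = flow[..., 0], flow[..., 1]
    mag = np.hypot(dx, dy)                         # vector length
    ang = np.degrees(np.arctan2(dy, dx)) % 360.0   # direction in [0, 360)
    hue = (ang / 2).astype(np.uint8)               # 8-bit hue range is 0..179
    # Min-max scale the magnitude to 0..255 for the value channel.
    span = mag.max() - mag.min()
    value = np.zeros_like(mag) if span == 0 else (mag - mag.min()) / span * 255
    return mag, ang, hue, value.astype(np.uint8)

# 2x2 flow field: one fast diagonal vector, two slow ones, one static pixel.
flow = np.array([[[3.0, 4.0], [0.0, 1.0]],
                 [[-1.0, 0.0], [0.0, 0.0]]])
mag, ang, hue, value = flow_to_hsv_channels(flow)
print(mag)
print(hue)
print(value)
```

The static pixel ends up with value 0, which is why regions without motion appear black. Note that image y coordinates grow downwards, so angles are measured clockwise on screen.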

## <span style="color: yellowgreen;">2. </span>MeanShift and CAMShift tracking.

### <span style="color: royalblue;">a) </span>MeanShift:

Some of the most basic tracking methods are MeanShift and CAMShift.

Let’s first describe the general MeanShift algorithm, then learn how to apply it for image tracking. Afterwards we will learn how to extend MeanShift into CAMShift (Continuously Adaptive MeanShift).

Imagine we have a set of points that we want to assign to clusters. We take all our data points (red) and stack a blue point on top of each one, so the red and blue points overlap completely before the MeanShift algorithm starts and the red points are hidden underneath. The direction to the closest cluster centroid is determined by where most of the nearby points are. So at each iteration, each blue point moves closer to where the most points are, which is, or will lead to, the cluster center.

At the end of iteration 1, all the blue points have moved towards the clusters. Here it appears there will be either 3 or 4 clusters, and the bottom clusters have begun to converge. By the third iteration MeanShift has found 3 clusters. After subsequent iterations the cluster means stop moving: all clusters have converged.

MeanShift won’t always detect what may seem more “reasonable”: in the previous situation, 4 clusters might have been the more reasonable result.

For tracking, MeanShift can be given a target to track, calculate the color histogram of the target area, and then keep sliding the tracking window to the closest match (the cluster center).

Just using MeanShift won’t change the window size if the target moves away from or towards the camera. We can use CAMShift to update the size of the window.
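
The clustering idea can be sketched in a few lines of NumPy. This is a minimal flat-kernel version under stated assumptions: the function `mean_shift`, the bandwidth of 2.0 and the two hand-made clusters are illustrative choices, not something OpenCV provides in this form.

```python
import numpy as np

def mean_shift(points, bandwidth, n_iter=10):
    """Move a copy of each point to the mean of the data within `bandwidth`."""
    shifted = points.copy()          # the "blue" points stacked on the data
    for _ in range(n_iter):
        for i, p in enumerate(shifted):
            # Data points ("red") that lie within the bandwidth of this point.
            dist = np.linalg.norm(points - p, axis=1)
            shifted[i] = points[dist <= bandwidth].mean(axis=0)
    return shifted

# Two small, well-separated clusters: 3x3 grids around (0, 0) and (10, 10).
offsets = np.array([(dx, dy) for dx in (-0.5, 0.0, 0.5)
                             for dy in (-0.5, 0.0, 0.5)])
points = np.vstack([offsets + [0.0, 0.0], offsets + [10.0, 10.0]])

modes = mean_shift(points, bandwidth=2.0)
print(np.unique(np.round(modes, 3), axis=0))  # one row per cluster found
```

The number of clusters is never specified; it falls out of the bandwidth, which is also why the result may not match the grouping that looks most reasonable to a person.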

In [None]:
import cv2
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
cap = cv2.VideoCapture(0)

ret, frame = cap.read()

# FACE TRACKING
# object detection to grab the face location, treat it as a bunch of pixels and apply MeanShift tracking on that face
face_cascade = cv2.CascadeClassifier('../Computer-Vision-with-Python/DATA/haarcascades/haarcascade_frontalface_default.xml')
face_rects = face_cascade.detectMultiScale(frame)

# Convert this list of a single array to a tuple of (x,y,w,h) - just tracking the 1st face
(face_x, face_y, w, h) = tuple(face_rects[0])
track_window = (face_x, face_y, w, h)

# ROI setup
roi = frame[face_y:face_y+h, face_x:face_x+w]

# hsv color mapping
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)

# find a histogram to backproject the target on each frame in order to calculate that MeanShift
roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])

# normalize the histogram to array values given a min of 0 and max of 255
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

# setting up the termination criteria - stop after 10 iterations or once the window moves by less than 1 pt
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ret, frame = cap.read()

    if ret:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

        # calculate the backprojection based on the roi_hist that we created
        dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)

        # Apply meanshift to get the new coordinates of the rectangle
        ret, track_window = cv2.meanShift(dst, track_window, term_crit)

        # draw a new rectangle on the image based on the updated track_window
        x, y, w, h = track_window
        img2 = cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 5)

        cv2.imshow('tracking', img2)

        k = cv2.waitKey(1) & 0xFF

        if k == 27:
            break

    else:
        break

cap.release()
cv2.destroyAllWindows()
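
What the histogram, the back projection and the window update do in the cell above can be imitated on a synthetic hue image, without a camera or a face detector. The sketch below is a simplified NumPy stand-in; the helper names and the centroid-based window update are assumptions for illustration and do not reproduce `cv2.meanShift` exactly.

```python
import numpy as np

def hue_histogram(hue_roi, bins=180):
    """Hue histogram of the target, min-max scaled to 0..255."""
    hist = np.bincount(hue_roi.ravel(), minlength=bins).astype(np.float32)
    return (hist - hist.min()) / (hist.max() - hist.min()) * 255

def back_project(hue_img, hist):
    """Replace every pixel by the histogram score of its hue."""
    return hist[hue_img]

def mean_shift_window(weights, window, max_iter=10):
    """Slide the window to the centroid of the weights until it stops moving."""
    x, y, w, h = window
    for _ in range(max_iter):
        patch = weights[y:y + h, x:x + w]
        total = patch.sum()
        if total == 0:
            break  # nothing that looks like the target inside the window
        rows, cols = np.mgrid[y:y + h, x:x + w]
        cx = (cols * patch).sum() / total
        cy = (rows * patch).sum() / total
        # Re-centre the window on the centroid, clamped to the image bounds.
        new_x = int(np.floor(cx - (w - 1) / 2 + 0.5))
        new_y = int(np.floor(cy - (h - 1) / 2 + 0.5))
        new_x = min(max(new_x, 0), weights.shape[1] - w)
        new_y = min(max(new_y, 0), weights.shape[0] - h)
        if (new_x, new_y) == (x, y):
            break
        x, y = new_x, new_y
    return x, y, w, h

# Background hue 10 with a 10x10 target of hue 100 at x=30, y=20.
hue_img = np.full((60, 80), 10, dtype=np.uint8)
hue_img[20:30, 30:40] = 100

hist = hue_histogram(hue_img[20:30, 30:40])    # learned from the target ROI
backproj = back_project(hue_img, hist)

# Start with a window that only partly overlaps the target.
window = mean_shift_window(backproj, (25, 15, 10, 10))
print(window)
```

The window size never changes here, which is the MeanShift limitation that CAMShift addresses.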

### <span style="color: royalblue;">b) </span>CAMShift:

In [None]:
# camera index 1 selects a second camera; use 0 for the default camera
cap = cv2.VideoCapture(1)

ret, frame = cap.read()

# FACE TRACKING
# object detection to grab the face location, treat it as a bunch of pixels and apply CamShift tracking on that face
face_cascade = cv2.CascadeClassifier('../Computer-Vision-with-Python/DATA/haarcascades/haarcascade_frontalface_default.xml')
face_rects = face_cascade.detectMultiScale(frame) 

# Convert this list of a single array to a tuple of (x,y,w,h) - just tracking the 1st face
(face_x, face_y, w, h) = tuple(face_rects[0])
track_window = (face_x, face_y, w, h)

# ROI setup
roi = frame[face_y:face_y+h, face_x:face_x+w]

# hsv color mapping
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)

# find a histogram to backproject the target on each frame in order to calculate that MeanShift
roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])

# normalize the histogram to array values given a min of 0 and max of 255
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

# setting up the termination criteria - stop after 10 iterations or once the window moves by less than 1 pt
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ret, frame = cap.read()

    if ret:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

        # calculate the backprojection based on the roi_hist that we created
        dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)

    ######################################################################################################

        # Apply CamShift to get the rotated rectangle and the updated track window
        ret, track_window = cv2.CamShift(dst, track_window, term_crit)

        # draw a new rotated rectangle on the image based on its corner points
        pts = cv2.boxPoints(ret)
        pts = pts.astype(np.int32)  # np.int0 was removed in NumPy 2.0

        img2 = cv2.polylines(frame, [pts], True, (0, 0, 255), 5)

    ######################################################################################################

        cv2.imshow('tracking', img2)

        k = cv2.waitKey(1) & 0xFF

        if k == 27:
            break

    else:
        break

cap.release()
cv2.destroyAllWindows()
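
CamShift returns a rotated rectangle of the form `((cx, cy), (w, h), angle)`, and the loop above turns it into four corner points before drawing. The NumPy sketch below shows that geometry on its own; `rotated_box_corners` is a hypothetical helper, and its corner order and angle convention are assumptions that may differ from `cv2.boxPoints`.

```python
import numpy as np

def rotated_box_corners(rot_rect):
    """Corner points of a rotated rectangle ((cx, cy), (w, h), angle_deg)."""
    (cx, cy), (w, h), angle_deg = rot_rect
    theta = np.radians(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    # Corner offsets from the centre of the unrotated box.
    half = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2.0
    # Rotate each offset, then translate to the box centre.
    return half @ rot.T + np.array([cx, cy])

# Axis-aligned box: 4 wide, 2 high, centred at (5, 5).
corners = rotated_box_corners(((5.0, 5.0), (4.0, 2.0), 0.0))
pts = np.round(corners).astype(np.int32)   # integer pixels for drawing
print(pts.tolist())

# The same box rotated by 90 degrees swaps its horizontal and vertical extent.
rotated = rotated_box_corners(((5.0, 5.0), (4.0, 2.0), 90.0))
extent = rotated.max(axis=0) - rotated.min(axis=0)
print(extent)
```

Because both the size and the angle of the rectangle are re-estimated every frame, the drawn polygon can grow, shrink and tilt with the target.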

## <span style="color: yellowgreen;">3. </span>Tracking APIs.

There are many Object Tracking methods. Fortunately, many have been designed as simple API calls with OpenCV. Let’s explore a few of these easy to use Object Tracking APIs.

BOOSTING TRACKER:
- Based on the AdaBoost algorithm (the same underlying algorithm that the Haar cascade face detector uses).
- Evaluation occurs across multiple frames.
- Pros: 
    - Very well known and studied algorithm.
- Cons:
    - Doesn’t know when tracking has failed.
    - Much better techniques available!

MIL TRACKER:
- Multiple Instance Learning
- Similar to BOOSTING, but considers a neighborhood of points around the current location to create multiple instances.
- Check project page for more details.
- Pros: 
    - Good performance and doesn’t drift as much as BOOSTING.
- Cons:
    - Failure to track an object may not be reported back.
    - Can’t recover from full obstruction.

KCF TRACKER:
- Kernelized Correlation Filters 
- Exploits some properties of the MIL Tracker and the fact that many data points will overlap, leading to more accurate and faster tracking.
- Pros: 
    - Better than MIL and BOOSTING.
    - Great first choice!
- Cons:
    - Can not recover from full obstruction of object.

TLD TRACKER:
- Tracking, Learning, and Detection
- The tracker follows the object from frame to frame. 
- The detector localizes all appearances that have been observed so far and corrects the tracker if necessary. 
- The learning estimates detector’s errors and updates it to avoid these errors in the future.
- Pros: 
    - Good at tracking even with obstruction in frames.
    - Tracks well under large changes in scale.
- Cons:
    - Can provide many false positives.

MedianFlow TRACKER:
- Internally, this tracker tracks the object in both forward and backward directions in time and measures the discrepancies between these two trajectories. 
- Pros: 
    - Very good at reporting failed tracking.
    - Works well with predictable motion.
- Cons:
    - Fails under large motion (fast moving objects)
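
The forward-backward check that MedianFlow relies on can be sketched with two toy tracking functions. Everything below is illustrative: `forward_backward_error`, the simulated trackers and the median threshold are assumptions that show the idea, not OpenCV's internal implementation.

```python
import numpy as np

def forward_backward_error(points, track_forward, track_backward):
    """Track points forward, then backward, and measure the discrepancy."""
    forward = track_forward(points)
    back = track_backward(forward)
    return np.linalg.norm(back - points, axis=1)

points = np.array([[10.0, 10.0], [20.0, 15.0], [30.0, 5.0]])
shift = np.array([2.0, -1.0])

def track_forward(p):
    # Simulated tracker: every point moves by the same shift.
    return p + shift

def track_backward(p):
    # Simulated reverse tracker that drifts on the second point.
    q = p - shift
    q[1] += np.array([3.0, 4.0])
    return q

errors = forward_backward_error(points, track_forward, track_backward)
reliable = errors <= np.median(errors)   # keep the most consistent points
print(errors)
print(reliable)
```

A large discrepancy between the two trajectories is a direct signal that a point was lost, which is why this tracker is good at reporting failure.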

In [None]:
import cv2

In [None]:
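def ask_for_tracker():
    print("Welcome! What Tracker API would you like to use?")
    print("Enter 0 for BOOSTING: ")
    print("Enter 1 for MIL: ")
    print("Enter 2 for KCF: ")
    print("Enter 3 for TLD: ")
    print("Enter 4 for MEDIANFLOW: ")
    choice = input("Please select your tracker: ")

    # Note: in OpenCV 4.5.1+ (opencv-contrib-python) BOOSTING, TLD and MEDIANFLOW
    # may only be available under cv2.legacy, e.g. cv2.legacy.TrackerBoosting_create()
    if choice == '0':
        tracker = cv2.TrackerBoosting_create()
    elif choice == '1':
        tracker = cv2.TrackerMIL_create()
    elif choice == '2':
        tracker = cv2.TrackerKCF_create()
    elif choice == '3':
        tracker = cv2.TrackerTLD_create()
    elif choice == '4':
        tracker = cv2.TrackerMedianFlow_create()
    else:
        # Avoid returning an unassigned variable on invalid input
        raise ValueError(f"Unknown tracker choice: {choice!r}")

    return tracker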

In [None]:
tracker = ask_for_tracker()
tracker

In [None]:
# the class name; parsing str(tracker) depends on the repr format of the OpenCV version
type(tracker).__name__

In [None]:
tracker = ask_for_tracker()
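# use the class name as the label; parsing str(tracker) depends on the repr format of the OpenCV version
tracker_name = type(tracker).__name__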

# Read video
cap = cv2.VideoCapture(0)

# Read first frame.
ret, frame = cap.read()


# Special function allows us to draw on the very first frame our desired ROI
roi = cv2.selectROI(frame, False)

# Initialize tracker with first frame and bounding box
ret = tracker.init(frame, roi)

while True:
    # Read a new frame
    ret, frame = cap.read()
    if not ret:
        break  # stop when no frame could be read

    # Update tracker
    success, roi = tracker.update(frame)
    
    # roi variable is a tuple of 4 floats
    # We need each value and we need them as integers
    (x,y,w,h) = tuple(map(int,roi))
    
    # Draw Rectangle as Tracker moves
    if success:
        # Tracking success
        p1 = (x, y)
        p2 = (x+w, y+h)
        cv2.rectangle(frame, p1, p2, (0,255,0), 3)
    else:
        # Tracking failure
        cv2.putText(frame, "Failure to Detect Tracking!!", (100,200), cv2.FONT_HERSHEY_SIMPLEX, 1,(0,0,255),3)

    # Display tracker type on frame
    cv2.putText(frame, tracker_name, (20,400), cv2.FONT_HERSHEY_SIMPLEX, 1, (0,255,0),3)

    # Display result
    cv2.imshow(tracker_name, frame)

    # Exit if ESC pressed
    k = cv2.waitKey(1) & 0xff
    if k == 27:
        break
        
cap.release()
cv2.destroyAllWindows()