# Object tracking using CamShift

- [CamShift](https://docs.opencv.org/4.x/d7/d00/tutorial_meanshift.html) is an algorithm for tracking an object across a sequence of frames **based on its color distribution**.
- It assumes that
  1. we know the object's current position (e.g. at the start of the video),
  2. the object of interest has a different color distribution than the background.
- The tracking is based on [histogram backprojection](https://docs.opencv.org/4.x/dc/df6/tutorial_py_histogram_backprojection.html). 
  - In short, every pixel in the input image is replaced with its probability under the object's histogram.
  - The histogram is computed from the hue component of the [HSV color space](https://en.wikipedia.org/wiki/HSL_and_HSV) extracted from the object.
- The backprojection step produces a "heatmap" image (see the OpenCV documentation above) in which
  - *high values* mean that the object is likely present, since they correspond to colors that are part of the object (and therefore its histogram),
  - *low values* mean that the object is likely not present, since they correspond to background colors and those are, by assumption, different from the object.
- The object's position is updated by the [mean shift](https://en.wikipedia.org/wiki/Mean_shift) algorithm.
  - The idea is to find the "center of gravity", i.e. the average coordinate of pixels in the current object bounding box, where each coordinate's contribution to the average is weighted by the backprojection at that position.
  - The center of gravity $(x_c, y_c)$ computation is based on calculating zeroth and first order moments according to the formula
    $$
    m_{k,l} = \sum_{i=i_1}^{i_2}{ \sum_{j=j_1}^{j_2}{ \textrm{bp}(i,j)\cdot i^k\cdot j^l } }
    $$
    so that
    $$
    x_c = \frac{m_{0,1}}{m_{0,0}} \qquad y_c = \frac{m_{1,0}}{m_{0,0}}
    $$
    where
    - $\textrm{bp}(i,j)$ is the value of the histogram backprojection image at row $i$ ($y$ coordinate) and column $j$ ($x$ coordinate),
    - $j_1, i_1, j_2, i_2$ are the coordinates of the object bounding box in the "xyxy" format (top-left and bottom-right corners).
- When we know the *new* center of gravity $(x_c, y_c)$, we shift the object's bounding box coordinates $j_1, i_1, j_2, i_2$ based on the difference from the *previous* center of gravity $(x_c^\textrm{prev}, y_c^\textrm{prev})$, e.g. for $j_1$
  $$
  j_1 \leftarrow j_1 + x_c - x_c^\textrm{prev} \\
  \ldots
  $$

  <figure class="image">
  <img src="../figures/camshift-expected_output.png" alt="" style="width: 6.4in;"/>
  <figcaption>Figure 1: Expected output in the first frame of the video. The green rectangle denotes initial bounding box provided by the user.</figcaption>
</figure>
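The moment formulas and the box update above can be tried on a tiny synthetic heatmap. The following is only a minimal NumPy sketch, not the reference solution: the helper names `center_of_gravity` and `shift_box` are hypothetical, and the box is assumed to be integer-valued and half-open (rows `i1..i2-1`, columns `j1..j2-1`), which differs slightly from the inclusive sums in the formula.

```python
import numpy as np


def center_of_gravity(bp: np.ndarray, xyxy: tuple[int, int, int, int]) -> tuple[float, float]:
    """Return (x_c, y_c) of the heatmap `bp` inside an integer xyxy box."""
    j1, i1, j2, i2 = xyxy
    window = bp[i1:i2, j1:j2].astype(np.float64)
    # Row (i) and column (j) coordinates of every pixel in the window.
    ii, jj = np.mgrid[i1:i2, j1:j2]
    m00 = window.sum()
    if m00 == 0:
        # No evidence of the object in the window: fall back to the box center.
        return (j1 + j2 - 1) / 2, (i1 + i2 - 1) / 2
    m01 = (window * jj).sum()  # k=0, l=1: weighted column indices (x)
    m10 = (window * ii).sum()  # k=1, l=0: weighted row indices (y)
    return float(m01 / m00), float(m10 / m00)


def shift_box(bp: np.ndarray, xyxy: tuple[int, int, int, int]) -> tuple[float, float, float, float]:
    """Move the box by the difference between the new and the previous center."""
    j1, i1, j2, i2 = xyxy
    # Previous center: mean pixel coordinate of the half-open box.
    x_prev, y_prev = (j1 + j2 - 1) / 2, (i1 + i2 - 1) / 2
    x_c, y_c = center_of_gravity(bp, xyxy)
    dx, dy = x_c - x_prev, y_c - y_prev
    return j1 + dx, i1 + dy, j2 + dx, i2 + dy


# Synthetic heatmap: a 3x3 blob of ones centered at x=8, y=7.
bp = np.zeros((16, 16))
bp[6:9, 7:10] = 1.0

box = (2, 2, 11, 11)          # box center is at x=6, y=6
new_box = shift_box(bp, box)  # the box moves 2 px right and 1 px down
print(new_box)
```

Because the whole blob lies inside the initial box, a single step already centers the box on it. In a real frame the shifted box would have to be rounded and kept inside the image before the next iteration.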

In [1]:
import cv2 as cv
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Task 1: implement the `init_camshift` function

- The function takes a BGR image (the entire frame from the video) as its argument.
- It will
  1. extract a region of interest (ROI) based on the initial coordinates provided by the user,
  2. convert the BGR image of the ROI to HSV,
  3. compute and return the histogram of the hue component of the ROI.
- The object's initial position is represented as a quadruple `x1, y1, x2, y2`, in which
  - `x1, y1` is the top-left corner,
  - `x2, y2` is the bottom-right corner of the bounding box.
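The box convention and the role of the hue histogram can be illustrated on synthetic data. This is a hedged sketch, not the expected solution: it uses plain NumPy on an already extracted hue channel and assumes one histogram bin per hue value, whereas in the task itself OpenCV functions such as `cv.cvtColor` and `cv.calcHist` are likely the more natural tools. The last line previews how such a histogram is later used for backprojection.

```python
import numpy as np

# Synthetic 8-bit hue channel (OpenCV stores 8-bit hue in the range 0..179).
hue = np.full((6, 8), 100, dtype=np.uint8)  # background hue
hue[2:5, 3:6] = 20                          # "object" hue

# xyxy box -> NumPy slice: rows are y, columns are x.
x1, y1, x2, y2 = 3, 2, 6, 5
roi = hue[y1:y2, x1:x2]

# Hue histogram of the ROI with one bin per hue value, scaled so the peak is 1.
hist = np.bincount(roi.ravel(), minlength=180).astype(np.float64)
hist /= hist.max()

# Backprojection as a lookup: every pixel receives the histogram value of its hue.
heatmap = hist[hue]
print(heatmap)
```

Note that the slice is `hue[y1:y2, x1:x2]`, not `hue[x1:x2, y1:y2]`: swapping the axes is a common source of errors when moving between "xyxy" boxes and array indexing.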

In [2]:
def init_camshift(
    bgr: np.ndarray,
    xyxy: tuple[float, float, float, float]
) -> np.ndarray:
    ########################################
    # TODO: implement

    raise NotImplementedError

    # ENDTODO
    ########################################

# Task 2: implement the `camshift_step` function

- The `camshift_step` function implements a single step of the CamShift algorithm.
- It has four inputs:
  1. `bgr` is the next frame of the video in the BGR format,
  2. `hist` is the object's hue histogram obtained by `init_camshift` in the first step,
  3. `xyxy` is the current object's position represented as bounding box coordinates in the "xyxy" format,
  4. `steps` is the number of mean shift iterations to run for each new video frame.
- It will return one output:
  1. `xyxy`, which will represent the updated object's position.
- The function should
  1. extract ROI from the input BGR image,
  2. backproject the object's histogram onto the ROI to produce the "heatmap" mentioned above,
  3. calculate the heatmap's center of gravity,
  4. update the position based on how it differs from the previous center of gravity (the previous center of gravity is simply the center of the bounding box passed into the function),
  5. be aware of image borders and prevent exceptions arising from invalid image coordinates.
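One possible way to handle the image borders mentioned in the last point is to push the box back inside the frame before slicing. The helper below is a hypothetical illustration (the name `clip_box` and the choice to shift rather than shrink the box are assumptions, not part of the assignment), and it assumes the box is no larger than the image.

```python
def clip_box(
    xyxy: tuple[float, float, float, float],
    shape: tuple[int, ...],
) -> tuple[float, float, float, float]:
    """Shift (not shrink) an xyxy box so that it lies inside an image.

    `shape` is the NumPy shape of the image, i.e. (height, width, ...).
    """
    height, width = shape[:2]
    x1, y1, x2, y2 = xyxy
    w, h = x2 - x1, y2 - y1
    # Clamp the top-left corner so that the whole box fits into the image.
    x1 = min(max(x1, 0), width - w)
    y1 = min(max(y1, 0), height - h)
    return x1, y1, x1 + w, y1 + h


# A box sticking out past the left and right borders of a 100x200 image.
print(clip_box((-5, 10, 95, 50), (100, 200, 3)))
print(clip_box((150.5, 20, 250.5, 60), (100, 200, 3)))
```

Keeping the box size constant means the ROI always has the same shape. Shrinking the box at the border would also avoid invalid coordinates, but the window would then change size from frame to frame.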

In [3]:
def camshift_step(
    bgr: np.ndarray,
    hist: np.ndarray,
    xyxy: tuple[float, float, float, float],
    steps: int = 1
) -> tuple[float, float, float, float]:
    ########################################
    # TODO: implement

    raise NotImplementedError

    # ENDTODO
    ########################################

    return xyxy

# Run the tracking loop

- If the functions above are implemented correctly, you can run the following code and it should successfully track the object.
- You only need to provide an initial position for the object. It should be a bounding box covering the object entirely with as few background pixels as possible.
- The code will probably not work in Google Colab or other cloud services, because `cv.imshow` needs to open a window on the local machine. Run it locally instead.
- If you use Google Colab or similar, replace the OpenCV display-related parts with Matplotlib. The code will no longer be interactive, however.
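For the Matplotlib route, a frame could be displayed roughly as follows. This is a sketch under assumptions: `bgr_to_rgb` and `show_frame` are hypothetical helper names, and Matplotlib is imported inside `show_frame` only so that the conversion helper stays usable on its own. The key detail is that OpenCV frames are in BGR order while Matplotlib expects RGB.

```python
import numpy as np


def bgr_to_rgb(bgr: np.ndarray) -> np.ndarray:
    """Reverse the channel order, since Matplotlib expects RGB images."""
    return bgr[..., ::-1]


def show_frame(bgr: np.ndarray, box: tuple[float, float, float, float]) -> None:
    """Display one frame with the tracked bounding box drawn on top."""
    import matplotlib.pyplot as plt
    from matplotlib.patches import Rectangle

    x1, y1, x2, y2 = box
    fig, ax = plt.subplots()
    ax.imshow(bgr_to_rgb(bgr))
    # Rectangle takes the top-left corner, width and height.
    ax.add_patch(Rectangle((x1, y1), x2 - x1, y2 - y1, fill=False, edgecolor='lime'))
    ax.set_axis_off()
    plt.show()


# A pure-blue pixel in BGR order becomes (0, 0, 255) in RGB order.
pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)
print(bgr_to_rgb(pixel))
```

In the tracking loop, `show_frame(bgr, box)` would take the place of the `cv.line`, `cv.imshow` and `cv.waitKey` calls. Displaying every frame this way is slow, so showing only every n-th frame may be more practical.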

In [None]:
cap = cv.VideoCapture('../data/cup.mp4')

x1, x2 = ...
y1, y2 = ...

ret, bgr = cap.read()
h = init_camshift(bgr, ...)
h

In [None]:
with sns.axes_style(style='darkgrid'):
    plt.plot(h);

In [6]:
# There should be no need to modify the following code

cap = cv.VideoCapture('../data/cup.mp4')

x1, x2 = 280, 380
y1, y2 = 160, 295
box = x1, y1, x2, y2

ret, bgr = cap.read()
h = init_camshift(bgr, box)

try:
    while True:
        ret, bgr = cap.read()
        if not ret:
            break
        
        box = camshift_step(bgr, h, box, steps=3)
        
        # Draw the tracking results
        j1, i1, j2, i2 = [int(0.5 + v) for v in box]
        cv.line(bgr, (j1, i1), (j2, i1), (0, 255, 0))
        cv.line(bgr, (j2, i1), (j2, i2), (0, 255, 0))
        cv.line(bgr, (j2, i2), (j1, i2), (0, 255, 0))
        cv.line(bgr, (j1, i2), (j1, i1), (0, 255, 0))
        cv.imshow('camshift', bgr)

        key = cv.waitKey(0)  # hit any key to continue to the next frame
        if key == 27:  # hit escape to break
            break
finally:
    cap.release()
    cv.destroyAllWindows()