<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>

<i>Licensed under the MIT License.</i>

# Quickstart: Web Cam Multi-Object Tracking


Multi-Object tracking is the canonical computer vision task of associating multiple detected objects from one frame to another within a video or a sequence of images.
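
To make "associating detected objects from one frame to another" concrete, here is a minimal sketch of a greedy association step based on box overlap (IoU). It only illustrates the general idea and is not how FairMOT or this repository performs association; the function names, the `(left, top, right, bottom)` box format and the `iou_threshold` value are all assumptions made for this example.

```python
from itertools import count


def iou(a, b):
    """Intersection-over-union of two boxes given as (left, top, right, bottom)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(prev_tracks, detections, next_id, iou_threshold=0.3):
    """Greedily match detections to the previous frame's tracks.

    prev_tracks: {track_id: box} from the previous frame (hypothetical format).
    detections:  list of boxes detected in the current frame.
    next_id:     iterator that yields unused track IDs.
    Returns {track_id: box} for the current frame.
    """
    # Score every (track, detection) pair and visit the best overlaps first.
    pairs = sorted(
        (
            (iou(box, det), track_id, det_index)
            for track_id, box in prev_tracks.items()
            for det_index, det in enumerate(detections)
        ),
        reverse=True,
    )
    current, used_tracks, used_dets = {}, set(), set()
    for score, track_id, det_index in pairs:
        if score < iou_threshold:
            break  # remaining pairs overlap too little to be the same object
        if track_id in used_tracks or det_index in used_dets:
            continue
        current[track_id] = detections[det_index]
        used_tracks.add(track_id)
        used_dets.add(det_index)
    # Any detection left unmatched starts a new track with a fresh ID.
    for det_index, det in enumerate(detections):
        if det_index not in used_dets:
            current[next(next_id)] = det
    return current


ids = count(1)
frame1 = associate({}, [(10, 10, 50, 50), (100, 100, 150, 150)], ids)
frame2 = associate(
    frame1,
    [(102, 101, 152, 151), (12, 11, 52, 51), (200, 200, 240, 240)],
    ids,
)
print(frame1)
print(frame2)
```

In the second frame the two slightly shifted boxes keep their IDs from the first frame, while the box that appears for the first time receives a new ID.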

This notebook shows a simple example of loading a pretrained tracking model for multi-object tracking from a webcam stream using the `torchvision` package. In particular, we will use FairMOT, a one-shot multi-object tracking model that jointly detects objects and learns their re-ID features, developed by MSR Asia and others in this [repo](https://github.com/ifzhang/FairMOT).
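
As a rough illustration of why re-ID features help, the sketch below matches detections to existing tracks by the cosine similarity of their appearance embeddings. This is a simplified stand-in, not FairMOT's actual matching procedure: the function names, the `min_similarity` threshold and the toy 3-dimensional vectors are assumptions for this example, and real embeddings are much higher-dimensional.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


def match_by_appearance(track_embeddings, detection_embeddings, min_similarity=0.5):
    """Greedy one-to-one matching of detections to tracks by appearance.

    track_embeddings:     {track_id: embedding} (hypothetical format).
    detection_embeddings: list of embeddings for the current frame.
    Returns {detection_index: track_id} for pairs above min_similarity.
    """
    candidates = sorted(
        (
            (cosine_similarity(track_emb, det_emb), track_id, det_index)
            for track_id, track_emb in track_embeddings.items()
            for det_index, det_emb in enumerate(detection_embeddings)
        ),
        reverse=True,
    )
    matches, used_tracks = {}, set()
    for similarity, track_id, det_index in candidates:
        if similarity < min_similarity:
            break  # everything after this looks too different
        if track_id not in used_tracks and det_index not in matches:
            matches[det_index] = track_id
            used_tracks.add(track_id)
    return matches


known_tracks = {7: [1.0, 0.0, 0.0], 9: [0.0, 1.0, 0.0]}
new_detections = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.1], [0.0, 0.0, 1.0]]
matches = match_by_appearance(known_tracks, new_detections)
print(matches)
```

The third detection resembles neither known track, so it stays unmatched and would start a new track in a full tracker.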

To understand the basics of multi-object tracking, please visit our [FAQ](FAQ.md). For more details about the underlying technology of object tracking tasks, including fine-tuning, please see our [training introduction notebook](01_training_introduction.ipynb).

## Prerequisite for Webcam Example
This notebook assumes you have **a webcam** connected to your machine.  We use the `ipywebrtc` module to show the webcam widget in the notebook. Currently, the widget works on **Chrome** and **Firefox**. For more details about the widget, please visit `ipywebrtc` [github](https://github.com/maartenbreddels/ipywebrtc) or [documentation](https://ipywebrtc.readthedocs.io/en/latest/).

## Initialization

In [3]:
# Regular Python libraries
import io
import os
import sys
import time
import urllib.request
import matplotlib.pyplot as plt

# IPython
import scrapbook as sb
from ipywebrtc import CameraStream, ImageRecorder, VideoStream, VideoRecorder
from ipywidgets import HBox, Layout, widgets, Widget, VBox
from ipywidgets import Video #KIP
# Image
from PIL import Image

# TorchVision
import torchvision
from torchvision import transforms as T

# KIP: utils_cv
sys.path.append("../../")
from utils_cv.multi_object_tracking.display_with_bb import process_video, process_images
from utils_cv.multi_object_tracking.file_format import func_extractBBoxes_fromXML, func_convertLTRB_toXYWH
from utils_cv.multi_object_tracking.convert_seq_vid import vid_to_seq, seq_to_vid
from utils_cv.multi_object_tracking.baseline import baseline_algorithm

# utils_cv
# sys.path.append("../../")
from utils_cv.common.data import data_path
from utils_cv.common.gpu import which_processor, is_windows
from utils_cv.detection.data import coco_labels
from utils_cv.detection.model import DetectionLearner
from utils_cv.detection.plot import PlotSettings, plot_boxes

# Change matplotlib backend so that plots are shown for windows
if is_windows():
    plt.switch_backend('TkAgg')

print(f"TorchVision: {torchvision.__version__}")
which_processor()

TorchVision: 0.4.0a0+6b959ee
Torch is using GPU: Tesla K80


This shows your machine's GPUs (if it has any) and the computing device `torch/torchvision` is using.

In [2]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

# Load Pretrained Model

We first load our tracking model.
#TODO add about FairMOT model architecture, detection and re-id head).
#TODO add about dataset pretrained on.

In [3]:
model_path = "./model/baseline_pedestrian" #TODO: check with Casey, or tell user to download from FairMOT
# NOTE: the assignment below overrides the path above; keep only the model you intend to load
model_path = "./model/finetuned_fridgeObjects"

Next, we pass the model path to our `TrackingLearner` object, which loads the tracking model.

In [4]:
tracker = TrackingLearner(
    load_model=model_path   
)

# Object Tracking

## From Video File

To illustrate a simple example of tracking objects, we use a video of a person drinking.

In [4]:
# Download an example video file
VID_URL = "https://cvbp.blob.core.windows.net/public/datasets/action_recognition/drinking.mp4"
vid_path = os.path.join(data_path(), "example_vid.mp4")
urllib.request.urlretrieve(VID_URL, vid_path)

from ipywidgets import Video
video = Video.from_file(vid_path)
video

Video(value=b'\x00\x00\x00 ftypisom\x00\x00\x02\x00isomiso2avc1mp41\x00\x00\x00\x08free\x00RP\xa8mdat\x00\x00\…

Using the tracker's `predict()` method, we ask the tracking model to detect the objects in each frame of this video and to associate them from one frame to the next. The method returns, for each frame, the bounding boxes around the identified objects together with their ID numbers; each newly detected object is assigned a new ID.
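
Per-frame results like these are often regrouped into one trajectory per object. The sketch below shows one way to do that, assuming a hypothetical result layout of `{frame_index: [(track_id, (left, top, right, bottom)), ...]}`; the actual structure returned by `predict()` may differ, and the helper name is made up for this example.

```python
from collections import defaultdict


def trajectories_from_frames(frame_results):
    """Regroup per-frame tracking results into per-object trajectories.

    frame_results: {frame_index: [(track_id, box), ...]} (assumed layout).
    Returns {track_id: [(frame_index, box), ...]} ordered by frame index.
    """
    trajectories = defaultdict(list)
    for frame_index in sorted(frame_results):
        for track_id, box in frame_results[frame_index]:
            trajectories[track_id].append((frame_index, box))
    return dict(trajectories)


# Toy results for three frames: object 1 leaves after frame 51,
# object 2 enters at frame 51 and receives a new ID.
example_results = {
    50: [(1, (10, 20, 60, 120))],
    51: [(1, (12, 21, 62, 121)), (2, (200, 40, 260, 150))],
    52: [(2, (203, 41, 263, 151))],
}
paths = trajectories_from_frames(example_results)
for track_id, path in paths.items():
    frames = [frame_index for frame_index, _ in path]
    print(f"track {track_id}: frames {frames[0]}-{frames[-1]} ({len(path)} boxes)")
```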


In [None]:
tracks = tracker.predict(vid_path)

For example, the tracking results for frames 50, 51, and 52 of the video are as follows:

In [None]:
frame_list = [50, 51, 52]

for frame_i in frame_list:
    print(tracks[frame_i])
    plot_boxes(frame_i, video, tracks.get_frame(frame_i)["tracked_bboxes"], plot_settings=PlotSettings(rect_color=(0, 255, 0)))

We now overlay the generated tracking results onto the video.

In [None]:
results_video_path = convert_trackingbbox_video(tracks, frame_rate) #TODO: ask about using ffmpeg as cmd_str or other means, currently cv2

video = Video.from_file(results_video_path)
video
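
When overlaying tracking results, it helps to draw each object in a color that stays the same from frame to frame. The sketch below derives a stable color from the track ID. It is an illustrative helper only and is not part of `utils_cv`; the function name and the saturation and brightness values are assumptions.

```python
import colorsys

# Stepping the hue by the golden ratio spreads consecutive IDs around the color wheel.
GOLDEN_RATIO_CONJUGATE = 0.618033988749895


def color_for_track(track_id):
    """Map a track ID to a stable RGB color with components in 0-255."""
    hue = (track_id * GOLDEN_RATIO_CONJUGATE) % 1.0
    r, g, b = colorsys.hsv_to_rgb(hue, 0.8, 1.0)
    return (int(r * 255), int(g * 255), int(b * 255))


for track_id in (1, 2, 3):
    print(track_id, color_for_track(track_id))
```

The same ID always maps to the same color, so an object keeps its color for as long as the tracker keeps its ID. Note that OpenCV drawing functions expect BGR order, so the tuple would need to be reversed before being passed to `cv2`.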

## From WebCam Stream

Now, we use a webcam stream for object detection. We use `ipywebrtc` to start a webcam and get the video stream, which is sent to the notebook's widget. Note that Jupyter widgets can be unstable: if the widget below does not show, see the "Troubleshooting" section in this [FAQ](../classification/FAQ.md) for possible fixes.

In [5]:
# Webcam for object detection, kept for reference. It was not working, so the #KIP lines were added; it may be edited later to enable "live" tracking
w_cam = CameraStream(
    constraints={
        'facing_mode': 'user',
        'audio': False,
        'video': { 'width': 200, 'height': 200 }
    },
    layout=Layout(width='200px')
)
# Image recorder for taking a snapshot
w_imrecorder = ImageRecorder(stream=w_cam, layout=Layout(padding='0 0 0 50px'))
# Label widget to show our object detection results
w_im = widgets.Image(layout=Layout(width='200px'))
output = widgets.Output() #KIP

@output.capture() #KIP
def detect_frame(_):
    """ Detect objects on an image snapshot by using a pretrained model
    """
    # Once capturing started, remove the capture widget since we don't need it anymore
    if w_imrecorder.layout.display != 'none':
        w_imrecorder.layout.display = 'none'
        
    try:
        # Get the image and convert to RGB
        im = Image.open(io.BytesIO(w_imrecorder.image.value)).convert('RGB')
        
        # Process the captured image
        detections = detector.predict(im)
        plot_boxes(im, detections["det_bboxes"], plot_settings=PlotSettings(rect_color=(0, 255, 0)))
        
        # Convert the processed image back into the image widget for display
        f = io.BytesIO()
        im.save(f, format='png')
        w_im.value = f.getvalue()
        
    except OSError:
        # If im_recorder doesn't have valid image data, skip it. 
        pass
    
    # Taking the next snapshot programmatically
    w_imrecorder.recording = True

# Register detect_frame as a callback. Will be called whenever image.value changes. 
w_imrecorder.image.observe(detect_frame, 'value')

In [6]:
# Show widgets
HBox([w_cam, w_imrecorder, w_im, output]) #KIP
#HBox([w_cam, w_imrecorder, w_im])

HBox(children=(CameraStream(constraints={'facing_mode': 'user', 'audio': False, 'video': {'width': 200, 'heigh…

In [7]:
# Webcam for video object tracking
w_cam = CameraStream(
    constraints={
        'facing_mode': 'user',
        'audio': False,
        'video': { 'width': 200, 'height': 200 }
    },
    layout=Layout(width='200px')
)
# Video recorder for capturing a clip from the webcam stream
w_imrecorder = VideoRecorder(stream=w_cam, layout=Layout(padding='0 0 0 50px')) #KIP
#w_imrecorder = ImageRecorder(stream=w_cam, layout=Layout(padding='0 0 0 50px'))
# Label widget to show our object detection results
w_im = widgets.Image(layout=Layout(width='200px'))

video_path = convert_trackingbbox_video(tracking_results, frame_rate) #TODO: ask about using ffmpeg as cmd_str or other means, currently cv2

video = Video.from_file(video_path)
video

In [8]:
# Show widgets
#HBox([w_cam, w_imrecorder, w_im, output])
HBox([w_cam, w_imrecorder, w_im])#, tracked_video])

HBox(children=(CameraStream(constraints={'facing_mode': 'user', 'audio': False, 'video': {'width': 200, 'heigh…

Now, click the **capture button** in the widget above to start recording a video with your webcam.

Next, run the following cell to run the tracker on the recorded video. In the resulting video, bounding boxes show the objects detected by the model, and each ID number identifies a unique tracked object.

In [None]:
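# Save the captured video (create the output folder first if it does not exist)
webcam_video_path = './webcam/example_webcam.mp4'
os.makedirs(os.path.dirname(webcam_video_path), exist_ok=True)
w_imrecorder.save(webcam_video_path)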

# Run inference on captured video
tracking_results = tracker.predict(webcam_video_path)

# Generate video from tracking results
results_video_path = convert_trackingbbox_video(tracking_results, frame_rate) #TODO: ask about using ffmpeg as cmd_str or other means, currently cv2

video = Video.from_file(results_video_path)
video

# Conclusion
In this notebook, we used a simple example to demonstrate how to use a pretrained tracking model to detect and track objects across frames in a video sequence. The model, being trained on a pedestrian dataset, is limited to detecting and tracking humans (#TODO: check what final model). In the [training introduction notebook](01_training_introduction.ipynb), we will learn how to fine-tune a model on our own data.

In [11]:
# Stop the model and webcam
Widget.close_all()