<div style="width: 100%; clear: both;">
<div style="float: left; width: 50%;">
<img src="http://www.uoc.edu/portal/_resources/common/imatges/marca_UOC/UOC_Masterbrand.jpg" align="left">
</div>
<div style="float: right; width: 50%;">
<p style="margin: 0; padding-top: 22px; text-align:right;">M0.532 · Pattern Recognition</p>
<p style="margin: 0; text-align:right;">Computational Engineering and Mathematics Master</p>
<p style="margin: 0; text-align:right; padding-bottom: 100px;">Computers, Multimedia and Telecommunications Department</p>
</div>
</div>
<div style="width:100%;">&nbsp;</div>

In this notebook, we will see how to use the [YOLACT](https://github.com/dbolya/yolact) model for object tracking and video object segmentation. The model predicts not only the bounding box of each object but also its instance segmentation mask in every frame. YOLACT is characterized by its efficiency, which allows real-time instance segmentation in videos. This notebook is based on this [notebook](https://colab.research.google.com/github/tugstugi/dl-colab-notebooks/blob/master/notebooks/YOLACT.ipynb).


This notebook requires a GPU runtime. In Colab, select Runtime > Change runtime type and choose GPU as the hardware accelerator before running the cells.
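
Before running the heavier cells, it can help to confirm that the runtime actually exposes a GPU. The following is a minimal sketch, assuming PyTorch is installed (as it usually is on Colab); the helper name `gpu_available` is ours and is not part of YOLACT.

```python
def gpu_available():
    """Return True if PyTorch is importable and reports a CUDA device."""
    try:
        import torch  # assumption: PyTorch is preinstalled, as on Colab
    except ImportError:
        return False
    return bool(torch.cuda.is_available())

if not gpu_available():
    print("No GPU detected: switch the runtime type to GPU before continuing.")
```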

First, we import some basic modules and clone the YOLACT GitHub repository.

In [None]:
import os
from os.path import exists, join, basename, splitext

git_repo_url = 'https://github.com/dbolya/yolact.git'
project_name = splitext(basename(git_repo_url))[0]
if not exists(project_name):
  # clone and install dependencies
  !git clone -q --depth 1 {git_repo_url}
  !pip install -q youtube-dl
  
import sys
sys.path.append(project_name)

from IPython.display import YouTubeVideo


Then, we download the pretrained YOLACT model.

In [None]:
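def download_from_google_drive(file_id, file_name):
  # Download a file from a Google Drive link.
  # First request: store the cookies that Google Drive sets for large files.
  !rm -f ./cookie
  !curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id=$file_id" > /dev/null
  # Extract the confirmation token from the cookie file (it may be absent).
  confirm_text = !awk '/download/ {print $NF}' ./cookie
  confirm_text = confirm_text[0] if confirm_text else ''
  # Second request: send the token back to fetch the actual file.
  !curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm=$confirm_text&id=$file_id" -o $file_name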
  
pretrained_model = 'yolact_resnet50_54_800000.pth'
if not exists(pretrained_model):
  download_from_google_drive('1yp7ZbbDwvMiFJEq4ptVKTYTI2VeRDXl0', pretrained_model)

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   408    0   408    0     0   1980      0 --:--:-- --:--:-- --:--:--  1980
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  121M  100  121M    0     0  51.2M      0  0:00:02  0:00:02 --:--:-- 79.0M
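
A failed Google Drive download can leave a small HTML page in place of the weights. As a hedged sanity check, the sketch below verifies that the file exists and is reasonably large. The helper name `looks_like_checkpoint` and the 100 MB threshold are assumptions, chosen from the 121M size that curl reports above.

```python
import os

def looks_like_checkpoint(path, min_bytes=100 * 1024 * 1024):
    """Heuristic: the file exists and is at least min_bytes long."""
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes

# 'yolact_resnet50_54_800000.pth' is the file name used in the cell above.
if not looks_like_checkpoint('yolact_resnet50_54_800000.pth'):
    print('Checkpoint missing or too small; the download may have failed.')
```

This only checks the size; it does not validate the contents of the file.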


Next, we download the video that we will use for inference. We use the same video as in the previous notebooks (the one we used for SiamMask) so that the results of both models can be compared on the same video.

In [None]:
!wget https://www.bogotobogo.com/python/OpenCV_Python/images/mean_shift_tracking/slow_traffic_small.mp4

--2021-12-20 10:44:33--  https://www.bogotobogo.com/python/OpenCV_Python/images/mean_shift_tracking/slow_traffic_small.mp4
Resolving www.bogotobogo.com (www.bogotobogo.com)... 173.254.30.214
Connecting to www.bogotobogo.com (www.bogotobogo.com)|173.254.30.214|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2018126 (1.9M) [video/mp4]
Saving to: ‘slow_traffic_small.mp4’


2021-12-20 10:44:34 (12.3 MB/s) - ‘slow_traffic_small.mp4’ saved [2018126/2018126]



We run the YOLACT model on the downloaded video with the eval.py script.

In [None]:
!rm -rf youtube.mp4 input.mp4
#!ffmpeg -y -loglevel panic -i youtube.mp4 -t 20 input.mp4

!cd {project_name} && python eval.py --trained_model=../{pretrained_model} --score_threshold=0.3 --top_k=100 --video=../slow_traffic_small.mp4:../pre_output.mp4
# Re-encode with FFmpeg; otherwise the video cannot be embedded in Colab.
!ffmpeg -y -loglevel panic -i pre_output.mp4 output.mp4

Config not specified. Parsed yolact_resnet50_config from the file name.

  " but it is a non-constant {}. Consider removing it.".format(name, hint))
  " but it is a non-constant {}. Consider removing it.".format(name, hint))
  " but it is a non-constant {}. Consider removing it.".format(name, hint))
Loading model... Done.
Initializing model... Done.

Processing Frames  ██████████████████████████████    914 /    914 (100.00%)     7.88 fps        
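
The progress line above reports 914 frames processed at 7.88 fps. As a quick worked example, the elapsed time follows from dividing the frame count by the throughput; the helper name `processing_seconds` is ours.

```python
def processing_seconds(num_frames, fps):
    """Estimate wall-clock processing time from frame count and throughput."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps

elapsed = processing_seconds(914, 7.88)
print(f"about {elapsed:.0f} s")  # about 116 s
```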


Finally, we visualize the output video with the predictions.

In [None]:
def show_local_mp4_video(file_name, width=640, height=480):
  # Embed a local MP4 file in the notebook as a base64 data URI.
  import base64
  from IPython.display import HTML
  # Read the file inside a context manager so the handle is always closed.
  with open(file_name, 'rb') as f:
    video_encoded = base64.b64encode(f.read())
  return HTML(data='''<video width="{0}" height="{1}" alt="test" controls>
                        <source src="data:video/mp4;base64,{2}" type="video/mp4" />
                      </video>'''.format(width, height, video_encoded.decode('ascii')))

show_local_mp4_video('output.mp4')

Output hidden; open in https://colab.research.google.com to view.
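
The embedding above works by inlining the whole file as a base64 data URI, which inflates it by roughly one third, so it suits short clips only. The sketch below shows how such a URI is built from raw bytes; `to_data_uri` is a hypothetical helper, not part of IPython.

```python
import base64

def to_data_uri(raw_bytes, mime_type="video/mp4"):
    """Encode raw bytes as a base64 data URI for use in an HTML src attribute."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

uri = to_data_uri(b"\x00\x01\x02")
print(uri)  # data:video/mp4;base64,AAEC
```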