<a href="https://colab.research.google.com/github/omerahmed12345elhussien/Object_Tracker_using_Detectron2/blob/main/Object_Tracker.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Detectron2 Lab 3: Building an Object Tracker

<img src="https://dl.fbaipublicfiles.com/detectron2/Detectron2-Logo-Horz.png" width="500">

Welcome to detectron2!


## Install detectron2

In [None]:
!python -m pip install pyyaml==5.1
import sys, os, distutils.core
# Note: This is a faster way to install detectron2 in Colab, but it does not include all functionalities.
# See https://detectron2.readthedocs.io/tutorials/install.html for full installation instructions
!git clone 'https://github.com/facebookresearch/detectron2'
dist = distutils.core.run_setup("./detectron2/setup.py")
!python -m pip install {' '.join([f"'{x}'" for x in dist.install_requires])}
sys.path.insert(0, os.path.abspath('./detectron2'))

# Properly install detectron2. (Please do not install twice in both ways)
# !python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pyyaml==5.1
  Downloading PyYAML-5.1.tar.gz (274 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 274.2/274.2 kB 4.9 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: pyyaml
  Building wheel for pyyaml (setup.py) ... done
  Created wheel for pyyaml: filename=PyYAML-5.1-cp310-cp310-linux_x86_64.whl size=44090 sha256=271e59acd2de625ad12b39bda09b90fffc0dda0915aad75b4261cb6d4b75e7ec
  Stored in directory: /root/.cache/pip/wheels/70/83/31/975b737609aba39a4099d471d5684141c1fdc3404f97e7f68a
Successfully built pyyaml
Installing collected packages: pyyaml
  Attempting uninstall: pyyaml
    Found existing installation: PyYAML 6.0
    Uninstalling PyYAML-6.0:
      Successfully uninstalled PyYAML-6.0
ERROR: pip's dependency resolver does not currently take into accoun

### Required packages

In [None]:
import torch, detectron2
!nvcc --version
TORCH_VERSION = ".".join(torch.__version__.split(".")[:2])
CUDA_VERSION = torch.__version__.split("+")[-1]
print("torch: ", TORCH_VERSION, "; cuda: ", CUDA_VERSION)
print("detectron2:", detectron2.__version__)

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
torch:  2.0 ; cuda:  cu118
detectron2: 0.6


In [None]:
# Some basic setup:
# Setup detectron2 logger
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

# import some common libraries
import numpy as np
import os, json, cv2, random
from google.colab.patches import cv2_imshow

# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog
import typing
import torchvision.ops
import joblib
from torch import Tensor
from detectron2.utils.colormap import random_color


## Part A: Download a video clip

Download a small video clip of 41 frames from: https://github.com/gkioxari/aims2020_visualrecognition/releases/download/v1.0/videoclip.zip

In [None]:
# download, decompress the video clip.
!wget https://github.com/gkioxari/aims2020_visualrecognition/releases/download/v1.0/videoclip.zip
!unzip videoclip.zip > /dev/null

--2023-05-14 09:38:35--  https://github.com/gkioxari/aims2020_visualrecognition/releases/download/v1.0/videoclip.zip
Resolving github.com (github.com)... 20.205.243.166
Connecting to github.com (github.com)|20.205.243.166|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/255177940/09ad9d80-7f47-11ea-93bc-002a89d4791c?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230514%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230514T093835Z&X-Amz-Expires=300&X-Amz-Signature=a2d9ad6efde1e4738a4a9412ffc0a10ac0363a83217b56fa4ed46ab3284d0cf9&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=255177940&response-content-disposition=attachment%3B%20filename%3Dvideoclip.zip&response-content-type=application%2Foctet-stream [following]
--2023-05-14 09:38:35--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/255177940/09ad9d80-7f47-11ea-93bc-002a

## Part B: Object tracker class

In [None]:
class Object_Tracker:

  def __init__(self,folder_path:str = None, model_thresh:float=0.5,overlap_thresh:float=0.8,start_idx:int=0,end_idx:int=10):
    """
    The class constructor.
    Input:
    - folder_path: Path of the folder that contains the frames (images).
    - model_thresh: Score threshold passed to the model for object detection and instance segmentation.
    - overlap_thresh: IoU threshold above which two boxes of the same class are considered a match.
    - start_idx: Index of the first frame to consider.
    - end_idx: Index one past the last frame to consider (the frames are selected with images_path[start_idx:end_idx]).
    Example: with the defaults start_idx=0 and end_idx=10, the first 10 frames are used.
    Changing start_idx and end_idx selects a different range of frames when the folder contains more than 10 frames.
    """
    self.folder_path=folder_path
    self.model_thresh=model_thresh
    self.overlap_thresh=overlap_thresh
    self.start_idx=start_idx
    self.end_idx=end_idx

  def __read_files(self):
    """
    __read_files() reads the files in self.folder_path.
    It creates a class attribute self.images_path, which holds the file paths sorted in ascending order
    and restricted to the range [self.start_idx, self.end_idx).
    """
    assert type(self.folder_path)==str, "The folder path should be a string."
    assert self.start_idx>=0 and type(self.start_idx)==int and self.end_idx>1 and type(self.end_idx)==int and self.end_idx>(self.start_idx+1), "start_idx and end_idx should be integers that select at least two frames."
    # folder path
    dir_path = self.folder_path
    # list to store files
    images_path = []
    # Iterate directory
    for path in os.listdir(dir_path):
      if os.path.isfile(os.path.join(dir_path, path)):
        images_path.append(os.path.join(dir_path, path))
    #Sort the files in ascending order
    images_path.sort()
    #The slice below needs at least end_idx files in the folder.
    assert len(images_path)>=self.end_idx, "The folder holds fewer images than end_idx (default 10); please change start_idx and end_idx to fit your folder."
    self.images_path=images_path[self.start_idx:self.end_idx]

  def __model_prediction(self):
    """
    __model_prediction(): runs object detection and instance segmentation with a Mask R-CNN model with
    a ResNet-101-FPN backbone pre-trained on the COCO dataset. The model is fixed to keep the
    implementation simple.
    It creates class attributes: self.cfg, and self.results.
    """
    assert self.model_thresh>0 and self.model_thresh<1 and type(self.model_thresh)==float, "model threshold should be in the range (0,1)"
    #Build the config and the predictor once; they do not depend on the image.
    self.cfg = get_cfg()
    # project-specific config
    self.cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"))
    self.cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = self.model_thresh  # threshold for this model
    # Set the weights
    self.cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml")
    predictor = DefaultPredictor(self.cfg)
    #Initialize the results list to store the outputs of our predictions.
    self.results=[]
    for image1 in self.images_path:
      im = cv2.imread(image1)
      outputs = predictor(im)
      self.results.append(outputs)

  def matching_score(self,P: dict,Q: dict):
    """
    matching_score() implements the scoring function between predictions P and Q. It considers two predictions a match if they are
    from the same class and the IoU of their boxes is greater than self.overlap_thresh. For each prediction in P, only the best match in Q is kept.
    It creates a class attribute self.indexes: a tensor whose first row holds the matched indexes in P and whose second row holds the matching indexes in Q.
    """
    assert self.overlap_thresh>0 and self.overlap_thresh<1 and type(self.overlap_thresh)==float, "overlap threshold should be in the range (0,1)"
    #Initialize the index lists for P and Q.
    idx_P,idx_Q=[],[]
    for idx1,valu1 in enumerate(P["instances"].pred_boxes):
      max_list=torch.zeros(Q["instances"].pred_boxes.tensor.size(0))
      for idx2, valu2 in enumerate(Q["instances"].pred_boxes):
        #Compute the IoU of P_i and Q_j once.
        iou=torchvision.ops.box_iou(valu1.view(1,-1),valu2.view(1,-1)).item()
        #Keep the IoU only if P_i and Q_j are from the same class and overlap by more than self.overlap_thresh
        if (P["instances"].pred_classes[idx1]==Q["instances"].pred_classes[idx2]) and (iou>self.overlap_thresh):
          max_list[idx2]=iou
        else:
          max_list[idx2]=0
      #Guard against an empty Q (torch.max fails on an empty tensor).
      if max_list.numel()>0 and torch.max(max_list)!=0:
        idx_Q.append(torch.argmax(max_list).item())
        idx_P.append(idx1)
    self.indexes=torch.tensor([idx_P,idx_Q])

  def __set_colors(self,val_1:int,val_2:int,idxs:Tensor,base_color:list=None)->tuple:
    """
    __set_colors() is used to set the color for our frames.
    Input:
    - val_1: The number of objects in frame 1.
    - val_2: The number of objects in frame 2.
    - idxs: The indexes for the current frame, which is one row of self.indexes.
    - base_color: The color of the previous frame. We use it, since we may have different number of
    objects in the two frames.
    """
    if base_color:
      #The case that frame 1 has more objects than frame 2.
      if max(val_1,val_2)==val_1:
        #Set the colors for frame 2.
        assig_color_min=[random_color(rgb=True, maximum=1)  for _ in range(val_2)]
        for i_1,i_2 in zip(idxs[0],idxs[1]):
          #Change the colors of our trackers in frame 2 relying on base_color.
          assig_color_min[i_2.item()]=base_color[i_1.item()]
        return base_color,assig_color_min
      else:
        #Set the colors for frame 2.
        assig_color_max=[random_color(rgb=True, maximum=1)  for _ in range(val_2)]
        for i_1,i_2 in zip(idxs[0],idxs[1]):
          #Change the colors of our trackers in frame 2 relying on base_color.
          assig_color_max[i_2.item()]=base_color[i_1.item()]
        return assig_color_max,base_color

    else:
      #Set the colors for frame 1 and 2.
      assig_color_max=[random_color(rgb=True, maximum=1)  for _ in range(max(val_1,val_2))]
      assig_color_min=assig_color_max[0:min(val_1,val_2)].copy()
      #The case when we don't have trackers.
      if idxs.size(0)==0:
        return assig_color_max,assig_color_min
      #The case when frame 1 is larger than frame 2.
      elif max(val_1,val_2)==val_1:
        for i,j in zip(idxs[0],idxs[1]):
          assig_color_min[j.item()]=assig_color_max[i.item()]

        return assig_color_max,assig_color_min
      #The case when frame 1 is smaller than frame 2
      else:
        for i,j in zip(idxs[0],idxs[1]):
          assig_color_min[i.item()]=assig_color_max[j.item()]

        return assig_color_max,assig_color_min

  def __update_labels(self,labels:list, idxs:Tensor,pass_label:list =None, pass_track:list =None, count:int=None,part_two:bool=False)->tuple:
    """
    __update_labels() is used to update the labels of our frame.
    Input:
    - labels: The standard labels, before some of them are renamed to track1, track2, etc.
    - idxs: The indexes for the current frame, which is one row of self.indexes.
    - pass_label: The labels of the previous frame.
    - pass_track: The indexes of the previous frame, similar to idxs.
    - count: The number of trackers we have so far.
    - part_two: True when updating the second frame of the same matching; False when updating the first frame, whose labels come from a previous matching.
    """
    #The case when there is no matches between the two frames.
    if idxs.size(0)==0:
      return labels,count
    elif pass_label:
      #The same number of trackers for two different matching.
      if idxs.size(0)==pass_track.size(0):
        for i,j in zip(idxs,pass_track):
          #The case when we have the same tracker in the next comparison.
          if (i==j) and part_two==False:
            labels[i.item()]=pass_label[j.item()]
          #The case when the tracker index changed.
          elif (i!=j) and part_two==False:
            labels[i.item()]="track"+ str(count+1)
            count+=1
          #The case of part_two=True, where we only copy from pass_label
          else:
            labels[i.item()]=pass_label[j.item()]
        return labels,count
      #The case where we have more or less trackers compared with the passed number of trackers.
      else:
        for i in idxs:
          if i in pass_track:
            labels[i.item()]=pass_label[i.item()]
          else:
            labels[i.item()]="track"+ str(count+1)
            count+=1
        return labels,count
    #The first initialization, or no labels were passed from the previous frame: number the new trackers after the current count.
    else:
      base=count or 0
      tracker_label= ["track"+ str(s) for s in range(base+1,base+1+idxs.size(0))]
      saved_count=base+idxs.size(0)
      for k,value in enumerate(idxs):
        labels[value.item()]=tracker_label[k]
      return labels,saved_count

  def visualizer_fun(self,frame_1: dict, frame_2: dict, image_path_idx, pass_color:list =None, pass_track:Tensor = None, pass_label:list =None,count:int=None)->tuple:
    """
    visualizer_fun() implements tracking objects for two frames.
    Input:
    - frame_1: The first frame prediction.
    - frame_2: The second frame prediction.
    - image_path_idx: The index of the original image for frame 1, and frame 2.
    - pass_color: Passed color from the previous comparison of our current frame 1 and another frame.
    - pass_track: The indexes of frame 1, when compared with another frame in the previous comparison.
    - pass_label: The labels of frame 1, when compared with another frame in the previous comparison.
    - count: The number of trackers that we have so far.
    """
    #Compute matching score.
    self.matching_score(frame_1,frame_2)
    #Determine the number of objects.
    val_1=frame_1["instances"].pred_boxes.tensor.size(0)
    val_2=frame_2["instances"].pred_boxes.tensor.size(0)
    assig_color_max,assig_color_min=self.__set_colors(val_1,val_2,self.indexes,pass_color)

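    #__set_colors returns the frame-1 colors first when frame 1 has at least as many objects as frame 2, and second otherwise.
    #Choosing by position instead of by list length also handles the case val_1==val_2.
    frame_colors=[assig_color_max,assig_color_min] if val_1>=val_2 else [assig_color_min,assig_color_max]

    count1=0
    for (outputs, track,idx,color_assig) in  zip([frame_1,frame_2], self.indexes,image_path_idx,frame_colors):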
      #Initialize the standard labels.
      labels1=[MetadataCatalog.get(self.cfg.DATASETS.TRAIN[0]).thing_classes[c]+" "+ "{:.0f}%".format(s * 100)  for c,s in zip(outputs["instances"].pred_classes,outputs["instances"].scores)]
      if count1==0:
        labels1,saved_count=self.__update_labels(labels1,track, pass_label,pass_track,count)
        old_labels=labels1
        old_track=track
        count1+=1
      else:
        labels1,saved_count=self.__update_labels(labels1,track,old_labels,old_track,saved_count,True)
      im = cv2.imread(self.images_path[idx])
      v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(self.cfg.DATASETS.TRAIN[0]), scale=1.2)
      #image object.
      out = v.overlay_instances(boxes=outputs["instances"].pred_boxes.to("cpu"),
            labels=labels1,
            masks=outputs["instances"].pred_masks.to("cpu"),
            keypoints=None,
            assigned_colors=color_assig,
            alpha=0.5,)
      cv2_imshow(out.get_image()[:, :, ::-1])

    return color_assig,labels1, track, saved_count
  def visuali_fun_videos(self):
    """
    visuali_fun_videos() tracks objects over a long sequence of frames rather than only two frames.
    """
    #Read the files
    self.__read_files()
    #Make model prediction
    self.__model_prediction()
    #self.images_path is already sliced to [start_idx, end_idx) in __read_files, so frames are indexed relative to that slice.
    for i in range(len(self.results)-1):
      if i==0:
        color_assig,tracker_label, track, saved_count=self.visualizer_fun(self.results[i],self.results[i+1],[i,i+1],count=0)
      else:
        color_assig,tracker_label, track,saved_count=self.visualizer_fun(self.results[i],self.results[i+1],[i,i+1],color_assig,track,tracker_label,saved_count)



The results in our report were obtained with model_thresh=0.5 and overlap_thresh=0.8, which are the class defaults.
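To make the role of overlap_thresh concrete, here is a minimal pure-Python sketch of the matching rule described in `matching_score`: two boxes match when they share a class and their IoU exceeds the threshold, and each box in the first frame keeps only its best match. The helper names `box_iou` and `match_frames` and the sample boxes are assumptions made for illustration; the class itself works on detectron2 predictions and uses `torchvision.ops.box_iou`.

```python
# Illustrative sketch only: boxes are (x1, y1, x2, y2) tuples, classes are ints.
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_frames(boxes_p, classes_p, boxes_q, classes_q, overlap_thresh=0.8):
    """For each box i in P, keep (i, j) for the same-class box j in Q with the highest IoU above the threshold."""
    matches = []
    for i, (bp, cp) in enumerate(zip(boxes_p, classes_p)):
        best_j, best_iou = None, overlap_thresh
        for j, (bq, cq) in enumerate(zip(boxes_q, classes_q)):
            iou = box_iou(bp, bq)
            if cp == cq and iou > best_iou:
                best_j, best_iou = j, iou
        if best_j is not None:
            matches.append((i, best_j))
    return matches


# Hypothetical detections: the two objects swap their order between frames.
boxes_p = [(0, 0, 10, 10), (20, 20, 30, 30)]
boxes_q = [(21, 20, 31, 30), (0, 0, 10, 9)]
matches = match_frames(boxes_p, [0, 1], boxes_q, [1, 0])
print(matches)  # [(0, 1), (1, 0)]
```

Raising overlap_thresh makes the tracker stricter: with a value of 0.85, the second pair above (IoU of about 0.82) would no longer count as the same object.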

In [None]:
#Reproduce the results of the report.
ot=Object_Tracker("/content/clip/")
ot.visuali_fun_videos()

## Documentation (How to implement)

When you upload your zipped folder, make sure the files are named with zero-padded numbers (00.jpg, 01.jpg, 02.jpg, etc.), because the code sorts the file names to determine the frame order: 00.jpg is the first frame, 01.jpg is the second frame, and so on.
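The naming rule matters because Python sorts strings character by character, not numerically. A short sketch with made-up file names (they are assumptions, not the names inside any particular archive) shows the difference:

```python
# Hypothetical file names, for illustration only.
unpadded = ["0.jpg", "1.jpg", "2.jpg", "10.jpg", "11.jpg"]
padded = ["00.jpg", "01.jpg", "02.jpg", "10.jpg", "11.jpg"]

# sorted() and list.sort() compare strings lexicographically.
sorted_unpadded = sorted(unpadded)
sorted_padded = sorted(padded)

print(sorted_unpadded)  # ['0.jpg', '1.jpg', '10.jpg', '11.jpg', '2.jpg']
print(sorted_padded)    # ['00.jpg', '01.jpg', '02.jpg', '10.jpg', '11.jpg']
```

Without zero padding, frame 10 is sorted before frame 2, so consecutive frames would be compared out of order.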

Steps to follow during implementation:

1- Start the notebook by installing detectron2. Then run the remaining cells in the (Install detectron2) part to import the required packages.

2- In Part B, run the cell that defines the Object_Tracker class.

3- To reproduce the results of our report, run the cell that downloads videoclip.zip in Part A, then run the cell right after the class.

4- In the two cells below: replace (file_name.zip) with the name of the file you uploaded; then copy the path of your unzipped folder and replace (folder_path) with it.

5- Run the two cells.
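If you prefer to stay in Python instead of calling `!unzip`, step 4 can be sketched with the standard library. The helper name `extract_frames` and the archive contents below are hypothetical; the demo builds a throwaway archive with empty placeholder files so that it runs anywhere.

```python
import os
import tempfile
import zipfile


def extract_frames(zip_path, out_dir):
    """Extract zip_path into out_dir and return the sorted paths of the extracted files."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
    paths = []
    for root, _dirs, files in os.walk(out_dir):
        for name in files:
            paths.append(os.path.join(root, name))
    return sorted(paths)


# Demo with a temporary archive; "file_name.zip" stands in for your upload.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "file_name.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        for name in ("clip/01.jpg", "clip/00.jpg"):
            zf.writestr(name, b"")  # empty placeholders, not real images
    frames = extract_frames(zip_path, os.path.join(tmp, "unzipped"))
    frame_names = [os.path.basename(p) for p in frames]
    print(frame_names)  # ['00.jpg', '01.jpg']
```

The folder that directly contains the images (here the `clip` folder inside the output directory) is the path to pass as folder_path.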

Note: the following parameters can be changed when creating the class object: model_thresh=0.5, overlap_thresh=0.8, start_idx=0, end_idx=10.

- start_idx, end_idx: if you upload a folder with more than 10 images, the code considers only the first ten by default. Change start_idx and end_idx to process all of the images or a different subset.
- overlap_thresh: the box-overlap threshold used for matching; it can be set anywhere in the range (0,1), with a default value of 0.8.
- model_thresh: the score threshold passed to the model; it must also lie in the range (0,1), with a default value of 0.5.
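The effect of start_idx and end_idx can be sketched as a plain list slice, which is how the class selects its frames. The function `select_window` and the generated file names are assumptions for illustration; the checks mirror the idea that at least two frames are needed for tracking.

```python
def select_window(paths, start_idx=0, end_idx=10):
    """Return paths[start_idx:end_idx] after basic sanity checks."""
    if not (isinstance(start_idx, int) and isinstance(end_idx, int)):
        raise TypeError("start_idx and end_idx must be integers")
    if start_idx < 0 or end_idx <= start_idx + 1:
        raise ValueError("the window must contain at least two frames")
    if end_idx > len(paths):
        raise ValueError("end_idx is larger than the number of files")
    return paths[start_idx:end_idx]


# Hypothetical sorted file list for a 41-frame clip.
paths = [f"{i:02d}.jpg" for i in range(41)]

first_ten = select_window(paths)        # default window: frames 0..9
tail = select_window(paths, 30, 41)     # last eleven frames: 30..40
print(first_ten[0], first_ten[-1], len(first_ten))  # 00.jpg 09.jpg 10
print(tail[0], tail[-1], len(tail))                 # 30.jpg 40.jpg 11
```

Because the slice excludes end_idx itself, processing every frame of an N-image folder means passing start_idx=0 and end_idx=N.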

In [None]:
#Replace file_name.zip with the name of the zip file you uploaded.
!unzip file_name.zip > /dev/null

In [None]:
#Add your folder path here.
folder_path="/your_folder_path/folder_name/"
ot=Object_Tracker(folder_path)
ot.visuali_fun_videos()