# Global Tracking Transformers

<img align="center" src="https://github.com/xingyizhou/GTR/raw/master/docs/GTR_teaser.jpg" width="800">

This is a Colab demo of GTR (**G**lobal **Tr**acking Transformers). We use a pretrained GTR model to run global tracking on an example video.

This demo is modified from the [detectron2 colab tutorial](https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5).

To edit this tutorial, make your own copy via "File -> Open in playground mode" and change the copy. __DO NOT__ request access to this tutorial.


In [1]:
# Install detectron2
import torch
TORCH_VERSION = ".".join(torch.__version__.split(".")[:2])
CUDA_VERSION = torch.__version__.split("+")[-1]
print("torch: ", TORCH_VERSION, "; cuda: ", CUDA_VERSION)
# Install detectron2 that matches the above pytorch version
# See https://detectron2.readthedocs.io/tutorials/install.html for instructions
# !pip install detectron2==0.6 -f https://dl.fbaipublicfiles.com/detectron2/wheels/$CUDA_VERSION/torch$TORCH_VERSION/index.html
!python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

torch:  1.12 ; cuda:  cu113
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/facebookresearch/detectron2.git
  Cloning https://github.com/facebookresearch/detectron2.git to /tmp/pip-req-build-cf4r1fsq
  Running command git clone -q https://github.com/facebookresearch/detectron2.git /tmp/pip-req-build-cf4r1fsq
Collecting yacs>=0.1.8
  Downloading yacs-0.1.8-py3-none-any.whl (14 kB)
Collecting fvcore<0.1.6,>=0.1.5
  Downloading fvcore-0.1.5.post20220512.tar.gz (50 kB)
Collecting iopath<0.1.10,>=0.1.7
  Downloading iopath-0.1.9-py3-none-any.whl (27 kB)
Collecting omegaconf>=2.1
  Downloading omegaconf-2.2.2-py3-none-any.whl (79 kB)
Collecting hydra-core>=1.1
  Downloading hydra_core-1.2.0-py3-none-any.whl (151 kB)

In [2]:
# clone and install GTR
!git clone https://github.com/xingyizhou/GTR.git --recurse-submodules
%cd GTR
!pip install -r requirements.txt

Cloning into 'GTR'...
remote: Enumerating objects: 166, done.
remote: Counting objects: 100% (56/56), done.
remote: Compressing objects: 100% (35/35), done.
remote: Total 166 (delta 31), reused 21 (delta 21), pack-reused 110
Receiving objects: 100% (166/166), 2.67 MiB | 15.55 MiB/s, done.
Resolving deltas: 100% (44/44), done.
Submodule 'third_party/CenterNet2' (https://github.com/xingyizhou/CenterNet2) registered for path 'third_party/CenterNet2'
Cloning into '/content/GTR/third_party/CenterNet2'...
remote: Enumerating objects: 13921, done.
remote: Counting objects: 100% (945/945), done.
remote: Compressing objects: 100% (98/98), done.
remote: Total 13921 (delta 886), reused 850 (delta 845), pack-reused 12976
Receiving objects: 100% (13921/13921), 5.08 MiB | 15.71 MiB/s, done.
Resolving deltas: 100% (10472/10472), done.
Submodule path 'third_party/CenterNet2': checked out '8745e012e4dbdf560ac2f27e0b771d4907ad4aaf'
/content/GTR
Looking in inde

In [3]:
# Some basic setup:
# Set up the detectron2 logger
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

# import some common libraries
import sys
import numpy as np
import os, json, cv2, random
from google.colab.patches import cv2_imshow

# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog

# GTR libraries
sys.path.insert(0, 'third_party/CenterNet2/')
from centernet.config import add_centernet_config
from gtr.config import add_gtr_config
from gtr.predictor import GTRPredictor, TrackingVisualizer

In [4]:
# Download models
import gdown
!mkdir -p models/  # -p keeps the cell re-runnable if the folder already exists
gdown.download("https://drive.google.com/u/1/uc?id=1TqkLpFZvOMY5HTTaAWz25RxtLHdzQ-CD", "models/GTR_TAO_DR2101.pth")

Downloading...
From: https://drive.google.com/u/1/uc?id=1TqkLpFZvOMY5HTTaAWz25RxtLHdzQ-CD
To: /content/GTR/models/GTR_TAO_DR2101.pth
100%|██████████| 512M/512M [00:02<00:00, 211MB/s]


'models/GTR_TAO_DR2101.pth'
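
A download can fail or end up incomplete, so it can help to check the file before building the model. The sketch below is one possible check, not part of GTR: `checkpoint_ready` and the size thresholds are hypothetical, and the demo runs on a throwaway file rather than the real checkpoint.

```python
import os
import tempfile


def checkpoint_ready(path: str, min_bytes: int = 1) -> bool:
    """True if `path` is an existing file of at least `min_bytes` bytes."""
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes


# Demonstration on a temporary file standing in for the checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "models"), exist_ok=True)  # safe to re-run
    fake = os.path.join(tmp, "models", "fake.pth")
    with open(fake, "wb") as f:
        f.write(b"\0" * 1024)  # 1 KiB of placeholder bytes
    print(checkpoint_ready(fake, min_bytes=512))   # True
    print(checkpoint_ready(fake, min_bytes=4096))  # False
```

For the real file you might call `checkpoint_ready("models/GTR_TAO_DR2101.pth", min_bytes=...)` with a threshold chosen from the size shown in the download log.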

In [5]:
# Build the predictor and load the pretrained weights downloaded above
cfg = get_cfg()
add_centernet_config(cfg)
add_gtr_config(cfg)
cfg.merge_from_file("configs/GTR_TAO_DR2101.yaml")
cfg.MODEL.WEIGHTS = 'models/GTR_TAO_DR2101.pth'
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model
# cfg.MODEL.DEVICE='cpu' # uncomment this to use cpu-only mode.
metadata = MetadataCatalog.get(
    cfg.DATASETS.TEST[0] if len(cfg.DATASETS.TEST) else "__unused")
predictor = GTRPredictor(cfg)
tracker_visualizer = TrackingVisualizer(metadata)

[08/09 08:58:13 d2.checkpoint.c2_model_loading]: Following weights matched with model:
| Names in Model                                          | Names in Checkpoint                                                                                               | Shapes                                          |
|:--------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------|:------------------------------------------------|
| backbone.bottom_up.res2.0.bns.0.*                       | backbone.bottom_up.res2.0.bns.0.{bias,running_mean,running_var,weight}                                            | (26,) (26,) (26,) (26,)                         |
| backbone.bottom_up.res2.0.bns.1.*                       | backbone.bottom_up.res2.0.bns.1.{bias,running_mean,running_var,weight}                                            | (26,) (26,) (26,) (26,)                         |


Some model parameters or buffers are not found in the checkpoint:
roi_heads.box_predictor.0.{fed_loss_cls_weights, freq_weight}
roi_heads.box_predictor.1.{fed_loss_cls_weights, freq_weight}
roi_heads.box_predictor.2.{fed_loss_cls_weights, freq_weight}
The checkpoint state_dict contains keys that are not used by the model:
  roi_heads.pos_emb.weight
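
The log above reports two kinds of mismatch: names the model expects but the checkpoint lacks, and names the checkpoint holds but the model never uses. A minimal sketch of that comparison with plain sets follows. It is not detectron2's actual matching logic, `diff_state_keys` is a hypothetical helper, and the key lists are toy examples loosely modeled on the log.

```python
def diff_state_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, each sorted.

    missing:    expected by the model, absent from the checkpoint
    unexpected: present in the checkpoint, unused by the model
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Toy key names for illustration only.
model_keys = ["backbone.weight", "roi_heads.box_predictor.0.freq_weight"]
checkpoint_keys = ["backbone.weight", "roi_heads.pos_emb.weight"]

missing, unexpected = diff_state_keys(model_keys, checkpoint_keys)
print(missing)     # ['roi_heads.box_predictor.0.freq_weight']
print(unexpected)  # ['roi_heads.pos_emb.weight']
```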


In [6]:
# Functions to load and save videos
import imageio
from IPython.display import Video, display
def show_video(filename, frames, fps=5):
    imageio.mimwrite(
        filename, [x[..., ::-1] for x in frames], fps=fps)
    display(Video(filename, embed=True))

def _frame_from_video(video):
    while video.isOpened():
        success, frame = video.read()
        if success:
            yield frame
        else:
            break
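
The frame generator above stops at the first failed `read()`. The sketch below shows the same pattern running without OpenCV; `FakeCapture` is a hypothetical stand-in that only mimics the two methods the generator calls, and the string "frames" are placeholders for image arrays.

```python
class FakeCapture:
    """Hypothetical stand-in for a video capture object."""

    def __init__(self, frames):
        self._frames = list(frames)
        self._pos = 0

    def isOpened(self):
        return True

    def read(self):
        # Mirror the (success, frame) return shape used by the generator.
        if self._pos < len(self._frames):
            frame = self._frames[self._pos]
            self._pos += 1
            return True, frame
        return False, None


def frame_from_video(video):
    """Yield frames until the first unsuccessful read."""
    while video.isOpened():
        success, frame = video.read()
        if not success:
            break
        yield frame


frames_demo = list(frame_from_video(FakeCapture(["f0", "f1", "f2"])))
print(frames_demo)  # ['f0', 'f1', 'f2']
```

Because the result is a generator, frames are produced one at a time; wrapping it in `list(...)` loads them all into memory at once.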

In [10]:
!pip install imageio-ffmpeg

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting imageio-ffmpeg
  Downloading imageio_ffmpeg-0.4.7-py3-none-manylinux2010_x86_64.whl (26.9 MB)
[K     |████████████████████████████████| 26.9 MB 66.6 MB/s 
[?25hInstalling collected packages: imageio-ffmpeg
Successfully installed imageio-ffmpeg-0.4.7


In [11]:
# Load images from video
video_path = 'docs/yfcc_v_acef1cb6d38c2beab6e69e266e234f.mp4'
video = cv2.VideoCapture(video_path)
width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
frames_per_second = video.get(cv2.CAP_PROP_FPS)
num_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
basename = os.path.basename(video_path)
codec, file_ext = "mp4v", ".mp4"
# codec = 'H264'
frames = [x for x in _frame_from_video(video)]
video.release()
show_video('input.mp4', frames)
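
The preview call above relies on the default `fps=5` of `show_video` and does not pass `frames_per_second`, so the preview plays at 5 fps whatever the source rate was. The sketch below shows how the playback length changes with the frame rate; `playback_seconds` is a hypothetical helper and the numbers are made up.

```python
def playback_seconds(num_frames: int, fps: float) -> float:
    """Playback duration in seconds for a clip written at `fps`."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


# Hypothetical clip: 150 frames recorded at 30 fps.
print(playback_seconds(150, 30.0))  # 5.0
print(playback_seconds(150, 5.0))   # 30.0
```

To preview at the source speed, one option is `show_video('input.mp4', frames, fps=frames_per_second)`.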

In [12]:
# Run model
outputs = predictor(frames)

  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]


In [None]:
# Post processing and save output video
def _process_predictions(tracker_visualizer, frame, predictions):
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    predictions = predictions["instances"].to('cpu')
    vis_frame = tracker_visualizer.draw_instance_predictions(
        frame, predictions)
    vis_frame = cv2.cvtColor(vis_frame.get_image(), cv2.COLOR_RGB2BGR)
    return vis_frame

out_frames = []
for frame, instances in zip(frames, outputs):
    out_frame = _process_predictions(tracker_visualizer, frame, instances)
    out_frames.append(out_frame)

show_video('output.mp4', out_frames)
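
The loop above pairs frames with predictions using `zip`, which silently stops at the shorter sequence. Assuming the predictor is expected to return one entry per input frame, a length check makes a mismatch visible. `paired` is a hypothetical helper, and the strings below stand in for frames and predictions.

```python
def paired(frames, outputs):
    """Pair frames with outputs, raising if the lengths differ."""
    frames, outputs = list(frames), list(outputs)
    if len(frames) != len(outputs):
        raise ValueError(
            f"got {len(frames)} frames but {len(outputs)} outputs")
    return list(zip(frames, outputs))


print(paired(["f0", "f1"], ["p0", "p1"]))  # [('f0', 'p0'), ('f1', 'p1')]
```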

In [None]:
# Download results
# from google.colab import files
# files.download('output.mp4')