# Tracking the patient

---
## Setup
*   If you have not yet run the `FRCNN.ipynb` notebook (the first step of the preprocessing pipeline), please run it first.
*   This setup consists only of imports and extracting the results of the previous stage.
---

### Colab setup

In [None]:
ROOT = '/content/drive'
from google.colab import drive
drive.mount(ROOT)

Mounted at /content/drive


### Imports

In [None]:
import os
import sys
import cv2
import json
import torch
import shutil
import subprocess
import numpy as np
import collections
from tqdm import tqdm
from os.path import join


SORT_PATH = "/content/sort/"
FINAL_PATH = "/content/final/"
JSONS_PATH = "/content/jsons/"
VIDEO_PATH = "/content/videos/"
FRAMES_PATH = "/content/frames/"
RAW_VIDEOS_PATH = "/content/drive/MyDrive/ataxia_dataset/"
# use CUDA if available
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

#### Extract output from previous phase

In [None]:
# extract frames
shutil.copy(RAW_VIDEOS_PATH + "frames.zip", "/content/")
!unzip frames.zip
!mv content/frames/ /content/

# extract jsons
shutil.copy(RAW_VIDEOS_PATH + "bboxes.zip", "/content/")
!unzip bboxes.zip
!mv content/jsons/ /content/
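If you would rather not depend on the `unzip`/`mv` shell commands, the same extraction can be sketched with Python's standard `zipfile` module. This assumes, as the `mv content/frames/ /content/` step suggests, that both archives store their members under a leading `content/` prefix; `extract_flattened` is a hypothetical helper, not part of the pipeline.

```python
import os
import zipfile


def extract_flattened(zip_path, dest_dir, prefix="content/"):
    """Extract zip_path into dest_dir, stripping a leading `prefix` from member names.

    Assumption: members look like 'content/frames/000/001.jpg', so stripping
    'content/' lands them in dest_dir/frames/..., as the unzip + mv cells do.
    Returns the list of extracted (stripped) file names.
    """
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            name = info.filename
            if name.startswith(prefix):
                name = name[len(prefix):]
            if not name or name.endswith("/"):
                continue  # directory entries: parents are created per file below
            if os.path.isabs(name) or ".." in name.split("/"):
                raise ValueError(f"unsafe member path: {info.filename}")
            target = os.path.join(dest_dir, name)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as dst:
                dst.write(src.read())
            extracted.append(name)
    return extracted
```

For example, `extract_flattened("/content/frames.zip", "/content/")` should leave the frames under `/content/frames/`, matching `FRAMES_PATH`.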

---
## 3. Object ID Tracking with SORT
*   Simple Online and Realtime Tracking (SORT) algorithm for object ID tracking
*   Quite fast (it can run on a CPU runtime); takes about 20 minutes.
---
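At its core, SORT pairs each frame's detections with the boxes its Kalman filters predict for the existing tracks, by IoU, solved as an assignment problem. The sketch below isolates only that association step (no Kalman filter, no track creation or deletion) using SciPy's `linear_sum_assignment`; `iou_matrix` and `associate` are hypothetical names, and the `0.3` threshold is assumed to mirror SORT's default `iou_threshold`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou_matrix(boxes_a, boxes_b):
    """Pairwise IoU between [x1, y1, x2, y2] boxes; returns shape (len(a), len(b))."""
    a = np.asarray(boxes_a, dtype=float).reshape(-1, 4)
    b = np.asarray(boxes_b, dtype=float).reshape(-1, 4)
    # intersection rectangle via broadcasting (N, 1) against (1, M)
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    return np.where(union > 0, inter / np.maximum(union, 1e-12), 0.0)


def associate(detections, tracks, iou_threshold=0.3):
    """Match detections to predicted track boxes by maximising total IoU.

    Returns (matches, unmatched_detections, unmatched_tracks), where matches
    is a list of (detection_index, track_index) pairs.
    """
    if len(detections) == 0 or len(tracks) == 0:
        return [], list(range(len(detections))), list(range(len(tracks)))
    iou = iou_matrix(detections, tracks)
    # Hungarian algorithm on negated IoU = maximum-IoU assignment
    det_idx, trk_idx = linear_sum_assignment(-iou)
    matches = [(int(d), int(t)) for d, t in zip(det_idx, trk_idx)
               if iou[d, t] >= iou_threshold]
    matched_d = {d for d, _ in matches}
    matched_t = {t for _, t in matches}
    unmatched_d = [d for d in range(len(detections)) if d not in matched_d]
    unmatched_t = [t for t in range(len(tracks)) if t not in matched_t]
    return matches, unmatched_d, unmatched_t
```

In full SORT, matched tracks update their Kalman filters, unmatched detections start new tracks, and tracks left unmatched for too long are dropped; the IDs returned by `mot_tracker.update` in the tracking loop come from that bookkeeping.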

In [None]:
# Git clone: SORT Algorithm
!git clone https://github.com/abewley/sort.git
sys.path.append(SORT_PATH)

Cloning into 'sort'...
remote: Enumerating objects: 208, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 208 (delta 2), reused 1 (delta 1), pack-reused 203 (from 2)
Receiving objects: 100% (208/208), 1.20 MiB | 19.28 MiB/s, done.
Resolving deltas: 100% (74/74), done.


In [None]:
# install the requirements for SORT
# (each `!` command runs in its own shell, so the `cd` only applies to this line)
!cd "$SORT_PATH"; pip install -r requirements.txt

Collecting filterpy==1.4.5 (from -r requirements.txt (line 1))
  Downloading filterpy-1.4.5.zip (177 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 178.0/178.0 kB 12.2 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting scikit-image==0.17.2 (from -r requirements.txt (line 2))
  Downloading scikit-image-0.17.2.tar.gz (29.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 29.8/29.8 MB 28.8 MB/s eta 0:00:00
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  Preparing metadata (setup.py) ...

In [None]:
# Optional: if an error occurs, you may need to reinstall scikit-image, imgaug and filterpy

# !pip install filterpy
# !pip uninstall scikit-image -y
# !pip uninstall imgaug -y
# !pip install imgaug
# !pip install -U scikit-image

import skimage
print(skimage.__version__)

In [None]:
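# NOTE: TkAgg doesn't work in Colab. `!export` would only set the variable in a
# throwaway subshell; `%env` sets it for the notebook's own Python process.
%env MPLBACKEND=Agg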

At this point, the next cell will still fail with a TkAgg error, because `sort.py` selects that backend itself: change the **23rd** line of `/content/sort/sort.py` from `TkAgg` to `Agg`.
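Rather than editing the file by hand, the line can be patched from a cell. A minimal sketch, assuming the offending line has the form `matplotlib.use('TkAgg')` (check `/content/sort/sort.py` before relying on it); `patch_matplotlib_backend` is a hypothetical helper. It matches the call itself rather than a line number, so it keeps working if the upstream file shifts, and running it twice is harmless.

```python
import re


def patch_matplotlib_backend(path, old="TkAgg", new="Agg"):
    """Rewrite matplotlib.use('<old>') to matplotlib.use('<new>') in a source file.

    Returns the number of replacements; a second run returns 0 and leaves the file as-is.
    """
    with open(path) as f:
        source = f.read()
    # match matplotlib.use('TkAgg') or matplotlib.use("TkAgg"), keeping the quote style
    pattern = re.compile(r"""matplotlib\.use\((['"])""" + re.escape(old) + r"""\1\)""")
    patched, count = pattern.subn(
        lambda m: f"matplotlib.use({m.group(1)}{new}{m.group(1)})", source)
    if count:
        with open(path, "w") as f:
            f.write(patched)
    return count
```

Run `patch_matplotlib_backend("/content/sort/sort.py")` before `from sort import *`.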

In [None]:
from sort import *

example = join(JSONS_PATH, '000.json')
with open(example) as data_file:
  data = json.load(data_file)
odata = collections.OrderedDict(sorted(data.items()))
print(f"For video 000, we have {len(odata)} frames")

For video 000, we have 180 frames


---

*   We cannot easily track each patient as-is because of the blurring (this is also reported in https://github.com/ROC-HCI/Automated-Ataxia-Gait).
*   In fact, we do not even need an exact bbox: as long as the first frame contains most of the patient, OpenPose can track them.
*   Unlike the Auto-Gait paper, we do not normalize the height.
*   Additionally, the same blurring makes the videos out-of-distribution for most models, so we keep the full frame height; otherwise the crops become too pixelated.
*   Finally, the FRCNN detects no people in some frames; for those frames we reuse the previous frame's predictions.

---
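The two practical choices above (reusing the previous frame's detections when the FRCNN finds nobody, and cropping only along the width) can be written as small standalone helpers. A sketch, assuming the per-video JSON maps frame file names to lists of detections, as the tracking loop reads it; `fill_empty_frames` and `crop_full_height` are hypothetical names, and the 50-pixel margin mirrors the padding used in that loop.

```python
import numpy as np


def fill_empty_frames(preds):
    """Replace empty per-frame predictions with the most recent non-empty ones.

    `preds` maps frame name -> list of detections. Frames are visited in sorted
    order; frames before the first non-empty one stay empty (nothing to copy).
    """
    filled, last = {}, []
    for frame in sorted(preds):
        if preds[frame]:
            last = preds[frame]
        filled[frame] = last
    return filled


def crop_full_height(img, x1, x2, margin=50):
    """Crop columns [x1 - margin, x2 + margin), clamped to the image, keeping every row."""
    width = img.shape[1]
    left = max(int(x1) - margin, 0)
    right = min(int(x2) + margin, width)
    return img[:, left:right]
```

Keeping every row means all crops of a video share the frame height, and only their width varies from frame to frame.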

In [None]:
# Making new directory for saving results
!mkdir "$FINAL_PATH"
progress = [] # in case of errors

In [None]:
for vid_path in tqdm(sorted(os.listdir(FRAMES_PATH))):
  # Create a tracker using SORT Algorithm
  mot_tracker = Sort()
  # vid_path is the video's folder name, e.g. 000
  cur_save_path = join(FINAL_PATH, vid_path)
  # this becomes /content/final/000: every tracked person in this video
  # gets a separate sub-folder inside it
  if not os.path.exists(cur_save_path):
    os.mkdir(cur_save_path)
  elif vid_path in progress:
    print(f"Already processed {vid_path}, skipping...")
    continue
  # load preds for current video
  with open(join(JSONS_PATH, vid_path + ".json")) as f:
    odata = json.load(f)
  odata = collections.OrderedDict(sorted(odata.items()))
  # book-keeping variables
  heights = {}
  IDs = set()
  first_frame_widths = {}
  result = None
  # key = frame_num
  for key in sorted(odata.keys()):
    arrlist = []
    # load the image
    det_img = cv2.imread(os.path.join(FRAMES_PATH, vid_path, key))
    # load the predictions for this image (bbox, labels and score)
    tmp_res = odata[key]
    if len(tmp_res) == 0:
      print(f"Empty prediction at frame {key} in video {vid_path}, reusing the previous prediction.")
      # do not update the result variable
    else:
      result = tmp_res

    # build this frame's detections; if no earlier frame had any detection,
    # `result` is still None and we pass no detections
    for info in (result or []):
      bbox = info['bbox']
      # labels = info['labels']: we have already filtered for humans, so labels are
      # not needed here (a general-purpose tracker would filter by label at this point)
      scores = info['scores']
      # SORT expects one row per detection: [x1, y1, x2, y2, score]
      arrlist.append(bbox + [scores])

    # update the tracker with this frame's detections; SORT expects an (N, 5)
    # array, so an empty frame must be passed as np.empty((0, 5))
    dets = np.array(arrlist) if arrlist else np.empty((0, 5))
    track_bbs_ids = mot_tracker.update(dets)

    for j in range(track_bbs_ids.shape[0]):
      xy_xy_label = track_bbs_ids[j, :]
      x = int(xy_xy_label[0])
      x = max(x - 50, 0) # sometimes bounding boxes are too tight
      y = int(xy_xy_label[1])
      x2 = int(xy_xy_label[2])
      x2 = min(x2 + 50, det_img.shape[1])
      y2 = int(xy_xy_label[3])
      track_label = str(int(xy_xy_label[4]))

      # we also tried cropping according to ONLY the first frame but it was quite wide
      # if "001" in str(key):
      #   print(f"{j}th valid person detected for {vid_path}.")
      #   # add the first frame bbox to the first_frame_widths dict
      #   first_frame_widths[track_label] = (x, x2)
      # elif track_label not in first_frame_widths:
      #     print(f"New person detected at frame {key} for {vid_path}, track_label: {track_label}.")
      #     continue

      # get the height of the bbox
      if track_label not in heights:
        print(f"New person detected at frame {key} for {vid_path}, track_label: {track_label}.")
        heights[track_label] = []
      heights[track_label].append(y2 - y)

      # crop each person along the width of their current bbox, but keep the full height
      # (alternative we tried: crop to the width of their first-frame bbox)
      # cropped_img = det_img[:, first_frame_widths[track_label][0]:first_frame_widths[track_label][1]]
      cropped_img = det_img[:, x:x2]
      # make a directory for this 'track_label'
      os.makedirs(f'{cur_save_path}/' + track_label, exist_ok=True)
      if isinstance(cropped_img, np.ndarray):
        try:
          # save the cropped frame into this person's folder
          cv2.imwrite(f'{cur_save_path}/' + track_label + '/person_' + track_label + '_' + key, cropped_img)
        except Exception as e:
          print(f"vid_path: {vid_path}, key: {key}, track_label: {track_label}, error: {e}")
          continue

  # now that this video is processed, pick the participant with the maximum score
  # (this heuristic comes from the Auto-Gait paper itself):
  # score = -(sum of frame-to-frame differences in bbox height)
  max_score = -1
  max_k = None
  for k, v in heights.items():
    if len(v) > 0:
      scores = -np.diff(v) # negative because we want to pick the person with decreasing height
      scores = np.sum(scores)
      if scores > max_score:
        max_score = scores
        max_k = k
    else:
      print(f"Empty height for {k} in {vid_path}, skipping...")
      continue
  if max_k is not None:
    # this is the participant with maximum score
    print(f"Max score for {vid_path} is {max_score}, for participant {max_k}")
    # delete frames for other participants
    for k, v in heights.items():
      if k != max_k:
        shutil.rmtree(f'{cur_save_path}/' + str(k))
        print(f"Deleted {k} from {vid_path}")
  progress.append(vid_path)
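Because the frame-to-frame differences telescope, the score computed above reduces to a track's first bbox height minus its last: `-np.sum(np.diff(v)) == v[0] - v[-1]`. The selected track is therefore the one whose box shrinks the most overall, e.g. a patient walking away from the camera. A small sketch of the selection step in isolation; `select_patient` is a hypothetical helper that keeps the loop's `-1` starting threshold.

```python
import numpy as np


def select_patient(heights, min_score=-1):
    """Return (track_id, score) for the track whose bbox height decreases the most.

    `heights` maps track id -> list of per-frame bbox heights (y2 - y1).
    The score -sum(diff(h)) telescopes to h[0] - h[-1]. A track must beat
    `min_score` to be selected; otherwise (None, min_score) is returned.
    """
    best_id, best_score = None, min_score
    for track_id, h in heights.items():
        if len(h) == 0:
            continue  # no observations for this track
        score = float(-np.sum(np.diff(h)))
        if score > best_score:
            best_id, best_score = track_id, score
    return best_id, best_score
```

With the `-1` threshold, a video in which every track grows by more than a pixel (e.g. the patient walks towards the camera) selects nobody, so no folder is deleted; the video-making step below handles such videos by writing one video per remaining track.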


We now have, for each video, frames that for the most part contain only the patient.

In [None]:
!zip -r final_frames.zip "$FINAL_PATH"
shutil.copy("final_frames.zip", RAW_VIDEOS_PATH) # the frames will be in your drive.

---
## 4. Create videos and move to drive for manual inspection
*   We will make videos for all the tracked people, following the same directory structure, and then move the folder to our drive.
*   Quite fast (it can run on a CPU runtime); takes about 15 minutes.
---
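Both branches of the loop below build essentially the same ffmpeg command, so it can be factored into one helper. A sketch: `build_ffmpeg_cmd` is a hypothetical name, the flags are the ones the loop already passes (plus `-y`, which tells ffmpeg to overwrite an existing output file), and the even-dimension `pad` filter is applied in every case because `libx264` with `yuv420p` requires even frame sizes, while cropped frames can have odd widths.

```python
import os


def build_ffmpeg_cmd(frames_dir, out_path, fps=30, pattern="*.jpg"):
    """Build an ffmpeg argument list turning a folder of frames into an H.264 MP4."""
    return [
        "ffmpeg",
        "-y",                                   # overwrite out_path if it exists
        "-framerate", str(fps),
        "-pattern_type", "glob",
        "-i", os.path.join(frames_dir, pattern),
        "-c:v", "libx264",
        "-vf", "pad=ceil(iw/2)*2:ceil(ih/2)*2",  # round width/height up to even numbers
        "-pix_fmt", "yuv420p",
        out_path,
    ]
```

It can be run as `subprocess.run(build_ffmpeg_cmd(frames_dir, out_path), stderr=subprocess.PIPE, check=True)`; `check=True` raises `subprocess.CalledProcessError` if ffmpeg fails instead of silently continuing.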

In [None]:
# Make new directory for saving videos
!mkdir "$VIDEO_PATH"

In [None]:
for vid_path in tqdm(sorted(os.listdir(FINAL_PATH))):
  cur_save_path = join(FINAL_PATH, vid_path)
  if len(os.listdir(cur_save_path)) == 1:
    track_label = os.listdir(cur_save_path)[0]
    # directly save the video
    video_maker = ["ffmpeg",
                   "-framerate", "30",
                   "-pattern_type", "glob",
                   "-i", os.path.join(FINAL_PATH, vid_path, track_label, "*.jpg"),
                   "-c:v", "libx264",
                   "-vf", "pad=ceil(iw/2)*2:ceil(ih/2)*2",
                   "-pix_fmt", "yuv420p",
                   join(VIDEO_PATH, vid_path + ".mp4")]
    out = subprocess.run(video_maker, stderr=subprocess.PIPE)
  else:
    os.mkdir(join(VIDEO_PATH, vid_path))
    for track_label in sorted(os.listdir(cur_save_path)):
      video_maker = ["ffmpeg",
                    "-framerate", "30",
                    "-pattern_type", "glob",
                    "-i", os.path.join(FINAL_PATH, vid_path, track_label, "*.jpg"),
                    "-c:v", "libx264",
                    "-vf", "pad=ceil(iw/2)*2:ceil(ih/2)*2",  # libx264 + yuv420p needs even dimensions
                    "-pix_fmt", "yuv420p",
                    join(VIDEO_PATH, vid_path, track_label + ".mp4")]
      out = subprocess.run(video_maker, stderr=subprocess.PIPE)


100%|██████████| 149/149 [14:09<00:00,  5.70s/it]


In [None]:
# dirs_exist_ok=True lets this cell be re-run without failing on an existing folder
shutil.copytree(VIDEO_PATH, join(RAW_VIDEOS_PATH, "patient_videos"), dirs_exist_ok=True)

## Thank you!
Now we can move on to further processing using these videos to extract keypoints with OpenPose.
*   We have manually inspected these videos and ensured that all of them were processed successfully and show the patient for most of the time, especially at the start.
*   However, after the OpenPose extraction you will need to inspect the results manually again.
*   The videos might look odd, but the frames are created correctly: every frame keeps the full image height while its width follows the tracked bbox, so when the frames are played back as a video the patient may appear to bulge or shrink.