## Traffic Camera Tracking

The main goal of this project is to track escooters, pedestrians and cyclists through a traffic intersection and to record when they pass and in which direction they travel. The project is part of **SaveNoW**, where these details are used to create a digital twin of Ingolstadt for simulation purposes.

So far, only escooters have been taken into account. Once all the code works, it should not be difficult to retrain the model with enough annotations of cyclists and pedestrians and to modify a few lines of code to accommodate them.

_For further updates on the project, follow this notion page: https://www.notion.so/Project-Traffic-Camera-tracking-15a9bb984c1341369ad40562610dc83f_


## Basic Setup

This project depends heavily on **Detectron2** and **PyTorch** for running a Mask R-CNN model for instance segmentation. Installing them is straightforward on Linux machines, but Windows 10 requires a workaround (_described on my Notion page_). PyTorch can be installed by following the instructions on its official website. These installations are not part of this notebook and must be done separately.

In [1]:
# install dependencies: 
!pip install pyyaml==5.1
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())


1.8.0 True


## Importing the required packages and dependencies

In [2]:
# Some basic setup:
# Setup detectron2 logger
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

# import some common libraries
import numpy as np
import os, json, cv2, random
from os import path

# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog
from detectron2.data.datasets import register_coco_instances

## Input Data

All clips and their associated annotations must be placed in the folder structure shown below.

**!! The `images` directories do not need to exist beforehand. They are created in the next code cell.**

```
input/
│
├─── 1/
│       1.mp4
│       1_Annotations.json
│       images/
│            frame_000001.png
│            frame_000002.png and so on
│
├─── 2/
│       2.mp4
│       2_Annotations.json
│       images/
│            frame_000001.png
│            frame_000002.png and so on
.
.
.
│
└─── {Clip_Number}/
        {Clip_Number}.mp4
        {Clip_Number}_Annotations.json
        images/
             frame_000001.png
             frame_000002.png and so on
```
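
Before any processing starts, it can help to confirm that every clip folder really follows this layout. The following is a minimal sketch, not part of the original notebook: the helper name `find_incomplete_clips` is hypothetical, and it assumes that clip folders have purely numeric names and contain `<n>.mp4` and `<n>_Annotations.json`, as in the diagram above.

```python
from pathlib import Path


def find_incomplete_clips(input_dir):
    """Return {clip_name: [missing file names]} for numeric clip folders.

    Assumes the layout from the diagram: <n>/<n>.mp4 and <n>/<n>_Annotations.json.
    """
    problems = {}
    for clip_dir in sorted(Path(input_dir).iterdir()):
        # Only folders with purely numeric names are treated as clips
        if not (clip_dir.is_dir() and clip_dir.name.isdigit()):
            continue
        expected = [f"{clip_dir.name}.mp4", f"{clip_dir.name}_Annotations.json"]
        missing = [name for name in expected if not (clip_dir / name).is_file()]
        if missing:
            problems[clip_dir.name] = missing
    return problems
```

Calling `find_incomplete_clips(input_path)` returns an empty dict when every clip folder is complete, and otherwise maps each incomplete folder to the files it lacks.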

## Splitting the clips into frames
The code cell below uses FFmpeg to split each clip into images for training the Mask R-CNN model.

**!! Note: Install FFmpeg first. If that is not possible, at least place an FFmpeg executable in the directory from which this notebook is run; otherwise this cell won't work.**

In [7]:
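# main_path refers to the current working directory
main_path = os.getcwd()
print(main_path)
# input_path refers to the folder containing the clip folders (see the diagram above)
#input_path = main_path + '/input'
input_path = os.path.abspath(r'D:\Project_Escooter_Tracking\input')
print(input_path)

# 'clip_name' instead of 'dir' avoids shadowing the built-in dir()
for clip_name in os.listdir(input_path):
  # Skip everything that is not a numbered clip folder, e.g. the .json files
  # and label folders that later cells write into input_path
  if not clip_name.isdigit():
    continue

  clip_path = os.path.join(input_path, clip_name)
  print(clip_path)

  # makedirs creates 'images' and 'images\labels' in one call
  # and does not fail when the folders already exist
  image_path = os.path.join(clip_path, 'images')
  labels_path = os.path.join(image_path, 'labels')
  os.makedirs(labels_path, exist_ok=True)

  video_path = os.path.join(clip_path, f'{clip_name}.mp4')
  frame_pattern = os.path.join(image_path, 'frame_%06d.png')

  ffmpeg_command = 'ffmpeg -i ' + video_path + ' ' + frame_pattern
  print(ffmpeg_command)
  os.system(ffmpeg_command)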

C:\Users\balaji\Desktop\Traffic_Camera_Tracking\Main_Code
D:\Project_Escooter_Tracking\input
D:\Project_Escooter_Tracking\input\16
ffmpeg -i D:\Project_Escooter_Tracking\input\16\16.mp4 D:\Project_Escooter_Tracking\input\16\images\frame_%06d.png
D:\Project_Escooter_Tracking\input\18
ffmpeg -i D:\Project_Escooter_Tracking\input\18\18.mp4 D:\Project_Escooter_Tracking\input\18\images\frame_%06d.png
D:\Project_Escooter_Tracking\input\2
ffmpeg -i D:\Project_Escooter_Tracking\input\2\2.mp4 D:\Project_Escooter_Tracking\input\2\images\frame_%06d.png
D:\Project_Escooter_Tracking\input\20
ffmpeg -i D:\Project_Escooter_Tracking\input\20\20.mp4 D:\Project_Escooter_Tracking\input\20\images\frame_%06d.png
D:\Project_Escooter_Tracking\input\21
ffmpeg -i D:\Project_Escooter_Tracking\input\21\21.mp4 D:\Project_Escooter_Tracking\input\21\images\frame_%06d.png
D:\Project_Escooter_Tracking\input\22
ffmpeg -i D:\Project_Escooter_Tracking\input\22\22.mp4 D:\Project_Escooter_Tracking\input\22\images\frame_%0
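
The command strings printed above break as soon as a path contains a space. One possible alternative, sketched here under the assumption that the folder layout is the one described earlier, is to assemble the command as an argument list. The helper name `build_ffmpeg_args` is hypothetical; only the `ffmpeg -i <input> <output pattern>` form already used in this notebook is relied upon.

```python
from pathlib import Path


def build_ffmpeg_args(clip_dir):
    """Build the argument list that extracts the frames of <clip_dir>/<name>.mp4
    into <clip_dir>/images/frame_%06d.png."""
    clip_dir = Path(clip_dir)
    video_file = clip_dir / f"{clip_dir.name}.mp4"
    frame_pattern = clip_dir / "images" / "frame_%06d.png"
    # Each argument is a separate list item, so spaces in paths need no quoting
    return ["ffmpeg", "-i", str(video_file), str(frame_pattern)]
```

The list can be passed to `subprocess.run(build_ffmpeg_args(clip_path), check=True)` once the `images` folder exists; `check=True` raises an exception if FFmpeg exits with an error instead of failing silently.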

## Reading, manipulating and checking the annotations on the images

### This code section is partly derived from Check_Dataset.ipynb and ReadJSON.py. 
---------------------------------------------------------------------

This is my custom code for combining all the CVAT annotation files (_Format: COCO v1_) and splitting them into **Train**, **Test** and **Validation** .json files 😎. 

**!! Important thing to note while importing annotations from CVAT:**
When we annotate in CVAT, we receive one annotation file per clip. Unfortunately, the _'frame numbers'_ in these files start at 0, whereas the frames extracted by FFmpeg are numbered from **1**. The function _adjustFrameDifference()_ in the code cell below compensates for this offset.
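
The renaming step can be illustrated in isolation. This is a minimal sketch, assuming that every file name ends in a six-digit frame number followed by an extension; the function name `shift_frame_name` and the example file names are illustrative only.

```python
import os


def shift_frame_name(file_name, offset=1, digits=6):
    """Shift the trailing frame number of e.g. 'frame_000000.PNG' by `offset`."""
    stem, _extension = os.path.splitext(file_name)
    frame_number = int(stem[-digits:]) + offset
    # zfill pads with leading zeros, so the frame number keeps `digits` digits
    return f"{stem[:-digits]}{str(frame_number).zfill(digits)}.png"


print(shift_frame_name("frame_000000.PNG"))  # frame_000001.png
print(shift_frame_name("frame_000099.PNG"))  # frame_000100.png
```

Using `zfill` avoids a separate branch for every number of digits and keeps working for frame numbers of any length up to six digits.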

In [9]:
import json
from copy import deepcopy
import random

# coco_format is the dict that holds all the values to be written to the final annotations json file
# Some of the keys like 'licenses', 'info' and 'categories' are constant and declared here first

coco_format = {
    "licenses": [{
        "name": "",
        "id": 0,
        "url": ""
    }],
    "info": {
        "contributor": "Vishal Balaji",
        "date_created": "",
        "description": "Escooter Dataset",
        "url": "",
        "version": "",
        "year": ""
    },
    "categories": [{
        "id": 1,
        "name": "Escooter",
        "supercategory": ""
    }]
}

# The keys 'images' and 'annotations' need to be processed and appended. The lines below show the format of
# those dicts.
"""
"images":[
    {
        "id":1,
        "width": 1920,
        "height": 1080,
        "file_name":"sdfa.PNG",
        "license":0,
        "flickr_url": "",
        "coco_url": "",
        "date_captured": 0
    }
]

"annotations":[
    {
        "id": 1,
        "image_id": 55,
        "category_id": 1,
        "segmentation": [[]],
        "area": {some area number in float},
        "bbox": [],
        "iscrowd": 0
    }
]
"""

# Path where the annotations are stored, when the repo is the path of current working directory
#main_file_path = os.path.abspath(r'D:\Carissma Video Copy\Traffic Camera Tracking\Finished')
input_path = r'D:\Project_Escooter_Tracking\input'
main_file_path = input_path

# Empty lists that are later filled with images and annotations.
images_list = []
annotations_list = []

# Each image and annotations has an ID associated with it and it starts with 1.
# These values are incremented as the images and annotations are being added.
img_num = 1
anno_num = 1

def adjustFrameDifference(file_name, offset=1):
    # Split off the extension and read the six-digit frame number at the end of the name
    file_name_from_dict = os.path.splitext(file_name)[0]
    file_number = int(file_name_from_dict[-6:])
    
    # 'offset' is the frame difference between the annotations from CVAT
    # (numbered from 0) and the frames extracted by the FFMPEG script (numbered from 1)
    file_number += offset
    
    # Pad the frame number with the right number of 0's (six digits in total)
    # and use the '.png' extension written by the FFMPEG script
    new_file_name = file_name_from_dict[:-6] + str(file_number).zfill(6) + '.png'
    
    return new_file_name


print("- Processing the following annotation files: ")
for clip_number, clips in enumerate(os.listdir(main_file_path)):
    # Checking that only numbers are given as folder names for the clips
    if all(char.isdigit() for char in clips):
      # Path of the clips folder
      clips_path = main_file_path + '\\' + clips
      # Path of the annotation of the clips
      annotation_file = clips_path + f'\\{str(clips)}_Annotations.json'

      file = open(annotation_file)
      json_file = json.load(file)
      print(f'  - {annotation_file}')
        
      
      # !! Testing purpose only for restricting number of annotations
      # flag = 1
      for annotations in json_file['annotations']:

          anno_image_ID = annotations['image_id']
          anno_ID = annotations['id']

          image_filename = ''
          for images in json_file['images']:
              if images['id'] == anno_image_ID:
                  image_filename = images['file_name']

          filename = input_path + '\\' + clips + '\\images\\' + image_filename
          filename = adjustFrameDifference(filename)  
          
        # The formats for 'images' dictionary and 'annotations' dictionary in COCO
          image_dict = {
              'id': img_num,
              "width": 1920,
              "height": 1080,
              "file_name": filename,
              "license": 0,
              "flickr_url": "",
              "coco_url": "",
              "date_captured": 0
          }
          anno_dict = {
              "id": anno_num,
              'image_id': img_num,
              "category_id": 1,
              'segmentation': annotations['segmentation'],
              'area': annotations['area'],
              'bbox': annotations['bbox'],
              'iscrowd': annotations['iscrowd']
          }

          # In the COCO-Format, every images and associated annotations are passed as array of dicts.
          images_list.append(image_dict)
          annotations_list.append(anno_dict)

          # Incrementing the Image ID and Annotation ID for each loop
          img_num += 1
          anno_num += 1
        
      file.close()

      # !! Meant for testing purpose.
      # if clip_number == 1:
      #     break

# anno_num points at the next free ID, so the number of entries is one less
print(f'\n- Total no. of annotations/images in the dataset: {anno_num - 1}')

train_json = deepcopy(coco_format)
valid_json = deepcopy(coco_format)
test_json = deepcopy(coco_format)

train_split = 0.8
valid_split = 0.1
test_split = 0.1

# Function to split the whole dataset of images and annotations into train,
# valid and test sets
def splitDataset(images, annotations, trainSplit, validSplit):
  trainSize = int(len(images) * trainSplit)
  train_images = []
  train_annotations = []
  
  copy_images = list(images)
  copy_annotations = list(annotations)
  while len(train_images) < trainSize:
    index = random.randrange(len(copy_images))
    train_images.append(copy_images.pop(index))
    train_annotations.append(copy_annotations.pop(index))
  

  copySize = int(len(copy_images) * (validSplit/(1 - trainSplit)))
  valid_images = []
  valid_annotations = []

  test_images = copy_images
  test_annotations = copy_annotations
  while len(valid_images) < copySize:
    index = random.randrange(len(test_images))
    valid_images.append(test_images.pop(index))
    valid_annotations.append(test_annotations.pop(index))
  
  return [(train_images, train_annotations), (valid_images, valid_annotations), (test_images, test_annotations)]

train_set, valid_set, test_set = splitDataset(images_list, annotations_list, train_split, valid_split)
print("\n- Splitting the dataset into Train, Valid and Test is successful\n")

# Storing the processed arrays of images and annotations with their
# respective keys in the final dataset
# coco_format["images"] = images_list
# coco_format["annotations"] = annotations_list

train_json['images'] = train_set[0]
train_json['annotations'] = train_set[1]

valid_json['images'] = valid_set[0]
valid_json['annotations'] = valid_set[1]

test_json['images'] = test_set[0]
test_json['annotations'] = test_set[1]

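# Code snippet to automatically pick a new name prefix for the many
# .json files created during testing. The files are saved in input_path
# with a '_Train', '_Valid' or '_Test' suffix, so that is what is looked up here.
base_filename = 'Test_'
existing_names = os.listdir(input_path)
for numbers in range(20):
    check_filename = base_filename + str(numbers+1) + '.json'
    if f'{check_filename[:-5]}_Train.json' not in existing_names:
        base_filename = check_filename
        break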


# These lines write the dictionaries into the final .json files,
# individually for train, valid and test
train_file = f"{input_path}\\{base_filename[:-5]}_Train.json"
valid_file = f"{input_path}\\{base_filename[:-5]}_Valid.json"
test_file = f"{input_path}\\{base_filename[:-5]}_Test.json"

print("- Saving train, test and valid annotation files")
with open(train_file, "w") as file:
    json.dump(train_json, file)
    print(f"  - Final training set file saved as: {train_file}")

with open(valid_file, "w") as file:
    json.dump(valid_json, file)
    print(f"  - Final valid set file saved as: {valid_file}")

with open(test_file, "w") as file:
    json.dump(test_json, file)
    print(f"  - Final test set file saved as: {test_file}")


- Processing the following annotation files: 
  - D:\Project_Escooter_Tracking\input\16\16_Annotations.json
  - D:\Project_Escooter_Tracking\input\18\18_Annotations.json
  - D:\Project_Escooter_Tracking\input\2\2_Annotations.json
  - D:\Project_Escooter_Tracking\input\20\20_Annotations.json
  - D:\Project_Escooter_Tracking\input\21\21_Annotations.json
  - D:\Project_Escooter_Tracking\input\22\22_Annotations.json
  - D:\Project_Escooter_Tracking\input\23\23_Annotations.json
  - D:\Project_Escooter_Tracking\input\24\24_Annotations.json
  - D:\Project_Escooter_Tracking\input\25\25_Annotations.json
  - D:\Project_Escooter_Tracking\input\26\26_Annotations.json
  - D:\Project_Escooter_Tracking\input\27\27_Annotations.json
  - D:\Project_Escooter_Tracking\input\28\28_Annotations.json
  - D:\Project_Escooter_Tracking\input\29\29_Annotations.json
  - D:\Project_Escooter_Tracking\input\3\3_Annotations.json
  - D:\Project_Escooter_Tracking\input\30\30_Annotations.json
  - D:\Project_Escooter_Trac
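
The cell above appends one image entry per annotation, so a frame that carries several annotations appears several times and its copies can end up in different subsets. The following is a hedged sketch of an alternative, assuming one image entry per frame: the images are shuffled and split, and every annotation follows its image. The function name `split_by_image` and the `seed` parameter are illustrative and not part of the original code.

```python
import random


def split_by_image(images, annotations, train_frac=0.8, valid_frac=0.1, seed=0):
    """Split COCO 'images' into train/valid/test and route each annotation
    to the subset that contains its image."""
    shuffled = list(images)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable

    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    subsets = {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        "test": shuffled[n_train + n_valid:],
    }

    result = {}
    for name, subset_images in subsets.items():
        image_ids = {image["id"] for image in subset_images}
        subset_annotations = [a for a in annotations if a["image_id"] in image_ids]
        result[name] = (subset_images, subset_annotations)
    return result
```

Each value of the returned dict is an `(images, annotations)` pair, matching the shape of the tuples produced by `splitDataset` above.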

## Visualizing the annotations

We have loaded all our annotations in the right format from the .json files. To check that they were loaded correctly and are mapped to the right images, we randomly select _8_ images from the test set and draw the annotations on top of them.
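
A visual check covers only a handful of images. As a complement, a purely structural check can confirm that every annotation points at an existing image entry. This is a small sketch that works on any dict in the COCO layout used in this notebook; the function name `check_coco_links` is hypothetical.

```python
def check_coco_links(coco):
    """Return the ids of annotations whose image_id has no entry in 'images'."""
    image_ids = {image["id"] for image in coco["images"]}
    return [a["id"] for a in coco["annotations"] if a["image_id"] not in image_ids]


example = {
    "images": [{"id": 1}, {"id": 2}],
    "annotations": [{"id": 10, "image_id": 1}, {"id": 11, "image_id": 3}],
}
print(check_coco_links(example))  # [11]
```

An empty list means that all annotations are linked to an image; applied to `train_json`, `valid_json` or `test_json`, it should return `[]` if the split kept images and annotations together.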

In [11]:
from IPython.display import Image, display

# Registering the datasets to the Detectron2 model. Using try-except block because dataset can be registered only once.
# If we need to run this code block multiple times, then we need to ignore the 'AssertionError'
try:
    register_coco_instances("escooter_train", {}, train_file, '')
except AssertionError:
    pass

try:
    register_coco_instances("escooter_valid", {}, valid_file, '')
except AssertionError:
    pass

try:
    register_coco_instances("escooter_test", {}, test_file, '')
except AssertionError:
    pass

dataset_dicts = DatasetCatalog.get("escooter_test")

# List of labelled_images for visualization purposes
out_files = []

# Making a sample_labels folder ('\\' avoids the invalid escape sequence '\s')
sample_label_path = f'{input_path}\\sample_labels'
os.makedirs(sample_label_path, exist_ok=True)

# Just selecting 8 random images for visualizing purposes
for d in random.sample(dataset_dicts, 8):   
    
    img = cv2.imread(d['file_name'])

    visualizer = Visualizer(img[:, :, ::-1], scale=1)
    out = visualizer.draw_dataset_dict(d)

    out_filename = sample_label_path + '\\' + d['file_name'][:-4].split('\\')[-1] + '.png'
    cv2.imwrite(out_filename, out.get_image()[:, :, ::-1])
    out_files.append(out_filename)

# Displays images in notebook. Caution: 'display()' may not work outside this notebook
for images in out_files:
  display(Image(images))

[06/20 07:13:51 d2.data.datasets.coco]: Loaded 2261 images in COCO format from D:\Project_Escooter_Tracking\input\Test_1_Test.json


## Training the model

The config variable _cfg_ is central to Detectron2 and must be tuned carefully for good results. So far, values found online have been used as they are; **hyperparameter tuning** is yet to be done.
For our purpose, we use transfer learning, starting from a Mask R-CNN R50-FPN model that was pretrained on COCO.
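
When tuning the solver, it helps to know how many passes over the training set a given setting implies. The sketch below is plain arithmetic: each iteration processes `IMS_PER_BATCH` images, so the number of passes is iterations times batch size divided by the number of training images. The values 5000 and 2 are the ones used in this notebook; the image count of 10000 is only a placeholder and must be replaced by the real size of the training set.

```python
def approximate_epochs(max_iter, ims_per_batch, num_train_images):
    """Number of passes over the training set implied by the solver settings."""
    return max_iter * ims_per_batch / num_train_images


# MAX_ITER and IMS_PER_BATCH as set in the config; the image count is a placeholder
print(approximate_epochs(max_iter=5000, ims_per_batch=2, num_train_images=10000))  # 1.0
```

A result well below 1 would mean that the model never sees part of the training data, which is worth knowing before judging the quality of the trained weights.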

In [4]:
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("escooter_train",)
cfg.DATASETS.TEST = ('escooter_test',)
cfg.DATALOADER.NUM_WORKERS = 42
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")  # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025  # learning rate taken over as-is; not tuned yet
cfg.SOLVER.WARMUP_ITERS = 100
cfg.SOLVER.MAX_ITER = 5000    # number of training iterations; not tuned yet
cfg.SOLVER.STEPS = []        # do not decay learning rate
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 512   # RoIs sampled per image for the ROI heads (512 is the Detectron2 default)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only one class (Escooter). (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
# NOTE: this config means the number of classes, but a few popular unofficial tutorials incorrectly use num_classes+1 here.

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg) 
trainer.resume_or_load(resume=False)
#trainer.train()

[06/20 07:11:15 d2.engine.defaults]: Model:
GeneralizedRCNN(
  (backbone): FPN(
    (fpn_lateral2): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral4): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral5): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (top_block): LastLevelMaxPool()
    (bottom_up): ResNet(
      (stem): BasicStem(
        (conv1): Conv2d(
          3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
      )
 

KeyError: "Dataset 'escooter_train' is not registered! Available datasets are: coco_2014_train, coco_2014_val, coco_2014_minival, coco_2014_minival_100, coco_2014_valminusminival, coco_2017_train, coco_2017_val, coco_2017_test, coco_2017_test-dev, coco_2017_val_100, keypoints_coco_2014_train, keypoints_coco_2014_val, keypoints_coco_2014_minival, keypoints_coco_2014_valminusminival, keypoints_coco_2014_minival_100, keypoints_coco_2017_train, keypoints_coco_2017_val, keypoints_coco_2017_val_100, coco_2017_train_panoptic_separated, coco_2017_train_panoptic_stuffonly, coco_2017_train_panoptic, coco_2017_val_panoptic_separated, coco_2017_val_panoptic_stuffonly, coco_2017_val_panoptic, coco_2017_val_100_panoptic_separated, coco_2017_val_100_panoptic_stuffonly, coco_2017_val_100_panoptic, lvis_v1_train, lvis_v1_val, lvis_v1_test_dev, lvis_v1_test_challenge, lvis_v0.5_train, lvis_v0.5_val, lvis_v0.5_val_rand_100, lvis_v0.5_test, lvis_v0.5_train_cocofied, lvis_v0.5_val_cocofied, cityscapes_fine_instance_seg_train, cityscapes_fine_sem_seg_train, cityscapes_fine_instance_seg_val, cityscapes_fine_sem_seg_val, cityscapes_fine_instance_seg_test, cityscapes_fine_sem_seg_test, cityscapes_fine_panoptic_train, cityscapes_fine_panoptic_val, voc_2007_trainval, voc_2007_train, voc_2007_val, voc_2007_test, voc_2012_trainval, voc_2012_train, voc_2012_val, ade20k_sem_seg_train, ade20k_sem_seg_val"

In [27]:
# Looking at the values of loss and other parameters in the tensorboard
%load_ext tensorboard
%tensorboard --logdir output

The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard


Reusing TensorBoard on port 6006 (pid 5784), started 0:58:33 ago. (Use '!kill 5784' to kill it.)

## Inference

The trained model is now used on the validation set (not on the training data, for obvious reasons 😅).

In [None]:
from detectron2.utils.visualizer import ColorMode

# load weights
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7   # set the testing threshold for this model
# Set the validation dataset
cfg.DATASETS.TEST = ("escooter_valid", )
# Create predictor (model for inference)
predictor = DefaultPredictor(cfg)

# Load the validation samples and their metadata; both are needed below
dataset_dicts = DatasetCatalog.get("escooter_valid")
escooter_metadata = MetadataCatalog.get("escooter_valid")

# Create the output folder once, before the loop
validation_label_path = f'{input_path}\\valid_labels'
os.makedirs(validation_label_path, exist_ok=True)

valid_images = []
for d in random.sample(dataset_dicts, 3):    
    im = cv2.imread(d["file_name"])
    outputs = predictor(im)
    v = Visualizer(im[:, :, ::-1],
                   metadata=escooter_metadata, 
                   scale=0.8, 
                   instance_mode=ColorMode.IMAGE_BW   # remove the colors of unsegmented pixels
    )
    v = v.draw_instance_predictions(outputs["instances"].to("cpu"))

    out_filename = validation_label_path + '\\' + d['file_name'][:-4].split('\\')[-1] + '.png'
    cv2.imwrite(out_filename, v.get_image()[:, :, ::-1])
    valid_images.append(out_filename)

# Displays images in notebook. Caution: 'display()' may not work outside this notebook
for images in valid_images:
  display(Image(images))
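
The effect of `SCORE_THRESH_TEST = 0.7` can be pictured with plain Python. This is only an illustration: Detectron2 applies the threshold internally and returns an `Instances` object, whereas the dictionaries, scores and the function name `keep_confident` below are made up for the example.

```python
def keep_confident(detections, score_threshold=0.7):
    """Keep only the detections whose score lies above the threshold."""
    return [d for d in detections if d["score"] > score_threshold]


# Hypothetical detections for a single frame
detections = [
    {"label": "Escooter", "score": 0.95},
    {"label": "Escooter", "score": 0.65},
    {"label": "Escooter", "score": 0.40},
]
print(keep_confident(detections))  # [{'label': 'Escooter', 'score': 0.95}]
```

Raising the threshold removes more uncertain detections at the cost of missing some true escooters; lowering it has the opposite effect.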

## Inference on images and videos

With this code cell, we can test the model on individual images or videos.
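
The cell below passes every entry of the image folder to the model, so a stray text file or subfolder would make `cv2.imread` return `None`. A possible safeguard, sketched here with a hypothetical helper `list_image_files` and an assumed set of accepted extensions, is to collect only image files:

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # assumed set of accepted extensions


def list_image_files(folder):
    """Return the sorted paths of the image files directly inside `folder`."""
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )
```

The returned list of strings could be used in place of `existing_files`; sorting also makes the processing order independent of the file system.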

In [20]:
# Inference on normal images

from detectron2.utils.visualizer import ColorMode

# load weights
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7   # set the testing threshold for this model

# Create predictor (model for inference)
predictor = DefaultPredictor(cfg)

dataset_dicts = DatasetCatalog.get("escooter_valid")
escooter_metadata = MetadataCatalog.get("escooter_valid")

input_path = r'D:\Project_Escooter_Tracking\input'
path_images = r'D:\Project_Escooter_Tracking\test\images'
existing_files = []
for img in os.listdir(path_images):
  fileName = path_images + '\\' + img
  existing_files.append(fileName)

final_test_label_path = f'{input_path}\\test_inference_video'
os.system(f'mkdir {final_test_label_path}')

def inference_single_image(filename, output):
    im = cv2.imread(filename)
    #im = frame
    outputs = predictor(im)
    v = Visualizer(im[:, :, ::-1],
                   metadata=escooter_metadata, 
                   scale=0.8, 
                   instance_mode=ColorMode.IMAGE_BW   # remove the colors of unsegmented pixels
    )
    v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
   
    out_filename = output + '\\' + filename[:-4].split('\\')[-1] + '.png'
    cv2.imwrite(out_filename, v.get_image()[:, :, ::-1])
    return out_filename

labeled_images = []
for d in existing_files:    
    labeled_images.append(inference_single_image(d, final_test_label_path))
    
def inference_video(filepath):
#     video = cv2.VideoCapture(filepath)
#     while video.isOpened():
#             success, frame = video.read()
#             if success:
#                 inference_single_image(frame, final_test_label_path)
#             else:
#                 break
    pass

# len() returns the number of labelled images; Python has no built-in size()
print(len(labeled_images))
#inference_video(r'D:\Project_Escooter_Tracking\test\images')
# Displays images in notebook. Caution: 'display()' may not work outside this notebook
#for images in labeled_images:
  #display(Image(images))

[06/20 07:38:47 d2.data.datasets.coco]: Loaded 2260 images in COCO format from D:\Project_Escooter_Tracking\input\Test_1_Valid.json