# **OBJECT DETECTION MODELS applied to nuImages**

## **Transfer Learning with TensorFlow 2.x**

**This notebook is part of the BTTAI Challenge 2023**

In this phase of the project, we will be training a custom detector for pedestrians using pretrained models. We will do transfer learning to achieve this.

## **OUTLINE**

### **Part 1: Testing models (COMPLETED IN PREVIOUS NOTEBOOK)**

* Import and/or install libraries needed

* Import tensorflow models to test

* Install the TensorFlow Object Detection API

* Read images from the nuImages sample folder

* Test object detection with given model



### **Part 2: Train and evaluate a model with transfer learning (this notebook)**

* Import libraries

* Create custom folder structure

* Collect the dataset of images and label them to get their xml files

* Generate the TFRecord files required for training

* Edit the model pipeline config file and download the pre-trained model checkpoint

* Train and evaluate the model





## **Part 2: Transfer learning using a pretrained object detection model**

## **1) Import Libraries**

In [26]:
import os
import glob
import cv2 as cv2
import numpy as np
import xml.etree.ElementTree as ET
import pandas as pd
import tensorflow as tf


## **2) Create custom folder structure in Google Drive**
We will be using Google Drive to store all files needed to run these tests. Do the following:

* Create a folder named ***nuImg_customOD*** in your google drive.

* Create two folders inside the ***nuImg_customOD*** folder:

 -- One folder named ***training*** (this is where the checkpoints will be saved during training)

 -- Another folder named ***data*** (this is where we will add all images and annotations used for training)


### **Create and upload your image files and xml files.**
 Inside the folder ***data*** create a folder named ***images*** for your custom dataset images and create another folder named ***annotations*** for its corresponding xml files.

Remember the xml files are expected to have PASCAL_VOC XML format.

Next, upload the files to the corresponding  ***nuImg_customOD/data/*** folder (or upload .zip files and we will unzip them in Step 4)  


 Note: All image files should have extension as ".jpg" - this is because we want to save disk space. This will be useful as we need to generate the TensorFlow records later







##**4) Mount drive and link your folder**

In [27]:
from google.colab import drive

drive.mount('/content/gdrive')

# # this creates a symbolic link so that now the path /content/gdrive/My\ Drive/ is equal to /mydrive
!ln -s "/content/gdrive/My Drive/" /mydrive
!ls /mydrive
%cd /mydrive/MathWorks\ \#2\ \(BOS\)\ -\ Classify\ Object\ Behavior\ to\ Enhance\ the\ Safety\ of\ Autonomous\ Vehicles/







Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).
ln: failed to create symbolic link '/mydrive/My Drive': File exists
 annotation_folder
'Colab Notebooks'
'everything else'
'F22: CS 1200 '
'HONR 1102'
'MathWorks #2 (BOS) - Classify Object Behavior to Enhance the Safety of Autonomous Vehicles'
'My Drive'
 nuImg_customOD
/content/gdrive/.shortcut-targets-by-id/1V6R5dIPvEZICGv8F8ogtd3KU8lk3Bltk/MathWorks #2 (BOS) - Classify Object Behavior to Enhance the Safety of Autonomous Vehicles


In [28]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### **Optional: Unzip files**  


In [29]:
# Uncomment the following lines of code and run if you uploaded .zip files

## unzip the datasets and their contents so that they are now in /mydrive/customTF2/data/ folder
#!unzip /mydrive/nuImg_customOD/images.zip -d .
#!unzip /mydrive/nuImg_customOD/annotations.zip -d .

## **5) Clone the tensorflow models and test the model builder**


In [39]:
# clone the tensorflow models on the colab cloud vm
!git clone --q https://github.com/tensorflow/models.git

#navigate to /models/research folder to compile protos
%cd models/research

# Compile protos.
!protoc object_detection/protos/*.proto --python_out=.

# Install TensorFlow Object Detection API.
!cp object_detection/packages/tf2/setup.py .
!python -m pip install .

fatal: could not create work tree dir 'models': Transport endpoint is not connected
[Errno 107] Transport endpoint is not connected: 'models/research'
/content/gdrive/.shortcut-targets-by-id/1V6R5dIPvEZICGv8F8ogtd3KU8lk3Bltk/MathWorks #2 (BOS) - Classify Object Behavior to Enhance the Safety of Autonomous Vehicles
Could not make proto path relative: object_detection/protos/*.proto: No such file or directory
cp: failed to access '.': Transport endpoint is not connected
[31mERROR: Invalid requirement: '.'[0m[31m
[0m

In [31]:
# testing the model builder
!python object_detection/builders/model_builder_tf2_test.py

python3: can't open file '/content/gdrive/.shortcut-targets-by-id/1V6R5dIPvEZICGv8F8ogtd3KU8lk3Bltk/MathWorks #2 (BOS) - Classify Object Behavior to Enhance the Safety of Autonomous Vehicles/object_detection/builders/model_builder_tf2_test.py': [Errno 107] Transport endpoint is not connected


## **6) Create test_labels & train_labels**
Current working directory is '/mydrive/nuImg_customOD/data/'

We will now divide annotations into test_labels(20%) and train_labels(80%).

In [38]:
# Creating two directories for training and testing
!mkdir test_labels train_labels

# Count the total number of files
total_files=$(ls annotations/* | wc -l)

# Calculate 20% of the total files
twenty_percent=$((total_files * 20 / 100))

# Lists the files inside 'annotations' in a random order (not really random, by their hash value instead)
# Moves the first 20% labels (20% of the labels) to the testing dir: `test_labels`
ls annotations/* | sort -R | head -$twenty_percent | xargs -I{} cp {} test_labels/

# Moves the rest of the labels to the training dir: `train_labels`
ls annotations/* | xargs -I{} cp {} train_labels/


SyntaxError: ignored

## **7) Generate TensorFlow record**

The TensorFlow Record is Tensorflow’s own binary storage format. If you are working with large datasets, using a binary file format for storage of your data can have a significant impact on the performance of your import pipeline and as a consequence on the training time of your model.

Read more here: https://medium.com/mostly-ai/tensorflow-records-what-they-are-and-how-to-use-them-c46bc4bbb564

### **First, create the CSV files and "label_map.pbtxt" file**

Run the script in the cell below to create test_labels.csv and train_labels.csv

This also creates the label_map.pbtxt file using the classes mentioned in the xml files.

In [None]:
def xml_to_csv(path):
  classes_names = []
  xml_list = []

  for xml_file in glob.glob(path + '/*.xml'):
    tree = ET.parse(xml_file)
    root = tree.getroot()
    for member in root.findall('object'):
      classes_names.append(member[0].text)
      value = (root.find('filename').text  ,
               int(root.find('size')[0].text),
               int(root.find('size')[1].text),
               member[0].text,
               int(member[4][0].text),
               int(member[4][1].text),
               int(member[4][2].text),
               int(member[4][3].text))
      xml_list.append(value)
  column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
  xml_df = pd.DataFrame(xml_list, columns=column_name)
  classes_names = list(set(classes_names))
  classes_names.sort()
  return xml_df, classes_names

for label_path in ['train_labels', 'test_labels']:
  image_path = os.path.join(os.getcwd(), label_path)
  xml_df, classes = xml_to_csv(label_path)
  xml_df.to_csv(f'{label_path}.csv', index=None)
  print(f'Successfully converted {label_path} xml to csv.')

label_map_path = os.path.join("label_map.pbtxt")
pbtxt_content = ""

for i, class_name in enumerate(classes):
    pbtxt_content = (
        pbtxt_content
        + "item {{\n    id: {0}\n    name: '{1}'\n}}\n\n".format(i + 1, class_name)
    )
pbtxt_content = pbtxt_content.strip()
with open(label_map_path, "w") as f:
    f.write(pbtxt_content)
    print('Successfully created label_map.pbtxt ')

### **Second, create train.record & test.record files**

Current working directory is /mydrive/nuImg_customOD/data/

Run the *generate_tfrecord.py* script to create *train.record* and *test.record* files



In [None]:
#Usage:
#!python generate_tfrecord.py output.csv output_pb.txt /path/to/images output.tfrecords

#For train.record
!python /mydrive/nuImg_customOD/generate_tfrecord.py train_labels.csv  label_map.pbtxt images/ train.record

#For test.record
!python /mydrive/nuImg_customOD/generate_tfrecord.py test_labels.csv  label_map.pbtxt images/ test.record


## **8) Download pre-trained model checkpoint**

Current working directory is /mydrive/nuImg_customOD/data/

In the code cell below, we download the .tar.gz file of the model into the ***data*** folder & unzip it.

A list of detection models for tensorflow 2.x can be found [here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md).



In [None]:
# As an example, we will use ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz

#Download the pre-trained model ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz into the data folder & unzip it.

#TODO: CHANGE IT TO FASTER RCNN

# !wget http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz
# !tar -xzvf ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz

# Download the pre-trained Faster R-CNN model
!wget http://download.tensorflow.org/models/object_detection/tf2/20200711/faster_rcnn_resnet50_coco_640x640_coco17_tpu-8.tar.gz
!tar -xzvf faster_rcnn_resnet50_coco_640x640_coco17_tpu-8.tar.gz

## **9) Get the model pipeline config file, make changes to it and put it inside the *data* folder**

> # ⛔ Attention
> In order to run the training, a configuration file needs to be edited. Make sure you follow these instructions

Current working directory is /mydrive/nuImg_customOD/data/

Copy the model configuration file from ***/content/models/research/object_detection/configs/tf2*** to ***/mydrive/nuImg_customOD/data***

In the code below, an example is provided with
**ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.config**

Edit based on the model you want to test.











In [None]:
#copy the edited config file from the configs/tf2 directory to the data/ folder in your drive

#TODO: CHANGE TO FASTER RCNN
#change num_classes to number of your classes.
#change test.record path, train.record path & labelmap path to the paths where you have created these files (paths should be relative to your current working directory while training).
#change fine_tune_checkpoint to the path of the directory where the downloaded checkpoint from step 12 is.
#change fine_tune_checkpoint_type with value classification or detection depending on the type..
#change batch_size to any multiple of 8 depending upon the capability of your GPU. (eg:- 24,128,...,512).Mine is set to 64.
#change num_steps to number of steps you want the detector to train.

!cp /content/models/research/object_detection/configs/tf2/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.config /mydrive/customOD/data

# maybe this? TODO CHECK

!python model_main_tf2.py --pipeline_config_path=/mydrive/nuImg_customOD/data/faster_rcnn_resnet50_coco_640x640_coco17_tpu-8.config --model_dir=/mydrive/nuImg_customOD/training --alsologtostderr


Now, edit the pipeline config file inside the model folder we just copied.

**You need to make the following changes:**
*   change ***num_classes*** to number of your classes.
*   change ***test.record*** path, ***train.record*** path & ***labelmap*** path to the paths where you have created these files (paths should be relative to your current working directory while training).
* change ***fine_tune_checkpoint*** to the path of the directory where the downloaded checkpoint from step 12 is.
* change ***fine_tune_checkpoint_type*** with value **classification** or **detection** depending on the type..
* change ***batch_size*** to any multiple of 8 depending upon the capability of your GPU.
(eg:- 24,128,...,512).Mine is set to 64.
* change ***num_steps*** to number of steps you want the detector to train.


## **10) Load Tensorboard**

TensorBoard provides the visualization and tooling needed for machine learning experimentation:

* Tracking and visualizing metrics such as loss and accuracy
* Visualizing the model graph (ops and layers)
* Viewing histograms of weights, biases, or other tensors as they change over time
* Much more! See more info here: https://www.tensorflow.org/tensorboard

In [None]:
#load tensorboard

%load_ext tensorboard
%tensorboard --logdir '/content/gdrive/MyDrive/nuImg_customOD/training'

## **11) Train the model**






Navigate to the ***object_detection*** folder in colab vm


In [None]:
%cd /content/models/research/object_detection

**11 (a) Training using model_main_tf2.py**

Here **{PIPELINE_CONFIG_PATH}** points to the pipeline config and **{MODEL_DIR}** points to the directory in which training checkpoints and events will be written.

For best results, you should stop the training when the loss is less than 0.1 if possible, else train the model until the loss does not show any significant change for a while. The ideal loss should be below 0.05 (Try to get the loss as low as possible without overfitting the model. Don’t go too high on training steps to try and lower the loss if the model has already converged viz. if it does not reduce loss significantly any further and takes a while to go down. )

In [None]:
# Needed to prevent an existing bug in TensorFlow. Downgrade version running in Colab
pip install tensorflow==2.13.0

In [None]:
# Run the command below from the content/models/research/object_detection directory
"""
PIPELINE_CONFIG_PATH=path/to/pipeline.config
MODEL_DIR=path to training checkpoints directory
NUM_TRAIN_STEPS=50000
SAMPLE_1_OF_N_EVAL_EXAMPLES=1

python model_main_tf2.py -- \
  --model_dir=$MODEL_DIR --num_train_steps=$NUM_TRAIN_STEPS \
  --sample_1_of_n_eval_examples=$SAMPLE_1_OF_N_EVAL_EXAMPLES \
  --pipeline_config_path=$PIPELINE_CONFIG_PATH \
  --alsologtostderr
"""
#TODO: CHANGE TO FASTER RCNN
!python model_main_tf2.py --pipeline_config_path=/mydrive/nuImg_customOD/data/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.config --model_dir=/mydrive/nuImg_customOD/training --alsologtostderr

## **12) Test your trained model**

Export inference graph

Current working directory is /content/models/research/object_detection

In [None]:
## Export inference gra