# Object Detection using TAO YOLOv4 with 16-bit imagery

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which you take a model trained on one task and retrain it for a different task.

Train Adapt Optimize (TAO) Toolkit is a simple, easy-to-use Python-based AI toolkit for taking purpose-built AI models and customizing them with users' own data.

<img align="center" src="https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png" width="1080">


## Sample prediction of YOLOv4
<img align="center" src="https://github.com/vpraveen-nv/model_card_images/blob/main/cv/notebook/common/sample.jpg?raw=true" width="960"> 

## Learning Objectives
In this notebook, you will learn how to leverage the simplicity and convenience of TAO to:

* Train a ResNet-18 YOLOv4 model on the 16-bit KITTI dataset
* Prune the trained YOLOv4 model
* Retrain the pruned model to recover lost accuracy
* Export the retrained model to an .onnx model for inference
* Convert the .onnx model to a TensorRT engine using tao deploy for inference

At the end of this notebook, you will have generated a trained and optimized `yolo_v4` model
trained on 16-bit input images, which you may deploy via [Triton](https://github.com/NVIDIA-AI-IOT/tao-toolkit-triton-apps)
or [DeepStream](https://developer.nvidia.com/deepstream-sdk).

## Table of Contents

This notebook shows an example use case of YOLO v4 object detection with 16-bit PNG images using Train Adapt Optimize (TAO) Toolkit.

0. [Set up env variables and map drives](#head-0)
1. [Install the TAO launcher](#head-1)
2. [Prepare dataset and pre-trained model](#head-2) <br>
     2.1 [Download the dataset](#head-2-1)<br>
     2.2 [Verify the downloaded dataset](#head-2-2)<br>
     2.3 [Generate tfrecords](#head-2-3)<br>
3. [Provide training specification](#head-3)
4. [Run TAO training](#head-4)
5. [Evaluate trained models](#head-5)
6. [Prune trained models](#head-6)
7. [Retrain pruned models](#head-7)
8. [Evaluate retrained model](#head-8)
9. [Model Export](#head-9)

## 0. Set up env variables and map drives <a class="anchor" id="head-0"></a>

This notebook requires you to set an env variable called `$LOCAL_PROJECT_DIR` to the path of your workspace. The dataset for this notebook is expected to reside in `$LOCAL_PROJECT_DIR/data`, while the collaterals generated by the TAO experiment are written to `$LOCAL_PROJECT_DIR/yolo_v4_16bit_grayscale`. More information on how to set up the dataset and the supported steps in the TAO workflow is provided in the subsequent cells.

*Note: Please make sure to remove any stray artifacts/files from the `$USER_EXPERIMENT_DIR` or `$DATA_DOWNLOAD_DIR` paths as mentioned below, that may have been generated from previous experiments. Having checkpoint files etc may interfere with creating a training graph for a new experiment.*

In [None]:
# Setting up env variables for cleaner command line commands.
import os

%env USER_EXPERIMENT_DIR=/workspace/tao-experiments/yolo_v4_16bit_grayscale
%env DATA_DOWNLOAD_DIR=/workspace/tao-experiments/data

# Set this path if you don't run the notebook from the samples directory.
# %env NOTEBOOK_ROOT=~/tao-samples/yolo_v4_16bit_grayscale

# Please define the local project directory that needs to be mapped to the TAO docker session.
# The dataset is expected to be present in $LOCAL_PROJECT_DIR/data, while the results for the steps
# in this notebook will be stored at $LOCAL_PROJECT_DIR/yolo_v4_16bit_grayscale
%env LOCAL_PROJECT_DIR=YOUR_LOCAL_PROJECT_DIR_PATH
os.environ["LOCAL_DATA_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data")
os.environ["LOCAL_EXPERIMENT_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "yolo_v4_16bit_grayscale")

# The sample spec files are present in the same path as the downloaded samples.
os.environ["LOCAL_SPECS_DIR"] = os.path.join(
    os.getenv("NOTEBOOK_ROOT", os.getcwd()),
    "specs"
)
%env SPECS_DIR=/workspace/tao-experiments/yolo_v4_16bit_grayscale/specs

# Showing list of specification files.
!ls -rlt $LOCAL_SPECS_DIR

In [None]:
# Create local dir
!mkdir -p $LOCAL_DATA_DIR
!mkdir -p $LOCAL_EXPERIMENT_DIR

The cell below maps the project directory on your local host to a workspace directory in the TAO docker instance, so that the data and the results are mapped from outside to inside of the docker instance.

In [None]:
# Mapping up the local directories to the TAO docker.
import json
mounts_file = os.path.expanduser("~/.tao_mounts.json")

# Define the dictionary with the mapped drives
drive_map = {
    "Mounts": [
        # Mapping the data directory
        {
            "source": os.environ["LOCAL_PROJECT_DIR"],
            "destination": "/workspace/tao-experiments"
        },
        # Mapping the specs directory.
        {
            "source": os.environ["LOCAL_SPECS_DIR"],
            "destination": os.environ["SPECS_DIR"]
        },
    ]
}

# Writing the mounts file.
with open(mounts_file, "w") as mfile:
    json.dump(drive_map, mfile, indent=4)

In [None]:
!cat ~/.tao_mounts.json
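
Because the commands in this notebook mix host paths (`$LOCAL_...`) with container paths (`/workspace/...`), it can help to see how one maps to the other. The following is a minimal sketch, not part of the TAO launcher: `to_container_path` is a hypothetical helper, the host path in the example is made up, and preferring the longest matching `source` is a choice made for this illustration.

```python
import os


def to_container_path(host_path, mounts):
    """Translate a host path to the matching path inside the container.

    `mounts` is a list of {"source": ..., "destination": ...} dicts, in the
    same shape as the "Mounts" entry written to ~/.tao_mounts.json.
    Returns None when no mount covers `host_path`.
    """
    host_path = os.path.normpath(host_path)
    best = None
    for mount in mounts:
        source = os.path.normpath(mount["source"])
        # A mount applies if the path is the source itself or lies below it.
        if host_path == source or host_path.startswith(source + os.sep):
            # Keep the longest (most specific) matching source.
            if best is None or len(source) > len(best[0]):
                best = (source, mount["destination"])
    if best is None:
        return None
    relative = os.path.relpath(host_path, best[0])
    if relative == ".":
        return best[1]
    return best[1].rstrip("/") + "/" + relative.replace(os.sep, "/")


# Hypothetical host directory, for illustration only.
example_mounts = [
    {"source": "/home/user/tao", "destination": "/workspace/tao-experiments"},
]
print(to_container_path("/home/user/tao/data/training", example_mounts))
```

Paths passed to `tao` commands must be container paths, while paths used by `!ls`, `!mkdir`, and plain Python in this notebook are host paths.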

## 1. Install the TAO launcher <a class="anchor" id="head-1"></a>
The TAO launcher is a Python package distributed as a wheel on PyPI. You may install the launcher by executing the following cell.

Please note that TAO Toolkit recommends running the TAO launcher in a virtual env with a Python version that matches the software requirements listed below (>=3.7, <=3.10.x). You may follow the instructions on this [page](https://virtualenvwrapper.readthedocs.io/en/latest/install.html) to set up a Python virtual env using the `virtualenv` and `virtualenvwrapper` packages. Once you have set up virtualenvwrapper, please set the version of Python to be used in the virtual env by using the `VIRTUALENVWRAPPER_PYTHON` variable. You may do so by running

```sh
export VIRTUALENVWRAPPER_PYTHON=/path/to/bin/python3.x
```
where x >= 7 and <= 10

We recommend performing this step first and then launching the notebook from the virtual environment. In addition to installing the TAO Python package, please make sure the following software requirements are met:
* python >=3.7, <=3.10.x
* docker-ce > 19.03.5
* docker-API 1.40
* nvidia-container-toolkit > 1.3.0-1
* nvidia-container-runtime > 3.4.0-1
* nvidia-docker2 > 2.5.0-1
* nvidia-driver > 455+
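
As a quick sanity check, the Python requirement from the list above can be tested from inside the notebook kernel. This is a minimal sketch; `python_version_supported` is a hypothetical helper, and it only checks the interpreter version, not the Docker or driver requirements.

```python
import sys

# Range taken from the requirement "python >=3.7, <=3.10.x" listed above.
MIN_VERSION = (3, 7)
MAX_VERSION = (3, 10)


def python_version_supported(version_info=sys.version_info):
    """Return True if the major.minor version lies within the listed range."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


print(sys.version.split()[0], "supported:", python_version_supported())
```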

Once you have installed the prerequisites, please log in to the docker registry nvcr.io using the command below

```sh
docker login nvcr.io
```

You will be prompted to enter a username and password. The username is `$oauthtoken` and the password is the API key generated from `ngc.nvidia.com`. Please follow the instructions in the [NGC setup guide](https://docs.nvidia.com/ngc/ngc-overview/index.html#generating-api-key) to generate your own API key.

After setting up your virtual environment with the above requirements, install the TAO pip package.

In [None]:
# SKIP this step IF you have already installed the TAO launcher.
!pip3 install --upgrade nvidia-tao

In [None]:
# View the versions of the TAO launcher
!tao info --verbose

## 2. Prepare dataset and pre-trained model <a class="anchor" id="head-2"></a>

We will be using the KITTI detection dataset for this tutorial. For more details, please visit
http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d. Please download the KITTI detection images (http://www.cvlibs.net/download.php?file=data_object_image_2.zip) and labels (http://www.cvlibs.net/download.php?file=data_object_label_2.zip) to `$LOCAL_DATA_DIR`.
 
 The data will then be extracted to have
 * training images in `$LOCAL_DATA_DIR/training/image_2`
 * training labels in `$LOCAL_DATA_DIR/training/label_2`
 * testing images in `$LOCAL_DATA_DIR/testing/image_2`
 
You may use this notebook with your own dataset as well. To use this example with your own dataset, please follow the same directory structure as mentioned above.

*Note: There are no labels for the testing images, therefore we use it just to visualize inferences for the trained model.*
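
The directory layout described above can be checked programmatically before moving on. This is a minimal sketch; `missing_dataset_dirs` is a hypothetical helper, and the fallback to `./data` when `LOCAL_DATA_DIR` is unset is an assumption made for illustration.

```python
import os

# Sub-directories expected under $LOCAL_DATA_DIR, as listed above.
EXPECTED_SUBDIRS = ("training/image_2", "training/label_2", "testing/image_2")


def missing_dataset_dirs(data_dir):
    """Return the expected sub-directories that do not exist under data_dir."""
    return [
        sub for sub in EXPECTED_SUBDIRS
        if not os.path.isdir(os.path.join(data_dir, sub))
    ]


data_dir = os.environ.get("LOCAL_DATA_DIR", os.path.join(os.getcwd(), "data"))
missing = missing_dataset_dirs(data_dir)
print("Missing directories:", missing if missing else "none")
```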

### 2.1 Download the dataset <a class="anchor" id="head-2-1"></a>

Once you have received the download links by email, please populate them in place of `KITTI_IMAGES_DOWNLOAD_URL` and `KITTI_LABELS_DOWNLOAD_URL`. The next cell will download the data and place it in `$LOCAL_DATA_DIR`.

In [None]:
import os
!mkdir -p $LOCAL_DATA_DIR
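# Replace KITTI_IMAGES_DOWNLOAD_URL and KITTI_LABELS_DOWNLOAD_URL below with your download links, written as quoted strings.
os.environ["URL_IMAGES"]=KITTI_IMAGES_DOWNLOAD_URL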
!if [ ! -f $LOCAL_DATA_DIR/data_object_image_2.zip ]; then wget $URL_IMAGES -O $LOCAL_DATA_DIR/data_object_image_2.zip; else echo "image archive already downloaded"; fi 
os.environ["URL_LABELS"]=KITTI_LABELS_DOWNLOAD_URL
!if [ ! -f $LOCAL_DATA_DIR/data_object_label_2.zip ]; then wget $URL_LABELS -O $LOCAL_DATA_DIR/data_object_label_2.zip; else echo "label archive already downloaded"; fi 

### 2.2 Verify the downloaded dataset <a class="anchor" id="head-2-2"></a>

In [None]:
# Check the dataset is present
!mkdir -p $LOCAL_DATA_DIR
!if [ ! -f $LOCAL_DATA_DIR/data_object_image_2.zip ]; then echo 'Image zip file not found, please download.'; else echo 'Found Image zip file.';fi
!if [ ! -f $LOCAL_DATA_DIR/data_object_label_2.zip ]; then echo 'Label zip file not found, please download.'; else echo 'Found Labels zip file.';fi

In [None]:
# This may take a while: verify integrity of zip files 
!sha256sum $LOCAL_DATA_DIR/data_object_image_2.zip | cut -d ' ' -f 1 | grep -xq '^351c5a2aa0cd9238b50174a3a62b846bc5855da256b82a196431d60ff8d43617$' ; \
if test $? -eq 0; then echo "images OK"; else echo "images corrupt, re-download!" && rm -f $LOCAL_DATA_DIR/data_object_image_2.zip; fi 
!sha256sum $LOCAL_DATA_DIR/data_object_label_2.zip | cut -d ' ' -f 1 | grep -xq '^4efc76220d867e1c31bb980bbf8cbc02599f02a9cb4350effa98dbb04aaed880$' ; \
if test $? -eq 0; then echo "labels OK"; else echo "labels corrupt, re-download!" && rm -f $LOCAL_DATA_DIR/data_object_label_2.zip; fi 
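
The same integrity check can be done in Python when `sha256sum` is not available on the host. This is a minimal sketch using only the standard library; `sha256_of_stream` and `sha256_of_file` are hypothetical helper names, and the in-memory stream at the end is only there to make the example runnable without the archives.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Hash a binary stream in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`."""
    with open(path, "rb") as handle:
        return sha256_of_stream(handle, chunk_size)


# Demonstration on an in-memory stream instead of a real archive.
print(sha256_of_stream(io.BytesIO(b"example bytes")))
```

To verify an archive, compare the result of `sha256_of_file(...)` on the zip file with the expected digest used in the shell command above.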

In [None]:
# unpack 
!unzip -u $LOCAL_DATA_DIR/data_object_image_2.zip -d $LOCAL_DATA_DIR
!unzip -u $LOCAL_DATA_DIR/data_object_label_2.zip -d $LOCAL_DATA_DIR

In [None]:
# verify
import os

DATA_DIR = os.environ.get('LOCAL_DATA_DIR')
num_training_images = len(os.listdir(os.path.join(DATA_DIR, "training/image_2")))
num_training_labels = len(os.listdir(os.path.join(DATA_DIR, "training/label_2")))
num_testing_images = len(os.listdir(os.path.join(DATA_DIR, "testing/image_2")))
print("Number of images in the train/val set. {}".format(num_training_images))
print("Number of labels in the train/val set. {}".format(num_training_labels))
print("Number of images in the test set. {}".format(num_testing_images))

In [None]:
# Directory where the split dataset will be located
!mkdir -p $LOCAL_DATA_DIR/kitti_split
# Generate val dataset out of training dataset
!python3 ../ssd/generate_split.py --input_image_dir=$LOCAL_DATA_DIR/training/image_2 \
                                  --input_label_dir=$LOCAL_DATA_DIR/training/label_2 \
                                  --output_dir=$LOCAL_DATA_DIR/kitti_split

In [None]:
# Convert RGB images to (fake) 16-bit grayscale
!pip3 install numpy==1.22.2 Pillow==9.0.1
import os
import numpy as np
from PIL import Image
def to16bit(img_file):
    image = Image.open(img_file).convert("L")
    # shifted to the higher byte to get a fake 16-bit image
    image_np = np.array(image) * 256
    image16 = Image.fromarray(image_np.astype(np.uint32))
    # overwrite the image file
    print(f"Converting {img_file} to 16-bit grayscale")
    image16.save(img_file)
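
The conversion above is called "fake" 16-bit because multiplying by 256 only moves each 8-bit value into the high byte. The sketch below illustrates that arithmetic in plain Python, without NumPy or Pillow; `shift_to_high_byte` is a hypothetical name used only here.

```python
def shift_to_high_byte(value_8bit):
    """Map an 8-bit pixel value to 16 bits by moving it into the high byte."""
    if not 0 <= value_8bit <= 255:
        raise ValueError("expected an 8-bit value in [0, 255]")
    return value_8bit << 8  # identical to value_8bit * 256


for value in (0, 1, 128, 255):
    shifted = shift_to_high_byte(value)
    # The low byte is always zero after the shift.
    print(value, "->", shifted, "low byte:", shifted & 0xFF)
```

The result still has only 256 distinct grey levels, and the brightest pixel becomes 255 * 256 = 65280 rather than 65535, so the images exercise the 16-bit input path without carrying extra information.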

In [None]:
# Generate 16-bit grayscale images for train/val splits
!mkdir -p $LOCAL_DATA_DIR/kitti_split/training/image_16bit_grayscale
!cp $LOCAL_DATA_DIR/kitti_split/training/image/* $LOCAL_DATA_DIR/kitti_split/training/image_16bit_grayscale
for img_file in os.listdir(os.path.join(os.environ["LOCAL_DATA_DIR"], "kitti_split/training/image_16bit_grayscale")):
    image_file = os.path.join(os.environ["LOCAL_DATA_DIR"], "kitti_split/training/image_16bit_grayscale", img_file)
    to16bit(image_file)

In [None]:
!mkdir -p $LOCAL_DATA_DIR/kitti_split/val/image_16bit_grayscale
!cp $LOCAL_DATA_DIR/kitti_split/val/image/* $LOCAL_DATA_DIR/kitti_split/val/image_16bit_grayscale
for img_file in os.listdir(os.path.join(os.environ["LOCAL_DATA_DIR"], "kitti_split/val/image_16bit_grayscale")):
    image_file = os.path.join(os.environ["LOCAL_DATA_DIR"], "kitti_split/val/image_16bit_grayscale", img_file)
    to16bit(image_file)

Additionally, if you already have your own dataset in a volume (or folder), you can mount the volume on `LOCAL_DATA_DIR` (or create a soft link). For example:
```bash
# if your dataset is in /dev/sdc1
mount /dev/sdc1 $LOCAL_DATA_DIR

# if your dataset is in folder /var/dataset
ln -sf /var/dataset $LOCAL_DATA_DIR
```

In [None]:
# If you use your own dataset, you will need to run the code below to generate the best anchor shape

# !tao model yolo_v4 kmeans -l $DATA_DOWNLOAD_DIR/kitti_split/training/label \
#                     -i $DATA_DOWNLOAD_DIR/kitti_split/training/image_16bit_grayscale \
#                     -n 9 \
#                     -x 1248 \
#                     -y 384

# The anchor shapes generated by this script are sorted. Write the first 3 into small_anchor_shape in the config
# file, the middle 3 into mid_anchor_shape, and the last 3 into big_anchor_shape.
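
The grouping described in the comment above can be sketched as follows. `split_anchor_shapes` is a hypothetical helper, and the (width, height) pairs are made-up values, not real output of the `kmeans` command.

```python
def split_anchor_shapes(sorted_anchors):
    """Split 9 already-sorted anchor shapes into the three spec-file groups."""
    if len(sorted_anchors) != 9:
        raise ValueError("expected exactly 9 anchor shapes")
    return {
        "small_anchor_shape": sorted_anchors[0:3],
        "mid_anchor_shape": sorted_anchors[3:6],
        "big_anchor_shape": sorted_anchors[6:9],
    }


# Made-up (width, height) pairs, listed in the order the script would print them.
example_anchors = [
    (10, 12), (18, 30), (32, 22),
    (40, 60), (62, 45), (70, 110),
    (120, 90), (150, 200), (370, 320),
]
for name, shapes in split_anchor_shapes(example_anchors).items():
    print(name, shapes)
```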

### 2.3 Generate tfrecords <a class="anchor" id="head-2-3"></a>

In [None]:
!tao model yolo_v4 dataset_convert -d $SPECS_DIR/yolo_v4_tfrecords_kitti_train_16bit_grayscale.txt \
                             -o $DATA_DOWNLOAD_DIR/yolo_v4/tfrecords/train_16bit_grayscale \
                             -r $USER_EXPERIMENT_DIR/

In [None]:
!tao model yolo_v4 dataset_convert -d $SPECS_DIR/yolo_v4_tfrecords_kitti_val_16bit_grayscale.txt \
                             -o $DATA_DOWNLOAD_DIR/yolo_v4/tfrecords/val_16bit_grayscale \
                             -r $USER_EXPERIMENT_DIR/

## 3. Provide training specification <a class="anchor" id="head-3"></a>
* Augmentation parameters for on-the-fly data augmentation
* Other training (hyper-)parameters such as batch size, number of epochs, learning rate etc.
* Whether to use quantization aware training (QAT)

In [None]:
!cat $LOCAL_SPECS_DIR/yolo_v4_train_resnet18_kitti_16bit_grayscale.txt

## 4. Run TAO training <a class="anchor" id="head-4"></a>
* Provide the sample spec file and the output directory location for models
* WARNING: training will take several hours or one day to complete

In [None]:
!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned

In [None]:
print("To run with multigpu, please change --gpus based on the number of available GPUs in your machine.")
!tao model yolo_v4 train -e $SPECS_DIR/yolo_v4_train_resnet18_kitti_16bit_grayscale.txt \
                   -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
                   --gpus 1

In [None]:
print('Model for each epoch:')
print('---------------------')
!ls -ltrh $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned/weights

In [None]:
# Now check the evaluation stats in the csv file and pick the model with highest eval accuracy.
!cat $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned/yolov4_training_log_resnet18.csv
%set_env EPOCH=080
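
Picking the best epoch by eye works, but it can also be scripted. The sketch below is an illustration under stated assumptions: the column names `epoch` and `validation_mAP` and the sample rows are hypothetical, so check the header of your own CSV log and pass the real column names. Only the zero-padded checkpoint name follows the `yolov4_resnet18_epoch_$EPOCH.hdf5` pattern used in this notebook.

```python
import csv
import io
import math


def best_epoch(csv_text, epoch_column, metric_column):
    """Return the epoch with the highest metric, skipping empty or NaN rows."""
    best = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            metric = float(row[metric_column])
            epoch = int(row[epoch_column])
        except (KeyError, TypeError, ValueError):
            continue  # row has no usable value for this metric
        if math.isnan(metric):
            continue
        if best is None or metric > best[1]:
            best = (epoch, metric)
    return None if best is None else best[0]


def checkpoint_name(epoch):
    """Build the checkpoint file name with a three-digit, zero-padded epoch."""
    return "yolov4_resnet18_epoch_{:03d}.hdf5".format(epoch)


# Hypothetical log contents; the column names are assumptions.
sample_log = "epoch,validation_mAP\n10,0.41\n20,nan\n30,0.57\n40,0.55\n"
chosen = best_epoch(sample_log, "epoch", "validation_mAP")
print(chosen, checkpoint_name(chosen))
```

The zero padding matters: `EPOCH` must be set to `080`, not `80`, for the checkpoint path in the following cells to resolve.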

## 5. Evaluate trained models <a class="anchor" id="head-5"></a>

In [None]:
!tao model yolo_v4 evaluate -e $SPECS_DIR/yolo_v4_train_resnet18_kitti_16bit_grayscale.txt \
                      -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/yolov4_resnet18_epoch_$EPOCH.hdf5

## 6. Prune trained models <a class="anchor" id="head-6"></a>
* Specify pre-trained model
* Equalization criterion (only for ResNets and MobileNets, as they have element-wise operations)
* Threshold for pruning.
* A key to save and load the model
* Output directory to store the model

Usually, you just need to adjust `-pth` (threshold) to trade off accuracy against model size. A higher `pth` gives you a smaller model (and thus higher inference speed) but worse accuracy. The threshold value depends on the dataset and the model. `0.1` in the block below is just a starting point. If the retrain accuracy is good, you can increase this value to get smaller models. Otherwise, lower this value to get better accuracy.
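
To build intuition for why a higher threshold yields a smaller model, here is a toy illustration of threshold-based filter pruning in general. It is not TAO's pruning implementation: the L2 norm, the normalization by the largest norm, and the filter weights are all assumptions chosen for the example.

```python
import math


def filters_kept(filters, threshold):
    """Return indices of filters whose normalized L2 norm is >= threshold."""
    norms = [math.sqrt(sum(w * w for w in weights)) for weights in filters]
    largest = max(norms)
    if largest == 0:
        return []  # every filter is all zeros, nothing worth keeping
    return [i for i, norm in enumerate(norms) if norm / largest >= threshold]


# Four made-up filters with L2 norms of roughly 50, 5, 25 and 0.5.
toy_filters = [[30, 40], [3, 4], [15, 20], [0.3, 0.4]]
for threshold in (0.05, 0.3, 0.8):
    print(threshold, filters_kept(toy_filters, threshold))
```

Raising the threshold removes more filters, which shrinks the model but leaves less capacity for retraining to recover accuracy.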

In [None]:
!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_pruned

In [None]:
!tao model yolo_v4 prune -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/yolov4_resnet18_epoch_$EPOCH.hdf5 \
                   -e $SPECS_DIR/yolo_v4_train_resnet18_kitti_16bit_grayscale.txt \
                   -o $USER_EXPERIMENT_DIR/experiment_dir_pruned/yolov4_resnet18_pruned.hdf5 \
                   -eq intersection \
                   -pth 0.1

In [None]:
!ls -rlt $LOCAL_EXPERIMENT_DIR/experiment_dir_pruned/

## 7. Retrain pruned models <a class="anchor" id="head-7"></a>
* Model needs to be re-trained to bring back accuracy after pruning
* Specify re-training specification
* WARNING: training will take several hours or one day to complete

In [None]:
# Printing the retrain spec file. 
# Here we have updated the spec file to use the newly pruned model as the pretrained weights.
!sed -i 's,EXPERIMENT_DIR,'"$USER_EXPERIMENT_DIR"',' $LOCAL_SPECS_DIR/yolo_v4_retrain_resnet18_kitti_16bit_grayscale.txt
!cat $LOCAL_SPECS_DIR/yolo_v4_retrain_resnet18_kitti_16bit_grayscale.txt

In [None]:
!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain

In [None]:
# Retraining using the pruned model as pretrained weights
!tao model yolo_v4 train --gpus 1 \
                   -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti_16bit_grayscale.txt \
                   -r $USER_EXPERIMENT_DIR/experiment_dir_retrain

In [None]:
# Listing the newly retrained model.
!ls -rlt $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain/weights

In [None]:
# Now check the evaluation stats in the csv file and pick the model with highest eval accuracy.
!cat $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain/yolov4_training_log_resnet18.csv
%set_env EPOCH=080

## 8. Evaluate retrained model <a class="anchor" id="head-8"></a>

In [None]:
!tao model yolo_v4 evaluate -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti_16bit_grayscale.txt \
                      -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_resnet18_epoch_$EPOCH.hdf5

## 9. Model Export <a class="anchor" id="head-9"></a>

If you trained a non-QAT model, you may export in FP32, FP16 or INT8 mode using the code block below. For INT8, you need to provide a calibration image directory.

In [None]:
# tao <task> export will fail if .onnx already exists. So we clear the export folder before tao <task> export
!rm -rf $LOCAL_EXPERIMENT_DIR/export
!mkdir -p $LOCAL_EXPERIMENT_DIR/export
# Generate .onnx file using tao container
!tao model yolo_v4 export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_resnet18_epoch_$EPOCH.hdf5 \
                    -o $USER_EXPERIMENT_DIR/export/yolov4_resnet18_epoch_$EPOCH.onnx \
                    -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti_16bit_grayscale.txt \
                    --target_opset 12 \
                    --gen_ds_config

Using the `tao deploy` container, you can generate a TensorRT engine and verify the correctness of the generated engine through evaluation and inference.

`tao deploy` produces TensorRT engines optimized for the platform it runs on. Therefore, to get maximum performance, please run the `tao deploy` command, which instantiates a deploy container, with the exported `.onnx` file on your target device. The `tao deploy` container only works on x86 with discrete NVIDIA GPUs.

For Jetson devices, please download the tao-converter for Jetson and refer to [here](https://docs.nvidia.com/tao/tao-toolkit-archive/tao-30-2108/text/tensorrt.html#installing-the-tao-converter) for more details.

If you choose to integrate your model into DeepStream directly, you may do so by simply copying the exported `.onnx` file along with the calibration cache to the target device and updating the spec file that configures the `gst-nvinfer` element to point to this newly exported model. Usually this file is called `config_infer_primary.txt` for detection models and `config_infer_secondary_*.txt` for classification models.

In [None]:
# Convert to TensorRT engine (FP32). Change --data_type to fp16 for FP16 mode
!tao deploy yolo_v4 gen_trt_engine -m $USER_EXPERIMENT_DIR/export/yolov4_resnet18_epoch_$EPOCH.onnx \
                                   -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti_16bit_grayscale.txt \
                                   --batch_size 16 \
                                   --min_batch_size 1 \
                                   --opt_batch_size 8 \
                                   --max_batch_size 16 \
                                   --data_type fp32 \
                                   --engine_file $USER_EXPERIMENT_DIR/export/trt.engine \
                                   --results_dir $USER_EXPERIMENT_DIR/export

In [None]:
# Convert to TensorRT engine (INT8).
!tao deploy yolo_v4 gen_trt_engine -m $USER_EXPERIMENT_DIR/export/yolov4_resnet18_epoch_$EPOCH.onnx \
                                   -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti_16bit_grayscale.txt \
                                   --cal_image_dir $DATA_DOWNLOAD_DIR/kitti_split/training/image_16bit_grayscale \
                                   --data_type int8 \
                                   --batch_size 16 \
                                   --min_batch_size 1 \
                                   --opt_batch_size 8 \
                                   --max_batch_size 16 \
                                   --batches 10 \
                                   --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin  \
                                   --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile \
                                   --engine_file $USER_EXPERIMENT_DIR/export/trt.engine \
                                   --results_dir $USER_EXPERIMENT_DIR/export
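
Before running the engine generation cells, the batch-size arguments can be sanity-checked. This is a minimal sketch with hypothetical helper names. It assumes that the min, opt, and max batch sizes must be ordered, as in a TensorRT optimization profile, and that INT8 calibration reads `--batches` batches of `--batch_size` images each; confirm both against the `tao deploy` documentation for your version.

```python
def profile_is_valid(min_batch, opt_batch, max_batch):
    """Check the ordering 1 <= min <= opt <= max for a dynamic batch profile."""
    return 1 <= min_batch <= opt_batch <= max_batch


def calibration_images_needed(batch_size, batches):
    """Number of images read if calibration uses `batches` full batches."""
    return batch_size * batches


# Values taken from the gen_trt_engine cells above.
print("profile valid:", profile_is_valid(1, 8, 16))
print("calibration images:", calibration_images_needed(16, 10))
```

Under these assumptions, the calibration image directory should contain at least as many images as the second value printed.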

In [None]:
print('Exported model:')
print('------------')
!ls -lh $LOCAL_EXPERIMENT_DIR/export