<p> <center> <a href="../Start_here.ipynb">Home Page</a> </center> </p>

<div>
    <span style="float: left; width: 33%; text-align: left;"><a href="1.Data_labeling_and_preprocessing.ipynb">Previous Notebook</a></span>
    <span style="float: left; width: 34%; text-align: center;">
        <a href="1.Data_labeling_and_preprocessing.ipynb">1</a>
        <a >2</a>
        <a href="3.Model_deployment_with_Triton_Inference_Server.ipynb">3</a>
        <a href="4.Model_deployment_with_DeepStream.ipynb">4</a>
        <a href="5.Measure_object_size_using_OpenCV.ipynb">5</a>
    </span>
    <span style="float: left; width: 33%; text-align: right;"><a href="3.Model_deployment_with_Triton_Inference_Server.ipynb">Next Notebook</a></span>
</div>

# Object detection using TAO YOLOv4

***

**The goal of this notebook is to help you understand how to:**

- Perform offline data augmentation to increase the dataset size
- Take a pretrained resnet18 model and train a ResNet-18 Yolo_v4 model on a dataset in KITTI format
- Prune the trained Yolo_v4 model
- Retrain the pruned model to recover lost accuracy
- Quantize the pruned model using QAT
- Export the pruned model
- Run inference on the trained model
- Export the pruned, quantized and retrained model to a .etlt file for deployment to DeepStream
- Run inference on the exported .etlt model to verify deployment using TensorRT

**Contents of this notebook:**

- [Transfer learning with TAO](#Transfer-learning-with-TAO)
    - [API key](#API-key)
- [Set up env variables and map drives](#Set-up-env-variables-and-map-drives)
- [Offline data augmentation](#Offline-data-augmentation)
    - [Configuring the augmentor](#Configuring-the-augmentor)
    - [Generate the augmentation spec file](#Generate-the-augmentation-spec-file)
    - [Augment the dataset](#Augment-the-dataset)
    - [Visualize augmented results](#Visualize-augmented-results)
- [Prepare dataset and pre-trained model](#Prepare-dataset-and-pre-trained-model)
    - [Generate anchor shape](#Generate-anchor-shape)
    - [Generate TFRecords](#Generate-TFRecords)
    - [Download pre-trained model](#Download-pre-trained-model)
- [Provide training specification](#Provide-training-specification)
- [Run TAO training](#Run-TAO-training)
- [Evaluate trained model](#Evaluate-trained-model)
- [Prune trained model](#Prune-trained-model)
- [Retrain pruned model](#Retrain-pruned-model)
- [Evaluate retrained model](#Evaluate-retrained-model)
- [Visualize inferences](#Visualize-inferences)
- [Model export](#Model-export)
- [Generate TensorRT engine](#Generate-TensorRT-engine)
- [Verify deployed model](#Verify-deployed-model)

## Transfer learning with TAO

In our previous notebook, we saw how to annotate a new dataset for object detection and how to convert it to KITTI format in order to perform transfer learning within the TAO Toolkit. Transfer learning is the commonly used process of transferring learned features from one domain to another with minimal effort by taking a model trained on one task and re-training it to perform a different task. 

Train Adapt Optimize (TAO) Toolkit by NVIDIA is a simple and easy-to-use Python-based AI toolkit for taking purpose-built AI models and customizing them with users' own data to create custom Computer Vision (CV) and Conversational AI models. This notebook shows an example use case of YOLO v4 object detection using TAO.

<img src="images/tao_toolkit.jpeg" width="720">
<div style="font-size:11px">Source: https://developer.nvidia.com</div><br>

In [None]:
# View the versions of the TAO launcher
!tao info

### API key

Before TAO can be used, you need to register at [ngc.nvidia.com](https://catalog.ngc.nvidia.com/) and proceed to generate an API key. Below is a step-by-step process to achieve this:
- From your browser visit `ngc.nvidia.com`
- Click on `Register for NGC`
- Click on the `Continue` button where `NVIDIA Account (Use existing or create a new NVIDIA account)` is written
- Fill in the required information and register. Thereafter you may proceed to log in with your new account credentials
- In the top right corner, click on your username and select `Setup` in the dropdown menu
- Proceed and click on the `Get API Key` button
- Next, you will find a `Generate API Key` button in the upper right corner. After clicking on this button, a dialog box should appear and you have to click on the `Confirm` button
- Finally, copy the generated API key and username and save them somewhere on your local system

<img src="images/ngc_setup_key.png" width="720">
<img src="images/ngc_key.png" width="720">

Your API key represents your credentials:
- Used for programmatic interaction (e.g. NGC docker registry `nvcr.io`)
- Uniquely identifies you (think of it as "username & password")
- There can only be one (regenerating your API key invalidates the old one)

## Set up env variables and map drives

When installed, the TAO launcher CLI abstracts the user from having to instantiate and run several docker containers and maps the commands accordingly. However, since the launcher uses docker containers under the hood, drives need to be mapped to the docker. The launcher instance can be configured in the `~/.tao_mounts.json` file.

<img src="images/tao_tf_user_interaction.png" width="720">
<div style="font-size:11px">Source: https://docs.nvidia.com/tao/tao-toolkit</div><br>

When using the purpose-built pretrained models from [NGC](https://catalog.ngc.nvidia.com/), please make sure to set the `$KEY` environment variable to the key mentioned in the model overview. Failing to do so can lead to errors when trying to load them as pretrained models.

The following notebook requires the user to set an env variable called the `$LOCAL_PROJECT_DIR` as the path to the user's workspace. Please note that the dataset to run this notebook is expected to reside in the `$LOCAL_PROJECT_DIR/data`, the sample spec files are expected to be present in `$LOCAL_PROJECT_DIR/specs`, while the TAO experiment generated collaterals will be output to `$LOCAL_PROJECT_DIR/yolo_v4`. More information on how to set up the dataset and the supported steps in the TAO workflow are provided in the subsequent cells.

Please make sure to remove any stray artifacts or files from the `$USER_EXPERIMENT_DIR` and `$DATA_DOWNLOAD_DIR` paths mentioned below that may have been generated by previous experiments. Leftover files such as checkpoints may interfere with creating a training graph for a new experiment.

In [None]:
# Setting up env variables for cleaner command line commands
import os

print("Please replace the variable with your key.")
%env KEY=nvidia_tlt

# If using a virtual environment and Docker, please define the local project directory that needs to be mapped to the TAO docker session.
# The dataset is expected to be present in $LOCAL_PROJECT_DIR/data, while the results from the steps
# in this notebook will be stored at $LOCAL_PROJECT_DIR/yolo_v4
# The sample spec files are expected to be present in $LOCAL_PROJECT_DIR/specs

# Singularity, please do not modify
%env LOCAL_PROJECT_DIR=/workspace/tao-experiments
# Virtual environment + Docker, set full path to the local workspace
#%env LOCAL_PROJECT_DIR=~/end_to_end_CV/workspace

# Paths inside the container, please do not modify
%env DATA_DOWNLOAD_DIR=/workspace/tao-experiments/data
%env USER_EXPERIMENT_DIR=/workspace/tao-experiments/yolo_v4
%env SPECS_DIR=/workspace/tao-experiments/specs

# Local paths, if using Docker please set the LOCAL_PROJECT_DIR variable above
os.environ["LOCAL_DATA_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data")
os.environ["LOCAL_EXPERIMENT_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "yolo_v4")
os.environ["LOCAL_SPECS_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "specs")

# Showing list of specification files.
!ls -rlt $LOCAL_SPECS_DIR

In [None]:
# Create local dir
!mkdir -p $LOCAL_EXPERIMENT_DIR
!mkdir -p $LOCAL_PROJECT_DIR/models

The cell below maps the project directory on your local host to a workspace directory in the TAO docker instance, so that the data and the results are mapped from outside to inside of the docker instance.

In [None]:
# Mapping up the local directories to the TAO docker
import json
mounts_file = os.path.expanduser("~/.tao_mounts.json")

# Define the dictionary with the mapped drives
drive_map = {
    "Mounts": [
        # Mapping the project directory (contains the data and the experiment results)
        {
            "source": os.environ["LOCAL_PROJECT_DIR"],
            "destination": "/workspace/tao-experiments"
        },
        # Mapping the specs directory
        {
            "source": os.environ["LOCAL_SPECS_DIR"],
            "destination": os.environ["SPECS_DIR"]
        },
    ]
}

# Writing the mounts file
with open(mounts_file, "w") as mfile:
    json.dump(drive_map, mfile, indent=4)

In [None]:
# Show the mounts file
!cat ~/.tao_mounts.json

## Offline data augmentation

The success of an object detection application is highly dependent on the quality of the data. Acquiring curated and annotated datasets can be very expensive, tiring, and time-consuming. Furthermore, it is very difficult to estimate all the corner cases that a network may go through. 

Online augmentation in the training data loader is a good way to increase the variation in the dataset and the overall performance. However, the augmented data is generated randomly and in order to achieve good accuracy, the model may need to be trained for a long time. To get around this and generate a dataset with the required augmentations and give full control to the user, TAO Toolkit provides an offline augmentation tool called `augment`. Offline augmentation can dramatically increase the size of the dataset when collecting and labeling data is expensive or not possible. The `augment` tool provides several custom GPU accelerated augmentation routines categorized into:
- **Spatial augmentation**
- **Color space augmentation**
- **Image blur**

Spatial augmentation comprises routines like `Rotate`, `Resize`, `Translate`, `Shear`, and `Flip` whereas color space augmentation supports `Hue rotation`, `Brightness offset`, and `Contrast shift`. Along with these augmentation operations, `augment` also enables users to blur images, using a Gaussian blur operator.

All augmentation routines currently provided with `augment` are supported only for an object detection dataset. The spatial augmentation routines are applied to the images as well as the labeled data coordinates, while the color augmentation routines and channel-wise blur operator are applied only to images as the object labels are not affected. The sample workflow of using `augment` fits into the general TAO pipeline diagram as follows:

<img src="images/augmenting.png" width="720">
<div style="font-size:11px">Source: https://docs.nvidia.com/tao/tao-toolkit</div><br>

The data is expected to be in KITTI format. The following sections detail how to configure and use the augmentation tool.
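
To make concrete why spatial augmentation has to touch the labels as well as the pixels, here is a minimal sketch of how a bounding box moves under a left-to-right flip and a vertical translation. The helper names and the sample box are hypothetical and only illustrate the geometry; they are not how `augment` is implemented.

```python
def flip_box_left_right(box, image_width):
    """Mirror an (xmin, ymin, xmax, ymax) box around the vertical centre line."""
    xmin, ymin, xmax, ymax = box
    # The old right edge becomes the new left edge and vice versa.
    return (image_width - xmax, ymin, image_width - xmin, ymax)


def translate_box_y(box, shift, image_height):
    """Shift a box vertically by `shift` pixels, clamping to the image."""
    xmin, ymin, xmax, ymax = box
    new_ymin = min(max(ymin + shift, 0), image_height)
    new_ymax = min(max(ymax + shift, 0), image_height)
    return (xmin, new_ymin, xmax, new_ymax)


# Hypothetical box in a 640x384 image.
box = (100, 50, 200, 150)
flipped = flip_box_left_right(box, image_width=640)
shifted = translate_box_y(flipped, shift=20, image_height=384)
print(flipped)  # (440, 50, 540, 150)
print(shifted)  # (440, 70, 540, 170)
```

Color and blur operations need no such step because they leave object positions unchanged.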

### Configuring the augmentor

The augmentor has several components which the user can configure by using a simple protobuf-based configuration file. The configuration file is made up of four main blocks:
- **Spatial augmentation config**
- **Color augmentation config**
- **Blur config**
- **Data dimensions**

which have to be specified with nested protobuf elements and global parameters. Let's take a quick overview of each component to understand how to create a configuration file from scratch. Configuration files are required for most TAO operations and therefore will be a recurring pattern throughout the notebook.

#### Spatial augmentation config

[Spatial augmentation config](https://docs.nvidia.com/tao/tao-toolkit/text/offline_data_augmentation.html#spatial-augmentation-config) contains parameters to configure the spatial augmentation routines. `spatial_config` is a nested protobuf element containing protobuf elements for all the supported spatial augmentation operations, namely `rotation_config`, `flip_config`, `translation_config`, and `shear_config`. How to configure each of them is explained in the dedicated part of the documentation linked above. If you don't wish to apply one of the supported augmentation operations, just omit the corresponding field. When multiple proto elements are defined, all the corresponding augmentation operations are cascaded.

Here is an example configuration file for `spatial_config` that augments the image by:

1. Flipping along the horizontal axis
2. Rotating an image by 10 degrees
3. Translating along y-axis by 20 pixels

```python
# Spatial augmentation config
spatial_config{
  flip_config{
    flip_horizontal: true
  }
  rotation_config{
    angle: 10.0
    units: "degrees"
  }
  translation_config{
    translate_y: 20
  }
}
```

#### Color augmentation config

[Color augmentation config](https://docs.nvidia.com/tao/tao-toolkit/text/offline_data_augmentation.html#color-augmentation-config) contains parameters to configure the color space augmentation routines. This is a nested protobuf element called `color_config` containing protobuf elements for all the color augmentation operations, namely `hue_saturation_config`, `contrast_config`, `brightness_config`. For more information on how to configure the parameters, please refer to the dedicated part in the documentation linked just above.

Here is an example configuration file for `color_config` that augments the image by:

1. Applying hue rotation and color saturation augmentation
2. Applying a channel-wise brightness shift

```python
# Color augmentation config
color_config{
  hue_saturation_config{
    hue_rotation_angle: 10.0
    saturation_shift: 1.0
  }
  brightness_config{
    offset: 10
  }
}
```

#### Blur config

The `blur_config` protobuf element configures the channel-wise [Gaussian blur](https://docs.nvidia.com/tao/tao-toolkit/text/offline_data_augmentation.html#blur-config) operator to an image. A Gaussian kernel is formulated based on the parameters `size` and `std` and then a 2D convolution is performed between the image and kernel per channel.

Here is an example configuration file for `blur_config` that augments the image by applying a Gaussian blur over a 5x5 square:

```python
# Blur config
blur_config{
  size: 5
  std: 1.0
}
```
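
As a rough illustration of how `size` and `std` determine the kernel, the sketch below builds a normalized 2D Gaussian kernel in plain Python. It assumes a kernel centred on the middle pixel; the exact formulation used by `augment` may differ.

```python
import math


def gaussian_kernel(size, std):
    """Return a size x size Gaussian kernel whose weights sum to 1."""
    center = (size - 1) / 2.0
    # 1D Gaussian weights; the 2D kernel is their outer product.
    weights = [math.exp(-((i - center) ** 2) / (2.0 * std ** 2))
               for i in range(size)]
    kernel = [[wy * wx for wx in weights] for wy in weights]
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]


kernel = gaussian_kernel(size=5, std=1.0)
for row in kernel:
    print(" ".join("{:.4f}".format(value) for value in row))
```

Convolving each image channel with this kernel gives the blurred image. A larger `std` spreads the weights out and blurs more strongly.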

#### Data dimensions

The last component the user can configure for the augmentor is the output data size. An example configuration file is shown below:

```python
# Data dimensions
output_image_width: 640
output_image_height: 384
output_image_channel: 3
image_extension: ".png"
```

### Generate the augmentation spec file

By chaining all the pieces together, we get a complete configuration file for augmenting images with `augment`. Run the cell below to view one that summarizes everything we've seen so far. If you want to modify the configuration hyperparameters or just experiment with different augmentations, you can access the file in the `specs` folder by searching for it in the upper left part of JupyterLab or by clicking [here](../specs/default_spec.txt). Please remember to save the file with `Ctrl+S` after each modification and then rerun the cell below to check that the changes have been applied.

In [None]:
!cat $LOCAL_SPECS_DIR/default_spec.txt

### Augment the dataset

Once the configuration file has been generated, the TAO `augment` tool is invoked with a simple command line interface. You will see that this is also a recurring pattern for TAO throughout the notebook. The command to launch the `augment` tool expects the following parameters:
- `-d`: the path to the detection dataset
- `-a`: the path to augmentation spec file
- `-o`: the path to the augmented output dataset
- `-v`: optional flag to get detailed logs during the augmentation process

Please pay attention to the type of paths required by the TAO command below - these are paths accessible inside of the TAO docker instance that were previously mapped. In this notebook, local paths are recognizable by the prefix `LOCAL` instead.

In [None]:
!tao augment -d $DATA_DOWNLOAD_DIR/training \
             -a $SPECS_DIR/default_spec.txt \
             -o $DATA_DOWNLOAD_DIR/augmented_dataset \
#             -v

### Visualize augmented results

Now that the dataset has been augmented, it is worthwhile to render the augmented images and labels. The outputs of `augment` are generated in the following paths:
- images: `$LOCAL_DATA_DIR/augmented_dataset/image_2`
- labels: `$LOCAL_DATA_DIR/augmented_dataset/label_2`

If you would like to visualize images with overlain bounding boxes, then please run the cell above with the optional `-v` flag enabled. This generates annotated outputs at:
- annotated images: `$LOCAL_DATA_DIR/augmented_dataset/images_annotated`

In [None]:
# Simple grid visualizer
import matplotlib.pyplot as plt
import os
from math import ceil
valid_image_ext = ['.jpg', '.png', '.jpeg', '.ppm']

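def visualize_images(image_dir, num_cols=4, num_images=10):
    """Show up to num_images images from $LOCAL_DATA_DIR/<image_dir> in a grid."""
    output_path = os.path.join(os.environ['LOCAL_DATA_DIR'], image_dir)
    num_rows = int(ceil(float(num_images) / float(num_cols)))
    # squeeze=False keeps axarr two-dimensional even for a single row or column,
    # so the axarr[row_id, col_id] indexing below always works
    f, axarr = plt.subplots(num_rows, num_cols, figsize=[80,30], squeeze=False)
    f.tight_layout()
    # Collect the paths of all files with a supported image extension
    a = [os.path.join(output_path, image) for image in os.listdir(output_path) 
         if os.path.splitext(image)[1].lower() in valid_image_ext]
    for idx, img_path in enumerate(a[:num_images]):
        col_id = idx % num_cols
        row_id = idx // num_cols
        img = plt.imread(img_path)
        axarr[row_id, col_id].imshow(img)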

In [None]:
# Visualizing the first 12 images
# If you would like to view sample annotated images, then please re-run the augment command with the -v flag
# and update the output path below to augmented_dataset/images_annotated
OUTPUT_PATH = 'augmented_dataset/image_2' # relative path from $LOCAL_DATA_DIR
COLS = 4 # number of columns in the visualizer grid
IMAGES = 12 # number of images to visualize

visualize_images(OUTPUT_PATH, num_cols=COLS, num_images=IMAGES)

## Prepare dataset and pre-trained model

For this tutorial, we will be using the fruit dataset in KITTI format from the previous notebook. The dataset is already augmented and structured to have:
- training images in `$LOCAL_DATA_DIR/training/image_2`
- training labels in `$LOCAL_DATA_DIR/training/label_2`
- testing images in `$LOCAL_DATA_DIR/testing/image_2`
 
You may use this notebook with your own dataset as well. To do so, please follow the same directory structure as described above. Note that there are no labels for the testing images, so we use them only to visualize inferences from the trained model.

Let's check that the number of images in the directories is what we expect.

In [None]:
# verify data
DATA_DIR = os.environ.get('LOCAL_DATA_DIR')
num_training_images = len(os.listdir(os.path.join(DATA_DIR, "training/image_2")))
num_training_labels = len(os.listdir(os.path.join(DATA_DIR, "training/label_2")))
num_testing_images = len(os.listdir(os.path.join(DATA_DIR, "testing/image_2")))
print("Number of images in the train/val set: {}".format(num_training_images))
print("Number of labels in the train/val set: {}".format(num_training_labels))
print("Number of images in the test set: {}".format(num_testing_images))

Let's look at an example of a KITTI label in our dataset.

In [None]:
!cat $LOCAL_DATA_DIR/training/label_2/0001.txt
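
For readers unfamiliar with the format, a KITTI label file holds one object per line with 15 space-separated fields: class name, truncation, occlusion, observation angle, the four 2D bounding box coordinates, and seven 3D fields that are typically set to zero for 2D detection. The sketch below parses the fields relevant here; the sample line and its class name are made up for illustration and are not taken from the dataset.

```python
from typing import NamedTuple, Tuple


class KittiObject(NamedTuple):
    label: str
    truncated: float
    occluded: int
    alpha: float
    bbox: Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in pixels


def parse_kitti_line(line):
    """Parse the 2D-detection fields of one KITTI label line."""
    fields = line.split()
    if len(fields) < 15:
        raise ValueError("expected at least 15 fields, got {}".format(len(fields)))
    return KittiObject(
        label=fields[0],
        truncated=float(fields[1]),
        occluded=int(float(fields[2])),
        alpha=float(fields[3]),
        bbox=tuple(float(v) for v in fields[4:8]),
    )


# Hypothetical label line, for illustration only.
sample = "apple 0.00 0 0.00 120.00 85.50 310.25 290.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00"
obj = parse_kitti_line(sample)
print(obj.label, obj.bbox)
```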

Now we generate the validation dataset out of the training dataset by sampling 10% of the images.

In [None]:
!python3 ../source_code/N2/generate_val_dataset.py --input_image_dir=$LOCAL_DATA_DIR/training/image_2 \
                                                   --input_label_dir=$LOCAL_DATA_DIR/training/label_2 \
                                                   --output_dir=$LOCAL_DATA_DIR/val
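
The idea behind such a split can be sketched as follows. This is not the content of `generate_val_dataset.py`; the function name, the fixed seed, and the file stems are assumptions chosen for illustration. It only shows how roughly 10% of the samples might be selected so that each image and its label file end up in the same subset.

```python
import random


def pick_validation_stems(stems, fraction=0.1, seed=0):
    """Select a reproducible random subset of file stems for validation."""
    ordered = sorted(stems)
    if not ordered:
        return []
    # Always hold out at least one sample.
    count = max(1, round(len(ordered) * fraction))
    return sorted(random.Random(seed).sample(ordered, count))


# Hypothetical stems such as "0001", shared by 0001.png and 0001.txt.
stems = ["{:04d}".format(i) for i in range(1, 21)]
val_stems = pick_validation_stems(stems)
print(val_stems)
```

Selecting by stem rather than by full filename keeps the image and label of a sample together.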

Additionally, if you have your own dataset already in a volume (or folder), you can mount the volume on `LOCAL_DATA_DIR` (or create a soft link). An example is shown below:
```bash
# if your dataset is in /dev/sdc1
mount /dev/sdc1 $LOCAL_DATA_DIR

# if your dataset is in folder /var/dataset
ln -sf /var/dataset $LOCAL_DATA_DIR
```

### Generate anchor shape

If you use your own dataset, you will need to run the code below to generate the best anchor shape. The anchor shape should match most ground truth boxes in the dataset to help the network learn more precise bounding boxes. YOLOv4 uses this information to capture and incorporate in advance the scale and aspect ratio of specific object classes we want to detect.

The anchor shapes generated by this script are sorted. Later on, write the first 3 into `small_anchor_shape` in the config files, the middle 3 into `mid_anchor_shape`, and the last 3 into `big_anchor_shape`.

In [None]:
# !tao yolo_v4 kmeans -l $DATA_DOWNLOAD_DIR/training/label_2 \
#                     -i $DATA_DOWNLOAD_DIR/training/image_2 \
#                     -n 9 \
#                     -x 640 \
#                     -y 384
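
The grouping of the nine sorted anchors described above can be sketched in a few lines. The anchor values are invented for illustration; use the ones printed by the `kmeans` command for your dataset.

```python
def group_anchors(sorted_anchors):
    """Split nine sorted (width, height) anchors into the three spec fields."""
    if len(sorted_anchors) != 9:
        raise ValueError("expected 9 anchors, got {}".format(len(sorted_anchors)))
    return {
        "small_anchor_shape": sorted_anchors[0:3],
        "mid_anchor_shape": sorted_anchors[3:6],
        "big_anchor_shape": sorted_anchors[6:9],
    }


# Hypothetical kmeans output, already sorted.
anchors = [(12, 16), (19, 36), (40, 28),
           (36, 75), (76, 55), (72, 146),
           (142, 110), (192, 243), (459, 401)]
for name, shapes in group_anchors(anchors).items():
    print(name, shapes)
```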

### Generate TFRecords

YOLOv4 supports two data formats: the sequence format (images folder and raw labels folder with KITTI format) and the tfrecords format (images folder and TFRecords). If you prefer to use the sequence data format during training, you can skip this section. To use sequence data format, please use spec file `yolo_v4_train_resnet18_kitti_seq.txt` and `yolo_v4_retrain_resnet18_kitti_seq.txt` later on. You can check the [documentation](https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/yolo_v4.html#dataset-config) for more details about tfrecords generation and sequence data format usage.

To generate TFRecords for YOLOv4 training, use the `dataset_convert` command with arguments:
- `-d`: path to the dataset spec file
- `-o`: path to the output TFRecords file

In [None]:
!tao yolo_v4 dataset_convert -d $SPECS_DIR/yolo_v4_tfrecords_kitti_train.txt \
                             -o $DATA_DOWNLOAD_DIR/training/tfrecords/train

In [None]:
!tao yolo_v4 dataset_convert -d $SPECS_DIR/yolo_v4_tfrecords_kitti_val.txt \
                             -o $DATA_DOWNLOAD_DIR/val/tfrecords/val

### Download pre-trained model

We will use NGC CLI to get the pre-trained models. For more details, go to [ngc.nvidia.com](https://catalog.ngc.nvidia.com/) and click on `Setup` in the navigation bar. To view all the backbones that are supported by object detection architecture in TAO, run the cell below. We will use a pretrained resnet18 model for this tutorial.

In [None]:
!ngc registry model list nvidia/tao/pretrained_object_detection:*

In [None]:
!mkdir -p $LOCAL_EXPERIMENT_DIR/pretrained_resnet18/

Now we download the pretrained resnet18 model from NGC.

In [None]:
# Pull pretrained model from NGC
!ngc registry model download-version nvidia/tao/pretrained_object_detection:resnet18 \
                    --dest $LOCAL_EXPERIMENT_DIR/pretrained_resnet18

In [None]:
print("Check that model is downloaded into dir.")
!ls -l $LOCAL_EXPERIMENT_DIR/pretrained_resnet18/pretrained_object_detection_vresnet18

## Provide training specification

TAO Toolkit requires a configuration spec file in order to train any model. For YOLOv4, it has six major components: `yolov4_config`, `training_config`, `eval_config`, `nms_config`, `augmentation_config`, and `dataset_config`. The format of the spec file is a protobuf text (prototxt) message, and each of its fields can be either a basic data type or a nested message. More information on how to configure each of these protobufs can be found in the [TAO YOLOv4 documentation](https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/yolo_v4.html#creating-a-configuration-file). 

The main parameters to be modified during the experimental phase concern:
- Augmentation parameters for on-the-fly data augmentation
- Other training (hyper-)parameters such as batch size, number of epochs, learning rate etc.
- Whether to use quantization aware training (QAT) or not

With the next command, we provide the pretrained model path on-the-fly, writing our experiment directory in the configuration file.

In [None]:
!sed -i 's,EXPERIMENT_DIR,'"$USER_EXPERIMENT_DIR"',' $LOCAL_SPECS_DIR/yolo_v4_train_resnet18_kitti.txt

By default, the provided sample spec files (`yolo_v4_train_resnet18_kitti.txt` & `yolo_v4_retrain_resnet18_kitti.txt`) disable QAT training. To enable QAT training on a sample spec file, uncomment and run the following lines. QAT emulates the inference time quantization during training, allowing the model to adapt and mitigate in advance the quantization error on weights and tensors when an actual quantized model is generated. The benefit of QAT training is usually a better accuracy when doing INT8 inference with TensorRT compared with traditional calibration-based INT8 TensorRT inference.

In [None]:
# !sed -i "s/enable_qat: false/enable_qat: true/g" $LOCAL_SPECS_DIR/yolo_v4_train_resnet18_kitti.txt
# !sed -i "s/enable_qat: false/enable_qat: true/g" $LOCAL_SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt
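
To give an intuition for what "emulating quantization during training" means, the sketch below applies a symmetric 8-bit quantize-then-dequantize step to a list of values. It is a simplified illustration under the assumption of a single per-tensor scale; it is not TAO's QAT implementation.

```python
def fake_quantize(values, num_bits=8):
    """Round values to a symmetric integer grid and map them back to floats."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    scale = max_abs / qmax  # size of one quantization step
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return [q * scale for q in quantized]


weights = [0.8, -0.31, 0.002, 0.127, -0.8]
emulated = fake_quantize(weights)
for original, approx in zip(weights, emulated):
    print("{:+.4f} -> {:+.4f}".format(original, approx))
```

During QAT the forward pass sees values like `emulated` instead of `weights`, so the network learns parameters that tolerate the rounding error.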

The recommended workflow for training a quantization aware model is depicted in the diagram below:

<img src="images/tao_cv_qat_workflow.png" width="720">
<div style="font-size:11px">Source: https://docs.nvidia.com/tao/tao-toolkit</div><br>

You can restore non-QAT training by uncommenting and running the lines below.

In [None]:
# !sed -i "s/enable_qat: true/enable_qat: false/g" $LOCAL_SPECS_DIR/yolo_v4_train_resnet18_kitti.txt
# !sed -i "s/enable_qat: true/enable_qat: false/g" $LOCAL_SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt

Run the cell below to view the model spec configuration file. If you want to modify the configuration hyperparameters, you can access the file in the `specs` folder by searching for it in the upper left part of JupyterLab. Please remember to save the file with `Ctrl+S` after each modification and then rerun the cell below to check that the changes have been applied. If you generated the best anchor shape for your dataset, go ahead and edit the config file now.

Note that in the spec file `arch` is set to `"resnet"` as the backbone for feature extraction. Other choices include `"vgg"`, `"darknet"`, `"googlenet"`, `"mobilenet_v1"`, `"mobilenet_v2"`, `"cspdarknet"`, and `"squeezenet"`, but you will need to download the corresponding model first.

In [None]:
!cat $LOCAL_SPECS_DIR/yolo_v4_train_resnet18_kitti.txt

## Run TAO training

To launch training, provide the required sample spec file, the output directory location for models, and the encryption key to decrypt the model. Please note some important parameter definitions: 
- `-e`: the experiment specification file to set up the training experiment
- `-r`: the path to the folder where the experiment output is written
- `-k`: the encryption key to decrypt the model
- `--gpus`: the number of GPUs to use for training

Additional optional parameters are available [here](https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/yolo_v4.html#training-the-model). In particular, TAO Toolkit now supports training with automatic mixed precision (AMP), speeding up math-intensive operations and memory-limited operations while not compromising accuracy. Enabling AMP is as simple as setting the `--use_amp` flag at the command line when running the `train` command. This will help speed up the training by using FP16 tensor cores. Note that AMP is only supported on GPUs with Volta or above architecture.

Please be aware that depending on the task, training may take several hours or one day to complete.

In [None]:
!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned

In [None]:
print("To run with multigpu, please change --gpus based on the number of available GPUs in your machine.")
!tao yolo_v4 train -e $SPECS_DIR/yolo_v4_train_resnet18_kitti.txt \
                   -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
                   -k $KEY \
                   --gpus 1 \
                   --use_amp

In case you want to resume training from a checkpoint, please change `pretrain_model_path` to `resume_model_path` in the configuration file.

In [None]:
print("Model for each epoch:")
print("---------------------")
!ls -ltrh $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned/weights

Now check the evaluation stats in the CSV file and pick the model with the highest evaluation accuracy.

In [None]:
!cat $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned/yolov4_training_log_resnet18.csv
%set_env EPOCH=030
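
Instead of reading the log by eye, the best epoch can be picked programmatically. The sketch below is a minimal example that takes the column names as arguments, because the exact headers of the TAO training log are not shown here; `epoch` and `mAP` in the sample are assumptions, as is the sample data itself.

```python
import csv
import io
import math


def best_epoch(csv_text, epoch_col, metric_col):
    """Return (epoch, metric) for the row with the highest metric value."""
    best = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            metric = float(row[metric_col])
        except (TypeError, ValueError):
            continue  # skip rows without a usable metric
        if math.isnan(metric):
            continue  # skip epochs where no validation was run
        if best is None or metric > best[1]:
            best = (int(float(row[epoch_col])), metric)
    if best is None:
        raise ValueError("no valid values found in column " + metric_col)
    return best


# Hypothetical log contents, for illustration only.
sample_log = "epoch,mAP\n10,0.61\n20,nan\n30,0.79\n40,0.74\n"
epoch, score = best_epoch(sample_log, "epoch", "mAP")
print("{:03d}".format(epoch))  # 030
```

The zero-padded string matches the three-digit form used for `EPOCH` in the cell above.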

## Evaluate trained model

Once training is complete and a selected model is available, it can be evaluated from the command line by providing the experiment spec file, the encryption key, and the path to the model file with the `-m` flag. Evaluation can only run on a single GPU, so when the machine has multiple GPUs installed, the optional `--gpu_index` flag specifies the GPU index to use.

In [None]:
!tao yolo_v4 evaluate -e $SPECS_DIR/yolo_v4_train_resnet18_kitti.txt \
                      -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/yolov4_resnet18_epoch_$EPOCH.tlt \
                      -k $KEY

## Prune trained model

Ease of model pruning is one of the key differentiators for the TAO Toolkit. Pruning removes the nodes of the neural network that contribute least to the overall accuracy, which reduces the size of the model, significantly lowers its memory footprint, and increases inference throughput. All these factors are very important for edge deployment and fast real-time execution at high frame rates.

<img src="images/pruned_vs_unpruned.png" width="720">
<div style="font-size:11px">Source: https://docs.nvidia.com/tao/tao-toolkit</div><br>

The `tao yolo_v4 prune` command includes these parameters:
- `-m`: path to pretrained YOLOv4 model
- `-e`: path to the experiment specification file
- `-o`: output directory to store the pruned model
- `-k`: the key to save and load the model
- `-eq`: equalization criterion (relevant for ResNets, which have element-wise operations, and for MobileNets)
- `-pth`: threshold for pruning

Usually, you just need to adjust `-pth` (threshold) for accuracy and model size trade-off. Higher `pth` gives you a smaller model (and thus higher inference speed) but worse accuracy. The threshold value depends on the dataset and the model: `0.4` in the block below is just a starting point. If the retrain accuracy is good, you can increase this value to get smaller models. Otherwise, lower this value to get better accuracy.

In [None]:
!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_pruned

In [None]:
!tao yolo_v4 prune -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/yolov4_resnet18_epoch_$EPOCH.tlt \
                   -e $SPECS_DIR/yolo_v4_train_resnet18_kitti.txt \
                   -o $USER_EXPERIMENT_DIR/experiment_dir_pruned/yolov4_resnet18_pruned.tlt \
                   -eq intersection \
                   -pth 0.4 \
                   -k $KEY
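
The effect of the threshold can be illustrated with a simplified magnitude-based criterion: compute a norm per filter, normalize by the largest norm, and keep only the filters at or above the threshold. This is a conceptual sketch under that assumption, with made-up weights; it does not reproduce the actual algorithm behind `tao yolo_v4 prune`.

```python
import math


def channels_to_keep(filters, threshold):
    """Return indices of filters whose normalized L2 norm reaches the threshold."""
    norms = [math.sqrt(sum(w * w for w in f)) for f in filters]
    max_norm = max(norms)
    if max_norm == 0:
        return []
    return [i for i, n in enumerate(norms) if n / max_norm >= threshold]


# Hypothetical filters, each flattened to a short list of weights.
filters = [[3.0, 4.0], [0.3, 0.4], [1.5, 2.0], [0.0, 1.0]]
print(channels_to_keep(filters, threshold=0.4))   # [0, 2]
print(channels_to_keep(filters, threshold=0.15))  # [0, 2, 3]
```

Raising the threshold removes more filters, which is why a higher `-pth` yields a smaller model at the cost of accuracy.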

In [None]:
!ls -rlt $LOCAL_EXPERIMENT_DIR/experiment_dir_pruned/

## Retrain pruned model

Once the model has been pruned, there might be a slight decrease in accuracy because some previously useful weights may have been removed. The model needs to be retrained on the same dataset to restore the lost accuracy, and this requires a separate retraining specification file. Please be aware that depending on the task, retraining may take several hours or one day to complete.

Here we show the content of the retrain spec file. The newly pruned model has been included as pretrained weights. If you generated the anchor shape for your dataset, please remember to update it in the retrain spec file as well.

In [None]:
# Printing the retrain spec file
# Here we have updated the spec file to include the newly pruned model as pretrained weights
!sed -i 's,EXPERIMENT_DIR,'"$USER_EXPERIMENT_DIR"',' $LOCAL_SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt
!cat $LOCAL_SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt

In [None]:
!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain

Now it is possible to retrain using the pruned model as pretrained weights. To do this, the `tao yolo_v4 train` command is invoked again with an updated spec file that points to the newly pruned model.

In [None]:
!tao yolo_v4 train -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt \
                   -r $USER_EXPERIMENT_DIR/experiment_dir_retrain \
                   -k $KEY \
                   --gpus 1 \
                   --use_amp

In [None]:
# Listing the newly retrained model
!ls -rlt $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain/weights

Now check the evaluation stats in the CSV log and set `EPOCH` to the checkpoint with the highest evaluation accuracy.

In [None]:
!cat $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain/yolov4_training_log_resnet18.csv
%set_env EPOCH=015
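Instead of reading the log by eye, the best epoch can be picked programmatically. The sketch below is only an illustration: the column names `epoch` and `mAP`, the helper `best_epoch`, and the sample rows are all assumptions, so check the header of your own log file and adjust them accordingly.

```python
import csv
import io
import math

def best_epoch(csv_text, metric="mAP"):
    """Return (epoch, value) for the row with the highest finite metric value."""
    best = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            value = float(row[metric])
        except (KeyError, TypeError, ValueError):
            continue  # metric missing or not numeric for this epoch
        if math.isnan(value):
            continue  # epoch without a validation run
        if best is None or value > best[1]:
            best = (int(row["epoch"]), value)
    return best

# Made-up log excerpt; the real file may use different column names and values.
sample_log = """epoch,loss,mAP
5,12.1,0.41
10,9.8,0.63
15,9.1,0.71
20,8.9,0.69
"""

epoch, score = best_epoch(sample_log)
# Zero-pad to three digits to match the checkpoint naming used in this notebook
print(f"%set_env EPOCH={epoch:03d}")
```

To use it on the real log, read the file contents with `open(...).read()` and pass them to `best_epoch`.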

## Evaluate retrained model

Once retraining is complete, the new pruned version of the model can be assessed as before.

In [None]:
!tao yolo_v4 evaluate -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt \
                      -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_resnet18_epoch_$EPOCH.tlt \
                      -k $KEY

## Visualize inferences

In this section, we run the YOLOv4 inference tool on a subset of images and display the results as bounding boxes. The `inference` command takes the following parameters:
- `-i`: the directory of input images for inference
- `-o`: the directory path to output annotated images
- `-e`: path to the experiment spec file used for training
- `-m`: the path to the trained model (TAO model) or TensorRT engine
- `-k`: the key to load model (not needed if the model is a TensorRT engine)

In addition, predicted labels in KITTI format can be saved to an output label directory with the flag `-l` and the GPU to run inference on can be specified with the flag `--gpu_index`. Also, the `-t` flag sets the confidence threshold for drawing a bounding box and is set to `0.3` by default.

Below, we copy some test images to see our model in action on unseen data.

In [None]:
# Copy some test images
!mkdir -p $LOCAL_DATA_DIR/test_samples
!cp $LOCAL_DATA_DIR/testing/image_2/00* $LOCAL_DATA_DIR/test_samples/

We then run inference for detection on these images using the `inference` command.

In [None]:
!tao yolo_v4 inference -i $DATA_DOWNLOAD_DIR/test_samples \
                       -o $USER_EXPERIMENT_DIR/yolo_infer_images \
                       -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt \
                       -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_resnet18_epoch_$EPOCH.tlt \
                       -l $USER_EXPERIMENT_DIR/yolo_infer_labels \
                       -k $KEY

This `inference` call produces two outputs:
- Annotated images with bounding-box overlays in `$LOCAL_EXPERIMENT_DIR/yolo_infer_images`
- Per-image bounding-box labels in KITTI format in `$LOCAL_EXPERIMENT_DIR/yolo_infer_labels`
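The label files can be post-processed with a few lines of Python. The sketch below parses one KITTI-format line, where the class name is the first field and the 2D box occupies fields 5 to 8. It assumes that a detection confidence, when present, is appended as a 16th field, and that class names contain no spaces. `KittiBox`, `parse_kitti_line`, and the sample line are hypothetical and not part of TAO.

```python
from typing import NamedTuple, Optional

class KittiBox(NamedTuple):
    label: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    score: Optional[float]  # None when the line carries no confidence field

def parse_kitti_line(line):
    """Parse one KITTI label line into a KittiBox."""
    fields = line.split()
    if len(fields) < 15:
        raise ValueError(f"expected at least 15 fields, got {len(fields)}")
    xmin, ymin, xmax, ymax = map(float, fields[4:8])
    # Assumption: a 16th field, if present, is the detection confidence
    score = float(fields[15]) if len(fields) > 15 else None
    return KittiBox(fields[0], xmin, ymin, xmax, ymax, score)

# Made-up example line, not taken from a real prediction
sample = "car 0.00 0 0.00 100.0 120.0 220.0 260.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.87"
box = parse_kitti_line(sample)

# Keep only confident detections, similar in spirit to the -t flag
confident = [b for b in [box] if b.score is not None and b.score >= 0.6]
print(box.label, box.xmax - box.xmin, box.ymax - box.ymin, len(confident))
```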

In [None]:
# Simple grid visualizer
import os
from math import ceil

import matplotlib.pyplot as plt

valid_image_ext = ['.jpg', '.png', '.jpeg', '.ppm']

def visualize_images(image_dir, num_cols=4, num_images=10):
    # image_dir is interpreted relative to $LOCAL_EXPERIMENT_DIR
    output_path = os.path.join(os.environ['LOCAL_EXPERIMENT_DIR'], image_dir)
    num_rows = int(ceil(float(num_images) / float(num_cols)))
    # squeeze=False keeps axarr two-dimensional even for a single row or column
    f, axarr = plt.subplots(num_rows, num_cols, figsize=[80,30], squeeze=False)
    f.tight_layout()
    # Collect image files in sorted order, skipping non-image files
    a = [os.path.join(output_path, image) for image in sorted(os.listdir(output_path))
         if os.path.splitext(image)[1].lower() in valid_image_ext]
    for idx, img_path in enumerate(a[:num_images]):
        col_id = idx % num_cols
        row_id = idx // num_cols
        img = plt.imread(img_path)
        axarr[row_id, col_id].imshow(img)

In [None]:
# Visualizing the sample images
OUTPUT_PATH = 'yolo_infer_images' # relative path from $LOCAL_EXPERIMENT_DIR
COLS = 3 # number of columns in the visualizer grid
IMAGES = 9 # number of images to visualize

visualize_images(OUTPUT_PATH, num_cols=COLS, num_images=IMAGES)

## Model export

Exporting the model decouples the training process from inference and allows conversion to TensorRT engines outside the TAO environment. TensorRT engines are specific to each hardware configuration and should be generated for each unique inference environment, while the exported model may be used universally across training and deployment hardware. The exported model is in `.etlt` format and is encrypted with the same key as the `.tlt` model from which it was exported. The key is required when deploying this model.

If you trained a non-QAT model, you may export in `FP32`, `FP16`, or `INT8` mode using the code block below. For `INT8`, you need to provide a calibration image directory.

In [None]:
# tao <task> export will fail if .etlt already exists. So we clear the export folder before tao <task> export
!rm -rf $LOCAL_EXPERIMENT_DIR/export
!mkdir -p $LOCAL_EXPERIMENT_DIR/export
# Export in FP32 mode. Change --data_type to fp16 for FP16 mode
!tao yolo_v4 export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_resnet18_epoch_$EPOCH.tlt \
                    -k $KEY \
                    -o $USER_EXPERIMENT_DIR/export/yolov4_resnet18_epoch_$EPOCH.etlt \
                    -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt \
                    --batch_size 16 \
                    --data_type fp32 \
                    --gen_ds_config

# Uncomment to export in INT8 mode (generate calibration cache file). 
# !tao yolo_v4 export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_resnet18_epoch_$EPOCH.tlt  \
#                     -o $USER_EXPERIMENT_DIR/export/yolov4_resnet18_epoch_$EPOCH.etlt \
#                     -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt \
#                     -k $KEY \
#                     --cal_image_dir $DATA_DOWNLOAD_DIR/testing/image_2 \
#                     --data_type int8 \
#                     --batch_size 16 \
#                     --batches 10 \
#                     --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin  \
#                     --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile \
#                     --gen_ds_config

Note that in this example, for ease of execution, we restrict the number of calibration batches to 10. TAO Toolkit recommends using at least 10% of the training dataset for INT8 calibration.
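As a quick sanity check, the number of batches needed to follow the 10% recommendation can be computed from the dataset size and the batch size. The helper `calibration_batches` and the dataset size of 6000 images below are hypothetical; substitute the size of your own training set.

```python
def ceil_div(a, b):
    """Integer ceiling division without floating-point rounding."""
    return -(-a // b)

def calibration_batches(num_training_images, batch_size, percent=10):
    """Batches needed so that calibration covers `percent` of the training images."""
    if num_training_images <= 0 or batch_size <= 0:
        raise ValueError("dataset size and batch size must be positive")
    images_needed = ceil_div(num_training_images * percent, 100)
    return ceil_div(images_needed, batch_size)

# Hypothetical training set of 6000 images with the batch size used above
print(calibration_batches(6000, 16))
```

The result is the value to pass to `--batches`. For comparison, `--batches 10` with `--batch_size 16` uses 160 images.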

If you train a QAT model, you may only export in `INT8` mode using the following code block. This generates a `.etlt` file and the corresponding calibration cache. You can discard the calibration cache and use just the `.etlt` file in `tao-converter` or DeepStream for `FP32` or `FP16` mode, but this gives sub-optimal results. If you want to deploy in `FP32` or `FP16`, you should disable QAT in training.

In [None]:
# Uncomment to export QAT model in INT8 mode (generate calibration cache file).
# !rm -rf $LOCAL_EXPERIMENT_DIR/export
# !mkdir -p $LOCAL_EXPERIMENT_DIR/export
# !tao yolo_v4 export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_resnet18_epoch_$EPOCH.tlt  \
#                     -o $USER_EXPERIMENT_DIR/export/yolov4_resnet18_epoch_$EPOCH.etlt \
#                     -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt \
#                     -k $KEY \
#                     --data_type int8 \
#                     --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin

In [None]:
print("Exported model:")
print("------------")
!ls -lh $LOCAL_EXPERIMENT_DIR/export

## Generate TensorRT engine

TAO Toolkit has been designed to integrate with Triton Inference Server, an open-source inference serving software, as well as DeepStream SDK, a streaming toolkit to accelerate building and deploying AI-based video analytic applications that we will discuss in a later notebook. To deploy a model trained with TAO Toolkit to DeepStream we have two options:
- **Option 1:** Integrate the `.etlt` model and calibration cache (for int8 mode) generated by `export` directly to DeepStream to automatically create the TensorRT engine file and then run inference on your target device
- **Option 2:** Generate a device-specific, optimized TensorRT engine ahead of time using `tao-converter`, that can also be ingested directly by DeepStream

In addition to the first option discussed before, this section illustrates the second one, because a TensorRT engine will be used for deployment to Triton Inference Server. Please note that **a distinct engine should be generated for each environment and hardware configuration**, so the first option is preferred because it minimizes the chance of problems. Running an engine that was generated with a different version of TensorRT or CUDA is not supported: it may fail to run altogether or cause unpredictable behavior that affects inference speed, accuracy, and stability.

<img src="images/dstream_deploy_options.png" width="720">
<div style="font-size:11px">Source: https://docs.nvidia.com/tao/tao-toolkit</div><br>

The `tao-converter` utility included with the TAO docker produces optimized TensorRT engines for the platform it runs on. Therefore, for maximum performance, instantiate this docker on your target device and run `tao-converter` there with the exported `.etlt` file and the calibration cache (for INT8 mode). The `tao-converter` included in this docker only works on x86 devices with discrete NVIDIA GPUs. For Jetson devices, download the Jetson build of `tao-converter` from the [NVIDIA developer site](https://developer.nvidia.com/tao-converter).

The `-p` argument in the following command sets the optimization profile for `.etlt` models with dynamic shapes. It has the format `<input_node>,<min_shape>,<optimal_shape>,<max_shape>`, where each shape has the format `<n>x<c>x<h>x<w>`. In YOLOv4, the three shapes should differ only in the first (batch) dimension.
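Building the profile string from its parts makes it harder to get the shapes out of sync. The helper below is a small illustrative sketch, not part of TAO. The name `optimization_profile` is hypothetical, and it enforces that only the batch dimension varies.

```python
def optimization_profile(input_node, chw, min_batch, opt_batch, max_batch):
    """Build a `-p` value whose three shapes differ only in the batch dimension."""
    if not 1 <= min_batch <= opt_batch <= max_batch:
        raise ValueError("expected 1 <= min_batch <= opt_batch <= max_batch")
    c, h, w = chw
    shapes = [f"{n}x{c}x{h}x{w}" for n in (min_batch, opt_batch, max_batch)]
    return ",".join([input_node, *shapes])

# Same values as the conversion command in this notebook
profile = optimization_profile("Input", (3, 384, 640), 1, 8, 16)
print(profile)
```

The printed string can be pasted after `-p` in the conversion command.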

In [None]:
# Convert to TensorRT engine (FP32)
!tao converter -k $KEY \
                   -p Input,1x3x384x640,8x3x384x640,16x3x384x640 \
                   -e $USER_EXPERIMENT_DIR/export/trt.engine \
                   -t fp32 \
                   $USER_EXPERIMENT_DIR/export/yolov4_resnet18_epoch_$EPOCH.etlt

# Convert to TensorRT engine (FP16)
# !tao converter -k $KEY \
#                    -p Input,1x3x384x640,8x3x384x640,16x3x384x640 \
#                    -e $USER_EXPERIMENT_DIR/export/trt.engine \
#                    -t fp16 \
#                    $USER_EXPERIMENT_DIR/export/yolov4_resnet18_epoch_$EPOCH.etlt

# Convert to TensorRT engine (INT8).
# !tao converter -k $KEY  \
#                    -p Input,1x3x384x640,8x3x384x640,16x3x384x640 \
#                    -c $USER_EXPERIMENT_DIR/export/cal.bin \
#                    -e $USER_EXPERIMENT_DIR/export/trt.engine \
#                    -b 8 \
#                    -t int8 \
#                    $USER_EXPERIMENT_DIR/export/yolov4_resnet18_epoch_$EPOCH.etlt

In [None]:
print("Exported engine:")
print("------------")
!ls -lh $LOCAL_EXPERIMENT_DIR/export/trt.engine

## Verify deployed model 

We can verify the converted engine by visualizing TensorRT inferences using the `tao yolo_v4 inference` command seen before.

In [None]:
# Infer using TensorRT engine
!tao yolo_v4 inference -m $USER_EXPERIMENT_DIR/export/trt.engine \
                       -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt \
                       -i $DATA_DOWNLOAD_DIR/test_samples \
                       -o $USER_EXPERIMENT_DIR/yolo_infer_images \
                       -t 0.6

In [None]:
# Visualizing the sample images
OUTPUT_PATH = 'yolo_infer_images' # relative path from $LOCAL_EXPERIMENT_DIR
COLS = 3 # number of columns in the visualizer grid
IMAGES = 9 # number of images to visualize

visualize_images(OUTPUT_PATH, num_cols=COLS, num_images=IMAGES)

In this notebook, we have shown how to increase the dataset size with offline data augmentation and how to perform transfer learning starting from a pretrained ResNet-18 YOLOv4 model. We covered model pruning and optimization and provided everything needed to include quantization in the training loop as well. Finally, we exported our best model to a `.etlt` file for deployment to DeepStream and converted it to a TensorRT engine for deployment on Triton Inference Server. We will explore both of these deployment options in detail going forward.

To continue the lab, please go back to the `README` file and follow the instructions there before running the next notebook, which covers deployment with Triton Inference Server.

***

## References

- [1] *https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/resources/cv_samples/version/v1.4.1/files/yolo_v4/yolo_v4.ipynb*
- [2] *https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/resources/cv_samples/version/v1.4.1/files/augment/augment.ipynb*

## Licensing

This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0).
