# Object Detection using TAO DetectNet_v2

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique where you use a model trained on one task and re-train to use it on a different task. 

Train Adapt Optimize (TAO) Toolkit is a simple, easy-to-use Python-based AI toolkit for taking purpose-built AI models and customizing them with users' own data.

<img align="center" src="https://developer.nvidia.com/sites/default/files/akamai/embedded-transfer-learning-toolkit-software-stack-1200x670px.png" width="1080"> 

## Learning Objectives
In this notebook, you will learn how to leverage the simplicity and convenience of TAO to:

* Take a pretrained resnet18 model and train a ResNet-18 DetectNet_v2 model on the KITTI dataset
* Prune the trained detectnet_v2 model
* Retrain the pruned model to recover lost accuracy
* Export the pruned model
* Quantize the pruned model using QAT
* Run Inference on the trained model
* Export the pruned, quantized and retrained model to a .etlt file for deployment to DeepStream
* Run inference on the exported .etlt model to verify deployment using TensorRT

### Table of Contents

This notebook shows an example use case of Object Detection using DetectNet_v2 in the Train Adapt Optimize (TAO) Toolkit.

1. [Set up env variables and map drives](#head-0)
2. [Prepare dataset and pre-trained model](#head-2)
    1. [Download the dataset](#head-2-1)
    2. [Verify downloaded dataset](#head-2-2)
    3. [Prepare tfrecords from kitti format dataset](#head-2-3)
    4. [Download pre-trained model](#head-2-4)
3. [Provide training specification](#head-3)
4. [Run TAO training](#head-4)
5. [Evaluate trained models](#head-5)
6. [Prune trained models](#head-6)
7. [Retrain pruned models](#head-7)
8. [Evaluate retrained model](#head-8)
9. [Visualize inferences](#head-9)
10. [Model Export](#head-10)
    1. [Int8 Optimization](#head-10-1)
    2. [Generate TensorRT engine](#head-10-2)
11. [Verify Deployed Model](#head-11)
    1. [Inference using TensorRT engine](#head-11-1)
12. [QAT workflow](#head-12)
    1. [Convert pruned model to QAT and retrain](#head-12-1)
    2. [Evaluate QAT converted model](#head-12-2)
    3. [Export QAT trained model to int8](#head-12-3)
    4. [Evaluate a QAT trained model using the exported TensorRT engine](#head-12-4)
    5. [Inference using QAT engine](#head-12-5)

## 1. Set up env variables and map drives <a class="anchor" id="head-0"></a>
When using the purpose-built pretrained models from NGC, make sure to set the `$KEY` environment variable to the key mentioned in the model overview. Failing to do so can lead to errors when loading them as pretrained models.

This notebook requires an environment variable called `$LOCAL_PROJECT_DIR` that holds the path to the user's workspace. The dataset for this notebook is expected to reside in `$LOCAL_PROJECT_DIR/data`, while the artifacts generated by the TAO experiments are written to `$LOCAL_PROJECT_DIR/detectnet_v2`. More information on how to set up the dataset and on the supported steps in the TAO workflow is provided in the subsequent cells.

*Note: Please remove any stray artifacts or files from the `$LOCAL_EXPERIMENT_DIR` and `$LOCAL_DATA_DIR` paths defined below that may have been generated by previous experiments. Leftover checkpoint files, for example, may interfere with creating a training graph for a new experiment.*

*Note: By default, this notebook is set up to run training on 1 GPU. To use more GPUs, update the env variable `$NUM_GPUS` accordingly.*

In [2]:
# Setting up env variables for cleaner command line commands.
import os

%env KEY=tlt_encode
%env NUM_GPUS=1

# Please define the local project directory.
# The dataset is expected to be present in $LOCAL_PROJECT_DIR/data, while the results for the steps
# in this notebook will be stored at $LOCAL_PROJECT_DIR/detectnet_v2.
# !PLEASE MAKE SURE TO UPDATE THIS PATH AND ALL LOCAL PATHS IN THE SPEC FILES!

os.environ["LOCAL_PROJECT_DIR"] = FIX_ME

os.environ["LOCAL_DATA_DIR"] = os.path.join(
    os.getenv("LOCAL_PROJECT_DIR", os.getcwd()),
    "data"
)
os.environ["LOCAL_EXPERIMENT_DIR"] = os.path.join(
    os.getenv("LOCAL_PROJECT_DIR", os.getcwd()),
    "detectnet_v2"
)

# The sample spec files are present in the same path as the downloaded samples.
os.environ["LOCAL_SPECS_DIR"] = os.path.join(
    os.getenv("LOCAL_PROJECT_DIR", os.getcwd()),
    "detectnet_v2/specs"
)

# Showing list of specification files.
!ls -rlt $LOCAL_SPECS_DIR

env: KEY=tlt_encode
env: NUM_GPUS=1
total 40
-rw-r--r-- 1 jupyter jupyter 2432 Apr 29 15:01 detectnet_v2_inference_kitti_etlt_qat.txt
-rw-r--r-- 1 jupyter jupyter 2423 Apr 29 15:01 detectnet_v2_inference_kitti_etlt.txt
-rw-r--r-- 1 jupyter jupyter 2393 Apr 29 15:01 detectnet_v2_inference_kitti_tlt.txt
-rw-r--r-- 1 jupyter jupyter  284 Apr 29 15:01 detectnet_v2_tfrecords_kitti_trainval.txt
-rw-r--r-- 1 jupyter jupyter 5518 May  3 09:16 detectnet_v2_retrain_resnet18_kitti.txt
-rw-r--r-- 1 jupyter jupyter 5581 May  3 09:17 detectnet_v2_retrain_resnet18_kitti_qat.txt
-rw-r--r-- 1 jupyter jupyter 5500 May  3 09:17 detectnet_v2_train_resnet18_kitti.txt
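
Before moving on, it can help to confirm that the directories derived from `$LOCAL_PROJECT_DIR` actually exist. The following is a minimal sketch and not part of the original workflow; the helper name `describe_layout` is hypothetical, and it assumes only the three sub-paths used in the cell above.

```python
import os


def describe_layout(project_dir):
    """Return {name: (path, exists)} for the directories this notebook relies on."""
    layout = {
        "data": os.path.join(project_dir, "data"),
        "experiment": os.path.join(project_dir, "detectnet_v2"),
        "specs": os.path.join(project_dir, "detectnet_v2", "specs"),
    }
    return {name: (path, os.path.isdir(path)) for name, path in layout.items()}


if __name__ == "__main__":
    # Falls back to the current directory, like the cell above does.
    project_dir = os.environ.get("LOCAL_PROJECT_DIR", os.getcwd())
    for name, (path, exists) in describe_layout(project_dir).items():
        print("{:<10} {} {}".format(name, path, "ok" if exists else "MISSING"))
```

A `MISSING` entry for `specs` usually means the sample spec files were not copied into the project directory yet.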


## 2. Prepare dataset and pre-trained model <a class="anchor" id="head-2"></a>

We will be using the KITTI object detection dataset for this example. To find more details, please visit http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d. Please download both the left color images of the object dataset from [here](http://www.cvlibs.net/download.php?file=data_object_image_2.zip) and the training labels for the object dataset from [here](http://www.cvlibs.net/download.php?file=data_object_label_2.zip), and place the zip files in `$LOCAL_DATA_DIR`.

The data will then be extracted to have
* training images in `$LOCAL_DATA_DIR/training/image_2`
* training labels in `$LOCAL_DATA_DIR/training/label_2`
* testing images in `$LOCAL_DATA_DIR/testing/image_2`

You may use this notebook with your own dataset as well. To do so, please follow the same directory structure as described above.

*Note: There are no labels for the testing images, so we use them only to visualize inferences for the trained model.*

### A. Download the dataset <a class="anchor" id="head-2-1"></a>
Once you have received the download links by email, please populate them in place of `KITTI_IMAGES_DOWNLOAD_URL` and `KITTI_LABELS_DOWNLOAD_URL`. The next cell downloads the data and places it in `$LOCAL_DATA_DIR`.

In [6]:
#import os
!mkdir -p $LOCAL_DATA_DIR
os.environ["URL_IMAGES"]=KITTI_IMAGES_DOWNLOAD_URL
!if [ ! -f $LOCAL_DATA_DIR/data_object_image_2.zip ]; then wget $URL_IMAGES -O $LOCAL_DATA_DIR/data_object_image_2.zip; else echo "image archive already downloaded"; fi 
os.environ["URL_LABELS"]=KITTI_LABELS_DOWNLOAD_URL
!if [ ! -f $LOCAL_DATA_DIR/data_object_label_2.zip ]; then wget $URL_LABELS -O $LOCAL_DATA_DIR/data_object_label_2.zip; else echo "label archive already downloaded"; fi

--2022-05-16 18:09:07--  https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fs3.eu-central-1.amazonaws.com%2Favg-kitti%2Fdata_object_image_2.zip&amp;data=05%7C01%7Cskouchak%40nvidia.com%7Ce64409fd86654b5ff95008da37662d23%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637883209608652060%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=BoPNISdzxjx3qCJ1ScV5jl7B90jAic%2BvhBbRO9a5g%2BA%3D&amp;reserved=0
Resolving nam11.safelinks.protection.outlook.com (nam11.safelinks.protection.outlook.com)... 104.47.57.156, 104.47.58.156, 2a01:111:f400:7eab::28, ...
Connecting to nam11.safelinks.protection.outlook.com (nam11.safelinks.protection.outlook.com)|104.47.57.156|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_image_2.zip [following]
--2022-05-16 18:09:08--  https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_image_2.zip

### B. Verify downloaded dataset <a class="anchor" id="head-2-2"></a>

In [7]:
# Check the dataset is present
!if [ ! -f $LOCAL_DATA_DIR/data_object_image_2.zip ]; then echo 'Image zip file not found, please download.'; else echo 'Found Image zip file.';fi
!if [ ! -f $LOCAL_DATA_DIR/data_object_label_2.zip ]; then echo 'Label zip file not found, please download.'; else echo 'Found Labels zip file.';fi

Found Image zip file.
Found Labels zip file.


In [8]:
# This may take a while: verify integrity of zip files 
!sha256sum $LOCAL_DATA_DIR/data_object_image_2.zip | cut -d ' ' -f 1 | grep -xq '^351c5a2aa0cd9238b50174a3a62b846bc5855da256b82a196431d60ff8d43617$' ; \
if test $? -eq 0; then echo "images OK"; else echo "images corrupt, redownload!" && rm -f $LOCAL_DATA_DIR/data_object_image_2.zip; fi 
!sha256sum $LOCAL_DATA_DIR/data_object_label_2.zip | cut -d ' ' -f 1 | grep -xq '^4efc76220d867e1c31bb980bbf8cbc02599f02a9cb4350effa98dbb04aaed880$' ; \
if test $? -eq 0; then echo "labels OK"; else echo "labels corrupt, redownload!" && rm -f $LOCAL_DATA_DIR/data_object_label_2.zip; fi 

images OK
labels OK
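
The same integrity check can be done from Python when `sha256sum` is not available. This is a minimal sketch using only the standard library; the function names are hypothetical, and the expected digests would be the two hex strings used in the shell cell above.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Return True when the file's SHA-256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_checksum(image_zip_path, "351c5a2a...")` with the full digest from the cell above mirrors the `images OK` check.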


In [9]:
# Unpack the downloaded datasets to $LOCAL_DATA_DIR.
# The training images will be under $LOCAL_DATA_DIR/training/image_2 and
# labels will be under $LOCAL_DATA_DIR/training/label_2.
# The testing images will be under $LOCAL_DATA_DIR/testing/image_2.
!unzip -u $LOCAL_DATA_DIR/data_object_image_2.zip -d $LOCAL_DATA_DIR
!unzip -u $LOCAL_DATA_DIR/data_object_label_2.zip -d $LOCAL_DATA_DIR

Archive:  /home/jupyter/data/data_object_image_2.zip
   creating: /home/jupyter/data/training/image_2/
 extracting: /home/jupyter/data/training/image_2/002480.png  
 extracting: /home/jupyter/data/training/image_2/005952.png  
 extracting: /home/jupyter/data/training/image_2/000709.png  
 extracting: /home/jupyter/data/training/image_2/000814.png  
 extracting: /home/jupyter/data/training/image_2/006192.png  
 extracting: /home/jupyter/data/training/image_2/006017.png  
 extracting: /home/jupyter/data/training/image_2/002731.png  
 extracting: /home/jupyter/data/training/image_2/005295.png  
 extracting: /home/jupyter/data/training/image_2/005347.png  
 extracting: /home/jupyter/data/training/image_2/005326.png  
 extracting: /home/jupyter/data/training/image_2/005713.png  
 extracting: /home/jupyter/data/training/image_2/007000.png  
 extracting: /home/jupyter/data/training/image_2/004353.png  
 extracting: /home/jupyter/data/training/image_2/003969.png  
 extracting: /home/jupyter/da

In [10]:
# verify
import os

DATA_DIR = os.environ.get('LOCAL_DATA_DIR')
num_training_images = len(os.listdir(os.path.join(DATA_DIR, "training/image_2")))
num_training_labels = len(os.listdir(os.path.join(DATA_DIR, "training/label_2")))
num_testing_images = len(os.listdir(os.path.join(DATA_DIR, "testing/image_2")))
print("Number of images in the train/val set. {}".format(num_training_images))
print("Number of labels in the train/val set. {}".format(num_training_labels))
print("Number of images in the test set. {}".format(num_testing_images))

Number of images in the train/val set. 7481
Number of labels in the train/val set. 7481
Number of images in the test set. 7518
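
Matching counts do not guarantee that every image has its own label file. The sketch below compares file stems in the two training directories; `unmatched_stems` is a hypothetical helper, and it assumes that an image and its label share the same base name (as `000110.png` and `000110.txt` do).

```python
import os


def unmatched_stems(image_dir, label_dir):
    """Return (images without a label, labels without an image) as sorted stem lists."""
    image_stems = {os.path.splitext(name)[0] for name in os.listdir(image_dir)}
    label_stems = {os.path.splitext(name)[0] for name in os.listdir(label_dir)}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

Calling it with `training/image_2` and `training/label_2` under `$LOCAL_DATA_DIR` should return two empty lists for a complete download.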


In [11]:
# Sample kitti label.
!cat $LOCAL_DATA_DIR/training/label_2/000110.txt

Car 0.27 0 2.50 862.65 129.39 1241.00 304.96 1.73 1.74 4.71 5.50 1.30 8.19 3.07
Car 0.68 3 -0.76 1184.97 141.54 1241.00 187.84 1.52 1.60 4.42 22.39 0.48 24.57 -0.03
Car 0.00 1 1.73 346.64 175.63 449.93 248.90 1.58 1.76 4.18 -5.13 1.67 17.86 1.46
Car 0.00 0 1.75 420.44 170.72 540.83 256.12 1.65 1.88 4.45 -2.78 1.64 16.30 1.58
Car 0.00 0 -0.35 815.59 143.96 962.82 198.54 1.90 1.78 4.72 10.19 0.90 26.65 0.01
Car 0.00 1 -2.09 966.10 144.74 1039.76 182.96 1.80 1.65 3.55 19.49 0.49 35.99 -1.59
Van 0.00 2 -2.07 1084.26 132.74 1173.25 177.89 2.11 1.75 4.31 26.02 0.24 36.41 -1.45
Car 0.00 2 -2.13 1004.98 144.16 1087.13 178.96 1.64 1.70 3.91 21.91 0.30 36.47 -1.59
Car 0.00 2 1.77 407.73 178.44 487.07 230.28 1.55 1.71 4.50 -5.35 1.76 24.13 1.55
Car 0.00 1 1.45 657.19 166.33 702.65 198.71 1.50 1.71 4.44 3.39 1.22 35.96 1.55
Car 0.00 1 -1.46 599.30 171.76 631.96 197.12 1.58 1.71 3.75 0.39 1.54 47.31 -1.45
Car 0.00 0 -1.02 557.79 165.74 591.61 181.27 1.66 1.65 4.45 -3.89 0.91 80.12 -1.07
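
Each row of a KITTI label file has 15 space-separated fields: object type, truncation, occlusion, observation angle (alpha), the 2D box (left, top, right, bottom), the 3D dimensions (height, width, length), the 3D location (x, y, z) and the rotation around the Y axis. A minimal parsing sketch follows; the `KittiObject` container and `parse_kitti_line` function are illustrative names, not part of TAO.

```python
from collections import namedtuple

KittiObject = namedtuple(
    "KittiObject",
    "object_type truncated occluded alpha bbox dimensions location rotation_y",
)


def parse_kitti_line(line):
    """Parse one KITTI label row into a KittiObject."""
    parts = line.split()
    if len(parts) != 15:
        raise ValueError("expected 15 fields, got {}".format(len(parts)))
    values = [float(item) for item in parts[1:]]
    return KittiObject(
        object_type=parts[0],
        truncated=values[0],
        occluded=int(values[1]),
        alpha=values[2],
        bbox=tuple(values[3:7]),          # left, top, right, bottom in pixels
        dimensions=tuple(values[7:10]),   # height, width, length in meters
        location=tuple(values[10:13]),    # x, y, z in camera coordinates
        rotation_y=values[13],
    )
```

For 2D detection, only `object_type` and `bbox` carry the information that matters for training.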


### C. Prepare tfrecords from kitti format dataset <a class="anchor" id="head-2-3"></a>

* Update the tfrecords spec file to take in your KITTI format dataset
* Create the tfrecords using the `detectnet_v2 dataset_convert` command

*Note: TFRecords only need to be generated once.*

In [12]:
!echo $LOCAL_SPECS_DIR

/home/jupyter/detectnet_v2/specs


In [13]:
print("TFrecords conversion spec file for kitti training")
!cat $LOCAL_SPECS_DIR/detectnet_v2_tfrecords_kitti_trainval.txt

TFrecords conversion spec file for kitti training
kitti_config {
  root_directory_path: "/home/jupyter/data/training"
  image_dir_name: "image_2"
  label_dir_name: "label_2"
  image_extension: ".png"
  partition_mode: "random"
  num_partitions: 2
  val_split: 14
  num_shards: 10
}
image_directory_path: "/home/jupyter/data/training"
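
With `val_split: 14`, roughly 14% of the 7481 training images are held out for validation. The arithmetic can be sketched as below; rounding down is an assumption on our part, chosen because it reproduces the `Train: 6434 Val: 1047` figures reported in the converter log of this section.

```python
def split_counts(num_samples, val_split_percent):
    """Return (train_count, val_count), assuming the validation share is rounded down."""
    num_val = num_samples * val_split_percent // 100
    return num_samples - num_val, num_val


train_count, val_count = split_counts(7481, 14)
print("Train: {}  Val: {}".format(train_count, val_count))
```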


In [15]:
# Creating a new directory for the output tfrecords dump.
print("Converting Tfrecords for kitti trainval dataset")
!mkdir -p $LOCAL_DATA_DIR/tfrecords && rm -rf $LOCAL_DATA_DIR/tfrecords/*
!detectnet_v2 dataset_convert \
                  -d $LOCAL_SPECS_DIR/detectnet_v2_tfrecords_kitti_trainval.txt \
                  -o $LOCAL_DATA_DIR/tfrecords/kitti_trainval/kitti_trainval

Converting Tfrecords for kitti trainval dataset
Using TensorFlow backend.
Using TensorFlow backend.
2022-05-16 18:56:46,587 [INFO] iva.detectnet_v2.dataio.build_converter: Instantiating a kitti converter
2022-05-16 18:56:46,587 [INFO] root: Instantiating a kitti converter
2022-05-16 18:56:46,587 [INFO] iva.detectnet_v2.dataio.dataset_converter_lib: Creating output directory /home/jupyter/data/tfrecords/kitti_trainval
2022-05-16 18:56:46,588 [INFO] root: Generating partitions
2022-05-16 18:56:46,613 [INFO] iva.detectnet_v2.dataio.kitti_converter_lib: Num images in
Train: 6434	Val: 1047
2022-05-16 18:56:46,613 [INFO] root: Num images in
Train: 6434	Val: 1047
2022-05-16 18:56:46,613 [INFO] iva.detectnet_v2.dataio.kitti_converter_lib: Validation data in partition 0. Hence, while choosing the validationset during training choose validation_fold 0.
2022-05-16 18:56:46,613 [INFO] root: Validation data in partition 0. Hence, while choosing the validationset during training choose validation_fo

In [16]:
!ls -rlt $LOCAL_DATA_DIR/tfrecords/kitti_trainval/

total 7136
-rw-r--r-- 1 jupyter jupyter 101218 May 16 18:56 kitti_trainval-fold-000-of-002-shard-00000-of-00010
-rw-r--r-- 1 jupyter jupyter 105057 May 16 18:56 kitti_trainval-fold-000-of-002-shard-00001-of-00010
-rw-r--r-- 1 jupyter jupyter 102687 May 16 18:56 kitti_trainval-fold-000-of-002-shard-00002-of-00010
-rw-r--r-- 1 jupyter jupyter 102222 May 16 18:56 kitti_trainval-fold-000-of-002-shard-00003-of-00010
-rw-r--r-- 1 jupyter jupyter 104216 May 16 18:56 kitti_trainval-fold-000-of-002-shard-00004-of-00010
-rw-r--r-- 1 jupyter jupyter  95742 May 16 18:56 kitti_trainval-fold-000-of-002-shard-00005-of-00010
-rw-r--r-- 1 jupyter jupyter 100018 May 16 18:56 kitti_trainval-fold-000-of-002-shard-00006-of-00010
-rw-r--r-- 1 jupyter jupyter  98332 May 16 18:56 kitti_trainval-fold-000-of-002-shard-00007-of-00010
-rw-r--r-- 1 jupyter jupyter 100922 May 16 18:56 kitti_trainval-fold-000-of-002-shard-00008-of-00010
-rw-r--r-- 1 jupyter jupyter 106122 May 16 18:56 kitti_trainval-fold-000-of-002-
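
The generated file names encode the fold and the shard, which is useful when checking that both partitions were written. The sketch below groups shard files by fold; the regular expression is inferred from the names in the listing above and should be treated as an assumption about the naming scheme.

```python
import re
from collections import Counter

# Matches names such as kitti_trainval-fold-000-of-002-shard-00003-of-00010
SHARD_PATTERN = re.compile(r"-fold-(\d+)-of-(\d+)-shard-(\d+)-of-(\d+)$")


def shards_per_fold(filenames):
    """Count how many shard files exist for each fold index."""
    counts = Counter()
    for name in filenames:
        match = SHARD_PATTERN.search(name)
        if match:
            counts[int(match.group(1))] += 1
    return dict(counts)
```

With `num_partitions: 2` and `num_shards: 10` in the conversion spec, one would expect ten shards for fold 0 and ten for fold 1.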

### D. Download pre-trained model <a class="anchor" id="head-2-4"></a>
Download the correct pretrained model from the NGC model registry for your experiment. Please note that DetectNet_v2 expects its input to be 0-1 normalized with input channels in RGB order. Therefore, for optimum results, please download model templates from `nvidia/tao/pretrained_detectnet_v2`. The templates are now organized as version strings. For example, to download a resnet18 model suitable for DetectNet_v2, use the NGC object `nvidia/tao/pretrained_detectnet_v2:resnet18`.

All other models expect input preprocessing with mean subtraction and input channels in BGR order. Using them as pretrained weights may result in suboptimal performance.
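
To make the difference between the two preprocessing conventions concrete, here is a small sketch that operates on a single RGB pixel. The per-channel means are the widely used ImageNet values in BGR order and are an illustrative assumption; they are not taken from this notebook or from a specific TAO model.

```python
# Illustrative per-channel means in BGR order (an assumption, see the text above).
BGR_MEANS = (103.939, 116.779, 123.68)


def scale_rgb_to_unit(pixel_rgb):
    """DetectNet_v2-style input: keep RGB order, scale 0-255 values to 0-1."""
    return tuple(channel / 255.0 for channel in pixel_rgb)


def bgr_mean_subtract(pixel_rgb, means=BGR_MEANS):
    """Mean-subtraction style input: reorder to BGR, then subtract per-channel means."""
    red, green, blue = pixel_rgb
    return tuple(value - mean for value, mean in zip((blue, green, red), means))
```

A pure red pixel `(255, 0, 0)` becomes `(1.0, 0.0, 0.0)` under the first convention, while the second moves the red value to the last position and shifts every channel by its mean.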

You may also use this notebook with the following purpose-built pretrained models 
* [PeopleNet](https://ngc.nvidia.com/catalog/models/nvidia:tao:peoplenet)
* [TrafficCamNet](https://ngc.nvidia.com/catalog/models/nvidia:tao:trafficcamnet)
* [DashCamNet](https://ngc.nvidia.com/catalog/models/nvidia:tao:dashcamnet)
* [FaceDetect-IR](https://ngc.nvidia.com/catalog/models/nvidia:tao:facedetectir) 

In [17]:
# Installing NGC CLI on the local machine.
## Download and install
%env CLI=ngccli_cat_linux.zip
!mkdir -p $LOCAL_PROJECT_DIR/ngccli

# Remove any previously existing CLI installations
!rm -rf $LOCAL_PROJECT_DIR/ngccli/*
!wget "https://ngc.nvidia.com/downloads/$CLI" -P $LOCAL_PROJECT_DIR/ngccli
!unzip -u "$LOCAL_PROJECT_DIR/ngccli/$CLI" -d $LOCAL_PROJECT_DIR/ngccli/
!rm $LOCAL_PROJECT_DIR/ngccli/*.zip 
os.environ["PATH"]="{}/ngccli:{}".format(os.getenv("LOCAL_PROJECT_DIR", ""), os.getenv("PATH", ""))

env: CLI=ngccli_cat_linux.zip
--2022-05-16 18:58:15--  https://ngc.nvidia.com/downloads/ngccli_cat_linux.zip
Resolving ngc.nvidia.com (ngc.nvidia.com)... 108.156.91.29, 108.156.91.12, 108.156.91.42, ...
Connecting to ngc.nvidia.com (ngc.nvidia.com)|108.156.91.29|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 32589718 (31M) [application/zip]
Saving to: ‘/home/jupyter/ngccli/ngccli_cat_linux.zip’


2022-05-16 18:58:15 (106 MB/s) - ‘/home/jupyter/ngccli/ngccli_cat_linux.zip’ saved [32589718/32589718]

Archive:  /home/jupyter/ngccli/ngccli_cat_linux.zip
  inflating: /home/jupyter/ngccli/ngc  
 extracting: /home/jupyter/ngccli/ngc.md5  


In [18]:
# List models available in the model registry.
!ngc registry model list nvidia/tao/pretrained_detectnet_v2:*

+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| Versi | Accur | Epoch | Batch | GPU   | Memor | File  | Statu | Creat |
| on    | acy   | s     | Size  | Model | y Foo | Size  | s     | ed    |
|       |       |       |       |       | tprin |       |       | Date  |
|       |       |       |       |       | t     |       |       |       |
+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| vgg19 | 82.6  | 80    | 1     | V100  | 153.8 | 153.7 | UPLOA | Aug   |
|       |       |       |       |       |       | 7 MB  | D_COM | 24,   |
|       |       |       |       |       |       |       | PLETE | 2021  |
| vgg16 | 82.2  | 80    | 1     | V100  | 113.2 | 113.2 | UPLOA | Aug   |
|       |       |       |       |       |       | MB    | D_COM | 24,   |
|       |       |       |       |       |       |       | PLETE | 2021  |
| squee | 65.67 | 80    | 1     | V100  | 6.5   | 6.46  | UPLOA | Aug   |
| zenet |       |       |       |     

In [19]:
# Create the target destination to download the model.
!mkdir -p $LOCAL_EXPERIMENT_DIR/pretrained_resnet18/

In [20]:
# Download the pretrained model from NGC
!ngc registry model download-version nvidia/tao/pretrained_detectnet_v2:resnet18 \
    --dest $LOCAL_EXPERIMENT_DIR/pretrained_resnet18

Downloaded 82.28 MB in 6s, Download speed: 13.69 MB/s               
----------------------------------------------------
Transfer id: pretrained_detectnet_v2_vresnet18 Download status: Completed.
Downloaded local path: /home/jupyter/detectnet_v2/pretrained_resnet18/pretrained_detectnet_v2_vresnet18
Total files downloaded: 1 
Total downloaded size: 82.28 MB
Started at: 2022-05-16 18:59:36.972406
Completed at: 2022-05-16 18:59:42.986202
Duration taken: 6s
----------------------------------------------------


In [21]:
!ls -rlt $LOCAL_EXPERIMENT_DIR/pretrained_resnet18/pretrained_detectnet_v2_vresnet18

total 91160
-rw------- 1 jupyter jupyter 93345248 May 16 18:59 resnet18.hdf5


## 3. Provide training specification <a class="anchor" id="head-3"></a>
* Tfrecords for the train datasets
    * To use the newly generated tfrecords, update the `dataset_config` parameter in the spec file at `$LOCAL_SPECS_DIR/detectnet_v2_train_resnet18_kitti.txt`
    * Update the fold number to use for evaluation. In case of random data split, please use fold `0` only
    * For sequence-wise split, you may use any fold generated from the dataset convert tool
* Pre-trained models
* Augmentation parameters for on-the-fly data augmentation
* Other training (hyper-)parameters such as batch size, number of epochs, learning rate, etc.
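
A frequent source of errors is a `tfrecords_path` that no longer matches the files on disk. The following sketch pulls the relevant settings out of the spec text with regular expressions and counts the files behind each pattern. It is a convenience check under the assumption that the spec uses the plain `key: "value"` layout shown in this notebook; it is not a full parser for the spec format.

```python
import glob
import re


def dataset_settings(spec_text):
    """Return (list of tfrecords_path patterns, validation_fold or None)."""
    paths = re.findall(r'tfrecords_path:\s*"([^"]+)"', spec_text)
    fold = re.search(r"validation_fold:\s*(\d+)", spec_text)
    return paths, (int(fold.group(1)) if fold else None)


def count_matching_files(pattern):
    """Count the files on disk that match a tfrecords_path glob pattern."""
    return len(glob.glob(pattern))
```

A count of zero for any pattern indicates that the tfrecords need to be regenerated or that the path in the spec must be updated.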

In [22]:
!cat $LOCAL_SPECS_DIR/detectnet_v2_train_resnet18_kitti.txt

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/home/jupyter/data/tfrecords/kitti_trainval/*"
    image_directory_path: "/home/jupyter/data/training"
  }
  image_extension: "png"
  target_class_mapping {
    key: "car"
    value: "car"
  }
  target_class_mapping {
    key: "cyclist"
    value: "cyclist"
  }
  target_class_mapping {
    key: "pedestrian"
    value: "pedestrian"
  }
  target_class_mapping {
    key: "person_sitting"
    value: "pedestrian"
  }
  target_class_mapping {
    key: "van"
    value: "car"
  }
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 1248
    output_image_height: 384
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale
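
The `target_class_mapping` entries in the spec above merge several KITTI classes into three training classes (for example, `van` is mapped to `car`). A small sketch for summarizing that mapping from the spec text is given below; the function names are hypothetical and the regular expression assumes the exact block layout printed above.

```python
import re

MAPPING_PATTERN = re.compile(
    r'target_class_mapping\s*\{\s*key:\s*"([^"]+)"\s*value:\s*"([^"]+)"\s*\}'
)


def read_class_mapping(spec_text):
    """Return {source_class: target_class} from target_class_mapping blocks."""
    return dict(MAPPING_PATTERN.findall(spec_text))


def group_by_target(mapping):
    """Invert the mapping into {target_class: sorted list of source classes}."""
    groups = {}
    for source, target in mapping.items():
        groups.setdefault(target, []).append(source)
    return {target: sorted(sources) for target, sources in groups.items()}
```

Applied to the training spec, `group_by_target` would list `car` and `van` under `car`, and `pedestrian` and `person_sitting` under `pedestrian`.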

## 4. Run TAO training <a class="anchor" id="head-4"></a>
* Provide the sample spec file and the output directory location for models

*Note: Training may take hours to complete. The remainder of the notebook assumes that training was done in single-GPU mode. When running in multi-GPU mode, expect to update the pruning and inference steps with new pruning thresholds and updated parameters in the clusterfile.json for optimum performance.*

*DetectNet_v2 now supports restart from checkpoint. In case the training job is killed prematurely, you may resume training from the closest checkpoint by simply re-running the **same** command line. Please do make sure to use the <u>**same number of GPUs**</u> when restarting the training.*

*When running the training with `NUM_GPUS` > 1, you may need to modify `batch_size_per_gpu` and `learning_rate` to get an mAP similar to a 1-GPU training run. In most cases, scaling down the batch size by a factor of `NUM_GPUS` or scaling up the learning rate by a factor of `NUM_GPUS` is a good place to start.*
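
The scaling advice above can be written down as a small helper. This is only a sketch of the two starting points mentioned in the note; the function names are hypothetical, and the learning rate in the usage line is a placeholder rather than a value read from the spec file. The step count uses the 6434 training samples and batch size of 4 reported in the training log of this section.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of steps needed to see every sample once (the last step may be partial)."""
    return math.ceil(num_samples / batch_size)


def multi_gpu_starting_points(batch_size_per_gpu, learning_rate, num_gpus):
    """Return the two alternative adjustments suggested for multi-GPU runs."""
    return {
        "smaller_batch": {
            "batch_size_per_gpu": max(1, batch_size_per_gpu // num_gpus),
            "learning_rate": learning_rate,
        },
        "larger_learning_rate": {
            "batch_size_per_gpu": batch_size_per_gpu,
            "learning_rate": learning_rate * num_gpus,
        },
    }


print(steps_per_epoch(6434, 4))
# 5e-4 is a placeholder learning rate for illustration only.
print(multi_gpu_starting_points(4, 5e-4, 2))
```

Either option is a starting point only; the final values still need to be validated against the mAP of the single-GPU run.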

In [3]:
!detectnet_v2 train -e $LOCAL_SPECS_DIR/detectnet_v2_train_resnet18_kitti.txt \
                        -r $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned \
                        -k $KEY \
                        -n resnet18_detector \
                        --gpus $NUM_GPUS

Using TensorFlow backend.
Using TensorFlow backend.
2022-05-16 22:58:45,300 [INFO] iva.common.logging.logging: Log file already exists at /home/jupyter/detectnet_v2/experiment_dir_unpruned/status.json
2022-05-16 22:58:45,301 [INFO] __main__: Loading experiment spec at /home/jupyter/detectnet_v2/specs/detectnet_v2_train_resnet18_kitti.txt.
2022-05-16 22:58:45,302 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /home/jupyter/detectnet_v2/specs/detectnet_v2_train_resnet18_kitti.txt
2022-05-16 22:58:46,233 [INFO] __main__: Cannot iterate over exactly 6434 samples with a batch size of 4; each epoch will therefore take one extra step.
2022-05-16 22:59:03,334 [INFO] iva.detectnet_v2.objectives.bbox_objective: Default L1 loss function will be used.
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to            

In [4]:
print('Model for each epoch:')
print('---------------------')
!ls -lh $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned/weights

Model for each epoch:
---------------------
total 43M
-rw-r--r-- 1 jupyter jupyter 43M May 17 02:19 resnet18_detector.tlt


## 5. Evaluate the trained model <a class="anchor" id="head-5"></a>

In [15]:
!detectnet_v2 evaluate -e $LOCAL_SPECS_DIR/detectnet_v2_train_resnet18_kitti.txt \
                           -m $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet18_detector.tlt \
                           -k $KEY

Using TensorFlow backend.
Using TensorFlow backend.


2022-05-17 03:54:01,035 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /home/jupyter/detectnet_v2/specs/detectnet_v2_train_resnet18_kitti.txt


2022-05-17 03:54:01,046 [INFO] root: Loading model weights.
2022-05-17 03:54:05,380 [INFO] iva.detectnet_v2.objectives.bbox_objective: Default L1 loss function will be used.
2022-05-17 03:54:05,381 [INFO] root: Building dataloader.
2022-05-17 03:54:06,263 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2022-05-17 03:54:06,263 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2022-05-17 03:54:06,263 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2022-05-17 03:54:06,263 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 4, io threads: 8, compute t

## 6. Prune the trained model <a class="anchor" id="head-6"></a>
* Specify pre-trained model
* Equalization criterion (applicable for ResNets and MobileNets)
* Threshold for pruning
* A key to save and load the model
* Output directory to store the model

*Usually, you just need to adjust `-pth` (threshold) to trade off accuracy against model size. A higher `pth` gives you a smaller model (and thus higher inference speed) but worse accuracy. The threshold to use depends on the dataset. The `pth` value of `5.2e-6` used below is just a starting point. If the retrain accuracy is good, you can increase this value to get smaller models. Otherwise, lower this value to get better accuracy.*

*In some internal studies, we have noticed that a `pth` value of 0.01 is a good starting point for detectnet_v2 models.*

In [16]:
# Create an output directory if it doesn't exist.
!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_pruned

In [17]:
!detectnet_v2 prune \
                  -m $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet18_detector.tlt \
                  -o $LOCAL_EXPERIMENT_DIR/experiment_dir_pruned/resnet18_nopool_bn_detectnet_v2_pruned.tlt \
                  -eq union \
                  -pth 0.0000052 \
                  -k $KEY

Using TensorFlow backend.
Using TensorFlow backend.
2022-05-17 04:00:05,933 [INFO] modulus.pruning.pruning: Exploring graph for retainable indices
2022-05-17 04:00:06,776 [INFO] modulus.pruning.pruning: Pruning model and appending pruned nodes to new graph
2022-05-17 04:00:31,147 [INFO] iva.common.magnet_prune: Pruning ratio (pruned model / original model): 0.5614269469945746
2022-05-17 04:00:32,018 [INFO] root: Pruning ratio (pruned model / original model): 0.5614269469945746
2022-05-17 04:00:32,018 [INFO] root: {
    "pruning_ratio": 0.5614269469945746,
    "size": 24.193321228027344,
    "param_count": 6.289679
}
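
The numbers in the pruning log can be cross-checked with a little arithmetic. The sketch below assumes that `pruning_ratio` compares parameter counts, that `param_count` is expressed in millions, and that `size` is the file size in MiB; these readings are our interpretation of the log, not documented facts. Under them, the unpruned model would have about 11.2 million parameters, and 25368536 bytes (the size of the pruned `.tlt` file in this run) converts to the reported `size` of about 24.19.

```python
def implied_original_params(pruned_params_millions, pruning_ratio):
    """Estimate the unpruned parameter count from ratio = pruned / original."""
    return pruned_params_millions / pruning_ratio


def bytes_to_mib(num_bytes):
    """Convert a byte count to mebibytes (1 MiB = 2**20 bytes)."""
    return num_bytes / float(1 << 20)


print(round(implied_original_params(6.289679, 0.5614269469945746), 3))
print(bytes_to_mib(25368536))
```

A ratio close to 1.0 means little was pruned, so raising `-pth` lowers the ratio and shrinks the model further.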


In [18]:
!ls -rlt $LOCAL_EXPERIMENT_DIR/experiment_dir_pruned/

total 24776
-rw-r--r-- 1 jupyter jupyter 25368536 May 17 04:00 resnet18_nopool_bn_detectnet_v2_pruned.tlt


## 7. Retrain the pruned model <a class="anchor" id="head-7"></a>
* The model needs to be retrained to bring back accuracy after pruning
* Specify the retraining specification with the pruned model as pretrained weights

*Note: For retraining, please set the `load_graph` option to `true` in the `model_config` to load the pruned model graph. Also, if the model shows a decrease in mAP after retraining, the originally trained model may have been pruned too much. Please try reducing the pruning threshold (so that less of the model is pruned) and use the new model to retrain.*

*Note: DetectNet_v2 now supports Quantization Aware Training (QAT) to help with optimizing the model. By default, the training in the cell below does not run the model with QAT enabled. For information on training a model with QAT, please refer to the cells under the [QAT workflow](#head-12) section.*
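
Because a missing `load_graph: true` is easy to overlook, a quick text check on the retrain spec can save a failed run. The helper below is a hypothetical convenience function that searches the spec text for the option; it does not validate the rest of the file or confirm that the option sits inside `model_config`.

```python
import re


def load_graph_enabled(spec_text):
    """Return True when the spec text contains `load_graph: true`."""
    match = re.search(r"\bload_graph:\s*(\w+)", spec_text)
    return bool(match) and match.group(1).lower() == "true"
```

It can be applied to the contents of `detectnet_v2_retrain_resnet18_kitti.txt` before launching the retraining cell.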

In [19]:
# Printing the retrain experiment file.
# Note: We have updated the experiment file to include the
# newly pruned model as pretrained weights, and the
# load_graph option is set to true.
!cat $LOCAL_SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/home/jupyter/data/tfrecords/kitti_trainval/*"
    image_directory_path: "/home/jupyter/data/training"
  }
  image_extension: "png"
  target_class_mapping {
    key: "car"
    value: "car"
  }
  target_class_mapping {
    key: "cyclist"
    value: "cyclist"
  }
  target_class_mapping {
    key: "pedestrian"
    value: "pedestrian"
  }
  target_class_mapping {
    key: "person_sitting"
    value: "pedestrian"
  }
  target_class_mapping {
    key: "van"
    value: "car"
  }
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 1248
    output_image_height: 384
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale

In [20]:
# Retraining using the pruned model as pretrained weights 
!detectnet_v2 train -e $LOCAL_SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt \
                        -r $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain \
                        -k $KEY \
                        -n resnet18_detector_pruned \
                        --gpus $NUM_GPUS

Using TensorFlow backend.
Using TensorFlow backend.
2022-05-17 04:02:27,576 [INFO] iva.common.logging.logging: Log file already exists at /home/jupyter/detectnet_v2/experiment_dir_retrain/status.json
2022-05-17 04:02:27,577 [INFO] __main__: Loading experiment spec at /home/jupyter/detectnet_v2/specs/detectnet_v2_retrain_resnet18_kitti.txt.
2022-05-17 04:02:27,579 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /home/jupyter/detectnet_v2/specs/detectnet_v2_retrain_resnet18_kitti.txt
2022-05-17 04:02:28,513 [INFO] __main__: Cannot iterate over exactly 6434 samples with a batch size of 4; each epoch will therefore take one extra step.
2022-05-17 04:02:31,307 [INFO] iva.detectnet_v2.objectives.bbox_objective: Default L1 loss function will be used.
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to           

In [None]:
#  Listing the newly retrained model.
!ls -rlt $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain/weights

total 24776
-rw-r--r-- 1 jupyter jupyter 25368536 May 17 10:47 resnet18_detector_pruned.tlt


## 8. Evaluate the retrained model <a class="anchor" id="head-8"></a>

This section evaluates the pruned and retrained model, using the `evaluate` command.

In [22]:
!detectnet_v2 evaluate -e $LOCAL_SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt \
                           -m $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
                           -k $KEY

Using TensorFlow backend.
Using TensorFlow backend.
2022-05-17 10:48:04,554 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /home/jupyter/detectnet_v2/specs/detectnet_v2_retrain_resnet18_kitti.txt
2022-05-17 10:48:04,577 [INFO] root: Loading model weights.
2022-05-17 10:48:07,491 [INFO] iva.detectnet_v2.objectives.bbox_objective: Default L1 loss function will be used.
2022-05-17 10:48:07,491 [INFO] root: Building dataloader.
2022-05-17 10:48:08,173 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2022-05-17 10:48:08,173 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2022-05-17 10:48:08,173 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2022-05-17 10:48:08,174 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 4, io threads: 8, compute

## 9. Visualize inferences <a class="anchor" id="head-9"></a>
In this section, we run the `inference` tool to generate inferences on the trained models. To render bboxes from more classes, please edit the spec file `detectnet_v2_inference_kitti_tlt.txt` to include all the classes you would like to visualize and edit the rest of the file accordingly.

In [23]:
# Running inference for detection on n images
!detectnet_v2 inference -e $LOCAL_SPECS_DIR/detectnet_v2_inference_kitti_tlt.txt \
                            -o $LOCAL_EXPERIMENT_DIR/tlt_infer_testing \
                            -i $LOCAL_DATA_DIR/testing/image_2 \
                            -k $KEY

Using TensorFlow backend.
Using TensorFlow backend.
2022-05-17 10:49:14,160 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /home/jupyter/detectnet_v2/specs/detectnet_v2_inference_kitti_tlt.txt
2022-05-17 10:49:14,187 [INFO] __main__: Overlain images will be saved in the output path.
2022-05-17 10:49:14,187 [INFO] iva.detectnet_v2.inferencer.build_inferencer: Constructing inferencer
2022-05-17 10:49:14,860 [INFO] iva.detectnet_v2.inferencer.tlt_inferencer: Loading model from /home/jupyter/detectnet_v2/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt:
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 3, 384, 1248)      0         
_________________________________________________________________
model_1 (Model)              [(None, 3, 24, 78), (None 6289679   
Total params: 6,289,679
Trainable params: 6,282,335

The `inference` tool produces two outputs:
1. Overlain images in `$LOCAL_EXPERIMENT_DIR/tlt_infer_testing/images_annotated`
2. Frame-by-frame bounding-box labels in KITTI format in `$LOCAL_EXPERIMENT_DIR/tlt_infer_testing/labels`

*Note: To run inference on a single image, replace the path passed to the `-i` flag of the `inference` command with the path to that image.*
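
The KITTI-format label files can be read back with a few lines of Python. The sketch below assumes the standard 15-field KITTI layout (class name, truncation, occlusion, alpha, four 2-D box coordinates, then 3-D dimensions, location, and rotation) and treats an optional 16th field as a confidence score. `parse_kitti_line` and the sample line are illustrative and are not part of TAO.

```python
from typing import NamedTuple, Optional


class KittiBox(NamedTuple):
    label: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    score: Optional[float]  # None when the line has no 16th field


def parse_kitti_line(line: str) -> KittiBox:
    """Parse one line of a KITTI-format label file (hypothetical helper)."""
    fields = line.split()
    if len(fields) < 15:
        raise ValueError(f"expected at least 15 fields, got {len(fields)}")
    # Fields 4..7 hold the 2-D box as left, top, right, bottom in pixels.
    xmin, ymin, xmax, ymax = (float(v) for v in fields[4:8])
    # Assumption: a 16th field, when present, is a confidence score.
    score = float(fields[15]) if len(fields) > 15 else None
    return KittiBox(fields[0], xmin, ymin, xmax, ymax, score)


# Made-up sample line, for illustration only.
sample = "car 0.00 0 0.00 100.0 120.5 220.0 200.5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.87"
box = parse_kitti_line(sample)
print(box.label, box.xmax - box.xmin, box.score)
```

To use it on real output, read each file under the `labels` directory line by line and pass every non-empty line to the helper.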

In [24]:
# Simple grid visualizer
!pip3 install matplotlib==3.3.3
%matplotlib inline
import matplotlib.pyplot as plt
import os
from math import ceil
valid_image_ext = ['.jpg', '.png', '.jpeg', '.ppm']

def visualize_images(image_dir, num_cols=4, num_images=10):
    """Show up to num_images images from image_dir (relative to $LOCAL_EXPERIMENT_DIR) in a grid."""
    output_path = os.path.join(os.environ['LOCAL_EXPERIMENT_DIR'], image_dir)
    num_rows = int(ceil(float(num_images) / float(num_cols)))
    # squeeze=False keeps axarr 2-D even when the grid has a single row or column.
    f, axarr = plt.subplots(num_rows, num_cols, figsize=[80,30], squeeze=False)
    f.tight_layout()
    # Sort the listing so the same images are shown on every run.
    a = [os.path.join(output_path, image) for image in sorted(os.listdir(output_path))
         if os.path.splitext(image)[1].lower() in valid_image_ext]
    for idx, img_path in enumerate(a[:num_images]):
        col_id = idx % num_cols
        row_id = idx // num_cols
        img = plt.imread(img_path)
        axarr[row_id, col_id].imshow(img) 

Defaulting to user installation because normal site-packages is not writeable


## 10. Model Export <a class="anchor" id="head-10"></a>

In [None]:
!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_final
# Remove any pre-existing copy of the .etlt file so that the export starts clean.
import os
output_file=os.path.join(os.environ['LOCAL_EXPERIMENT_DIR'],
                         "experiment_dir_final/resnet18_detector.etlt")
if os.path.exists(output_file):
    os.remove(output_file)
!detectnet_v2 export \
                  -m $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
                  -o $LOCAL_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
                  -k $KEY

In [None]:
print('Exported model:')
print('------------')
!ls -lh $LOCAL_EXPERIMENT_DIR/experiment_dir_final

### A. Int8 Optimization <a class="anchor" id="head-10-1"></a>
The DetectNet_v2 model supports int8 inference mode in TensorRT. 
To use int8 mode, we must first calibrate the model for 8-bit inference:

* Generate a calibration tensorfile from the training data using `detectnet_v2 calibration_tensorfile`.
* Use `detectnet_v2 export` to generate the int8 calibration table.

*Note: For this example, we generate a calibration tensorfile containing 10 batches of training data.
Ideally, it is best to use at least 10-20% of the training data to do so. The more data provided during calibration, the closer int8 inferences are to fp32 inferences.*
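
To put these numbers in perspective, here is a small sketch of the arithmetic. It assumes the 6434 training samples and the batch size of 4 reported in the retraining log earlier in this notebook; `batches_for_fraction` is a hypothetical helper, not a TAO function.

```python
from math import ceil

NUM_TRAIN_SAMPLES = 6434  # taken from the retraining log in this notebook
BATCH_SIZE = 4            # assumed calibration batch size for this example


def batches_for_fraction(fraction: float) -> int:
    """Number of batches needed to cover `fraction` of the training set."""
    return ceil(NUM_TRAIN_SAMPLES * fraction / BATCH_SIZE)


# Share of the training set covered by the 10 batches used in this example.
current_share = 10 * BATCH_SIZE / NUM_TRAIN_SAMPLES
print(f"10 batches cover {current_share:.2%} of the training data")

for fraction in (0.10, 0.20):
    print(f"{fraction:.0%} of the data needs {batches_for_fraction(fraction)} batches")
```

Under these assumptions, 10 batches cover well under 1% of the training data, while the 10-20% range corresponds to roughly 161 to 322 batches.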

*Note: If the model was trained with QAT nodes available, do not use the post-training int8 optimization described below. Instead, export the model in int8 mode (using the argument `--data_type int8`) with just the path to the calibration cache file (using the argument `--cal_cache_file`).*

In [6]:
!detectnet_v2 calibration_tensorfile -e $LOCAL_SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt \
                                         -m 10 \
                                         -o $LOCAL_EXPERIMENT_DIR/experiment_dir_final/calibration.tensor

Using TensorFlow backend.
Using TensorFlow backend.
2022-04-29 20:50:21,971 [INFO] __main__: This method is soon to be deprecated. Please use the -e option in the export command to instantiate the dataloader and generate samples for calibration from the training dataloader.
2022-04-29 20:50:21,972 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /home/jupyter/detectnet_v2/specs/detectnet_v2_retrain_resnet18_kitti.txt


2022-04-29 20:50:22,537 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2022-04-29 20:50:22,537 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2022-04-29 20:50:22,537 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2022-04-29 20:50:22,538 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 4, io threads: 8, compute threads: 4, buffered batches

In [9]:
!rm -rf $LOCAL_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt
!rm -rf $LOCAL_EXPERIMENT_DIR/experiment_dir_final/calibration.bin
!detectnet_v2 export \
                  -m $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
                  -o $LOCAL_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
                  -k $KEY  \
                  --cal_data_file $LOCAL_EXPERIMENT_DIR/experiment_dir_final/calibration.tensor \
                  --data_type int8 \
                  --batches 10 \
                  --batch_size 4 \
                  --max_batch_size 4\
                  --engine_file $LOCAL_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt.int8 \
                  --cal_cache_file $LOCAL_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
                  --verbose

Using TensorFlow backend.
Using TensorFlow backend.
2022-04-29 20:52:12,292 [INFO] root: Building exporter object.
2022-04-29 20:52:15,361 [INFO] root: Exporting the model.
2022-04-29 20:52:15,361 [INFO] root: Using input nodes: ['input_1']
2022-04-29 20:52:15,361 [INFO] root: Using output nodes: ['output_cov/Sigmoid', 'output_bbox/BiasAdd']
2022-04-29 20:52:15,361 [INFO] iva.common.export.keras_exporter: Using input nodes: ['input_1']
2022-04-29 20:52:15,361 [INFO] iva.common.export.keras_exporter: Using output nodes: ['output_cov/Sigmoid', 'output_bbox/BiasAdd']
NOTE: UFF has been tested with TensorFlow 1.14.0.
DEBUG [/usr/local/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py:96] Marking ['output_cov/Sigmoid', 'output_bbox/BiasAdd'] as outputs
2022-04-29 20:52:23,757 [INFO] iva.common.export.keras_exporter: Calibration takes time especially if number of batches is large.
2022-04-29 20:52:23,758 [INFO] root: Calibration takes time especially if number of batches is 

### B. Generate TensorRT engine <a class="anchor" id="head-10-2"></a>
Verify engine generation using the `tao-converter` utility included with the Docker image.

The `tao-converter` produces optimized TensorRT engines for the platform on which it runs. Therefore, to get maximum performance, please instantiate this Docker image and execute the `tao-converter` command, with the exported `.etlt` file and calibration cache (for int8 mode), on your target device. The `tao-converter` utility included in this Docker image only works on x86 devices with discrete NVIDIA GPUs. 

For Jetson devices, please download the `tao-converter` for Jetson from the dev zone link [here](https://developer.nvidia.com/tao-converter). 

If you choose to integrate your model into DeepStream directly, you may do so by copying the exported `.etlt` file along with the calibration cache to the target device and updating the spec file that configures the `gst-nvinfer` element to point to this newly exported model. Usually this file is called `config_infer_primary.txt` for detection models and `config_infer_secondary_*.txt` for classification models.
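
As a rough illustration of the DeepStream route, the sketch below uses Python's `configparser` to render a `[property]` group for `gst-nvinfer`. The key names are believed to match the DeepStream `gst-nvinfer` configuration file format, but they should be checked against the documentation for your DeepStream version. The file paths and the key placeholder are assumptions made for illustration; the node names and input dimensions are the ones used elsewhere in this notebook.

```python
import configparser
import io

# All paths and the key below are placeholders; adjust them for the target device.
properties = {
    "tlt-encoded-model": "resnet18_detector.etlt",
    "tlt-model-key": "<your $KEY value>",
    "int8-calib-file": "calibration.bin",
    "network-mode": "1",  # assumed mapping: 0 = FP32, 1 = INT8, 2 = FP16
    "uff-input-blob-name": "input_1",
    "output-blob-names": "output_cov/Sigmoid;output_bbox/BiasAdd",
    "infer-dims": "3;384;1248",  # channels;height;width; older releases may expect input-dims
    "num-detected-classes": "3",  # car, cyclist, pedestrian
}

# interpolation=None keeps values such as "$KEY" untouched.
config = configparser.ConfigParser(interpolation=None)
config["property"] = properties

# Render to a string; on the device this would be saved as the nvinfer config file.
buffer = io.StringIO()
config.write(buffer, space_around_delimiters=False)
config_text = buffer.getvalue()
print(config_text)
```

A complete configuration normally needs further entries (for example a label file and clustering settings), so treat this as a starting point rather than a working file.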

In [10]:
!converter $LOCAL_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
                   -k $KEY \
                    -c $LOCAL_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
                   -o output_cov/Sigmoid,output_bbox/BiasAdd \
                   -d 3,384,1248 \
                   -i nchw \
                   -m 64 \
                   -t int8 \
                   -e $LOCAL_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt \
                   -b 4

[INFO] [MemUsageChange] Init CUDA: CPU +252, GPU +0, now: CPU 258, GPU 481 (MiB)
[INFO] [MemUsageSnapshot] Builder begin: CPU 343 MiB, GPU 481 MiB
[INFO] Reading Calibration Cache for calibrator: EntropyCalibration2
[INFO] Generated calibration scales using calibration cache. Make sure that calibration cache has latest scales.
[INFO] To regenerate calibration cache, please delete the existing one. TensorRT will generate a new calibration cache.
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +350, GPU +160, now: CPU 736, GPU 641 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +274, GPU +132, now: CPU 1010, GPU 773 (MiB)
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 2 output network tensors.
[INFO] Total Host Persistent Memory: 75968
[INFO] Total Device Persistent Memory: 22944256
[INFO] Total Scratch Memory: 0
[INFO] [MemUsageStats] Peak memory usage of TR

## 11. Verify Deployed Model <a class="anchor" id="head-11"></a>
Verify the exported model by visualizing inferences on TensorRT.
In addition to running inference on a `.tlt` model in [step 9](#head-9), the `inference` tool can also consume the converted TensorRT engine from [step 10.B](#head-10-2).

*If the accuracy of the int8 inferences seems to degrade after int8 calibration, it could be because there wasn't enough data in the calibration tensorfile used to calibrate the model, or because the training data is not entirely representative of your test images, so the calibration may be incorrect. In that case, you may either regenerate the calibration tensorfile with more batches of the training data and recalibrate the model, or calibrate the model on a few images from the test set. This may be done using the `--cal_image_dir` flag in the `export` tool. For more information, please follow the instructions in the User Guide.*

### A. Inference using TensorRT engine <a class="anchor" id="head-11-1"></a>

In [12]:
!detectnet_v2 inference -e $LOCAL_SPECS_DIR/detectnet_v2_inference_kitti_etlt.txt \
                            -o $LOCAL_EXPERIMENT_DIR/etlt_infer_testing \
                            -i $LOCAL_DATA_DIR/testing/image_2 \
                            -k $KEY

Using TensorFlow backend.
Using TensorFlow backend.
2022-04-29 20:56:03,186 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /home/jupyter/detectnet_v2/specs/detectnet_v2_inference_kitti_etlt.txt
2022-04-29 20:56:03,213 [INFO] __main__: Overlain images will be saved in the output path.
2022-04-29 20:56:03,213 [INFO] iva.detectnet_v2.inferencer.build_inferencer: Constructing inferencer
2022-04-29 20:56:03,586 [INFO] iva.detectnet_v2.inferencer.trt_inferencer: Reading from engine file at: /home/jupyter/detectnet_v2/experiment_dir_final/resnet18_detector.trt
2022-04-29 20:56:04,675 [INFO] __main__: Initialized model
2022-04-29 20:56:04,709 [INFO] __main__: Commencing inference
100%|█████████████████████████████████████████| 470/470 [23:16<00:00,  2.97s/it]
2022-04-29 21:19:21,254 [INFO] iva.detectnet_v2.inferencer.trt_inferencer: Clearing input buffers.
2022-04-29 21:19:21,255 [INFO] iva.detectnet_v2.inferencer.trt_inferencer: Clearing output buffers.
2022-04-2

## 12. QAT workflow <a class="anchor" id="head-12"></a>
This section covers the newly enabled Quantization Aware Training (QAT) feature in DetectNet_v2. The workflow below converts the pruned model from the [pruning section](#head-6) to a QAT-enabled model and retrains it while accounting for the noise introduced by quantization in the forward pass. 

### A. Convert pruned model to QAT and retrain <a class="anchor" id="head-12-1"></a>
All DetectNet_v2 models, both unpruned and pruned, can be converted to QAT models by setting the `enable_qat` parameter in the `training_config` component of the spec file to `true`.
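
A quick way to confirm that a spec file has QAT switched on is to look for `enable_qat: true` inside its `training_config` block. The sketch below does this with a simple regular expression rather than a real protobuf parser, so it assumes that `enable_qat` appears directly inside `training_config` before any nested block. `qat_enabled` and the sample text are illustrative only.

```python
import re


def qat_enabled(spec_text: str) -> bool:
    """Return True if training_config sets enable_qat to true (regex-based sketch)."""
    # Match "training_config {", then any characters other than braces, then the flag.
    pattern = r"training_config\s*\{[^{}]*\benable_qat\s*:\s*true\b"
    return re.search(pattern, spec_text, flags=re.IGNORECASE) is not None


# Trimmed-down, made-up spec fragment for illustration.
sample_spec = """
training_config {
  batch_size_per_gpu: 4
  enable_qat: true
}
"""

print(qat_enabled(sample_spec))                           # True
print(qat_enabled(sample_spec.replace("true", "false")))  # False
```

To check a real file, read its contents (for example the QAT retrain spec under `$LOCAL_SPECS_DIR`) and pass the text to the helper.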

In [14]:
# Printing the retrain experiment file. 
# Note: We have updated the experiment file to convert the
# pretrained model to qat mode by setting the enable_qat
# parameter.
!cat $LOCAL_SPECS_DIR/detectnet_v2_retrain_resnet18_kitti_qat.txt

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/home/jupyter/data/tfrecords/kitti_trainval/*"
    image_directory_path: "/home/jupyter/data/training"
  }
  image_extension: "png"
  target_class_mapping {
    key: "car"
    value: "car"
  }
  target_class_mapping {
    key: "cyclist"
    value: "cyclist"
  }
  target_class_mapping {
    key: "pedestrian"
    value: "pedestrian"
  }
  target_class_mapping {
    key: "person_sitting"
    value: "pedestrian"
  }
  target_class_mapping {
    key: "van"
    value: "car"
  }
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 1248
    output_image_height: 384
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale

In [15]:
!detectnet_v2 train -e $LOCAL_SPECS_DIR/detectnet_v2_retrain_resnet18_kitti_qat.txt \
                        -r $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain_qat \
                        -k $KEY \
                        -n resnet18_detector_pruned_qat \
                        --gpus $NUM_GPUS

Using TensorFlow backend.
Using TensorFlow backend.
2022-04-29 21:25:25,067 [INFO] iva.common.logging.logging: Log file already exists at /home/jupyter/detectnet_v2/experiment_dir_retrain_qat/status.json
2022-04-29 21:25:25,067 [INFO] __main__: Loading experiment spec at /home/jupyter/detectnet_v2/specs/detectnet_v2_retrain_resnet18_kitti_qat.txt.
2022-04-29 21:25:25,070 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /home/jupyter/detectnet_v2/specs/detectnet_v2_retrain_resnet18_kitti_qat.txt
2022-04-29 21:25:25,647 [INFO] __main__: Cannot iterate over exactly 6434 samples with a batch size of 4; each epoch will therefore take one extra step.
2022-04-29 21:25:47,341 [INFO] iva.detectnet_v2.objectives.bbox_objective: Default L1 loss function will be used.
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected t

In [16]:
!ls -rlt $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain_qat/weights

total 44072
-rw-r--r-- 1 jupyter jupyter 45129352 Apr 29 20:42 resnet18_detector_pruned_qat.tlt


### B. Evaluate QAT converted model <a class="anchor" id="head-12-2"></a>
This section evaluates the QAT-enabled, pruned, and retrained model. The mAP of this model should be comparable to that of the pruned and retrained model without QAT. However, because of quantization, you may sometimes see a drop in mAP for certain datasets.

In [17]:
!detectnet_v2 evaluate -e $LOCAL_SPECS_DIR/detectnet_v2_retrain_resnet18_kitti_qat.txt \
                           -m $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain_qat/weights/resnet18_detector_pruned_qat.tlt \
                           -k $KEY \
                           -f tlt

Using TensorFlow backend.
Using TensorFlow backend.
2022-04-29 21:29:06,022 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /home/jupyter/detectnet_v2/specs/detectnet_v2_retrain_resnet18_kitti_qat.txt
2022-04-29 21:29:06,030 [INFO] root: Loading model weights.
2022-04-29 21:29:09,337 [INFO] iva.detectnet_v2.objectives.bbox_objective: Default L1 loss function will be used.
2022-04-29 21:29:09,337 [INFO] root: Building dataloader.
2022-04-29 21:29:09,923 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2022-04-29 21:29:09,923 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2022-04-29 21:29:09,923 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2022-04-29 21:29:09,923 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 4, io threads: 8, com

### C. Export QAT trained model to int8 <a class="anchor" id="head-12-3"></a>
Export the QAT-trained model to a TensorRT-parsable model. This command generates an `.etlt` file from the trained model and serializes the corresponding int8 scales as a TensorRT-readable calibration cache file.

In [18]:
!rm -rf $LOCAL_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector_qat.etlt
!rm -rf $LOCAL_EXPERIMENT_DIR/experiment_dir_final/calibration_qat.bin
!detectnet_v2 export \
                  -m $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain_qat/weights/resnet18_detector_pruned_qat.tlt \
                  -o $LOCAL_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector_qat.etlt \
                  -k $KEY  \
                  --data_type int8 \
                  --batch_size 64 \
                  --max_batch_size 64\
                  --engine_file $LOCAL_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector_qat.trt.int8 \
                  --cal_cache_file $LOCAL_EXPERIMENT_DIR/experiment_dir_final/calibration_qat.bin \
                  --verbose

Using TensorFlow backend.
Using TensorFlow backend.
2022-04-29 21:30:33,765 [INFO] root: Building exporter object.
2022-04-29 21:30:36,834 [INFO] root: Exporting the model.
2022-04-29 21:30:36,834 [INFO] root: Using input nodes: ['input_1']
2022-04-29 21:30:36,834 [INFO] root: Using output nodes: ['output_cov/Sigmoid', 'output_bbox/BiasAdd']
2022-04-29 21:30:36,834 [INFO] iva.common.export.keras_exporter: Using input nodes: ['input_1']
2022-04-29 21:30:36,834 [INFO] iva.common.export.keras_exporter: Using output nodes: ['output_cov/Sigmoid', 'output_bbox/BiasAdd']
NOTE: UFF has been tested with TensorFlow 1.14.0.
DEBUG [/usr/local/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py:96] Marking ['output_cov/Sigmoid', 'output_bbox/BiasAdd'] as outputs
2022-04-29 21:31:02,092 [INFO] root: Extracting scales generated during QAT.
2022-04-29 21:32:27,425 [INFO] root: Export complete.
2022-04-29 21:32:27,425 [INFO] root: {
    "param_count": 11.203023,
    "size": 22.1571588516

### D. Evaluate a QAT trained model using the exported TensorRT engine <a class="anchor" id="head-12-4"></a>
This section evaluates the QAT-enabled, pruned, and retrained model using the TensorRT int8 engine that was exported in [Section C](#head-12-3). Please note that there may be a slight difference (~0.1-0.5%) in the mAP from [Section B](#head-12-2), owing to some differences in the implementation of quantization in TensorRT.
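
One way to make that comparison explicit is to check the per-class difference against a tolerance. In the sketch below the AP values are placeholders chosen purely for illustration (they are not results from this notebook), and `within_tolerance` is a hypothetical helper; substitute the numbers printed by the two `evaluate` runs.

```python
def within_tolerance(tlt_ap: dict, trt_ap: dict, tol: float = 0.5) -> dict:
    """Map each class to True when the two AP values (in percent) differ by at most tol."""
    return {name: abs(tlt_ap[name] - trt_ap[name]) <= tol for name in tlt_ap}


# Placeholder AP values in percent, NOT measured results.
tlt_ap = {"car": 80.0, "cyclist": 70.0, "pedestrian": 60.0}
trt_ap = {"car": 79.75, "cyclist": 70.25, "pedestrian": 59.0}

print(within_tolerance(tlt_ap, trt_ap))
```

A class that falls outside the tolerance is a hint to revisit the export settings or the calibration cache before deploying the engine.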

*Note: The TensorRT evaluator might be slightly slower than the TAO evaluator here, because the evaluation dataloader is pinned to the CPU to avoid any clashes between TensorRT and TAO instances in the GPU. Please note that this tool was not intended and has not been developed for profiling the model. It is just a means to qualitatively analyse the model.*

*Please use native TensorRT or DeepStream for the most optimized inferences.*

In [19]:
!detectnet_v2 evaluate -e $LOCAL_SPECS_DIR/detectnet_v2_retrain_resnet18_kitti_qat.txt \
                           -m $LOCAL_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector_qat.trt.int8 \
                           -f tensorrt

Using TensorFlow backend.
Using TensorFlow backend.


2022-04-29 21:34:33,977 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /home/jupyter/detectnet_v2/specs/detectnet_v2_retrain_resnet18_kitti_qat.txt


2022-04-29 21:34:33,984 [INFO] root: Loading model weights.
2022-04-29 21:34:35,513 [INFO] iva.detectnet_v2.objectives.bbox_objective: Default L1 loss function will be used.
Outputs of the TensorRT engine are: ['bbox', 'cov']
2022-04-29 21:34:35,513 [INFO] root: Building dataloader.
2022-04-29 21:34:36,078 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2022-04-29 21:34:36,078 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2022-04-29 21:34:36,078 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2022-04-29 21:34:36,079 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: num

### E. Inference using QAT engine <a class="anchor" id="head-12-5"></a>
Run inference and visualize detections on test images, using the exported TensorRT engine from [Section C](#head-12-3).

In [20]:
!detectnet_v2 inference -e $LOCAL_SPECS_DIR/detectnet_v2_inference_kitti_etlt_qat.txt \
                            -o $LOCAL_EXPERIMENT_DIR/tlt_infer_testing_qat \
                            -i $LOCAL_DATA_DIR/testing/image_2 \
                            -k $KEY

Using TensorFlow backend.
Using TensorFlow backend.
2022-04-29 21:36:17,363 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /home/jupyter/detectnet_v2/specs/detectnet_v2_inference_kitti_etlt_qat.txt
2022-04-29 21:36:17,404 [INFO] __main__: Creating output inference directory
2022-04-29 21:36:17,419 [INFO] __main__: Overlain images will be saved in the output path.
2022-04-29 21:36:17,420 [INFO] iva.detectnet_v2.inferencer.build_inferencer: Constructing inferencer
2022-04-29 21:36:17,769 [INFO] iva.detectnet_v2.inferencer.trt_inferencer: Reading from engine file at: /home/jupyter/detectnet_v2/experiment_dir_final/resnet18_detector_qat.trt.int8
2022-04-29 21:36:18,834 [INFO] __main__: Initialized model
2022-04-29 21:36:18,856 [INFO] __main__: Commencing inference
100%|█████████████████████████████████████████| 470/470 [23:56<00:00,  3.06s/it]
2022-04-29 22:00:14,869 [INFO] iva.detectnet_v2.inferencer.trt_inferencer: Clearing input buffers.
2022-04-29 22:00:14