# TLT SSD example use case

This notebook shows an example use case of SSD object detection using the Transfer Learning Toolkit (TLT).

0. [Set up env variables](#head-0)
1. [Prepare dataset and pre-trained model](#head-1) <br>
    1.1 [Prepare tfrecords from kitti format dataset](#head-1-1) <br>
    1.2 [Download pre-trained model](#head-1-2) <br>
2. [Provide training specification](#head-2)
3. [Run TLT training](#head-3)
4. [Evaluate trained models](#head-4)
5. [Prune trained models](#head-5)
6. [Retrain pruned models](#head-6)
7. [Evaluate retrained model](#head-7)
8. [Visualize inferences](#head-8)
9. [Deploy](#head-9)
10. [Verify deployed model](#head-10)

## 0. Set up env variables <a class="anchor" id="head-0"></a>


In [1]:
# Setting up env variables for cleaner command line commands.
print("Please replace the variable with your key.")
%set_env KEY=OHB1YTZ0Z2RxYTBzdnE3YTNpcnVydmM4cXI6OGVkNDU4ZGQtNjViOC00NzYxLWFhMDUtMjgxMDQ2ZTVmNzAx
%set_env USER_EXPERIMENT_DIR=/workspace/tlt_docker_files/mydata/tlt-tensorrt-nano
%set_env DATA_DOWNLOAD_DIR=/workspace/tlt_docker_files/mydata/tlt-tensorrt-nano/data
%set_env SPECS_DIR=/workspace/tlt_docker_files/mydata/tlt-tensorrt-nano/specs
!mkdir -p $USER_EXPERIMENT_DIR
!mkdir -p $DATA_DOWNLOAD_DIR
!mkdir -p $SPECS_DIR

Please replace the variable with your key.
env: KEY=OHB1YTZ0Z2RxYTBzdnE3YTNpcnVydmM4cXI6OGVkNDU4ZGQtNjViOC00NzYxLWFhMDUtMjgxMDQ2ZTVmNzAx
env: USER_EXPERIMENT_DIR=/workspace/tlt_docker_files/mydata/tlt-tensorrt-nano
env: DATA_DOWNLOAD_DIR=/workspace/tlt_docker_files/mydata/tlt-tensorrt-nano/data
env: SPECS_DIR=/workspace/tlt_docker_files/mydata/tlt-tensorrt-nano/specs
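
Later cells depend on all four variables set above. As a minimal sketch (the helper name `missing_env_vars` is hypothetical and not part of TLT), the following check reports any variable that is unset or empty before the longer steps are started:

```python
import os

# Variables that the cells in this notebook rely on.
REQUIRED_VARS = ("KEY", "USER_EXPERIMENT_DIR", "DATA_DOWNLOAD_DIR", "SPECS_DIR")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```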


## 1. Prepare dataset and pre-trained model <a class="anchor" id="head-1"></a>

We will use the KITTI detection dataset for this tutorial. For more details, visit
http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d. Download the KITTI detection images (http://www.cvlibs.net/download.php?file=data_object_image_2.zip) and labels (http://www.cvlibs.net/download.php?file=data_object_label_2.zip) to `$DATA_DOWNLOAD_DIR`.

In [4]:
# Check the dataset is present
!mkdir -p $DATA_DOWNLOAD_DIR
!if [ ! -f $DATA_DOWNLOAD_DIR/data_object_image_2.zip ]; then echo 'Image zip file not found, please download.'; else echo 'Found Image zip file.';fi
!if [ ! -f $DATA_DOWNLOAD_DIR/data_object_label_2.zip ]; then echo 'Label zip file not found, please download.'; else echo 'Found Labels zip file.';fi

Found Image zip file.
Found Labels zip file.


In [None]:
# unpack 
!unzip -u $DATA_DOWNLOAD_DIR/data_object_image_2.zip -d $DATA_DOWNLOAD_DIR
!unzip -u $DATA_DOWNLOAD_DIR/data_object_label_2.zip -d $DATA_DOWNLOAD_DIR

In [3]:
# verify
!ls -l $DATA_DOWNLOAD_DIR/

total 12280824
-rwxrwxrwx 1 1000 1000 12569945557 Jun 28 05:00 data_object_image_2.zip
-rw-r--r-- 1 root root     5601213 May 11  2018 data_object_label_2.zip
drwxr-xr-x 3 root root        4096 Jun 28 09:15 testing
drwxr-xr-x 4 root root        4096 Jun 28 10:58 training
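
Beyond listing the directories, it can help to confirm that every training image has a label file and vice versa. The sketch below assumes the layout used by the conversion spec in this notebook (`training/image_2` with `.png` images and `training/label_2` with `.txt` labels); `unmatched_samples` is a hypothetical helper, not a TLT function.

```python
import os


def unmatched_samples(image_dir, label_dir, image_ext=".png", label_ext=".txt"):
    """Return (images_without_labels, labels_without_images) as sorted lists of file stems."""
    def stems(directory, ext):
        return set(os.path.splitext(name)[0] for name in os.listdir(directory)
                   if name.lower().endswith(ext))

    images = stems(image_dir, image_ext)
    labels = stems(label_dir, label_ext)
    return sorted(images - labels), sorted(labels - images)


# Assumed layout: $DATA_DOWNLOAD_DIR/training/{image_2,label_2}
training_dir = os.path.join(os.environ.get("DATA_DOWNLOAD_DIR", ""), "training")
image_dir = os.path.join(training_dir, "image_2")
label_dir = os.path.join(training_dir, "label_2")
if os.path.isdir(image_dir) and os.path.isdir(label_dir):
    no_label, no_image = unmatched_samples(image_dir, label_dir)
    print("Images without a label: {}".format(len(no_label)))
    print("Labels without an image: {}".format(len(no_image)))
else:
    print("Dataset not unpacked yet; skipping the check.")
```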


Additionally, if you already have your own dataset in a volume (or folder), you can mount the volume on `DATA_DOWNLOAD_DIR` (or create a soft link). For example:
```bash
# if your dataset is in /dev/sdc1
mount /dev/sdc1 $DATA_DOWNLOAD_DIR

# if your dataset is in folder /var/dataset
ln -sf /var/dataset $DATA_DOWNLOAD_DIR
```

### 1.1 Prepare tfrecords from kitti format dataset <a class="anchor" id="head-1-1"></a>

* Update the tfrecords spec file to take in your KITTI format dataset
* Create the tfrecords using the `tlt-dataset-convert` tool
* TFRecords only need to be generated once.

In [7]:
print("TFrecords conversion spec file for training")
!cat $SPECS_DIR/ssd_tfrecords_kitti_trainval.txt

TFrecords conversion spec file for training
kitti_config {
  root_directory_path: "/workspace/tlt_docker_files/mydata/tlt-tensorrt-nano/data/training"
  image_dir_name: "image_2"
  label_dir_name: "label_2"
  image_extension: ".png"
  partition_mode: "random"
  num_partitions: 2
  val_split: 14
  num_shards: 10
}
image_directory_path: "/workspace/tlt_docker_files/mydata/tlt-tensorrt-nano/data/training"

In [8]:
# Creating a new directory for the output tfrecords dump.
!mkdir -p $DATA_DOWNLOAD_DIR/tfrecords
#KITTI trainval
!tlt-dataset-convert -d $SPECS_DIR/ssd_tfrecords_kitti_trainval.txt \
                     -o $DATA_DOWNLOAD_DIR/tfrecords/kitti_trainval/kitti_trainval

Using TensorFlow backend.
2020-06-28 12:01:16,152 - iva.detectnet_v2.dataio.build_converter - INFO - Instantiating a kitti converter
2020-06-28 12:01:16,152 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Creating output directory /workspace/tlt_docker_files/mydata/tlt-tensorrt-nano/data/tfrecords/kitti_trainval
2020-06-28 12:01:16,174 - iva.detectnet_v2.dataio.kitti_converter_lib - INFO - Num images in
Train: 6434	Val: 1047
2020-06-28 12:01:16,174 - iva.detectnet_v2.dataio.kitti_converter_lib - INFO - Validation data in partition 0. Hence, while choosing the validationset during training choose validation_fold 0.
2020-06-28 12:01:16,176 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 0
2020-06-28 12:01:16,287 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 1
2020-06-28 12:01:16,390 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 2
2020-06-28 12:01:16,492 - iva.detectnet
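
The train/validation counts in the log follow from `val_split: 14` in the spec file. The arithmetic can be sketched as below, assuming the validation share is rounded down; that rounding is an assumption, chosen because it reproduces the logged numbers. The function name `split_counts` is hypothetical.

```python
def split_counts(num_images, val_split_percent):
    """Split num_images into (train, val), rounding the validation share down."""
    num_val = num_images * val_split_percent // 100
    return num_images - num_val, num_val


# 6434 + 1047 = 7481 labelled images, as reported in the converter log.
train, val = split_counts(6434 + 1047, 14)
print("Train: {}\tVal: {}".format(train, val))  # Train: 6434	Val: 1047
```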

In [9]:
!ls -rlt $DATA_DOWNLOAD_DIR/tfrecords/kitti_trainval

total 7136
-rw-r--r-- 1 root root 104563 Jun 28 12:01 kitti_trainval-fold-000-of-002-shard-00000-of-00010
-rw-r--r-- 1 root root 102704 Jun 28 12:01 kitti_trainval-fold-000-of-002-shard-00001-of-00010
-rw-r--r-- 1 root root 102153 Jun 28 12:01 kitti_trainval-fold-000-of-002-shard-00002-of-00010
-rw-r--r-- 1 root root  97636 Jun 28 12:01 kitti_trainval-fold-000-of-002-shard-00003-of-00010
-rw-r--r-- 1 root root  98167 Jun 28 12:01 kitti_trainval-fold-000-of-002-shard-00004-of-00010
-rw-r--r-- 1 root root 103687 Jun 28 12:01 kitti_trainval-fold-000-of-002-shard-00005-of-00010
-rw-r--r-- 1 root root 101947 Jun 28 12:01 kitti_trainval-fold-000-of-002-shard-00006-of-00010
-rw-r--r-- 1 root root 100557 Jun 28 12:01 kitti_trainval-fold-000-of-002-shard-00007-of-00010
-rw-r--r-- 1 root root  99253 Jun 28 12:01 kitti_trainval-fold-000-of-002-shard-00008-of-00010
-rw-r--r-- 1 root root 110021 Jun 28 12:01 kitti_trainval-fold-000-of-002-shard-00009-of-00010
-rw-r--r-- 1 root root 62662
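
With `num_partitions: 2` and `num_shards: 10`, the listing should contain 20 files. The sketch below generates the expected names so they can be compared with the directory contents. The naming pattern is inferred from the listing above rather than from TLT documentation, and `expected_shard_names` is a hypothetical helper.

```python
def expected_shard_names(prefix, num_partitions, num_shards):
    """Build the file names implied by the fold/shard pattern seen in the listing."""
    names = []
    for fold in range(num_partitions):
        for shard in range(num_shards):
            names.append("{}-fold-{:03d}-of-{:03d}-shard-{:05d}-of-{:05d}".format(
                prefix, fold, num_partitions, shard, num_shards))
    return names


names = expected_shard_names("kitti_trainval", num_partitions=2, num_shards=10)
print("{} files expected, first: {}".format(len(names), names[0]))
```

Comparing `set(names)` with `set(os.listdir(...))` for the tfrecords directory would reveal any missing shard.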

### 1.2 Download pre-trained model <a class="anchor" id="head-1-2"></a>

We will use the NGC CLI to get the pre-trained models. For more details, go to [ngc.nvidia.com](https://ngc.nvidia.com) and click SETUP in the navigation bar.

In [11]:
!ngc registry model list nvidia/tlt_pretrained_object_detection:*

+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| Versi | Accur | Epoch | Batch | GPU   | Memor | File  | Statu | Creat |
| on    | acy   | s     | Size  | Model | y Foo | Size  | s     | ed    |
|       |       |       |       |       | tprin |       |       | Date  |
|       |       |       |       |       | t     |       |       |       |
+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| resne | 74.38 | 80    | 1     | V100  | 38.3  | 38.31 | UPLOA | Apr   |
| t10   |       |       |       |       |       | MB    | D_COM | 29,   |
|       |       |       |       |       |       |       | PLETE | 2020  |
| resne | 76.74 | 80    | 1     | V100  | 89.0  | 88.96 | UPLOA | Apr   |
| t18   |       |       |       |       |       | MB    | D_COM | 29,   |
|       |       |       |       |       |       |       | PLETE | 2020  |
| resne | 77.04 | 80    | 1     | V100  | 170.7 | 170.6 | UPLOA | Apr   |
| t34   |       |       |

In [12]:
!mkdir -p $USER_EXPERIMENT_DIR/pretrained_resnet18/

In [13]:
# Pull pretrained model from NGC
!ngc registry model download-version nvidia/tlt_pretrained_object_detection:resnet18 --dest $USER_EXPERIMENT_DIR/pretrained_resnet18

Downloaded 82.38 MB in 1m 1s, Download speed: 1.35 MB/s                
----------------------------------------------------
Transfer id: tlt_pretrained_object_detection_vresnet18 Download status: Completed.
Downloaded local path: /workspace/tlt_docker_files/mydata/tlt-tensorrt-nano/pretrained_resnet18/tlt_pretrained_object_detection_vresnet18
Total files downloaded: 1 
Total downloaded size: 82.38 MB
Started at: 2020-06-28 12:03:25.873262
Completed at: 2020-06-28 12:04:26.957205
Duration taken: 1m 1s
----------------------------------------------------


In [15]:
print("Check that model is downloaded into dir.")
!ls -lh $USER_EXPERIMENT_DIR/pretrained_resnet18/tlt_pretrained_object_detection_vresnet18

Check that model is downloaded into dir.
total 89M
-rw------- 1 root root 89M Jun 28 12:04 resnet_18.hdf5


## 2. Provide training specification <a class="anchor" id="head-2"></a>
* TFRecords for the training dataset
    * To use the newly generated TFRecords, update the `dataset_config` parameter in the spec file at `$SPECS_DIR/ssd_train_resnet18_kitti.txt`
    * Update the fold number to use for evaluation. For a random data split, use fold 0 only
    * For a sequence-wise split, you may use any fold generated by the dataset conversion tool
* Pre-trained models
* Augmentation parameters for on-the-fly data augmentation
* Other training (hyper-)parameters such as batch size, number of epochs, and learning rate

In [17]:
!cat $SPECS_DIR/ssd_train_resnet18_kitti.txt

random_seed: 42
ssd_config {
  aspect_ratios_global: "[1.0, 2.0, 0.5, 3.0, 1.0/3.0]"
  scales: "[0.05, 0.1, 0.25, 0.4, 0.55, 0.7, 0.85]"
  two_boxes_for_ar1: true
  clip_boxes: false
  loss_loc_weight: 0.8
  focal_loss_alpha: 0.25
  focal_loss_gamma: 2.0
  variances: "[0.1, 0.1, 0.2, 0.2]"
  arch: "resnet"
  nlayers: 18
  freeze_bn: false
}
training_config {
  batch_size_per_gpu: 24
  num_epochs: 80
  learning_rate {
  soft_start_annealing_schedule {
    min_learning_rate: 5e-5
    max_learning_rate: 2e-2
    soft_start: 0.15
    annealing: 0.5
    }
  }
  regularizer {
    type: L1
    weight: 3e-06
  }
}
eval_config {
  validation_period_during_training: 10
  average_precision_mode: SAMPLE
  batch_size: 32
  matching_iou_threshold: 0.5
}
nms_config {
  confidence_threshold: 0.01
  clustering_iou_threshold: 0.6
  top_k: 200
}
augmentation_config {
  preprocessing {
    output_image_width: 1248
    output_image_height: 384
    output_image_c
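
The `soft_start_annealing_schedule` block in the spec sets the learning rate as a function of training progress (0.0 at the start, 1.0 at the end). The sketch below illustrates one plausible shape: a ramp from `min_learning_rate` to `max_learning_rate` until `soft_start`, a plateau until `annealing`, then a decay back to the minimum. The log-linear interpolation is an assumption made for illustration and may differ from what TLT does internally.

```python
import math


def soft_start_annealing_lr(progress, min_lr=5e-5, max_lr=2e-2,
                            soft_start=0.15, annealing=0.5):
    """Learning rate at a training progress in [0, 1] (assumed log-linear ramps)."""
    if progress < soft_start:
        # Warm-up: interpolate from min_lr to max_lr in log space.
        fraction = progress / soft_start
        return math.exp(math.log(min_lr) + fraction * (math.log(max_lr) - math.log(min_lr)))
    if progress < annealing or annealing >= 1.0:
        # Plateau at the maximum learning rate.
        return max_lr
    # Cool-down: interpolate from max_lr back to min_lr in log space.
    fraction = (progress - annealing) / (1.0 - annealing)
    return math.exp(math.log(max_lr) + fraction * (math.log(min_lr) - math.log(max_lr)))


num_epochs = 80
for epoch in (0, 6, 12, 40, 60, 80):
    lr = soft_start_annealing_lr(epoch / float(num_epochs))
    print("epoch {:2d}: lr = {:.6f}".format(epoch, lr))
```

With the values from the spec, the plateau would span epochs 12 to 40 of the 80-epoch run.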

## 3. Run TLT training <a class="anchor" id="head-3"></a>
* Provide the sample spec file and the output directory location for models
* WARNING: training may take several hours, up to a day, to complete

In [18]:
!mkdir -p $USER_EXPERIMENT_DIR/experiment_dir_unpruned

In [19]:
print("To run with multigpu, please change --gpus based on the number of available GPUs in your machine.")
!tlt-train ssd -e $SPECS_DIR/ssd_train_resnet18_kitti.txt \
               -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
               -k $KEY \
               -m $USER_EXPERIMENT_DIR/pretrained_resnet18/tlt_pretrained_object_detection_vresnet18/resnet_18.hdf5 \
               --gpus 1

To run with multigpu, please change --gpus based on the number of available GPUs in your machine.
Using TensorFlow backend.
2020-06-28 12:21:50,953 [INFO] /usr/local/lib/python2.7/dist-packages/iva/ssd/utils/spec_loader.pyc: Merging specification from /workspace/tlt_docker_files/mydata/tlt-tensorrt-nano/specs/ssd_train_resnet18_kitti.txt
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.

ssd_expand_block_1_conv_0 (Conv (24, 128, 24, 78)    32896       ssd_expand_block_0_relu_1[0][0]  
__________________________________________________________________________________________________
ssd_expand_block_1_relu_0 (ReLU (24, 128, 24, 78)    0           ssd_expand_block_1_conv_0[0][0]  
__________________________________________________________________________________________________
ssd_expand_block_1_conv_1 (Conv (24, 256, 12, 39)    294912      ssd_expand_block_1_relu_0[0][0]  
__________________________________________________________________________________________________
ssd_expand_block_1_bn_1 (BatchN (24, 256, 12, 39)    1024        ssd_expand_block_1_conv_1[0][0]  
__________________________________________________________________________________________________
ssd_expand_block_1_relu_1 (ReLU (24, 256, 12, 39)    0           ssd_expand_block_1_bn_1[0][0]    
__________________________________________________________________________________________________


Epoch 1/80

Epoch 00001: saving model to /workspace/tlt_docker_files/mydata/tlt-tensorrt-nano/experiment_dir_unpruned/weights/ssd_resnet18_epoch_001.tlt
Epoch 2/80

Epoch 00002: saving model to /workspace/tlt_docker_files/mydata/tlt-tensorrt-nano/experiment_dir_unpruned/weights/ssd_resnet18_epoch_002.tlt
Epoch 3/80

Epoch 00003: saving model to /workspace/tlt_docker_files/mydata/tlt-tensorrt-nano/experiment_dir_unpruned/weights/ssd_resnet18_epoch_003.tlt
Epoch 4/80
Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 8, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 37, in main
  File "./ssd/scripts/train.py", line 245, in main
  File "./ssd/scripts/train.py", line 182, in run_experiment
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1039, in fit
    validation_steps=validation_steps)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training_arrays.py", line 154, in fit_loop
    outs = f(ins)
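
The run above stopped during epoch 4, after the epoch 3 checkpoint had been saved. When resuming, `-m` should point to the last saved checkpoint and `--initial_epoch` to the following epoch. A minimal sketch for deriving both from the weights directory, assuming the `ssd_resnet18_epoch_NNN.tlt` naming seen in the log (`latest_checkpoint` is a hypothetical helper):

```python
import os
import re

# File name pattern taken from the "saving model to ..." lines in the training log.
CHECKPOINT_RE = re.compile(r"^ssd_resnet18_epoch_(\d{3})\.tlt$")


def latest_checkpoint(filenames):
    """Return (filename, initial_epoch) for the highest-numbered checkpoint, or (None, None)."""
    best = None
    for name in filenames:
        match = CHECKPOINT_RE.match(name)
        if match and (best is None or int(match.group(1)) > best[1]):
            best = (name, int(match.group(1)))
    if best is None:
        return None, None
    # Training resumes with the epoch after the last one that was saved.
    return best[0], best[1] + 1


weights_dir = os.path.join(os.environ.get("USER_EXPERIMENT_DIR", ""),
                           "experiment_dir_unpruned", "weights")
if os.path.isdir(weights_dir):
    name, initial_epoch = latest_checkpoint(os.listdir(weights_dir))
    print("Resume from {} with --initial_epoch {}".format(name, initial_epoch))
```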


In [20]:
print("To resume from a checkpoint, run this cell instead. Set -m to the latest checkpoint and --initial_epoch to the next epoch.")
!tlt-train ssd -e $SPECS_DIR/ssd_train_resnet18_kitti.txt \
               -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
               -k $KEY \
               -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/ssd_resnet18_epoch_003.tlt \
               --gpus 1 \
               --initial_epoch 4 

To resume from a checkpoint, run this cell instead. Set -m to the latest checkpoint and --initial_epoch to the next epoch.
Using TensorFlow backend.
2020-06-28 12:42:23,114 [INFO] /usr/local/lib/python2.7/dist-packages/iva/ssd/utils/spec_loader.pyc: Merging specification from /workspace/tlt_docker_files/mydata/tlt-tensorrt-nano/specs/ssd_train_resnet18_kitti.txt
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.

Epoch 4/80

Epoch 00004: saving model to /workspace/tlt_docker_files/mydata/tlt-tensorrt-nano/experiment_dir_unpruned/weights/ssd_resnet18_epoch_004.tlt
Epoch 5/80
 53/269 [====>.........................] - ETA: 3:03 - loss: 3.1524^C
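
The 269 steps per epoch shown in the progress bar match the 6434 training images divided by `batch_size_per_gpu: 24`, rounded up. A short sketch of that arithmetic follows; scaling the global batch by the number of GPUs is an assumption, since this notebook only runs with `--gpus 1`.

```python
def steps_per_epoch(num_train_images, batch_size_per_gpu, num_gpus=1):
    """Number of batches needed to see every training image once (ceiling division)."""
    global_batch = batch_size_per_gpu * num_gpus
    return (num_train_images + global_batch - 1) // global_batch


print(steps_per_epoch(6434, 24))  # 269
```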


In [None]:
print('Model for each epoch:')
print('---------------------')
!ls -ltrh $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights

In [None]:
# Now check the evaluation stats in the csv file and pick the model with highest eval accuracy.
# Note csv epoch number is 1 less than model file epoch. For example, epoch 79 in csv corresponds to _080.tlt
!cat $USER_EXPERIMENT_DIR/experiment_dir_unpruned/ssd_training_log_resnet18.csv
%set_env EPOCH=080
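
Instead of reading the CSV by eye, the best epoch can be picked programmatically. The sketch below is only an illustration: the column names `epoch` and `mAP` are assumptions about the log layout (check the header of your CSV and adjust), and the sample rows are made up. It applies the off-by-one rule from the comment above, so CSV epoch 79 maps to `080`.

```python
import csv
import io


def best_checkpoint_epoch(csv_text, epoch_col="epoch", metric_col="mAP"):
    """Return the zero-padded checkpoint epoch with the highest metric, or None."""
    best_epoch, best_value = None, None
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            value = float((row.get(metric_col) or "").strip())
        except ValueError:
            continue  # epochs without validation have no metric value
        if value != value:
            continue  # skip NaN entries
        if best_value is None or value > best_value:
            best_epoch, best_value = int(row[epoch_col]), value
    if best_epoch is None:
        return None
    # The CSV epoch number is 1 less than the epoch in the model file name.
    return "{:03d}".format(best_epoch + 1)


# Made-up sample rows, for illustration only.
sample = u"epoch,loss,mAP\n9,3.1,0.42\n10,3.0,\n19,2.5,0.61\n29,2.4,0.58\n"
print(best_checkpoint_epoch(sample))
```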

## 4. Evaluate trained models <a class="anchor" id="head-4"></a>

In [None]:
!tlt-evaluate ssd -e $SPECS_DIR/ssd_train_resnet18_kitti.txt \
                  -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/ssd_resnet18_epoch_$EPOCH.tlt \
                  -k $KEY

## 5. Prune trained models <a class="anchor" id="head-5"></a>
* Specify pre-trained model
* Equalization criterion (applies only to ResNets, which have element-wise operations, and MobileNets)
* Threshold for pruning.
* A key to save and load the model
* Output directory to store the model

Usually, you only need to adjust `-pth` (threshold) to trade off accuracy against model size. A higher `pth` gives a smaller model (and thus faster inference) but lower accuracy. The right threshold depends on the dataset and the model; the `0.5` in the block below is only a starting point. If the retrained accuracy is good, you can increase this value to get a smaller model. Otherwise, lower it to get better accuracy.
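
To build intuition for why a higher threshold removes more of the model, the following conceptual sketch keeps only the filters whose L2 norm, normalized by the largest norm in the layer, reaches the threshold. This is a simplified illustration of magnitude-based pruning under stated assumptions, not the `tlt-prune` implementation, and the weights are made up.

```python
import math


def filters_to_keep(filter_weights, threshold):
    """Indices of filters whose max-normalized L2 norm is at least the threshold."""
    norms = [math.sqrt(sum(w * w for w in weights)) for weights in filter_weights]
    max_norm = max(norms)
    if max_norm == 0.0:
        return []
    return [i for i, norm in enumerate(norms) if norm / max_norm >= threshold]


# Four made-up filters with normalized norms of roughly 1.0, 0.1, 0.6 and 0.2.
layer = [[3.0, 4.0], [0.3, 0.4], [1.8, 2.4], [0.6, 0.8]]
for pth in (0.15, 0.5):
    print("pth={}: keep filters {}".format(pth, filters_to_keep(layer, pth)))
```

In this toy layer, raising the threshold from 0.15 to 0.5 drops one more filter, which mirrors the size versus accuracy trade-off described above.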

In [None]:
!mkdir -p $USER_EXPERIMENT_DIR/experiment_dir_pruned

In [None]:
!tlt-prune -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/ssd_resnet18_epoch_$EPOCH.tlt \
           -o $USER_EXPERIMENT_DIR/experiment_dir_pruned/ssd_resnet18_pruned.tlt \
           -eq intersection \
           -pth 0.5 \
           -k $KEY

In [None]:
!ls -rlt $USER_EXPERIMENT_DIR/experiment_dir_pruned/

## 6. Retrain pruned models <a class="anchor" id="head-6"></a>
* Model needs to be re-trained to bring back accuracy after pruning
* Specify re-training specification
* WARNING: training may take several hours, up to a day, to complete

In [None]:
# Printing the retrain spec file. 
# Here we have updated the spec file to use the newly pruned model as the pretrained weights.
!cat $SPECS_DIR/ssd_retrain_resnet18_kitti.txt

In [None]:
!mkdir -p $USER_EXPERIMENT_DIR/experiment_dir_retrain

In [None]:
# Retraining using the pruned model as pretrained weights 
!tlt-train ssd --gpus 1 \
               -e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt \
               -r $USER_EXPERIMENT_DIR/experiment_dir_retrain \
               -m $USER_EXPERIMENT_DIR/experiment_dir_pruned/ssd_resnet18_pruned.tlt \
               -k $KEY

In [None]:
# Listing the newly retrained model.
!ls -rlt $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights

In [None]:
# Now check the evaluation stats in the csv file and pick the model with highest eval accuracy.
# Note csv epoch number is 1 less than model file epoch. For example, epoch 79 in csv corresponds to _080.tlt
!cat $USER_EXPERIMENT_DIR/experiment_dir_retrain/ssd_training_log_resnet18.csv
%set_env EPOCH=100

## 7. Evaluate retrained model <a class="anchor" id="head-7"></a>

In [None]:
!tlt-evaluate ssd -e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt \
                  -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/ssd_resnet18_epoch_$EPOCH.tlt \
                  -k $KEY

## 8. Visualize inferences <a class="anchor" id="head-8"></a>
In this section, we run the tlt-infer tool to generate inferences on the trained models and visualize the results.

In [None]:
# Running inference for detection on n images
!tlt-infer ssd -i $DATA_DOWNLOAD_DIR/testing/image_2 \
               -o $USER_EXPERIMENT_DIR/ssd_infer_images \
               -e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt \
               -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/ssd_resnet18_epoch_$EPOCH.tlt \
               -l $USER_EXPERIMENT_DIR/ssd_infer_labels \
               -k $KEY

The `tlt-infer` tool produces two outputs:
1. Images with overlaid detections in `$USER_EXPERIMENT_DIR/ssd_infer_images`
2. Frame-by-frame bbox labels in KITTI format in `$USER_EXPERIMENT_DIR/ssd_infer_labels`
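
The label files can be post-processed with a few lines of Python. The sketch below parses one line in the standard KITTI object format (class name, truncation, occlusion, alpha, four bounding-box values, then 3D dimensions, location and rotation). Whether `tlt-infer` appends a confidence score as a sixteenth field is an assumption, so the parser treats it as optional. `parse_kitti_line` is a hypothetical helper and the sample line is made up.

```python
def parse_kitti_line(line):
    """Parse one KITTI label line into class name, 2D box and optional score."""
    fields = line.split()
    if len(fields) < 15:
        raise ValueError("expected at least 15 fields, got {}".format(len(fields)))
    return {
        "class": fields[0],
        # Bounding box in pixels: left, top, right, bottom.
        "bbox": tuple(float(value) for value in fields[4:8]),
        # Detection results may carry a confidence score as a 16th field.
        "score": float(fields[15]) if len(fields) > 15 else None,
    }


# Made-up example line, for illustration only.
example = "Car 0.00 0 0.00 100.0 120.5 200.0 180.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.87"
print(parse_kitti_line(example))
```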

In [None]:
# Simple grid visualizer
import matplotlib.pyplot as plt
import os
from math import ceil
valid_image_ext = ['.jpg', '.png', '.jpeg', '.ppm']

def visualize_images(image_dir, num_cols=4, num_images=10):
    output_path = os.path.join(os.environ['USER_EXPERIMENT_DIR'], image_dir)
    num_rows = int(ceil(float(num_images) / float(num_cols)))
    # squeeze=False keeps axarr two-dimensional even for a single row or column.
    f, axarr = plt.subplots(num_rows, num_cols, figsize=[80,30], squeeze=False)
    f.tight_layout()
    a = [os.path.join(output_path, image) for image in os.listdir(output_path) 
         if os.path.splitext(image)[1].lower() in valid_image_ext]
    for idx, img_path in enumerate(a[:num_images]):
        col_id = idx % num_cols
        # Integer division so the row index is an int on Python 3 as well.
        row_id = idx // num_cols
        img = plt.imread(img_path)
        axarr[row_id, col_id].imshow(img) 

In [None]:
# Visualizing the sample images.
OUTPUT_PATH = 'ssd_infer_images' # relative path from $USER_EXPERIMENT_DIR.
COLS = 3 # number of columns in the visualizer grid.
IMAGES = 9 # number of images to visualize.

visualize_images(OUTPUT_PATH, num_cols=COLS, num_images=IMAGES)

## 9. Deploy! <a class="anchor" id="head-9"></a>

In [None]:
!mkdir -p $USER_EXPERIMENT_DIR/export
# Export in FP32 mode. Change --data_type to fp16 for FP16 mode
!tlt-export ssd -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/ssd_resnet18_epoch_$EPOCH.tlt \
                -k $KEY \
                -o $USER_EXPERIMENT_DIR/export/ssd_resnet18_epoch_$EPOCH.etlt \
                -e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt \
                --batch_size 1 \
                --data_type fp32

# Uncomment to export in INT8 mode (generate calibration cache file).
# !tlt-export ssd -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/ssd_resnet18_epoch_$EPOCH.tlt  \
#                 -o $USER_EXPERIMENT_DIR/export/ssd_resnet18_epoch_$EPOCH.etlt \
#                 -e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt \
#                 -k $KEY \
#                 --cal_image_dir  $USER_EXPERIMENT_DIR/data/testing/image_2 \
#                 --data_type int8 \
#                 --batch_size 1 \
#                 --batches 10 \
#                 --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin  \
#                 --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile

`Note:` In this example, for ease of execution we restrict the number of calibrating batches to 10. TLT recommends the use of at least 10% of the training dataset for int8 calibration.
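
To see how far 10 batches are from the recommendation, the sketch below computes the number of calibration batches needed to cover 10% of the 6434 training images reported by the dataset converter. It assumes that calibration consumes `--batch_size` images per batch for `--batches` batches; `min_calibration_batches` is a hypothetical helper.

```python
def min_calibration_batches(num_train_images, batch_size, percent=10):
    """Batches needed to cover at least `percent` of the training images (rounded up)."""
    min_images = (num_train_images * percent + 99) // 100
    return (min_images + batch_size - 1) // batch_size


print(min_calibration_batches(6434, batch_size=1))  # 644
print(min_calibration_batches(6434, batch_size=8))  # 81
```

Under these assumptions, `--batch_size 1` would need `--batches 644` rather than 10 to meet the recommendation.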

In [None]:
print('Exported model:')
print('------------')
!ls -lh $USER_EXPERIMENT_DIR/export

Verify engine generation using the `tlt-converter` utility included with the docker.

The `tlt-converter` produces optimized TensorRT engines for the platform it runs on. Therefore, to get maximum performance, instantiate this docker on your target device and run the `tlt-converter` command there, with the exported `.etlt` file and the calibration cache (for int8 mode). The converter utility included in this docker only works on x86 devices with discrete NVIDIA GPUs.

For Jetson devices, download the Jetson converter from the dev zone link [here](https://developer.nvidia.com/tlt-converter).

If you choose to integrate your model into DeepStream directly, copy the exported `.etlt` file along with the calibration cache to the target device and update the spec file that configures the `gst-nvinfer` element to point to the newly exported model. This file is usually called `config_infer_primary.txt` for detection models and `config_infer_secondary_*.txt` for classification models.

In [None]:
# Convert to TensorRT engine (FP16)
!tlt-converter -k $KEY \
               -d 3,384,1248 \
               -o NMS \
               -e $USER_EXPERIMENT_DIR/export/trt.engine \
               -m 1 \
               -t fp16 \
               -i nchw \
               $USER_EXPERIMENT_DIR/export/ssd_resnet18_epoch_$EPOCH.etlt

# Uncomment to convert to TensorRT engine (INT8).
# !tlt-converter -k $KEY  \
#                -d 3,384,1248 \
#                -o NMS \
#                -c $USER_EXPERIMENT_DIR/export/cal.bin \
#                -e $USER_EXPERIMENT_DIR/export/trt.engine \
#                -b 8 \
#                -m 1 \
#                -t int8 \
#                -i nchw \
#                $USER_EXPERIMENT_DIR/export/ssd_resnet18_epoch_$EPOCH.etlt

In [None]:
print('Exported engine:')
print('------------')
!ls -lh $USER_EXPERIMENT_DIR/export/trt.engine

## 10. Verify the deployed model <a class="anchor" id="head-10"></a>
Verify the converted engine by visualizing TensorRT inferences.

In [None]:
# Infer using TensorRT engine
# Note that tlt-infer currently only supports TensorRT engines with batch of 1. 
# Please make sure to use `-m 1` in tlt-converter and `--batch_size 1` in tlt-export

# When integrating with DeepStream (DS), feel free to use any batch size that fits on the GPU.
# Once created, the engine batch size cannot be altered. To run with a different batch size,
# re-run tlt-converter with the new batch size for DS.

!tlt-infer ssd --trt -p $USER_EXPERIMENT_DIR/export/trt.engine \
                     -e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt \
                     -i $DATA_DOWNLOAD_DIR/testing/image_2 \
                     -o $USER_EXPERIMENT_DIR/ssd_infer_images \
                     -t 0.4