# TLT Object Detection Example Use Case

#### This notebook shows an example use case of object detection using the Transfer Learning Toolkit (TLT). **_It is not optimized for accuracy._**

0. [Set up env variables](#head-0)
1. [Prepare dataset and pre-trained model](#head-1)<br>
    1.1 [Convert to kitti format](#head-1-1)<br>
    1.2 [Prepare tf records from kitti format dataset](#head-1-2)<br>
    1.3 [Download pre-trained model](#head-1-3)<br>
2. [Provide training specification](#head-2)
3. [Run TLT training](#head-3)
4. [Evaluate trained models](#head-4)
5. [Prune trained models](#head-5)
6. [Retrain pruned models](#head-6)
7. [Evaluate retrained model](#head-7)
8. [Test models](#head-8)
9. [Visualize inferences](#head-9)
10. [Deploy](#head-10)

## 0. Set up env variables <a class="anchor" id="head-0"></a>

Please replace **$API_KEY** with your API key from **ngc.nvidia.com**.

In [2]:
# Setting up env variables for cleaner command line commands.
print("Please replace the variable with your api key.")
%env API_KEY=aTk1bTNzdm9hamg5ZzZmaWM5aW1qczJ1Nmk6YWY1MDU3MDItNTU3NS00MzVmLWFjZWEtMTMxNWQ4NWIyMmRk
%env USER_EXPERIMENT_DIR=/data/tlt/workspace/
%env DATA_DOWNLOAD_DIR=/data/tlt/workspace/data
%env SPECS_DIR=/data/tlt/workspace/examples/specs

Please replace the variable with your api key.
env: API_KEY=aTk1bTNzdm9hamg5ZzZmaWM5aW1qczJ1Nmk6YWY1MDU3MDItNTU3NS00MzVmLWFjZWEtMTMxNWQ4NWIyMmRk
env: USER_EXPERIMENT_DIR=/data/tlt/workspace/
env: DATA_DOWNLOAD_DIR=/data/tlt/workspace/data
env: SPECS_DIR=/data/tlt/workspace/examples/specs
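
Before running the later cells, it can help to confirm that every variable is actually set. The following is a minimal sketch; the helper name `missing_env_vars` is hypothetical and not part of TLT.

```python
import os

# The four variables set in the cell above.
REQUIRED_VARS = ["API_KEY", "USER_EXPERIMENT_DIR", "DATA_DOWNLOAD_DIR", "SPECS_DIR"]

def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]

missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Unset variables: " + ", ".join(missing))
else:
    print("All required variables are set.")
```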


## 1. Prepare dataset and pre-trained model <a class="anchor" id="head-1"></a>

We will use the Pascal VOC dataset for this tutorial. For more details, please visit 
http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html#devkit. Please download the dataset from http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar to `$DATA_DOWNLOAD_DIR`.

### 1.1 Convert to kitti format <a class="anchor" id="head-1-1"></a>

In [7]:
from voc_utils import convert_to_kitti
import os

image_width = 496
image_height = 320

DATA_DIR = os.environ['DATA_DOWNLOAD_DIR']
voc_root = DATA_DIR
convert_to_kitti(voc_root, image_height, image_width)

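
The conversion above takes a target `image_height` and `image_width`. Assuming it resizes each image to that size, the bounding boxes have to be rescaled by the same factors. The sketch below illustrates that arithmetic only; `scale_bbox` is a hypothetical helper and not the `voc_utils` implementation, and the example numbers are made up.

```python
def scale_bbox(bbox, src_size, dst_size):
    """Rescale (xmin, ymin, xmax, ymax) from src_size to dst_size, both given as (width, height)."""
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    sx = dst_w / float(src_w)  # horizontal scale factor
    sy = dst_h / float(src_h)  # vertical scale factor
    xmin, ymin, xmax, ymax = bbox
    return (xmin * sx, ymin * sy, xmax * sx, ymax * sy)

# Made-up example: a box in a 500x375 image, resized to 496x320.
print(scale_bbox((100, 75, 250, 300), (500, 375), (496, 320)))
```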

In [None]:
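# KITTI annotations are read from VOCdevkit/VOC2012, matching the paths used in the cells below.
kitti_root = os.path.join(voc_root, 'VOCdevkit', 'VOC2012', 'Annotations_kitti')
print(len(os.listdir(os.path.join(kitti_root, 'test'))))
print(len(os.listdir(os.path.join(kitti_root, 'trainval'))))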

In [None]:
!cat $DATA_DOWNLOAD_DIR/VOCdevkit/VOC2012/Annotations_kitti/trainval/2010_001110.txt
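
In the standard KITTI label layout, each line describes one object with 15 space-separated fields: the class name first, and the 2D box (left, top, right, bottom) in fields 5 to 8. The sketch below reads one such line. The `KittiBox` type and `parse_kitti_line` helper are hypothetical, and the sample line is made up rather than taken from the dataset.

```python
from collections import namedtuple

KittiBox = namedtuple("KittiBox", ["label", "xmin", "ymin", "xmax", "ymax"])

def parse_kitti_line(line):
    """Extract the class name and 2D box from one KITTI label line."""
    fields = line.split()
    if len(fields) < 15:
        raise ValueError("expected at least 15 fields, got %d" % len(fields))
    # Fields 4..7 (zero-based) hold left, top, right, bottom in pixels.
    xmin, ymin, xmax, ymax = (float(v) for v in fields[4:8])
    return KittiBox(fields[0], xmin, ymin, xmax, ymax)

# Made-up sample line for illustration.
sample = "dog 0.00 0 0.00 48.00 60.50 210.00 300.25 0.00 0.00 0.00 0.00 0.00 0.00 0.00"
print(parse_kitti_line(sample))
```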

### 1.2 Prepare tf records from kitti format dataset <a class="anchor" id="head-1-2"></a>

* Update the tfrecords spec file to point to your KITTI-format dataset.
* Create the tfrecords using `tlt-dataset-convert`.
    * Note: The output directory must be created beforehand so that the tfrecords can be written to it.
* TFRecords only need to be generated once.

In [None]:
print("TFrecords conversion spec file for kitti training")
!cat $SPECS_DIR/det_tfrecords_pascal_voc_trainval.txt

In [None]:
# Creating a new directory for the output tfrecords dump.
!mkdir -p $USER_EXPERIMENT_DIR/tfrecords/pascal_voc
!tlt-dataset-convert -d $SPECS_DIR/det_tfrecords_pascal_voc_trainval.txt \
                     -o $USER_EXPERIMENT_DIR/tfrecords/pascal_voc/pascal_voc
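
The same directory preparation can be done from Python if you prefer to avoid shell commands. This is a sketch only; `tfrecords_prefix` is a hypothetical helper that mirrors the `mkdir -p` call and the `-o` prefix used above, and the demo writes to a temporary directory rather than `$USER_EXPERIMENT_DIR`.

```python
import os
import tempfile

def tfrecords_prefix(experiment_dir, dataset_name):
    """Create <experiment_dir>/tfrecords/<dataset_name>/ if needed and return the output prefix."""
    out_dir = os.path.join(experiment_dir, "tfrecords", dataset_name)
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    # The prefix repeats the dataset name, as in .../tfrecords/pascal_voc/pascal_voc.
    return os.path.join(out_dir, dataset_name)

# Demo in a throwaway directory; in the notebook you would pass os.environ['USER_EXPERIMENT_DIR'].
demo_root = tempfile.mkdtemp()
print(tfrecords_prefix(demo_root, "pascal_voc"))
```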

In [None]:
!ls -rlt $USER_EXPERIMENT_DIR/tfrecords/pascal_voc/

### 1.3 Download pre-trained model <a class="anchor" id="head-1-3"></a>

Print the list of models. Find your **ORG** and **TEAM** on **ngc.nvidia.com** and replace the values of the **-o** and **-t** arguments accordingly.

In [None]:
!tlt-pull -k $API_KEY -lm -o nvtltea -t iva

In [None]:
!mkdir -p $USER_EXPERIMENT_DIR/pretrained_resnet18/

Download the resnet18 object detection model.

In [None]:
# Pull pretrained model from NGC
!tlt-pull -k $API_KEY -m tlt_iva_object_detection_resnet18 -v 1 -d $USER_EXPERIMENT_DIR/pretrained_resnet18/ -o nvtltea -t iva

In [None]:
!ls -rlt $USER_EXPERIMENT_DIR/pretrained_resnet18

## 2. Provide training specification <a class="anchor" id="head-2"></a>
* Tfrecords for the train datasets
    * In order to use the newly generated tfrecords, update the dataset_config parameter in the spec file at `$SPECS_DIR/det_train_resnet18_pascal_voc.txt`
    * Update the fold number to use for evaluation. For a random data split, use fold 0 only
    * For a sequence-wise split, you may use any fold generated by the dataset convert tool
* Pre-trained models
* Augmentation parameters for on-the-fly data augmentation
* Other training (hyper-)parameters such as batch size, number of epochs, and learning rate

In [None]:
!cat $SPECS_DIR/det_train_resnet18_pascal_voc.txt

## 3. Run TLT training <a class="anchor" id="head-3"></a>
* Provide the sample spec file and the output directory location for models

#### The training can take several hours to complete depending on your GPU.

In [None]:
!tlt-train detection -e $SPECS_DIR/det_train_resnet18_pascal_voc.txt \
                     -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
                     -k $API_KEY \
                     -n resnet18_detector

In [None]:
print("For multi-GPU, please uncomment and run this instead. Change --gpus based on your machine.")
# !tlt-train detection -e $SPECS_DIR/det_train_resnet18_pascal_voc.txt \
#                      -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned_mgpu_8 \
#                      -k $API_KEY \
#                      -n resnet18_detector \
#                      --gpus 2

In [None]:
print('Model for each epoch:')
print('---------------------')
!ls -lh $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights

## 4. Evaluate trained models <a class="anchor" id="head-4"></a>

In [None]:
!tlt-evaluate detection -e $SPECS_DIR/det_train_resnet18_pascal_voc.txt \
                        -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet18_detector.tlt \
                        -k $API_KEY

## 5. Prune trained models <a class="anchor" id="head-5"></a>
* Specify pre-trained model
* Equalization criterion (only for ResNets, as they have element-wise operations)
* Threshold for pruning
* API key to save and load the model
* Output directory to store the model
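
To give a feel for how the threshold and the equalization criterion interact, here is a toy sketch of norm-based channel pruning. It is not the `tlt-prune` implementation. It assumes that a channel is a pruning candidate when its L2 norm, divided by the largest norm in the layer, falls below the threshold, and that `intersection` means a channel is removed only if every layer feeding an element-wise operation marks it as a candidate. All function names and weights are made up.

```python
import math

def channel_norms(filters):
    """L2 norm of each output channel; `filters` is a list of flat weight lists, one per channel."""
    return [math.sqrt(sum(w * w for w in channel)) for channel in filters]

def prunable(filters, threshold):
    """Indices of channels whose max-normalized norm falls below `threshold`."""
    norms = channel_norms(filters)
    largest = max(norms)
    if largest == 0.0:
        return set(range(len(norms)))
    return {i for i, n in enumerate(norms) if n / largest < threshold}

def prunable_intersection(layers, threshold):
    """Channels marked prunable in every layer feeding the same element-wise op."""
    sets = [prunable(filters, threshold) for filters in layers]
    if not sets:
        return set()
    return set.intersection(*sets)

# Two made-up layers with three channels each, joined by an element-wise add.
layer_a = [[3.0, 4.0], [0.3, 0.4], [0.0, 1.0]]
layer_b = [[1.0, 0.0], [0.1, 0.0], [0.0, 2.0]]
print(prunable_intersection([layer_a, layer_b], 0.5))
```

In this toy example, channel 2 is weak in `layer_a` but strong in `layer_b`, so it is kept under the intersection rule.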

In [None]:
!tlt-prune -pm $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet18_detector.tlt \
           -o $USER_EXPERIMENT_DIR/experiment_dir_pruned/ \
           -eq intersection \
           -pth 0.94 \
           -k $API_KEY

In [None]:
!ls -rlt $USER_EXPERIMENT_DIR/experiment_dir_pruned/

## 6. Retrain pruned models <a class="anchor" id="head-6"></a>
* Model needs to be re-trained to bring back accuracy after pruning
* Specify re-training specification

In [None]:
# Printing the retrain spec file.
# Here we have updated the spec file to use the newly pruned model as the pretrained weights.
!cat $SPECS_DIR/det_retrain_resnet18_pascal_voc.txt

In [None]:
# Retraining using the pruned model as pretrained weights 
!tlt-train detection -e $SPECS_DIR/det_retrain_resnet18_pascal_voc.txt \
                     -r $USER_EXPERIMENT_DIR/experiment_dir_retrain \
                     -k $API_KEY \
                     -n resnet18_detector_pruned

In [None]:
# Listing the newly retrained model.
!ls -rlt $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights

## 7. Evaluate retrained model <a class="anchor" id="head-7"></a>

In [None]:
!tlt-evaluate detection -e $SPECS_DIR/det_retrain_resnet18_pascal_voc.txt \
                        -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
                        -k $API_KEY

## 8. Testing a model <a class="anchor" id="head-8"></a>
In order to use the model with a test dataset, we can use the evaluate tool with tfrecords generated from the test dataset. The steps are similar to those used for training.
* Create `tlt-evaluate`-ingestible tfrecords using `tlt-dataset-convert` with the `det_tfrecords_pascal_voc_test.txt` spec file
* Run `tlt-evaluate` with the `det_test_resnet18_pascal_voc.txt` spec file (both spec files are read from `$SPECS_DIR` in the cells below)

In [None]:
!mkdir -p $USER_EXPERIMENT_DIR/tfrecords/pascal_voc_test
!tlt-dataset-convert -d $SPECS_DIR/det_tfrecords_pascal_voc_test.txt \
                     -o $USER_EXPERIMENT_DIR/tfrecords/pascal_voc_test/pascal_voc_test

In [None]:
!tlt-evaluate detection -e $SPECS_DIR/det_test_resnet18_pascal_voc.txt \
                        -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
                        -k $API_KEY

## 9. Visualize inferences <a class="anchor" id="head-9"></a>
In this section, we run the `tlt-infer` tool to generate inferences with the trained model. Since the models in this example notebook are trained for only 1 epoch, you may instead run inference using the pretrained model uploaded to NGC.

In [None]:
# Running inference for detection on n images
!tlt-infer detection -i $USER_EXPERIMENT_DIR/data/VOCdevkit/VOC2012/JPEGImages_kitti/test \
                     -o $USER_EXPERIMENT_DIR/tlt_infer_testing \
                     -ek $API_KEY \
                     -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
                     -cp $SPECS_DIR/det_clusterfile_pascal_voc.json \
                     -k -bo -lw 3 \
                     -g 0 \
                     -bs 64

The `tlt-infer` tool produces two outputs.
1. Images with overlaid bounding boxes in `$USER_EXPERIMENT_DIR/tlt_infer_testing/images_annotated`
2. Frame-by-frame bbox labels in KITTI format in `$USER_EXPERIMENT_DIR/tlt_infer_testing/labels`

*Note: To run inference on a single image, simply replace the path passed to the `-i` flag in the `tlt-infer` command with the path to the image.*
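
Since the labels are written in KITTI format, the class name is the first field of each line, which makes it easy to summarize what was detected. The sketch below counts detections per class; `count_detections` is a hypothetical helper, and the demo is skipped if the labels directory does not exist yet.

```python
import os
from collections import Counter

def count_detections(label_dir):
    """Count detected objects per class across all .txt label files in label_dir."""
    counts = Counter()
    for name in sorted(os.listdir(label_dir)):
        if not name.endswith(".txt"):
            continue
        with open(os.path.join(label_dir, name)) as handle:
            for line in handle:
                fields = line.split()
                if fields:  # skip blank lines
                    counts[fields[0]] += 1
    return counts

label_dir = os.path.join(os.environ.get("USER_EXPERIMENT_DIR", ""), "tlt_infer_testing", "labels")
if os.path.isdir(label_dir):
    print(count_detections(label_dir).most_common())
```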

In [None]:
# Simple grid visualizer
import matplotlib.pyplot as plt
import os
from math import ceil
valid_image_ext = ['.jpg', '.png', '.jpeg', '.ppm']

def visualize_images(image_dir, num_cols=4, num_images=10):
    """Show up to num_images images from image_dir (relative to $USER_EXPERIMENT_DIR) in a grid."""
    output_path = os.path.join(os.environ['USER_EXPERIMENT_DIR'], image_dir)
    num_rows = int(ceil(float(num_images) / float(num_cols)))
    # squeeze=False keeps axarr two-dimensional even when the grid has a single row or column.
    f, axarr = plt.subplots(num_rows, num_cols, figsize=[80,30], squeeze=False)
    f.tight_layout()
    a = [os.path.join(output_path, image) for image in os.listdir(output_path)
         if os.path.splitext(image)[1].lower() in valid_image_ext]
    for idx, img_path in enumerate(a[:num_images]):
        col_id = idx % num_cols
        row_id = idx // num_cols  # integer division so the row index stays an int on Python 3
        img = plt.imread(img_path)
        axarr[row_id, col_id].imshow(img)

In [None]:
# Visualizing the first 12 images.
OUTPUT_PATH = 'tlt_infer_testing/images_annotated' # relative path from $USER_EXPERIMENT_DIR.
COLS = 4 # number of columns in the visualizer grid.
IMAGES = 12 # number of images to visualize.

visualize_images(OUTPUT_PATH, num_cols=COLS, num_images=IMAGES)

## 10. Deploy! <a class="anchor" id="head-10"></a>

In [None]:
!tlt-export $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
            -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
            --outputs output_cov/Sigmoid,output_bbox/BiasAdd \
            --enc_key $API_KEY \
            --input_dims 3,320,496 \
            --max_workspace_size 1100000
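
The `--input_dims 3,320,496` value above lines up with the `image_height = 320` and `image_width = 496` used during dataset conversion, which suggests a channels,height,width order. The sketch below builds the string from those values so the two stay in sync; `input_dims` is a hypothetical helper.

```python
def input_dims(channels, height, width):
    """Format dimensions in channels,height,width order, matching the --input_dims value above."""
    return ",".join(str(int(v)) for v in (channels, height, width))

# Same values as in the dataset conversion step.
image_width = 496
image_height = 320
print(input_dims(3, image_height, image_width))  # prints 3,320,496
```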

In [None]:
print('Exported model:')
print('------------')
!ls -lh $USER_EXPERIMENT_DIR/experiment_dir_final