# TLT Classification example use case

#### This notebook shows an example use case for classification using the Transfer Learning Toolkit. **_It is not optimized for accuracy._**

0. [Set up env variables](#head-0)
1. [Prepare dataset and pretrained model](#head-1)<br>
    1.1 [Split the dataset into train/val/test](#head-1-1)<br>
    1.2 [Download pre-trained model](#head-1-2)<br>
2. [Provide training specification](#head-2)
3. [Run TLT training](#head-3)
4. [Evaluate trained models](#head-4)
5. [Prune trained models](#head-5)
6. [Retrain pruned models](#head-6)
7. [Testing the model](#head-7)
8. [Visualize inferences](#head-8)
9. [Export and Deploy!](#head-9)

## 0. Set up env variables <a class="anchor" id="head-0"></a>

Replace **$API_KEY** below with your API key from **ngc.nvidia.com**.

In [None]:
%env USER_EXPERIMENT_DIR=/workspace/tlt-experiments
%env DATA_DOWNLOAD_DIR=/workspace/tlt-experiments/data
%env SPECS_DIR=/workspace/examples/specs
%env API_KEY=$API_KEY

## 1. Prepare datasets and pre-trained model <a class="anchor" id="head-1"></a>

This tutorial uses the Pascal VOC 2012 dataset. For more details, visit
http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html#devkit. Download the dataset from http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar into $DATA_DOWNLOAD_DIR.

In [None]:
# Check that file is present
import os
DATA_DIR = os.environ.get('DATA_DOWNLOAD_DIR')
if not os.path.isfile(os.path.join(DATA_DIR, 'VOCtrainval_11-May-2012.tar')):
    print('tar file for dataset not found. Please download.')
else:
    print('Found dataset.')

In [None]:
# unpack 
!tar -xvf $DATA_DOWNLOAD_DIR/VOCtrainval_11-May-2012.tar -C $DATA_DOWNLOAD_DIR 

In [None]:
# verify
!ls $DATA_DOWNLOAD_DIR/VOCdevkit/VOC2012

### 1.1 Split the dataset into train/val/test <a class="anchor" id="head-1-1"></a>

The next two cells convert the Pascal VOC dataset into a classification layout (one folder per class) and then split each class into train/val/test sets.

In [None]:
from os.path import join as join_path
import os
import glob
import re
import shutil

DATA_DIR=os.environ.get('DATA_DOWNLOAD_DIR')
source_dir = join_path(DATA_DIR, "VOCdevkit/VOC2012")
target_dir = join_path(DATA_DIR, "formatted")


suffix = '_trainval.txt'
classes_dir = join_path(source_dir, "ImageSets", "Main")
images_dir = join_path(source_dir, "JPEGImages")
classes_files = glob.glob(classes_dir+"/*"+suffix)
for file in classes_files:
    # get the filename and make output class folder
    classname = os.path.basename(file)
    if classname.endswith(suffix):
        classname = classname[:-len(suffix)]
        target_dir_path = join_path(target_dir, classname)
        if not os.path.exists(target_dir_path):
            os.makedirs(target_dir_path)
    else:
        continue
    print(classname)


    with open(file) as f:
        content = f.readlines()


    for line in content:
        tokens = re.split(r'\s+', line)
        if tokens[1] == '1':
            # copy this image into target dir_path
            target_file_path = join_path(target_dir_path, tokens[0] + '.jpg')
            src_file_path = join_path(images_dir, tokens[0] + '.jpg')
            shutil.copyfile(src_file_path, target_file_path)
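Each `<class>_trainval.txt` file in `ImageSets/Main` lists one image ID per line followed by a label: `1` if the class is present, `-1` if it is absent, and `0` for "difficult" examples. The cell above copies only the `1` rows, so an image containing several classes (for example a person and a dog) is copied into every matching class folder. The helper below is a minimal sketch of that parsing step on its own; the name `positive_image_ids` is hypothetical and not part of TLT.

```python
def positive_image_ids(lines):
    """Return the image IDs labelled 1 in a VOC <class>_trainval.txt file.

    Each non-empty line looks like '2008_000008  1' (ID, whitespace, label),
    where the label is 1 (present), -1 (absent) or 0 (difficult).
    """
    ids = []
    for line in lines:
        tokens = line.split()
        # Skip blank or malformed lines instead of raising IndexError.
        if len(tokens) >= 2 and tokens[1] == '1':
            ids.append(tokens[0])
    return ids


# Made-up sample lines in the VOC format described above.
sample = ['2008_000002 -1\n', '2008_000008  1\n', '2008_000015  0\n', '\n']
print(positive_image_ids(sample))  # ['2008_000008']
```

Difficult (`0`) images are skipped here, exactly as in the conversion cell.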

In [None]:
import os
import glob
import shutil
from random import shuffle

DATA_DIR = os.environ.get('DATA_DOWNLOAD_DIR')
SOURCE_DIR = os.path.join(DATA_DIR, 'formatted')
TARGET_DIR = os.path.join(DATA_DIR, 'split')
# list the class sub-directories (next() works on both Python 2 and 3)
dir_list = next(os.walk(SOURCE_DIR))[1]
# for each class dir, create matching dirs under split/train, split/val and split/test
for dir_i in dir_list:
    newdir_train = os.path.join(TARGET_DIR, 'train', dir_i)
    newdir_val = os.path.join(TARGET_DIR, 'val', dir_i)
    newdir_test = os.path.join(TARGET_DIR, 'test', dir_i)

    for new_dir in (newdir_train, newdir_val, newdir_test):
        if not os.path.exists(new_dir):
            os.makedirs(new_dir)

    img_list = glob.glob(os.path.join(SOURCE_DIR, dir_i, '*.jpg'))
    # shuffle data
    shuffle(img_list)

    # first 70% -> train, next 10% -> val, last 20% -> test
    train_end = int(len(img_list) * 0.7)
    val_end = int(len(img_list) * 0.8)
    for img_path in img_list[:train_end]:
        shutil.copy2(img_path, newdir_train)
    for img_path in img_list[train_end:val_end]:
        shutil.copy2(img_path, newdir_val)
    for img_path in img_list[val_end:]:
        shutil.copy2(img_path, newdir_test)

print('Done splitting dataset.')
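The split cell sends the first 70% of each shuffled class list to `train`, the next 10% to `val` and the remaining 20% to `test`. Because `shuffle` is unseeded, every run produces a different split. The function below is a sketch of the same arithmetic with an optional seed for reproducibility; `split_items` and its `seed` parameter are illustrative additions, not part of the original notebook or of TLT.

```python
import random


def split_items(items, train_end=0.7, val_end=0.8, seed=None):
    """Shuffle items and cut them at the given cumulative fractions.

    With the defaults this reproduces the 70/10/20 train/val/test split
    used in the notebook. Passing a seed makes the shuffle repeatable.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    # Cumulative boundaries, computed the same way as the notebook cell.
    i_train = int(n * train_end)
    i_val = int(n * val_end)
    return shuffled[:i_train], shuffled[i_train:i_val], shuffled[i_val:]


files = ['img_%02d.jpg' % i for i in range(10)]
train, val, test = split_items(files, seed=42)
print('%d %d %d' % (len(train), len(val), len(test)))  # 7 1 2
```

The boundaries are passed as cumulative fractions (0.7 and 0.8) rather than summed at runtime, because `0.7 + 0.1` evaluates to slightly less than 0.8 in floating point and can shift an image from `val` to `train`.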

In [None]:
!ls $DATA_DOWNLOAD_DIR/split/test/cat
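Pascal VOC is imbalanced, with `person` appearing in far more images than most other classes, so it helps to check how many images landed in each class folder before interpreting accuracy numbers. This is a small sketch assuming the `split/<subset>/<class>/*.jpg` layout created above; `count_images_per_class` is a hypothetical helper.

```python
import os


def count_images_per_class(subset_dir):
    """Return {class_name: number of .jpg files} for one split directory."""
    counts = {}
    for class_name in sorted(os.listdir(subset_dir)):
        class_dir = os.path.join(subset_dir, class_name)
        if os.path.isdir(class_dir):
            counts[class_name] = sum(
                1 for f in os.listdir(class_dir) if f.lower().endswith('.jpg'))
    return counts


# Only runs when the notebook's split/test directory exists.
data_dir = os.environ.get('DATA_DOWNLOAD_DIR')
if data_dir and os.path.isdir(os.path.join(data_dir, 'split', 'test')):
    test_counts = count_images_per_class(os.path.join(data_dir, 'split', 'test'))
    for name, n in sorted(test_counts.items()):
        print('%-12s %d' % (name, n))
```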

### 1.2 Download pretrained models <a class="anchor" id="head-1-2"></a>

Print the list of available models. Find your **ORG** and **TEAM** on ngc.nvidia.com and replace the **-o** and **-t** arguments.

In [None]:
!tlt-pull -k $API_KEY -lm -o nvtltea -t iva

Download the resnet18 classification model.

In [None]:
!tlt-pull -d $USER_EXPERIMENT_DIR -k $API_KEY  -m tlt_iva_classification_resnet18 -v 1 -o nvtltea -t iva

In [None]:
print("Check that model is downloaded into dir.")
!ls -l $USER_EXPERIMENT_DIR

## 2. Provide training specification <a class="anchor" id="head-2"></a>
* Training dataset
* Validation dataset
* Pre-trained models
* Other training (hyper-)parameters such as batch size, number of epochs, learning rate etc.

In [None]:
!cat $SPECS_DIR/classification_spec.cfg

## 3. Run TLT training <a class="anchor" id="head-3"></a>
* Provide the sample spec file and the output directory location for models

In [None]:
print('Create an output dir')
!mkdir -p $USER_EXPERIMENT_DIR/output

In [None]:
print('Model checkpoints and logs:')
print('---------------------')
!ls -l $USER_EXPERIMENT_DIR/output

### Please change the **train_dataset_path, val_dataset_path, pretrained_model_path** in the spec file below if your paths differ from the values shown.

In [None]:
print("Check spec file")

!cat $SPECS_DIR/classification_spec.cfg
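Before launching training, you can check that the paths named in the spec actually exist. The sketch below assumes the spec uses protobuf text-format lines such as `train_dataset_path: "/workspace/..."`; it only looks at the three keys mentioned above and is not a full spec parser. The function names `find_spec_paths` and `find_missing_spec_paths` are hypothetical.

```python
import os
import re

# Keys whose values must point at existing files or directories.
PATH_KEYS = ('train_dataset_path', 'val_dataset_path', 'pretrained_model_path')
# Assumed line format, e.g.:  train_dataset_path: "/workspace/data/split/train"
KEY_VALUE_RE = re.compile(r'^\s*(\w+)\s*:\s*"([^"]*)"')


def find_spec_paths(spec_text):
    """Return {key: path} for the PATH_KEYS found in the spec text."""
    paths = {}
    for line in spec_text.splitlines():
        match = KEY_VALUE_RE.match(line)
        if match and match.group(1) in PATH_KEYS:
            paths[match.group(1)] = match.group(2)
    return paths


def find_missing_spec_paths(spec_text):
    """Return the (key, path) pairs whose path does not exist on disk."""
    return [(key, path) for key, path in sorted(find_spec_paths(spec_text).items())
            if not os.path.exists(path)]
```

In the notebook you could call `find_missing_spec_paths(open(os.path.join(os.environ['SPECS_DIR'], 'classification_spec.cfg')).read())`; an empty list means every path the function found exists.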

In [None]:
!tlt-train classification -e $SPECS_DIR/classification_spec.cfg -r $USER_EXPERIMENT_DIR/output -k $API_KEY

## 4. Evaluate trained models <a class="anchor" id="head-4"></a>

In [None]:
!tlt-evaluate classification -d $DATA_DOWNLOAD_DIR/split/test \
                               -pm $USER_EXPERIMENT_DIR/output/weights/resnet_001.tlt \
                               -b 32 -k $API_KEY

## 5. Prune trained models <a class="anchor" id="head-5"></a>
* Specify the trained model to prune (`-pm`)
* Equalization criterion (`-eq`)
* Pruning threshold (`-pth`)
* Optionally, exclude layers that should not be pruned (e.g. the `predictions` layer)

In [None]:
!tlt-prune -pm $USER_EXPERIMENT_DIR/output/weights/resnet_001.tlt \
                -o $USER_EXPERIMENT_DIR/output/resnet_001_pruned \
                -eq union \
                -pth 0.7 -k $API_KEY
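`-pth` sets the pruning threshold, and `-eq union` sets the equalization criterion, which decides what happens when layers joined by an element-wise operation (such as the add in a ResNet shortcut) would otherwise keep different filters. The toy sketch below illustrates the idea under simplifying assumptions: each filter is scored by its L2 norm divided by the largest norm in its layer, filters scoring below the threshold are dropped, and `union` keeps a filter if any of the tied layers keeps it. It is a conceptual illustration with hypothetical function names, not TLT's actual pruning code.

```python
import math


def kept_filters(layer_filters, threshold):
    """Indices of filters whose max-normalized L2 norm is >= threshold."""
    norms = [math.sqrt(sum(w * w for w in f)) for f in layer_filters]
    max_norm = max(norms)
    if max_norm == 0:
        return set()
    return set(i for i, n in enumerate(norms) if n / max_norm >= threshold)


def equalize_union(kept_sets):
    """'union': keep a filter index if any tied layer keeps it."""
    result = set()
    for kept in kept_sets:
        result |= kept
    return result


# Two toy layers whose outputs are added together, 3 filters each.
layer_a = [[3.0, 4.0], [0.5, 0.5], [1.0, 0.0]]   # norms 5.0, ~0.71, 1.0
layer_b = [[1.0, 0.0], [2.0, 0.0], [0.0, 0.1]]   # norms 1.0, 2.0, 0.1
keep_a = kept_filters(layer_a, 0.7)   # {0}
keep_b = kept_filters(layer_b, 0.7)   # {1}
print(sorted(equalize_union([keep_a, keep_b])))  # [0, 1]
```

An intersection rule would instead keep only filters that both layers keep (none, in this toy case), so it prunes more aggressively than `union`. Under this scoring, a higher threshold removes more filters.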

In [None]:
print('Pruned model:')
print('------------')
!ls -1 $USER_EXPERIMENT_DIR/output/resnet_001_pruned

## 6. Retrain pruned models <a class="anchor" id="head-6"></a>
* The pruned model must be retrained to recover the accuracy lost during pruning
* Specify the retraining spec file

### Please change the **train_dataset_path, val_dataset_path, pretrained_model_path** in the spec file below if your paths differ from the values shown.

In [None]:
!cat $SPECS_DIR/classification_retrain_spec.cfg

In [None]:
!tlt-train classification -e $SPECS_DIR/classification_retrain_spec.cfg -r $USER_EXPERIMENT_DIR/output_retrain -k $API_KEY

## 7. Testing the model! <a class="anchor" id="head-7"></a>

In [None]:
!tlt-evaluate classification -d $DATA_DOWNLOAD_DIR/split/test \
                               -pm $USER_EXPERIMENT_DIR/output_retrain/weights/resnet_001.tlt \
                               -b 32 -k $API_KEY

## 8. Visualize Inferences <a class="anchor" id="head-8"></a>

To see the model's predictions on test images, use the `tlt-infer` tool. Models trained for more epochs generally give better results. Here we run inference on a directory of images.

In [None]:
!tlt-infer classification -m $USER_EXPERIMENT_DIR/output_retrain/weights/resnet_001.tlt \
                          -k $API_KEY -b 32 -d $DATA_DOWNLOAD_DIR/split/test/person \
                          -cm $USER_EXPERIMENT_DIR/output_retrain/classmap.json

Optionally, you can also run inference on a single image. Uncomment the code below for an example.

In [None]:
#!tlt-infer classification -m $USER_EXPERIMENT_DIR/output_retrain/weights/resnet_001.tlt \
#                          -k $API_KEY -b 32 -i $DATA_DOWNLOAD_DIR/split/test/person/2008_000032.jpg \
#                          -cm $USER_EXPERIMENT_DIR/output_retrain/classmap.json

As explained in the Getting Started Guide, this writes a `result.csv` file to the same directory as the images. A short Python cell can then visualize its contents.

In [None]:
import matplotlib.pyplot as plt
from PIL import Image 
import os
import csv
from math import ceil

DATA_DIR = os.environ.get('DATA_DOWNLOAD_DIR')
csv_path = os.path.join(DATA_DIR, 'split', 'test', 'person', 'result.csv')
results = []
with open(csv_path) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        results.append((row[1], row[2]))

w,h = 200,200
fig = plt.figure(figsize=(30,30))
columns = 5
rows = 1
for i in range(1, columns*rows + 1):
    ax = fig.add_subplot(rows, columns,i)
    img = Image.open(results[i][0])
    img = img.resize((w,h), Image.LANCZOS)
    plt.imshow(img)
    ax.set_title(results[i][1], fontsize=40)
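Since every image in `split/test/person` was filed under the `person` class, the share of rows in `result.csv` predicted as `person` gives a rough recall figure for that class. VOC images can contain several classes, though, so a `person` image predicted as, say, `dog` is not necessarily wrong. The sketch below assumes, as the visualization cell does, that the third CSV column holds the predicted label; `label_counts` and `fraction_predicted_as` are hypothetical helpers.

```python
from collections import Counter


def label_counts(csv_rows, label_col=2):
    """Count predicted labels, assuming column `label_col` holds the label."""
    return Counter(row[label_col] for row in csv_rows if len(row) > label_col)


def fraction_predicted_as(csv_rows, expected, label_col=2):
    """Fraction of rows whose predicted label equals `expected`."""
    counts = label_counts(csv_rows, label_col)
    total = sum(counts.values())
    return counts[expected] / float(total) if total else 0.0


# Made-up rows: only columns 1 (image path) and 2 (label) matter here.
rows = [['x', 'a.jpg', 'person'], ['x', 'b.jpg', 'person'],
        ['x', 'c.jpg', 'dog'], ['x', 'd.jpg', 'person']]
print(fraction_predicted_as(rows, 'person'))  # 0.75
```

In the notebook, pass `list(csv.reader(open(csv_path)))` with `csv_path` from the visualization cell.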

## 9. Export and Deploy! <a class="anchor" id="head-9"></a>

In [None]:
!tlt-export $USER_EXPERIMENT_DIR/output_retrain/weights/resnet_001.tlt \
                --input_dim 3,224,224 \
                -o $USER_EXPERIMENT_DIR/export/final_model.uff \
                --enc_key $API_KEY \
                --outputs predictions/Softmax
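The exported graph's output node, `predictions/Softmax`, yields one probability per class, and `--input_dim 3,224,224` lists channels, height and width. To turn the output into a class name you need the index-to-name mapping. The sketch below assumes the `classmap.json` in the retraining output directory maps class names to integer indices (for example `{"cat": 0, ...}`); adapt the inversion step if your file differs. `load_index_to_class` and `decode_softmax` are hypothetical helpers, and no TensorRT inference code is shown.

```python
import json


def load_index_to_class(classmap_json_text):
    """Invert a {class_name: index} mapping into {index: class_name}."""
    name_to_index = json.loads(classmap_json_text)
    return dict((int(idx), name) for name, idx in name_to_index.items())


def decode_softmax(probabilities, index_to_class, top_k=1):
    """Return the top_k (class_name, probability) pairs, highest first."""
    ranked = sorted(enumerate(probabilities), key=lambda p: p[1], reverse=True)
    return [(index_to_class[i], prob) for i, prob in ranked[:top_k]]


# Made-up three-class mapping and softmax output, for illustration only.
index_to_class = load_index_to_class('{"cat": 0, "dog": 1, "person": 2}')
print(decode_softmax([0.1, 0.2, 0.7], index_to_class))  # [('person', 0.7)]
```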

In [None]:
print('Exported model:')
print('------------')
!ls -lh $USER_EXPERIMENT_DIR/export/