# TAO Image Classification (TF2)

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which a model trained on one task is retrained for use on a different task.

Train Adapt Optimize (TAO) Toolkit is a simple, easy-to-use, Python-based AI toolkit for taking purpose-built AI models and customizing them with users' own data.

<img align="center" src="https://developer.nvidia.com/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png" width="1080"> 

## Learning Objectives
In this notebook, you will learn how to leverage the simplicity and convenience of TAO to:

* Take a pretrained EfficientNet-B0 model and finetune it on a sample dataset converted from Pascal VOC
* Prune the finetuned model
* Retrain the pruned model to recover lost accuracy
* Export the pruned model
* Run Inference on the trained model
* Export the pruned and retrained model to a .etlt file for deployment to DeepStream

At the end of this notebook, you will have generated a trained and optimized `classification` model
which you may deploy via [Triton](https://github.com/NVIDIA-AI-IOT/tao-toolkit-triton-apps)
or [DeepStream](https://developer.nvidia.com/deepstream-sdk).

### Table of Contents
This notebook shows an example use case for classification using the Train Adapt Optimize (TAO) Toolkit.

1. [Set up env variables and map drives](#head-0)
2. [Prepare datasets and pre-trained model](#head-2)
    1. [Split the dataset into train/val/test](#head-2-1)
    2. [Download pretrained models](#head-2-2)
3. [Provide training specification](#head-3)
4. [Run TAO training](#head-4)
5. [Evaluate trained models](#head-5)
6. [Prune trained models](#head-6)
7. [Retrain pruned models](#head-7)
8. [Testing the model](#head-8)
9. [Visualize inferences](#head-9)


## 1. Set up env variables and map drives <a class="anchor" id="head-0"></a>
When using the purpose-built pretrained models from NGC, please make sure to set the `$KEY` environment variable to the key mentioned in the model overview. Failing to do so can lead to errors when trying to load them as pretrained models.

This notebook requires the user to set an env variable called `$LOCAL_PROJECT_DIR` to the path of the user's workspace. Please note that the dataset used by this notebook is expected to reside in `$LOCAL_PROJECT_DIR/data`, while the collaterals generated by the TAO experiment will be written to `$LOCAL_PROJECT_DIR/classification_tf2`. More information on how to set up the dataset and on the supported steps in the TAO workflow is provided in the subsequent cells.

*Note: Please make sure to remove any stray artifacts/files that may have been generated by previous experiments from the `$LOCAL_EXPERIMENT_DIR` or `$LOCAL_DATA_DIR` paths defined below. Leftover checkpoint files and similar artifacts may interfere with creating a training graph for a new experiment.*

*Note: By default, this notebook is set up to run training on 1 GPU. To use more GPUs, please update the env variable `$NUM_GPUS` accordingly.*
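
The note above about stray artifacts can be checked before launching a new experiment. The following is a minimal sketch, not part of the TAO tooling: the helper name `find_stray_artifacts` is hypothetical, and it assumes that leftover checkpoints can be recognized by the `.tlt` suffix that appears later in this notebook.

```python
import os


def find_stray_artifacts(root, suffixes=(".tlt",)):
    """Return sorted paths under `root` whose file names end with one of `suffixes`.

    Hypothetical helper: the default suffix is an assumption based on the
    checkpoint names shown later in this notebook.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(tuple(suffixes)):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Calling `find_stray_artifacts(os.environ["LOCAL_EXPERIMENT_DIR"])` once the variables below are set lists candidate files to review; the sketch only reports them and deletes nothing.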

In [1]:
# Setting up env variables for cleaner command line commands.
import os

%env KEY=nvidia_tlt
%env NUM_GPUS=1


# Please define this local project directory that needs to be mapped to the TAO docker session.
# The dataset is expected to be present in $LOCAL_PROJECT_DIR/data, while the results for the steps
# in this notebook will be stored at $LOCAL_PROJECT_DIR/classification_tf2
# !PLEASE MAKE SURE TO UPDATE THIS PATH!.
os.environ["LOCAL_PROJECT_DIR"] = '/home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2'


os.environ["LOCAL_DATA_DIR"] = os.path.join(
    os.getenv("LOCAL_PROJECT_DIR", os.getcwd()),
    "data"
)
os.environ["LOCAL_EXPERIMENT_DIR"] = os.path.join(
    os.getenv("LOCAL_PROJECT_DIR", os.getcwd()),
    "classification_tf2"
)

# The sample spec files are present in the same path as the downloaded samples.
os.environ["LOCAL_SPECS_DIR"] = os.path.join(
    os.getenv("LOCAL_PROJECT_DIR", os.getcwd()),
    "tao_voc/specs"
)
#%env SPECS_DIR=/workspace/tao-experiments/classification_tf2/tao_voc/specs

# Showing list of specification files.
!ls -rlt $LOCAL_SPECS_DIR

env: KEY=nvidia_tlt
env: NUM_GPUS=1
total 20
-rw-r--r-- 1 jupyter jupyter 1517 Jan 20 19:04 spec_16bit_imgs.yaml
-rw-r--r-- 1 jupyter jupyter 2637 Jan 20 19:07 spec_retrain_16bit_imgs.yaml
-rw-r--r-- 1 jupyter jupyter 1465 Jan 23 23:26 spec.yaml
-rw-r--r-- 1 jupyter jupyter 1698 Jan 24 17:47 spec_retrain.yaml
-rw-r--r-- 1 jupyter jupyter 1630 Jan 24 20:54 spec_retrain_qat.yaml


## 2. Prepare datasets and pre-trained model <a class="anchor" id="head-2"></a>

We will be using the Pascal VOC dataset for this tutorial. For more details, please visit
http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html#devkit. Please download the dataset from http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar to `$LOCAL_DATA_DIR`.

In [2]:
# Check that file is present
import os
DATA_DIR = os.environ.get('LOCAL_DATA_DIR')
print(DATA_DIR)
if not os.path.isfile(os.path.join(DATA_DIR, 'VOCtrainval_11-May-2012.tar')):
    print('tar file for dataset not found. Please download.')
else:
    print('Found dataset.')

/home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/data
Found dataset.


In [None]:
# unpack 
!tar -xvf $LOCAL_DATA_DIR/VOCtrainval_11-May-2012.tar -C $LOCAL_DATA_DIR 

In [3]:
# verify
!ls $LOCAL_DATA_DIR/VOCdevkit/VOC2012

Annotations  JPEGImages			 SegmentationClass
ImageSets    JPEGImages_16bit_grayscale  SegmentationObject


### A. Split the dataset into train/val/test <a class="anchor" id="head-2-1"></a>

In the next two code blocks, the Pascal VOC dataset is first converted to a classification format (one folder per class) and then split into train/val/test sets.

In [4]:
# install pip requirements
!pip3 install tqdm
!pip3 install matplotlib==3.3.3

Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com


In [4]:
from os.path import join as join_path
import os
import glob
import re
import shutil

DATA_DIR=os.environ.get('LOCAL_DATA_DIR')
source_dir = join_path(DATA_DIR, "VOCdevkit/VOC2012")
target_dir = join_path(DATA_DIR, "formatted")


suffix = '_trainval.txt'
classes_dir = join_path(source_dir, "ImageSets", "Main")
images_dir = join_path(source_dir, "JPEGImages")
classes_files = glob.glob(classes_dir+"/*"+suffix)
for file in classes_files:
    # get the filename and make output class folder
    classname = os.path.basename(file)
    if classname.endswith(suffix):
        classname = classname[:-len(suffix)]
        target_dir_path = join_path(target_dir, classname)
        if not os.path.exists(target_dir_path):
            os.makedirs(target_dir_path)
    else:
        continue
    print(classname)


    with open(file) as f:
        content = f.readlines()


    for line in content:
        # Each line is "<image_id> <flag>"; a flag of 1 marks a positive example.
        tokens = line.split()
        if len(tokens) >= 2 and tokens[1] == '1':
            # copy this image into target dir_path
            target_file_path = join_path(target_dir_path, tokens[0] + '.jpg')
            src_file_path = join_path(images_dir, tokens[0] + '.jpg')
            shutil.copyfile(src_file_path, target_file_path)

cow
chair
horse
cat
bicycle
train
sheep
bottle
aeroplane
tvmonitor
bird
car
person
motorbike
diningtable
boat
sofa
pottedplant
dog
bus


In [5]:
import os
import glob
import shutil
from random import shuffle
from tqdm import tqdm

DATA_DIR=os.environ.get('LOCAL_DATA_DIR')
SOURCE_DIR=os.path.join(DATA_DIR, 'formatted')
TARGET_DIR=os.path.join(DATA_DIR,'split')
# list dir
print(os.walk(SOURCE_DIR))
dir_list = next(os.walk(SOURCE_DIR))[1]
# for each dir, create a new dir in split
for dir_i in tqdm(dir_list):
        newdir_train = os.path.join(TARGET_DIR, 'train', dir_i)
        newdir_val = os.path.join(TARGET_DIR, 'val', dir_i)
        newdir_test = os.path.join(TARGET_DIR, 'test', dir_i)
        
        if not os.path.exists(newdir_train):
                os.makedirs(newdir_train)
        if not os.path.exists(newdir_val):
                os.makedirs(newdir_val)
        if not os.path.exists(newdir_test):
                os.makedirs(newdir_test)

        img_list = glob.glob(os.path.join(SOURCE_DIR, dir_i, '*.jpg'))
        # shuffle data
        shuffle(img_list)

        for j in range(int(len(img_list)*0.7)):
                shutil.copy2(img_list[j], os.path.join(TARGET_DIR, 'train', dir_i))

        for j in range(int(len(img_list)*0.7), int(len(img_list)*0.8)):
                shutil.copy2(img_list[j], os.path.join(TARGET_DIR, 'val', dir_i))
                
        for j in range(int(len(img_list)*0.8), len(img_list)):
                shutil.copy2(img_list[j], os.path.join(TARGET_DIR, 'test', dir_i))
                
print('Done splitting dataset.')

<generator object walk at 0x7f30dd5a6f20>


100%|██████████| 20/20 [00:12<00:00,  1.66it/s]

Done splitting dataset.
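
The split cell above assigns roughly 70% of each class folder to train, 10% to val, and the remaining 20% to test. The arithmetic can be sketched as follows. The helper `split_sizes` is hypothetical and uses integer arithmetic instead of `int(len(img_list)*0.7)`, so in rare cases its boundaries may differ by one image from the cell above.

```python
def split_sizes(n, train_tenths=7, val_tenths=1):
    """Return (train, val, test) counts for a class folder with n images.

    Hypothetical helper mirroring the 70/10/20 split used above; the test
    set receives whatever remains after the train and val slices.
    """
    train_end = n * train_tenths // 10
    val_end = n * (train_tenths + val_tenths) // 10
    return train_end, val_end - train_end, n - val_end


print(split_sizes(10))  # (7, 1, 2)
print(split_sizes(23))  # (16, 2, 5)
```

Because the remainder always goes to the test set, the three counts sum to `n` for any folder size.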





In [2]:
!ls $LOCAL_DATA_DIR/split/test/cat

2008_000060.jpg  2008_006793.jpg  2009_005051.jpg  2010_003402.jpg
2008_000062.jpg  2008_006817.jpg  2009_005095.jpg  2010_003421.jpg
2008_000096.jpg  2008_006910.jpg  2009_005119.jpg  2010_003435.jpg
2008_000112.jpg  2008_006956.jpg  2009_005158.jpg  2010_003467.jpg
2008_000115.jpg  2008_006973.jpg  2009_005160.jpg  2010_003468.jpg
2008_000196.jpg  2008_007059.jpg  2009_005219.jpg  2010_003481.jpg
2008_000222.jpg  2008_007085.jpg  2009_005251.jpg  2010_003509.jpg
2008_000227.jpg  2008_007130.jpg  2010_000001.jpg  2010_003527.jpg
2008_000306.jpg  2008_007151.jpg  2010_000009.jpg  2010_003539.jpg
2008_000345.jpg  2008_007176.jpg  2010_000043.jpg  2010_003569.jpg
2008_000358.jpg  2008_007216.jpg  2010_000048.jpg  2010_003598.jpg
2008_000502.jpg  2008_007260.jpg  2010_000054.jpg  2010_003641.jpg
2008_000581.jpg  2008_007269.jpg  2010_000067.jpg  2010_003672.jpg
2008_000619.jpg  2008_007289.jpg  2010_000099.jpg  2010_003747.jpg
2008_000641.jpg  2008_007324.jpg  2010_000109.jpg  2010_003752

### B. Download pretrained models <a class="anchor" id="head-2-2"></a>

We will use the NGC CLI to get the pre-trained models. For more details, go to ngc.nvidia.com and click SETUP on the navigation bar.

In [3]:
# Installing NGC CLI on the local machine.
## Download and install
%env CLI=ngccli_cat_linux.zip
!mkdir -p $LOCAL_PROJECT_DIR/ngccli

# Remove any previously existing CLI installations
!rm -rf $LOCAL_PROJECT_DIR/ngccli/*
!wget "https://ngc.nvidia.com/downloads/$CLI" -P $LOCAL_PROJECT_DIR/ngccli
!unzip -u "$LOCAL_PROJECT_DIR/ngccli/$CLI" -d $LOCAL_PROJECT_DIR/ngccli/
!rm $LOCAL_PROJECT_DIR/ngccli/*.zip 
os.environ["PATH"]="{}/ngccli/ngc-cli:{}".format(os.getenv("LOCAL_PROJECT_DIR", ""), os.getenv("PATH", ""))

env: CLI=ngccli_cat_linux.zip
--2023-01-25 18:43:38--  https://ngc.nvidia.com/downloads/ngccli_cat_linux.zip
Resolving ngc.nvidia.com (ngc.nvidia.com)... 13.32.164.19, 13.32.164.118, 13.32.164.13, ...
Connecting to ngc.nvidia.com (ngc.nvidia.com)|13.32.164.19|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 41720199 (40M) [application/zip]
Saving to: ‘/home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/ngccli/ngccli_cat_linux.zip’


2023-01-25 18:43:38 (108 MB/s) - ‘/home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/ngccli/ngccli_cat_linux.zip’ saved [41720199/41720199]

Archive:  /home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/ngccli/ngccli_cat_linux.zip
   creating: /home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/ngccli/ngc-cli/
   creating: /home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/ngccli/ngc-cli/yarl/
  inflating: /home/jupyter/notebooks/tao_launcher_starter_k

In [4]:
!ngc registry model list nvidia/tao/pretrained_classification_tf2:*

+-------+-------+-------+-------+-------+-------+------+-------+-------+
| Versi | Accur | Epoch | Batch | GPU   | Memor | File | Statu | Creat |
| on    | acy   | s     | Size  | Model | y Foo | Size | s     | ed    |
|       |       |       |       |       | tprin |      |       | Date  |
|       |       |       |       |       | t     |      |       |       |
+-------+-------+-------+-------+-------+-------+------+-------+-------+
| effic |       |       |       |       |       | 45.6 | UPLOA | Dec   |
| ientn |       |       |       |       |       | MB   | D_COM | 08,   |
| et_b0 |       |       |       |       |       |      | PLETE | 2022  |
+-------+-------+-------+-------+-------+-------+------+-------+-------+


In [5]:
!mkdir -p $LOCAL_EXPERIMENT_DIR/pretrained_efficientnet_b0

In [6]:
# Pull pretrained model from NGC
!ngc registry model download-version nvidia/tao/pretrained_classification_tf2:efficientnet_b0 --dest $LOCAL_EXPERIMENT_DIR/pretrained_efficientnet_b0

Downloaded 38.14 MB in 5s, Download speed: 7.61 MB/s               
--------------------------------------------------------------------------------
   Transfer id: pretrained_classification_tf2_vefficientnet_b0
   Download status: Completed
   Downloaded local path: /home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/classification_tf2/pretrained_efficientnet_b0/pretrained_classification_tf2_vefficientnet_b0-1
   Total files downloaded: 4
   Total downloaded size: 38.14 MB
   Started at: 2023-01-25 18:44:00.289519
   Completed at: 2023-01-25 18:44:05.299463
   Duration taken: 5s
--------------------------------------------------------------------------------


In [7]:
print("Check that model is downloaded into dir.")
!ls -l $LOCAL_EXPERIMENT_DIR/pretrained_efficientnet_b0/pretrained_classification_tf2_vefficientnet_b0

Check that model is downloaded into dir.
total 4980
-rw------- 1 jupyter jupyter  506069 Jan 23 23:14 keras_metadata.pb
-rw------- 1 jupyter jupyter 4584557 Jan 23 23:14 saved_model.pb
drwx------ 2 jupyter jupyter    4096 Jan 23 23:14 variables


## 3. Provide training specification <a class="anchor" id="head-3"></a>
* Training dataset
* Validation dataset
* Pre-trained models
* Other training (hyper-)parameters such as batch size, number of epochs, learning rate etc.

In [16]:
!cat $LOCAL_SPECS_DIR/spec.yaml

results_dir: '/home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/classification_tf2/output'
key: 'nvidia_tlt'
data:
  train_dataset_path: "/home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/data/split/train"
  val_dataset_path: "/home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/data/split/val"
  preprocess_mode: 'torch'
augment:
  enable_color_augmentation: True
  enable_center_crop: True
train:
  qat: False
  pretrained_model_path: ''
  batch_size_per_gpu: 64
  num_epochs: 80
  optim_config:
    optimizer: 'sgd'
  lr_config:
    scheduler: 'cosine'
    learning_rate: 0.05
    soft_start: 0.05
  reg_config:
    type: 'L2'
    scope: ['conv2d', 'dense']
    weight_decay: 0.00005
model:
  arch: 'efficientnet-b0'
  input_image_size: [3,256,256]
  input_image_depth: 8
evaluate:
  dataset_path: "/home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/data/split/test"
  model_path: "/home/jupyter/notebooks/tao_launcher_starter_k
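
To build intuition for the `lr_config` block in the spec above (`scheduler: 'cosine'`, `learning_rate: 0.05`, `soft_start: 0.05`), here is a minimal sketch of a cosine schedule with linear warm-up. It assumes that `soft_start` is the fraction of training spent ramping up linearly and that the rate then decays to zero. That is a common interpretation, but this notebook does not confirm it, so it should not be read as TAO's implementation. The function name `lr_at` is hypothetical.

```python
import math


def lr_at(progress, base_lr=0.05, soft_start=0.05):
    """Learning rate at a training progress in [0, 1].

    Assumed behavior (not confirmed by the notebook): linear warm-up from 0
    to base_lr over the first `soft_start` fraction of training, followed by
    cosine decay from base_lr to 0.
    """
    if progress < soft_start:
        return base_lr * progress / soft_start
    decay = (progress - soft_start) / (1.0 - soft_start)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * decay))


# Print the assumed rate at the start of a few of the 80 epochs.
for epoch in (0, 4, 40, 79):
    print(epoch, lr_at(epoch / 80))
```

Under these assumptions, the rate peaks at `learning_rate` once warm-up ends (epoch 4 of 80) and falls smoothly afterwards.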

## 4. Run TAO training <a class="anchor" id="head-4"></a>
* Provide the sample spec file and the output directory location for models

In [17]:
!mkdir -p $LOCAL_EXPERIMENT_DIR/output
!sed -i "s|RESULTSDIR|$LOCAL_EXPERIMENT_DIR/output|g" $LOCAL_SPECS_DIR/spec.yaml
!sed -i "s|ENC_KEY|$KEY|g" $LOCAL_SPECS_DIR/spec.yaml
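
The `sed` commands above fill placeholder tokens such as `RESULTSDIR` and `ENC_KEY` in the spec file. The sketch below shows the same idea in Python on an in-memory string. The helper `fill_placeholders` is hypothetical and is shown only to illustrate one consequence: once a placeholder has been replaced, running the substitution again with a different value has no effect, because the token is no longer present in the file.

```python
def fill_placeholders(spec_text, values):
    """Replace each placeholder token in spec_text with its value.

    Hypothetical helper; plain string replacement stands in for the
    `sed -i "s|TOKEN|value|g"` calls used in the notebook.
    """
    for placeholder, value in values.items():
        spec_text = spec_text.replace(placeholder, value)
    return spec_text


template = "results_dir: 'RESULTSDIR'\nkey: 'ENC_KEY'\n"
filled = fill_placeholders(template, {"RESULTSDIR": "/tmp/out", "ENC_KEY": "nvidia_tlt"})
print(filled)

# A second pass with a different value changes nothing: the token is gone.
print(fill_placeholders(filled, {"RESULTSDIR": "/tmp/other"}) == filled)  # True
```

To point an already-filled spec at a different directory, edit the path in the YAML file directly or start again from a fresh copy of the sample spec.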

In [18]:
!classification_tf2 train -e $LOCAL_SPECS_DIR/spec.yaml

Setting up communication with ClearML server.
ClearML task init failed with error ClearML configuration could not be found (missing `~/clearml.conf` or Environment CLEARML_API_HOST)
To get started with ClearML: setup your own `clearml-server`, or create a free account at https://app.clear.ml
Training will still continue.
Log file already exists at /home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/classification_tf2/output/status.json
Starting classification training.
Found 16268 images belonging to 20 classes.
Processing dataset (train): /home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/data/split/train
Found 4531 images belonging to 20 classes.
Processing dataset (validation): /home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/data/split/val
Model: "efficientnet-b0"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Conne

In [24]:
print("To run this training in data parallelism using multiple GPU's, please uncomment the line below and "
      "update the --gpus parameter to the number of GPU's you wish to use.")
# !classification_tf2 train -e $LOCAL_SPECS_DIR/spec.yaml --gpus 2

To run this training in data parallelism using multiple GPU's, please uncomment the line below and update the --gpus parameter to the number of GPU's you wish to use.


In [25]:
print("To resume from a checkpoint,  just relaunch training with the same spec file.")
# !classification_tf2 train -e $LOCAL_SPECS_DIR/spec.yaml --gpus 2

To resume from a checkpoint,  just relaunch training with the same spec file.


## 5. Evaluate trained models <a class="anchor" id="head-5"></a>

In this step, we assume that the training is complete and the model from the final epoch (`efficientnet-b0_080.tlt`) is available. If you would like to run evaluation on an earlier model, please edit the spec file at `$LOCAL_SPECS_DIR/spec.yaml` to point to the intended model.

In [19]:
# get the last checkpoints
last_checkpoint = ''
for f in os.listdir(os.path.join(os.environ["LOCAL_EXPERIMENT_DIR"],'output', 'weights')):
    if f.startswith('efficientnet-b'):
        last_checkpoint = last_checkpoint if last_checkpoint > f else f
print(f'Last checkpoint: {last_checkpoint}')

Last checkpoint: efficientnet-b0_080.tlt
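
The cell above picks the last checkpoint by comparing file names as strings, which works because the epoch numbers are zero-padded (`_080`). A slightly more defensive sketch parses the epoch number instead. The helper `latest_checkpoint` is hypothetical and assumes the `<arch>_<epoch>.tlt` naming seen in the output above.

```python
import re


def latest_checkpoint(filenames, prefix="efficientnet-b"):
    """Return the checkpoint name with the highest epoch number, or None.

    Hypothetical helper; assumes names of the form '<arch>_<epoch>.tlt'.
    """
    pattern = re.compile(r"_(\d+)\.tlt$")
    best_name, best_epoch = None, -1
    for name in filenames:
        match = pattern.search(name)
        if name.startswith(prefix) and match and int(match.group(1)) > best_epoch:
            best_name, best_epoch = name, int(match.group(1))
    return best_name


names = ["efficientnet-b0_009.tlt", "efficientnet-b0_080.tlt", "efficientnet-b0_076.tlt", "status.json"]
print(latest_checkpoint(names))  # efficientnet-b0_080.tlt
```

Comparing parsed integers gives the same answer as the string comparison for zero-padded names and would still work if the padding were ever absent.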


In [20]:
# Set LAST_CHECKPOINT in the spec file
%env LAST_CHECKPOINT={last_checkpoint}
!sed -i "s|EVALMODEL|$LOCAL_EXPERIMENT_DIR/output/weights/$LAST_CHECKPOINT|g" $LOCAL_SPECS_DIR/spec.yaml

env: LAST_CHECKPOINT=efficientnet-b0_080.tlt


In [21]:
!classification_tf2 evaluate -e $LOCAL_SPECS_DIR/spec.yaml

Log file already exists at /home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/classification_tf2/output/status.json
Starting classification evaluation.
Model: "efficientnet-b0"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 Input (InputLayer)             [(None, 3, 256, 256  0           []                               
                                )]                                                                
                                                                                                  
 stem_conv (Conv2D)             (None, 32, 128, 128  864         ['Input[0][0]']                  
                                )                                                                 
                                                                                                  
 stem_bn (Batc

## 6. Prune trained models <a class="anchor" id="head-6"></a>
* Specify pre-trained model
* Equalization criterion
* Threshold for pruning
* Exclude prediction layer that you don't want pruned (e.g. predictions)

Usually, you just need to adjust `prune.threshold` to trade off accuracy against model size. A higher `threshold` gives you a smaller model (and thus higher inference speed) but worse accuracy. The threshold to use depends on the dataset; 0.68 is just a starting point. If the retrain accuracy is good, you can increase this value to get smaller models. Otherwise, lower this value to get better accuracy.
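
To illustrate why a higher threshold yields a smaller model, here is a toy sketch of magnitude-based channel pruning. It is not TAO's implementation: it assumes each filter is scored by its L2 norm, that scores are normalized by the largest norm in the layer, and that filters scoring below the threshold are removed. The function `channels_to_keep` and the sample weights are hypothetical.

```python
import math


def channels_to_keep(filters, threshold):
    """Return indices of filters whose normalized L2 norm is >= threshold.

    Toy illustration only; the scoring and normalization are assumptions.
    """
    norms = [math.sqrt(sum(w * w for w in f)) for f in filters]
    max_norm = max(norms)
    if max_norm == 0:
        return []
    return [i for i, n in enumerate(norms) if n / max_norm >= threshold]


# Four made-up filters with normalized norms of about 1.0, 0.1, 0.5 and 0.6.
filters = [[3.0, 4.0], [0.3, 0.4], [1.5, 2.0], [3.0, 0.0]]
print(channels_to_keep(filters, 0.3))   # [0, 2, 3]
print(channels_to_keep(filters, 0.68))  # [0]
```

In this toy layer, raising the threshold from 0.3 to 0.68 drops two more filters, which is the size-versus-accuracy trade-off described above.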

In [22]:
# Specifying the checkpoint to be used for the pruning.
!mkdir -p $LOCAL_EXPERIMENT_DIR/output/efficientnet-b0_pruned
!sed -i "s|PRUNEDMODEL|$LOCAL_EXPERIMENT_DIR/output/efficientnet-b0_pruned/model_pruned.tlt|g" $LOCAL_SPECS_DIR/spec.yaml
!classification_tf2 prune -e $LOCAL_SPECS_DIR/spec.yaml

Log file already exists at /home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/classification_tf2/output/status.json
Starting classification pruning.
Model: "efficientnet-b0"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 Input (InputLayer)             [(None, 3, 256, 256  0           []                               
                                )]                                                                
                                                                                                  
 stem_conv (Conv2D)             (None, 32, 128, 128  864         ['Input[0][0]']                  
                                )                                                                 
                                                                                                  
 stem_bn (BatchNo

In [8]:
print('Pruned model:')
print('------------')
!ls -rlt $LOCAL_EXPERIMENT_DIR/output/efficientnet-b0_pruned

Pruned model:
------------
total 7792
-rw-r--r-- 1 jupyter jupyter 7975204 Jan 23 23:27 model_pruned.tlt


## 7. Retrain pruned models <a class="anchor" id="head-7"></a>
* Model needs to be re-trained to bring back accuracy after pruning
* Specify re-training specification

In [21]:
!mkdir -p $LOCAL_EXPERIMENT_DIR/output_retrain
!sed -i "s|RESULTSDIR|$LOCAL_EXPERIMENT_DIR/output_retrain|g" $LOCAL_SPECS_DIR/spec_retrain.yaml
!sed -i "s|ENC_KEY|$KEY|g" $LOCAL_SPECS_DIR/spec_retrain.yaml
!sed -i "s|PRUNEDMODEL|$LOCAL_EXPERIMENT_DIR/output/efficientnet-b0_pruned/model_pruned.tlt|g" $LOCAL_SPECS_DIR/spec_retrain.yaml

!cat $LOCAL_SPECS_DIR/spec_retrain.yaml

results_dir: '/home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/classification_tf2/output_retrain'
key: 'nvidia_tlt'
data:
  train_dataset_path: "/home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/data/split/train"
  val_dataset_path: "/home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/data/split/val"
  preprocess_mode: 'torch'
augment:
  enable_color_augmentation: True
  enable_center_crop: True
train:
  qat: False
  pretrained_model_path: '/home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/classification_tf2/output/efficientnet-b0_pruned/model_pruned.tlt'
  batch_size_per_gpu: 64
  num_epochs: 80
  optim_config:
    optimizer: 'sgd'
  lr_config:
    scheduler: 'cosine'
    learning_rate: 0.05
    soft_start: 0.05
  reg_config:
    type: 'L2'
    scope: ['conv2d', 'dense']
    weight_decay: 0.00005
model:
  arch: 'efficientnet-b0'
  input_image_size: [3,256,256]
  input_image_depth: 8
evaluate:
  dataset_path: '/home

In [22]:
!classification_tf2 train -e $LOCAL_SPECS_DIR/spec_retrain.yaml

Setting up communication with ClearML server.
ClearML task init failed with error ClearML configuration could not be found (missing `~/clearml.conf` or Environment CLEARML_API_HOST)
To get started with ClearML: setup your own `clearml-server`, or create a free account at https://app.clear.ml
Training will still continue.
Log file already exists at /home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/classification_tf2/output_retrain/status.json
Starting classification training.
Found 16268 images belonging to 20 classes.
Processing dataset (train): /home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/data/split/train
Found 4531 images belonging to 20 classes.
Processing dataset (validation): /home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/data/split/val
Model: "efficientnet-b0"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #  

In [23]:
# Retrain with QAT (Optional)
!mkdir -p $LOCAL_EXPERIMENT_DIR/output_retrain_qat
!sed -i "s|RESULTSDIR|$LOCAL_EXPERIMENT_DIR/output_retrain_qat|g" $LOCAL_SPECS_DIR/spec_retrain_qat.yaml
!sed -i "s|ENC_KEY|$KEY|g" $LOCAL_SPECS_DIR/spec_retrain_qat.yaml
!sed -i "s|PRUNEDMODEL|$LOCAL_EXPERIMENT_DIR/output/efficientnet-b0_pruned/model_pruned.tlt|g" $LOCAL_SPECS_DIR/spec_retrain_qat.yaml

!cat $LOCAL_SPECS_DIR/spec_retrain_qat.yaml

results_dir: '/home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/classification_tf2/output_retrain_qat'
key: 'nvidia_tlt'
data:
  train_dataset_path: "/home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/data/split/train"
  val_dataset_path: "/home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/data/split/val"
  preprocess_mode: 'torch'
augment:
  enable_color_augmentation: True
  enable_center_crop: True
train:
  qat: True
  pretrained_model_path: '/home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/classification_tf2/output/efficientnet-b0_pruned/model_pruned.tlt'
  batch_size_per_gpu: 64
  num_epochs: 80
  optim_config:
    optimizer: 'sgd'
  lr_config:
    scheduler: 'cosine'
    learning_rate: 0.05
    soft_start: 0.05
  reg_config:
    type: 'L2'
    scope: ['conv2d', 'dense']
    weight_decay: 0.00005
model:
  arch: 'efficientnet-b0'
  input_image_size: [3,256,256]
  input_image_depth: 8
evaluate:
  dataset_path: '/h

In [None]:
!classification_tf2 train -e $LOCAL_SPECS_DIR/spec_retrain_qat.yaml

Setting up communication with ClearML server.
ClearML task init failed with error ClearML configuration could not be found (missing `~/clearml.conf` or Environment CLEARML_API_HOST)
To get started with ClearML: setup your own `clearml-server`, or create a free account at https://app.clear.ml
Training will still continue.
Starting classification training.
Found 16268 images belonging to 20 classes.
Processing dataset (train): /home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/data/split/train
Found 4531 images belonging to 20 classes.
Processing dataset (validation): /home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/data/split/val
No training configuration found in save file, so the model was *not* compiled. Compile it manually.
Model: "efficientnet-b0"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 Inp

## 8. Testing the model! <a class="anchor" id="head-8"></a>

In this step, we assume that retraining is complete and the model from the final epoch (`efficientnet-b0_080.tlt`) is available. If you would like to run evaluation on an earlier model, please edit the spec file at `$LOCAL_SPECS_DIR/spec_retrain.yaml` to point to the intended model.

In [24]:
# get the last checkpoints
last_checkpoint = ''
for f in os.listdir(os.path.join(os.environ["LOCAL_EXPERIMENT_DIR"],'output_retrain', 'weights')):
    if f.startswith('efficientnet-b'):
        last_checkpoint = last_checkpoint if last_checkpoint > f else f
print(f'Last checkpoint: {last_checkpoint}')

Last checkpoint: efficientnet-b0_080.tlt


In [25]:
# Set LAST_CHECKPOINT in the spec file
%env LAST_CHECKPOINT={last_checkpoint}
!sed -i "s|EVALMODEL|$LOCAL_EXPERIMENT_DIR/output_retrain/weights/$LAST_CHECKPOINT|g" $LOCAL_SPECS_DIR/spec_retrain.yaml

env: LAST_CHECKPOINT=efficientnet-b0_080.tlt


In [26]:
!classification_tf2 evaluate -e $LOCAL_SPECS_DIR/spec_retrain.yaml

Log file already exists at /home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/classification_tf2/output_retrain/status.json
Starting classification evaluation.
Model: "efficientnet-b0"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 Input (InputLayer)             [(None, 3, 256, 256  0           []                               
                                )]                                                                
                                                                                                  
 stem_conv (Conv2D)             (None, 32, 128, 128  864         ['Input[0][0]']                  
                                )                                                                 
                                                                                                  
 stem_

In [27]:
# evaluate with the QAT model
# get the last checkpoints
last_checkpoint = ''
for f in os.listdir(os.path.join(os.environ["LOCAL_EXPERIMENT_DIR"],'output_retrain_qat', 'weights')):
    if f.startswith('efficientnet-b'):
        last_checkpoint = last_checkpoint if last_checkpoint > f else f
print(f'Last checkpoint: {last_checkpoint}')

Last checkpoint: efficientnet-b0_076.tlt


In [28]:
# Set LAST_CHECKPOINT in the spec file
%env LAST_CHECKPOINT={last_checkpoint}
!sed -i "s|EVALMODEL|$LOCAL_EXPERIMENT_DIR/output_retrain_qat/weights/$LAST_CHECKPOINT|g" $LOCAL_SPECS_DIR/spec_retrain_qat.yaml

env: LAST_CHECKPOINT=efficientnet-b0_076.tlt


In [15]:
!classification_tf2 evaluate -e $LOCAL_SPECS_DIR/spec_retrain_qat.yaml

Log file already exists at /home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/classification_tf2/output_retrain_qat/status.json
Starting classification evaluation.
Model: "efficientnet-b0"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 Input (InputLayer)             [(None, 3, 256, 256  0           []                               
                                )]                                                                
                                                                                                  
 stem_conv (Conv2D)             (None, 32, 128, 128  864         ['Input[0][0]']                  
                                )                                                                 
                                                                                                  
 s

## 9. Visualize Inferences <a class="anchor" id="head-9"></a>

To see the output of our model on test images, we can use the `classification_tf2 inference` command. Note that models trained for more epochs will usually give better results. We'll run inference in directory mode. You can also use the single-image mode.

In [29]:
!classification_tf2 inference -e $LOCAL_SPECS_DIR/spec_retrain.yaml

Log file already exists at /home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/classification_tf2/output_retrain/status.json
Starting classification inference.
Model: "efficientnet-b0"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 Input (InputLayer)             [(None, 3, 256, 256  0           []                               
                                )]                                                                
                                                                                                  
 stem_conv (Conv2D)             (None, 32, 128, 128  864         ['Input[0][0]']                  
                                )                                                                 
                                                                                                  
 stem_b

In [30]:
!cat $LOCAL_EXPERIMENT_DIR/output_retrain/result.csv

2008_000021.jpg,aeroplane,0.69029677
2008_000033.jpg,aeroplane,0.9979176
2008_000037.jpg,aeroplane,0.99230886
2008_000151.jpg,aeroplane,0.7522845
2008_000197.jpg,aeroplane,0.94808793
2008_000251.jpg,aeroplane,0.9174481
2008_000291.jpg,aeroplane,0.4680187
2008_000716.jpg,aeroplane,0.9716223
2008_000756.jpg,aeroplane,0.96800965
2008_000804.jpg,aeroplane,0.9380956
2008_000883.jpg,aeroplane,0.6955378
2008_001380.jpg,aeroplane,0.9926408
2008_001546.jpg,aeroplane,0.995372
2008_001774.jpg,bird,0.72103626
2008_001805.jpg,aeroplane,0.9083259
2008_001971.jpg,aeroplane,0.95436054
2008_002000.jpg,aeroplane,0.9781818
2008_002138.jpg,aeroplane,0.7355203
2008_002221.jpg,aeroplane,0.97134733
2008_002454.jpg,aeroplane,0.64140695
2008_002551.jpg,aeroplane,0.96684957
2008_002673.jpg,aeroplane,0.9500102
2008_002698.jpg,aeroplane,0.96409625
2008_002977.jpg,car,0.73342836
2008_003059.jpg,bird,0.62112254
2008_003196.jpg,bird,0.47354585
2008_003369.jpg,aeroplane,0.9899255
2008_003478.jpg,person,0.7159678
2008
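
The rows above appear to follow a three-column layout with no header: image name, predicted label, confidence. Assuming that layout holds, a minimal sketch for summarizing such a file might look like the following. The `summarize_predictions` helper and the `0.5` threshold are illustrative choices, not part of TAO.

```python
import csv
import io
from collections import Counter

def summarize_predictions(csv_text, low_conf=0.5):
    """Count predicted classes and collect rows below a confidence threshold."""
    counts = Counter()
    uncertain = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        filename, label, score = row[0], row[1], float(row[2])
        counts[label] += 1
        if score < low_conf:
            uncertain.append((filename, label, score))
    return counts, uncertain

# A few rows copied from the output above
sample = """2008_000021.jpg,aeroplane,0.69029677
2008_000291.jpg,aeroplane,0.4680187
2008_001774.jpg,bird,0.72103626
2008_003196.jpg,bird,0.47354585
"""
counts, uncertain = summarize_predictions(sample)
print(counts.most_common())
print(uncertain)
```

To run it on the real file, read `result.csv` into a string first and pass that string in place of `sample`.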

In [31]:
# Run inference with the QAT model
!classification_tf2 inference -e $LOCAL_SPECS_DIR/spec_retrain_qat.yaml

Log file already exists at /home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/classification_tf2/output_retrain_qat/status.json
Starting classification inference.
Model: "efficientnet-b0"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 Input (InputLayer)             [(None, 3, 256, 256  0           []                               
                                )]                                                                
                                                                                                  
 stem_conv (Conv2D)             (None, 32, 128, 128  864         ['Input[0][0]']                  
                                )                                                                 
                                                                                                  
 st

In [20]:
!cat $LOCAL_EXPERIMENT_DIR/output_retrain_qat/result.csv

2008_000021.jpg,aeroplane,0.6958836
2008_000033.jpg,aeroplane,0.99474055
2008_000037.jpg,aeroplane,0.9944882
2008_000151.jpg,aeroplane,0.66739255
2008_000197.jpg,aeroplane,0.9468485
2008_000251.jpg,aeroplane,0.8522675
2008_000291.jpg,aeroplane,0.44891924
2008_000716.jpg,aeroplane,0.98090667
2008_000756.jpg,aeroplane,0.95618707
2008_000804.jpg,aeroplane,0.9329324
2008_000883.jpg,aeroplane,0.5838565
2008_001380.jpg,aeroplane,0.98914766
2008_001546.jpg,aeroplane,0.99455017
2008_001774.jpg,bird,0.8108868
2008_001805.jpg,aeroplane,0.95129406
2008_001971.jpg,aeroplane,0.9500996
2008_002000.jpg,aeroplane,0.9681667
2008_002138.jpg,aeroplane,0.73988605
2008_002221.jpg,aeroplane,0.9349809
2008_002454.jpg,aeroplane,0.6407609
2008_002551.jpg,aeroplane,0.9522527
2008_002673.jpg,aeroplane,0.9458077
2008_002698.jpg,aeroplane,0.962953
2008_002977.jpg,car,0.6343677
2008_003059.jpg,bird,0.5459438
2008_003196.jpg,bird,0.5170918
2008_003369.jpg,aeroplane,0.9837452
2008_003478.jpg,person,0.72584975
2008_00
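
Since both runs write a `result.csv` over the same images, the two files can be compared directly. The sketch below assumes the three-column layout seen above (image name, label, confidence) and reports how often the two models agree on the label and how far their confidences drift on average. The helper names are hypothetical.

```python
import csv
import io

def load_results(csv_text):
    """Map image name -> (label, confidence) for three-column rows."""
    results = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) == 3:
            results[row[0]] = (row[1], float(row[2]))
    return results

def compare_results(base, other):
    """Return (label agreement rate, mean absolute confidence difference) over shared images."""
    shared = sorted(set(base) & set(other))
    if not shared:
        return 0.0, 0.0
    agree = sum(base[k][0] == other[k][0] for k in shared)
    mean_abs_diff = sum(abs(base[k][1] - other[k][1]) for k in shared) / len(shared)
    return agree / len(shared), mean_abs_diff

# Rows copied from the two outputs above
retrain = load_results("""2008_000021.jpg,aeroplane,0.69029677
2008_001774.jpg,bird,0.72103626
2008_002977.jpg,car,0.73342836
""")
qat = load_results("""2008_000021.jpg,aeroplane,0.6958836
2008_001774.jpg,bird,0.8108868
2008_002977.jpg,car,0.6343677
""")
agreement, drift = compare_results(retrain, qat)
print(f"label agreement: {agreement:.2%}, mean |confidence diff|: {drift:.4f}")
```

Label agreement alone does not measure accuracy; it only indicates whether quantization-aware training changed the predictions relative to the retrained model.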

## 10. Export and Deploy! <a class="anchor" id="head-10"></a>

In [32]:
!mkdir -p $LOCAL_EXPERIMENT_DIR/export

!sed -i "s|EXPORTDIR|$LOCAL_EXPERIMENT_DIR/export|g" $LOCAL_SPECS_DIR/spec_retrain.yaml
!classification_tf2 export -e $LOCAL_SPECS_DIR/spec_retrain.yaml

Log file already exists at /home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/classification_tf2/output_retrain/status.json
Starting classification export.
Signatures found in model: [serving_default].
Output names: ['predictions']
Using tensorflow=2.9.1, onnx=1.12.0, tf2onnx=1.12.0/ddca3a
Using opset <onnx, 13>
Computed 0 values for constant folding
Optimizing ONNX model
After optimization: BatchNormalization -42 (49->7), Cast -1 (33->32), Const -378 (569->191), GlobalAveragePool +16 (0->16), Identity -2 (2->0), ReduceMean -16 (16->0), Reshape -16 (33->17), Transpose -17 (17->0), Unsqueeze -64 (64->0)
The etlt model is saved at /home/jupyter/notebooks/tao_launcher_starter_kit/classification_tf2/classification_tf2/export/efficientnet-b0.etlt
Export finished successfully.
Sending telemetry data.
Telemetry data couldn't be sent, but the command ran successfully.
[Error]: <urlopen error [Errno -2] Name or service not known>
Execution status: PASS


In [33]:
# Check if etlt model is correctly saved.
!ls -l $LOCAL_EXPERIMENT_DIR/export

total 15852
-rw-r--r-- 1 jupyter jupyter 16228384 Jan 25 19:36 efficientnet-b0.etlt


Using the `tao-deploy` container, you can generate a TensorRT engine and verify the correctness of the generated engine through evaluation and inference.

`tao-deploy` produces TensorRT engines optimized for the platform it runs on. Therefore, to get maximum performance, run the `tao-deploy` command, which instantiates a deploy container, with the exported `.etlt` file on your target device. The `tao-deploy` container only works on x86 systems with discrete NVIDIA GPUs.

For Jetson devices, download the `tao-converter` build for Jetson and see the [tao-converter installation instructions](https://docs.nvidia.com/tao/tao-toolkit/text/tensorrt.html#installing-the-tao-converter) for details.

If you choose to integrate your model into DeepStream directly, copy the exported `.etlt` file along with the calibration cache to the target device and update the spec file that configures the `gst-nvinfer` element to point to the newly exported model. This file is usually called `config_infer_primary.txt` for detection models and `config_infer_secondary_*.txt` for classification models.
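
As a rough illustration of updating the `gst-nvinfer` spec file, the sketch below rewrites the model-related keys in the `[property]` group with Python's `configparser`. It assumes the config uses the `tlt-encoded-model`, `labelfile-path`, and `int8-calib-file` keys; check these against the DeepStream documentation for your version. The file names and the `point_config_at_model` helper are placeholders, not part of TAO or DeepStream.

```python
import configparser
import io

def point_config_at_model(config_text, model_path, labels_path, calib_path=None):
    """Return gst-nvinfer config text with the model-related keys updated."""
    # interpolation=None so that '%' characters in values are left untouched
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section("property"):
        parser.add_section("property")
    prop = parser["property"]
    prop["tlt-encoded-model"] = model_path  # assumed key name
    prop["labelfile-path"] = labels_path    # assumed key name
    if calib_path is not None:
        prop["int8-calib-file"] = calib_path  # assumed key name
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()

# Hypothetical starting config; a real one contains many more keys
sample = """[property]
gpu-id=0
tlt-encoded-model=old_model.etlt
labelfile-path=old_labels.txt
"""
updated = point_config_at_model(sample, "efficientnet-b0.etlt", "labels.txt")
print(updated)
```

Note that `configparser` drops comments when writing the file back, so editing the file by hand may be preferable when the comments matter.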

In [None]:
# Convert to TensorRT engine (FP32).
!tao-deploy classification_tf2 gen_trt_engine -e $SPECS_DIR/spec_retrain.yaml

In [None]:
# Convert to TensorRT engine (INT8).
!sed -i "s|fp32|int8|g" $LOCAL_SPECS_DIR/spec_retrain.yaml
!tao-deploy classification_tf2 gen_trt_engine -e $SPECS_DIR/spec_retrain.yaml
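
The `sed` call in the cell above rewrites every occurrence of `fp32` in the spec file and gives no feedback if the string is absent, for example when the cell is run a second time. A small Python sketch of the same substitution that reports how many replacements were made is shown below. The `replace_in_file` helper and the demo file contents are hypothetical and do not reflect the real spec layout.

```python
import os
import tempfile

def replace_in_file(path, old, new):
    """Replace every occurrence of `old` with `new` in a text file; return the count."""
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    count = text.count(old)
    if count:
        with open(path, "w", encoding="utf-8") as f:
            f.write(text.replace(old, new))
    return count

# Demo on a throwaway file; these keys are made up for illustration
with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as tmp:
    tmp.write("data_type: fp32\nengine_file: model.fp32.engine\n")
demo_path = tmp.name

n_replaced = replace_in_file(demo_path, "fp32", "int8")
n_second_pass = replace_in_file(demo_path, "fp32", "int8")
with open(demo_path, encoding="utf-8") as f:
    demo_text = f.read()
os.remove(demo_path)
print(n_replaced, n_second_pass)
```

A count of zero is a hint that the spec was already switched to INT8, or that it never contained the string in the first place.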

In [None]:
print('Exported model:')
print('------------')
!ls -lh $LOCAL_EXPERIMENT_DIR/export/

In [None]:
# Export the QAT model to a .etlt file
!mkdir -p $LOCAL_EXPERIMENT_DIR/export_qat
!sed -i "s|EXPORTDIR|$USER_EXPERIMENT_DIR/export_qat|g" $LOCAL_SPECS_DIR/spec_retrain_qat.yaml
!tao classification_tf2 export -e $SPECS_DIR/spec_retrain_qat.yaml

In [None]:
print('Exported QAT model:')
print('------------')
!ls -lh $LOCAL_EXPERIMENT_DIR/export_qat/

## 11. Verify the deployed model <a class="anchor" id="head-11"></a>

Verify the converted engine by visualizing TensorRT inferences.

In [None]:
# Set engine as model_path
!sed -i "s|$USER_EXPERIMENT_DIR/output/weights/$LAST_CHECKPOINT|$USER_EXPERIMENT_DIR/export/efficientnet-b0.fp32.engine|g" $LOCAL_SPECS_DIR/spec_retrain.yaml
!sed -i "s|batch_size: 256|batch_size: 16|g" $LOCAL_SPECS_DIR/spec_retrain.yaml
# Running inference 
!tao-deploy classification_tf2 inference -e $SPECS_DIR/spec_retrain.yaml

In [None]:
!cat $LOCAL_EXPERIMENT_DIR/output_retrain/result.csv