<a href="https://www.nvidia.com/dli"> <img src="images/DLI_Header.png" alt="Header" style="width: 400px;"/> </a>

# Introduction to the TAO Toolkit #
The TAO Toolkit (Train, Adapt, Optimize) is a framework that simplifies the AI/ML model development workflow. It lets developers fine-tune pre-trained models with custom data to produce highly accurate computer vision models efficiently, reducing the need for large training runs and deep AI expertise. It also enables model optimization for inference performance. 

<p><img src="images/tao_toolkit.png" width=720></p>

## Learning Objectives ##
In this notebook, you will gain the foundational understanding necessary to use the TAO Toolkit effectively, including: 
* Video AI Model Training Challenges
* What is Transfer Learning
* How to Optimize AI Models for Video AI Applications
* How to Use the TAO Toolkit CLI
* Pre-trained Models Supported by the TAO Toolkit

**Table of Contents**<br>
This notebook covers the following sections: 
1. [Video AI Model Training Workflow](#s1)
    * [Deep Learning Challenges](#s1.1)
    * [Transfer Learning](#s1.2)
    * [TAO Toolkit for Video AI](#s1.3)
2. [Video AI Pre-trained Models Supported](#s2)
3. [TAO Toolkit Workflow](#s3)
    * [TAO Launcher, CLI (Command Line Interface), and Spec Files](#s3.1)
    * [Exercise #1 - Explore TAO Toolkit CLI](#e1)

<a name='s1'></a>
## Video AI Model Training Workflow ## 
At the heart of a video AI application are one or more deep learning models that extract insights, such as detecting cars and classifying them. They are tuned and optimized to deliver the right level of accuracy and performance. Building a deep learning model consists of several steps, including collecting large, high-quality data sets, preparing the data, training the model, and optimizing the model for deployment. When we train a neural network model, we leverage its ability to automatically extract features from raw data and associate them with our target. Deep learning model performance generally improves when we train with more data, but training is a time-consuming and computationally intensive process. Once a model is trained, it can be deployed and used for inference. Given the complex nature of the computation involved, models can be large and become a bottleneck for the video AI application. To ensure that the streaming analytics pipeline is effective, the video AI models have to be efficient without sacrificing accuracy.
<p><img src='images/video_ai_model_training_workflow.png' width=720></p>

<a name='s1.1'></a>
### Deep Learning Challenges ###
There are some common challenges related to building deep learning models for video AI applications: 
* Requires knowledge of one or more deep learning frameworks, such as [TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/), or [Caffe](https://caffe.berkeleyvision.org/). 
* Training accurate deep learning models from scratch requires a large amount of data, and acquiring it is costly. 
* Deep learning models require significant fine-tuning effort before they are optimized for inference and production-ready. 

<a name='s1.2'></a>
### Transfer Learning ###
In practice, it is rare and inefficient to start a learning task on a network with randomly initialized weights, due to factors like data scarcity (an inadequate number of training samples) or prolonged training times. One of the most common ways to overcome this is transfer learning. Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique where developers take a model trained on one task and re-train it for a different task. This works surprisingly well because many of the early layers in a neural network learn similar features for similar tasks. For example, many of the early layers in a convolutional neural network used for a Computer Vision (CV) model primarily identify outlines, curves, and other features in an image. The network formed by these layers is referred to as the **backbone** of a more complex model. Also known as a feature extractor, it takes the image as input and extracts the feature map upon which the rest of the network is based. The learned features from these layers can be applied to similar tasks carrying out the same identification in other domains. Transfer learning enables adaptation (fine-tuning) of an existing neural network to a new task, which requires significantly less domain-specific data. In most cases, fine-tuning also takes significantly less time (a 10x reduction is common), saving time and resources. As it relates to vision AI, transfer learning is used for scene adaptation by transferring weights from one application to another, adapting to a new point of view or camera angle. Transfer learning is also used for adding new classifications. 

<p><img src='images/transfer_learning.png' width=720></p>

More information about transfer learning can be found in this [blog](https://blogs.nvidia.com/blog/2019/02/07/what-is-transfer-learning/).
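
To make the idea of a frozen backbone concrete, here is a minimal, framework-free sketch. It is not how the TAO Toolkit implements transfer learning: the tiny `backbone`, its fixed weights `BACKBONE_W`, the toy data, and the logistic-regression `head` are all hypothetical stand-ins chosen so the example runs with the Python standard library only. The point it illustrates is that only the head's parameters are updated while the "pre-trained" feature extractor stays untouched.

```python
import math

# Hypothetical "pre-trained" backbone weights (assumed, for illustration only).
BACKBONE_W = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]

def backbone(x):
    """Frozen feature extractor: a linear layer followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in BACKBONE_W]

def head(features, w, b):
    """Trainable task-specific head: logistic regression on the features."""
    z = sum(wi * fi for wi, fi in zip(w, features)) + b
    return 1.0 / (1.0 + math.exp(-z))

def mean_loss(data, w, b):
    """Average binary cross-entropy of the head over the data set."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = head(backbone(x), w, b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

def fine_tune(data, epochs=200, lr=0.1):
    """Full-batch gradient descent that updates the head only."""
    w, b = [0.0] * len(BACKBONE_W), 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in data:
            f = backbone(x)              # backbone weights are never modified
            err = head(f, w, b) - y
            for i, fi in enumerate(f):
                grad_w[i] += err * fi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Small "new task" data set: label is 1 when the first input exceeds the second.
data = [([2.0, 0.0], 1), ([1.5, 0.5], 1), ([1.0, -1.0], 1),
        ([0.0, 2.0], 0), ([0.5, 1.5], 0), ([-1.0, 1.0], 0)]

loss_before = mean_loss(data, [0.0] * len(BACKBONE_W), 0.0)
w, b = fine_tune(data)
loss_after = mean_loss(data, w, b)
print(f"loss before: {loss_before:.3f}, loss after: {loss_after:.3f}")
```

Because the backbone already produces useful features, a handful of labeled samples is enough to fit the small head, which mirrors why fine-tuning needs less data than training from scratch.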

<a name='s1.3'></a>
### TAO Toolkit for Video AI ###
The TAO Toolkit uses pre-trained models to accelerate the AI development process and reduce costs associated with large-scale data collection, labeling, and training models from scratch. Transfer learning with pre-trained models can be used for video AI applications in smart cities, retail, healthcare, industrial inspection, and more. The TAO Toolkit offers useful features such as: 
* Zero-coding approach that requires no AI framework expertise, lowering the barrier to entry for anyone who wants to get started building video AI applications. 
* Flexible configurations that allow customization to help advanced users prototype faster. 
* Large catalogue of production-ready pre-trained models for common CV tasks that can also be customized with users' own data. 
* Easy-to-use interface for model optimization such as pruning and quantization-aware training. 
* Integration with the DeepStream SDK
<p><img src='images/transfer_learning.jpg' width=540></p>
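
As a rough intuition for what pruning does, the sketch below drops convolution filters whose normalized magnitude falls under a threshold while keeping a minimum number of filters. This is an illustrative assumption, not the TAO Toolkit's actual pruning algorithm: the function `prune_filters`, its parameters, and the toy filter values are hypothetical, loosely inspired by the idea of a pruning threshold and a minimum filter count.

```python
import math

def prune_filters(filters, threshold, min_num_filters=1):
    """Return the indices of filters to keep (hypothetical magnitude-based rule)."""
    # L2 norm of each filter, treated here as a flat list of weights.
    norms = [math.sqrt(sum(w * w for w in f)) for f in filters]
    # Normalize by the largest norm so the threshold is relative to the layer.
    max_norm = max(norms) or 1.0
    normalized = [n / max_norm for n in norms]
    keep = [i for i, n in enumerate(normalized) if n >= threshold]
    # Never go below the minimum: fall back to the strongest filters.
    if len(keep) < min_num_filters:
        ranked = sorted(range(len(filters)), key=lambda i: normalized[i], reverse=True)
        keep = sorted(ranked[:min_num_filters])
    return keep

# Toy layer with four two-weight "filters" (made-up values).
layer = [[0.9, 0.8], [0.01, 0.02], [0.5, -0.4], [0.0, 0.05]]
kept = prune_filters(layer, threshold=0.1)
print("filters kept:", kept)
```

A pruned model has fewer parameters, so it typically needs to be re-trained afterwards to recover accuracy that may have been lost.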

<a name='s2'></a>
## Video AI Pre-trained Models Supported ##
Developers, system builders, and software partners building video AI applications and services can bring their own custom data to train with and fine-tune pre-trained models quickly instead of going through significant effort in large data collection and training from scratch. There are two types of pre-trained models that users can start with: **general purpose vision models** and **purpose-built pre-trained models**. 

* **General purpose vision models** provide pre-trained weights for popular network architectures to train an image classification model, an object detection model, or a segmentation model. This gives users the flexibility and control to build AI models for any number of applications, from smaller lightweight models for edge deployment to larger models for more complex tasks. They are trained on the [Open Images](https://opensource.google/projects/open-images-dataset) data set and provide a much better starting point than training from scratch with random weights. 

    The TAO Toolkit adapts popular network architectures and backbones to custom data, allowing developers to train, fine-tune, prune, and export highly optimized and accurate AI models. When working with TAO, first choose the model architecture to be built, then choose one of the supported backbones. 
<p><img src='images/tao_matrix.png' width=720></p>

    _Note: The pre-trained weights from each feature extraction network merely act as a starting point and may not be used without re-training. In addition, the pre-trained weights are network specific and shouldn't be shared across models that use different architectures._

* **Purpose-built pre-trained models** are production-quality models that are built for high accuracy and performance. They are trained on millions of objects for common video AI tasks and provide an excellent starting point for any application in smart city, retail, public safety, healthcare, and others. Purpose-built models are freely available on [NGC](https://ngc.nvidia.com/). For each model, there is a pruned version that can be deployed as is and an unpruned version that can be re-trained with more data for specific use cases. 
<p><img src='images/purpose-built_models_table.png' width=720></p>

    Find the complete list and details [here](https://docs.nvidia.com/tao/tao-toolkit/text/overview.html#pre-trained-models). 

* _No third-party pre-trained models are supported by the TAO Toolkit. Only NVIDIA pre-trained models from NGC, which can be re-trained with custom data, are currently supported._

<a name='s3'></a>
## TAO Toolkit Workflow ##
Building video AI systems and applications is hard, and tailoring even a single component to the needs of the enterprise for deployment is even harder. Deployment for a domain-specific application typically requires several cycles of re-training, fine-tuning, and deploying the model until it satisfies the requirements. The workflow typically follows the steps below: 

0. Configuration
1. Download a pre-trained model from NGC
2. Prepare the data for training
3. Train the model using transfer learning
4. Evaluate the model for target predictions
5. Optimize the model for inference performance
6. Export the model for inference

<p><img src='images/tao_toolkit_workflow.png' width=1080></p>
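
One way to relate the steps above to the CLI is to pair each step with a subtask name. The sketch below does this with subtask names that appear in the `detectnet_v2` help output later in this notebook. The pairing itself is an assumption made for illustration (the `WORKFLOW` list and `help_commands` helper are hypothetical), and steps without an obvious subtask are left as `None`.

```python
# (step description, assumed subtask name or None when no subtask applies)
WORKFLOW = [
    ("Configuration", None),
    ("Download a pre-trained model from NGC", None),
    ("Prepare the data for training", "dataset_convert"),
    ("Train the model using transfer learning", "train"),
    ("Evaluate the model for target predictions", "evaluate"),
    ("Optimize the model for inference performance", "prune"),
    ("Export the model for inference", "export"),
]

def help_commands(task):
    """Build a '<task> <subtask> --help' command for every step that has a subtask."""
    return [f"{task} {subtask} --help" for _, subtask in WORKFLOW if subtask]

for number, (step, subtask) in enumerate(WORKFLOW):
    print(f"{number}. {step}: {subtask or 'no subtask'}")

for command in help_commands("detectnet_v2"):
    print(command)
```

Printing the help commands first is a low-risk way to check which arguments each subtask expects before running anything.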

<a name='s3.1'></a>
### TAO Launcher, CLI (Command Line Interface), and Spec Files ###
The TAO Toolkit is a zero-coding framework that makes it easy to get started. It uses a **launcher** to pull from the NGC registry and instantiate the appropriate TAO container, which performs the desired subtask, such as converting data, training, evaluating, or exporting. Users interact with the launcher through its **Command Line Interface**, which is configured using simple [**Protocol Buffer**](https://developers.google.com/protocol-buffers) **specification files** that include parameters such as the data set parameters, model parameters, and optimizer and training hyperparameters. More information about the TAO Toolkit Launcher can be found in the [TAO Docs](https://docs.nvidia.com/tao/tao-toolkit/text/tao_launcher.html#tao-launcher). 

_Note: The TAO Toolkit comes with a set of reference scripts and configuration specifications with default parameter values that enable developers to kick-start training and fine-tuning. This lowers the bar and enables users who lack a deep understanding of models or expertise in deep learning, and who have only beginner-level coding skills, to train new models and fine-tune the pre-trained ones._

**Getting Started with the TAO Launcher CLI**

The tasks can be invoked from the TAO Toolkit Launcher using the following convention on the command-line: `tao <task> <subtask> <args_per_subtask>`, where `<args_per_subtask>` are the arguments required for a given subtask. Once the container is launched, the subtasks are run by the TAO Toolkit containers using the appropriate hardware resources. 
<p><img src='images/tao_launcher.gif' width=720></p>
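
The command convention can be sketched as a small helper that assembles the pieces in order. The function name `build_tao_command` is hypothetical, and the only argument used here is `--help`, which is the one flag this notebook shows. The helper only builds the command; it does not run it.

```python
import shlex

def build_tao_command(task, subtask, *args):
    """Assemble 'tao <task> <subtask> <args_per_subtask>' as an argument list."""
    if not task or not subtask:
        raise ValueError("both a task and a subtask are required")
    return ["tao", task, subtask, *args]

command = build_tao_command("detectnet_v2", "train", "--help")
# shlex.join quotes arguments safely when rendering the list as one string.
print(shlex.join(command))
```

Keeping the command as a list rather than a single string avoids quoting problems if it is later passed to something like `subprocess.run`.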

To see the usage of the supported functionality, use the `--help` option. For more information, see the [TAO Toolkit Quick Start Guide](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_quick_start_guide.html). 
Here is the **sample output**: 

usage: tao [-h]
           {list,stop,info,action_recognition,augment,bpnet,classification,converter,detectnet_v2,dssd,efficientdet,emotionnet,faster_rcnn,fpenet,gazenet,gesturenet,heartratenet,intent_slot_classification,lprnet,mask_rcnn,multitask_classification,n_gram,punctuation_and_capitalization,question_answering,retinanet,spectro_gen,speech_to_text,speech_to_text_citrinet,ssd,text_classification,token_classification,unet,vocoder,yolo_v3,yolo_v4,yolo_v4_tiny}
           ...

Launcher for TAO Toolkit.

optional arguments:
  -h, --help            show this help message and exit

tasks:
  {list,stop,info,action_recognition,augment,bpnet,classification,converter,detectnet_v2,dssd,efficientdet,emotionnet,faster_rcnn,fpenet,gazenet,gesturenet,heartratenet,intent_slot_classification,lprnet,mask_rcnn,multitask_classification,n_gram,punctuation_and_capitalization,question_answering,retinanet,spectro_gen,speech_to_text,speech_to_text_citrinet,ssd,text_classification,token_classification,unet,vocoder,yolo_v3,yolo_v4,yolo_v4_tiny}
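
The task list in the usage text above can also be extracted programmatically. The sketch below parses an abbreviated excerpt of that output (the `USAGE_EXCERPT` string is shortened for readability, not the full list) and separates the tasks that manage the launcher from the model tasks. Treating `list`, `stop`, and `info` as launcher-management tasks is an assumption based on their names.

```python
import re

# Abbreviated excerpt of the usage text; the real list is longer.
USAGE_EXCERPT = """usage: tao [-h]
           {list,stop,info,classification,detectnet_v2,mask_rcnn,unet,yolo_v4}
           ...
"""

# Assumed launcher-management tasks (not model tasks).
LAUNCHER_TASKS = {"list", "stop", "info"}

def parse_tasks(usage_text):
    """Return the comma-separated choices found inside the first {...} group."""
    match = re.search(r"\{([^}]*)\}", usage_text)
    return match.group(1).split(",") if match else []

all_tasks = parse_tasks(USAGE_EXCERPT)
model_tasks = [task for task in all_tasks if task not in LAUNCHER_TASKS]
print("model tasks:", model_tasks)
```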

<p><img src='images/important.png' width=720></p>

**For the purposes of this course - we will not be using the TAO Launcher. Instead, our environment is set up to emulate working inside of a running TAO Toolkit container already.**

With the TAO Toolkit, users can train models for object detection, classification, segmentation, optical character recognition, facial landmark estimation, gaze estimation, and more. In TAO's terminology, these are the **tasks**, which support **subtasks** such as `train`, `prune`, `evaluate`, `export`, etc. Each task/subtask requires a different combination of configuration files to accommodate different parameters, such as the dataset parameters, model parameters, and optimizer and training hyperparameters. Part of what makes the TAO Toolkit so easy to use is that most of those parameters are kept in experiment specification files (spec files). They are detailed in the [Getting Started Guide](https://docs.nvidia.com/tao/archive/tlt-10/pdf/Transfer-Learning-Toolkit-Getting-Started-Guide-IVA.pdf) for reference. It's very helpful to have these resources handy when working with the TAO Toolkit. In addition, there are a number of specific tasks that help with handling the launched commands. Below is a list of the available tasks. The tasks for Conversational AI are grayed out because they are out of scope for this course. 

<img src='images/tao_tasks.png' width=740>

We can use the `--help` option to explore the functionality of different tasks. 

<a name='e1'></a>
#### Exercise #1: Explore TAO Toolkit CLI ####
Let's explore some TAO Toolkit tasks. 

**Instructions**:<br>
* Modify the `<FIXME>`s only and execute the cell, choosing a task from options such as: `[classification, detectnet_v2, mask_rcnn, emotionnet, etc]`, followed by a subtask from options such as: `[calibration_tensorfile, dataset_convert, evaluate, export, inference, prune, train]`. 

# Example: !detectnet_v2 train --help
! detectnet_v2 prune --help

Using TensorFlow backend.
usage: detectnet_v2 prune [-h] [--num_processes NUM_PROCESSES] [--gpus GPUS]
                          [--gpu_index GPU_INDEX [GPU_INDEX ...]] [--use_amp]
                          [--log_file LOG_FILE] -m MODEL -o OUTPUT_FILE -k KEY
                          [-n NORMALIZER] [-eq EQUALIZATION_CRITERION]
                          [-pg PRUNING_GRANULARITY] [-pth PRUNING_THRESHOLD]
                          [-nf MIN_NUM_FILTERS]
                          [-el [EXCLUDED_LAYERS [EXCLUDED_LAYERS ...]]] [-v]
                          {calibration_tensorfile,dataset_convert,evaluate,export,inference,prune,train}
                          ...

optional arguments:
  -h, --help            show this help message and exit
  --num_processes NUM_PROCESSES, -np NUM_PROCESSES
                        The number of horovod child processes to be spawned.
                        Default is -1(equal to --gpus).
  --gpus GPUS           The number of GPUs to be used for the job.
  --

**Well Done**! When you're ready, let's move to the [next notebook](./02_preparation_for_model_training.ipynb). 
