<p> <center> <a href="../Start_Here.ipynb">Home Page</a> </center> </p>

 
<div>
    <span style="float: left; width: 33%; text-align: left;"><a href="Summary.ipynb">Previous Notebook</a></span>
    <span style="float: left; width: 33%; text-align: center;">
        <a >6</a>
        <a href="qa-riva-deployment.ipynb">7</a>
        <a href="challenge.ipynb">8</a>
    </span>
    <span style="float: left; width: 33%; text-align: right;"><a href="challenge.ipynb">Next Notebook</a></span>
</div>

# Question Answering using Train Adapt Optimize (TAO) Toolkit
---

## Learning Objectives
In this notebook, you will learn how to leverage the simplicity and convenience of TAO to:
- Take a [BERT](https://arxiv.org/pdf/1810.04805.pdf) QA model and [**Train/Finetune**](#training) it on the [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) dataset
- Run [**Inference**](#inference)
- [**Export**](#export-onnx) the model to the [ONNX](https://onnx.ai/) format, or [export](#export-riva) it in a format suitable for deployment in [Riva](https://developer.nvidia.com/riva).


## Transfer Learning with TAO
Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which you take a model trained on one task and retrain it for a different task.

`Train Adapt Optimize (TAO) Toolkit` is a simple and easy-to-use Python-based AI toolkit for taking purpose-built AI models and customizing them with users' own data.

<center><img src="https://developer.nvidia.com/sites/default/files/akamai/embedded-transfer-learning-toolkit-software-stack-1200x670px.png"></center>

## NGC CLI Installation and Registration

Before TAO can be used, you need to register at ngc.nvidia.com and generate an API Key. The step-by-step process is given below:

- From your browser, visit `ngc.nvidia.com`.
- Click on `Welcome Guest` to open a dropdown menu, then click on `Sign In/Sign Up`.
- Click on the `continue` button next to the text `NVIDIA Account (use existing or create a new NVIDIA account)`.

<img src="images/ngc_signin.jpg">

- Click on `Create account` and register. You may then log in with your new account credentials.
- At the top right corner, click on your `username` to open a dropdown menu, then click on `Setup`.

<center><img src="images/ngc_setup.jpg"> </center>

- Click on the option to download the NGC CLI (optional if you are running the notebook within a container)

<img src="images/download_cli.jpg">
    
- Follow the steps listed to install the NGC CLI (optional if you are running the notebook within a container)

<img src="images/ngccli_installation.jpg">


- Proceed and click on the `Get API Key` button.
- Next, click on the `Generate API Key` button at the top right corner. A dialog box will appear; click on its `confirm` button.
- Finally, copy your generated API Key and Username, and save them somewhere on your local system.

<img align="center" src="images/ngc_key.jpg">


## API Key

- Your API key represents your credentials
  - Used for programmatic interaction (e.g., docker, REST API, etc.)
  - Uniquely identifies you (think “Username & Password”)
  - There can be only one (regenerating your API key invalidates the old one)
- Programmatic interface at `nvcr.io`: Use API Key

To see the Docker image versions and the tasks that they perform, use the `tao info` command.

In [None]:
!tao info --verbose

In addition to installing the TAO package, please make sure that the following software requirements are met:

1. python 3.6.9
2. docker-ce > 19.03.5
3. docker-API 1.40
4. nvidia-container-toolkit > 1.3.0-1
5. nvidia-container-runtime > 3.4.0-1
6. nvidia-docker2 > 2.5.0-1
7. nvidia-driver >= 455.23

Check that the GPU device(s) are visible:

In [None]:
!nvidia-smi

---
## Question Answering (QA)

### Task Description
The Question Answering task in NLP pertains to building a model that can answer questions posed in natural language. Many datasets (including [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/), the dataset we use in this notebook) pose this as a reading comprehension task: given a question and a context, the goal is to predict the span within the context, defined by a start and an end position, that answers the question. For every word in the training data, the model predicts:
- likelihood this word is the start of the span
- likelihood this word is the end of the span
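
To make the two per-word predictions concrete, here is a minimal sketch of how an answer span could be chosen from start and end scores. The function name `best_span`, the length limit and the score values are all made up for illustration; this is not how TAO is confirmed to decode answers.

```python
import math

def best_span(start_scores, end_scores, max_answer_len=30):
    """Return (start, end, score) maximising start_scores[s] + end_scores[e] with s <= e."""
    best = (0, 0, -math.inf)
    for s, s_score in enumerate(start_scores):
        # Only consider end positions at or after the start, within the length limit
        for e in range(s, min(s + max_answer_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best[2]:
                best = (s, e, score)
    return best

# Hypothetical context words and made-up scores for each word
tokens = ["the", "game", "was", "played", "in", "santa", "clara"]
start_scores = [0.1, 0.0, 0.2, 0.1, 0.3, 4.0, 1.0]
end_scores = [0.0, 0.1, 0.0, 0.2, 0.1, 1.5, 3.5]

s, e, score = best_span(start_scores, end_scores)
answer = " ".join(tokens[s:e + 1])
print(answer)
```

The constraint `s <= e` is what turns two independent per-word scores into a valid span.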

### The SQuAD Dataset

[SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) is a large dataset for QA consisting of reading passages obtained from high-quality Wikipedia articles. With each passage, the dataset contains accompanying reading comprehension questions based on the content of the passage. For each question, there are one or more answers. These questions and corresponding answers were obtained through crowdsourcing.


The SQuAD format consists of a JSON file for each dataset split. Each title has one or more paragraph entries, each consisting of the text ("context") and a list of question-answer entries. Each question-answer entry has:

- a question
- a boolean flag "is_impossible" which shows whether the question is unanswerable
- a globally unique id
- if the question is answerable, one answer entry containing the text span and its starting character index in the context; if it is not answerable, the "answers" list is empty



```
{
    "data": [
        {
            "title": "Super_Bowl_50", 
            "paragraphs": [
                {
                    "context": "Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24\u201310 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50.", 
                    "qas": [
                        {
                            "question": "Where did Super Bowl 50 take place?", 
                            "is_impossible": false,
                            "id": "56be4db0acb8001400a502ee", 
                            "answers": [
                                {
                                    "answer_start": 403,
                                    "text": "Santa Clara, California"
                                }
                            ]
                        },
                        {
                            "question": "What was the winning score of the Super Bowl 50?", 
                            "is_impossible": true,
                            "id": "56be4db0acb8001400a502ez", 
                            "answers": [
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}
...
```

The evaluation files (for validation and testing) follow the above format, except that they can provide more than one answer to the same question. The inference file also follows the above format, except that it does not require the "answers" and "is_impossible" keywords.
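
The nesting described above (data, paragraphs, qas, answers) can be walked with a few lines of Python. The sketch below uses a small made-up sample rather than a real SQuAD file, and the helper names `iter_qas` and `check_answers` are hypothetical. It checks that each answer text really occurs in the context at its `answer_start` index.

```python
def iter_qas(squad):
    """Yield (context, qa) pairs from a SQuAD-style dictionary."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                yield paragraph["context"], qa

def check_answers(squad):
    """Return ids of questions whose answer text does not match the context at answer_start."""
    bad = []
    for context, qa in iter_qas(squad):
        for ans in qa.get("answers", []):
            start = int(ans["answer_start"])
            if context[start:start + len(ans["text"])] != ans["text"]:
                bad.append(qa["id"])
    return bad

# Made-up sample in the SQuAD layout
context = "The game was played at Levi's Stadium in Santa Clara, California."
answer_text = "Santa Clara, California"
sample = {"data": [{"title": "Example", "paragraphs": [{
    "context": context,
    "qas": [
        {"question": "Where was the game played?", "id": "q1", "is_impossible": False,
         "answers": [{"text": answer_text, "answer_start": context.index(answer_text)}]},
        {"question": "Who sang the anthem?", "id": "q2", "is_impossible": True, "answers": []},
    ],
}]}]}

answerable = sum(1 for _, qa in iter_qas(sample) if qa.get("answers"))
print("answerable questions:", answerable)
print("misaligned answers:", check_answers(sample))
```

To run the same check on a downloaded file, load it with `json.load` and pass the result to `check_answers`.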

### BERT Model for QA
In this notebook, we will show how to use a pre-trained [BERT](https://arxiv.org/pdf/1810.04805.pdf) (Bidirectional Encoder Representations from Transformers) model for QA leveraging TAO. The BERT model has made major breakthroughs in Natural Language Understanding in recent years. For most applications, the model is typically trained in two phases, pre-training and fine-tuning. 
- The BERT core model can be pre-trained on large, generic datasets to generate dense vector representations of input sentence(s). 
- It can be quickly fine-tuned to perform a wide variety of tasks such as question answering, sentiment analysis, or named entity recognition.

The figure below shows a high-level block diagram of pre-training and fine-tuning BERT for QA.
<center><img src="https://developer-blogs.nvidia.com/wp-content/uploads/2020/05/bert-model-625x268.png"></center>

In alignment with the above, for pre-training we can take one of two approaches. We can either pre-train the BERT model with our own data, or use a model pre-trained by NVIDIA. After we obtain a pre-trained model, the next step is to fine-tune it for the QA task and run inference on the fine-tuned model.

<center><img src="https://developer-blogs.nvidia.com/wp-content/uploads/2020/06/Fig4revised-625x340.png"></center>

---
<a id='prepare-data'></a>
### Preparing the dataset
The SQuAD dataset is available [here](https://rajpurkar.github.io/SQuAD-explorer/). You will find that there are 2 versions of the SQuAD dataset: v1.1 and v2.0.

- SQuAD 1.1, the older version of the SQuAD dataset, contains 100,000+ question-answer pairs on 500+ articles.
- SQuAD 2.0 dataset combines the 100,000 questions in SQuAD 1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones.

#### Downloading the dataset

For convenience, you may use the code below to download the dataset.

- Set the path to a folder where you want your data to be saved

In [1]:
# IMPORTANT NOTE: Set the path to a folder where you want your data to be saved
#DATA_DOWNLOAD_DIR = "/workspace/data"
DATA_DOWNLOAD_DIR = "/home/tadesuyi/End-to-End-NLP_bootcamp/workspace/data" 

- Create the data directory if it doesn't exist

In [None]:
import os
# Create the data directory if it doesn't exist
os.makedirs(DATA_DOWNLOAD_DIR, exist_ok=True)

- simple utility function to download the squad dataset

In [None]:
import os
import urllib.request

# Simple utility function to download the SQuAD dataset
def download_squad(save_path):
    save_path = os.path.join(save_path, 'squad')

    # Create one sub-directory per dataset version
    os.makedirs(os.path.join(save_path, 'v1.1'), exist_ok=True)
    os.makedirs(os.path.join(save_path, 'v2.0'), exist_ok=True)

    # URLs for both SQuAD v1.1 and v2.0, mapped to their local file names.
    # You may modify the dict below if you choose to download only one of the two.
    base_url = 'https://rajpurkar.github.io/SQuAD-explorer/dataset/'
    download_urls = {
        base_url + 'train-v1.1.json': 'v1.1/train-v1.1.json',
        base_url + 'dev-v1.1.json': 'v1.1/dev-v1.1.json',
        base_url + 'train-v2.0.json': 'v2.0/train-v2.0.json',
        base_url + 'dev-v2.0.json': 'v2.0/dev-v2.0.json',
    }

    for url, file in download_urls.items():
        target = os.path.join(save_path, file)
        print('Downloading:', url)
        if os.path.isfile(target):
            print('** Download file already exists, skipping download **')
        else:
            # Fetch the file and write it to disk; both handles are closed afterwards
            with urllib.request.urlopen(url) as response, open(target, "wb") as handle:
                handle.write(response.read())

- This will download both v1.1 and v2.0 versions of SQuAD dataset

In [None]:
# This will download both v1.1 and v2.0 versions of SQuAD dataset
download_squad(DATA_DOWNLOAD_DIR)

- Verify that the data is present

In [2]:
# Verify that the data is present
!ls $DATA_DOWNLOAD_DIR/squad/v1.1

dev-v1.1.json
dev-v1.1.json_cache_eval_BertTokenizer_30522_384_128_64_4
train-v1.1.json
train-v1.1.json_cache_train_BertTokenizer_30522_384_128_64_-1


---
## TAO workflow
The rest of the notebook shows what a sample TAO workflow looks like.

### Setting TAO Mounts

Now that our dataset has been downloaded, an important step in using TAO is to set up the directory mounts. The TAO launcher uses Docker containers under the hood, and **for our data and results directories to be visible to the container, they need to be mapped**. The launcher can be configured using the config file `~/.tao_mounts.json`. Apart from the mounts, you can also configure additional options such as the environment variables and the amount of shared memory available to the TAO launcher. <br>

**Project directory**

<img src="images/folder_structure.png"  />


`IMPORTANT NOTE:` The code below creates a sample `~/.tao_mounts.json` file. Here, we map the directories in which we save the data, specs, results and cache. You should configure it for your specific case so that these directories are correctly visible to the Docker container. **Please also ensure that the source directories exist on your machine!**

In [3]:
%%bash
tee ~/.tao_mounts.json <<'EOF'
{
   "Mounts":[
       {
           "source": "/home/tadesuyi/End-to-End-NLP_bootcamp/workspace/data",
           "destination": "/data"
       },
       {
           "source": "/home/tadesuyi/End-to-End-NLP_bootcamp/workspace/specs",
           "destination": "/specs"
       },
       {
           "source": "/home/tadesuyi/End-to-End-NLP_bootcamp/workspace/results",
           "destination": "/results"
       },
       {
           "source": "/home/tadesuyi/End-to-End-NLP_bootcamp/workspace/.cache",
           "destination": "/root/.cache"
       }
   ]
}
EOF


- Make sure the source directories exist; if not, create them

In [7]:
# Make sure the source directories exist; if not, create them (-p is a no-op for existing directories)
! mkdir -p ~/End-to-End-NLP_bootcamp/workspace/specs
! mkdir -p ~/End-to-End-NLP_bootcamp/workspace/results
! mkdir -p ~/End-to-End-NLP_bootcamp/workspace/.cache
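
As a cross-check, the same requirement can be verified from Python. The sketch below is only an illustration: the helper `missing_sources` is a hypothetical name, and a throwaway temporary workspace stands in for your real paths. It lists the `source` entries of a mounts configuration that do not exist yet and then creates them.

```python
import os
import tempfile

def missing_sources(mounts_config):
    """Return the 'source' paths of a mounts config that do not exist on the host."""
    return [m["source"] for m in mounts_config["Mounts"] if not os.path.isdir(m["source"])]

# Throwaway workspace for demonstration; replace with your own directories
workspace = tempfile.mkdtemp()
config = {
    "Mounts": [
        {"source": os.path.join(workspace, name), "destination": dest}
        for name, dest in [("data", "/data"), ("specs", "/specs"),
                           ("results", "/results"), (".cache", "/root/.cache")]
    ]
}

before = missing_sources(config)
for path in before:
    os.makedirs(path, exist_ok=True)  # create whatever is missing
after = missing_sources(config)
print(len(before), "missing before,", len(after), "missing after")
```

To check your actual configuration, load `~/.tao_mounts.json` with `json.load` (after `os.path.expanduser`) and pass the result to `missing_sources`.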


The rest of the notebook exemplifies the simplicity of the TAO workflow. Users with basic knowledge of Deep Learning can get started building their own custom models using a simple specification file. It's essentially just one command each to run data preprocessing, training, fine-tuning, evaluation, inference, and export! All configurations happen through YAML spec files. <br>

---
### Configuration/Specification Files

The essence of all commands in TAO lies in the YAML spec files. There are sample spec files already available for you to use directly or as reference to create your own.  Through these spec files, you can tune many knobs like the model, dataset, hyperparameters, optimizer etc. Each command (like train, finetune, evaluate etc.) should have a dedicated spec file with configurations pertinent to it. <br>

Here is an example of the training spec file:

---
```
trainer:
  max_epochs: 100

model:
  tokenizer:
      tokenizer_name: ${model.language_model.pretrained_model_name} # or sentencepiece
      vocab_file: null # path to vocab file 
      tokenizer_model: null # only used if tokenizer is sentencepiece
      special_tokens: null

  language_model:
    pretrained_model_name: bert-base-uncased
    lm_checkpoint: null
    config_file: null # json file, precedence over config
    config: null

  token_classifier:
    num_layers: 1
    dropout: 0.0
    num_classes: 2
    activation: relu
    log_softmax: false
    use_transformer_init: true


training_ds:
  file: ??? # e.g. squad/v1.1/train-v1.1.json
  batch_size: 12 # per GPU

...
```
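
The spec is a nested structure, while command-line overrides address individual values with dotted keys such as `trainer.max_epochs=1`. The sketch below illustrates that mapping on a plain Python dictionary. It is not TAO's implementation: the helper names are hypothetical, values are kept as strings, and treating `???` as a marker for values that must be supplied is an assumption based on the sample spec.

```python
def apply_override(config, override):
    """Apply one 'a.b.c=value' override to a nested dict, in place (value stays a string)."""
    dotted_key, value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value

def missing_keys(config, prefix=""):
    """List dotted keys whose value is still the placeholder '???'."""
    found = []
    for key, value in config.items():
        path = prefix + key
        if isinstance(value, dict):
            found.extend(missing_keys(value, path + "."))
        elif value == "???":
            found.append(path)
    return found

# A trimmed-down stand-in for the training spec shown above
spec = {
    "trainer": {"max_epochs": 100},
    "training_ds": {"file": "???", "batch_size": 12},
}

apply_override(spec, "trainer.max_epochs=1")
missing_before = missing_keys(spec)
apply_override(spec, "training_ds.file=/data/squad/v1.1/train-v1.1.json")
missing_after = missing_keys(spec)
print(missing_before, missing_after)
```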

---

### Data Convert
For the QA task, the commands in TAO accept data in the SQuAD JSON format (refer to the `Data Preprocessing Lab`). If you have your data in any other format, be sure to convert it to the SQuAD format. Since we are using the SQuAD dataset for this notebook, we don't need to convert the data. We can proceed with training/fine-tuning directly.

---
### Set Relevant Paths
Set these paths according to your environment.

In [4]:
# The data is saved here
DATA_DIR='/data/squad'

# The configuration files are stored here
SPECS_DIR='/specs/questions_answering'

# The results are saved at this path
RESULTS_DIR='/results/questions_answering'

# Set your encryption key, and use the same key for all commands
KEY=''

Ensure that the NGC CLI is available. Running `ngc` without arguments prints its usage:

In [4]:
!ngc       


ERROR: Incomplete command received

usage: ngc [--debug] [--format_type <fmt>] [--version] [-h]
           {config,diag,pym,registry,version} ...

NVIDIA NGC CLI

optional arguments:
  -h, --help            Show this help message and exit.
  --debug               Enable debug mode.
  --format_type <fmt>   Specify the output format type. Supported formats are:
                        ascii, csv, json. Only commands that produce tabular
                        data support csv format. Default: ascii
  --version             Show the CLI version and exit.

ngc:
  {config,diag,pym,registry,version}
    config              Configuration Commands
    diag                Diagnostic Commands
    pym                 PyM Commands
    registry            Registry Commands
    version             Version Commands


---
### Downloading Specs
We can now download the spec files. You may choose to modify or rewrite these specs, or even override individual values through the launcher. You can download the default spec files by using the `download_specs` command. <br>

The `-o` argument indicates the folder where the default specification files will be downloaded, and `-r` tells the script where to save the logs. **Make sure that -o points to an empty folder!**

<img src="images/spec.png"  />

In [None]:
!tao question_answering download_specs \
    -r $RESULTS_DIR \
    -o $SPECS_DIR

---
<a id='training'></a>
### Training


Training a model using TAO is as simple as configuring your spec file and running the train command. The code cell below uses the default train.yaml available for users as reference. It is configured by default to use the `bert-base-uncased` pretrained model. Additionally, these configurations could easily be overridden using the tao-launcher CLI as shown below. For instance, below we override the `training_ds.file`, `validation_ds.file`, `trainer.max_epochs`, `training_ds.num_workers` and `validation_ds.num_workers` configurations to suit our needs. We encourage you to take a look at the .yaml spec files we provide! <br>

For training a QA model in TAO, we use the `tao question_answering train` command with the following args:
- `-e`: Path to the spec file
- `-g`: Number of GPUs to use
- `-k`: User specified encryption key to use while saving/loading the model
- `-r`: Path to a folder where the outputs should be written. Make sure this is mapped in `~/.tao_mounts.json`
- Any overrides to the spec file, e.g. `trainer.max_epochs` <br>

More details about these arguments are present in the [TAO Getting Started Guide](https://docs.nvidia.com/tao/tao-toolkit/index.html). <br>
`NOTE:` All file paths correspond to the destination mounted directory that is visible in the TAO Docker container used in the backend.<br>

Also worth noting is that the first time you run training on the dataset, it will run pre-processing and save that processed data in the same directory as the dataset.
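
Because the commands expect container-side paths, it can help to translate a host path through the mount table before passing it to the launcher. The following is a minimal sketch with a hypothetical helper `to_container_path` and made-up host directories; it only mirrors the source/destination idea of `~/.tao_mounts.json`.

```python
import posixpath

# Made-up mounts in the same shape as the "Mounts" list of ~/.tao_mounts.json
MOUNTS = [
    {"source": "/home/user/workspace/data", "destination": "/data"},
    {"source": "/home/user/workspace/results", "destination": "/results"},
]

def to_container_path(host_path, mounts):
    """Translate a host path into the path seen inside the container, or None if unmapped."""
    for m in mounts:
        src = m["source"].rstrip("/")
        if host_path == src or host_path.startswith(src + "/"):
            rel = host_path[len(src):].lstrip("/")
            return posixpath.join(m["destination"], rel) if rel else m["destination"]
    return None

train_file = to_container_path(
    "/home/user/workspace/data/squad/v1.1/train-v1.1.json", MOUNTS)
print(train_file)
print(to_container_path("/tmp/not-mounted.json", MOUNTS))
```

A result of `None` signals that the file would not be visible inside the container.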

In [5]:
# Since this is a demonstration, we just train for 1 epoch below. You may need to train for more depending on your dataset.
!tao question_answering train \
                        -e $SPECS_DIR/train.yaml \
                        -g 1  \
                        -k $KEY \
                        -r $RESULTS_DIR/train \
                        model.language_model.pretrained_model_name=bert-base-uncased \
                        training_ds.file=$DATA_DIR/v1.1/train-v1.1.json \
                        validation_ds.file=$DATA_DIR/v1.1/dev-v1.1.json \
                        trainer.max_epochs=1 \
                        training_ds.num_workers=4 \
                        validation_ds.num_workers=4 \
                        training_ds.batch_size=4 \
                        validation_ds.num_samples=4

2023-05-27 13:05:04,042 [INFO] root: Registry: ['nvcr.io']
2023-05-27 13:05:04,108 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/tadesuyi/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
[NeMo W 2023-05-27 20:05:10 experimental:27] Module <class 'nemo.collections.nlp.data.language_modeling.megatron.megatron_batch_samplers.MegatronPretrainingRandomBatchSampler'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2023-05-27 20:05:10 experimental:27] Module <class 'nemo.collections.nlp.models.text_normalization_as_tagging.thutmose_tagger.ThutmoseTaggerModel'> is experimental, not ready for production and is not fully support

**Expected output**
```bash
2022-10-17 19:31:15,445 [INFO] root: Registry: ['nvcr.io']
2022-10-17 19:31:15,588 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-pyt:v3.21.11-py3
2022-10-17 19:31:20,448 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
...
                                                                                Epoch 0, global step 8309: val_loss reached 0.60064 (best 0.60064), saving model to "/results/questions_answering/train/checkpoints/trained-model--val_loss=0.60-epoch=0.ckpt" as top 3
Epoch 0: 100%|█████| 11085/11085 [34:46<00:00,  5.31it/s, loss=1.02, lr=4.03e-7]
Validating: 0it [00:00, ?it/s]
Validating:   0%|                                         | 0/1 [00:00<?, ?it/s]
Validating: 100%|█████████████████████████████████| 1/1 [00:00<00:00,  3.20it/s][NeMo I 2022-10-17 11:08:41 qa_model:173] val exact match 100.0
[NeMo I 2022-10-17 11:08:41 qa_model:174] val f1 100.0
Epoch 0: 100%|█████| 11085/11085 [34:47<00:00,  5.31it/s, loss=1.02, lr=4.03e-7]
                                                                                Epoch 0, global step 11079: val_loss reached 0.38622 (best 0.38622), saving model to "/results/questions_answering/train/checkpoints/trained-model--val_loss=0.39-epoch=0.ckpt" as top 3
Epoch 0: 100%|█████| 11085/11085 [35:03<00:00,  5.27it/s, loss=0.99, lr=2.85e-7]
[NeMo I 2022-10-17 11:09:13 train:136] Experiment logs saved to '/results/questions_answering/train'
[NeMo I 2022-10-17 11:09:13 train:137] Trained model saved to '/results/questions_answering/train/checkpoints/trained-model.tlt'
2022-10-17 20:09:18,886 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

```

The train command produces a .tlt file called `trained-model.tlt` saved at `$RESULTS_DIR/train/checkpoints/trained-model.tlt`. This file can be fed directly into the fine-tuning stage as we see in the next block.

#### Other tips and tricks:
- To accelerate the training without loss of quality, it is possible to train with these parameters:  `trainer.amp_level="O1"` and `trainer.precision=16` for reduced precision.
- The batch size (`training_ds.batch_size`) may influence the validation accuracy. Larger batch sizes are faster to train with, however, you may get slightly better results with smaller batches.
- You can use the `trainer.val_check_interval` parameter to define how many times per epoch the validation accuracy metric is calculated and printed. For instance, `trainer.val_check_interval=0.25` will show the metric 4 times per epoch.
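
The arithmetic behind the last tip can be sketched as follows. This assumes the fractional interval is simply converted into a number of training batches by truncation; the function name and the batch count of 1000 are made up for illustration, and the exact behaviour of the trainer is not confirmed here.

```python
def validation_points(num_train_batches, val_check_interval):
    """Return the training-batch counts after which validation would run within one epoch."""
    # Assumption: a fractional interval is converted to a whole number of batches
    every = max(1, int(num_train_batches * val_check_interval))
    return list(range(every, num_train_batches + 1, every))

points = validation_points(1000, 0.25)
print(points)       # batch counts at which validation runs
print(len(points))  # number of validation runs per epoch
```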

---
### Fine-Tuning
Like many other NLP tasks, since we begin with a pretrained BERT model, the step shown above for (re)training with your custom data should do the trick. However, TAO does provide a command for fine-tuning if your use case demands it. Instead of `tao question_answering train`, we use `tao question_answering finetune`, and we specify the spec file corresponding to fine-tuning. All commands in TAO follow a similar pattern, streamlining the workflow even further!
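
Since the commands share one pattern, they can be assembled programmatically. The sketch below builds the argument list for a `tao question_answering` call using only the flags that appear in this notebook (`-e`, `-g`, `-k`, `-r`, `-m`). The helper `build_tao_command` and the key value are hypothetical, and the command is only printed, not executed.

```python
import shlex

def build_tao_command(action, spec, results_dir, key, gpus=1, model=None, overrides=None):
    """Assemble the argument list for a `tao question_answering <action>` call."""
    cmd = ["tao", "question_answering", action,
           "-e", spec, "-g", str(gpus), "-k", key, "-r", results_dir]
    if model is not None:
        cmd += ["-m", model]
    # Spec overrides are appended as key=value pairs
    for name, value in (overrides or {}).items():
        cmd.append(f"{name}={value}")
    return cmd

cmd = build_tao_command(
    "finetune",
    spec="/specs/questions_answering/finetune.yaml",
    results_dir="/results/questions_answering/finetune",
    key="my_key",  # placeholder encryption key
    model="/results/questions_answering/train/checkpoints/trained-model.tlt",
    overrides={"trainer.max_epochs": 1},
)
print(" ".join(shlex.quote(part) for part in cmd))
```

Swapping `"finetune"` for `"train"`, `"evaluate"` or `"infer"` and adjusting the spec and overrides gives the other commands used in this notebook.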

Note: If you wish to start from an already trained model for better inference results, you can find a .nemo model [here](https://ngc.nvidia.com/catalog/collections/nvidia:nemotrainingframework).

Simply rename the .nemo file to .tlt and pass it through the finetune pipeline.

In [9]:
!tao  question_answering finetune \
                        -e $SPECS_DIR/finetune.yaml \
                        -g 1 \
                        -m $RESULTS_DIR/train/checkpoints/trained-model.tlt \
                        -k $KEY \
                        -r $RESULTS_DIR/finetune \
                        finetuning_ds.file=$DATA_DIR/v2.0/train-v2.0.json \
                        validation_ds.file=$DATA_DIR/v2.0/dev-v2.0.json \
                        trainer.max_epochs=1 \
                        finetuning_ds.num_workers=4 \
                        validation_ds.num_workers=4

2023-05-25 13:25:55,279 [INFO] root: Registry: ['nvcr.io']
2023-05-25 13:25:55,349 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/tadesuyi/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
[NeMo W 2023-05-25 20:26:01 experimental:27] Module <class 'nemo.collections.nlp.data.language_modeling.megatron.megatron_batch_samplers.MegatronPretrainingRandomBatchSampler'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2023-05-25 20:26:01 experimental:27] Module <class 'nemo.collections.nlp.models.text_normalization_as_tagging.thutmose_tagger.ThutmoseTaggerModel'> is experimental, not ready for production and is not fully support

**Expected output**:

```python
...

Epoch 0: 100%|█████████▉| 6002/6008 [40:12<00:02,  2.49it/s, loss=0.83, lr=5e-6]
Epoch 0: 100%|█████████▉| 6004/6008 [40:12<00:01,  2.49it/s, loss=0.83, lr=5e-6]
Epoch 0: 100%|█████████▉| 6006/6008 [40:12<00:00,  2.49it/s, loss=0.83, lr=5e-6]
Epoch 0: 100%|██████████| 6008/6008 [40:28<00:00,  2.47it/s, loss=0.83, lr=5e-6]
Validating: 100%|█████████████████████████████| 510/510 [01:05<00:00, 10.06it/s][NeMo I 2022-10-17 11:53:49 qa_model:173] val exact match 71.5320475027373
[NeMo I 2022-10-17 11:53:49 qa_model:174] val f1 74.46986178542292
Epoch 0: 100%|██████████| 6008/6008 [41:07<00:00,  2.43it/s, loss=0.83, lr=5e-6]
                                                                                Epoch 0, global step 5497: val_loss reached 0.98476 (best 0.98476), saving model to "/results/questions_answering/finetune/checkpoints/finetuned-model--val_loss=0.98-epoch=0.ckpt" as top 3
Epoch 0: 100%|██████████| 6008/6008 [41:24<00:00,  2.42it/s, loss=0.83, lr=5e-6]
[NeMo I 2022-10-17 11:54:21 finetune:133] Experiment logs saved to '/results/questions_answering/finetune'
[NeMo I 2022-10-17 11:54:21 finetune:134] Fine-tuned model saved to '/results/questions_answering/finetune/checkpoints/finetuned-model.tlt'
2022-10-17 20:54:29,916 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

```

This command will generate a fine-tuned model, `finetuned-model.tlt`, at `$RESULTS_DIR/finetune/checkpoints`.

---
<a id='evaluation'></a>
### Evaluation
The evaluation spec .yaml is as simple as:

```
test_ds:
  file: ??? # e.g. squad/v1.1/dev-v1.1.json 
  batch_size: 32
  shuffle: false
  num_samples: 500
```

Below, we use `tao question_answering evaluate` and override the test data configuration by specifying `test_ds.file`. Other arguments follow the same pattern as before!

In [10]:
!tao question_answering evaluate \
                        -e $SPECS_DIR/evaluate.yaml \
                        -g 1 \
                        -m $RESULTS_DIR/train/checkpoints/trained-model.tlt \
                        -k $KEY \
                        -r $RESULTS_DIR/evaluate \
                        test_ds.file=$DATA_DIR/v2.0/dev-v2.0.json \
                        test_ds.batch_size=32 \
                        test_ds.num_samples=500

2023-05-25 13:49:46,384 [INFO] root: Registry: ['nvcr.io']
2023-05-25 13:49:46,454 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/tadesuyi/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
[NeMo W 2023-05-25 20:49:52 experimental:27] Module <class 'nemo.collections.nlp.data.language_modeling.megatron.megatron_batch_samplers.MegatronPretrainingRandomBatchSampler'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2023-05-25 20:49:52 experimental:27] Module <class 'nemo.collections.nlp.models.text_normalization_as_tagging.thutmose_tagger.ThutmoseTaggerModel'> is experimental, not ready for production and is not fully support

The output of the evaluation should give the exact match and F1 scores for each data point. Remember that we trained for just 1 epoch since this is a demonstration!
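
For intuition about the two metrics, here is a minimal sketch that follows the conventions of the official SQuAD evaluation script (lowercasing, stripping punctuation and articles, then comparing tokens). Whether TAO computes its `exact match` and `f1` values in exactly this way is an assumption.

```python
import collections
import re
import string

def normalize_answer(text):
    """Lowercase, then remove punctuation, articles and extra whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(truth))

def f1_score(prediction, truth):
    """Token-level F1 between the normalized prediction and ground truth."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    if not pred_tokens or not truth_tokens:
        # Both empty counts as a match; one empty counts as a miss
        return float(pred_tokens == truth_tokens)
    common = collections.Counter(pred_tokens) & collections.Counter(truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)

em = exact_match("Santa Clara", "Santa Clara, California")
f1 = f1_score("Santa Clara", "Santa Clara, California")
print(em, round(f1, 2))
```

A partially correct answer scores 0 on exact match but still earns credit on F1, which is why the two numbers are reported together.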

---
<a id='inference'></a>
### Inference
Inference with a trained or fine-tuned .tlt model uses the `tao question_answering infer` command. <br>
The infer.yaml is similarly simple:
```
# Name of  file containing data used as inputs during the inference.
infer_ds:
    file: ???  # e.g. squad/v1.1/dev-v1.1.json

```

We use the SQuAD 2.0 evaluation file for the sake of demonstration. You can also try out your own custom inputs as an exercise!
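
For custom inputs, a file in the SQuAD layout without the "answers" and "is_impossible" fields could be written as sketched below. The helper `make_inference_file`, the ids, the title and the `version` value are made up; the top-level `version` field simply mirrors the official SQuAD files and may not be required.

```python
import json
import os
import tempfile

def make_inference_file(path, context, questions):
    """Write a SQuAD-style inference file with one context and several questions."""
    qas = [{"id": f"custom-{i}", "question": q} for i, q in enumerate(questions)]
    squad = {
        "version": "custom",  # assumption: mirrors the field in official SQuAD files
        "data": [{"title": "custom", "paragraphs": [{"context": context, "qas": qas}]}],
    }
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(squad, handle, indent=2)
    return squad

out_path = os.path.join(tempfile.mkdtemp(), "custom-infer.json")
written = make_inference_file(
    out_path,
    context="The game was played on February 7, 2016, at Levi's Stadium.",
    questions=["When was the game played?", "Where was the game played?"],
)
print(out_path)
```

Save the file under a mounted data directory and point `infer_ds.file` at its container-side path.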

In [11]:
!tao question_answering infer \
                        -e $SPECS_DIR/infer.yaml \
                        -g 1 \
                        -m $RESULTS_DIR/train/checkpoints/trained-model.tlt \
                        -k $KEY \
                        -r $RESULTS_DIR/infer \
                        infer_ds.file=$DATA_DIR/v2.0/dev-v2.0.json

2023-05-25 13:52:59,187 [INFO] root: Registry: ['nvcr.io']
2023-05-25 13:52:59,253 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/tadesuyi/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
[NeMo W 2023-05-25 20:53:04 experimental:27] Module <class 'nemo.collections.nlp.data.language_modeling.megatron.megatron_batch_samplers.MegatronPretrainingRandomBatchSampler'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2023-05-25 20:53:05 experimental:27] Module <class 'nemo.collections.nlp.models.text_normalization_as_tagging.thutmose_tagger.ThutmoseTaggerModel'> is experimental, not ready for production and is not fully support



2023-05-25 13:56:46,468 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.


**Expected result:**
```
...
[NeMo I 2022-10-17 12:11:51 infer:92] Question: What is the force exerted by standard gravity on one ton of mass?
[NeMo I 2022-10-17 12:11:51 infer:93] Answer  : kilogram-force
[NeMo I 2022-10-17 12:11:51 infer:92] Question: What force leads to a commonly used unit of mass?
[NeMo I 2022-10-17 12:11:51 infer:93] Answer  : kilogram-force
[NeMo I 2022-10-17 12:11:51 infer:92] Question: What force is part of the modern SI system?
[NeMo I 2022-10-17 12:11:51 infer:93] Answer  : kilogram-force
[NeMo I 2022-10-17 12:11:51 infer:96] Experiment logs saved to '/results/questions_answering/infer'
2022-10-17 21:11:53,005 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
```
<img src="images/infer.png"  />

The results of inference are saved at `$RESULTS_DIR/infer/prediction.txt` in the format:
```
{
    <question_id>: <answer>,
    ...
}
```
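
Because the predictions are keyed by question id only, it can help to join them back to the question text. The following is a minimal sketch, assuming the predictions file parses as JSON in the shape shown above and that the input follows the SQuAD layout; the helper names are illustrative.

```python
import json


def load_json(path):
    """Read a JSON document from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def questions_by_id(squad):
    """Map every question id in a SQuAD-format dict to its question text."""
    lookup = {}
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                lookup[qa["id"]] = qa["question"]
    return lookup


def pair_predictions(squad, predictions):
    """Return (question, answer) tuples for every predicted question id."""
    lookup = questions_by_id(squad)
    return [
        (lookup.get(qid, "<unknown id>"), answer)
        for qid, answer in predictions.items()
    ]
```

A typical call would load the dev file and the predictions file with `load_json` and pass both results to `pair_predictions`.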
A file with n-best results for each question is also saved at `$RESULTS_DIR/infer/nbest.txt` in the format:
```
{
    <question_id>: [
        {
            "text": <answer-1>,
            "probability": 0.9576789114427999,
            "start_logit": 7.168248653411865,
            "end_logit": 6.93817138671875
        },
        {
            "text": <answer-2>,
            "probability": 0.02714239211951417,
            "start_logit": 7.168248653411865,
            "end_logit": 3.374755620956421
        },
    ...
```
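
The n-best file can be used to separate confident answers from uncertain ones. The sketch below assumes the file parses as JSON with the `text` and `probability` fields shown above; the function names and the 0.5 threshold are illustrative choices, not part of TAO.

```python
import json


def load_nbest(path):
    """Read the n-best results file (assumed to be valid JSON)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def best_candidates(nbest, min_probability=0.5):
    """Split questions by whether their top candidate reaches min_probability."""
    confident, uncertain = {}, {}
    for qid, candidates in nbest.items():
        if not candidates:
            continue  # nothing to rank for this question
        top = max(candidates, key=lambda c: c["probability"])
        target = confident if top["probability"] >= min_probability else uncertain
        target[qid] = (top["text"], top["probability"])
    return confident, uncertain
```

Questions that land in the `uncertain` dictionary are good candidates for manual review or for additional training data.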

---
<a id='export-onnx'></a>
### Export to ONNX

[ONNX](https://onnx.ai/) is a popular open format for machine learning models. It enables interoperability between different frameworks, easing the path to production. 

TAO provides commands to export the .tlt model to the ONNX format in an .eonnx archive.  The `export_format` configuration can be set to `ONNX` to achieve this.

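Sample usage of the `tao question_answering export` command is shown in the following code cell. Note that the cell sets `export_format=ONNX` on the command line, overriding the value in the spec file. The export.yaml file we use looks like: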
```
# Name of the .tlt EFF archive to be loaded/model to be exported.
restore_from: trained-model.tlt

# Set export format: ONNX | RIVA
export_format: RIVA

# Output EFF archive containing ONNX.
export_to: exported-model.eonnx
```


In [12]:
!tao question_answering export \
                        -e $SPECS_DIR/export.yaml \
                        -g 1 \
                        -m $RESULTS_DIR/train/checkpoints/trained-model.tlt \
                        -k $KEY \
                        -r $RESULTS_DIR/export \
                        export_format=ONNX

2023-05-25 14:00:27,916 [INFO] root: Registry: ['nvcr.io']
2023-05-25 14:00:27,984 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/tadesuyi/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
[NeMo W 2023-05-25 21:00:33 experimental:27] Module <class 'nemo.collections.nlp.data.language_modeling.megatron.megatron_batch_samplers.MegatronPretrainingRandomBatchSampler'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2023-05-25 21:00:33 experimental:27] Module <class 'nemo.collections.nlp.models.text_normalization_as_tagging.thutmose_tagger.ThutmoseTaggerModel'> is experimental, not ready for production and is not fully support

This command exports the model as `exported-model.eonnx`, which is essentially an archive containing the .onnx model.

---
### Inference using ONNX

TAO provides the capability to use the exported .eonnx model for inference. The command `tao question_answering infer_onnx` is very similar to the inference command for .tlt models. Again, the input file used is just for demo purposes; you may choose to try out your own custom input!

In [None]:
!tao question_answering infer_onnx \
                        -e $SPECS_DIR/infer_onnx.yaml \
                        -g 1 \
                        -m $RESULTS_DIR/export/exported-model.eonnx \
                        -k $KEY \
                        -r $RESULTS_DIR/infer_onnx \
                        infer_ds.file=$DATA_DIR/v2.0/dev-v2.0.json

The output of `infer_onnx` is similar to that of the earlier `infer` command: a `prediction.txt` and an `nbest.txt` file are generated. You can configure the exact name and location of these result files in the spec YAML.
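
One way to check that the exported model behaves like the original is to compare the two prediction files. This is a minimal sketch, assuming both files parse as JSON dictionaries keyed by question id; the helper names are illustrative and the file locations depend on your spec file.

```python
import json


def load_predictions(path):
    """Read a predictions file (assumed to be a JSON object of id -> answer)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def agreement(predictions_a, predictions_b):
    """Return the share of common question ids with identical answers, plus the mismatched ids."""
    shared = sorted(set(predictions_a) & set(predictions_b))
    mismatches = [qid for qid in shared if predictions_a[qid] != predictions_b[qid]]
    rate = 1.0 - len(mismatches) / len(shared) if shared else 0.0
    return rate, mismatches
```

Small differences between the two runs are possible, so a handful of mismatched ids does not by itself indicate a failed export.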

---
<a id='export-riva'></a>
### Export to Riva

With TAO, you can also export your model in a format that can be deployed using [NVIDIA Riva](https://developer.nvidia.com/riva), a highly performant application framework for multi-modal conversational AI services using GPUs! The same command used for exporting to ONNX can be used here. The only variation is the `export_format` configuration, set here to `RIVA`, together with an `export_to` file name.

In [13]:
!tao question_answering export \
                        -e $SPECS_DIR/export.yaml \
                        -g 1 \
                        -m $RESULTS_DIR/train/checkpoints/trained-model.tlt \
                        -k $KEY \
                        -r $RESULTS_DIR/export_riva \
                        export_format=RIVA \
                        export_to=qa-model.riva

2023-05-25 14:01:44,588 [INFO] root: Registry: ['nvcr.io']
2023-05-25 14:01:44,653 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/tadesuyi/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
[NeMo W 2023-05-25 21:01:50 experimental:27] Module <class 'nemo.collections.nlp.data.language_modeling.megatron.megatron_batch_samplers.MegatronPretrainingRandomBatchSampler'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2023-05-25 21:01:51 experimental:27] Module <class 'nemo.collections.nlp.models.text_normalization_as_tagging.thutmose_tagger.ThutmoseTaggerModel'> is experimental, not ready for production and is not fully support


**Your output should look similar to the one shown below:**
```
...
[NeMo I 2022-11-01 12:33:56 export:59] Model restored from '/results/questions_answering/train/checkpoints/trained-model.tlt'
[NeMo I 2022-11-01 12:34:11 export:83] Experiment logs saved to '/results/questions_answering/export_riva'
[NeMo I 2022-11-01 12:34:11 export:84] Exported model to '/results/questions_answering/export_riva/qa-model.riva'
[NeMo I 2022-11-01 12:34:12 export:95] Exported model is compliant with Riva
2022-11-01 21:34:17,668 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

```
<img src="images/riva_export.png"  />

The model is exported as `qa-model.riva`, which is in a format suited for deployment in Riva.

---
## What's Next?

You could use TAO to build custom models for your own applications, or you could deploy the custom model to NVIDIA Riva!

---
## Licensing

Copyright © 2022 OpenACC-Standard.org. This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials include references to hardware and software developed by other entities; all applicable licensing and copyrights apply.
