<p> <center> <a href="../../Start_Here.ipynb">Home Page</a> </center> </p>

 
<div>
    <span style="float: left; width: 33%; text-align: left;"><a href="NeMo_Primer.ipynb">Previous Notebook</a></span>
    <span style="float: left; width: 33%; text-align: center;">
        <a href="NeMo_Primer.ipynb">1</a>
        <a>2</a>
        <a href="Multitask_Prompt_and_PTuning.ipynb">3</a>
        <a href="demo.ipynb">4</a>
    </span>
    <span style="float: left; width: 33%; text-align: right;"><a href="Multitask_Prompt_and_PTuning.ipynb">Next Notebook</a></span>
</div>

# NeMo Question Answering

## Overview

This tutorial demonstrates how to train, evaluate, and test two types of Question-Answering models:
1. BERT-like models for Extractive Question-Answering
2. Sequence-to-Sequence (S2S) models for Generative Question-Answering (e.g. T5/BART-like)


## Task Description

- Given a context and a natural language query, we want to generate an answer for the query
- Depending on how the answer is generated, the task can be broadly divided into two types:
    1. Extractive Question Answering
    2. Generative Question Answering


## Extractive Question-Answering with BERT-like models

Given a question and a context, both in natural language, the model predicts the span within the context, defined by a start and an end position, that answers the question.
For every word in the context, we predict:
- likelihood this word is the start of the span 
- likelihood this word is the end of the span

We use a BERT encoder with two span prediction heads that predict the start and end positions of the answer. The span prediction heads are token classifiers consisting of a single linear layer.
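
The two heads can be pictured as producing one start score and one end score per token. The sketch below shows one common way an answer span could then be chosen from such scores. The scores are made up and the helper `best_span` is hypothetical; this is an illustration of the idea, not NeMo's implementation.

```python
from typing import List, Tuple


def best_span(start_scores: List[float], end_scores: List[float],
              max_answer_len: int = 30) -> Tuple[int, int]:
    """Return (start, end) maximizing start_scores[s] + end_scores[e] with s <= e."""
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        # the end token may not precede the start token
        last = min(len(end_scores), s + max_answer_len)
        for e in range(s, last):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


# Illustrative (made-up) scores for a 6-token context
tokens = ["fame", "in", "the", "late", "1990s", "as"]
start_scores = [0.1, 2.5, 0.3, 0.2, 0.4, 0.0]
end_scores = [0.0, 0.1, 0.2, 0.3, 3.0, 0.5]

start, end = best_span(start_scores, end_scores)
answer = " ".join(tokens[start:end + 1])
print(start, end, answer)
```

Scoring every valid (start, end) pair jointly, rather than taking the two maxima independently, guarantees that the end never comes before the start.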

### BERT Model for QA

The [BERT](https://arxiv.org/pdf/1810.04805.pdf) (Bidirectional Encoder Representations from Transformers) model has made significant breakthroughs in Natural Language Understanding in recent years. For most applications, the model is trained in two stages: pre-training and fine-tuning.
- The BERT core model can be pre-trained on large, generic datasets to generate dense vector representations of input sentence(s). 
- It can be quickly fine-tuned to perform tasks such as question/answering, sentiment analysis, or named entity recognition.


The figure below shows a high-level block diagram of pre-training and fine-tuning BERT for QA.
<center><img src="https://developer-blogs.nvidia.com/wp-content/uploads/2020/05/bert-model-625x268.png"></center>

## Generative Question-Answering with Sequence-2-Sequence model

Given a question and a context, both in natural language, generate an answer for the question. Unlike the BERT-like models, there is no constraint that the answer should be a span within the context.

### BART Model

[BART](https://arxiv.org/abs/1910.13461) is a denoising autoencoder for pretraining sequence-to-sequence models. It uses a neural machine translation architecture with a bidirectional encoder (as in BERT) and a left-to-right decoder (as in GPT).
During training, BART injects noise into the original text and learns to reconstruct it: the encoder is fed a corrupted version of the tokens, and the decoder is trained to reproduce the original tokens.

The figure below depicts a scenario where a corrupted document (left) is encoded with a bidirectional model, and then the likelihood of the original document (right) is calculated with an autoregressive decoder <i>(source: https://arxiv.org/abs/1910.13461)</i>.

<center><img src="images/BRAT_example.JPG"></center>
<center>source paper - <i>BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension</i></center>
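
As a rough illustration of the noising step, the sketch below replaces a contiguous span of tokens with a single mask token, loosely following the text-infilling corruption described in the BART paper. The helper `corrupt_span` and the `<mask>` string are illustrative assumptions, not the tokenizer or code used by NeMo.

```python
from typing import List


def corrupt_span(tokens: List[str], start: int, length: int,
                 mask_token: str = "<mask>") -> List[str]:
    """Replace tokens[start:start + length] with a single mask token."""
    return tokens[:start] + [mask_token] + tokens[start + length:]


original = "the sun is a massive ball of hot glowing gases".split()
corrupted = corrupt_span(original, start=3, length=3)

print(" ".join(corrupted))  # corrupted text, fed to the encoder
print(" ".join(original))   # original text, the reconstruction target
```

Because the whole span collapses into one mask token, the model has to infer both the missing words and how many of them there are.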

## Stanford Question Answering Dataset (SQuAD)

The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. Each question is asked about a text paragraph known as the context. Answers to some questions may not exist within the context; those questions are unanswerable. The previous version of the dataset, `SQuAD 1.1`, contains 100k+ question-answer pairs on 500+ articles. The latest version, `SQuAD 2.0`, combines the questions from SQuAD 1.1 with more than 50k unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. The official `SQuAD 2.0` dataset is split into train, dev, and test sets, of which only the train and dev sets are publicly available. It is distributed under the [CC BY-SA 4.0 license](https://creativecommons.org/licenses/by-sa/4.0/legalcode) and can be downloaded [here](https://rajpurkar.github.io/SQuAD-explorer/). 

#### Data format

- **version**: represents the version of the SQuAD JSON dataset
- **data**: contains the actual data that includes titles and `paragraphs`
- **title**: represents the domain or topic of discussion, or the title of the document or webpage from which the text in `paragraphs` is drawn
- **paragraphs**: contains a list of `qas` and `context`
- **qas**: a list containing the questions `(question)`, a unique id for each question `(id)`, and the corresponding answers `(answers)`. If a question is impossible to answer, the `is_impossible` flag is set to True; otherwise, it is set to False. In the answers list, `text` is the answer to the question, while `answer_start` is the character index at which the answer starts within the context.
- **context**: represents a sentence or group of sentences in which the answer(s) to the question(s) lie. A single context can have one or more questions. In the example below, each list of questions is drawn from a single context.


To aid your understanding of the SQuAD 2.0 JSON format, a simplified structural overview is presented below.

```python
{
    'version': 'v2.0',
         'data': [
                   {
                   'title': 'Beyoncé',
              'paragraphs': [
                              {
                                'qas': [
                                            {
                                                'question': 'When did Beyonce start becoming popular?',
                                                      'id': '56be85543aeaaa14008c9063',
                                                 'answers': [{'text': 'in the late 1990s', 'answer_start': 269}],
                                           'is_impossible': False
                                            },
                                            {
                                                'question': 'What areas did Beyonce compete in when she was growing...',
                                                      'id': '56be85543aeaaa14008c9065',
                                                 'answers': [{'text': 'singing and dancing', 'answer_start': 207}],
                                           'is_impossible': False
                                            }
                                       ], #closing qas list
                 
                            'context': 'Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".'
                             }, 
                  
         .....................................................
                          ]# closing paragraphs
                    }, #closing brace before title
             
                    {# opening title
                     'title': 'Matter',
              'paragraphs': [{
                              'qas': [
                                       {
                                      'plausible_answers': [{'text': 'ordinary matter composed of atoms',...........}],
                                               'question': 'What did the term matter include after the 20th century?',
                                                     'id': '5a7db48670df9f001a87505f',
                                                'answers': [],
                                          'is_impossible': True
                                       },
                                       {..........................}
                                     ]
                           'context': .............................
                             }, 
                             {
                               'qas': [
                                            {
                                       'plausible_answers': [{'text': 'matter', 'answer_start': 485}],
                                                'question': 'Physics has broadly agreed on the definition of what?',
                                                      'id': '5a7e070b70df9f001a875439',
                                                 'answers': [],
                                           'is_impossible': True
                                            },
                                            {
                                       'plausible_answers': [{'text': 'Alfvén', 'answer_start': 327}],
                                                'question': 'Who coined the term partonic matter?',
                                                      'id': '5a7e070b70df9f001a87543a',
                                                 'answers': [],
                                           'is_impossible': True
                                            }
                                      ], #closing qas list
                         'context': 'The term "matter" is used throughout physics in a bewildering variety of contexts: for example, one refers to "condensed matter physics", "elementary matter", "partonic" matter, "dark" matter, "anti"-matter, "strange" matter, and "nuclear" matter. In discussions of matter and antimatter, normal matter has been referred to by Alfvén as koinomatter (Gk. common matter). It is fair to say that in physics, there is no broad consensus as to a general definition of matter, and the term "matter" usually is used in conjunction with a specifying modifier.'
                          }#closing last qas & context within paragraph
                      ] #closing paragraph
                } #closing brace before title
            ] #closing data
} #closing json brace
  

```
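
To make the nesting concrete, the following sketch walks a tiny hand-written record in the same layout and checks that each `answer_start` offset points at the answer text. The record and the helper `iter_questions` are illustrative only; they are not part of the SQuAD tooling or of NeMo.

```python
from typing import Dict, Iterator, Tuple

# A hand-written record in SQuAD 2.0 layout (illustrative data)
sample = {
    "version": "v2.0",
    "data": [
        {
            "title": "Sun",
            "paragraphs": [
                {
                    "context": "The sun is a massive ball of hot, glowing gases.",
                    "qas": [
                        {
                            "question": "What is the sun made of?",
                            "id": "demo-1",
                            "answers": [{"text": "hot, glowing gases", "answer_start": 29}],
                            "is_impossible": False,
                        },
                        {
                            "question": "How old is the sun?",
                            "id": "demo-2",
                            "answers": [],
                            "is_impossible": True,
                        },
                    ],
                }
            ],
        }
    ],
}


def iter_questions(squad: Dict) -> Iterator[Tuple[str, Dict]]:
    """Yield (context, qa) pairs from a SQuAD-format dictionary."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                yield paragraph["context"], qa


for context, qa in iter_questions(sample):
    if qa.get("is_impossible", False):
        print(qa["id"], "-> unanswerable")
        continue
    for answer in qa["answers"]:
        start = answer["answer_start"]
        # answer_start is a character offset into the context
        span = context[start:start + len(answer["text"])]
        assert span == answer["text"]
        print(qa["id"], "->", span)
```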

Import the needed libraries and models for the Question Answering task. 

In [None]:
import os
import wget
import gc

import pytorch_lightning as pl
from omegaconf import OmegaConf

from nemo.collections.nlp.models.question_answering.qa_bert_model import BERTQAModel
from nemo.collections.nlp.models.question_answering.qa_s2s_model import S2SQAModel
from nemo.utils.exp_manager import exp_manager

pl.seed_everything(42)
gc.disable()

Run the cell below to set the directory path to store the dataset and the output files

In [None]:
DATA_DIR = "/workspace/data" # directory for storing datasets
WORK_DIR = "/workspace/results/nemo_question_answering" # directory for storing trained models, logs
script_dir = "/workspace/source_code/nemo_question_answering"

Download and preprocess the SQuAD dataset using the `get_squad.py` script

In [None]:
!python $script_dir/get_squad.py --destDir $DATA_DIR

##### Expected Output
```python
[NeMo I 2023-08-14 04:32:25 get_squad:66] /workspace/data/
[NeMo I 2023-08-14 04:32:25 get_squad:47] Downloading: https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
[NeMo I 2023-08-14 04:32:26 get_squad:47] Downloading: https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
[NeMo I 2023-08-14 04:32:27 get_squad:47] Downloading: https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json
[NeMo I 2023-08-14 04:32:27 get_squad:47] Downloading: https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json
```

After executing the above cell, your data folder will contain a `squad` subfolder with four files for training and evaluation:

```
squad  
│
└───v1.1
│   │ -  train-v1.1.json
│   │ -  dev-v1.1.json
│
└───v2.0
    │ -  train-v2.0.json
    │ -  dev-v2.0.json
```

In [None]:
!ls -LR {DATA_DIR}/squad

# Configuration

To proceed with the QA task, models have to be defined in the config file. The config file has multiple important sections that include:

- **model**: All arguments that will relate to the Model - language model, span prediction, optimizer and schedulers, datasets and any other related information
- **trainer**: Any argument to be passed to PyTorch Lightning
- **exp_manager**: All arguments used for setting up the experiment manager - target directory, name, logger information

We will set the path to the default config file `qa_conf.yaml` and edit necessary values for training different models

In [None]:
config_dir = '/workspace/source_code/nemo_question_answering/conf'

Run the cell below to print the entire default config

In [None]:
config_path = f'{config_dir}/qa_conf.yaml'
print(config_path)
config = OmegaConf.load(config_path)
print("Default Config - \n")
print(OmegaConf.to_yaml(config))

### Set dataset config values

Important parameters to be set include the paths to the train, validation, and test sets; the batch size; and the number of training, validation, and test samples.

In [None]:
# if True, model will load features from cache if file is present, or
# create features and dump to cache file if not already present
config.model.dataset.use_cache = False

# indicates whether the dataset has unanswerable questions
config.model.dataset.version_2_with_negative = True

# indicates whether the dataset is of extractive nature or not
# if True, context spans/chunks that do not contain answer are treated as unanswerable 
config.model.dataset.check_if_answer_in_context = True

# set file paths for train, validation, and test datasets
config.model.train_ds.file = f"{DATA_DIR}/squad/v2.0/train-v2.0.json"
config.model.validation_ds.file = f"{DATA_DIR}/squad/v2.0/dev-v2.0.json"
config.model.test_ds.file = f"{DATA_DIR}/squad/v2.0/dev-v2.0.json"

# set batch sizes for train, validation, and test datasets
config.model.train_ds.batch_size = 8
config.model.validation_ds.batch_size = 8
config.model.test_ds.batch_size = 8

# set number of samples to be used from dataset. setting to -1 uses entire dataset
config.model.train_ds.num_samples = 5000
config.model.validation_ds.num_samples = 1000
config.model.test_ds.num_samples = 100

### Set trainer config values

These values include the maximum number of epochs, maximum steps, device, accelerator, and trainer strategy.
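
As a back-of-the-envelope check on training length, the number of optimizer steps can be estimated from the sample count, batch size, and epoch count. The sketch below uses the 5,000 training samples and batch size of 8 set earlier, together with the 5 epochs used in this tutorial. It assumes one feature per sample and no gradient accumulation, so the real count may be higher when long contexts are split into several spans.

```python
import math

num_samples = 5000   # config.model.train_ds.num_samples
batch_size = 8       # config.model.train_ds.batch_size
max_epochs = 5       # config.trainer.max_epochs

steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * max_epochs

print(steps_per_epoch)  # 625
print(total_steps)      # 3125
```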

In [None]:
config.trainer.max_epochs = 5
config.trainer.max_steps = -1 # -1 disables the step limit; a positive value takes precedence over max_epochs
config.trainer.precision = 16
# 0 for CPU, or a list of the GPUs to use, e.g. [0]. This tutorial does not support multiple GPUs;
# if needed, please use NeMo/examples/nlp/question_answering/question_answering.py
config.trainer.devices = [0]
config.trainer.accelerator = "gpu"
config.trainer.strategy="dp"

### Set experiment manager config values

In [None]:
config.exp_manager.exp_dir = WORK_DIR
config.exp_manager.name = "QA-SQuAD2"
config.exp_manager.create_wandb_logger=False

## Training and Testing Models

In this section, we show how to train and test BERT and BART models using the SQuAD dataset. 

### BERT Model

#### Set Model Config Values
- `bert-base-uncased` is set as the pretrained model and also as the tokenizer name.
- The output model is saved to `{WORK_DIR}/checkpoints/bert_squad_v2_0.nemo`.

In [None]:
# set language model and tokenizer to be used
# tokenizer is derived from model if a tokenizer name is not provided
config.model.language_model.pretrained_model_name = "bert-base-uncased"
config.model.tokenizer.tokenizer_name = "bert-base-uncased"

# path where model will be saved
config.model.nemo_path = f"{WORK_DIR}/checkpoints/bert_squad_v2_0.nemo"

config.exp_manager.create_checkpoint_callback = True

config.model.optim.lr = 3e-5

#### Create Trainer and Initialize Model

In [None]:
trainer = pl.Trainer(**config.trainer)
model = BERTQAModel(config.model, trainer=trainer)

##### Expected Output
```python
[NeMo I 2023-08-14 04:32:37 tokenizer_utils:130] Getting HuggingFace AutoTokenizer with pretrained_model_name: bert-base-uncased, vocab_file: None, merges_files: None, special_tokens_dict: {}, and use_fast: False
[NeMo I 2023-08-14 04:32:38 qa_processing:106] mean no. of chars in doc: 839.2727272727273
[NeMo I 2023-08-14 04:32:38 qa_processing:107] max no. of chars in doc: 1895
...
[NeMo I 2023-08-14 04:32:39 qa_bert_dataset:115] Preprocessing data into features.
  0%|                                                                                                          | 0/5000 [00:00<?, ?it/s]
[NeMo I 2023-08-14 04:32:39 qa_bert_dataset:264] *** Example ***
[NeMo I 2023-08-14 04:32:39 qa_bert_dataset:265] unique_id: 1000000000
[NeMo I 2023-08-14 04:32:39 qa_bert_dataset:266] example_index: 0
[NeMo I 2023-08-14 04:32:39 qa_bert_dataset:267] doc_span_index: 0
...
[NeMo I 2023-08-14 04:32:49 qa_bert_dataset:283] start_position: 49
[NeMo I 2023-08-14 04:32:49 qa_bert_dataset:284] end_position: 49
[NeMo I 2023-08-14 04:32:49 qa_bert_dataset:285] answer: france
100%|████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 505.52it/s]
[NeMo I 2023-08-14 04:32:49 qa_bert_dataset:90] Converting dict features into object features

100%|█████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 554802.12it/s]
```

#### Train, Test, and Save the Model

The maximum number of epochs is set to 5. Note that one epoch may take up to 7 minutes depending on the device. Increasing the maximum number of epochs (`config.trainer.max_epochs`) may give better results at the cost of longer execution time.

In [None]:
trainer.fit(model)
trainer.test(model)

model.save_to(config.model.nemo_path)

##### Expected Output
```python
...
`Trainer.fit` stopped: `max_epochs=5` reached.
...
    
Testing: 0it [00:00, ?it/s]
[NeMo I 2023-08-16 23:06:48 qa_bert_model:140] test exact: 35.0
[NeMo I 2023-08-16 23:06:48 qa_bert_model:140] test f1: 37.904761904761905
[NeMo I 2023-08-16 23:06:48 qa_bert_model:140] test total: 100.0
[NeMo I 2023-08-16 23:06:48 qa_bert_model:140] test HasAns_exact: 77.77777777777777
[NeMo I 2023-08-16 23:06:48 qa_bert_model:140] test HasAns_f1: 84.23280423280423
[NeMo I 2023-08-16 23:06:48 qa_bert_model:140] test HasAns_total: 45.0
[NeMo I 2023-08-16 23:06:48 qa_bert_model:140] test NoAns_exact: 0.0
[NeMo I 2023-08-16 23:06:48 qa_bert_model:140] test NoAns_f1: 0.0
[NeMo I 2023-08-16 23:06:48 qa_bert_model:140] test NoAns_total: 55.0
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│     test_HasAns_exact     │     77.77777862548828     │
│      test_HasAns_f1       │     84.23280334472656     │
│     test_HasAns_total     │           45.0            │
│     test_NoAns_exact      │            0.0            │
│       test_NoAns_f1       │            0.0            │
│     test_NoAns_total      │           55.0            │
│        test_exact         │           35.0            │
│          test_f1          │    37.904762268066406     │
│         test_loss         │    11.044330596923828     │
│        test_total         │           100.0           │
└───────────────────────────┴───────────────────────────┘

```
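
The `exact` and `f1` values are the usual SQuAD metrics. The sketch below shows how they are commonly computed for a single prediction, following the normalization of the official SQuAD evaluation script (lowercasing, then removing punctuation and the articles a/an/the). It is a simplified illustration; NeMo's own implementation may differ in detail.

```python
import collections
import re
import string

PUNCTUATION = set(string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction: str, truth: str) -> float:
    """Token-level F1 between the normalized prediction and ground truth."""
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    if not pred_tokens or not true_tokens:
        # two empty answers count as a match (unanswerable predicted as unanswerable)
        return float(pred_tokens == true_tokens)
    common = collections.Counter(pred_tokens) & collections.Counter(true_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("the 10th century", "10th century"))
print(exact_match("10th century", "first half of the 10th century"))
print(f1_score("10th century", "first half of the 10th century"))
```

The first pair counts as an exact match because articles are removed before comparison. The second pair is not an exact match but still earns partial F1 credit for the two shared tokens.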

#### Load the Saved Model and Run Inference

When running inference, you may see that not all responses match the expected output. This is expected because the model was trained for only a small number of epochs. You can increase the value of `config.trainer.max_epochs` above and retrain to see better results.

In [None]:
Bmodel = BERTQAModel.restore_from(config.model.nemo_path)

eval_device = [config.trainer.devices[0]] if isinstance(config.trainer.devices, list) else 1
Bmodel.trainer = pl.Trainer(
    devices=eval_device,
    accelerator=config.trainer.accelerator,
    precision=16,
    logger=False,
)

config.exp_manager.create_checkpoint_callback = False
exp_dir = exp_manager(Bmodel.trainer, config.exp_manager)
output_nbest_file = os.path.join(exp_dir, "output_nbest_file.json")
output_prediction_file = os.path.join(exp_dir, "output_prediction_file.json")

all_preds, all_nbest = Bmodel.inference(
    config.model.test_ds.file,
    output_prediction_file=output_prediction_file,
    output_nbest_file=output_nbest_file,
    num_samples=10, # setting to -1 will use all samples for inference
)

for question_id in all_preds:
    print(all_preds[question_id])

##### Expected Output
```python
...
[NeMo I 2023-08-16 23:07:20 save_restore_connector:247] Model BERTQAModel was successfully restored from /workspace/results/nemo_question_answering/checkpoints/bert_squad_v2_0.nemo.
...
[NeMo I 2023-08-16 23:07:20 exp_manager:370] Experiments will be logged at /workspace/results/nemo_question_answering/QA-SQuAD2/2023-08-16_21-35-22
[NeMo I 2023-08-16 23:07:20 exp_manager:788] TensorboardLogger has been set up

100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 394.84it/s]

100%|███████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 124091.83it/s]

100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 74764.78it/s]

100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 11735.60it/s]
France
10th and 11th centuries
Denmark, Iceland and Norway
Rollo
10th century
The Normans
Normandy
The Normans
first half of the 10th century
William the Conqueror
```

### S2S BART Model

#### Set Model Config Values

- `facebook/bart-base` is set as the pretrained model and also as the tokenizer name.
- The output model is saved to `{WORK_DIR}/checkpoints/bart_squad_v2_0.nemo`.

In [None]:
# set language model and tokenizer to be used
# tokenizer is derived from model if a tokenizer name is not provided
config.model.language_model.pretrained_model_name = "facebook/bart-base"
config.model.tokenizer.tokenizer_name = "facebook/bart-base"

# path where model will be saved
config.model.nemo_path = f"{WORK_DIR}/checkpoints/bart_squad_v2_0.nemo"

config.exp_manager.create_checkpoint_callback = True

config.model.optim.lr = 5e-5

#remove vocab_file from gpt model
config.model.tokenizer.vocab_file = None

#### Create Trainer and Initialize Model

In [None]:
trainer = pl.Trainer(**config.trainer)
model = S2SQAModel(config.model, trainer=trainer)

##### Expected Output
```python
[NeMo I 2023-08-14 04:35:28 tokenizer_utils:130] Getting HuggingFace AutoTokenizer with pretrained_model_name: facebook/bart-base, vocab_file: None, merges_files: None, special_tokens_dict: {}, and use_fast: False
[NeMo I 2023-08-14 04:35:28 qa_processing:106] mean no. of chars in doc: 839.2727272727273
[NeMo I 2023-08-14 04:35:28 qa_processing:107] max no. of chars in doc: 1895
[NeMo I 2023-08-14 04:35:28 qa_processing:106] mean no. of chars in doc: 677.5487804878048
[NeMo I 2023-08-14 04:35:28 qa_processing:107] max no. of chars in doc: 1782
...
[NeMo I 2023-08-14 04:35:42 qa_processing:107] max no. of chars in doc: 1364
[NeMo I 2023-08-14 04:35:42 qa_processing:106] mean no. of chars in doc: 848.4090909090909
[NeMo I 2023-08-14 04:35:42 qa_processing:107] max no. of chars in doc: 1901
[NeMo I 2023-08-14 04:35:42 qa_s2s_dataset:103] Preprocessing data into features.
100%|████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 512.77it/s]
[NeMo I 2023-08-14 04:35:42 qa_s2s_dataset:73] Converting dict features into object features

100%|█████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 789887.76it/s]
```

#### Train, Test, and Save the Model

The maximum number of epochs is set to 5 (note that one epoch takes about 7 minutes). Increasing the value may give better results at the cost of longer execution time.

In [None]:
trainer.fit(model)
trainer.test(model)

model.save_to(config.model.nemo_path)

##### Expected Output
```python
...    
`Trainer.fit` stopped: `max_epochs=5` reached.
...
 
Testing: 0it [00:00, ?it/s]
[NeMo I 2023-08-16 21:50:01 qa_s2s_model:114] test exact: 29.0
[NeMo I 2023-08-16 21:50:01 qa_s2s_model:114] test f1: 32.60714285714286
[NeMo I 2023-08-16 21:50:01 qa_s2s_model:114] test total: 100.0
[NeMo I 2023-08-16 21:50:01 qa_s2s_model:114] test HasAns_exact: 64.44444444444444
[NeMo I 2023-08-16 21:50:01 qa_s2s_model:114] test HasAns_f1: 72.46031746031747
[NeMo I 2023-08-16 21:50:01 qa_s2s_model:114] test HasAns_total: 45.0
[NeMo I 2023-08-16 21:50:01 qa_s2s_model:114] test NoAns_exact: 0.0
[NeMo I 2023-08-16 21:50:01 qa_s2s_model:114] test NoAns_f1: 0.0
[NeMo I 2023-08-16 21:50:01 qa_s2s_model:114] test NoAns_total: 55.0
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│     test_HasAns_exact     │     64.44444274902344     │
│      test_HasAns_f1       │     72.46031951904297     │
│     test_HasAns_total     │           45.0            │
│     test_NoAns_exact      │            0.0            │
│       test_NoAns_f1       │            0.0            │
│     test_NoAns_total      │           55.0            │
│        test_exact         │           29.0            │
│          test_f1          │     32.60714340209961     │
│         test_loss         │    2.6404690742492676     │
│        test_total         │           100.0           │
└───────────────────────────┴───────────────────────────┘

```
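
The overall scores in the table are consistent with a sample-weighted average of the `HasAns` and `NoAns` subsets. The sketch below checks that arithmetic with the numbers reported above (45 answerable and 55 unanswerable test questions). The helper `weighted_score` is illustrative and not part of NeMo.

```python
def weighted_score(has_ans: float, n_has: int, no_ans: float, n_no: int) -> float:
    """Combine per-subset scores into an overall score, weighted by subset size."""
    return (has_ans * n_has + no_ans * n_no) / (n_has + n_no)


overall_exact = weighted_score(64.44444444444444, 45, 0.0, 55)
overall_f1 = weighted_score(72.46031746031747, 45, 0.0, 55)

print(round(overall_exact, 2))  # 29.0
print(round(overall_f1, 2))     # 32.61
```

Since the `NoAns` scores are 0.0 in this short run, the overall numbers cannot exceed 45 however well the answerable questions are handled. This suggests most of the remaining gain would come from the model learning to recognize unanswerable questions.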

#### Load the Saved Model and Run Inference

In [None]:
S2S_model = S2SQAModel.restore_from(config.model.nemo_path)

eval_device = [config.trainer.devices[0]] if isinstance(config.trainer.devices, list) else 1
S2S_model.trainer = pl.Trainer(
    devices=eval_device,
    accelerator=config.trainer.accelerator,
    precision=16,
    logger=False,
)

config.exp_manager.create_checkpoint_callback = False
exp_dir = exp_manager(S2S_model.trainer, config.exp_manager)
output_nbest_file = os.path.join(exp_dir, "output_nbest_file.json")
output_prediction_file = os.path.join(exp_dir, "output_prediction_file.json")

all_preds, all_nbest = S2S_model.inference(
    config.model.test_ds.file,
    output_prediction_file=output_prediction_file,
    output_nbest_file=output_nbest_file,
    num_samples=10, # setting to -1 will use all samples for inference
)

for question_id in all_preds:
    print(all_preds[question_id])

When running inference, you may see that not all responses match the expected output. You can increase the value of `config.trainer.max_epochs` above and retrain to see better results. Note that retraining with more epochs will take longer to execute.

##### Expected Output
```python
...
[NeMo I 2023-08-16 21:50:26 save_restore_connector:247] Model S2SQAModel was successfully restored from /workspace/results/nemo_question_answering/checkpoints/bart_squad_v2_0.nemo.
...
[NeMo I 2023-08-16 21:50:26 exp_manager:370] Experiments will be logged at /workspace/results/nemo_question_answering/QA-SQuAD2/2023-08-16_21-35-22
[NeMo I 2023-08-16 21:50:26 exp_manager:788] TensorboardLogger has been set up
100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 362.05it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 59918.63it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 76121.67it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 60349.70it/s]
France
the 10th and 11th centuries
Denmark, Iceland and Norway
Rollo
the 10th century
The Normans
Normans
the Normans
first half of the 10th century
William the Conqueror

```

## Inference on Custom Sample Dataset in SQuAD Format

In this section, we create new test data for inference to see how well the trained models behave on unseen data. The test data consists of two contexts with questions only; the answers can be deduced from the context. Both the BERT model and the BART model are expected to answer at least 80% of the questions correctly. After running the cells below, you can modify the questions, or both the context and the questions, and rerun the cells to see the responses.

#### Create a sample dataset 

In [None]:
import json
# Sample dataset content
dataset = {
  "version": "1.0",
  "data": [
    {
      "title": "This is a sample custom dataset",
      "paragraphs": [
        {
          "context": "In 2010 the Amazon rainforest experienced another severe drought, in some ways more extreme than the 2005 drought. \
          The affected region was approximately 1,160,000 square miles (3,000,000 km2) of rainforest, compared to 734,000 square miles (1,900,000 km2) \
          in 2005. The 2010 drought had three epicenters where vegetation died off, whereas in 2005 the drought was focused on the southwestern part. \
          The findings were published in the journal Science. In a typical year the Amazon absorbs 1.5 gigatons of carbon dioxide; during 2005 instead \
          5 gigatons were released and in 2010 8 gigatons were released.",
          "qas": [
            {
              "question": "How many gigatons of carbon are absorbed by the Amazon in a typical year?",
              "id": "q1"
            },
            {
              "question": "What was the affected region by the drought in 2010 approximately?",
              "id": "q2"
            },
            {
              "question": "What were the findings regarding the droughts published in?",
              "id": "q3"
            },
            {
              "question": "How many gigatons of carbon were released during the 2005 drought?",
              "id": "q4"
            },
            {
              "question": "How did the 2010 drought differ from the 2005 drought in terms of epicenters?",
              "id": "q5"
            }
          ]
        },
          {
          "context": "The sun is a massive ball of hot, glowing gases at the center of our solar system. It provides light, heat, and energy that sustains \
          life on Earth. The sun's surface temperature is around 5,500 degrees Celsius (9,932 degrees Fahrenheit), while its core temperature reaches about \
          15 million degrees Celsius (27 million degrees Fahrenheit). The sun's energy is generated through a process called nuclear fusion, where hydrogen \
          atoms combine to form helium, releasing immense amounts of energy in the process.",
          "qas": [
            {
              "question": "What is the approximate surface temperature of the sun?",
              "id": "q6"
            },
            {
              "question": "How does the sun generate its energy?",
              "id": "q7"
            },
            {
              "question": "What is the core temperature of the sun?",
              "id": "q8"
            },
            {
              "question": "What process is responsible for the sun's energy generation, where hydrogen atoms combine to form helium?",
              "id": "q9"
            }
          ]
        }
      ]
    }
  ]
}


# Save the dataset as a JSON file
output_file = f"{DATA_DIR}/squad/sample_dataset.json"
with open(output_file, "w") as json_file:
    json.dump(dataset, json_file, indent=4)

print(f"Dataset saved as '{output_file}'")
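
Before pointing the config at the new file, it can help to confirm that the saved JSON has the nesting the QA models read. The sketch below is a minimal check, not part of the original lab. It assumes the standard SQuAD v2.0 layout (`data` → `paragraphs` → `qas`). The helper name `summarize_squad_file` and the toy file contents are hypothetical and used only for illustration.

```python
import json
import os
import tempfile


def summarize_squad_file(path):
    """Return (number of contexts, list of question ids) for a SQuAD-style JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)

    num_contexts = 0
    question_ids = []
    # Assumed SQuAD v2.0 nesting: data -> paragraphs -> qas
    for article in data["data"]:
        for paragraph in article["paragraphs"]:
            num_contexts += 1
            for qa in paragraph["qas"]:
                question_ids.append(qa["id"])
    return num_contexts, question_ids


# Tiny stand-in dataset with the same nesting (hypothetical content)
toy_dataset = {
    "data": [
        {
            "title": "toy",
            "paragraphs": [
                {
                    "context": "The sun is a massive ball of hot, glowing gases.",
                    "qas": [
                        {"question": "What is the sun?", "id": "q1"},
                        {"question": "Is the sun hot?", "id": "q2"},
                    ],
                }
            ],
        }
    ]
}

# Write the toy dataset to a temporary file and read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_dataset.json")
    with open(toy_path, "w", encoding="utf-8") as f:
        json.dump(toy_dataset, f, indent=4)
    num_contexts, question_ids = summarize_squad_file(toy_path)

print(num_contexts, question_ids)  # 1 ['q1', 'q2']

# Predictions are keyed by question id, so ids should be unique
assert len(question_ids) == len(set(question_ids))
```

Calling `summarize_squad_file(output_file)` on the file saved above should list nine question IDs (`q1` to `q9`), which matches the `9/9` shown in the inference progress bars.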


#### Modify the Config file

Point the test dataset path in the config (`config.model.test_ds.file`) at the file saved above: `{DATA_DIR}/squad/sample_dataset.json`

In [None]:
# Replace the file path for test dataset
config.model.test_ds.file = f"{DATA_DIR}/squad/sample_dataset.json"

#### 1. Run Inference with BERT Model

In [None]:
# Load the saved model and run inference
bert_model = BERTQAModel.restore_from("/workspace/results/nemo_question_answering/checkpoints/bert_squad_v2_0.nemo")
eval_device = [config.trainer.devices[0]] if isinstance(config.trainer.devices, list) else 1
bert_model.trainer = pl.Trainer(
    devices=eval_device,
    accelerator=config.trainer.accelerator,
    precision=16,
    logger=False,
)

config.exp_manager.create_checkpoint_callback = False
exp_dir = exp_manager(bert_model.trainer, config.exp_manager)
bert_output_nbest_file = os.path.join(exp_dir, "bert_output_nbest_file.json")
bert_output_prediction_file = os.path.join(exp_dir, "bert_output_prediction_file.json")

all_preds, all_nbest = bert_model.inference(
    config.model.test_ds.file,
    output_prediction_file=bert_output_prediction_file,
    output_nbest_file=bert_output_nbest_file,
    num_samples=10,  # setting to -1 will use all samples for inference
)

for question_id in all_preds:
    print(all_preds[question_id])

#### Expected Output
```python
[NeMo I 2023-08-17 00:07:46 save_restore_connector:247] Model BERTQAModel was successfully restored from /workspace/results/nemo_question_answering/checkpoints/bert_squad_v2_0.nemo.
...
[NeMo I 2023-08-17 00:07:46 exp_manager:370] Experiments will be logged at /workspace/results/nemo_question_answering/QA-SQuAD2/2023-08-16_21-35-22
[NeMo I 2023-08-17 00:07:46 exp_manager:788] TensorboardLogger has been set up

100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 475.11it/s]

100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 57368.90it/s]

100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 35017.38it/s]

100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 10163.90it/s]

1.5
1,160,000 square miles (3,000,000 km2) of rainforest
journal Science
1.5 gigatons of carbon dioxide; during 2005 instead 5 gigatons were released and in 2010 8
more extreme
around 5,500 degrees Celsius
nuclear fusion
15 million degrees Celsius
nuclear fusion
```
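
The loop above prints only the answer text, which makes it hard to tell which answer belongs to which question. The sketch below shows one way to line the two up. It assumes, based on the printed output, that `all_preds` maps each question ID to an answer string, and that the dataset follows the SQuAD v2.0 nesting (`data` → `paragraphs` → `qas`). The helper `pair_questions_with_predictions` and the stand-in data are hypothetical; the two sample answers are copied from the expected output on the assumption that predictions are printed in question order.

```python
def pair_questions_with_predictions(dataset, predictions):
    """Return a list of (question id, question text, predicted answer) tuples."""
    rows = []
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                # A question with no prediction is reported with None
                rows.append((qa["id"], qa["question"], predictions.get(qa["id"])))
    return rows


# Stand-in dataset using three of the questions from the sample file
toy_dataset = {
    "data": [
        {
            "paragraphs": [
                {
                    "context": "(context omitted in this sketch)",
                    "qas": [
                        {"question": "What is the approximate surface temperature of the sun?", "id": "q6"},
                        {"question": "How does the sun generate its energy?", "id": "q7"},
                        {"question": "What is the core temperature of the sun?", "id": "q8"},
                    ],
                }
            ]
        }
    ]
}

# Stand-in for all_preds; q8 is left out on purpose to show the None case
toy_preds = {
    "q6": "around 5,500 degrees Celsius",
    "q7": "nuclear fusion",
}

rows = pair_questions_with_predictions(toy_dataset, toy_preds)
for question_id, question, answer in rows:
    print(f"{question_id}: {question} -> {answer}")
```

In the notebook, the same helper could be called with the `dataset` dictionary and the `all_preds` returned by `bert_model.inference`, provided the value type matches the assumption above.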

#### 2. Run Inference with S2S BART Model

In [None]:
# Load the saved model and run inference
bart_model = S2SQAModel.restore_from("/workspace/results/nemo_question_answering/checkpoints/bart_squad_v2_0.nemo")

eval_device = [config.trainer.devices[0]] if isinstance(config.trainer.devices, list) else 1
bart_model.trainer = pl.Trainer(
    devices=eval_device,
    accelerator=config.trainer.accelerator,
    precision=16,
    logger=False,
)

config.exp_manager.create_checkpoint_callback = False
exp_dir = exp_manager(bart_model.trainer, config.exp_manager)
bart_output_nbest_file = os.path.join(exp_dir, "bart_output_nbest_file.json")
bart_output_prediction_file = os.path.join(exp_dir, "bart_output_prediction_file.json")

all_preds, all_nbest = bart_model.inference(
    config.model.test_ds.file,
    output_prediction_file=bart_output_prediction_file,
    output_nbest_file=bart_output_nbest_file,
    num_samples=10,  # setting to -1 will use all samples for inference
)

for question_id in all_preds:
    print(all_preds[question_id])

#### Expected Output
```python
[NeMo I 2023-08-17 00:07:53 save_restore_connector:247] Model S2SQAModel was successfully restored from /workspace/results/nemo_question_answering/checkpoints/bart_squad_v2_0.nemo.
...
[NeMo I 2023-08-17 00:07:53 exp_manager:370] Experiments will be logged at /workspace/results/nemo_question_answering/QA-SQuAD2/2023-08-16_21-35-22
[NeMo I 2023-08-17 00:07:53 exp_manager:788] TensorboardLogger has been set up

100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 443.75it/s]

100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 57808.17it/s]

100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 53773.13it/s]

100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 74162.55it/s]

1.5
1,160,000 square miles
journal Science
5
where vegetation died off
5,500
nuclear fusion
15 million degrees Celsius
nuclear fusion
```
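
The two expected outputs show that the extractive BERT model and the generative BART model do not always return the same answer. The sketch below compares two prediction dictionaries and reports where they agree. It is an illustration, not part of the original lab: the helper `compare_predictions` is hypothetical, and the dictionaries are built from the expected outputs above on the assumption that answers are printed in question order (`q1` to `q9`).

```python
def compare_predictions(preds_a, preds_b):
    """Split shared question ids into those with matching and differing answers."""
    agree, differ = [], []
    for question_id in sorted(preds_a.keys() & preds_b.keys()):
        # Compare case-insensitively and ignore surrounding whitespace
        if preds_a[question_id].strip().lower() == preds_b[question_id].strip().lower():
            agree.append(question_id)
        else:
            differ.append(question_id)
    return agree, differ


question_ids = [f"q{i}" for i in range(1, 10)]

# Answers copied from the BERT expected output (assumed to be in question order)
bert_answers = [
    "1.5",
    "1,160,000 square miles (3,000,000 km2) of rainforest",
    "journal Science",
    "1.5 gigatons of carbon dioxide; during 2005 instead 5 gigatons were released and in 2010 8",
    "more extreme",
    "around 5,500 degrees Celsius",
    "nuclear fusion",
    "15 million degrees Celsius",
    "nuclear fusion",
]

# Answers copied from the BART expected output (assumed to be in question order)
bart_answers = [
    "1.5",
    "1,160,000 square miles",
    "journal Science",
    "5",
    "where vegetation died off",
    "5,500",
    "nuclear fusion",
    "15 million degrees Celsius",
    "nuclear fusion",
]

bert_preds = dict(zip(question_ids, bert_answers))
bart_preds = dict(zip(question_ids, bart_answers))

agree, differ = compare_predictions(bert_preds, bart_preds)
print("agree: ", agree)   # ['q1', 'q3', 'q7', 'q8', 'q9']
print("differ:", differ)  # ['q2', 'q4', 'q5', 'q6']
```

Both inference cells in the notebook assign their result to `all_preds`, so the BERT predictions would need to be kept under a separate name before running the BART cell for a comparison like this to work on live results.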

---
To solidify your understanding of this lab, click the `Lab Activity 1` link below to start building your custom model.

## <center><div style="text-align:center; color:#FF0000; border:3px solid red; height:80px;"> <b><br/>[Lab Activity 1](Activity1.ipynb)</b> </div> </center>
---

### Resources
The links below provide further documentation and reference material.
- [NeMo Models](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/core/core.html)
- [Core APIs](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/core/api.html)
- [Experiment Manager](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/core/exp_manager.html)
- [Exporting NeMo Models](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/core/export.html)
- [Prompt Learning](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/nemo_megatron/prompt_learning.html)
- [NeMo Megatron API](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/api.html)

---

## Licensing
Copyright © 2022 OpenACC-Standard.org. This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials include references to hardware and software developed by other entities; all applicable licensing and copyrights apply.
