# Lab Activity 1: Solution Prototype
---

## Instructions

This lab activity tests your understanding and helps you practice what you learned in the previous notebook.
You will reproduce the training and inference process for question answering using the QuAC dataset from that notebook. To complete the lab activity, implement the following steps:

- Understand the structure of the QuAC dataset and process the `quac_v0.2.json` file into a training and validation set. You can decide the split ratio and preprocess the dataset into SQuAD JSON format.
- Import the required libraries and use the BERT model `BERTQAModel` for training and inference
- Use `omegaconf` to:
    - set dataset config values
    - set trainer config values 
    - set experiment manager config values
- Train and Test your Model
- Run inference using the custom dataset


Part of the solution code is written for you. Complete the rest by filling in the missing value(s) in the commented areas of the notebook. Set `config.trainer.max_epochs` deliberately, as it determines how long the lab takes. The lab activity should not exceed `45 mins`; since one epoch may take up to `7 mins`, keep the value of `config.trainer.max_epochs` between `1 and 6`. If you plan to exceed these values, run this notebook after the Bootcamp active hours.

Note: *You are not expected to get the best result, as this activity is for learning. To achieve better results, you can modify the number of epochs, learning rate, batch size, max steps, and the training and validation sample sizes*.

<div style="text-align:left; color:#FF0000; height:40px; text-color:red; font-size:20px">Before you run this notebook, please close and shut down the kernel of the previous notebooks. </div>

## QuAC Dataset

Question Answering in Context (QuAC) is a dataset for modeling, understanding, and participating in information-seeking dialog. QuAC contains 98,407 QA pairs from 13,594 dialogs. The dialogs were conducted on 8,854 unique sections from 3,611 unique Wikipedia articles, and every dialog contains between four and twelve questions. More information can be found on the [Hugging Face dataset page](https://huggingface.co/datasets/quac).
The data instances consist of an interactive dialog between two crowd workers: (1) a student who poses a sequence of freeform questions to learn as much as possible about a hidden Wikipedia text, and (2) a teacher who answers the questions by providing short excerpts (spans) from the text. QuAC introduces challenges not found in existing machine comprehension datasets: its questions are often more open-ended, unanswerable, or only meaningful within the dialog context. You can read more on [QuAC's official website](https://quac.ai/).


### Data Fields

- **dialogue_id**: ID of the dialogue.
- **wikipedia_page_title**: title of the Wikipedia page.
- **background**: the first paragraph of the main Wikipedia article.
- **section_title**: Wikipedia section title.
- **context**: Wikipedia section text.
- **turn_ids**: list of dialogue turn IDs. One list of IDs per dialogue.
- **question**: list of questions in the dialogue. One list of questions per dialogue.
- **followups**: list of followup actions in the dialogue. One list of followups per dialogue. y: follow up, m: maybe follow up, n: don't follow up.
- **yesno**: list of yes/no in the dialogue. One list of yes/no per dialogue. y: yes, n: no, x: neither.
- **answers**: dictionary of responses to the questions (validation step of data collection)
    - **answer_starts**: list of lists of starting offsets. For training, a list of single-element lists (one answer per question).
    - **texts**: list of lists of span texts answering questions. For training, a list of single-element lists (one answer per question).
- **orig_answers**: a dictionary of original answers (the ones provided by the teacher in the dialogue)
    - **answer_starts**: list of starting offsets
    - **texts**: list of span texts answering questions.

## File Formats

The QuAC dataset is in JSON format. A validation example is given below:



```json
{'paragraphs': [{'context': 'Walton was born in La Mesa, California, the son of Gloria Anne (nee Hickey) and William Theodore "Ted" Walton. His listed adult playing height was 6 feet 11 inches; it has been reported that Walton is actually taller (7 feet 2 inches, or more) but does not like being categorized as a seven-footer. He played high school basketball at Helix High School. At age 17, Walton played for the United States men\'s national basketball team at the 1970 FIBA World Championship. He played college basketball for John Wooden at the University of California, Los Angeles (UCLA) from 1971 to 1974, winning the national title in 1972 over Florida State and again in 1973 with an 87-66 win over Memphis State in which Walton made 21 of 22 field goal attempts and scored 44 points, representing more than half his team\'s total. The Walton-led 1971-72 UCLA basketball team had a record of 30-0, in the process winning its games by an average margin of more than 30 points. He was the backbone of two consecutive 30-0 seasons and was also part of UCLA\'s NCAA men\'s basketball record 88-game winning streak. The UCLA streak contributed to a personal winning streak that lasted almost five years, in which Walton\'s high school, UCLA freshman (freshmen were ineligible for the varsity at that time) and UCLA varsity teams did not lose a game from the middle of his junior year of high school to the middle of his senior year in college. Walton was the 1973 recipient of the James E. Sullivan Award as the top amateur athlete in the United States. Walton also received the USBWA College Player of the Year and Naismith College Player of the Year as the top college basketball player in the country three years in a row while attending UCLA, at the same time earning Academic All-American honors three times. Some college basketball historians rate Walton as the greatest who ever played the game at the college level. 
In Walton\'s senior year during the 1973-74 season, the school\'s 88-game winning streak ended with a 71-70 loss to Notre Dame. During the same season, UCLA\'s record seven consecutive national titles was broken when North Carolina State defeated the Bruins 80-77 in double overtime in the NCAA semi-finals. With Walton\'s graduation in 1974 and Bruin coach John Wooden\'s retirement after UCLA\'s 1975 national title, the UCLA dynasty came to an end. Prior to joining the varsity team, Walton (18.1, 68.6 percent), along with Greg Lee (17.9 ppg) and Keith Wilkes (20.0 ppg), was a member of the 20-0 UCLA Freshman team. CANNOTANSWER',
   'qas': [{'followup': 'y',
     'yesno': 'x',
     'question': 'Where was he born?',
     'answers': [{'text': 'Walton was born in La Mesa, California,',
       'answer_start': 0},
      {'text': 'La Mesa, California,', 'answer_start': 19},
      {'text': 'Walton was born in La Mesa, California,', 'answer_start': 0},
      {'text': 'La Mesa, California,', 'answer_start': 19},
      {'text': 'Walton was born in La Mesa, California,', 'answer_start': 0}],
     'id': 'C_ece8b3ecad8a47d5a1380955ce47184a_1_q#0',
     'orig_answer': {'text': 'Walton was born in La Mesa, California,',
      'answer_start': 0}},
    {'followup': 'y',
     'yesno': 'y',
     'question': 'Did he play college basketball?',
     'answers': [{'text': 'Championship. He played college basketball for John Wooden at the University of California, Los Angeles (',
       'answer_start': 455},
      {'text': 'He played high school basketball', 'answer_start': 299},
      {'text': 'He played college basketball for John Wooden at the University of California, Los Angeles (UCLA) from 1971 to 1974,',
       'answer_start': 469},
      {'text': 'He played college basketball for John Wooden at the University of California,',
       'answer_start': 469},
      {'text': 'He played college basketball for John Wooden at the University of California, Los Angeles (UCLA)',
       'answer_start': 469}],
     'id': 'C_ece8b3ecad8a47d5a1380955ce47184a_1_q#1',
     'orig_answer': {'text': 'He played college basketball for John Wooden at the University of California, Los Angeles (UCLA)',
      'answer_start': 469}},

     ...
    
    {'followup': 'y',
     'yesno': 'y',
     'question': 'Did he have any notable games?',
     'answers': [{'text': 'The Walton-led 1971-72 UCLA basketball team had a record of 30-0, in the process winning its games by an average margin of more than 30 points.',
       'answer_start': 812},
      {'text': "UCLA's record seven consecutive national titles was broken when North Carolina State defeated the Bruins 80-77 in double overtime in the NCAA semi-finals.",
       'answer_start': 2046},
      {'text': 'winning the national title in 1972 over Florida State and again in 1973 with an 87-66 win over Memphis State',
       'answer_start': 585},
      {'text': "In Walton's senior year during the 1973-74 season, the school's 88-game winning streak ended with a 71-70 loss to Notre Dame.",
       'answer_start': 1896},
      {'text': 'The Walton-led 1971-72 UCLA basketball team had a record of 30-0, in the process winning its games by an average margin of more than 30 points.',
       'answer_start': 812}],
     'id': 'C_ece8b3ecad8a47d5a1380955ce47184a_1_q#5',
     'orig_answer': {'text': 'The Walton-led 1971-72 UCLA basketball team had a record of 30-0, in the process winning its games by an average margin of more than 30 points.',
      'answer_start': 812}}],
   'id': 'C_ece8b3ecad8a47d5a1380955ce47184a_1'}],
 'section_title': 'Early life and college career',
 'background': "William Theodore Walton III (born November 5, 1952) is an American retired basketball player and television sportscaster. Walton became known playing for John Wooden's powerhouse UCLA Bruins in the early 1970s, winning three successive College Player of the Year Awards, while leading the Bruins to two Division I national titles. He then went on to have a prominent career in the National Basketball Association (NBA) where he was a league Most Valuable Player (MVP) and won two NBA championships.",
 'title': 'Bill Walton'}
```
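To make the nesting concrete, the sketch below walks a record shaped like the example above and counts its questions and reference answers. The `summarize_record` helper and the shortened `record` are illustrative assumptions and are not part of the lab code.

```python
# Minimal sketch: walk a QuAC-style record (title -> paragraphs -> qas -> answers).
# `record` is a shortened, hand-written stand-in for the example above.
record = {
    "title": "Bill Walton",
    "section_title": "Early life and college career",
    "paragraphs": [
        {
            "context": "Walton was born in La Mesa, California, ... CANNOTANSWER",
            "qas": [
                {
                    "id": "C_ece8b3ecad8a47d5a1380955ce47184a_1_q#0",
                    "question": "Where was he born?",
                    "followup": "y",
                    "yesno": "x",
                    "answers": [
                        {"text": "Walton was born in La Mesa, California,", "answer_start": 0},
                        {"text": "La Mesa, California,", "answer_start": 19},
                    ],
                    "orig_answer": {"text": "Walton was born in La Mesa, California,", "answer_start": 0},
                }
            ],
        }
    ],
}


def summarize_record(rec):
    """Return (number of questions, number of reference answers) in one record."""
    n_questions = 0
    n_answers = 0
    for paragraph in rec["paragraphs"]:
        for qa in paragraph["qas"]:
            n_questions += 1
            n_answers += len(qa["answers"])
    return n_questions, n_answers


summary = summarize_record(record)
print(summary)
```

Each question carries several reference `answers` plus a single `orig_answer`, which is why the conversion code later in this notebook keeps only the first entry of `answers`.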

## Data Preparation

### 1. Download a sample QuAC dataset
To get started, you need to download the QuAC dataset by running the `dataset_quac.py` script in the cell below.

In [None]:
!python3 ../../source_code/dataset_quac.py

- Set the path to the following directories:
    - DATA_DIR: path to the datasets
    - WORK_DIR: path to the directory for storing trained models and logs
    - script_dir: path to the source code

In [None]:
DATA_DIR = "/workspace/data/activity1"
WORK_DIR = "/workspace/results/activity1/nemo_question_answering"
script_dir = "/workspace/source_code/activity1/nemo_question_answering"

### 2. Dataset Splitting: Training and Testing Samples

- The QuAC dataset in JSON format is provided as a single file containing 1000 examples.
- You must split the dataset into separate training and validation sets to facilitate model training and evaluation.
- Before splitting the dataset, decide on the ratio between the training and validation sets. In this example, the first 800 examples form the training set and the remaining 200 form the validation set, which corresponds to an 80% train and 20% validation split. The training data is saved as `quac_2_squad_train.json`, and the validation data as `quac_2_squad_val.json`. These files are stored in the `/workspace/data/activity1/quac/` directory.
- Use the code snippet below, which demonstrates the dataset conversion into SQuAD format, and then execute the splitting process.
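If you prefer to express the split as a ratio rather than fixed indices, a helper along the following lines could be used. This is only a sketch: the name `split_dataset` and its `train_ratio` and `seed` parameters are assumptions for illustration, and the integer list stands in for the real QuAC examples.

```python
import random


def split_dataset(data, train_ratio=0.8, seed=None):
    """Split a list of examples into (train, validation) by `train_ratio`.

    With seed=None the original order is kept, which matches slicing the
    first part of the list; pass an integer seed to shuffle reproducibly.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be between 0 and 1 (exclusive)")
    items = list(data)  # copy so the caller's list is left untouched
    if seed is not None:
        random.Random(seed).shuffle(items)
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]


# Stand-in for the 1000 QuAC examples; real entries would be dicts.
examples = list(range(1000))
train_split, val_split = split_dataset(examples, train_ratio=0.8)
print(len(train_split), len(val_split))
```

Shuffling with a fixed seed can help if the file is ordered by topic, at the cost of no longer matching a plain slice of the original order.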

Before running the code snippet for dataset conversion and splitting, please ensure that the `quac_v0.2.json` dataset file exists in your project directory by running the cell below.

In [None]:
!ls -LR {DATA_DIR}/quac

Code snippet for converting QuAC to SQuAD format. You can rewrite the code block or use it as it is.

In [None]:
import os
import json

class MakeDataset():
    def __init__(self, data_row):
        self.data_row = data_row
        self.final_json = {}
        self.final_json['version'] = "v2.0"
        self.final_json['data'] = []

    def reader(self):
        """Load every example from the QuAC file into self.data_row."""
        verbose = 1
        train_file_path = f"{DATA_DIR}/quac/quac_v0.2.json"
        if verbose:
            print("Reading the json file")
        # use a context manager so the file handle is closed after reading
        with open(train_file_path) as json_file:
            file_train = json.load(json_file)
        if verbose:
            print("processing...")
        self.data_row = list(file_train['data'])
        
              
    
    # make json format    
    def make_json(self):
        for i in range(len(self.data_row)):
            self.brace_in_data ={}
            self.brace_in_data['title'] = self.data_row[i]['title']
            self.brace_in_data['paragraphs'] = []
            paragraphs = self.data_row[i]['paragraphs']
            for j in range(len(paragraphs)):
                brace_in_paragaraphs = {}
                brace_in_paragaraphs['context'] = paragraphs[j]['context']
                qas = paragraphs[j]['qas']
                brace_in_paragaraphs['qas'] = []    
                for k in range(len(qas)):
                    brace_in_qas = {}
                    brace_in_qas['question'] = qas[k]['question']
                    brace_in_qas['id'] = qas[k]['id']
                    answer = qas[k]['answers']
                    if len(answer) == 0:
                        brace_in_qas['answers'] = [] 
                        brace_in_qas['is_impossible'] = True
                    else:
                        brace_in_qas['answers'] =[{'text':answer[0]['text'], 'answer_start':answer[0]['answer_start']}]
                        brace_in_qas['is_impossible'] = False

                    brace_in_paragaraphs['qas'].append(brace_in_qas)
                self.brace_in_data['paragraphs'].append(brace_in_paragaraphs)
            self.final_json['data'].append(self.brace_in_data)        
    
    #save the json file               
    def save_json(self, filename):
        with open(f"{DATA_DIR}/quac/{filename}.json", "w") as write_file:
            json.dump(self.final_json, write_file, indent=4)
            print("{} saved in SQuAD json format ....".format(filename))


Run the cell below to read the QuAC dataset, split it into training and validation sets, and save both sets in SQuAD format.

In [None]:
# read the quac dataset
verbose = 1
train_file_path = f"{DATA_DIR}/quac/quac_v0.2.json"
if verbose:
    print("Reading the json file")    
with open(train_file_path) as json_file:
    file_train = json.load(json_file)
if verbose:
    print("processing...")
data_row = [topic for topic in file_train['data']]

# split the dataset into 800 examples for train set and 200 for validation set. You can modify the ratio based on your preferences
train = data_row[:800]
val = data_row[800:]

#create objects of MakeDataset class and pass split set
train_Obj = MakeDataset(train)
val_Obj = MakeDataset(val)

# convert each split to the SQuAD structure with make_json()
train_Obj.make_json()
val_Obj.make_json()

# set the filenames 
train_filename = 'quac_2_squad_train'
val_filename = 'quac_2_squad_val'


# generate the SQuAD json format
train_Obj.save_json(train_filename)
val_Obj.save_json(val_filename)

Run the cell below to view an example from the training set in SQuAD format.

In [None]:
path = f"{DATA_DIR}/quac/quac_2_squad_train.json"

train = json.loads(open(path).read())
rows = [topic for topic in train['data']]

rows[799]
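Beyond viewing one example, it can be useful to confirm that every `answer_start` offset really points at its answer text, since extractive QA training depends on those offsets. The check below is a sketch on a tiny hand-written dataset in the same layout that `MakeDataset` produces; `find_misaligned_answers` and the `sample` data are hypothetical and not part of the lab code.

```python
def find_misaligned_answers(squad):
    """Return ids of questions whose answer text is not found at answer_start."""
    bad_ids = []
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    start = answer["answer_start"]
                    end = start + len(answer["text"])
                    if context[start:end] != answer["text"]:
                        bad_ids.append(qa["id"])
    return bad_ids


# Tiny hand-written dataset: one correct offset and one that is off by one.
sample = {
    "version": "v2.0",
    "data": [{
        "title": "Bill Walton",
        "paragraphs": [{
            "context": "Walton was born in La Mesa, California.",
            "qas": [
                {"id": "ok", "question": "Where was he born?", "is_impossible": False,
                 "answers": [{"text": "La Mesa, California", "answer_start": 19}]},
                {"id": "shifted", "question": "Where was he born?", "is_impossible": False,
                 "answers": [{"text": "La Mesa, California", "answer_start": 20}]},
            ],
        }],
    }],
}

misaligned = find_misaligned_answers(sample)
print(misaligned)
```

To apply the same check to the converted training file, pass the loaded `train` dictionary instead of `sample`.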

Import the required libraries: `pytorch_lightning`, `OmegaConf`, `BERTQAModel`, and `exp_manager`.

In [None]:
import gc

import pytorch_lightning as pl
from omegaconf import OmegaConf

from nemo.collections.nlp.models.question_answering.qa_bert_model import BERTQAModel
from nemo.utils.exp_manager import exp_manager

pl.seed_everything(42)
gc.disable()

## Configuration

The model is defined in a config file, which has several important sections:

- **model**: all arguments related to the model: language model, span prediction, optimizer and schedulers, datasets, and any other related information
- **trainer**: the training object arguments to be passed to PyTorch Lightning
- **exp_manager**: all arguments used for setting up the experiment manager: target directory, name, logger information

We will set the path to the default config file `qa_conf.yaml` and edit the necessary values for training different models.

In [None]:
config_dir = '/workspace/source_code/activity1/nemo_question_answering/conf'

Run the cell below to print the entire default config

In [None]:
config_path = f'{config_dir}/qa_conf.yaml'
print(config_path)
config = OmegaConf.load(config_path)
print("Default Config - \n")
print(OmegaConf.to_yaml(config))

### Set dataset config values

Set essential parameters like the paths to the train, validation, and test sets; batch size; and the number of training, validation, and test samples.

In [None]:
# if True, model will load features from cache if file is present, or
# create features and dump to cache file if not already present
config.model.dataset.use_cache = False

# indicates whether the dataset has unanswerable questions
config.model.dataset.version_2_with_negative = True

# indicates whether the dataset is of extractive nature or not
# if True, context spans/chunks that do not contain answer are treated as unanswerable 
config.model.dataset.check_if_answer_in_context = True

# set file paths for train, validation, and test datasets
config.model.train_ds.file = f"{DATA_DIR}/quac/quac_2_squad_train.json"
config.model.validation_ds.file = f"{DATA_DIR}/quac/quac_2_squad_val.json"
config.model.test_ds.file = f"{DATA_DIR}/quac/quac_2_squad_val.json"

# set batch sizes for train, validation, and test datasets
config.model.train_ds.batch_size = 8
config.model.validation_ds.batch_size = 8
config.model.test_ds.batch_size = 8

# set number of samples to be used from dataset. setting to -1 uses entire dataset
config.model.train_ds.num_samples = 5000
config.model.validation_ds.num_samples = 1000
config.model.test_ds.num_samples = 100  

### Set trainer config values

These values include the maximum number of epochs, max steps, device, accelerator, and trainer strategy. (*Depending on the device, one epoch may take up to 7 minutes.*)

In [None]:
config.trainer.max_epochs = 5
config.trainer.max_steps = -1 # -1 disables the step limit, so max_epochs applies; a positive value takes precedence over max_epochs
config.trainer.precision = 16
# list of the GPU indices to use (here GPU 0); this tutorial does not support multiple GPUs.
# If needed, please use NeMo/examples/nlp/question_answering/question_answering.py
config.trainer.devices = [0]
config.trainer.accelerator = "gpu"
config.trainer.strategy="dp"
#config.trainer.devices = 1
#config.trainer.strategy="auto"
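The relationship between these values and the lab's time limit can be checked with simple arithmetic. The sketch below uses the numbers from this notebook (5000 training samples, batch size 8, 5 epochs, up to 7 minutes per epoch, 45-minute limit) and assumes that each sample yields exactly one training feature; in practice, long contexts may be split into several chunks, so the real number of batches can be higher.

```python
import math

# Values taken from the cells and instructions in this notebook.
train_samples = 5000      # config.model.train_ds.num_samples
batch_size = 8            # config.model.train_ds.batch_size
max_epochs = 5            # config.trainer.max_epochs
minutes_per_epoch = 7     # upper estimate quoted in the instructions
time_limit_minutes = 45   # lab time limit quoted in the instructions

# Assumption: one sample yields one training feature; long contexts that are
# split into several chunks would increase the real number of batches.
batches_per_epoch = math.ceil(train_samples / batch_size)
total_steps = batches_per_epoch * max_epochs

# Largest number of epochs that still fits into the time limit.
epochs_within_limit = time_limit_minutes // minutes_per_epoch
worst_case_minutes = max_epochs * minutes_per_epoch

print(batches_per_epoch, total_steps, epochs_within_limit, worst_case_minutes)
```

Under these assumptions, six epochs is the most that fits into 45 minutes, which is where the recommended upper bound for `config.trainer.max_epochs` comes from.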

### Set experiment manager config values

In [None]:
config.exp_manager.exp_dir = WORK_DIR
config.exp_manager.name = "QA-SQuAD2"
config.exp_manager.create_wandb_logger=False

## Training and Testing Models


### BERT Model

- Set Model Config Values
    - `bert-base-uncased` is set as the pretrained model and also as the tokenizer name.
    - Set the model optimizer learning rate `config.model.optim.lr`

In [None]:
# set language model and tokenizer to be used
# tokenizer is derived from model if a tokenizer name is not provided
config.model.language_model.pretrained_model_name = "bert-base-uncased"
config.model.tokenizer.tokenizer_name = "bert-base-uncased"

# path where model will be saved
config.model.nemo_path = f"{WORK_DIR}/checkpoints/bert_squad_v2_0.nemo"

config.exp_manager.create_checkpoint_callback = True

config.model.optim.lr = 3e-5

- Create Trainer and Initialize Model

In [None]:
trainer = pl.Trainer(**config.trainer)
model = BERTQAModel(config.model, trainer=trainer)

- Train, Test, and Save the Model *(Depending on the device, one epoch may take up to 7 minutes.)*

In [None]:
trainer.fit(model)
trainer.test(model)

model.save_to(config.model.nemo_path)

#### Load the Saved Model and Run Inference

When running inference, you may see that not all responses match the expected output. This is because the number of training epochs is kept small to avoid long training times.

In [None]:
import os

Bmodel = BERTQAModel.restore_from(config.model.nemo_path)

eval_device = [config.trainer.devices[0]] if isinstance(config.trainer.devices, list) else 1
Bmodel.trainer = pl.Trainer(
    devices=eval_device,
    accelerator=config.trainer.accelerator,
    precision=16,
    logger=False,
)

config.exp_manager.create_checkpoint_callback = False
exp_dir = exp_manager(Bmodel.trainer, config.exp_manager)
output_nbest_file = os.path.join(exp_dir, "output_nbest_file.json")
output_prediction_file = os.path.join(exp_dir, "output_prediction_file.json")

all_preds, all_nbest = Bmodel.inference(
    config.model.test_ds.file,
    output_prediction_file=output_prediction_file,
    output_nbest_file=output_nbest_file,
    num_samples=20, # setting to -1 will use all samples for inference
)

for question_id in all_preds:
    print(all_preds[question_id])
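To put a number on how closely responses match the expected output, predictions are commonly scored with exact match and token-level F1, in the style of the SQuAD evaluation. The sketch below is a simplified stand-alone version for illustration. The `predictions` and `references` dictionaries are hypothetical and simply map question ids to answer strings; the exact structure of `all_preds` is not assumed here.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """True when both strings are identical after normalization."""
    return normalize(prediction) == normalize(reference)


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match (e.g. an unanswerable question).
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical predictions and references keyed by question id.
predictions = {"q1": "La Mesa, California", "q2": "the University of California"}
references = {"q1": "La Mesa, California,", "q2": "University of California, Los Angeles (UCLA)"}

em = {qid: exact_match(predictions[qid], references[qid]) for qid in predictions}
f1 = {qid: token_f1(predictions[qid], references[qid]) for qid in predictions}
print(em)
print({qid: round(score, 2) for qid, score in f1.items()})
```

F1 gives partial credit when a predicted span overlaps the reference without matching it exactly, which is often the case after only a few epochs of training.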

## Inference on Custom Sample Dataset in SQuAD Format

Create new test data for inference. The aim is to measure how well the trained model behaves on unseen data. The test data consists of two contexts with questions only; the answers can be deduced from the context. The BERT model is expected to answer at least 80% of the questions correctly.

In [None]:
import json
# Sample dataset content
dataset = {
  "version": "1.0",
  "data": [
    {
      "title": "This is a sample custom dataset",
      "paragraphs": [
        {
          "context": "In 2010 the Amazon rainforest experienced another severe drought, in some ways more extreme than the 2005 drought. \
          The affected region was approximately 1,160,000 square miles (3,000,000 km2) of rainforest, compared to 734,000 square miles (1,900,000 km2) \
          in 2005. The 2010 drought had three epicenters where vegetation died off, whereas in 2005 the drought was focused on the southwestern part. \
          The findings were published in the journal Science. In a typical year the Amazon absorbs 1.5 gigatons of carbon dioxide; during 2005 instead \
          5 gigatons were released and in 2010 8 gigatons were released.",
          "qas": [
            {
              "question": "How many gigatons of carbon are absorbed by the Amazon in a typical year?",
              "id": "q1"
            },
            {
              "question": "What was the affected region by the drought in 2010 approximately?",
              "id": "q2"
            },
            {
              "question": "What were the findings regarding the droughts published in?",
              "id": "q3"
            },
            {
              "question": "How many gigatons of carbon were released during the 2005 drought?",
              "id": "q4"
            },
            {
              "question": "How did the 2010 drought differ from the 2005 drought in terms of epicenters?",
              "id": "q5"
            }
          ]
        },
          {
          "context": "The sun is a massive ball of hot, glowing gases at the center of our solar system. It provides light, heat, and energy that sustains \
          life on Earth. The sun's surface temperature is around 5,500 degrees Celsius (9,932 degrees Fahrenheit), while its core temperature reaches about \
          15 million degrees Celsius (27 million degrees Fahrenheit). The sun's energy is generated through a process called nuclear fusion, where hydrogen \
          atoms combine to form helium, releasing immense amounts of energy in the process.",
          "qas": [
            {
              "question": "What is the approximate surface temperature of the sun?",
              "id": "q6"
            },
            {
              "question": "How does the sun generate its energy?",
              "id": "q7"
            },
            {
              "question": "What is the core temperature of the sun?",
              "id": "q8"
            },
            {
              "question": "What process is responsible for the sun's energy generation, where hydrogen atoms combine to form helium?",
              "id": "q9"
            }
          ]
        }
      ]
    }
  ]
}


# Save the dataset as a JSON file
output_file = f"{DATA_DIR}/quac/sample_dataset.json"
with open(output_file, "w") as json_file:
    json.dump(dataset, json_file, indent=4)

print(f"Dataset saved as '{output_file}'")

#### Modify the Config file

Replace the path of the test file in the config file with: `{DATA_DIR}/quac/sample_dataset.json`

In [None]:
# Replace the file path for test dataset
config.model.test_ds.file = f"{DATA_DIR}/quac/sample_dataset.json"

#### Run Inference with BERT Model

In [None]:
# Load the saved model and run inference
bert_model = BERTQAModel.restore_from("/workspace/results/activity1/nemo_question_answering/checkpoints/bert_squad_v2_0.nemo")
eval_device = [config.trainer.devices[0]] if isinstance(config.trainer.devices, list) else 1
bert_model.trainer = pl.Trainer(
    devices=eval_device,
    accelerator=config.trainer.accelerator,
    precision=16,
    logger=False,
)

config.exp_manager.create_checkpoint_callback = False
exp_dir = exp_manager(bert_model.trainer, config.exp_manager)
bert_output_nbest_file = os.path.join(exp_dir, "bert_output_nbest_file.json")
bert_output_prediction_file = os.path.join(exp_dir, "bert_output_prediction_file.json")

all_preds, all_nbest = bert_model.inference(
    config.model.test_ds.file,
    output_prediction_file=bert_output_prediction_file,
    output_nbest_file=bert_output_nbest_file,
    num_samples=10, # setting to -1 will use all samples for inference
)

for question_id in all_preds:
    print(all_preds[question_id])

---
## Create Personal JSON Test Data

Using the cell below, create a JSON test set with a single `context` and three `questions`.

In [None]:
# The context and questions are to be filled in by each participant, so the content may vary from person to person.

import json
# Sample dataset content
test_set = {
  "version": "1.0",
  "data": [
    {
      "title": "This is a sample custom dataset",
      "paragraphs": [
        {
          "context": " ",
          "qas": [
            {
              "question": "?",
              "id": "q1"
            },
            {
              "question": "?",
              "id": "q2"
            },
            {
              "question": "?",
              "id": "q3"
            },
          ]
        },
          
      ]
    }
  ]
}


# Save the test as a JSON file
test_file = f"{DATA_DIR}/quac/sample_test_set.json"
with open(test_file, "w") as json_file:
    json.dump(test_set, json_file, indent=4)

print(f"Dataset saved as '{test_file}'")
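Before running inference on a hand-written test set, a quick check can catch placeholders that were left unfilled. The sketch below is one possible way to do that; `check_test_set` is a hypothetical helper, and the `template` dictionary mirrors the unfilled template from the cell above, reduced to one question.

```python
def check_test_set(dataset):
    """Return a list of human-readable problems found in a SQuAD-style test set."""
    problems = []
    seen_ids = set()
    for article in dataset.get("data", []):
        for p_idx, paragraph in enumerate(article.get("paragraphs", [])):
            if not paragraph.get("context", "").strip():
                problems.append(f"paragraph {p_idx}: context is empty")
            for qa in paragraph.get("qas", []):
                qid = qa.get("id")
                question = qa.get("question", "").strip()
                if qid in seen_ids:
                    problems.append(f"duplicate id: {qid}")
                seen_ids.add(qid)
                if question in ("", "?"):
                    problems.append(f"{qid}: question is not filled in")
    return problems


# The unfilled template, reduced to one question.
template = {
    "version": "1.0",
    "data": [{
        "title": "This is a sample custom dataset",
        "paragraphs": [{"context": " ", "qas": [{"question": "?", "id": "q1"}]}],
    }],
}

issues = check_test_set(template)
print(issues)
```

An empty list means the basic structure looks complete; calling `check_test_set(test_set)` on your own data before saving it is one way to use the helper.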

#### Run Inference Using Your Created Personal Test Data

In [None]:
# Replace the file path for test dataset
config.model.test_ds.file = f"{DATA_DIR}/quac/sample_test_set.json"


# Load the saved model and run inference
bert_model = BERTQAModel.restore_from("/workspace/results/activity1/nemo_question_answering/checkpoints/bert_squad_v2_0.nemo")
eval_device = [config.trainer.devices[0]] if isinstance(config.trainer.devices, list) else 1
bert_model.trainer = pl.Trainer(
    devices=eval_device,
    accelerator=config.trainer.accelerator,
    precision=16,
    logger=False,
)

config.exp_manager.create_checkpoint_callback = False
exp_dir = exp_manager(bert_model.trainer, config.exp_manager)
bert_output_nbest_file = os.path.join(exp_dir, "bert_output_nbest_file.json")
bert_output_prediction_file = os.path.join(exp_dir, "bert_output_prediction_file.json")

all_preds, all_nbest = bert_model.inference(
    config.model.test_ds.file,
    output_prediction_file=bert_output_prediction_file,
    output_nbest_file=bert_output_nbest_file,
    num_samples=10, # setting to -1 will use all samples for inference
)

for question_id in all_preds:
    print(all_preds[question_id])

---
### Resources
Below are useful links to guide you and help you learn more.
- [NeMo Models](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/core/core.html)
- [Core APIs](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/core/api.html)
- [Experiment Manager](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/core/exp_manager.html)
- [Exporting NeMo Models](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/core/export.html)
- [Prompt Learning](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/nemo_megatron/prompt_learning.html)
- [NeMo Megatron API](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/api.html)

---
## Licensing
Copyright © 2022 OpenACC-Standard.org. This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials include references to hardware and software developed by other entities; all applicable licensing and copyrights apply.