![DLI Header](images/DLI_Header.png)

# Overview

## Task Description

- Given a context and a natural language query, we want to generate an answer for the query
- Depending on how the answer is generated, the task can be broadly divided into two types:
    1. Extractive Question Answering
    2. <b>Generative Question Answering</b>

### Generative Question-Answering with S2S and GPT-like models

Given a question and a context, both in natural language, the model generates an answer to the question. Unlike extractive BERT-like models, the answer is not constrained to be a span of the context.
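
To make the distinction concrete, here is a minimal sketch in plain Python (not NeMo code; the helper name `is_extractive_answer` and the sample sentences are hypothetical). It checks whether an answer is a verbatim span of the context. An extractive model can only return answers that pass this check, while a generative model may return either kind.

```python
def is_extractive_answer(context: str, answer: str) -> bool:
    """Return True if `answer` occurs verbatim (ignoring case) in `context`."""
    return answer.strip().lower() in context.lower()


context = "The Normans gave their name to Normandy, a region in France."
span_answer = "a region in France"             # copied from the context
free_answer = "Normandy is located in France"  # paraphrased, not a span

print(is_extractive_answer(context, span_answer))  # True
print(is_extractive_answer(context, free_answer))  # False
```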

In [1]:
BRANCH = 'main'

# Imports and constants

In [2]:
import os
import wget
import gc

import pytorch_lightning as pl
from omegaconf import OmegaConf

from nemo.collections.nlp.models.question_answering.qa_gpt_model import GPTQAModel
from nemo.collections.nlp.models.question_answering.qa_s2s_model import S2SQAModel

gc.disable()

NOTE! Installing ujson may make loading annotations faster.


In [3]:
# set the following paths
DATA_DIR = "data" # directory for storing datasets
WORK_DIR = "work_dir" # directory for storing trained models, logs, additionally downloaded scripts

os.makedirs(DATA_DIR, exist_ok=True)
os.makedirs(WORK_DIR, exist_ok=True)

# Configuration

The model is defined in a config file which declares multiple important sections:
- **model**: All arguments related to the model - language model, span prediction, optimizer and schedulers, datasets, and any other related information
- **trainer**: Any argument to be passed to PyTorch Lightning
- **exp_manager**: All arguments used for setting up the experiment manager - target directory, name, logger information

We will download the default config file provided at `NeMo/examples/nlp/question_answering/conf/qa_conf.yaml` and edit the values needed to train the different models.

In [4]:
# download the model's default configuration file 
config_dir = WORK_DIR + '/conf/'
os.makedirs(config_dir, exist_ok=True)
if not os.path.exists(config_dir + "qa_conf.yaml"):
    print('Downloading config file...')
    wget.download(f'https://raw.githubusercontent.com/NVIDIA/NeMo/{BRANCH}/examples/nlp/question_answering/conf/qa_conf.yaml', config_dir)
else:
    print('config file already exists')

Downloading config file...


In [5]:
# this will print the entire default config of the model
config_path = f'{WORK_DIR}/conf/qa_conf.yaml'
print(config_path)
config = OmegaConf.load(config_path)
print("Default Config - \n")
print(OmegaConf.to_yaml(config))

work_dir/conf/qa_conf.yaml
Default Config - 

pretrained_model: null
do_training: true
trainer:
  devices:
  - 0
  num_nodes: 1
  max_epochs: 3
  max_steps: -1
  accumulate_grad_batches: 1
  gradient_clip_val: 1.0
  precision: 16
  accelerator: gpu
  log_every_n_steps: 5
  val_check_interval: 1.0
  num_sanity_val_steps: 0
  enable_checkpointing: false
  logger: false
  strategy: ddp
model:
  tensor_model_parallel_size: 1
  nemo_path: null
  library: huggingface
  save_model: false
  tokens_to_generate: 32
  dataset:
    version_2_with_negative: true
    doc_stride: 128
    max_query_length: 64
    max_seq_length: 512
    max_answer_length: 30
    use_cache: false
    do_lower_case: true
    check_if_answer_in_context: true
    keep_doc_spans: all
    null_score_diff_threshold: 0.0
    n_best_size: 20
    num_workers: 1
    pin_memory: false
    drop_last: false
  train_ds:
    file: null
    batch_size: 24
    shuffle: true
    num_samples: -1
    num_workers: ${model.dataset.num_worke

# Training and testing models on SQuAD v2.0

## Dataset

For this example, we are going to use the [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) dataset to showcase training and inference. The dataset comes in two versions, SQuAD1.1 and SQuAD2.0. SQuAD1.1, the earlier version, contains 100,000+ question-answer pairs on 500+ articles. SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones.

We have prepared the data directory "squad" with the following four files for training and evaluation: 

```
squad  
│
└───v1.1
│   │ -  train-v1.1.json
│   │ -  dev-v1.1.json
│
└───v2.0
    │ -  train-v2.0.json
    │ -  dev-v2.0.json
```
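
The difference between the two versions shows up in the JSON layout: in SQuAD2.0 each question carries an `is_impossible` flag, and unanswerable questions have an empty `answers` list. The sketch below uses a tiny hand-written sample in that layout (illustrative data only, not taken from the dataset) and a hypothetical helper `count_questions` to count answerable and unanswerable questions.

```python
from typing import Tuple

# A tiny hand-written sample in the SQuAD v2.0 layout (illustrative data only).
sample = {
    "version": "v2.0",
    "data": [
        {
            "title": "Normans",
            "paragraphs": [
                {
                    "context": "The Normans gave their name to Normandy, a region in France.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "In what country is Normandy located?",
                            "is_impossible": False,
                            "answers": [{"text": "France", "answer_start": 53}],
                        },
                        {
                            "id": "q2",
                            "question": "Who ruled Normandy in 1500?",
                            "is_impossible": True,
                            "answers": [],
                        },
                    ],
                }
            ],
        }
    ],
}


def count_questions(squad: dict) -> Tuple[int, int]:
    """Count (answerable, unanswerable) questions in a SQuAD-style dict."""
    answerable = unanswerable = 0
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                # v1.1 files have no `is_impossible` key, so default to False.
                if qa.get("is_impossible", False):
                    unanswerable += 1
                else:
                    answerable += 1
    return answerable, unanswerable


print(count_questions(sample))  # (1, 1)
```

To run the same count on a real file, load it with `json.load` (for example `dev-v2.0.json` from the directory above) and pass the resulting dict to `count_questions`.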

In [6]:
!ls -LR {DATA_DIR}/squad

data/squad:
v1.1  v2.0

data/squad/v1.1:
dev-v1.1.json  train-v1.1.json

data/squad/v2.0:
dev-v2.0.json  train-v2.0.json


## Set dataset config values

In [7]:
# if True, model will load features from cache if file is present, or
# create features and dump to cache file if not already present
config.model.dataset.use_cache = False

# indicates whether the dataset has unanswerable questions
config.model.dataset.version_2_with_negative = True

# indicates whether the dataset is of extractive nature or not
# if True, context spans/chunks that do not contain answer are treated as unanswerable 
config.model.dataset.check_if_answer_in_context = True

# set file paths for train, validation, and test datasets
config.model.train_ds.file = f"{DATA_DIR}/squad/v2.0/train-v2.0.json"
config.model.validation_ds.file = f"{DATA_DIR}/squad/v2.0/dev-v2.0.json"
config.model.test_ds.file = f"{DATA_DIR}/squad/v2.0/dev-v2.0.json"

# set batch sizes for train, validation, and test datasets
config.model.train_ds.batch_size = 8
config.model.validation_ds.batch_size = 8
config.model.test_ds.batch_size = 8

# set number of samples to be used from dataset. setting to -1 uses entire dataset
config.model.train_ds.num_samples = 5000
config.model.validation_ds.num_samples = 1000
config.model.test_ds.num_samples = 100
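
The `doc_stride` and `max_seq_length` values in the default config control how long contexts are split into overlapping chunks. The sketch below illustrates the general sliding-window idea on a list of tokens. It assumes the stride is the step between the starts of consecutive windows and ignores the room taken by the question and special tokens, so NeMo's actual preprocessing will differ in detail; `sliding_windows` is a hypothetical helper.

```python
from typing import List


def sliding_windows(tokens: List[str], window: int, stride: int) -> List[List[str]]:
    """Split `tokens` into overlapping windows of at most `window` tokens."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # this window already reaches the end of the document
        start += stride
    return chunks


tokens = [f"t{i}" for i in range(10)]
for chunk in sliding_windows(tokens, window=4, stride=3):
    print(chunk)
```

With chunking of this kind, one sample can yield more than one feature. That would be consistent with the preprocessing log later in this notebook, where 5000 training samples are followed by 5026 converted features.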

## Set trainer config values

In [8]:
config.trainer.max_epochs = 1
config.trainer.max_steps = -1 # -1 means no step limit, so max_epochs decides when training stops
config.trainer.precision = 16
# list of GPU indices to use (here GPU 0); this tutorial does not support multiple GPUs.
# For multi-GPU training, use NeMo/examples/nlp/question_answering/question_answering.py
config.trainer.devices = [0]
config.trainer.accelerator = "gpu"
config.trainer.strategy = "auto"
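
These settings, together with the batch size, determine how many optimizer steps a run takes. The training log in this notebook reports `effective maximum steps = 629`; the arithmetic below appears to reproduce that number from the 5026 training features in the preprocessing log and the batch size of 8. It is a simplified single-device estimate, and `steps_per_epoch` is a hypothetical helper, not a NeMo or PyTorch Lightning function.

```python
import math


def steps_per_epoch(num_features: int, batch_size: int, accumulate_grad_batches: int = 1) -> int:
    """Estimate optimizer steps per epoch on one device, keeping the last partial batch."""
    batches = math.ceil(num_features / batch_size)
    return math.ceil(batches / accumulate_grad_batches)


max_epochs = 1
# 5026 training features (from the preprocessing log), batch size 8
print(steps_per_epoch(5026, 8) * max_epochs)  # 629
```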

## Set experiment manager config values

In [9]:
# config.exp_manager.exp_dir = WORK_DIR
# config.exp_manager.name = "QA-SQuAD2"
# config.exp_manager.create_wandb_logger=False

## S2S BART model for SQuAD v2.0

### Set model config values

In [10]:
# set language model and tokenizer to be used
# tokenizer is derived from model if a tokenizer name is not provided
config.model.language_model.pretrained_model_name = "facebook/bart-base"
config.model.tokenizer.tokenizer_name = "facebook/bart-base"

# path where model will be saved
config.model.nemo_path = f"{WORK_DIR}/checkpoints/bart_squad_v2_0.nemo"

config.exp_manager.create_checkpoint_callback = True

config.model.optim.lr = 5e-5

# reset vocab_file so the BART tokenizer does not reuse a vocabulary file from a GPT model
config.model.tokenizer.vocab_file = None

### Create trainer and initialize model

In [11]:
# uncomment below line and run if you get an error while initializing tokenizer on Colab (reference: https://github.com/huggingface/transformers/issues/8690)
# !rm -r /root/.cache/huggingface/

trainer = pl.Trainer(**config.trainer)
model = S2SQAModel(config.model, trainer=trainer)

Using 16bit None Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..


[NeMo I 2025-06-13 17:25:46 tokenizer_utils:130] Getting HuggingFace AutoTokenizer with pretrained_model_name: facebook/bart-base, vocab_file: None, merges_files: None, special_tokens_dict: {}, and use_fast: False


Downloading config.json:   0%|          | 0.00/1.72k [00:00<?, ?B/s]

Downloading vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

Downloading merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

[NeMo W 2025-06-13 17:25:46 modelPT:244] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.


[NeMo I 2025-06-13 17:25:46 qa_processing:106] mean no. of chars in doc: 839.2727272727273
[NeMo I 2025-06-13 17:25:46 qa_processing:107] max no. of chars in doc: 1895
[NeMo I 2025-06-13 17:25:46 qa_processing:106] mean no. of chars in doc: 677.5487804878048
[NeMo I 2025-06-13 17:25:46 qa_processing:107] max no. of chars in doc: 1782
[NeMo I 2025-06-13 17:25:46 qa_processing:106] mean no. of chars in doc: 828.0972222222222
[NeMo I 2025-06-13 17:25:46 qa_processing:107] max no. of chars in doc: 2132
[NeMo I 2025-06-13 17:25:46 qa_processing:106] mean no. of chars in doc: 540.0
[NeMo I 2025-06-13 17:25:46 qa_processing:107] max no. of chars in doc: 1423
[NeMo I 2025-06-13 17:25:46 qa_processing:106] mean no. of chars in doc: 756.71875
[NeMo I 2025-06-13 17:25:46 qa_processing:107] max no. of chars in doc: 1747
[NeMo I 2025-06-13 17:25:46 qa_processing:106] mean no. of chars in doc: 732.4418604651163
[NeMo I 2025-06-13 17:25:46 qa_processing:107] max no. of chars in doc: 3076
[NeMo I 2025

100%|██████████| 5000/5000 [00:11<00:00, 430.64it/s]

[NeMo I 2025-06-13 17:25:59 qa_s2s_dataset:73] Converting dict features into object features



100%|██████████| 5026/5026 [00:00<00:00, 666073.87it/s]

[NeMo I 2025-06-13 17:25:59 qa_processing:106] mean no. of chars in doc: 649.4358974358975
[NeMo I 2025-06-13 17:25:59 qa_processing:107] max no. of chars in doc: 1765
[NeMo I 2025-06-13 17:25:59 qa_processing:106] mean no. of chars in doc: 571.625
[NeMo I 2025-06-13 17:25:59 qa_processing:107] max no. of chars in doc: 1404
[NeMo I 2025-06-13 17:25:59 qa_processing:106] mean no. of chars in doc: 491.79487179487177
[NeMo I 2025-06-13 17:25:59 qa_processing:107] max no. of chars in doc: 1145
[NeMo I 2025-06-13 17:25:59 qa_processing:106] mean no. of chars in doc: 694.5454545454545
[NeMo I 2025-06-13 17:25:59 qa_processing:107] max no. of chars in doc: 1127
[NeMo I 2025-06-13 17:25:59 qa_processing:106] mean no. of chars in doc: 668.76
[NeMo I 2025-06-13 17:25:59 qa_processing:107] max no. of chars in doc: 1096
[NeMo I 2025-06-13 17:25:59 qa_processing:106] mean no. of chars in doc: 789.7727272727273
[NeMo I 2025-06-13 17:25:59 qa_processing:107] max no. of chars in doc: 1466
[NeMo I 2025




[NeMo I 2025-06-13 17:25:59 qa_processing:106] mean no. of chars in doc: 768.4871794871794
[NeMo I 2025-06-13 17:25:59 qa_processing:107] max no. of chars in doc: 1268
[NeMo I 2025-06-13 17:25:59 qa_processing:106] mean no. of chars in doc: 708.0512820512821
[NeMo I 2025-06-13 17:25:59 qa_processing:107] max no. of chars in doc: 1166
[NeMo I 2025-06-13 17:25:59 qa_processing:106] mean no. of chars in doc: 917.2564102564103
[NeMo I 2025-06-13 17:25:59 qa_processing:107] max no. of chars in doc: 1992
[NeMo I 2025-06-13 17:25:59 qa_processing:106] mean no. of chars in doc: 776.0816326530612
[NeMo I 2025-06-13 17:25:59 qa_processing:107] max no. of chars in doc: 1643
[NeMo I 2025-06-13 17:25:59 qa_processing:106] mean no. of chars in doc: 788.2173913043479
[NeMo I 2025-06-13 17:25:59 qa_processing:107] max no. of chars in doc: 1364
[NeMo I 2025-06-13 17:25:59 qa_processing:106] mean no. of chars in doc: 848.4090909090909
[NeMo I 2025-06-13 17:25:59 qa_processing:107] max no. of chars in do

100%|██████████| 1000/1000 [00:01<00:00, 517.41it/s]

[NeMo I 2025-06-13 17:26:01 qa_s2s_dataset:73] Converting dict features into object features



100%|██████████| 1000/1000 [00:00<00:00, 651592.98it/s]


[NeMo I 2025-06-13 17:26:01 qa_processing:106] mean no. of chars in doc: 649.4358974358975
[NeMo I 2025-06-13 17:26:01 qa_processing:107] max no. of chars in doc: 1765
[NeMo I 2025-06-13 17:26:01 qa_processing:106] mean no. of chars in doc: 571.625
[NeMo I 2025-06-13 17:26:01 qa_processing:107] max no. of chars in doc: 1404
[NeMo I 2025-06-13 17:26:01 qa_processing:106] mean no. of chars in doc: 491.79487179487177
[NeMo I 2025-06-13 17:26:01 qa_processing:107] max no. of chars in doc: 1145
[NeMo I 2025-06-13 17:26:01 qa_processing:106] mean no. of chars in doc: 694.5454545454545
[NeMo I 2025-06-13 17:26:01 qa_processing:107] max no. of chars in doc: 1127
[NeMo I 2025-06-13 17:26:01 qa_processing:106] mean no. of chars in doc: 668.76
[NeMo I 2025-06-13 17:26:01 qa_processing:107] max no. of chars in doc: 1096
[NeMo I 2025-06-13 17:26:01 qa_processing:106] mean no. of chars in doc: 789.7727272727273
[NeMo I 2025-06-13 17:26:01 qa_processing:107] max no. of chars in doc: 1466
[NeMo I 2025

100%|██████████| 100/100 [00:00<00:00, 454.22it/s]

[NeMo I 2025-06-13 17:26:02 qa_s2s_dataset:73] Converting dict features into object features



100%|██████████| 100/100 [00:00<00:00, 589087.64it/s]


Downloading model.safetensors:   0%|          | 0.00/558M [00:00<?, ?B/s]

### Train, test, and save the model

In [12]:
trainer.fit(model)
trainer.test(model)

model.save_to(config.model.nemo_path)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


[NeMo I 2025-06-13 17:26:05 modelPT:721] Optimizer config = AdamW (
    Parameter Group 0
        amsgrad: False
        betas: [0.9, 0.999]
        capturable: False
        differentiable: False
        eps: 1e-08
        foreach: None
        fused: None
        lr: 5e-05
        maximize: False
        weight_decay: 0.0
    )
[NeMo I 2025-06-13 17:26:05 lr_scheduler:910] Scheduler "<nemo.core.optim.lr_scheduler.SquareRootAnnealing object at 0x7fa685586890>" 
    will be used during training (effective maximum steps = 629) - 
    Parameters : 
    (warmup_steps: null
    warmup_ratio: 0.0
    last_epoch: -1
    max_steps: 629
    )



  | Name           | Type                         | Params
----------------------------------------------------------------
0 | language_model | BartForConditionalGeneration | 139 M 
----------------------------------------------------------------
139 M     Trainable params
0         Non-trainable params
139 M     Total params
278.841   Total estimated model params size (MB)


Training: 0it [00:00, ?it/s]



Validation: 0it [00:00, ?it/s]

[NeMo I 2025-06-13 17:28:20 qa_s2s_model:114] val exact: 27.8
[NeMo I 2025-06-13 17:28:20 qa_s2s_model:114] val f1: 34.236530065222034
[NeMo I 2025-06-13 17:28:20 qa_s2s_model:114] val total: 1000.0
[NeMo I 2025-06-13 17:28:20 qa_s2s_model:114] val HasAns_exact: 55.622489959839356
[NeMo I 2025-06-13 17:28:20 qa_s2s_model:114] val HasAns_f1: 68.54724912695187
[NeMo I 2025-06-13 17:28:20 qa_s2s_model:114] val HasAns_total: 498.0
[NeMo I 2025-06-13 17:28:20 qa_s2s_model:114] val NoAns_exact: 0.199203187250996
[NeMo I 2025-06-13 17:28:20 qa_s2s_model:114] val NoAns_f1: 0.199203187250996
[NeMo I 2025-06-13 17:28:20 qa_s2s_model:114] val NoAns_total: 502.0


`Trainer.fit` stopped: `max_epochs=1` reached.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Testing: 0it [00:00, ?it/s]

[NeMo I 2025-06-13 17:28:25 qa_s2s_model:114] test exact: 27.0
[NeMo I 2025-06-13 17:28:25 qa_s2s_model:114] test f1: 33.50952380952381
[NeMo I 2025-06-13 17:28:25 qa_s2s_model:114] test total: 100.0
[NeMo I 2025-06-13 17:28:25 qa_s2s_model:114] test HasAns_exact: 60.0
[NeMo I 2025-06-13 17:28:25 qa_s2s_model:114] test HasAns_f1: 74.46560846560847
[NeMo I 2025-06-13 17:28:25 qa_s2s_model:114] test HasAns_total: 45.0
[NeMo I 2025-06-13 17:28:25 qa_s2s_model:114] test NoAns_exact: 0.0
[NeMo I 2025-06-13 17:28:25 qa_s2s_model:114] test NoAns_f1: 0.0
[NeMo I 2025-06-13 17:28:25 qa_s2s_model:114] test NoAns_total: 55.0
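
The `exact` and `f1` values in these logs are answer-level metrics. Below is a minimal sketch of how such scores are commonly computed for SQuAD, modeled on the normalization and token-overlap logic of the official SQuAD evaluation script. NeMo's own implementation may differ in detail, and the example strings are illustrative.

```python
import collections
import re
import string

PUNCTUATION = set(string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction: str, truth: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    if not pred_tokens or not true_tokens:
        # Unanswerable case: score 1 only if both sides are empty.
        return float(pred_tokens == true_tokens)
    common = collections.Counter(pred_tokens) & collections.Counter(true_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Normans", "normans"))  # 1.0
print(f1_score("10th and 11th centuries", "10th century"))
```

Under this definition, both metrics for an unanswerable question reduce to whether the prediction is empty, which would explain why `NoAns_exact` and `NoAns_f1` are identical in the logs.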


### Load the saved model and run inference

In [13]:
model = S2SQAModel.restore_from(config.model.nemo_path)

eval_device = [config.trainer.devices[0]] if isinstance(config.trainer.devices, list) else 1
model.trainer = pl.Trainer(
    devices=eval_device,
    accelerator=config.trainer.accelerator,
    precision=16,
    logger=False,
)

all_preds, all_nbest = model.inference(
    config.model.test_ds.file,
#     output_prediction_file=output_prediction_file,
#     output_nbest_file=output_nbest_file,
    num_samples=10, # setting to -1 will use all samples for inference
)

for question_id in all_preds:
    print(all_preds[question_id])

[NeMo I 2025-06-13 17:28:27 tokenizer_utils:130] Getting HuggingFace AutoTokenizer with pretrained_model_name: facebook/bart-base, vocab_file: /tmp/tmpdvjqsiea/d4da401495a44816aa8093cc34169e0e_vocab.json, merges_files: None, special_tokens_dict: {}, and use_fast: False


[NeMo W 2025-06-13 17:28:27 modelPT:244] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.
[NeMo W 2025-06-13 17:28:27 modelPT:161] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    file: data/squad/v2.0/train-v2.0.json
    batch_size: 8
    shuffle: true
    num_samples: 5000
    num_workers: 1
    drop_last: false
    pin_memory: false
    
[NeMo W 2025-06-13 17:28:27 modelPT:168] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
    Validation config : 
    file: data/squad/v2.0/dev-v2.0.json
    batch_size: 8
    shuffle: false
    num_samples: 1000
    num_workers: 1
    drop_last: false
    pin_memory: false


[NeMo I 2025-06-13 17:28:30 save_restore_connector:249] Model S2SQAModel was successfully restored from /dli/task/work_dir/checkpoints/bart_squad_v2_0.nemo.


Using 16bit None Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
100%|██████████| 10/10 [00:00<00:00, 335.65it/s]
100%|██████████| 10/10 [00:00<00:00, 135737.99it/s]


France
10th and 11th centuries
Denmark, Iceland and Norway
Rollo
10th and 11th centuries
Normans
Normans
Rollo
first half of the 10th century
William the Conqueror
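
The loop above prints only the predicted strings. Since `all_preds` is keyed by question id, the predictions can be lined up with their questions by reading the ids from the SQuAD file. A minimal sketch with hand-made stand-ins for both the dataset and `all_preds` (the ids, questions, and the helper `questions_by_id` are hypothetical):

```python
from typing import Dict


def questions_by_id(squad: dict) -> Dict[str, str]:
    """Map each question id to its question text in a SQuAD-style dict."""
    return {
        qa["id"]: qa["question"]
        for article in squad["data"]
        for paragraph in article["paragraphs"]
        for qa in paragraph["qas"]
    }


# Stand-in for the parsed contents of a SQuAD JSON file
squad = {"data": [{"paragraphs": [{"context": "...", "qas": [
    {"id": "q1", "question": "In what country is Normandy located?"},
    {"id": "q2", "question": "Who was the Norse leader?"},
]}]}]}
# Stand-in for the first value returned by model.inference
all_preds = {"q1": "France", "q2": "Rollo"}

id_to_question = questions_by_id(squad)
for question_id, prediction in all_preds.items():
    print(f"{id_to_question[question_id]} -> {prediction}")
```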


## GPT2 model for SQuAD v2.0

### Exercise # 1 - Set model config values

* Modify the `<FIXME>` to use the `gpt2` pre-trained model and tokenizer. 

In [14]:
# set language model and tokenizer to be used
# tokenizer is derived from model if a tokenizer name is not provided
config.model.language_model.pretrained_model_name = "gpt2"
config.model.tokenizer.tokenizer_name = "gpt2"

# path where model will be saved
config.model.nemo_path = f"{WORK_DIR}/checkpoints/gpt2_squad_v2_0.nemo"

config.exp_manager.create_checkpoint_callback = True

config.model.optim.lr = 1e-4


### Create trainer and initialize model

In [15]:
trainer = pl.Trainer(**config.trainer)
model = GPTQAModel(config.model, trainer=trainer)

Using 16bit None Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..


[NeMo I 2025-06-13 17:28:31 tokenizer_utils:130] Getting HuggingFace AutoTokenizer with pretrained_model_name: gpt2, vocab_file: /root/.cache/huggingface/nemo_nlp_tmp/4fc9f399a5e5c3f1466f391ab2dbd82a/vocab.json, merges_files: None, special_tokens_dict: {}, and use_fast: False


Downloading tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

Downloading config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

Downloading vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

Downloading merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Using sep_token, but it is not set yet.
Using cls_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using mask_token, but it is not set yet.
[NeMo W 2025-06-13 17:28:32 modelPT:244] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.


[NeMo I 2025-06-13 17:28:32 qa_processing:106] mean no. of chars in doc: 839.2727272727273
[NeMo I 2025-06-13 17:28:32 qa_processing:107] max no. of chars in doc: 1895
[NeMo I 2025-06-13 17:28:32 qa_processing:106] mean no. of chars in doc: 677.5487804878048
[NeMo I 2025-06-13 17:28:32 qa_processing:107] max no. of chars in doc: 1782
[NeMo I 2025-06-13 17:28:32 qa_processing:106] mean no. of chars in doc: 828.0972222222222
[NeMo I 2025-06-13 17:28:32 qa_processing:107] max no. of chars in doc: 2132
[NeMo I 2025-06-13 17:28:32 qa_processing:106] mean no. of chars in doc: 540.0
[NeMo I 2025-06-13 17:28:32 qa_processing:107] max no. of chars in doc: 1423
[NeMo I 2025-06-13 17:28:32 qa_processing:106] mean no. of chars in doc: 756.71875
[NeMo I 2025-06-13 17:28:32 qa_processing:107] max no. of chars in doc: 1747
[NeMo I 2025-06-13 17:28:32 qa_processing:106] mean no. of chars in doc: 732.4418604651163
[NeMo I 2025-06-13 17:28:32 qa_processing:107] max no. of chars in doc: 3076
[NeMo I 2025

100%|██████████| 5000/5000 [00:20<00:00, 242.27it/s]

[NeMo I 2025-06-13 17:28:54 qa_gpt_dataset:74] Converting dict features into object features



100%|██████████| 5026/5026 [00:00<00:00, 617963.00it/s]

[NeMo I 2025-06-13 17:28:54 qa_processing:106] mean no. of chars in doc: 649.4358974358975
[NeMo I 2025-06-13 17:28:54 qa_processing:107] max no. of chars in doc: 1765
[NeMo I 2025-06-13 17:28:54 qa_processing:106] mean no. of chars in doc: 571.625
[NeMo I 2025-06-13 17:28:54 qa_processing:107] max no. of chars in doc: 1404
[NeMo I 2025-06-13 17:28:54 qa_processing:106] mean no. of chars in doc: 491.79487179487177
[NeMo I 2025-06-13 17:28:54 qa_processing:107] max no. of chars in doc: 1145
[NeMo I 2025-06-13 17:28:54 qa_processing:106] mean no. of chars in doc: 694.5454545454545
[NeMo I 2025-06-13 17:28:54 qa_processing:107] max no. of chars in doc: 1127
[NeMo I 2025-06-13 17:28:54 qa_processing:106] mean no. of chars in doc: 668.76
[NeMo I 2025-06-13 17:28:54 qa_processing:107] max no. of chars in doc: 1096
[NeMo I 2025-06-13 17:28:54 qa_processing:106] mean no. of chars in doc: 789.7727272727273
[NeMo I 2025-06-13 17:28:54 qa_processing:107] max no. of chars in doc: 1466
[NeMo I 2025




[NeMo I 2025-06-13 17:28:54 qa_processing:106] mean no. of chars in doc: 841.7954545454545
[NeMo I 2025-06-13 17:28:54 qa_processing:107] max no. of chars in doc: 2077
[NeMo I 2025-06-13 17:28:54 qa_processing:106] mean no. of chars in doc: 862.4594594594595
[NeMo I 2025-06-13 17:28:54 qa_processing:107] max no. of chars in doc: 1882
[NeMo I 2025-06-13 17:28:54 qa_processing:106] mean no. of chars in doc: 968.6808510638298
[NeMo I 2025-06-13 17:28:54 qa_processing:107] max no. of chars in doc: 2024
[NeMo I 2025-06-13 17:28:54 qa_processing:106] mean no. of chars in doc: 835.0612244897959
[NeMo I 2025-06-13 17:28:54 qa_processing:107] max no. of chars in doc: 1377
[NeMo I 2025-06-13 17:28:54 qa_processing:106] mean no. of chars in doc: 793.9166666666666
[NeMo I 2025-06-13 17:28:54 qa_processing:107] max no. of chars in doc: 1583
[NeMo I 2025-06-13 17:28:54 qa_processing:106] mean no. of chars in doc: 729.7741935483871
[NeMo I 2025-06-13 17:28:54 qa_processing:107] max no. of chars in do

100%|██████████| 1000/1000 [00:03<00:00, 271.15it/s]

[NeMo I 2025-06-13 17:28:58 qa_gpt_dataset:74] Converting dict features into object features



100%|██████████| 1000/1000 [00:00<00:00, 612396.55it/s]

[NeMo I 2025-06-13 17:28:58 qa_processing:106] mean no. of chars in doc: 649.4358974358975
[NeMo I 2025-06-13 17:28:58 qa_processing:107] max no. of chars in doc: 1765
[NeMo I 2025-06-13 17:28:58 qa_processing:106] mean no. of chars in doc: 571.625
[NeMo I 2025-06-13 17:28:58 qa_processing:107] max no. of chars in doc: 1404
[NeMo I 2025-06-13 17:28:58 qa_processing:106] mean no. of chars in doc: 491.79487179487177
[NeMo I 2025-06-13 17:28:58 qa_processing:107] max no. of chars in doc: 1145
[NeMo I 2025-06-13 17:28:58 qa_processing:106] mean no. of chars in doc: 694.5454545454545
[NeMo I 2025-06-13 17:28:58 qa_processing:107] max no. of chars in doc: 1127
[NeMo I 2025-06-13 17:28:58 qa_processing:106] mean no. of chars in doc: 668.76
[NeMo I 2025-06-13 17:28:58 qa_processing:107] max no. of chars in doc: 1096
[NeMo I 2025-06-13 17:28:58 qa_processing:106] mean no. of chars in doc: 789.7727272727273
[NeMo I 2025-06-13 17:28:58 qa_processing:107] max no. of chars in doc: 1466
[NeMo I 2025




[NeMo I 2025-06-13 17:28:58 qa_processing:107] max no. of chars in doc: 3145
[NeMo I 2025-06-13 17:28:58 qa_processing:106] mean no. of chars in doc: 854.3913043478261
[NeMo I 2025-06-13 17:28:58 qa_processing:107] max no. of chars in doc: 1629
[NeMo I 2025-06-13 17:28:58 qa_processing:106] mean no. of chars in doc: 789.88
[NeMo I 2025-06-13 17:28:58 qa_processing:107] max no. of chars in doc: 1341
[NeMo I 2025-06-13 17:28:58 qa_processing:106] mean no. of chars in doc: 788.2692307692307
[NeMo I 2025-06-13 17:28:58 qa_processing:107] max no. of chars in doc: 2078
[NeMo I 2025-06-13 17:28:58 qa_processing:106] mean no. of chars in doc: 873.6538461538462
[NeMo I 2025-06-13 17:28:58 qa_processing:107] max no. of chars in doc: 1463
[NeMo I 2025-06-13 17:28:58 qa_processing:106] mean no. of chars in doc: 726.7727272727273
[NeMo I 2025-06-13 17:28:58 qa_processing:107] max no. of chars in doc: 1042
[NeMo I 2025-06-13 17:28:58 qa_processing:106] mean no. of chars in doc: 763.5384615384615
[Ne

100%|██████████| 100/100 [00:00<00:00, 252.74it/s]

[NeMo I 2025-06-13 17:28:58 qa_gpt_dataset:74] Converting dict features into object features



100%|██████████| 100/100 [00:00<00:00, 509017.48it/s]


Downloading model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

Downloading generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

### Exercise # 2 - Train, test, and save the model

* Modify the `<FIXME>` to train, test, and save the model. 

In [16]:
trainer.fit(model)
trainer.test(model)

model.save_to(config.model.nemo_path)

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


[NeMo I 2025-06-13 17:29:01 modelPT:721] Optimizer config = AdamW (
    Parameter Group 0
        amsgrad: False
        betas: [0.9, 0.999]
        capturable: False
        differentiable: False
        eps: 1e-08
        foreach: None
        fused: None
        lr: 0.0001
        maximize: False
        weight_decay: 0.0
    )
[NeMo I 2025-06-13 17:29:01 lr_scheduler:910] Scheduler "<nemo.core.optim.lr_scheduler.SquareRootAnnealing object at 0x7fa630e2cdf0>" 
    will be used during training (effective maximum steps = 629) - 
    Parameters : 
    (warmup_steps: null
    warmup_ratio: 0.0
    last_epoch: -1
    max_steps: 629
    )



  | Name           | Type            | Params
---------------------------------------------------
0 | language_model | GPT2LMHeadModel | 124 M 
---------------------------------------------------
124 M     Trainable params
0         Non-trainable params
124 M     Total params
248.892   Total estimated model params size (MB)


Training: 0it [00:00, ?it/s]

    


Validation: 0it [00:00, ?it/s]

[NeMo I 2025-06-13 17:37:22 qa_gpt_model:96] val exact: 11.7
[NeMo I 2025-06-13 17:37:22 qa_gpt_model:96] val f1: 14.55864080364082
[NeMo I 2025-06-13 17:37:22 qa_gpt_model:96] val total: 1000.0
[NeMo I 2025-06-13 17:37:22 qa_gpt_model:96] val HasAns_exact: 1.2048192771084338
[NeMo I 2025-06-13 17:37:22 qa_gpt_model:96] val HasAns_f1: 6.945061854700406
[NeMo I 2025-06-13 17:37:22 qa_gpt_model:96] val HasAns_total: 498.0
[NeMo I 2025-06-13 17:37:22 qa_gpt_model:96] val NoAns_exact: 22.111553784860558
[NeMo I 2025-06-13 17:37:22 qa_gpt_model:96] val NoAns_f1: 22.111553784860558
[NeMo I 2025-06-13 17:37:22 qa_gpt_model:96] val NoAns_total: 502.0


`Trainer.fit` stopped: `max_epochs=1` reached.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Testing: 0it [00:00, ?it/s]

[NeMo I 2025-06-13 17:37:54 qa_gpt_model:96] test exact: 21.0
[NeMo I 2025-06-13 17:37:54 qa_gpt_model:96] test f1: 21.944444444444443
[NeMo I 2025-06-13 17:37:54 qa_gpt_model:96] test total: 100.0
[NeMo I 2025-06-13 17:37:54 qa_gpt_model:96] test HasAns_exact: 0.0
[NeMo I 2025-06-13 17:37:54 qa_gpt_model:96] test HasAns_f1: 2.0987654320987654
[NeMo I 2025-06-13 17:37:54 qa_gpt_model:96] test HasAns_total: 45.0
[NeMo I 2025-06-13 17:37:54 qa_gpt_model:96] test NoAns_exact: 38.18181818181818
[NeMo I 2025-06-13 17:37:54 qa_gpt_model:96] test NoAns_f1: 38.18181818181818
[NeMo I 2025-06-13 17:37:54 qa_gpt_model:96] test NoAns_total: 55.0
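
The overall `exact` score is the sample-weighted average of the `HasAns` and `NoAns` scores, which makes it easy to see where each model earns its points. A small check against the validation numbers logged in this notebook (`combine` is a hypothetical helper):

```python
def combine(has_ans: float, n_has: int, no_ans: float, n_no: int) -> float:
    """Sample-weighted average of the HasAns and NoAns scores (in percent)."""
    return (has_ans * n_has + no_ans * n_no) / (n_has + n_no)


# Validation `exact` values copied from the logs above
bart = combine(55.622489959839356, 498, 0.199203187250996, 502)
gpt2 = combine(1.2048192771084338, 498, 22.111553784860558, 502)
print(round(bart, 1), round(gpt2, 1))  # 27.8 11.7
```

In this short run, BART scores almost entirely on answerable questions, while GPT-2 scores mostly on unanswerable ones.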



### Exercise # 3 - Load the saved model and run inference

* Modify the `<FIXME>` to run inference from a saved model. 

In [17]:
model = GPTQAModel.restore_from(config.model.nemo_path)

eval_device = [config.trainer.devices[0]] if isinstance(config.trainer.devices, list) else 1
model.trainer = pl.Trainer(
    devices=eval_device,
    accelerator=config.trainer.accelerator,
    precision=16,
    logger=False,
)

all_preds, all_nbest = model.inference(
    config.model.test_ds.file,
    num_samples=10, # setting to -1 will use all samples for inference
)

for question_id in all_preds:
    print(all_preds[question_id])

[NeMo I 2025-06-13 17:37:56 tokenizer_utils:130] Getting HuggingFace AutoTokenizer with pretrained_model_name: gpt2, vocab_file: /tmp/tmp9fhaqkbq/58b8a27368a64677b322bfefbeed8bdf_vocab.json, merges_files: None, special_tokens_dict: {}, and use_fast: False


Using sep_token, but it is not set yet.
Using cls_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using mask_token, but it is not set yet.
[NeMo W 2025-06-13 17:37:56 modelPT:244] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.
[NeMo W 2025-06-13 17:37:56 modelPT:161] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    file: data/squad/v2.0/train-v2.0.json
    batch_size: 8
    shuffle: true
    num_samples: 5000
    num_workers: 1
    drop_last: false
    pin_memory: false
    
[NeMo W 2025-06-13 17:37:56 modelPT:168] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
    Validation config : 


[NeMo I 2025-06-13 17:37:58 save_restore_connector:249] Model GPTQAModel was successfully restored from /dli/task/work_dir/checkpoints/gpt2_squad_v2_0.nemo.


Using 16bit None Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
100%|██████████| 10/10 [00:00<00:00, 214.36it/s]
100%|██████████| 10/10 [00:00<00:00, 160087.94it/s]


the 10% century
the 10 million
the 10% century
the 10% century
the of the 10 million century
the 10% century
the 10 million
the native Frankish and Roman-Gaulish populations
the native Frankish and Roman-Gaulish populations



