# **Fine-Tuning T5 on SQuAD v1.1**
In this notebook, we will carry out a part of the following badges:

1. Experiment with another model
    * Fine-tune a T5-base model on SQuAD v1.1 so that it can be visualized in the next step


## **1. Set Up Libraries**
The fine-tuning is based on the following notebook [(can be seen here)](https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb), which we modified and fixed because it no longer worked out of the box.

First, we set up all the required libraries and change into the directory of our GitHub repo on Google Colab.

In [20]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [10]:
%cd /content
%cd gdrive/MyDrive/git_projects/How-Does-Bert-Answer-QA-DLP2021/src

/content
/content/gdrive/MyDrive/git_projects/How-Does-Bert-Answer-QA-DLP2021/src


In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
# fix some issues with torch 
import os 
os.environ['LD_LIBRARY_PATH']='/usr/local/lib'

!echo $LD_LIBRARY_PATH
!sudo ln -s /usr/local/lib/libmkl_intel_lp64.so /usr/local/lib/libmkl_intel_lp64.so.1
!sudo ln -s /usr/local/lib/libmkl_intel_thread.so /usr/local/lib/libmkl_intel_thread.so.1
!sudo ln -s /usr/local/lib/libmkl_core.so /usr/local/lib/libmkl_core.so.1

!ldconfig
!ldd /usr/local/lib/python3.7/dist-packages/torch/lib/libtorch.so

/usr/local/lib
ln: failed to create symbolic link '/usr/local/lib/libmkl_intel_lp64.so.1': File exists
ln: failed to create symbolic link '/usr/local/lib/libmkl_intel_thread.so.1': File exists
ln: failed to create symbolic link '/usr/local/lib/libmkl_core.so.1': File exists
/sbin/ldconfig.real: /usr/local/lib/python3.7/dist-packages/ideep4py/lib/libmkldnn.so.0 is not a symbolic link

	linux-vdso.so.1 (0x00007ffe9a5eb000)
	/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4 (0x00007f544c235000)
	libtorch_cpu.so => /usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_cpu.so (0x00007f5438ea1000)
	libtorch_cuda.so => /usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_cuda.so (0x00007f53f699d000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f53f6785000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f53f6394000)
	libunwind.so.8 => /usr/lib/x86_64-linux-gnu/libunwind.so.8 (0x00007f53f6179000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f53f5f5

In [None]:
# load xla pytorch lib for tpu 
VERSION = "nightly" 
!curl https://raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py -o pytorch-xla-env-setup.py
!python pytorch-xla-env-setup.py --version $VERSION

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  5116  100  5116    0     0  17641      0 --:--:-- --:--:-- --:--:-- 17641
Updating... This may take around 2 minutes.
Updating TPU runtime to pytorch-nightly ...
Collecting cloud-tpu-client
  Downloading https://files.pythonhosted.org/packages/56/9f/7b1958c2886db06feb5de5b2c191096f9e619914b6c31fdf93999fdbbd8b/cloud_tpu_client-0.10-py3-none-any.whl
Collecting google-api-python-client==1.8.0
  Downloading https://files.pythonhosted.org/packages/9a/b4/a955f393b838bc47cbb6ae4643b9d0f90333d3b4db4dc1e819f36aad18cc/google_api_python_client-1.8.0-py3-none-any.whl (57kB)
     |████████████████████████████████| 61kB 3.3MB/s 
Uninstalling torch-1.9.0+cu102:
ERROR: earthengine-api 0.1.269 has requirement google-api-python-client<2,>=1.12.1, but you'll have google-api-python-client 1.8.0 which is incompatible.
Installing

## **2. Get the dataset**
We used the `nlp` library (the predecessor of `datasets`) to get the SQuAD v1.1 dataset. With some helper methods we prepared the Q&A samples as T5 input. We had to cut down the validation dataset because the evaluation did not work with the trainer and therefore had to run on the CPU.

In [4]:
# format each example as input and target text and append the EOS token
def add_eos_to_examples(example):
    example['input_text'] = 'question: %s  context: %s </s>' % (example['question'], example['context'])
    example['target_text'] = '%s </s>' % example['answers']['text'][0]
    return example

# tokenize the examples; pad and truncate to fixed lengths (512 input tokens, 16 target tokens)
def convert_to_features(example_batch):
    input_encodings = tokenizer.batch_encode_plus(example_batch['input_text'], padding='max_length', truncation=True, max_length=512)
    target_encodings = tokenizer.batch_encode_plus(example_batch['target_text'], padding='max_length', truncation=True, max_length=16)

    encodings = {
        'input_ids': input_encodings['input_ids'], 
        'attention_mask': input_encodings['attention_mask'],
        'decoder_input_ids': target_encodings['input_ids'], # key changed from target
        'decoder_attention_mask': target_encodings['attention_mask'] # key changed from target
    }

    return encodings
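
To make the text format concrete, here is a minimal sketch that applies the same template to a single question/answer pair without any tokenizer. The helper name `to_t5_text` and the sample sentence are made up for illustration and are not taken from SQuAD.

```python
def to_t5_text(question, context, answer):
    """Build the input/target strings for one QA pair (same template as add_eos_to_examples)."""
    return {
        "input_text": "question: %s  context: %s </s>" % (question, context),
        "target_text": "%s </s>" % answer,
    }


# toy sample, invented for illustration only
sample = to_t5_text(
    question="Who identified the tuberculosis bacterium?",
    context="Robert Koch identified the bacterium in 1882.",
    answer="Robert Koch",
)
print(sample["input_text"])
print(sample["target_text"])
```

Both strings end with the `</s>` marker, so the model sees an explicit end of sequence for the input as well as for the answer it has to generate.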


In [5]:
!pip install nlp
!pip install transformers 
!pip install numpy --upgrade # restart the runtime after running this cell for the first time, then run all cells except the nightly PyTorch/XLA one
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from visualization.pretrained_model_loader import QAModel, ModelType
from visualization.visualizer import Visualizer
import torch
import nlp # try this further with datasets lib 
#from utils_T5 import add_eos_to_examples, convert_to_features

tokenizer = AutoTokenizer.from_pretrained("t5-base")

# load train and validation split of squad
train_dataset  = nlp.load_dataset('squad', split=nlp.Split.TRAIN)
valid_dataset = nlp.load_dataset('squad', split=nlp.Split.VALIDATION)
print(type(train_dataset))
# map add_eos_to_examples function to the dataset example wise and map convert_to_features batch wise
train_dataset = train_dataset.map(add_eos_to_examples)
train_dataset = train_dataset.map(convert_to_features, batched=True)

valid_dataset = valid_dataset.map(add_eos_to_examples, load_from_cache_file=False)
valid_dataset = valid_dataset.map(convert_to_features, batched=True, load_from_cache_file=False)

valid_dataset = valid_dataset.shuffle().select([i for i in range(0, 2000)]) # shrink the validation set so that evaluation does not take roughly 20 hours

# set the tensor type and the columns which the dataset should return
columns = ['input_ids', 'decoder_input_ids', 'attention_mask', 'decoder_attention_mask']  # changed from target to decoder
train_dataset.set_format(type='torch', columns=columns)
valid_dataset.set_format(type='torch', columns=columns)
torch.save(train_dataset, 'train_data.pt')
torch.save(valid_dataset, 'valid_data.pt')

Requirement already up-to-date: numpy in /usr/local/lib/python3.7/dist-packages (1.21.0)


2021-07-08 18:33:26,589 PyTorch version 1.9.0+cu102 available.
2021-07-08 18:33:28,554 TensorFlow version 2.5.0 available.
2021-07-08 18:33:28,955 Checking /root/.cache/huggingface/datasets/09ec6948d9db29db9a2dcd08df97ac45bccfa6aa104ea62d73c97fa4aaa5cd6c.8fee6e3d53a4d9e5483442c8ba26e06e4ef70eaca60ac7bebc8429fc64a5e86a.py for additional imports.
2021-07-08 18:33:28,958 Lock 139739794598224 acquired on /root/.cache/huggingface/datasets/09ec6948d9db29db9a2dcd08df97ac45bccfa6aa104ea62d73c97fa4aaa5cd6c.8fee6e3d53a4d9e5483442c8ba26e06e4ef70eaca60ac7bebc8429fc64a5e86a.py.lock
2021-07-08 18:33:28,963 Found main folder for dataset https://s3.amazonaws.com/datasets.huggingface.co/nlp/datasets/squad/squad.py at /usr/local/lib/python3.7/dist-packages/nlp/datasets/squad
2021-07-08 18:33:28,966 Found specific version folder for dataset https://s3.amazonaws.com/datasets.huggingface.co/nlp/datasets/squad/squad.py at /usr/local/lib/python3.7/dist-packages/nlp/datasets/squad/408a8fa46a1e2805445b793f1022

<class 'nlp.arrow_dataset.Dataset'>


Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
2021-07-08 18:33:34,117 Loading cached processed dataset at /root/.cache/huggingface/datasets/squad/plain_text/1.0.0/408a8fa46a1e2805445b793f1022e743428ca739a34809fce872f0c7f17b44ab/cache-8b3378e1a5dcc35bb9860d3c3002c625.arrow
2021-07-08 18:33:34,125 Caching processed dataset at /root/.cache/huggingface/datasets/squad/plain_text/1.0.0/408a8fa46a1e2805445b793f1022e743428ca739a34809fce872f0c7f17b44ab/cache-b124c9a8a54c728a1f4c7bb0f80f3935.arrow



2021-07-08 18:33:35,087 Done writing 10570 examples in 20111952 bytes /root/.cache/huggingface/datasets/squad/plain_text/1.0.0/408a8fa46a1e2805445b793f1022e743428ca739a34809fce872f0c7f17b44ab/tmp050sj5e9.
2021-07-08 18:33:35,097 Caching processed dataset at /root/.cache/huggingface/datasets/squad/plain_text/1.0.0/408a8fa46a1e2805445b793f1022e743428ca739a34809fce872f0c7f17b44ab/cache-ce6091119bb3776353b36650029be9ad.arrow






2021-07-08 18:33:52,686 Done writing 10570 examples in 109576608 bytes /root/.cache/huggingface/datasets/squad/plain_text/1.0.0/408a8fa46a1e2805445b793f1022e743428ca739a34809fce872f0c7f17b44ab/tmp1v_dypij.





2021-07-08 18:33:53,312 Caching processed dataset at /root/.cache/huggingface/datasets/squad/plain_text/1.0.0/408a8fa46a1e2805445b793f1022e743428ca739a34809fce872f0c7f17b44ab/cache-6db2bfe4768e3a22b30f51e770900a0e.arrow



2021-07-08 18:34:10,105 Done writing 10570 examples in 109576608 bytes /root/.cache/huggingface/datasets/squad/plain_text/1.0.0/408a8fa46a1e2805445b793f1022e743428ca739a34809fce872f0c7f17b44ab/tmphsqr447u.
2021-07-08 18:34:10,123 Caching processed dataset at /root/.cache/huggingface/datasets/squad/plain_text/1.0.0/408a8fa46a1e2805445b793f1022e743428ca739a34809fce872f0c7f17b44ab/cache-72c6825000f7fcf3605f94558c873f6b.arrow






2021-07-08 18:34:13,314 Done writing 2000 examples in 20745719 bytes /root/.cache/huggingface/datasets/squad/plain_text/1.0.0/408a8fa46a1e2805445b793f1022e743428ca739a34809fce872f0c7f17b44ab/tmpbpukry_q.
2021-07-08 18:34:13,338 Set __getitem__(key) output type to torch for ['input_ids', 'decoder_input_ids', 'attention_mask', 'decoder_attention_mask'] columns  (when key is int or slice) and don't output other (un-formated) columns.
2021-07-08 18:34:13,340 Set __getitem__(key) output type to torch for ['input_ids', 'decoder_input_ids', 'attention_mask', 'decoder_attention_mask'] columns  (when key is int or slice) and don't output other (un-formated) columns.





## **3. Train T5 on SQuAD v1.1**
We used the `Trainer` module and built an entry point from which the TPU starts training.
The training itself took around 6 hours on the full dataset.

In [None]:
import logging
import os
import sys
import torch

from transformers import T5ForConditionalGeneration, AutoTokenizer
from transformers import (
    HfArgumentParser,
    Trainer,
    TrainingArguments,
    set_seed,
)

from utils_T5 import prepare_batch, ModelArguments, DataTrainingArguments


logger = logging.getLogger(__name__)
def main():
    parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))
    model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath('args.json'))

    if (
        os.path.exists(training_args.output_dir)
        and os.listdir(training_args.output_dir)
        and training_args.do_train
        and not training_args.overwrite_output_dir
    ):
        raise ValueError(
            f"Output directory ({training_args.output_dir}) already exists and is not empty. Use --overwrite_output_dir to overcome."
        )

    logger.info("Training/evaluation parameters %s", training_args)

    set_seed(training_args.seed)

    tokenizer = AutoTokenizer.from_pretrained(
        model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path,
        cache_dir=model_args.cache_dir,
    )
    model = T5ForConditionalGeneration.from_pretrained(
        model_args.model_name_or_path,
        cache_dir=model_args.cache_dir,
    )

    train_dataset = torch.load(data_args.train_file_path)
    valid_dataset = torch.load(data_args.valid_file_path)

    # Initialize Trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=valid_dataset,
        data_collator=prepare_batch
    )
    # Training
    if training_args.do_train:
        trainer.train(
            model_path=model_args.model_name_or_path if os.path.isdir(model_args.model_name_or_path) else None
        )
        trainer.save_model()
        if trainer.is_world_process_zero():
            tokenizer.save_pretrained(training_args.output_dir)


def _mp_fn(index):
    main()

In [None]:
import json 

args_dict = {
  "num_cores": 8,
  'training_script': 'train_t5_squad.py',
  "model_name_or_path": 't5-base',
  "max_len": 512 ,
  "target_max_len": 16,
  "output_dir": './models/out',
  "overwrite_output_dir": True,
  "per_gpu_train_batch_size": 8,
  "per_gpu_eval_batch_size": 8,
  "gradient_accumulation_steps": 4,
  "learning_rate": 1e-4,
  "tpu_num_cores": 8,
  "num_train_epochs": 4,
  "do_train": True
}

with open('args.json', 'w') as f:
  json.dump(args_dict, f)

In [None]:
import torch_xla.distributed.xla_multiprocessing as xmp
xmp.spawn(_mp_fn, args=(), nprocs=1, start_method='fork')

2021-06-25 10:48:19,679 Training/evaluation parameters TrainingArguments(
_n_gpu=0,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_steps=500,
evaluation_strategy=IntervalStrategy.NO,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gradient_accumulation_steps=4,
greater_is_better=None,
group_by_length=False,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.0001,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=./models/tpu/runs/Jun25_10-48-08_8f1797ebc493,
logging_first_step=False,
logging_steps=500,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_type=SchedulerType


2021-06-25 10:48:50,343 Lock 139738893090384 released on /root/.cache/huggingface/transformers/ab4e948915b067f5cb6e5105f6f85044fd717b133f43240db67899a8fc7b29a2.26934c75adf19ceac3c268b721ba353356b7609c45f5627550326f275a2163b4.lock





Using deprecated `--per_gpu_train_batch_size` argument which will be removed in a future version. Using `--per_device_train_batch_size` is preferred.
Using deprecated `--per_gpu_train_batch_size` argument which will be removed in a future version. Using `--per_device_train_batch_size` is preferred.
***** Running training *****
  Num examples = 87599
  Num Epochs = 4
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 4
  Total optimization steps = 10948
Using deprecated `--per_gpu_train_batch_size` argument which will be removed in a future version. Using `--per_device_train_batch_size` is preferred.
Using deprecated `--per_gpu_eval_batch_size` argument which will be removed in a future version. Using `--per_device_eval_batch_size` is preferred.


| Step | Training Loss |
|------|---------------|
| 500 | 0.2552 |
| 1000 | 0.2421 |
| 1500 | 0.2347 |
| 2000 | 0.2311 |
| 2500 | 0.2263 |
| 3000 | 0.1945 |
| 3500 | 0.1717 |
| 4000 | 0.1697 |
| 4500 | 0.1722 |
| 5000 | 0.1689 |


Saving model checkpoint to ./models/tpu/checkpoint-500
Configuration saved in ./models/tpu/checkpoint-500/config.json
Model weights saved in ./models/tpu/checkpoint-500/pytorch_model.bin
Saving model checkpoint to ./models/tpu/checkpoint-1000
Configuration saved in ./models/tpu/checkpoint-1000/config.json
Model weights saved in ./models/tpu/checkpoint-1000/pytorch_model.bin
Saving model checkpoint to ./models/tpu/checkpoint-1500
Configuration saved in ./models/tpu/checkpoint-1500/config.json
Model weights saved in ./models/tpu/checkpoint-1500/pytorch_model.bin
Saving model checkpoint to ./models/tpu/checkpoint-2000
Configuration saved in ./models/tpu/checkpoint-2000/config.json
Model weights saved in ./models/tpu/checkpoint-2000/pytorch_model.bin
Saving model checkpoint to ./models/tpu/checkpoint-2500
Configuration saved in ./models/tpu/checkpoint-2500/config.json
Model weights saved in ./models/tpu/checkpoint-2500/pytorch_model.bin
Saving model checkpoint to ./models/tpu/checkpoint-30
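
The step counts in the training log can be checked by hand. The sketch below is plain arithmetic that reproduces the logged values (87599 examples, batch size 8, 4 accumulation steps, 4 epochs). It assumes a single process, a kept last partial batch (`dataloader_drop_last=False`), and one optimizer update per full group of accumulated batches. The function name is hypothetical.

```python
import math


def total_optimization_steps(num_examples, batch_size, grad_accum_steps, num_epochs):
    # batches per epoch; the last, smaller batch is kept
    batches_per_epoch = math.ceil(num_examples / batch_size)
    # one optimizer update per full group of accumulated batches
    updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return math.ceil(num_epochs * updates_per_epoch)


effective_batch_size = 8 * 4  # per-device batch size * accumulation steps
steps = total_optimization_steps(87599, 8, 4, 4)
print(effective_batch_size, steps)  # 32 10948
```

With checkpoints written every 500 steps, a full run under these settings would produce 21 checkpoints (steps 500 to 10500) before the final save.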

## **4. Evaluate T5 Model**
The last step was to evaluate the new model against the T5-base model. First we tried to do this with the trainer module, which did not work well due to some unclear error messages (maybe fix this later).
After some hours of trying, we decided to use the evaluation methods from the original notebook, which worked after some modifications.

In [None]:
# Does not work on the full validation set because it is too big, so I had to make it smaller (this could also be done some cells above). 12 GB of RAM are enough for ~180 batches.
!pip install datasets
from datasets import load_metric 
import os
import torch
import logging
from transformers import T5ForConditionalGeneration, Trainer, HfArgumentParser, TrainingArguments
from utils_T5 import prepare_batch, ModelArguments, DataTrainingArguments
import numpy as np
# Evaluate on the validation set
logger = logging.getLogger(__name__)
parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))
model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath('args.json'))


valid_dataset = torch.load("valid_data.pt")

valid_dataset = valid_dataset.select([i for i in range(0, 40)])

columns = ['input_ids', 'decoder_input_ids', 'attention_mask', 'decoder_attention_mask']
valid_dataset.set_format(type='torch', columns=columns)

metric = load_metric("accuracy")

def compute_metrics(eval_pred):
    print(len(eval_pred))
    logits, labels = eval_pred
    print(logits[0].shape, logits[1].shape, labels.shape)
    predictions = np.argmax(logits[0], axis=-1)
    print(predictions.shape)
    return metric.compute(predictions=predictions, references=labels)

model = T5ForConditionalGeneration.from_pretrained(
        "models/tpu/T5_fine_tuned_squad_1",
        cache_dir="/cache",
)

"""
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model.to(device)

eval_dataloader = torch.utils.data.DataLoader(valid_dataset, batch_size=8)

model.eval()
for batch in eval_dataloader:
    batch = {k: v.to(device) for k, v in batch.items()}
    print(batch)
    with torch.no_grad():
        outputs = model(**batch)

    logits = outputs.logits
    predictions = torch.argmax(logits, dim=-1)
    metric.add_batch(predictions=predictions, references=batch["labels"])

metric.compute()
"""

"""
trainer = Trainer(
      model=model,
      eval_dataset=valid_dataset,
      data_collator=prepare_batch,
      compute_metrics=compute_metrics
  )

logger.info("*** Evaluate ***")
trainer.evaluate()
"""
"""
output_eval_file = os.path.join("/", "eval_results.txt")
with open(output_eval_file, "w") as writer:
    logger.info("***** Eval results *****")
    for key in sorted(eval_output.keys()):
        logger.info("  %s = %s", key, str(eval_output[key]))
        writer.write("%s = %s\n" % (key, str(eval_output[key])))
"""




2021-06-25 23:07:13,975 Loading cached selected dataset at /root/.cache/huggingface/datasets/squad/plain_text/1.0.0/408a8fa46a1e2805445b793f1022e743428ca739a34809fce872f0c7f17b44ab/cache-64a5a5b29717a3e1ac2bd9d5805f5513.arrow
2021-06-25 23:07:13,981 Set __getitem__(key) output type to torch for no columns  (when key is int or slice) and don't output other (un-formated) columns.



In [14]:
import torch

import nlp
from transformers import T5ForConditionalGeneration, AutoTokenizer

from tqdm.auto import tqdm
from utils_T5 import normalize_answer, f1_score, exact_match_score, evaluate

model = T5ForConditionalGeneration.from_pretrained(
        "../models/t5_finetuned_squad_v1",
        cache_dir="/cache",
)
model_2 = T5ForConditionalGeneration.from_pretrained(
        "t5-base",
        cache_dir="/cache",
)
tokenizer = AutoTokenizer.from_pretrained('../models/t5_finetuned_squad_v1')
valid_dataset = torch.load("valid_data.pt")
#valid_dataset = valid_dataset.select([i for i in range(0, 2)]) # test this or else test on full dataset
#print(valid_dataset)
dataloader = torch.utils.data.DataLoader(valid_dataset, batch_size=32)

2021-07-08 18:41:17,381 Lock 139739717391504 acquired on /cache/91e9fe874e06c44883b535d6c950b8b89d6eaa3298d8e7fb3b2c78039e9f8b7b.66b9637a52aa11e9285cdd6e668cc0df14b3bcf0b6674cf3ba5353c542649637.lock



2021-07-08 18:41:17,442 Lock 139739717391504 released on /cache/91e9fe874e06c44883b535d6c950b8b89d6eaa3298d8e7fb3b2c78039e9f8b7b.66b9637a52aa11e9285cdd6e668cc0df14b3bcf0b6674cf3ba5353c542649637.lock
2021-07-08 18:41:17,477 Lock 139739739247056 acquired on /cache/ab4e948915b067f5cb6e5105f6f85044fd717b133f43240db67899a8fc7b29a2.26934c75adf19ceac3c268b721ba353356b7609c45f5627550326f275a2163b4.lock






2021-07-08 18:41:35,418 Lock 139739739247056 released on /cache/ab4e948915b067f5cb6e5105f6f85044fd717b133f43240db67899a8fc7b29a2.26934c75adf19ceac3c268b721ba353356b7609c45f5627550326f275a2163b4.lock





In [15]:
print(len(dataloader.dataset))

2000


In [16]:
answers = []
answers_2 = []
for batch in tqdm(dataloader):
  outs = model.generate(input_ids=batch['input_ids'], 
                        attention_mask=batch['attention_mask'],
                        max_length=16,
                        early_stopping=True)
  outs_2 = model_2.generate(input_ids=batch['input_ids'], 
                        attention_mask=batch['attention_mask'],
                        max_length=16,
                        early_stopping=True)
  outs = [tokenizer.decode(ids, skip_special_tokens=True, clean_up_tokenization_spaces=True) for ids in outs]
  answers.extend(outs)
  outs_2 = [tokenizer.decode(ids, skip_special_tokens=True, clean_up_tokenization_spaces=True) for ids in outs_2]
  answers_2.extend(outs_2)





In [17]:
print(answers)
print(valid_dataset['answers'])

['UNESCO World Heritage Site', 'Robert Koch', 'time and storage', '11 million', 'the kingdom', 'Super Bowl City', 'Thomas Murphy', 'feeder materials', 'Time', '1997', 'Peyton Manning', '6000', "d'Hondt method", 'his mother', 'Cam Newton', 'Satya Nadella', 'lander', 'elevated partial pressures', 'Tehachapi Mountains', 'the public switched data network operated by the Dutch PTT Telecom (now', 'treatment', '$216,000', 'Aaron Spelling', 'about a million base pairs', "Goldbach's conjecture", 'Social Chapter', 'Überseering BV v Nordic Construction GmbH', 'executed', 'the Middle Rhine', 'Central business districts', 'ten to fifteen', 'internal thylakoid system', '11,000 years', 'Philo of Byzantium', 'restaurant', 'French power in North America meant the disappearance of a strong ally', '515 million years ago', 'Manning', 'Major George Washington', 'Court of Justice of the European Union (CJEU)', 'BBC', 'League of Nations', 'oxygen compounds', 'The Dalek race', '108', '1892', 'Safari Rally', '

In [18]:
predictions = []
references = []
for ref, pred in zip(valid_dataset['answers'], answers):
  predictions.append(pred)
  references.append(ref['text'])
predictions[1], references[1]

('Robert Koch', ['Robert Koch', 'Robert Koch', 'Robert Koch'])

In [19]:
evaluate(references, predictions)

{'exact_match': 82.3, 'f1': 90.77458409663038}
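
The helpers imported from `utils_T5` are not shown in this notebook. As a rough guide to what the two numbers mean, the sketch below follows the logic of the official SQuAD v1.1 evaluation script: answers are normalized, and each prediction is scored against its best-matching reference. It is an assumption about how the imported `evaluate` behaves, not its actual code, and all `squad_*` names are hypothetical.

```python
import re
import string
from collections import Counter


def squad_normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def squad_em(prediction, reference):
    return squad_normalize(prediction) == squad_normalize(reference)


def squad_f1(prediction, reference):
    pred_tokens = squad_normalize(prediction).split()
    ref_tokens = squad_normalize(reference).split()
    # token overlap, counting repeated tokens at most as often as they co-occur
    num_same = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def squad_evaluate(references, predictions):
    """references: one list of gold answers per example; predictions: one string per example."""
    em_total = f1_total = 0.0
    for golds, pred in zip(references, predictions):
        em_total += max(float(squad_em(pred, gold)) for gold in golds)
        f1_total += max(squad_f1(pred, gold) for gold in golds)
    n = len(predictions)
    return {"exact_match": 100.0 * em_total / n, "f1": 100.0 * f1_total / n}


scores = squad_evaluate([["Robert Koch", "Robert Koch"]], ["Robert Koch"])
print(scores)
```

Exact match only rewards a prediction that equals a reference after normalization, while F1 gives partial credit for overlapping tokens, which is why F1 is the higher of the two scores.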

In [20]:
predictions = []
references = []
for ref, pred in zip(valid_dataset['answers'], answers_2):
  predictions.append(pred)
  references.append(ref['text'])
predictions[1], references[1]

evaluate(references, predictions)

{'exact_match': 82.1, 'f1': 90.31784719224679}