# Fine-tuning the ALBERT Model on the SQuAD Dataset

The Stanford Question Answering Dataset (SQuAD) is one of the most widely used benchmarks for evaluating question answering models. In this notebook I test a pretrained ALBERT model (https://arxiv.org/pdf/1909.11942.pdf), a variant of BERT that outperformed BERT on benchmarks such as SQuAD, RACE and GLUE. This model will serve as the baseline; later I will try techniques such as ensembling, or test ELECTRA with Retro-Reader. These are discussed further at the end of the notebook.

> Here I am using a pretrained ALBERT model from the transformers library just to get started and to test the accuracy of the model.



In [None]:
# Installing the necessary packages
!pip install transformers
!pip install tensorboardX

Collecting transformers
  Downloading https://files.pythonhosted.org/packages/3a/83/e74092e7f24a08d751aa59b37a9fc572b2e4af3918cb66f7766c3affb1b4/transformers-3.5.1-py3-none-any.whl (1.3MB)
Collecting sacremoses
  Downloading https://files.pythonhosted.org/packages/7d/34/09d19aff26edcc8eb2a01bed8e98f13a1537005d31e95233fd48216eed10/sacremoses-0.0.43.tar.gz (883kB)
Collecting tokenizers==0.9.3
  Downloading https://files.pythonhosted.org/packages/4c/34/b39eb9994bc3c999270b69c9eea40ecc6f0e97991dba28282b9fd32d44ee/tokenizers-0.9.3-cp36-cp36m-manylinux1_x86_64.whl (2.9MB)
Collecting sentencepiece==0.1.91
  Downloading https://files.pythonhosted.org/packages/d4/a4/d0a884c4300004a78cca907a6ff9a5e9fe4f090f5d95ab341c53d28cbc58/sentencepiece-0.1.91-cp36-cp36m-manylinux1_x86_64.whl (1.1MB)

In [None]:
# Checking GPU details
!nvidia-smi

Sun Nov 29 09:20:41 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   36C    P0    25W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                 ERR! |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
# Downloading the SQuAD 2.0 train and dev sets into the notebook
!mkdir dataset \
&& cd dataset \
&& wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json \
&& wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json

--2020-11-29 09:21:14--  https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json
Resolving rajpurkar.github.io (rajpurkar.github.io)... 185.199.108.153, 185.199.109.153, 185.199.110.153, ...
Connecting to rajpurkar.github.io (rajpurkar.github.io)|185.199.108.153|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 42123633 (40M) [application/json]
Saving to: ‘train-v2.0.json’


2020-11-29 09:21:16 (98.2 MB/s) - ‘train-v2.0.json’ saved [42123633/42123633]

--2020-11-29 09:21:16--  https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json
Resolving rajpurkar.github.io (rajpurkar.github.io)... 185.199.108.153, 185.199.109.153, 185.199.110.153, ...
Connecting to rajpurkar.github.io (rajpurkar.github.io)|185.199.108.153|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4370528 (4.2M) [application/json]
Saving to: ‘dev-v2.0.json’


2020-11-29 09:21:17 (38.0 MB/s) - ‘dev-v2.0.json’ saved [4370528/4370528]
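
Before training, it can help to check what the downloaded files contain. The sketch below counts articles, paragraphs, questions and unanswerable questions in a SQuAD 2.0 style dictionary. The helper name `summarize_squad` and the tiny `sample` dictionary are made up for illustration; only the nesting (`data` → `paragraphs` → `qas` → `is_impossible`) follows the published SQuAD 2.0 layout.

```python
import json


def summarize_squad(squad):
    """Count articles, paragraphs, questions and unanswerable questions."""
    n_articles = n_paragraphs = n_questions = n_impossible = 0
    for article in squad["data"]:
        n_articles += 1
        for paragraph in article["paragraphs"]:
            n_paragraphs += 1
            for qa in paragraph["qas"]:
                n_questions += 1
                # SQuAD 2.0 marks unanswerable questions with is_impossible=True
                n_impossible += int(qa.get("is_impossible", False))
    return {
        "articles": n_articles,
        "paragraphs": n_paragraphs,
        "questions": n_questions,
        "impossible": n_impossible,
    }


# Tiny hand-written sample in the SQuAD 2.0 layout (hypothetical content)
sample = {
    "version": "v2.0",
    "data": [{
        "title": "New_Zealand",
        "paragraphs": [{
            "context": "New Zealand's capital city is Wellington.",
            "qas": [
                {"id": "q1", "question": "What is the capital?",
                 "answers": [{"text": "Wellington", "answer_start": 30}],
                 "is_impossible": False},
                {"id": "q2", "question": "Who is the mayor?",
                 "answers": [], "is_impossible": True},
            ],
        }],
    }],
}

print(summarize_squad(sample))

# With the files downloaded above (path relative to the notebook's working directory):
# with open("dataset/train-v2.0.json") as f:
#     print(summarize_squad(json.load(f)))
```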



## Fine-tuning the ALBERT Model on the SQuAD Dataset

In [None]:
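# Assumes the Hugging Face transformers repository has been cloned into ./transformers
# (pip install alone does not provide the examples folder). Depending on the release,
# run_squad.py may sit in a subdirectory of examples/, so adjust the path if needed.
!export SQUAD_DIR=/content/dataset \
&& python transformers/examples/run_squad.py \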
  --model_type albert \
  --model_name_or_path albert-base-v2 \
  --do_train \
  --do_eval \
  --do_lower_case \
  --train_file $SQUAD_DIR/train-v2.0.json \
  --predict_file $SQUAD_DIR/dev-v2.0.json \
  --per_gpu_train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2.0 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir /content/model_output \
  --save_steps 1000 \
  --threads 4 \
  --version_2_with_negative
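
The `--max_seq_length 384` and `--doc_stride 128` flags control how a context longer than the model input is split into overlapping windows. Here is a minimal sketch of that windowing. It follows the logic of the original BERT SQuAD script rather than the exact transformers implementation, and the function name and the default of three special tokens (one `[CLS]` and two `[SEP]`) are assumptions for illustration.

```python
def sliding_windows(n_doc_tokens, n_query_tokens, max_seq_length=384,
                    doc_stride=128, n_special_tokens=3):
    """Return (start, length) spans that cover a tokenized context."""
    # Room left for context tokens once the question and special tokens are placed
    max_tokens_for_doc = max_seq_length - n_query_tokens - n_special_tokens
    spans = []
    start = 0
    while start < n_doc_tokens:
        length = min(n_doc_tokens - start, max_tokens_for_doc)
        spans.append((start, length))
        if start + length == n_doc_tokens:
            break  # the last window reaches the end of the context
        # Consecutive windows start doc_stride tokens apart, so they overlap
        start += min(length, doc_stride)
    return spans


# A 500-token context with a 10-token question needs three overlapping windows
print(sliding_windows(500, 10))
# A short context fits in a single window
print(sliding_windows(100, 10))
```

Each window becomes one feature, which is why one question can produce several sets of start and end logits at prediction time.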

## Testing the pretrained/fine-tuned model on an example test case


In [None]:
import os
import torch
import time
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler

from transformers import(
    AlbertConfig,
    AlbertForQuestionAnswering,
    AlbertTokenizer,
    squad_convert_examples_to_features
)

from transformers.data.processors.squad import SquadResult, SquadV2Processor, SquadExample
from transformers.data.metrics.squad_metrics import compute_predictions_logits

In [None]:
model_name = "ktrapeznikov/albert-xlarge-v2-squad-v2"  # ALBERT checkpoint fine-tuned on SQuAD 2.0, from the Hugging Face model hub
output_dir = ""

n_best_size = 1  # keep only the top-scoring answer
max_answer_length = 30  # maximum answer length, in tokens
do_lower_case = True  # lowercase the text before tokenization
null_score_diff_threshold = 0.0  # predict "no answer" when the null score exceeds the best span score by more than this
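
For SQuAD 2.0, `null_score_diff_threshold` decides when the model should abstain. As I understand the post-processing, the "no answer" score is compared against the score of the best non-empty span, and the empty string is returned when the difference exceeds the threshold. The helper below is a hypothetical, stripped-down illustration of that rule, not the library function itself.

```python
def choose_answer(null_score, best_span_score, best_span_text,
                  null_score_diff_threshold=0.0):
    """Return "" (no answer) when the null score beats the best span by more than the threshold."""
    score_diff = null_score - best_span_score
    if score_diff > null_score_diff_threshold:
        return ""
    return best_span_text


# The span clearly wins, so the answer is kept
print(repr(choose_answer(1.0, 5.0, "Wellington")))
# The null score wins by 1.0, which is above the default threshold of 0.0
print(repr(choose_answer(6.0, 5.0, "Wellington")))
# Raising the threshold makes the model less willing to abstain
print(repr(choose_answer(6.0, 5.0, "Wellington", null_score_diff_threshold=2.0)))
```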

In [None]:
def to_list(tensor): #converting tensor to list
  return tensor.detach().cpu().tolist()

In [None]:
config_class, model_class, tokenizer_class = (
    AlbertConfig, AlbertForQuestionAnswering, AlbertTokenizer)
config = config_class.from_pretrained(
    model_name, do_lower_case=True
)
tokenizer = tokenizer_class.from_pretrained(
    model_name, do_lower_case=True)
model = model_class.from_pretrained(model_name)


In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
processor = SquadV2Processor()

In [None]:
# Run prediction for a list of questions against a single given context
def run_prediction(question_texts, context_text): 
  examples=[]
  for i, question_text in enumerate(question_texts):
    example = SquadExample(
        qas_id=str(i),
        question_text=question_text,
        context_text = context_text,
        answer_text = None,
        start_position_character=None,
        title = "Predict",
        is_impossible=False,
        answers=None
    )

    examples.append(example)

  features, dataset = squad_convert_examples_to_features(
      examples = examples,
      tokenizer = tokenizer,
      max_seq_length = 384,
      doc_stride=128,
      max_query_length=64,
      is_training=False,
      return_dataset="pt",
      threads=1,
  )
  # Batch the features for inference
  eval_sampler = SequentialSampler(dataset)
  eval_dataloader = DataLoader(dataset, sampler=eval_sampler, batch_size=10)

  all_results = []
  model.eval()  # inference mode (disables dropout)

  for batch in eval_dataloader:
    batch = tuple(t.to(device) for t in batch)

    with torch.no_grad():
      inputs = {
          "input_ids": batch[0],
          "attention_mask": batch[1],
          "token_type_ids": batch[2],
      }
      example_indices = batch[3]
      outputs = model(**inputs)

    # Index the outputs instead of iterating over them, so this works whether
    # the model returns a plain tuple or a ModelOutput object
    for i, example_index in enumerate(example_indices):
      eval_feature = features[example_index.item()]
      unique_id = int(eval_feature.unique_id)
      start_logits = to_list(outputs[0][i])
      end_logits = to_list(outputs[1][i])
      all_results.append(SquadResult(unique_id, start_logits, end_logits))

  # Post-process once, after the results of all batches have been collected
  output_prediction_file = "predictions.json"
  output_nbest_file = "nbest_predictions.json"
  output_null_log_odds_file = "null_predictions.json"
  predictions = compute_predictions_logits(
      examples,
      features,
      all_results,
      n_best_size,
      max_answer_length,
      do_lower_case,
      output_prediction_file,
      output_nbest_file,
      output_null_log_odds_file,
      False,  # verbose_logging
      True,  # version_2_with_negative
      null_score_diff_threshold,
      tokenizer,
  )
  return predictions
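
`compute_predictions_logits` turns the start and end logits into answer text. The sketch below is a simplified version of its core span search, meant only to show what `n_best_size` and `max_answer_length` do. The real function also maps token indices back to the original text and handles the no-answer case. `best_span` is a hypothetical helper, not part of the library.

```python
def best_span(start_logits, end_logits, n_best_size=20, max_answer_length=30):
    """Return (start, end, score) of the best valid span, or None if there is none."""
    def top_indices(logits):
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        return order[:n_best_size]

    best = None
    for s in top_indices(start_logits):
        for e in top_indices(end_logits):
            if e < s:
                continue  # the end cannot come before the start
            if e - s + 1 > max_answer_length:
                continue  # span is longer than allowed (counted in tokens)
            score = start_logits[s] + end_logits[e]
            if best is None or score > best[2]:
                best = (s, e, score)
    return best


start_logits = [0.0, 1.0, 5.0, 2.0]
end_logits = [0.0, 6.0, 1.0, 3.0]

print(best_span(start_logits, end_logits, n_best_size=4))
print(best_span(start_logits, end_logits, n_best_size=4, max_answer_length=1))
print(best_span(start_logits, end_logits, n_best_size=1))
```

In this toy example, `n_best_size=1` pairs only the single best start with the single best end, and here that pair is invalid because the end precedes the start. A larger `n_best_size` gives the search more candidates to fall back on, which may be worth trying if predictions come back empty more often than expected.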



In [None]:
context = "New Zealand (Māori: Aotearoa) is a sovereign island country in the southwestern Pacific Ocean. It has a total land area of 268,000 square kilometres (103,500 sq mi), and a population of 4.9 million. New Zealand's capital city is Wellington, and its most populous city is Auckland."
questions = ["How many people live in New Zealand?", 
             "What's the capital of New Zealand?",
             "What is total land area of New Zealand"]

# Run method
predictions = run_prediction(questions, context)

# Print results
for key in predictions.keys():
  print(predictions[key])


convert squad examples to features: 100%|██████████| 3/3 [00:00<00:00, 179.01it/s]
add example index and unique id: 100%|██████████| 3/3 [00:00<00:00, 12041.06it/s]


4.9 million.
Wellington
268,000 square kilometres (103,500 sq mi),
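
The predictions above include extra punctuation, such as the trailing period in "4.9 million.". SQuAD-style scoring normalizes answers before comparing them, so this kind of difference does not count as an error. The sketch below reproduces the usual exact match and token-level F1 computation. The reference answers used here are my own hypothetical gold answers, not taken from the dataset.

```python
import collections
import re
import string


def normalize_answer(text):
    """Lowercase, strip punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    return int(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction, reference):
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # If either side is empty, F1 is 1 only when both are empty
        return float(pred_tokens == ref_tokens)
    common = collections.Counter(pred_tokens) & collections.Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# (prediction from the notebook output, hypothetical reference answer)
pairs = [
    ("4.9 million.", "4.9 million"),
    ("Wellington", "Wellington"),
    ("268,000 square kilometres (103,500 sq mi),", "268,000 square kilometres"),
]
for prediction, reference in pairs:
    print(prediction, exact_match(prediction, reference), f1_score(prediction, reference))
```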


Here we fine-tuned ALBERT and tested a pretrained ALBERT checkpoint. The next step would be to implement and explore the ELECTRA model (https://arxiv.org/abs/2003.10555) and an ensemble of Retro-Reader (https://arxiv.org/pdf/2001.09694v3.pdf) with ELECTRA and ALBERT (https://arxiv.org/pdf/1909.11942.pdf).