## Slot filling with BERT

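This notebook fine-tunes a BERT-style question answering model (ALBERT) for slot filling. Each slot is phrased as a question, and the model extracts the answer span from the utterance. The aspects to be predicted are: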
- hotel-name
- restaurant-food
- restaurant-name

### NOTE: the fine-tuned ALBERT model can be downloaded here: [link](https://drive.google.com/file/d/1-wrk3XXAB3yGN0_gdArV3RSs-42LOiAv/view?usp=sharing)

### Attribution: This process mostly follows the structure of COLX563-lab3.

### Get Started

In [None]:
!pip install pulp
!pip install transformers
!pip install sentencepiece
from google.colab import drive
drive.mount('/content/drive')


In [None]:
import numpy as np
import pandas as pd
import torch
import pulp
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from tqdm.notebook import tqdm
from transformers import DistilBertTokenizer, DistilBertForQuestionAnswering, AlbertTokenizer, AlbertForQuestionAnswering
from torch.utils.data import Dataset, DataLoader
from sklearn.metrics import accuracy_score, f1_score

In [None]:
%pwd

'/content'

In [None]:
%cd drive/MyDrive/563_lab4

/content/drive/.shortcut-targets-by-id/1sOfVI4yu6BGIMMhg-GrEKgkvGnt6q68l/563_lab4


### Load data

In [None]:
QUESTIONS = {"hotel-name":"What is the name of the hotel they are looking for?",
             "restaurant-name":"What is the name of the restaurant they are looking for?",
             "restaurant-food":"What type of food are they looking for?"}

def load_and_preprocess_data(file_path):
  '''given the file path, read the data in pandas dataframe format.
  Then, convert aspects into questions, and melt the dataframe so that each row is a unique pair of question-utterance-answer.
  Also, change all NaN values to an empty string so that BERT can process the data.''' 
  df = pd.read_csv(file_path).reset_index()
  df = df[["index", "utts", "hotel-name", "restaurant-name", "restaurant-food"]]  # we will only use BERT to predict these aspects
  df = df.rename(columns={"hotel-name":QUESTIONS["hotel-name"], "restaurant-name":QUESTIONS["restaurant-name"], 
                          "restaurant-food":QUESTIONS["restaurant-food"]})  # rename the column names into questions
  df = df.melt(id_vars=["index", "utts"], value_vars=[QUESTIONS["hotel-name"], QUESTIONS["restaurant-name"], QUESTIONS["restaurant-food"]],
              var_name="question", value_name="answer"
             ) # melt dataframe so that each row is a unique pair of question-utterance-answer. 

  df = df.fillna('') # replace all NaN with an empty string

  return df
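
To make the reshaping concrete, here is a minimal plain-Python sketch of what the melt step produces for a single row. The helper name `melt_row` and the example row are illustrative assumptions, not part of the notebook; the real function uses `DataFrame.melt`, which orders its output question by question rather than row by row.

```python
# Hypothetical illustration of the wide-to-long reshaping done by load_and_preprocess_data.
QUESTIONS = {
    "hotel-name": "What is the name of the hotel they are looking for?",
    "restaurant-name": "What is the name of the restaurant they are looking for?",
    "restaurant-food": "What type of food are they looking for?",
}

def melt_row(index, row):
    """Turn one wide row into one (index, utterance, question, answer) tuple per aspect."""
    out = []
    for aspect, question in QUESTIONS.items():
        answer = row.get(aspect)
        # a missing aspect becomes an empty string, mirroring the NaN -> '' replacement
        out.append((index, row["utts"], question, "" if answer is None else answer))
    return out

# illustrative row: only the hotel name is annotated
example = {"utts": "Hi there! Can you give me some info on Cityroomz?", "hotel-name": "cityroomz"}
for record in melt_row(1, example):
    print(record)
```

Each utterance therefore appears three times in the melted data, once per question, and most of those rows carry an empty answer.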

In [None]:
train_df = load_and_preprocess_data("./dioData_train.csv")
dev_df = load_and_preprocess_data("./dioData_dev.csv")
test_df = load_and_preprocess_data("./dioData_test.csv")

In [None]:
train_df.head(5)

| | index | utts | question | answer |
|---|---|---|---|---|
| 0 | 0 | Guten Tag, I am staying overnight in Cambridge... | What is the name of the hotel they are looking... | |
| 1 | 1 | Hi there! Can you give me some info on Cityroomz? | What is the name of the hotel they are looking... | cityroomz |
| 2 | 2 | I am looking for a hotel named alyesbray lodge... | What is the name of the hotel they are looking... | alyesbray lodge guest house |
| 3 | 3 | I am looking for a restaurant. I would like so... | What is the name of the hotel they are looking... | |
| 4 | 4 | I'm looking for an expensive restaurant in the... | What is the name of the hotel they are looking... | |


### Create indices, questions, answers, and contexts for BERT

In [None]:
def convert_df_to_QAC(df):
    '''convert a dataframe to a list of indices, questions, answers, and contexts.'''
    indices = df["index"].to_list()
    questions = df["question"].to_list()
    answers = df["answer"].to_list()
    contexts = df["utts"].to_list()
    
    return indices, questions, answers, contexts

In [None]:
train_i, train_q, train_a, train_c = convert_df_to_QAC(train_df)
dev_i, dev_q, dev_a, dev_c = convert_df_to_QAC(dev_df)
test_i, test_q, test_a, test_c = convert_df_to_QAC(test_df)

In [None]:
print("sent ID: ", train_i[1])
print("C: ", train_c[1])
print("Q: ", train_q[1])
print("A: ", train_a[1])

sent ID:  1
C:  Hi there! Can you give me some info on Cityroomz?
Q:  What is the name of the hotel they are looking for?
A:  cityroomz


In [None]:
assert len(train_q) == len(train_a) == len(train_c) == len(train_i)
assert train_q[0] == "What is the name of the hotel they are looking for?"

### Convert to BERT tensors

In [None]:
# load tokenizer
tokenizer = AlbertTokenizer.from_pretrained('albert-base-v2')

In [None]:
def convert_to_BERT_tensors(questions, contexts):
    '''takes parallel lists of question strings and context strings and returns
    the padded input ids and the attention mask as tensors'''
    tokenized = tokenizer(questions, contexts, return_tensors='pt', max_length=512, truncation=True, padding=True)
    ids, mask = tokenized['input_ids'], tokenized['attention_mask']

    return ids, mask
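
The effect of `padding=True` and the returned `attention_mask` can be pictured with a small plain-Python sketch. The token ids below are made up and `pad_batch` is a hypothetical helper, not the tokenizer's implementation; it only shows that shorter sequences are padded to the longest one in the batch and that the mask marks real tokens with 1 and padding with 0.

```python
def pad_batch(sequences, pad_id=0):
    """Pad lists of token ids to a common length and build the matching attention mask."""
    max_len = max(len(seq) for seq in sequences)
    ids, mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        ids.append(seq + [pad_id] * n_pad)
        mask.append([1] * len(seq) + [0] * n_pad)
    return ids, mask

# made-up token ids for two inputs of different length
ids, mask = pad_batch([[2, 17, 31, 3], [2, 45, 3]])
print(ids)
print(mask)
```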

In [None]:
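def get_answer_span_tensor(question, context, answer, max_length=512):
    '''returns a LongTensor [start_idx, end_idx] locating the answer tokens in the tokenized input.
    [0, 0] (the [CLS] position) means there is no answer or the answer was truncated away.'''
    input_str = "[CLS]" + question + "[SEP]" + context + "[SEP]"
    tok_input = tokenizer.tokenize(input_str)
    tok_a = tokenizer.tokenize(answer)
    answer_len = len(tok_a)

    # an empty answer has no span, so point both indices at [CLS]
    if answer_len == 0:
        return torch.tensor([0, 0], dtype=torch.long)

    # look for the first position where the answer tokens occur as a contiguous sub-list
    for i in range(len(tok_input) - answer_len + 1):
        if tok_input[i:i+answer_len] == tok_a:
            start_idx, end_idx = i, i + answer_len - 1
            # valid positions are 0..max_length-1 because inputs are truncated to max_length
            if end_idx < max_length:
                return torch.tensor([start_idx, end_idx], dtype=torch.long)
            break

    # answer not found, or out of boundary
    return torch.tensor([0, 0], dtype=torch.long)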
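
The core of `get_answer_span_tensor` is a sub-list search. Below is a minimal sketch on pre-tokenized lists. The tokens are written by hand as an assumption (only `['▁city', 'room', 'z']` is confirmed by the test cell in this notebook), and `find_span` is a hypothetical helper name.

```python
def find_span(tokens, answer_tokens):
    """Return (start, end) of the first occurrence of answer_tokens in tokens, else (0, 0)."""
    n = len(answer_tokens)
    if n == 0:
        return (0, 0)  # no answer: point at position 0, the [CLS] token
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == answer_tokens:
            return (i, i + n - 1)  # end index is inclusive
    return (0, 0)

# hand-written tokens for illustration; real ones come from tokenizer.tokenize
tokens = ["[CLS]", "▁which", "▁hotel", "?", "[SEP]",
          "▁info", "▁on", "▁city", "room", "z", "?", "[SEP]"]
start, end = find_span(tokens, ["▁city", "room", "z"])
print(start, end, tokens[start:end + 1])
```

Using `(0, 0)` for "no answer" lets the model learn to point at `[CLS]` when a slot is not mentioned, which is how the prediction step later recognizes empty answers.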

In [None]:
# test 
print("Q: ", train_q[1])
print("C: ", train_c[1])
print("A: ", train_a[1])
start_idx, end_idx = get_answer_span_tensor(train_q[1], train_c[1], train_a[1])
test_tokens = tokenizer.tokenize("[CLS]" + train_q[1] + "[SEP]" + train_c[1] + "[SEP]")
assert test_tokens[start_idx:end_idx+1] == ['▁city', 'room', 'z']
print("success!")

Q:  What is the name of the hotel they are looking for?
C:  Hi there! Can you give me some info on Cityroomz?
A:  cityroomz
success!


### Build `QAdataset` and a corresponding dataloader

In [None]:
batch_size = 16

class QAdataset(Dataset):
    '''A dataset for housing QA data, including input_data, output_data, and padding mask'''
    def __init__(self, input_data, output_data,mask):
        self.input_data = input_data
        self.output_data = output_data
        self.mask = mask
        
    def __len__(self):
        return len(self.input_data)
    
    def __getitem__(self, index):
        target = self.output_data[index]
        data_val = self.input_data[index]
        mask = self.mask[index]
        return data_val,target,mask 

In [None]:
def prepare_QA_dataset(questions, contexts, answers=(), split="train"):
    '''for split in "train", "dev", "test", prepare a PyTorch dataset by converting
    the questions, contexts, and answers to tensors. For "test", dummy (0, 0) spans are used.'''
    
    QA_input, masks = convert_to_BERT_tensors(questions, contexts)
    
    if not split == "test":
        spans = torch.zeros((len(questions), 2)).to(torch.long)
        for i, (q, c, a) in enumerate(zip(questions, contexts, answers)):
            spans[i] = get_answer_span_tensor(q, c, a)
            
    else:
        spans = torch.Tensor([(0, 0) for _ in range(len(questions))]).to(torch.long)
    
    return QAdataset(QA_input, spans, masks)

In [None]:
train_dataset = prepare_QA_dataset(train_q, train_c, train_a, split="train")
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=False)

In [None]:
dev_dataset = prepare_QA_dataset(dev_q, dev_c, dev_a, split="dev")
dev_dataloader = DataLoader(dev_dataset, batch_size=batch_size, shuffle=False)

In [None]:
test_dataset = prepare_QA_dataset(test_q, test_c, test_a, split="test")
test_dataloader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

### Train BERT QA model

In [None]:
def train(dataloader, model):
    '''Trains model with the given dataloader. Returns the loss summed over all batches of one epoch.'''
    model.train()  # evaluate() leaves the model in eval mode, so switch back to training mode
    tot_loss = 0
    for data_val, target, mask in tqdm(dataloader):
        data_val, target, mask = data_val.to("cuda"), target.to("cuda"), mask.to("cuda")
        logits = model(data_val, mask)
        start_logits, end_logits = logits["start_logits"], logits["end_logits"]
        # total loss = cross-entropy on start positions + cross-entropy on end positions
        loss = start_loss_function(start_logits, target[:,0])
        loss += end_loss_function(end_logits, target[:,1])
        tot_loss += loss.item()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return tot_loss
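
The training loss is the sum of two cross-entropy terms, one over start positions and one over end positions. Here is a plain-Python sketch for a single example with made-up logits. The function name `cross_entropy` is illustrative; `nn.CrossEntropyLoss` computes the same quantity and, by default, averages it over the batch.

```python
import math

def cross_entropy(logits, target):
    """Negative log-softmax probability of the target position."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

# made-up logits over four token positions
start_logits = [0.1, 2.0, 0.3, -1.0]
end_logits = [0.0, 0.5, 2.5, -0.5]
gold_start, gold_end = 1, 2

loss = cross_entropy(start_logits, gold_start) + cross_entropy(end_logits, gold_end)
print(round(loss, 4))
```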

In [None]:
def evaluate(dataloader, model):
  '''Evaluates the model with the given dataloader. 
  Returns the accuracies of the start indices and the end indices.'''
  all_pred_start, all_label_start = [], []
  all_pred_end, all_label_end = [], []
  with torch.no_grad():
    model.eval()
    for data_val, target, mask in tqdm(dataloader):
      data_val, target, mask = data_val.to("cuda"), target.to("cuda"), mask.to("cuda")
      logits = model(data_val, mask)
      sys_starts, sys_ends = logits["start_logits"].cpu().data.argmax(dim=1), logits["end_logits"].cpu().data.argmax(dim=1)
      gold_starts, gold_ends = target[:,0], target[:,1]

      all_pred_start.extend([i.item() for i in sys_starts])
      all_label_start.extend([i.item() for i in gold_starts])
      all_pred_end.extend([i.item() for i in sys_ends])
      all_label_end.extend([i.item() for i in gold_ends])

  accuracy_start = accuracy_score(all_label_start, all_pred_start)
  accuracy_end = accuracy_score(all_label_end, all_pred_end)
  
  return accuracy_start, accuracy_end

In [None]:
model = AlbertForQuestionAnswering.from_pretrained('albert-base-v2')
model = model.to("cuda")
start_loss_function = nn.CrossEntropyLoss()
end_loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.00001)

Some weights of the model checkpoint at albert-base-v2 were not used when initializing AlbertForQuestionAnswering: ['predictions.bias', 'predictions.LayerNorm.weight', 'predictions.LayerNorm.bias', 'predictions.dense.weight', 'predictions.dense.bias', 'predictions.decoder.weight', 'predictions.decoder.bias']
- This IS expected if you are initializing AlbertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing AlbertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of AlbertForQuestionAnswering were not initialized from the model checkpoint at albert-base-v2 and are newly initialized: ['qa_outputs.weight', 'qa_outputs.bias']
You should probably TRAIN t

In [None]:
EPOCHS = 20
import time
for epoch in range(EPOCHS):
  start = time.time()
  train_loss= train(train_dataloader, model)
  train_start_acc, train_end_acc = evaluate(train_dataloader, model)
  dev_start_acc, dev_end_acc = evaluate(dev_dataloader, model)

  print(f"Epoch [{epoch+1}/{EPOCHS}], Loss: {train_loss:.4f}, Training start acc: {train_start_acc:.4f}, Training end acc: {train_end_acc:.4f}, Dev start acc: {dev_start_acc:.4f}, Dev end acc: {dev_end_acc:.4f}, Seconds: {time.time() - start:.4f} s")
  
  model_save = {
      "epoch":epoch,
      "model_state_dict": model.state_dict(),
      "optimizer_state_dict": optimizer.state_dict(),
      "loss": train_loss
  }
  torch.save(model_save, f"./ckpt/model_{epoch+1}.pt")

Epoch [1/20], Loss: 174.6433, Training start acc: 0.7556, Training end acc: 0.7562, Dev start acc: 0.7538, Dev end acc: 0.7554, Seconds: 110.5995 s
Epoch [2/20], Loss: 113.6708, Training start acc: 0.7552, Training end acc: 0.7560, Dev start acc: 0.7530, Dev end acc: 0.7514, Seconds: 110.5675 s
Epoch [3/20], Loss: 69.7719, Training start acc: 0.7587, Training end acc: 0.7601, Dev start acc: 0.7554, Dev end acc: 0.7554, Seconds: 110.6077 s
Epoch [4/20], Loss: 52.1707, Training start acc: 0.8008, Training end acc: 0.8114, Dev start acc: 0.8216, Dev end acc: 0.8192, Seconds: 110.4029 s
Epoch [5/20], Loss: 30.2858, Training start acc: 0.8975, Training end acc: 0.8991, Dev start acc: 0.9040, Dev end acc: 0.8967, Seconds: 110.6233 s
Epoch [6/20], Loss: 30.6254, Training start acc: 0.8855, Training end acc: 0.8973, Dev start acc: 0.8927, Dev end acc: 0.8959, Seconds: 110.5686 s
Epoch [7/20], Loss: 20.2877, Training start acc: 0.9970, Training end acc: 0.9968, Dev start acc: 0.9847, Dev end acc: 0.9806, Seconds: 110.5902 s
Epoch [8/20], Loss: 13.4887, Training start acc: 0.9988, Training end acc: 0.9990, Dev start acc: 0.9903, Dev end acc: 0.9847, Seconds: 110.5969 s
Epoch [9/20], Loss: 11.0451, Training start acc: 0.9974, Training end acc: 0.9966, Dev start acc: 0.9887, Dev end acc: 0.9814, Seconds: 110.5347 s
Epoch [10/20], Loss: 15.5237, Training start acc: 0.9987, Training end acc: 0.9983, Dev start acc: 0.9903, Dev end acc: 0.9855, Seconds: 110.5436 s
Epoch [11/20], Loss: 8.3315, Training start acc: 0.9980, Training end acc: 0.9984, Dev start acc: 0.9879, Dev end acc: 0.9847, Seconds: 110.6543 s
Epoch [12/20], Loss: 12.0576, Training start acc: 0.9753, Training end acc: 0.9642, Dev start acc: 0.9572, Dev end acc: 0.9387, Seconds: 110.6410 s
Epoch [13/20], Loss: 12.0278, Training start acc: 0.9988, Training end acc: 0.9988, Dev start acc: 0.9847, Dev end acc: 0.9839, Seconds: 110.7300 s
Epoch [14/20], Loss: 9.9222, Training start acc: 0.9979, Training end acc: 0.9988, Dev start acc: 0.9887, Dev end acc: 0.9863, Seconds: 110.5910 s
Epoch [15/20], Loss: 5.9909, Training start acc: 0.9989, Training end acc: 0.9994, Dev start acc: 0.9879, Dev end acc: 0.9847, Seconds: 110.7360 s
Epoch [16/20], Loss: 4.9377, Training start acc: 0.9994, Training end acc: 0.9996, Dev start acc: 0.9887, Dev end acc: 0.9855, Seconds: 110.7512 s

ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.



Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 2882, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-40-8bced642e7d2>", line 5, in <module>
    train_loss= train(train_dataloader, model)
  File "<ipython-input-37-667fd3eecdda>", line 12, in train
    loss.backward()
  File "/usr/local/lib/python3.7/dist-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 1823, in showtraceback
    stb = value._render_traceback_()

KeyboardInterrupt: ignored
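
The epoch logs above can be used to pick a checkpoint. One possible rule, assumed here rather than taken from the notebook, is to choose the epoch with the highest sum of dev start and dev end accuracy. The numbers are copied from the logs for epochs 7 to 16; `dev_acc` and `best_epoch` are illustrative names.

```python
# (dev start accuracy, dev end accuracy) per epoch, copied from the training log
dev_acc = {
    7: (0.9847, 0.9806), 8: (0.9903, 0.9847), 9: (0.9887, 0.9814),
    10: (0.9903, 0.9855), 11: (0.9879, 0.9847), 12: (0.9572, 0.9387),
    13: (0.9847, 0.9839), 14: (0.9887, 0.9863), 15: (0.9879, 0.9847),
    16: (0.9887, 0.9855),
}

# assumed selection rule: maximize start accuracy + end accuracy on the dev set
best_epoch = max(dev_acc, key=lambda e: sum(dev_acc[e]))
print(best_epoch, dev_acc[best_epoch])
```

Under this rule epoch 10 comes out on top, which is consistent with the `model_best_10.pt` file name used when the checkpoint is reloaded, although the notebook does not state how that checkpoint was chosen.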

In [None]:
# reload the best checkpoint (model_best_10.pt) to check its dev accuracy
model = AlbertForQuestionAnswering.from_pretrained('albert-base-v2')
model = model.to("cuda")
start_loss_function = nn.CrossEntropyLoss()
end_loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.00001)
checkpoint = torch.load("./ckpt/model_best_10.pt")
model.load_state_dict(checkpoint['model_state_dict'])

Some weights of the model checkpoint at albert-base-v2 were not used when initializing AlbertForQuestionAnswering: ['predictions.bias', 'predictions.LayerNorm.weight', 'predictions.LayerNorm.bias', 'predictions.dense.weight', 'predictions.dense.bias', 'predictions.decoder.weight', 'predictions.decoder.bias']
- This IS expected if you are initializing AlbertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing AlbertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of AlbertForQuestionAnswering were not initialized from the model checkpoint at albert-base-v2 and are newly initialized: ['qa_outputs.weight', 'qa_outputs.bias']
You should probably TRAIN t

<All keys matched successfully>

In [None]:
accuracy_start, accuracy_end = evaluate(dev_dataloader, model)

print(f"Start accuracy: {accuracy_start:.4f}")
print(f"End accuracy: {accuracy_end:.4f}")

Start accuracy: 0.9903
End accuracy: 0.9855


## Generate predictions

In [None]:
# this code is originally from COLX-563, lab3.
def select_best_answer_span(start_probs, end_probs, distance):
    ''' returns a list of spans corresponding to the highest probability QA solution which satisfy the restriction that the end index must
    be within distance after the start index'''
    output_spans = []
    for i in range(start_probs.shape[0]):
        best_indicies = None
        best_prob = -9999 # essentially zero probability in log space, could also use -np.inf
        for j in range(start_probs.shape[1]):
            for k in range(end_probs.shape[1]):
                if j <= k <= j + distance:
                    prob = start_probs[i,j] + end_probs[i,k]
                    if prob > best_prob:
                        best_prob = prob
                        best_indicies = (j,k)
        output_spans.append(best_indicies)
    return output_spans
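
A cheaper way to run the same search is to visit only the end positions inside the allowed window instead of testing every (start, end) pair. The sketch below works on plain lists of scores for a single example; `best_span` is a hypothetical name and the scores are made up.

```python
def best_span(start_scores, end_scores, distance):
    """Return the (start, end) pair with the highest summed score such that
    start <= end <= start + distance."""
    best, best_score = None, float("-inf")
    n = len(start_scores)
    for j in range(n):
        # only end positions within `distance` tokens after the start are allowed
        for k in range(j, min(j + distance, n - 1) + 1):
            score = start_scores[j] + end_scores[k]
            if score > best_score:
                best, best_score = (j, k), score
    return best

start_scores = [0.1, 3.0, 0.2, 0.0, 0.5]
end_scores = [0.0, 0.1, 0.3, 2.0, 4.0]
print(best_span(start_scores, end_scores, distance=2))   # window excludes end index 4 for start 1
print(best_span(start_scores, end_scores, distance=10))  # window covers the whole sequence
```

With a small `distance` the best pair changes because the highest-scoring end position is too far from the best start, which is exactly the restriction the notebook's function enforces.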

In [None]:
def generate_answers(dataloader):
  '''iterate over the dataloader and return a list of predicted answers (uses the global model)'''
  pred_answers = []
  model.eval()
  with torch.no_grad():  # inference only, so skip gradient tracking
    for data_val, target, mask in tqdm(dataloader):
      data_val, mask = data_val.to("cuda"), mask.to("cuda")
      logits = model(data_val, mask)
      sys_start_probs, sys_end_probs = logits["start_logits"].cpu(), logits["end_logits"].cpu()
      # best span whose end is at most 10 tokens after its start
      sys_spans = select_best_answer_span(sys_start_probs, sys_end_probs, 10)

      for i in range(data_val.shape[0]):
        curr_data = data_val[i]
        curr_start_idx, curr_end_idx = sys_spans[i]
        answer = tokenizer.decode(curr_data[curr_start_idx:curr_end_idx+1])
        # a span that points at [CLS] means "no answer"
        if answer == "[CLS]":
          answer = ""
        pred_answers.append(answer)
  return pred_answers

In [None]:
pred_dev_a = generate_answers(dev_dataloader)

In [None]:
pred_test_a = generate_answers(test_dataloader)

### Calculate scores

In [None]:
accuracy_score(dev_a, pred_dev_a)

0.9838579499596449

In [None]:
f1_score(dev_a, pred_dev_a, average="macro")

0.8189724186000199
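
The gap between the two scores is easier to read with the definitions in hand. Accuracy counts exact string matches, while macro F1 averages per-label F1 over every distinct answer string, so a rare answer weighs as much as the frequent empty answer. That is one likely reason macro F1 is lower here. A plain-Python sketch on made-up labels follows; the helper names are illustrative and the data is not from the notebook.

```python
def exact_match_accuracy(gold, pred):
    """Fraction of positions where the predicted string equals the gold string."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    scores = []
    for label in set(gold) | set(pred):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# made-up answers: one wrong prediction out of six
gold = ["", "", "", "", "cityroomz", "italian"]
pred = ["", "", "", "", "cityroomz", "indian"]
print(exact_match_accuracy(gold, pred), macro_f1(gold, pred))
```

A single wrong prediction costs one sixth of the accuracy but zeroes out two of the four labels ("italian" and "indian"), halving the macro F1.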

### Convert the question-context-answer format back to the original format (pandas dataframe) and save it as a CSV file

In [None]:
question_to_aspect = {v:k for k, v in QUESTIONS.items()}
def convert_QAC_to_df(indices, questions, answers, contexts):
  '''Convert pairs of question-answer-context into the original dataframe format'''
  aspects = [question_to_aspect[q] for q in questions]
  df = pd.DataFrame({"index":indices, "utts":contexts, "aspects":aspects, "answers":answers})
  df = df.pivot(index=["index", "utts"],
         columns="aspects",
         values="answers").reset_index()
  df.columns.name = None
  df = df.replace("", np.nan)  # empty predictions become NaN again

  return df.set_index("index")
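
The pivot reverses the earlier melt: rows that share an index are merged into one record with a column per aspect. Here is a plain-Python sketch with hypothetical records. The helper `pivot_records` is illustrative and stores `None` where the notebook writes NaN.

```python
def pivot_records(indices, aspects, answers, contexts):
    """Merge long-format rows that share an index into one dict per utterance."""
    wide = {}
    for idx, aspect, answer, utt in zip(indices, aspects, answers, contexts):
        row = wide.setdefault(idx, {"utts": utt})
        # empty predictions are stored as None, standing in for NaN
        row[aspect] = answer if answer != "" else None
    return wide

utt = "Hi there! Can you give me some info on Cityroomz?"
wide = pivot_records(
    indices=[1, 1, 1],
    aspects=["hotel-name", "restaurant-name", "restaurant-food"],
    answers=["cityroomz", "", ""],
    contexts=[utt, utt, utt],
)
print(wide[1])
```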

In [None]:
BERT_dev_pred_df = convert_QAC_to_df(dev_i, dev_q, pred_dev_a, dev_c)
BERT_dev_pred_df.to_csv("BERT_dev_predictions.csv")

In [None]:
BERT_test_pred_df = convert_QAC_to_df(test_i, test_q, pred_test_a, test_c)
BERT_test_pred_df.to_csv("BERT_test_predictions.csv")