# SQuAD in Keras (Question and Answer)

Let's solve a SQuAD problem using Keras and BERT

Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.

Bscailly when we input the question sentece, we give a corresponding answer
Simply put, AI help solving Enligh comprehension

In this notebook, we build the model in Keras in 4 steps
1) Understand SQuAD 
2) Create BERT Input 
3) Create SQUAD model 
4) Inference using testset

![squad]('./img/squad.png')

Actaully SQuAD does not asnwer fully, it only gives a very first and last words in that sentence. So, if it knows the first and last words, it naturally makes sense to understand the whole sentence.

- Reference: SQuAD 2.0 [Github](https://rajpurkar.github.io/SQuAD-explorer/)

In [19]:
import numpy as np
import pandas as pd
from keras import backend as K
from keras import Input, Model
from keras import optimizers
import keras as keras
from keras.layers import Layer, Embedding, Dense, Input, LSTM, Bidirectional, Activation, Conv1D, GRU, TimeDistributed, Dropout
from keras.models import Model, load_model
from keras.preprocessing.text import Tokenizer, text_to_word_sequence
from keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split

import warnings
import tensorflow as tf
import os
import re
import pickle
import codecs
from tqdm import tqdm
import shutil
import json
import matplotlib.pyplot as plt
warnings.filterwarnings(action='ignore')

In [None]:
Install Keras-bert: Keras API of BERT model 
Install Keras_radam: Revised version of ADAM optimizer 
# !pip install keras-bert
# !pip install keras-radam   

In [20]:
from keras_bert import load_trained_model_from_checkpoint, load_vocabulary
from keras_bert import Tokenizer
from keras_bert import AdamWarmup, calc_train_steps

from keras_radam import RAdam

Downaloding Model and Data to your directory, and building a few inital helper functions are the same as '02_Sentiment_Analysis_using_Bert' file. Please refer to the previousJupyter notebbok for a codeset

# 1. Download the SQuAD dataset

You can do Wget
```
!wget https://raw.githubusercontent.com/nate-parrott/squad/master/data/train-v1.1.json
!wget https://raw.githubusercontent.com/nate-parrott/squad/master/data/dev-v1.1.json
```
Or direct download and save it in the data directory 
[Github Link](https://github.com/nate-parrott/squad/tree/master/data)

In [21]:
os.listdir('./data')

['.ipynb_checkpoints',
 'bert',
 'dev-v1.1.json',
 'glove',
 'News_Category_Dataset_v2.json',
 'ratings_test.txt',
 'ratings_train.txt',
 'train-v1.1.json']

A function to make SQuAD JSON file to Pandas Dataframe
> Reference: https://www.kaggle.com/sanjay11100/squad-stanford-q-a-json-to-pandas-dataframe

In [22]:
def squad_json_to_dataframe_train(input_file_path, record_path = ['data','paragraphs','qas','answers'],
                           verbose = 1):
    """
    input_file_path: path to the squad json file.
    record_path: path to deepest level in json file default value is
    ['data','paragraphs','qas','answers']
    verbose: 0 to suppress it default is 1
    """
    if verbose:
        print("Reading the json file")    
    file = json.loads(open(input_file_path).read())
    if verbose:
        print("processing...")
    # parsing different level's in the json file
    js = pd.io.json.json_normalize(file , record_path )
    m = pd.io.json.json_normalize(file, record_path[:-1] )
    r = pd.io.json.json_normalize(file,record_path[:-2])
    
    #combining it into single dataframe
    idx = np.repeat(r['context'].values, r.qas.str.len())
    ndx  = np.repeat(m['id'].values,m['answers'].str.len())
    m['context'] = idx
    js['q_idx'] = ndx
    main = pd.concat([ m[['id','question','context']].set_index('id'),js.set_index('q_idx')],1,sort=False).reset_index()
    main['c_id'] = main['context'].factorize()[0]
    if verbose:
        print("shape of the dataframe is {}".format(main.shape))
        print("Done")
    return main

In [23]:
# Load train data
train = squad_json_to_dataframe_train("./data/train-v1.1.json")

Reading the json file
processing...
shape of the dataframe is (87599, 6)
Done


We see that the training data is loaded suceessfully now
The column 'Question' is the question part and 'context' is input to the model.

the output (answer) is the first and last words in 'text' column. For example, if the value in 'text' column is 'Saint Bernadette Soubirous', the answer is 'Saint' and 'Soubrious' /

And one characteristics of SQuAD problem is that, the answer of the 'text' is actually in 'context'. 

In [24]:
train

Unnamed: 0,index,question,context,answer_start,text,c_id
0,5733be284776f41900661182,To whom did the Virgin Mary allegedly appear i...,"Architecturally, the school has a Catholic cha...",515,Saint Bernadette Soubirous,0
1,5733be284776f4190066117f,What is in front of the Notre Dame Main Building?,"Architecturally, the school has a Catholic cha...",188,a copper statue of Christ,0
2,5733be284776f41900661180,The Basilica of the Sacred heart at Notre Dame...,"Architecturally, the school has a Catholic cha...",279,the Main Building,0
3,5733be284776f41900661181,What is the Grotto at Notre Dame?,"Architecturally, the school has a Catholic cha...",381,a Marian place of prayer and reflection,0
4,5733be284776f4190066117e,What sits on top of the Main Building at Notre...,"Architecturally, the school has a Catholic cha...",92,a golden statue of the Virgin Mary,0
...,...,...,...,...,...,...
87594,5735d259012e2f140011a09d,In what US state did Kathmandu first establish...,"Kathmandu Metropolitan City (KMC), in order to...",229,Oregon,18890
87595,5735d259012e2f140011a09e,What was Yangon previously known as?,"Kathmandu Metropolitan City (KMC), in order to...",414,Rangoon,18890
87596,5735d259012e2f140011a09f,With what Belorussian city does Kathmandu have...,"Kathmandu Metropolitan City (KMC), in order to...",476,Minsk,18890
87597,5735d259012e2f140011a0a0,In what year did Kathmandu create its initial ...,"Kathmandu Metropolitan City (KMC), in order to...",199,1975,18890


In [48]:
# Max-length of a sentence. If the sentence is shorter than 384, remaining is filled with 0. 
# I chose 384 as random (due to memory)
SEQ_LEN = 384
BATCH_SIZE = 10
EPOCHS=2
LR=3e-5

# folder that has the pretrained BERT model
pretrained_path = os.path.abspath('./data/bert')

config_path = os.path.join(pretrained_path, 'bert_config.json')
checkpoint_path = os.path.join(pretrained_path, 'bert_model.ckpt')
vocab_path = os.path.join(pretrained_path, 'vocab.txt')

# Specify lables 
DATA_COLUMN = "context"
QUESTION_COLUMN = "question"
TEXT = "text"

#### Same step as Sentiment Analysis Notebook
Create a dictionary called 'token_dict' that adds numbering to words in vocab.txt 
So the flow of NLP is
**Tokonize the sentence into words ==> Words converted to Index (numbers) ==> Fed into the BERT model**

# 2. Tokenize the data

In [49]:
token_dict = {}
with codecs.open(vocab_path, 'r', 'utf8') as reader:
    for line in reader:
        token = line.strip()
        if "_" in token:
            token = token.replace("_","")
            token = "##" + token
        token_dict[token] = len(token_dict)

In [50]:
tokenizer = Tokenizer(token_dict)

In [51]:
tokenizer

<keras_bert.tokenizer.Tokenizer at 0x207d7e5f448>

#### Check if tokenization is done well

In [52]:
print(tokenizer.tokenize("keras is reall fun."))

['[CLS]', 'keras', 'is', 'real', '##l', 'fun', '.', '[SEP]']


In [53]:
question = train['question'][0]
context = train['context'][0]
text = train['text'][0]

Look at sample question, context and answer

In [54]:
question

'To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?'

In [55]:
context

'Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.'

In [56]:
# answer
text

'Saint Bernadette Soubirous'

#### In raw format

Our goal is to get 'question' and 'context' and creata a model to get 'text'. /
Tokenized answer is ['[CLS]', 'saint', 'bern', '##ade', '##tte', 'sou', '##bir', '##ous', '[SEP]'], but we are trying to fine-tuning the model to predict the location of 'saint' and '##ous'

In [57]:
print(tokenizer.tokenize(question, context))

['[CLS]', 'to', 'whom', 'did', 'the', 'vir', '##gin', 'mar', '##y', 'allegedly', 'appear', 'in', '1858', 'in', 'lo', '##urde', '##s', 'franc', '##e', '?', '[SEP]', 'architectural', '##ly', ',', 'the', 'school', 'has', 'a', 'cat', '##hol', '##ic', 'character', '.', 'ato', '##p', 'the', 'main', 'building', "'", 's', 'gold', 'dome', 'is', 'a', 'golden', 'statue', 'of', 'the', 'vir', '##gin', 'mar', '##y', '.', 'immediately', 'in', 'front', 'of', 'the', 'main', 'building', 'and', 'facing', 'it', ',', 'is', 'a', 'copper', 'statue', 'of', 'ch', '##rist', 'with', 'arms', 'up', '##rais', '##ed', 'with', 'the', 'legend', '"', 'ven', '##ite', 'ad', 'me', 'om', '##nes', '"', '.', 'next', 'to', 'the', 'main', 'building', 'is', 'the', 'basilica', 'of', 'the', 'sacred', 'heart', '.', 'immediately', 'behind', 'the', 'basilica', 'is', 'the', 'gr', '##otto', ',', 'a', 'mari', '##an', 'place', 'of', 'prayer', 'and', 'reflect', '##ion', '.', 'it', 'is', 'a', 'replica', 'of', 'the', 'gr', '##otto', 'at', 

In [58]:
print(tokenizer.tokenize(text))

['[CLS]', 'saint', 'bern', '##ade', '##tte', 'sou', '##bir', '##ous', '[SEP]']


In [59]:
# This converts Words into Index (numbers)
def convert_data(data_df):
    global tokenizer
    indices, segments, target_start, target_end = [], [], [], []
    for i in tqdm(range(len(data_df))):
        
        ids, segment = tokenizer.encode(data_df[QUESTION_COLUMN][i], data_df[DATA_COLUMN][i], max_len=SEQ_LEN)
        

        text = tokenizer.encode(data_df[TEXT][i])[0]

        text_slide_len = len(text[1:-1])
        for i in range(1,len(ids)-text_slide_len-1):  
            exist_flag = 0
            if text[1:-1] == ids[i:i+text_slide_len]:
                ans_start = i
                ans_end = i + text_slide_len - 1
                exist_flag = 1
                break
        
        if exist_flag == 0:
            ans_start = SEQ_LEN
            ans_end = SEQ_LEN

        indices.append(ids)
        segments.append(segment)

        target_start.append(ans_start)
        target_end.append(ans_end)

    indices_x = np.array(indices)
    segments = np.array(segments)
    target_start = np.array(target_start)
    target_end = np.array(target_end)
    
    del_list = np.where(target_start!=SEQ_LEN)[0]

    indices_x = indices_x[del_list]
    segments = segments[del_list]
    target_start = target_start[del_list]
    target_end = target_end[del_list]

    train_y_0 = keras.utils.to_categorical(target_start, num_classes=SEQ_LEN, dtype='int64')
    train_y_1 = keras.utils.to_categorical(target_end, num_classes=SEQ_LEN, dtype='int64')
    train_y_cat = [train_y_0, train_y_1]
    
    return [indices_x, segments], train_y_cat

In [60]:
# Load dataframe and split it into train/test

def load_data(df):
    data_df = df
    
    data_df[DATA_COLUMN] = data_df[DATA_COLUMN].astype(str)
    data_df[QUESTION_COLUMN] = data_df[QUESTION_COLUMN].astype(str)


    data_x, data_y = convert_data(data_df)

    return data_x, data_y

In [61]:
train_x, train_y = load_data(train)

100%|███████████████████████████████████████████████████████████████████████████| 87599/87599 [03:21<00:00, 434.57it/s]


In [62]:
train_x

[array([[  101, 10114, 18104, ...,     0,     0,     0],
        [  101, 12976, 10124, ...,     0,     0,     0],
        [  101, 10105, 78253, ...,     0,     0,     0],
        ...,
        [  101, 10169, 12976, ...,     0,     0,     0],
        [  101, 10106, 12976, ...,     0,     0,     0],
        [  101, 12976, 10124, ...,     0,     0,     0]]),
 array([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]])]

# 3. Pre-trained model

In [63]:
layer_num = 12
model = load_trained_model_from_checkpoint(
    config_path,
    checkpoint_path,
    training=False,
    trainable=True,
    seq_len=SEQ_LEN)
model.summary()

Model: "model_6"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
Input-Token (InputLayer)        (None, 384)          0                                            
__________________________________________________________________________________________________
Input-Segment (InputLayer)      (None, 384)          0                                            
__________________________________________________________________________________________________
Embedding-Token (TokenEmbedding [(None, 384, 768), ( 91812096    Input-Token[0][0]                
__________________________________________________________________________________________________
Embedding-Segment (Embedding)   (None, 384, 768)     1536        Input-Segment[0][0]              
____________________________________________________________________________________________

Now let's create a couple custom layer for transfer learning to be implemented
We need to specify NonMaking function to unleash the tensor that was in BERT's default masking.
This is a necessary step to train BERT model in SQuAD problem

In [64]:
class NonMasking(Layer):   
    def __init__(self, **kwargs):   
        self.supports_masking = True  
        super(NonMasking, self).__init__(**kwargs)   
  
    def build(self, input_shape):   
        input_shape = input_shape   
  
    def compute_mask(self, input, input_mask=None):   
        return None   
  
    def call(self, x, mask=None):   
        return x   
  
    def get_output_shape_for(self, input_shape):   
        return input_shape

Then we create two Keras Custom Layer 
'MyLayer_Start' predicts the first word of the answer 
'MyLaer_End' predicts the last word of the answer 

The fundamental of two layers are basically the same.
Get the last layer of the BERT as input, we make (batch_size, 384, 768) tensor to (batch_size, 384, 2)
Then we divide tensor into two, to be able to have two different output: (batch_size, 384) and (batch_size, 384)

It's 350, because it predicts the location of 384 words 

In [65]:
class MyLayer_Start(Layer):

    def __init__(self,seq_len, **kwargs):
        
        self.seq_len = seq_len
        self.supports_masking = True
        super(MyLayer_Start, self).__init__(**kwargs)

    def build(self, input_shape):
        
        self.W = self.add_weight(name='kernel', 
                                 shape=(768,2),
                                 initializer='uniform',
                                 trainable=True)
        super(MyLayer_Start, self).build(input_shape)

    def call(self, x):
        
        x = K.reshape(x, shape=(-1,384,768))
        x = K.dot(x, self.W)
        
        x = K.permute_dimensions(x, (2,0,1))

        self.start_logits, self.end_logits = x[0], x[1]
        
        self.start_logits = K.softmax(self.start_logits, axis=-1)
        
        return self.start_logits

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.seq_len)


class MyLayer_End(Layer):
    def __init__(self,seq_len, **kwargs):

            self.seq_len = seq_len
            self.supports_masking = True
            super(MyLayer_End, self).__init__(**kwargs)

    def build(self, input_shape):

            self.W = self.add_weight(name='kernel', 
                                     shape=(768, 2),
                                     initializer='uniform',
                                     trainable=True)
            super(MyLayer_End, self).build(input_shape)


    def call(self, x):


            x = K.reshape(x, shape=(-1,384,768))
            x = K.dot(x, self.W)
            x = K.permute_dimensions(x, (2,0,1))

            self.start_logits, self.end_logits = x[0], x[1]

            self.end_logits = K.softmax(self.end_logits, axis=-1)

            return self.end_logits

    def compute_output_shape(self, input_shape):
            return (input_shape[0], self.seq_len)

# 4. Create a model

Create a model that ouputs the BERT output
it will predict 'start_answer' and 'end_answer'

In [66]:
from keras.layers import merge, dot, concatenate
from keras import metrics

def get_bert_finetuning_model(model):
    
    inputs = model.inputs[:2]
    dense = model.output
    x = NonMasking()(dense)
    outputs_start = MyLayer_Start(384)(x)
    outputs_end = MyLayer_End(384)(x)
    
    bert_model = keras.models.Model(inputs, [outputs_start, outputs_end])
    
    bert_model.compile(
          optimizer=RAdam(learning_rate=LR, decay=0.001),
          loss='categorical_crossentropy',
          metrics=['accuracy'])

    return bert_model

# 5.Start Training

In [67]:
bert_model = get_bert_finetuning_model(model)
bert_model.summary()

Model: "model_7"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
Input-Token (InputLayer)        (None, 384)          0                                            
__________________________________________________________________________________________________
Input-Segment (InputLayer)      (None, 384)          0                                            
__________________________________________________________________________________________________
Embedding-Token (TokenEmbedding [(None, 384, 768), ( 91812096    Input-Token[0][0]                
__________________________________________________________________________________________________
Embedding-Segment (Embedding)   (None, 384, 768)     1536        Input-Segment[0][0]              
____________________________________________________________________________________________

In [68]:
history = bert_model.fit(train_x, 
                         train_y, 
                         batch_size=10, 
                         validation_split=0.1, # we can do validation_data=(test_x, test_y) instead
                         shuffle=False, 
                         verbose=True)

Train on 78555 samples, validate on 8729 samples
Epoch 1/1
   50/78555 [..............................] - ETA: 66:56:04 - loss: 11.8252 - my_layer__start_3_loss: 6.0306 - my_layer__end_3_loss: 5.7946 - my_layer__start_3_accuracy: 0.0000e+00 - my_layer__end_3_accuracy: 0.0000e+00


KeyboardInterrupt



# 6. Save the best model

In [70]:
path = os.path.abspath('./data')

In [71]:
bert_model.save_weights(path+"/squad_wordpiece.h5")

# 7. Retrain the model. 

(now we empty the 'validation_split' string, so that it trains on a whole dataset)

In [None]:
bert_model.compile(optimizer=RAdam(learning_rate=0.00003, decay=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])
bert_model.fit(train_x, train_y, batch_size=10, shuffle=False, verbose=1)

2nd retraining

In [None]:
bert_model.save_weights(path+"/squad_wordpiece_3.h5")

In [None]:
bert_model = get_bert_finetuning_model(model)
bert_model.load_weights(path+"/squad_wordpiece_3.h5")

Test data set에 대한 bert_input을 만들어 줍니다.
Train data set과는 다르게 label을 생성하지 않습니다.

In [72]:
def convert_pred_data(question, doc):
    global tokenizer
    indices, segments = [], []
    ids, segment = tokenizer.encode(question, doc, max_len=SEQ_LEN)
    indices.append(ids)
    segments.append(segment)
    indices_x = np.array(indices)
    segments = np.array(segments)
    return [indices_x, segments]

def load_pred_data(question, doc):
    data_x = convert_pred_data(question, doc)
    return data_x

Create a helper function that returns an answer from the question

In [74]:
def predict_letter(question, doc):
  
    test_input = load_pred_data(question, doc)
    test_start, test_end = bert_model.predict(test_input)

    indexes = tokenizer.encode(question, doc, max_len=SEQ_LEN)[0]
    start = np.argmax(test_start, axis=1).item()
    end = np.argmax(test_end, axis=1).item()
    start_tok = indexes[start]
    end_tok = indexes[end]
    print("Question : ", question)

    print("-"*50)
    print("Context : ", end = " ")

    def split_text(text, n):
        for line in text.splitlines():
            while len(line) > n:
                x, line = line[:n], line[n:]
                yield x
            yield line



    for line in split_text(doc, 150):
        print(line)

    print("-"*50)
    print("ANSWER : ", end = " ")
    print("\n")
    sentences = []

    for i in range(start, end+1):
        token_based_word = reverse_token_dict[indexes[i]]
        sentences.append(token_based_word)
        print(token_based_word, end= " ")

    print("\n")
    print("Untokenized Answer : ", end = "")
    for w in sentences:
        if w.startswith("##"):
            w = w.replace("##", "")
        else:
            w = " " + w

    print(w, end="")
    print("")

# 7. Inference on Test set

In [75]:
def squad_json_to_dataframe_dev(input_file_path, record_path = ['data','paragraphs','qas','answers'],
                           verbose = 1):
    """
    input_file_path: path to the squad json file.
    record_path: path to deepest level in json file default value is
    ['data','paragraphs','qas','answers']
    verbose: 0 to suppress it default is 1
    """
    if verbose:
        print("Reading the json file")    
    file = json.loads(open(input_file_path).read())
    if verbose:
        print("processing...")
    # parsing different level's in the json file
    js = pd.io.json.json_normalize(file , record_path )
    m = pd.io.json.json_normalize(file, record_path[:-1] )
    r = pd.io.json.json_normalize(file,record_path[:-2])
    
    #combining it into single dataframe
    idx = np.repeat(r['context'].values, r.qas.str.len())
    m['context'] = idx
    main = m[['id','question','context','answers']].set_index('id').reset_index()
    main['c_id'] = main['context'].factorize()[0]
    if verbose:
        print("shape of the dataframe is {}".format(main.shape))
        print("Done")
    return main

In [77]:
input_file_path ='./data/dev-v1.1.json'
record_path = ['data','paragraphs','qas','answers']
verbose = 0
dev = squad_json_to_dataframe_dev(input_file_path=input_file_path,record_path=record_path)

Reading the json file
processing...
shape of the dataframe is (10570, 5)
Done


In [78]:
dev

Unnamed: 0,id,question,context,answers,c_id
0,56be4db0acb8001400a502ec,Which NFL team represented the AFC at Super Bo...,Super Bowl 50 was an American football game to...,"[{'answer_start': 177, 'text': 'Denver Broncos...",0
1,56be4db0acb8001400a502ed,Which NFL team represented the NFC at Super Bo...,Super Bowl 50 was an American football game to...,"[{'answer_start': 249, 'text': 'Carolina Panth...",0
2,56be4db0acb8001400a502ee,Where did Super Bowl 50 take place?,Super Bowl 50 was an American football game to...,"[{'answer_start': 403, 'text': 'Santa Clara, C...",0
3,56be4db0acb8001400a502ef,Which NFL team won Super Bowl 50?,Super Bowl 50 was an American football game to...,"[{'answer_start': 177, 'text': 'Denver Broncos...",0
4,56be4db0acb8001400a502f0,What color was used to emphasize the 50th anni...,Super Bowl 50 was an American football game to...,"[{'answer_start': 488, 'text': 'gold'}, {'answ...",0
...,...,...,...,...,...
10565,5737aafd1c456719005744fb,What is the metric term less used than the New...,"The pound-force has a metric counterpart, less...","[{'answer_start': 82, 'text': 'kilogram-force'...",2066
10566,5737aafd1c456719005744fc,What is the kilogram-force sometimes reffered ...,"The pound-force has a metric counterpart, less...","[{'answer_start': 114, 'text': 'kilopond'}, {'...",2066
10567,5737aafd1c456719005744fd,What is a very seldom used unit of mass in the...,"The pound-force has a metric counterpart, less...","[{'answer_start': 274, 'text': 'slug'}, {'answ...",2066
10568,5737aafd1c456719005744fe,What seldom used term of a unit of force equal...,"The pound-force has a metric counterpart, less...","[{'answer_start': 712, 'text': 'kip'}, {'answe...",2066


In [79]:
import random
for i in random.sample(range(100),100):
    doc = dev['context'][i]
    question = dev['question'][i]
    answers = dev['answers'][i]
    predict_letter(question, doc)
    print("")
    print("real answer : ", answers)
    print("")

Question :  Who was the main performer at this year's halftime show?
--------------------------------------------------
Context :  CBS broadcast Super Bowl 50 in the U.S., and charged an average of $5 million for a 30-second commercial during the game. The Super Bowl 50 halftime s
how was headlined by the British rock group Coldplay with special guest performers Beyoncé and Bruno Mars, who headlined the Super Bowl XLVII and Supe
r Bowl XLVIII halftime shows, respectively. It was the third-most watched U.S. broadcast ever.
--------------------------------------------------
ANSWER :  



Untokenized Answer : 

UnboundLocalError: local variable 'w' referenced before assignment