<h1>TensorFlow neural model for question answering (SQuAD 2.0)</h1>

<p>This notebook is based on the <a href="http://web.stanford.edu/class/cs224n/">CS224n class</a>. It implements, in TensorFlow, the neural baseline model described in the class <a href="http://web.stanford.edu/class/cs224n/project/default-final-project-handout-squad-track.pdf">final project</a> handout (IID SQuAD track). The CS224n baseline is implemented in PyTorch; this notebook shows how to build the same neural network in TensorFlow.</p>

<p>The implementation follows the neural design described in Section 4 of the project handout. To prepare and preprocess the data, run setup.py as described in Section 2.2.</p>

In [2]:
import os
import datetime
import json
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

In [3]:
emb_path = './data/word_emb.json'
with open(emb_path) as f:
    embedding_matrix = json.load(f)
embedding_matrix = np.asarray(embedding_matrix, dtype=np.float32)
embedding_matrix.shape

(88714, 300)

In [4]:
learning_rate = 0.0001 # "Learning rate"

In [5]:
batch_size = 100 # "Batch size to use"
dropout = 0 # 1-keep_prob
context_len = 400 # "The maximum context length of your model"
question_len = 50 # "The maximum question length of your model"
(VOCAB_SIZE, embedding_size) = embedding_matrix.shape
hidden_size = 200 # "Size of the hidden states"
buffer_size = 10000
num_epochs = 2

<p>Build the model as shown below. Layers are named the same way as in the CS224n PyTorch model in qa_model.py.</p>

In [6]:
context = layers.Input(shape=(context_len, ), name='context')
question = layers.Input(shape=(question_len, ), name='question')

context_embs = layers.Embedding(input_dim=VOCAB_SIZE,
                            output_dim=embedding_size,
                            weights=[embedding_matrix],
                            trainable=False)(context)
qn_embs = layers.Embedding(input_dim=VOCAB_SIZE,
                            output_dim=embedding_size,
                            weights=[embedding_matrix],
                            trainable=False)(question)

masking_layer = layers.Masking()
masked_context_embs = masking_layer(context_embs)
masked_qn_embs = masking_layer(qn_embs)
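<p>Each Embedding layer simply looks up rows of embedding_matrix. layers.Masking() uses its default mask_value=0.0, so it treats a timestep as padding only when every feature of that timestep equals 0. The numpy sketch below reproduces both rules on a toy matrix. It assumes that padding index 0 maps to an all-zero embedding row. The all-zero row 1 is a hypothetical second token, included to show that any token whose vector is all zeros gets masked as well. If that is not intended, Embedding(..., mask_zero=True) masks by token index 0 instead of by vector value.</p>

```python
import numpy as np

def keras_style_mask(embedded, mask_value=0.0):
    """Mimic the rule used by tf.keras.layers.Masking: a timestep is kept
    (True) if ANY of its features differs from mask_value."""
    return np.any(embedded != mask_value, axis=-1)

# Toy vocabulary of 4 tokens with 3-dim vectors.
# Row 0 plays the padding token and is all zeros (an assumption about how
# the embedding matrix was built); row 1 is a hypothetical all-zero token.
toy_emb = np.array([[0.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0],
                    [0.1, -0.2, 0.3],
                    [0.5, 0.4, -0.1]], dtype=np.float32)

token_ids = np.array([[2, 3, 1, 0, 0]])   # one padded sequence of length 5
embedded = toy_emb[token_ids]             # Embedding lookup = row indexing
mask = keras_style_mask(embedded)

print(embedded.shape)   # (1, 5, 3)
print(mask.tolist())    # [[True, True, False, False, False]]
```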

In [7]:
forward_layer = layers.GRU(hidden_size, dropout=dropout, return_sequences=True) 
backward_layer = layers.GRU(hidden_size, dropout=dropout, return_sequences=True, go_backwards=True)
bidirect_gru = layers.Bidirectional(forward_layer, 
                                    backward_layer=backward_layer)
context_hiddens = bidirect_gru(masked_context_embs)
question_hiddens = bidirect_gru(masked_qn_embs)

attn_output = layers.Attention()([context_hiddens, question_hiddens])
attn_output = layers.Dropout(dropout)(attn_output)
blended_reps = layers.concatenate([context_hiddens, attn_output], axis=2)
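<p>With its default arguments (no learned scale), layers.Attention() computes unscaled dot-product attention. Each context position (the query) scores every question position (the value, which also serves as the key). The scores are softmax-normalized over the question axis, and the output is the weighted sum of question hidden states. The numpy sketch below is a simplified re-implementation for illustration only: it ignores the query mask and attention dropout, and it uses toy sizes rather than the notebook's 400/50/400.</p>

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def dot_product_attention(query, value, value_mask=None):
    """query: (B, Tq, D), value: (B, Tv, D).
    Returns the attended output (B, Tq, D) and the weights (B, Tq, Tv)."""
    scores = query @ np.swapaxes(value, 1, 2)            # (B, Tq, Tv)
    if value_mask is not None:
        # Padded question positions get a huge negative score -> ~0 weight.
        scores = np.where(value_mask[:, None, :], scores, -1e9)
    weights = softmax(scores, axis=-1)                   # rows sum to 1
    return weights @ value, weights

rng = np.random.default_rng(0)
B, Tq, Tv, D = 2, 6, 4, 8          # toy sizes
context_hiddens = rng.normal(size=(B, Tq, D))
question_hiddens = rng.normal(size=(B, Tv, D))
q_mask = np.array([[True, True, True, False],
                   [True, True, False, False]])

attn_output, attn_weights = dot_product_attention(context_hiddens, question_hiddens, q_mask)
# Same concatenation as blended_reps in the notebook (feature axis 2).
blended_reps = np.concatenate([context_hiddens, attn_output], axis=2)
print(attn_output.shape, blended_reps.shape)   # (2, 6, 8) (2, 6, 16)
```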

In [8]:
blended_reps_final = layers.Dense(units=hidden_size,
                                  activation='relu')(blended_reps)

In [9]:
tf.shape(context_hiddens)

<KerasTensor: shape=(3,) dtype=int32 inferred_value=[None, 400, 400] (created by layer 'tf.compat.v1.shape')>

In [108]:
logits_start = layers.Dense(units=1, activation=None)(blended_reps_final)
logits_start = tf.squeeze(logits_start, axis=[2])
prob_dist_start = layers.Softmax(axis=1, name = 'tf_op_layer_start')(logits_start)

logits_end = layers.Dense(units=1, activation=None)(blended_reps_final)
logits_end = tf.squeeze(logits_end, axis=[2]) 
prob_dist_end = layers.Softmax(axis=1, name = 'tf_op_layer_end')(logits_end)
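<p>Each output head works in three steps. Dense(units=1) computes one score per context position (a dot product with a weight vector plus a bias). tf.squeeze drops the trailing axis, and Softmax(axis=1) turns the scores into a distribution over the context_len positions. The numpy sketch below traces these steps with hypothetical random weights and toy sizes. It also shows a common refinement that the notebook does not use: masking padding positions before the softmax so that they receive zero probability. The CS224n starter code has a masked_softmax helper for this purpose.</p>

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def span_head(blended, w, b, context_mask=None):
    """blended: (B, T, H) -> probability over the T context positions."""
    logits = (blended @ w)[..., 0] + b      # Dense(units=1), then squeeze axis 2: (B, T)
    if context_mask is not None:
        # Masked softmax: padding positions end up with zero probability.
        logits = np.where(context_mask, logits, -1e30)
    return softmax(logits, axis=1)

rng = np.random.default_rng(1)
B, T, H = 2, 5, 3                                  # toy sizes
blended_reps_final = rng.normal(size=(B, T, H))
w_start, b_start = rng.normal(size=(H, 1)), 0.0    # hypothetical weights
mask = np.array([[True, True, True, False, False],
                 [True, True, True, True, False]])

p_plain = span_head(blended_reps_final, w_start, b_start)
p_masked = span_head(blended_reps_final, w_start, b_start, mask)
print(p_plain.shape)   # (2, 5)
```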

In [120]:
model = models.Model(inputs=[context, question], outputs=[prob_dist_start, prob_dist_end])

In [121]:
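model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
              # Both outputs already pass through a Softmax layer, so the loss
              # receives probabilities, not logits.
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['SparseCategoricalAccuracy'])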
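<p>Both heads end in a Softmax layer, so the loss must be configured to receive probabilities. SparseCategoricalCrossentropy(from_logits=True) would apply a second softmax on top of them. The numpy sketch below uses a hand-rolled cross-entropy (not Keras's implementation) to show the effect for context_len = 400. A confident, correct prediction gets a near-zero loss when treated as probabilities. Once its probabilities are re-softmaxed, the loss cannot drop below ln(1 + 399/e), which is about 5.0, so training gets a much weaker signal.</p>

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def sparse_ce(probs, label):
    # Cross-entropy for one example, given a probability vector.
    return -np.log(probs[label])

T = 400                                   # context_len
logits = np.full(T, -5.0)
logits[34] = 10.0                         # a confident, correct prediction
label = 34
probs = softmax(logits)                   # what the Softmax output layer emits

loss_probs = sparse_ce(probs, label)             # from_logits=False semantics
loss_double = sparse_ce(softmax(probs), label)   # from_logits=True on probabilities
print(loss_probs, loss_double)
```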

In [148]:
model.summary()

Model: "model_8"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
context (InputLayer)            [(None, 400)]        0                                            
__________________________________________________________________________________________________
question (InputLayer)           [(None, 50)]         0                                            
__________________________________________________________________________________________________
embedding (Embedding)           (None, 400, 300)     26614200    context[0][0]                    
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, 50, 300)      26614200    question[0][0]                   
____________________________________________________________________________________________
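<p>The summary output above is truncated, but the parameter counts can be worked out by hand. The sketch below checks the Embedding count shown in the summary (88714 × 300 = 26,614,200 per layer; both layers are frozen). It also computes counts for the layers that are cut off. The GRU formula assumes the TF 2.x default reset_after=True, which keeps two bias vectors. The Attention layer has no weights with its defaults. These figures are derived, not taken from the notebook's output.</p>

```python
def embedding_params(vocab_size, dim):
    # One dim-sized vector per vocabulary entry.
    return vocab_size * dim

def gru_params(input_dim, units, reset_after=True):
    # Kernel (input_dim x 3*units) + recurrent kernel (units x 3*units)
    # + bias; with reset_after=True Keras keeps two bias vectors.
    n_bias = 2 if reset_after else 1
    return 3 * units * (input_dim + units) + n_bias * 3 * units

def dense_params(input_dim, units):
    return input_dim * units + units

VOCAB_SIZE, embedding_size, hidden_size = 88714, 300, 200

emb = embedding_params(VOCAB_SIZE, embedding_size)     # per Embedding layer
bigru = 2 * gru_params(embedding_size, hidden_size)    # forward + backward GRU
blend = dense_params(4 * hidden_size, hidden_size)     # 800 -> 200 ReLU layer
heads = 2 * dense_params(hidden_size, 1)               # start and end Dense(1)

print(emb)                   # 26614200
print(bigru, blend, heads)   # 602400 160200 402
```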

<p>Load train, dev, and test datasets:</p>

In [123]:
def load_dataset(npz_name, shuffle=False):
    npz = np.load(npz_name)
    # Negative labels mark unanswerable questions; map them to the last
    # context position (context_len - 1) so every label is a valid index.
    npz_y1s = [context_len-1 if y < 0 else y for y in npz['y1s']]
    npz_y2s = [context_len-1 if y < 0 else y for y in npz['y2s']]
    context_ds = tf.data.Dataset.from_tensor_slices(npz['context_idxs'])
    ques_ds = tf.data.Dataset.from_tensor_slices(npz['ques_idxs'])
    start_label = tf.data.Dataset.from_tensor_slices(npz_y1s)
    end_label = tf.data.Dataset.from_tensor_slices(npz_y2s)
    # Keys must match the names of the model's Input layers and output layers.
    dataset = tf.data.Dataset.zip(({'context': context_ds, 'question': ques_ds},
                                   {'tf_op_layer_start': start_label, 'tf_op_layer_end': end_label}))
    if shuffle:
        dataset = dataset.shuffle(buffer_size).batch(batch_size)
    else:
        dataset = dataset.batch(batch_size)
    return dataset

npz_train = './data/train.npz'
npz_dev = './data/dev.npz'
npz_test = './data/test.npz'

train_dataset = load_dataset(npz_train, shuffle=True)
dev_dataset = load_dataset(npz_dev)
test_dataset = load_dataset(npz_test)
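<p>load_dataset remaps the labels so that any negative value (no answer) points at position context_len - 1. The numpy sketch below is a vectorized equivalent of that list comprehension, applied to hypothetical toy labels. One caveat: when a context fills all 400 slots, position 399 is also a real token. An answer that starts or ends there would then look the same as "no answer".</p>

```python
import numpy as np

context_len = 400

def remap_no_answer(labels, no_answer_pos=context_len - 1):
    """Replace negative (no-answer) labels with a fixed context position,
    the same rule as the list comprehensions in load_dataset."""
    labels = np.asarray(labels)
    return np.where(labels < 0, no_answer_pos, labels)

y1s = np.array([34, -1, 55, -1])      # hypothetical labels; -1 = no answer
remapped = remap_no_answer(y1s)
print(remapped.tolist())              # [34, 399, 55, 399]
```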

<p>Set up checkpoints and logs:</p>

In [127]:
checkpoint_dir = 'training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_best_model.h5")
checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=False,
    save_best_only=True)

log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
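<p>With save_best_only=True and the default monitor='val_loss', ModelCheckpoint writes the model only when the validation loss improves on the best value seen so far. Because filepath contains no {epoch} placeholder, each save overwrites the same file. The pure-Python sketch below is a simplified stand-in for that rule, not the Keras class. The validation losses in it are made-up values for illustration.</p>

```python
import math

class BestOnlySaver:
    """Simplified stand-in for ModelCheckpoint(save_best_only=True) with
    monitor='val_loss' (lower is better). Not the Keras implementation."""

    def __init__(self):
        self.best = math.inf
        self.saved_epochs = []

    def on_epoch_end(self, epoch, val_loss):
        if val_loss < self.best:          # improvement -> overwrite the file
            self.best = val_loss
            self.saved_epochs.append(epoch)

saver = BestOnlySaver()
for epoch, val_loss in enumerate([3.2, 2.9, 3.0, 2.7], start=1):  # made-up values
    saver.on_epoch_end(epoch, val_loss)
print(saver.saved_epochs)   # [1, 2, 4]
```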

In [184]:
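history = model.fit(train_dataset,
                    epochs=num_epochs,
                    callbacks=[checkpoint_callback, tensorboard_callback],
                    validation_data=dev_dataset,
                    # Only the first dev batch (batch_size examples) is used for
                    # validation; drop validation_steps to use the whole dev set.
                    validation_steps=1)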

Epoch 1/2
Epoch 2/2



<p>After six training epochs we check the results and compute predictions on the test set. The variable predictions is a list of two arrays, each of shape (number of test examples, context_len). For every test example, the first array holds a probability distribution over the start position of the answer, and the second holds the distribution over the end position.</p>

In [185]:
predictions = model.predict(test_dataset)

<p>The model outputs a softmax probability for each position in the context, so we take the argmax to get the most probable start and end positions:</p>

In [186]:
pred_start = tf.math.argmax(predictions[0], axis=1)
pred_end = tf.math.argmax(predictions[1], axis=1)
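<p>Taking the argmax of each distribution independently can produce an end position that comes before the start position. A common alternative picks the pair (i, j) that maximizes p_start[i] · p_end[j] subject to i ≤ j < i + max_len. This is similar in spirit to the discretize helper in the CS224n starter code. The numpy sketch below is an illustration, and max_len=15 is an assumed default. With the notebook's label convention, "no answer" is the span (399, 399), which the constraint still allows.</p>

```python
import numpy as np

def best_span(p_start, p_end, max_len=15):
    """Return (start, end) index arrays maximizing p_start[i] * p_end[j]
    subject to i <= j < i + max_len. Inputs have shape (B, T)."""
    T = p_start.shape[1]
    joint = p_start[:, :, None] * p_end[:, None, :]   # (B, T, T) joint scores
    i, j = np.indices((T, T))
    valid = (j >= i) & (j - i < max_len)               # band on/above the diagonal
    joint = np.where(valid, joint, 0.0)                # invalid spans score 0
    flat = joint.reshape(joint.shape[0], -1).argmax(axis=1)
    return flat // T, flat % T

p_start = np.array([[0.05, 0.15, 0.70, 0.10]])
p_end = np.array([[0.60, 0.10, 0.20, 0.10]])

# Independent argmax picks start=2, end=0, which is not a valid span.
print(p_start.argmax(axis=1), p_end.argmax(axis=1))
starts, ends = best_span(p_start, p_end, max_len=3)
print(starts.tolist(), ends.tolist())   # [2] [2]
```

<p>For the notebook's outputs, this would be applied batch by batch to slices of predictions[0] and predictions[1]. The joint array has shape (B, 400, 400), so processing the whole test set at once would need a lot of memory.</p>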

<p>Let’s have a look at the predictions for the first 20 test examples and compare those with the test labels:</p>

In [245]:
print(tf.slice(pred_start, [0], [20]).numpy())
for ex in test_dataset.take(1):
    print(ex[1]['tf_op_layer_start'].numpy()[:20])

[ 34  21  55  65 127 140 140 140 140 176 103  40 252 252 252 252 252  63
  57  68]
[ 34  21  55  65 127 399 399 399 399 176 103  40 399 399 399 399 399  63
  57 399]


In [246]:
print(tf.slice(pred_end, [0], [20]).numpy())
for ex in test_dataset.take(1):
    print(ex[1]['tf_op_layer_end'].numpy()[:20])    

[ 34  24  59  65 127 140 140 140 140 178 104  40 252 252 252 252 252  65
  58  68]
[ 34  24  59  65 127 399 399 399 399 178 104  40 399 399 399 399 399  65
  58 399]
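<p>The two printouts can be summarized by splitting the 20 examples into answerable and unanswerable ones (label 399). The sketch below copies the printed arrays and computes position-level exact match, meaning both predicted indices equal the labels. This is not the text-based EM metric used by the official SQuAD evaluation. On these 20 examples, every mismatch falls on an example labeled 399, and the model never predicts position 399.</p>

```python
import numpy as np

# Values copied from the two printed cells above (first 20 test examples).
pred_start = np.array([34, 21, 55, 65, 127, 140, 140, 140, 140, 176,
                       103, 40, 252, 252, 252, 252, 252, 63, 57, 68])
true_start = np.array([34, 21, 55, 65, 127, 399, 399, 399, 399, 176,
                       103, 40, 399, 399, 399, 399, 399, 63, 57, 399])
pred_end = np.array([34, 24, 59, 65, 127, 140, 140, 140, 140, 178,
                     104, 40, 252, 252, 252, 252, 252, 65, 58, 68])
true_end = np.array([34, 24, 59, 65, 127, 399, 399, 399, 399, 178,
                     104, 40, 399, 399, 399, 399, 399, 65, 58, 399])

no_answer = true_start == 399     # context_len - 1 marks "no answer"
exact = (pred_start == true_start) & (pred_end == true_end)

print("overall:", exact.mean())                  # 0.5
print("answerable:", exact[~no_answer].mean())   # 1.0
print("no-answer:", exact[no_answer].mean())     # 0.0
```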


In [77]:
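# Saves the model's current weights, which are not necessarily those of the
# best epoch; the best checkpoint (lowest val_loss) is written by checkpoint_callback.
model.save(os.path.join('./training_checkpoints', 'save_best_model.h5'))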