# Basic tutorial: Question answering
#### Author: Matteo Caorsi

This short tutorial introduces the basic workings of the *giotto-deep* API.

## Scope

This tutorial tackles question answering. A trained model should be able to **find** the answer inside a given *context*. Hence, we are not building models that generate new sentences to answer an abstract question: rather, our models read a text (a.k.a. the *context*) and try to answer a given question based on the information found in it.
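
To make the idea of *finding* an answer concrete, here is a minimal sketch in plain Python. The sentence, the positions and the helper `extract_answer` are all made up for illustration (they are not part of giotto-deep); the end position is treated as exclusive, like a Python slice.

```python
def extract_answer(context_tokens, start, end):
    """Return the answer as the span of context tokens in [start, end)."""
    return " ".join(context_tokens[start:end])


# Hypothetical context and question
context_tokens = "The cat sat on the red mat".split()
question = "Where did the cat sit?"

# Hypothetical prediction of a model: first token and one-past-last token
start, end = 3, 7

print(extract_answer(context_tokens, start, end))  # on the red mat
```

The model never writes new text: it only points at two positions of the context.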

## Content

The main steps of the tutorial are the following:
 1. create a dataset
 2. create a model
 3. define metrics and losses
 4. train the model
 5. try to answer a question
 6. extract some features of the network to study the attention maps

In [None]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

import copy

from torch.nn import Transformer
from torch.optim import Adam, SparseAdam, SGD
import numpy as np
from gtda.diagrams import BettiCurve
from gtda.plotting import plot_betti_surfaces
import torch
from torch import nn
from torch.utils.data.sampler import SubsetRandomSampler

from gdeep.models import FFNet
from gdeep.visualization import persistence_diagrams_of_activations
from gdeep.data.datasets import DatasetBuilder
from gdeep.trainer import Trainer
from gdeep.models import ModelExtractor
from gdeep.utility import DEVICE
from gdeep.data import PreprocessingPipeline
from gdeep.data import TransformingDataset
from gdeep.data.preprocessors import Normalization, TokenizerQA
from gdeep.data.datasets import DataLoaderBuilder
from gdeep.visualization import Visualiser
from gdeep.search import GiottoSummaryWriter


# Initialize the tensorboard writer

To visualize and analyze the results of your models, you need to start tensorboard.
In a terminal, move into the `/examples` folder and run the following command:

```
tensorboard --logdir=runs
```

After the training, go [here](http://localhost:6006/) to see all the visualization results.

In [None]:
writer = GiottoSummaryWriter()


# Create your dataset

In giotto-deep, a few lines of code are enough to get the most famous datasets: the next cell shows how simple it is.

In [None]:
bd = DatasetBuilder(name="SQuAD2", convert_to_map_dataset=True)
ds_tr_str, ds_val_str, ds_ts_str = bd.build()


An item of the dataset contains a context and a question whose answer can be found within that context. The correct answer as well as the starting token are also provided: check the output of the next cell.

In [None]:
print("Before preprocessing: \n", ds_tr_str[0])


## Required preprocessing

Neural networks cannot directly deal with strings. We first have to preprocess the dataset in three main ways:
 1. Tokenise each string into its words
 2. Build a vocabulary out of these words
 3. Embed each word into a vector, so that each sentence becomes a list of vectors

The first two steps are performed by the `TokenizerQA`. The embedding will be added directly as layers to the model.
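
The first two steps can be sketched in plain Python as follows. This is only an illustration of the idea, not the implementation of `TokenizerQA`: the helpers `tokenize`, `build_vocabulary` and `numericalize`, as well as the special tokens `<pad>` and `<unk>`, are hypothetical names chosen for this example.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokeniser)."""
    return text.lower().split()


def build_vocabulary(sentences, specials=("<pad>", "<unk>")):
    """Map each token to an integer index, most frequent tokens first."""
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    tokens = list(specials) + [tok for tok, _ in counts.most_common()]
    return {tok: idx for idx, tok in enumerate(tokens)}


def numericalize(text, vocab):
    """Replace each token by its index; unknown tokens map to <unk>."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokenize(text)]


corpus = ["the cat sat on the mat", "the dog sat"]
vocab = build_vocabulary(corpus)
ids = numericalize("the bird sat", vocab)
print(ids)  # [2, 1, 3]: "bird" is not in the vocabulary
```

The third step, the embedding, then turns each of these integers into a vector; in this tutorial that is done by `nn.Embedding` layers inside the model.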

In [None]:

tokenizer = TokenizerQA()

# in case you need to combine multiple preprocessing:
# ppp = PreprocessingPipeline(((PreprocessTextData(), IdentityTransform(), TextDataset),
#                             (Normalisation(), IdentityTransform(), BasicDataset)))


tokenizer.fit_to_dataset(ds_tr_str)
transformed_textds = tokenizer.attach_transform_to_dataset(ds_tr_str)

transformed_textts = tokenizer.attach_transform_to_dataset(
    ds_val_str
)  # this has been fitted on the train set!

print("After the preprocessing: \n", transformed_textds[0])

# the only part of the training/test set we are interested in
train_indices = list(range(64 * 2))
test_indices = list(range(64 * 1))

dl_tr2, dl_ts2, _ = DataLoaderBuilder((transformed_textds, transformed_textts)).build(
    (
        {"batch_size": 16, "sampler": SubsetRandomSampler(train_indices)},
        {"batch_size": 16, "sampler": SubsetRandomSampler(test_indices)},
    )
)
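
To keep the training short, the cell above restricts each dataloader to a small subset of indices, drawn in random order and grouped into batches of 16. The following plain-Python sketch mimics that effect; `subset_batches` is a hypothetical helper written for this example and is not how `SubsetRandomSampler` is implemented.

```python
import random


def subset_batches(indices, batch_size, seed=0):
    """Shuffle a subset of indices and cut it into consecutive batches."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [
        shuffled[i : i + batch_size] for i in range(0, len(shuffled), batch_size)
    ]


train_batches = subset_batches(range(64 * 2), batch_size=16)
print(len(train_batches))  # 8
```

With 128 training items and a batch size of 16, one epoch therefore consists of 8 batches.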


## Define and train your model

The QA model accepts the context and the question as input and returns, for every token of the context, a score for being the first token of the answer and a score for being its last token. The output is therefore a pair of logits for each context token.

In [None]:

# my simple transformer model
class QATransformer(nn.Module):
    def __init__(self, src_vocab_size, tgt_vocab_size, embed_dim):
        super(QATransformer, self).__init__()
        self.transformer = Transformer(
            d_model=embed_dim,
            nhead=2,
            num_encoder_layers=1,
            num_decoder_layers=1,
            dim_feedforward=512,
            dropout=0.1,
        )
        self.embedding_src = nn.Embedding(src_vocab_size, embed_dim, sparse=True)
        self.embedding_tgt = nn.Embedding(tgt_vocab_size, embed_dim, sparse=True)
        self.generator = nn.Linear(embed_dim, 2)

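    def forward(self, ctx, qst):
        # embed and move to the (sequence, batch, embedding) layout
        # expected by nn.Transformer (batch_first=False by default)
        ctx_emb = self.embedding_src(ctx).permute(1, 0, 2)
        qst_emb = self.embedding_tgt(qst).permute(1, 0, 2)
        # the question is the encoder input, the context the decoder input:
        # the output has one vector per context token
        self.outs = self.transformer(qst_emb, ctx_emb).permute(1, 0, 2)
        # two logits (answer start, answer end) per context token
        logits = self.generator(self.outs)
        return logits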

    def __deepcopy__(self, memo):
        """this is needed to make sure that the 
        non-leaf nodes do not
        interfere with copy.deepcopy()
        """
        cls = self.__class__
        result = cls.__new__(cls)
        memo[id(self)] = result
        for k, v in self.__dict__.items():
            setattr(result, k, copy.deepcopy(v, memo))
        return result

    def encode(self, src, src_mask):
        """this method is used only at the inference step"""
        return self.transformer.encoder(self.embedding_src(src), src_mask)

    def decode(self, tgt, memory, tgt_mask):
        """this method is used only at the inference step"""
        return self.transformer.decoder(self.embedding_tgt(tgt), memory, tgt_mask)


In [None]:
src_vocab_size = len(tokenizer.vocabulary)
tgt_vocab_size = len(tokenizer.vocabulary)
emb_size = 64

model = QATransformer(src_vocab_size, tgt_vocab_size, emb_size)
print(model)


## Define the loss function

This loss function is an adapted version of the cross entropy for the transformer architecture.

In [None]:
def loss_fn(output_of_network, label_of_dataloader):
    # print(output_of_network.shape, label_of_dataloader.shape)
    tgt_out = label_of_dataloader
    logits = output_of_network
    cel = nn.CrossEntropyLoss()
    return cel(logits, tgt_out)
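
One way to read the loss above, assuming that the network output has shape `(batch, context_length, 2)` and that each label holds the start and end positions of the answer (shape `(batch, 2)`, which is an assumption about the dataloader output): `nn.CrossEntropyLoss` treats dimension 1 as the class dimension, so every context position acts as a class, and the loss averages one classification problem for the start and one for the end. The plain-Python sketch below spells this out for a single example; `log_softmax` and `span_cross_entropy` are hypothetical helpers written for illustration.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax of a list of scores."""
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_norm for s in scores]


def span_cross_entropy(logits, label):
    """Cross entropy over context positions for one example.

    logits: one [start_score, end_score] pair per context token
    label:  (start_position, end_position)
    """
    start_scores = [pair[0] for pair in logits]
    end_scores = [pair[1] for pair in logits]
    start_loss = -log_softmax(start_scores)[label[0]]
    end_loss = -log_softmax(end_scores)[label[1]]
    return (start_loss + end_loss) / 2


# An untrained model that scores all 4 positions equally
uniform_loss = span_cross_entropy([[0.0, 0.0]] * 4, (1, 2))
print(round(uniform_loss, 4))  # 1.3863, i.e. log(4)
```

A model that knows nothing about the answer pays `log(context_length)`; the loss decreases as the scores at the correct start and end positions grow.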


## Train the model!

We are finally there! We have defined the model, transformed the dataset so that it is manageable by standard layers, and adapted the loss function. We are ready to start the training: in giotto-deep, it is a matter of a few lines.

In [None]:
# prepare a pipeline class with the model, dataloaders, loss_fn and tensorboard writer
pipe = Trainer(model, (dl_tr2, dl_ts2), loss_fn, writer)

# train the model
pipe.train(SGD, 3, False, {"lr": 0.01}, {"batch_size": 16})


## Answering questions!

Here we have a question and its associated context:

In [None]:
bb = next(iter(ds_val_str))
bb[:2]



Get the vocabulary and convert the question and the context into token indices, so that both can be fed to the model.

In [None]:
# get vocabulary and tokenizer
voc = tokenizer.vocabulary
context = tokenizer.tokenizer(bb[0])
question = tokenizer.tokenizer(bb[1])

# get the indexes in the vocabulary of the tokens
context_idx = torch.tensor(list(map(voc.__getitem__, context)))
question_idx = torch.tensor(list(map(voc.__getitem__, question)))


In [None]:
aa = next(iter(dl_tr2))
pad_fn = lambda length_to_pad, item: torch.cat(
    [item, tokenizer.pad_item * torch.ones(length_to_pad - item.shape[0])]
).to(torch.long)

# these tensors are ready to be fitted into the model
length_to_pad = aa[0][0].shape[-1]  # context length
context_ready_for_model = pad_fn(length_to_pad, context_idx)
length_to_pad = aa[0][1].shape[-1]  # question length
question_ready_for_model = pad_fn(length_to_pad, question_idx)
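
The padding step above assumes that the item is not longer than the target length. A minimal plain-Python sketch of the same idea, which also truncates items that are too long, could look as follows; `pad_or_truncate` and the default `pad_id=0` are hypothetical choices for this example (in the cell above the padding value is `tokenizer.pad_item`).

```python
def pad_or_truncate(token_ids, target_length, pad_id=0):
    """Return a list of exactly target_length ids."""
    clipped = list(token_ids[:target_length])
    return clipped + [pad_id] * (target_length - len(clipped))


print(pad_or_truncate([5, 6, 7], 5))     # [5, 6, 7, 0, 0]
print(pad_or_truncate([1, 2, 3, 4], 2))  # [1, 2]
```

Truncation loses information, so whether it is acceptable depends on where the answer sits in the context.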


Put the two tensors of context and question together and input them to the model.

In [None]:
input_list = [context_ready_for_model.view(1, -1).to(DEVICE), 
              question_ready_for_model.view(1,-1).to(DEVICE)]

out = pipe.model(*input_list)


The output contains the logits for the start and end tokens of the answer. It is now time to extract the most likely positions with `torch.argmax`.

In [None]:
answer_idx = torch.argmax(out, dim=1)

# simple code to convert the model's answer into words
try:
    if answer_idx[0][1] > answer_idx[0][0]:
        print(
            "The model proposes: '",
            " ".join(context[answer_idx[0][0] : answer_idx[0][1]]),
            "...'",
        )
    else:
        print("The model proposes: '", context[answer_idx[0][0]], "...'")
except IndexError:
    print("The model was not able to find the answer.")
print("The actual answer was: '" + bb[2][0] + "'")
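
The decoding logic of the cell above can be followed on toy numbers. The sketch below uses plain Python lists instead of tensors; the context, the logits and the helper `decode_span` are invented for illustration, and the slice excludes the end position, as in the cell above.

```python
def decode_span(logits):
    """Pick the most likely start and end positions.

    logits: one [start_score, end_score] pair per context token
    """
    positions = range(len(logits))
    start = max(positions, key=lambda i: logits[i][0])
    end = max(positions, key=lambda i: logits[i][1])
    return start, end


toy_context = ["the", "cat", "sat", "on", "the", "red", "mat"]
toy_logits = [
    [0.1, 0.0],
    [0.2, 0.1],
    [0.0, 0.3],
    [2.5, 0.2],  # highest start score: position 3
    [0.3, 0.1],
    [0.1, 0.4],
    [0.2, 3.0],  # highest end score: position 6
]

toy_start, toy_end = decode_span(toy_logits)
if toy_end > toy_start:
    toy_answer = " ".join(toy_context[toy_start:toy_end])
else:
    toy_answer = toy_context[toy_start]
print(toy_answer)  # on the red
```

Note that the start and the end are chosen independently, so nothing prevents the predicted end from preceding the predicted start: this is why the cell above needs the `else` branch.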


# Extract inner data from your models

In this section we extract the attention maps for the same input as above.

These matrices relate the question to the context, highlighting the words that captured most of the transformer's attention. Such maps are very useful for interpreting the model's results.

In [None]:

# the model extractor
ex = ModelExtractor(pipe.model, loss_fn)

# getting the names of the layers
layer_names = ex.get_layers_param().keys()

print("Let's extract the activations of the first attention layer: ", next(iter(layer_names)))
self_attention = ex.get_activations(input_list)[-5:-3]


In [None]:
# let's plot the tensor! First, load the visualizer
vs = Visualiser(pipe)
vs.plot_self_attention(self_attention, context, question, figsize=(20, 20));
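
For intuition about what an attention map contains, here is a plain-Python sketch of scaled dot-product attention weights. It is a simplification: the attention layers of `nn.Transformer` additionally use several heads and learned projections, and the vectors below are invented for the example. The helpers `softmax` and `attention_map` are hypothetical names.

```python
import math


def softmax(scores):
    """Numerically stable softmax of a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(queries, keys):
    """Attention weights: one row per query, one column per key."""
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys])
        for q in queries
    ]


# Toy embeddings: 2 query tokens attending over 3 key tokens
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

weights = attention_map(queries, keys)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row sums to 1 and tells how strongly one token attends to each of the others; the plot above displays such weights as a heat map.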



### Challenge

If you have trained the model for very few epochs and on only a small subset of the data, you have probably obtained almost random results. Can you improve the model and interpret these attention maps?

### Visualise your model interactively

One final note about visualising and inspecting transformer models: in giotto-deep it is possible to plot an interactive model graph in tensorboard, so that you can visually dissect the inner workings of the transformer and demystify these powerful models!

In [None]:
from gdeep.visualization import Visualiser

vs = Visualiser(pipe)

vs.plot_interactive_model()
