# Results demo for Natural Language Inference (NLI)

To inspect the trained models, this notebook loads the final checkpoint of the BiLSTM with max pooling (the best-performing model) and runs inference on a few examples.

In [25]:
%load_ext autoreload
%autoreload 2



In [26]:
import torch
import os.path
from spacy.lang.en import English
import torch.nn.functional as F
from src.models.nliclassifier import NLIClassifier
from src.dataset.dataloaders import get_embeddings_for_data
from src.dataset.snli_dataset import str_to_idxs, label_map

We start by loading the vocabulary and embedding vectors that were used to train the models. The following function attempts to read them from the `data/processed` directory, referenced here as `../data/processed` relative to this notebook.

In [27]:
# Make sure to enter the parent dir of the embedding vocab+vector file used for training
emb_vocab, emb_vecs = get_embeddings_for_data(
    dataset_path=os.path.join("..", "data", "processed")
)

We then load the final model checkpoint for the BiLSTMMaxPool. The embedding weights were deliberately left out of the checkpoints to reduce their size, so loading may report that `embedding.weight` is missing. To handle this, we set `strict=False` and pass the embedding vectors to the model's init function through the keyword arguments of `load_from_checkpoint`.

In [20]:
# Load the model along with the embeddings
model = (
    NLIClassifier.load_from_checkpoint(
        os.path.join(
            "..",
            "checkpoint",
            "real_blstmpme_train",
            "final.ckpt",
        ),
        strict=False,
        embedding_mat=emb_vecs,
    )
    .cpu()
    .eval()
)
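
To illustrate what non-strict loading means here, the following is a minimal pure-Python sketch, not PyTorch's actual implementation. The helper `load_non_strict` and the parameter names other than `embedding.weight` are hypothetical and chosen only for illustration. It shows the general idea: entries found in the checkpoint are copied over, and entries missing from it are reported instead of raising an error.

```python
def load_non_strict(model_state, checkpoint_state):
    """Copy matching entries into model_state and report mismatches instead of raising."""
    # Keys the model expects but the checkpoint does not contain
    missing = [k for k in model_state if k not in checkpoint_state]
    # Keys the checkpoint contains but the model does not know about
    unexpected = [k for k in checkpoint_state if k not in model_state]
    for key, value in checkpoint_state.items():
        if key in model_state:
            model_state[key] = value
    return missing, unexpected


# Hypothetical parameter names and placeholder values, for illustration only
model_state = {
    "embedding.weight": "vectors passed at init",
    "encoder.weight": "random init",
    "classifier.weight": "random init",
}
checkpoint_state = {
    "encoder.weight": "trained",
    "classifier.weight": "trained",
}

missing, unexpected = load_non_strict(model_state, checkpoint_state)
print("missing:", missing)
print("unexpected:", unexpected)
```

In this sketch the embedding entry keeps the value supplied at construction time, which mirrors why the embedding vectors are passed separately when loading the checkpoint.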

We use the same tokenizer that was used to train the model. 

In [21]:
tokenizer = English().tokenizer

This function converts the premise and hypothesis strings to token indices, passes them through the model, takes the most probable label, and prints the model's judgement.

In [22]:
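def predict_and_print(sent_1, sent_2):
    # Tokenize each sentence and map its tokens to vocabulary indices
    inps_1, len_1 = str_to_idxs(sent_1, tokenizer, emb_vocab)
    inps_2, len_2 = str_to_idxs(sent_2, tokenizer, emb_vocab)
    # Inference only, so gradient tracking is not needed
    with torch.no_grad():
        out = model(inps_1, inps_2, len_1, len_2)
    # Convert logits to probabilities and pick the most probable class
    probs = F.softmax(out, dim=-1)
    label = torch.argmax(probs, dim=-1).item()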
    print(
        f"""
    Premise: "{sent_1}"
    Hypothesis: "{sent_2}"
    Model judgement: "{label_map[label]}"
    """
    )
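
The step from model outputs to a label can be sketched in plain Python, which also makes it easy to see how confident a prediction is. This is a minimal sketch under stated assumptions: the label order in `ASSUMED_LABELS` is a guess (the real mapping is `label_map` from `src.dataset.snli_dataset`), the logits are made up, and `softmax` and `judge` are hypothetical helper names.

```python
import math

# Assumed label order; the real mapping lives in label_map
ASSUMED_LABELS = {0: "entailment", 1: "neutral", 2: "contradiction"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def judge(logits, labels=ASSUMED_LABELS):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up logits in which the last two classes are close together
label, confidence = judge([0.2, 1.1, 1.3])
print(f"{label} ({confidence:.2f})")
```

Printing the probability next to the label would show whether a wrong judgement is a confident error or a narrow call between two classes.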

In [23]:
predict_and_print("Two men sitting in the sun", "Nobody is sitting in the shade")


    Premise: "Two men sitting in the sun"
    Hypothesis: "Nobody is sitting in the shade"
    Model judgement: "contradiction"
    


In [24]:
predict_and_print("A man is walking a dog", "No cat is outside")


    Premise: "A man is walking a dog"
    Hypothesis: "No cat is outside"
    Model judgement: "contradiction"
    


## Results analysis

The model judges both examples above as contradictions, even though the correct label for each is neutral. We can speculate about a few reasons for this. In the first case, the hypothesis combines a negation ("Nobody") with "shade", the opposite of "sun" in the premise, so it reads somewhat like a double negative, which may confuse the model. It also requires the model to understand the relationship between "sun" and "shade", and the model may not have learned robust representations of these terms if they were rare or absent in the training data. In the second case, the hypothesis is "No cat is outside". The model may not be paying enough attention to the word "No" and may instead see the contradiction in the two sentences referring to the same object outside, with one calling it a dog and the other a cat.
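
One way to probe these speculations is to build small contrast sets and check for out-of-vocabulary words. The sketch below is only an illustration: `negation_probes` and `oov_tokens` are hypothetical helpers, whitespace splitting stands in for the real tokenizer, and the toy vocabulary is made up. It assumes the vocabulary object supports membership tests with `in`, which has not been verified for `emb_vocab`.

```python
def negation_probes(premise, hypothesis):
    """Return the original pair plus a variant with the leading negator swapped out."""
    swaps = {"No": "A", "Nobody": "Somebody"}
    probes = [(premise, hypothesis)]
    words = hypothesis.split()
    if words and words[0] in swaps:
        variant = " ".join([swaps[words[0]]] + words[1:])
        probes.append((premise, variant))
    return probes


def oov_tokens(sentence, vocab):
    """List whitespace-separated tokens that are not in the vocabulary."""
    return [word for word in sentence.split() if word not in vocab]


for premise, hypothesis in negation_probes("A man is walking a dog", "No cat is outside"):
    print(f"{premise} | {hypothesis}")

# Toy vocabulary, made up for illustration
toy_vocab = {"Nobody", "is", "sitting", "in", "the"}
print(oov_tokens("Nobody is sitting in the shade", toy_vocab))
```

In the notebook, each probe pair could be passed to `predict_and_print`. If the judgement stays "contradiction" after "No" is replaced with "A", that would be consistent with the model reacting to the dog and cat mismatch rather than to the negation.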