# Generating Entailments from Sentence Representations

A natural extension of the material covered in the previous notebook is to generate further sentences that are entailed by a given sentence. This generation procedure could then be carried out repeatedly to create complex networks of entailment relations amongst sentences. Such a network might then be considered a formalization of the "inferential roles" of the various sentences it contains. More interestingly, question answering can be formulated as the generation of a specific entailed sentence conditional upon a query.

This is all still very much a work in progress. 

## Generating Embeddings Assuming a Tree Structure

To approach the problem, we'll first encode a sentence into a distributed representation using a tree-structured neural network of the sort discussed previously. Then, we'll use a similar network to "decode out" an entailed sentence from this representation in a step-by-step manner. To simplify matters, we'll assume that the structure of the decoded sentence is given (i.e. the structure of the parse tree is available), and aim to learn a set of weights for each dependency in the structure such that propagating activations through the structure assigns each node an embedding that predicts the word appropriate to that node. Word prediction occurs by applying a softmax over the inner products between the embedding and the vocabulary representations for each possible word. To simplify this process, we'll use subvocabularies that include only the words that could occupy a particular node given the dependency it bears to its head word. For instance, if a node occupies the 'det' dependency with respect to a head word (e.g. a noun like "guitar"), then only words like "a", "the", "some", etc. will be considered when computing the softmax.
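The word-prediction step described above can be illustrated with a minimal sketch. It is not the `pysem` implementation: the function name `predict_word`, the toy three-dimensional vectors, and the `det_subvocab` list are all assumptions made for illustration.

```python
import numpy as np

def predict_word(embedding, vocab_vectors, subvocab):
    """Return the most probable word in `subvocab` and the softmax distribution."""
    # Stack the vectors for the candidate words only (one row per word).
    candidates = np.stack([vocab_vectors[w] for w in subvocab])
    # Inner product between the node embedding and each candidate vector.
    scores = candidates @ embedding
    # Numerically stable softmax over the restricted candidate set.
    exps = np.exp(scores - scores.max())
    probs = exps / exps.sum()
    return subvocab[int(np.argmax(probs))], probs

# Toy vocabulary vectors (hypothetical values chosen by hand).
vocab_vectors = {
    'a': np.array([1.0, 0.0, 0.0]),
    'the': np.array([0.0, 1.0, 0.0]),
    'some': np.array([0.0, 0.0, 1.0]),
    'guitar': np.array([1.0, 1.0, 1.0]),
}
det_subvocab = ['a', 'the', 'some']  # hypothetical candidates for the 'det' dependency

embedding = np.array([0.1, 2.0, 0.3])  # stands in for a decoded node embedding
word, probs = predict_word(embedding, vocab_vectors, det_subvocab)
print(word, probs.round(3))
```

In this toy setup "guitar" would have the largest inner product with the embedding over the full vocabulary, but it is never scored because it is not in the 'det' subvocabulary, so the prediction is "the".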

To start, we'll load the SNLI corpus and preprocess it to keep only the sentence pairs that are labelled with the entailment relation. We'll use a sample of these pairs to train our generative model, and then test how well the model generates entailments for novel sentences.

In [10]:
import random
import pickle
import numpy as np

from pysem.corpora import SNLI
from pysem.networks import DependencyNetwork
from pysem.generatives import EmbeddingGenerator

snli = SNLI('/home/pblouw/corpora/snli_1.0/')
snli.extractor = snli.get_xy_pairs
snli.load_vocab('snli_vocab.pickle')

with open('subvocabs.pickle', 'rb') as pfile:
    subvocabs = pickle.load(pfile)

train_data = [d for d in snli.train_data if d.label == 'entailment'] # focus on entailment relations only

train_batch = train_data[:100000] # use small amount of data for prototyping
test_batch = train_data[100000:100100]

Next, we'll define the encoder and decoder networks, and then train both using the selected training examples.  

In [3]:
dim = 200
iters = 50
rate = 0.002

encoder = DependencyNetwork(dim=dim, vocab=snli.vocab)
decoder = EmbeddingGenerator(dim=dim, subvocabs=subvocabs)

for i in range(iters):
    print('On iteration ', i)
    for sample in train_batch:
        s1 = sample.sentence1
        s2 = sample.sentence2

        encoder.forward_pass(s1)
        decoder.forward_pass(s2, encoder.get_root_embedding())
        decoder.backward_pass(rate=rate)
        encoder.backward_pass(decoder.pass_grad, rate=rate)

On iteration  0
On iteration  1
On iteration  2
On iteration  3
On iteration  4
On iteration  5
On iteration  6
On iteration  7
On iteration  8
On iteration  9
On iteration  10
On iteration  11
On iteration  12
On iteration  13
On iteration  14
On iteration  15
On iteration  16
On iteration  17
On iteration  18
On iteration  19
On iteration  20
On iteration  21
On iteration  22
On iteration  23
On iteration  24
On iteration  25
On iteration  26
On iteration  27
On iteration  28
On iteration  29
On iteration  30
On iteration  31
On iteration  32
On iteration  33
On iteration  34
On iteration  35
On iteration  36
On iteration  37
On iteration  38
On iteration  39
On iteration  40
On iteration  41
On iteration  42
On iteration  43
On iteration  44
On iteration  45
On iteration  46
On iteration  47
On iteration  48
On iteration  49


Now we can see how well the model generates entailments. We can compute raw accuracies, which indicate what fraction of the entailed sentences in the training and test sets the model generates completely correctly. We can also compute the proportion of correctly labelled nodes for entailments that were not generated completely correctly. The cell below reports only the raw accuracy.

In [4]:
def get_accuracy(batch, encoder, decoder):
    """Return the fraction of entailments in `batch` generated with every node correct."""
    errors = 0
    for sample in batch:
        s1 = sample.sentence1
        s2 = sample.sentence2

        encoder.forward_pass(s1)
        decoder.forward_pass(s2, encoder.get_root_embedding())

        # Count the sentence as an error as soon as one node's predicted word is wrong.
        for node in decoder.tree:
            if node.lower_ != node.pword:
                errors += 1
                break

    return 'Raw Accuracy: ', (len(batch) - errors) / len(batch)

print(get_accuracy(train_batch, encoder, decoder))
print(get_accuracy(test_batch, encoder, decoder))

('Raw Accuracy: ', 0.04528)
('Raw Accuracy: ', 0.0)


In [11]:
for sample in random.sample(train_batch, 10):
    s1 = sample.sentence1
    s2 = sample.sentence2

    encoder.forward_pass(s1)
    decoder.forward_pass(s2, encoder.get_root_embedding())

    predicted = [node.pword for node in decoder.tree]
    true = [node.lower_ for node in decoder.tree]
    
    print('Sentence: ', s1)
    print('Predicted Entailment: ', ' '.join(predicted))
    print('Actual Entailment: ', ' '.join(true))
    print('')

Sentence:  Men and women running in a competition.
Predicted Entailment:  are in a people running
Actual Entailment:  humans of both sexes running

Sentence:  An Asian woman wearing a Asian dress sitting among a group of cloths, with a woven basket on her lap.
Predicted Entailment:  an asian woman is in her
Actual Entailment:  an oriental woman sits by fabric

Sentence:  Subjects eating a McDonalds meal along the street.
Predicted Entailment:  eating eating meal
Actual Entailment:  people eating mcdonalds

Sentence:  A person stands next to the Easter Island statues.
Predicted Entailment:  a person is on the night .
Actual Entailment:  a person is at an island .

Sentence:  A man with long hair taking pictures with his camera.
Predicted Entailment:  man with a beard is man a facial food .
Actual Entailment:  someone in the picture is holding an electronic device .

Sentence:  A man in a jacket and tie smiles at a woman in a white dress.
Predicted Entailment:  man are smiling next and o

In [15]:
for sample in random.sample(test_batch, 5):
    s1 = sample.sentence1
    s2 = sample.sentence2

    encoder.forward_pass(s1)
    decoder.forward_pass(s2, encoder.get_root_embedding())

    predicted = [node.pword for node in decoder.tree]
    true = [node.lower_ for node in decoder.tree]
    
    print('Sentence: ', s1)
    print('Predicted Entailment: ', ' '.join(predicted))
    print('Actual Entailment: ', ' '.join(true))
    print('')

Sentence:  A young woman walking on the sidewalk.
Predicted Entailment:  a woman walking outside
Actual Entailment:  a woman is outside

Sentence:  a girl wearing pair of red leggings and a wool dress walks by the window display of a store wearing her headphones looking carefree.
Predicted Entailment:  a woman is looks a something of medical bread .
Actual Entailment:  a girl is wearing a pair of red leggings .

Sentence:  a guy on a roof doing repairs.
Predicted Entailment:  a man is on roof of his building outside working it .
Actual Entailment:  a guy is on top of his house outside doing repairs .

Sentence:  A black and white dog is swimming in a lake.
Predicted Entailment:  two dog swimming outside .
Actual Entailment:  two dogs are outside .

Sentence:  A lady with a pink bike is smiling for the camera.
Predicted Entailment:  a lady is is
Actual Entailment:  the woman is smiling



The raw accuracy rates are somewhat misleading, since a generated sentence is only counted as accurate if every node in the tree is generated correctly. So, a tree could have 90% of its nodes generated correctly, yet still be counted as an error.
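A node-level measure is less strict. The following is a minimal sketch of one way to compute it, assuming the predicted and actual words have already been collected into parallel lists; the function name `node_accuracy` is hypothetical, and the two example pairs are copied by hand from the test output above.

```python
def node_accuracy(pairs):
    """Proportion of correctly predicted nodes over (predicted, actual) word lists."""
    correct = 0
    total = 0
    for predicted, actual in pairs:
        # Compare the words position by position; each position is one tree node.
        correct += sum(p == a for p, a in zip(predicted, actual))
        total += len(actual)
    return correct / total if total else 0.0

pairs = [
    ('a woman walking outside'.split(), 'a woman is outside'.split()),
    ('two dog swimming outside .'.split(), 'two dogs are outside .'.split()),
]
print(node_accuracy(pairs))
```

Neither sentence counts towards the raw accuracy, yet six of the nine nodes are predicted correctly. In the notebook, the word lists could be built from `[node.pword for node in decoder.tree]` and `[node.lower_ for node in decoder.tree]`, as in the cells above.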

Otherwise, this limited illustration suggests that the model fits the training data better than held-out data (about 4.5% of the training entailments are reproduced exactly, versus none of the 100 test entailments), and so doesn't generalize very well. This isn't surprising given the limited amount of training data. The model is also very slow to train on the full dataset, so some optimizations will likely be needed to scale up training.

## Generating Structure and Embeddings Jointly

A drawback of predicting the words that occupy each node in a tree is that the structure of the tree needs to be known ahead of time. To avoid this drawback, we'll again use a dependency network to encode the sentence, but we'll use a modified recurrent network to decode out the entailed sentence in a manner such that each step in the decoding process can be interpreted as adding a node and an edge to the tree.

At the start of decoding, the representation produced by the encoding network is mapped to the decoding network's hidden state, which is then used to predict a head word, a dependency, and a dependent word (i.e. the information needed to extend the tree by one node). Embeddings corresponding to these predicted items are then provided as the input to the decoding network at the next time step. Additionally, the weights between the input layer and the hidden layer at this time step are determined by the dependency predicted in the previous time step. The new hidden state is thus determined by these weights, the input embeddings, and the hidden state at the previous time step. The new hidden state is also used to predict another extension to the tree as before. Generation ceases when no new extensions to the tree are predicted at a given time step.
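A single decoding step of this kind might look roughly like the sketch below. It is not the `TreeGenerator` implementation: concatenating the three input embeddings, using a `tanh` nonlinearity, and the names `W_in`, `W_hh`, and `decoder_step` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4
deps = ['ROOT', 'expl', 'attr']  # toy dependency inventory

# Hypothetical parameters: one input-to-hidden matrix per dependency label,
# plus a single hidden-to-hidden matrix shared across all steps.
W_in = {d: rng.normal(scale=0.1, size=(dim, 3 * dim)) for d in deps}
W_hh = rng.normal(scale=0.1, size=(dim, dim))

def decoder_step(h_prev, head_vec, dep_vec, word_vec, prev_dep):
    """Compute the next hidden state from the items predicted at the last step."""
    # The input is the concatenation of the three predicted embeddings (assumed).
    x = np.concatenate([head_vec, dep_vec, word_vec])
    # The dependency predicted at the last step selects the input weights.
    return np.tanh(W_in[prev_dep] @ x + W_hh @ h_prev)

h0 = rng.normal(size=dim)  # stands in for the encoder's root embedding
head, dep, word = (rng.normal(size=dim) for _ in range(3))

# The same inputs give different hidden states under different dependencies.
h_expl = decoder_step(h0, head, dep, word, 'expl')
h_attr = decoder_step(h0, head, dep, word, 'attr')
print(h_expl.shape, np.allclose(h_expl, h_attr))
```

The point of the sketch is only that the previous dependency acts as a switch over the input weights, so identical input embeddings can move the hidden state in different directions depending on which edge was just added to the tree.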

We'll train the generative decoder using a selection of entailment pairs, and then we'll see if it is able to predict each entailed sentence.

In [6]:
from pysem.generatives import TreeGenerator

encoder = DependencyNetwork(dim=dim, vocab=snli.vocab)
decoder = TreeGenerator(dim=dim, vocab=snli.vocab)

for i in range(iters):
    print('On iteration ', i)
    for sample in train_batch[:10]:
        s1 = sample.sentence1
        s2 = sample.sentence2

        encoder.forward_pass(s1)
        decoder.forward_pass(encoder.get_root_embedding(), s2)
        decoder.backward_pass(rate=rate)
        encoder.backward_pass(decoder.pass_grad, rate=rate)

On iteration  0
On iteration  1
On iteration  2
On iteration  3
On iteration  4
On iteration  5
On iteration  6
On iteration  7
On iteration  8
On iteration  9
On iteration  10
On iteration  11
On iteration  12
On iteration  13
On iteration  14
On iteration  15
On iteration  16
On iteration  17
On iteration  18
On iteration  19
On iteration  20
On iteration  21
On iteration  22
On iteration  23
On iteration  24
On iteration  25
On iteration  26
On iteration  27
On iteration  28
On iteration  29
On iteration  30
On iteration  31
On iteration  32
On iteration  33
On iteration  34
On iteration  35
On iteration  36
On iteration  37
On iteration  38
On iteration  39


In [7]:
for sample in random.sample(train_batch[:10], 2):
    s1 = sample.sentence1
    s2 = sample.sentence2

    encoder.forward_pass(s1)
    decoder.forward_pass(encoder.get_root_embedding(), s2)
    
    print('')
    print('Source Sentence: ', s1)
    print('Correct Entailment: ', s2)
    print('')
    for node in decoder.sequence:
        print('Predicted Head: ', node.ph, '   Correct Head: ', node.head.lower_)
        print('Predicted Dep: ', node.pd, '   Correct Dep: ', node.dep_)
        print('Predicted Token: ', node.pw, '   Correct Token: ', node.lower_)
        print('')
    print('')


Source Sentence:  Two blond women are hugging one another.
Correct Entailment:  There are women showing affection.

Predicted Head:  are    Correct Head:  are
Predicted Dep:  ROOT    Correct Dep:  ROOT
Predicted Token:  are    Correct Token:  are

Predicted Head:  are    Correct Head:  are
Predicted Dep:  expl    Correct Dep:  expl
Predicted Token:  there    Correct Token:  there

Predicted Head:  are    Correct Head:  are
Predicted Dep:  attr    Correct Dep:  attr
Predicted Token:  women    Correct Token:  women

Predicted Head:  women    Correct Head:  women
Predicted Dep:  acl    Correct Dep:  acl
Predicted Token:  showing    Correct Token:  showing

Predicted Head:  showing    Correct Head:  showing
Predicted Dep:  dobj    Correct Dep:  dobj
Predicted Token:  affection    Correct Token:  affection

Predicted Head:  are    Correct Head:  are
Predicted Dep:  punct    Correct Dep:  punct
Predicted Token:  .    Correct Token:  .



Source Sentence:  A Little League team tries to catch

This is again a very limited demonstration, but it indicates that the model is on track to learn how to generate sentences by predicting structure and content jointly. One addition the model still needs is a proper prediction heuristic that samples from the distribution for each predicted word, head, and dependency to create the input to the hidden state at the next time step. Scaling and speed are also issues.
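One simple form such a sampling heuristic could take is to draw from the softmax distribution instead of always taking the most probable item. The sketch below assumes this approach; the function `sample_prediction`, its `temperature` parameter, and the toy scores are hypothetical and are not part of `pysem`.

```python
import numpy as np

def sample_prediction(scores, labels, rng, temperature=1.0):
    """Sample one label from a softmax over `scores` rather than taking the argmax."""
    # Lower temperatures sharpen the distribution towards the top-scoring label.
    scaled = np.asarray(scores, dtype=float) / temperature
    exps = np.exp(scaled - scaled.max())
    probs = exps / exps.sum()
    return labels[rng.choice(len(labels), p=probs)]

rng = np.random.default_rng(0)
labels = ['women', 'people', 'men']  # toy candidate words
scores = [2.0, 1.0, 0.1]             # toy inner-product scores

draws = [sample_prediction(scores, labels, rng) for _ in range(1000)]
print({w: draws.count(w) for w in labels})
```

The same function could be applied separately to the head, dependency, and word scores at each time step, with the embeddings of the sampled items fed back in as the next input.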