# Generating Sentences with TreeRNNs

This notebook goes through a minimal example of encoding one sentence into a distributed representation using a TreeRNN, and then using this distributed representation to generate another sentence using a different TreeRNN in reverse. To start, we'll do some data cleaning to make sure we have a good set of sentence pairs to train on. The main goal here is to remove sentences with misspelled words and other oddities, along with sentences longer than 15 tokens.

In [1]:
import enchant 
import random
import pickle
import numpy as np

from collections import namedtuple
from pysem.corpora import SNLI
from pysem.networks import DependencyNetwork
from pysem.generatives import EmbeddingGenerator

checker = enchant.Dict('en_US')
TrainingPair = namedtuple('TrainingPair', ['sentence1', 'sentence2', 'label'])

snli = SNLI('/home/pblouw/snli_1.0/')
snli.load_xy_pairs()

def repair(sen):
    tokens = DependencyNetwork.parser(sen)
    if len(tokens) > 15:
        return None
    for token in tokens:
        if not checker.check(token.text):
            return None
    return sen

def clean_data(data):
    clean = []
    for item in data:
        
        s1 = repair(item.sentence1)
        s2 = repair(item.sentence2)
        if s1 is None or s2 is None:
            continue
        else:
            clean.append(TrainingPair(s1, s2, item.label))
    
    return clean

In [2]:
clean_dev = clean_data(snli.dev_data)
clean_train = clean_data(snli.train_data)
clean_test = clean_data(snli.test_data)

In [3]:
print(len(clean_dev))
print(len(clean_test))
print(len(clean_train))

4955
4839
306651
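
The filtering rule in `repair` can be illustrated without the parser or the spell checker. The sketch below is a simplified stand-in, not the notebook's pipeline: it assumes whitespace tokenization instead of `DependencyNetwork.parser`, and a small hypothetical word set `KNOWN_WORDS` in place of the `enchant` dictionary.

```python
# Simplified stand-in for `repair`: whitespace tokens and a toy word list
# replace the dependency parser and the enchant dictionary (assumptions).
KNOWN_WORDS = {'a', 'dog', 'runs', 'in', 'the', 'park'}  # hypothetical
MAX_TOKENS = 15  # same length cutoff as in `repair`


def toy_repair(sentence, known_words=KNOWN_WORDS, max_tokens=MAX_TOKENS):
    """Return the sentence if it is short and fully in-vocabulary, else None."""
    tokens = sentence.lower().rstrip('.').split()
    if len(tokens) > max_tokens:
        return None
    if any(token not in known_words for token in tokens):
        return None
    return sentence


kept = toy_repair('A dog runs in the park.')
dropped = toy_repair('A dgo runs in the park.')  # misspelled word
too_long = toy_repair(' '.join(['dog'] * 16))    # more than 15 tokens
```

Here `kept` is the original sentence, while `dropped` and `too_long` are both `None`, which is the signal `clean_data` uses to skip a pair.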


Next, we'll build a vocab from the set of cleaned sentence pairs. 

In [4]:
def build_vocab(data):
    vocab = set()
    for item in data:
        s1 = item.sentence1
        s2 = item.sentence2
        
        t1 = DependencyNetwork.parser(s1)
        t2 = DependencyNetwork.parser(s2)
        
        # set.add is a no-op for existing members, so no membership check is needed.
        for t in t1:
            vocab.add(t.text)
        for t in t2:
            vocab.add(t.text)

    return sorted(list(vocab))

data = clean_dev + clean_test + clean_train
vocab = build_vocab(data)

In [5]:
print(len(vocab))

22555
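
As a rough illustration of the vocabulary step, the sketch below collects the unique tokens from a few sentence pairs and maps each to an integer index. It assumes whitespace tokenization rather than the dependency parser, and the `word_to_idx` mapping is a hypothetical addition; the notebook itself only builds the sorted word list.

```python
from collections import namedtuple

TrainingPair = namedtuple('TrainingPair', ['sentence1', 'sentence2', 'label'])


def toy_build_vocab(data):
    """Collect the sorted set of whitespace tokens over all sentence pairs."""
    vocab = set()
    for item in data:
        vocab.update(item.sentence1.split())
        vocab.update(item.sentence2.split())
    return sorted(vocab)


# Made-up sentence pairs for illustration.
pairs = [
    TrainingPair('a dog runs', 'an animal moves', 'entailment'),
    TrainingPair('a man sleeps', 'a man runs', 'contradiction'),
]
toy_vocab = toy_build_vocab(pairs)
# Hypothetical index lookup, e.g. for selecting rows of an embedding matrix.
word_to_idx = {word: idx for idx, word in enumerate(toy_vocab)}
```

Sorting the vocabulary keeps the word-to-index assignment reproducible across runs, which matters if trained weights are saved and reloaded.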


Now we can collect all of the sentence pairs standing in entailment relations to one another.

In [6]:
train_data = [d for d in clean_train if d.label == 'entailment'] # or d.label == 'neutral']
test_data = [d for d in clean_test if d.label == 'entailment'] # or d.label == 'neutral']
dev_data = [d for d in clean_dev if d.label == 'entailment'] # or d.label == 'neutral']

print(len(train_data))
print(len(test_data))
print(len(dev_data))

106288
1666
1701
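
Before filtering by label, it can help to check how the labels are distributed. A minimal sketch on made-up pairs (the data below is illustrative and not drawn from SNLI):

```python
from collections import Counter, namedtuple

TrainingPair = namedtuple('TrainingPair', ['sentence1', 'sentence2', 'label'])

# Illustrative pairs only; real SNLI items carry the same three labels.
toy_data = [
    TrainingPair('a dog runs', 'an animal moves', 'entailment'),
    TrainingPair('a dog runs', 'a cat sleeps', 'contradiction'),
    TrainingPair('a dog runs', 'a dog chases a ball', 'neutral'),
    TrainingPair('two men talk', 'people are talking', 'entailment'),
]

# Count how many pairs carry each label, then keep only the entailments.
label_counts = Counter(pair.label for pair in toy_data)
entailments = [pair for pair in toy_data if pair.label == 'entailment']
```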


In [7]:
dim = 300
iters = 100
rate = 0.0006
batchsize = 10000

vectors = 'w2v_embeddings.pickle'

with open('w2v_dep_vocabs.pickle', 'rb') as pfile:
    subvocabs = pickle.load(pfile)

encoder = DependencyNetwork(dim=dim, vocab=vocab, pretrained=vectors)
decoder = EmbeddingGenerator(dim=dim, subvocabs=subvocabs, vectors=vectors)

for i in range(iters):
    print('On iteration ', i)
    # Halve the learning rate at iterations 60, 75, and 90.
    if i in (60, 75, 90):
        rate = rate / 2.0
    
    batch = random.sample(train_data, batchsize)
    for sample in batch:
        s1 = sample.sentence1
        s2 = sample.sentence2

        encoder.forward_pass(s1)        
        decoder.forward_pass(s2, encoder.get_root_embedding())
        decoder.backward_pass(rate=rate)
        encoder.backward_pass(decoder.pass_grad, rate=rate)

On iteration  0
On iteration  1
On iteration  2
On iteration  3
On iteration  4
On iteration  5
On iteration  6
On iteration  7
On iteration  8
On iteration  9
On iteration  10
On iteration  11
On iteration  12
On iteration  13
On iteration  14
On iteration  15
On iteration  16
On iteration  17
On iteration  18
On iteration  19
On iteration  20
On iteration  21
On iteration  22
On iteration  23
On iteration  24
On iteration  25
On iteration  26
On iteration  27
On iteration  28
On iteration  29
On iteration  30
On iteration  31
On iteration  32
On iteration  33
On iteration  34
On iteration  35
On iteration  36
On iteration  37
On iteration  38
On iteration  39
On iteration  40
On iteration  41
On iteration  42
On iteration  43
On iteration  44
On iteration  45
On iteration  46
On iteration  47
On iteration  48
On iteration  49
On iteration  50
On iteration  51
On iteration  52
On iteration  53
On iteration  54
On iteration  55
On iteration  56
On iteration  57
On iteration  58
On iter
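
The training loop halves the learning rate at iterations 60, 75, and 90. The same schedule can be written as a standalone function, which makes the rate at any iteration easy to check. This is a sketch; `rate_at`, `BASE_RATE`, and `MILESTONES` are names introduced here for illustration.

```python
BASE_RATE = 0.0006
MILESTONES = (60, 75, 90)  # iterations at which the rate is halved


def rate_at(iteration, base_rate=BASE_RATE, milestones=MILESTONES):
    """Learning rate in effect during the given iteration."""
    halvings = sum(1 for m in milestones if iteration >= m)
    return base_rate / (2.0 ** halvings)


schedule = {i: rate_at(i) for i in (0, 59, 60, 75, 90, 99)}
```

Under this schedule the final ten iterations run at one eighth of the starting rate.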

## Simple Entailment Generation Examples

This small amount of data probably isn't enough to generalize outside of the training set, so we'll first check how well the learned decoder is able to generate the entailments it has been trained on.

In [8]:
sample_trees = [d for d in dev_data if 5 < len(d.sentence2.split()) < 10]
batch = random.sample(train_data, 5)

for sample in batch:
    s1 = sample.sentence1
    s2 = sample.sentence2

    encoder.forward_pass(s1)
    decoder.forward_pass(s2, encoder.get_root_embedding())

    predicted = [node.pword for node in decoder.tree]
    true = [node.lower_ for node in decoder.tree]

    print('Sentence: ', s1)
    print('Predicted Entailment: ', ' '.join(predicted))
    print('Actual Entailment: ', ' '.join(true))
    print('')

Sentence:  A group of Asians and one Anglo stand outside bundled in winter clothing.
Predicted Entailment:  group are are outside .
Actual Entailment:  people are standing outside .

Sentence:  A man with a hat is playing the guitar.
Predicted Entailment:  playing playing a guitar .
Actual Entailment:  man playing a guitar .

Sentence:  Here is a picture of a band performing living at a concert.
Predicted Entailment:  a is performing a concert concert in concert of many fans .
Actual Entailment:  a band performing a concert live in front of many fans .

Sentence:  A lady holds a little girl who is trying to catch bubbles.
Predicted Entailment:  a girl is playing .
Actual Entailment:  a girl is playing .

Sentence:  Man in a helmet riding a dirt bike covered with mud.
Predicted Entailment:  a man is riding with a bike .
Actual Entailment:  a man is traveling on a bike .



We can also generate entailments using randomly chosen trees for the decoding network structure. This doesn't work very well.

In [9]:
batch = random.sample(dev_data, 5)

for sample in batch:
    s1 = sample.sentence1
    s2 = sample.sentence2
    randsen = random.choice(sample_trees)

    encoder.forward_pass(s1)
    decoder.forward_pass(s2, encoder.get_root_embedding())

    predicted = [node.pword for node in decoder.tree]
    true = [node.lower_ for node in decoder.tree]
    
    print('Sentence: ', s1)
    print('Predicted Entailment: ', ' '.join(predicted))
    print('Actual Entailment: ', ' '.join(true))

    decoder.forward_pass(randsen.sentence2, encoder.get_root_embedding())
    alternate = [node.pword for node in decoder.tree]
    print('Random Tree Entailment: ', ' '.join(alternate))
    print('')

Sentence:  Several people wait to checkout inside a store with a warehouse looking ceiling.
Predicted Entailment:  several people are are inside a store
Actual Entailment:  many people are waiting in a store
Random Tree Entailment:  the people are inside a store .

Sentence:  A person walking uphill from a construction zone.
Predicted Entailment:  a person walking in a construction zone .
Actual Entailment:  a hill is near a construction site .
Random Tree Entailment:  two person walking in a zone

Sentence:  Four motorcycles are racing on a dirt track.
Predicted Entailment:  four motorcycles are racing
Actual Entailment:  four motorcycles are racing
Random Tree Entailment:  the motorcycles racing on a track .

Sentence:  An Asian girl writes down something on a notepad in her lap.
Predicted Entailment:  an old girl is is .
Actual Entailment:  an asian girl is sitting .
Random Tree Entailment:  one girl is is on a writing

Sentence:  School kids all in blue backpacks.
Predicted Entailm

## Generating Entailment Chains (i.e. Inferential Roles)

We can also generate entailment chains by re-encoding a generated sentence and then generating a new sentence from the resulting encoding. This is kind of neat because it allows us to distill what the model has learned into a network of inferential relationships between sentences. Philosophers sometimes argue that the meaning of a sentence is determined by its role or location in such a network.

In [50]:
s1 = 'A black dog with a blue collar is jumping into the water.'
s2 = 'A dog chases in a field.'
s3 = 'A frog is cold.'

def predict(encoder, decoder, s1, s2, s3):
    encoder.forward_pass(s1)
    decoder.forward_pass(s2, encoder.get_root_embedding())

    true = [node.lower_ for node in decoder.tree]
    predicted = [node.pword for node in decoder.tree]

    print('Sentence: ', s1)
    print('Predicted Entailment: ', ' '.join(predicted))

    encoder.forward_pass(' '.join(predicted))
    decoder.forward_pass(s3, encoder.get_root_embedding())

    predicted = [node.pword for node in decoder.tree]
    print('Next Prediction: ', ' '.join(predicted))
    print('')

predict(encoder, decoder, s1, s2, s3)
    
s1 = 'A black dog with a blue collar is jumping into the water.'
s2 = 'Some dog\'s collar is blue.'
s3 = 'The man sleeps.'

predict(encoder, decoder, s1, s2, s3)

s1 = 'Two police officers are sitting on motorcycles in the road.'
s2 = 'Two policemen sit on their bikes.'
s3 = 'The men have big guns.'

predict(encoder, decoder, s1, s2, s3)

s1 = 'Five people are playing in a gymnasium.'
s2 = 'Some people are competing indoors.'
s3 = 'Some people are inside.'

predict(encoder, decoder, s1, s2, s3)

s1 = 'A woman, whose face can only be seen in a mirror, is applying eyeliner in a dimly lit room.'
s2 = 'The woman applies eyeliner.'
s3 = 'The red woman applies green eyeliner.'

predict(encoder, decoder, s1, s2, s3)

Sentence:  A black dog with a blue collar is jumping into the water.
Predicted Entailment:  a dog is in the water .
Next Prediction:  a dog is wet .

Sentence:  A black dog with a blue collar is jumping into the water.
Predicted Entailment:  a his s dog is wet .
Next Prediction:  a dog is .

Sentence:  Two police officers are sitting on motorcycles in the road.
Predicted Entailment:  two officers are on their road .
Next Prediction:  the officers are same outdoors .

Sentence:  Five people are playing in a gymnasium.
Predicted Entailment:  the people are are together .
Next Prediction:  the people are together .

Sentence:  A woman, whose face can only be seen in a mirror, is applying eyeliner in a dimly lit room.
Predicted Entailment:  a woman is makeup .
Next Prediction:  a female woman makeup physical makeup .
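
The two-step chaining in `predict` generalizes to chains of any length: feed each generated sentence back in as the next premise. The sketch below shows only that control flow. The `generate` argument is a hypothetical callable standing in for an encoder pass followed by a decoder pass, and `TOY_ENTAILMENTS` is a hand-written lookup table used in place of a trained model. The notebook's decoder also needs a sentence to supply the tree structure at each step, which this sketch omits.

```python
def entailment_chain(generate, sentence, steps):
    """Repeatedly apply `generate`, feeding each output back in as input."""
    chain = [sentence]
    for _ in range(steps):
        sentence = generate(sentence)
        chain.append(sentence)
    return chain


# Hand-written stand-in for a trained encoder/decoder pair (assumption).
TOY_ENTAILMENTS = {
    'a dog is jumping into the water .': 'a dog is in the water .',
    'a dog is in the water .': 'a dog is wet .',
}


def toy_generate(sentence):
    # Fall back to the input when no entailment is known.
    return TOY_ENTAILMENTS.get(sentence, sentence)


chain = entailment_chain(toy_generate, 'a dog is jumping into the water .', 2)
```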



In [258]:
def condition(encoder, decoder, s1, s2, cond):
    encoder.forward_pass(s1)
    decoder.forward_pass(s2, encoder.get_root_embedding() + cond)

    true = [node.lower_ for node in decoder.tree]
    predicted = [node.pword for node in decoder.tree]
    print('Predicted Entailment: ', ' '.join(predicted))
      
s1 = 'A shirtless man sleeps in his blue boat out on the open waters.'
s2 = 'The red man is in the big boat.'
cond_word = 'water'
cond = encoder.vectors[cond_word]

print('')
print('Sentence: ', s1)
print('Conditioning Context: ', cond_word)

encoder.forward_pass('')
condition(encoder, decoder, s1, s2, cond)
        
        
        
s1 = 'A shirtless man sleeps in his blue boat out on the open waters.'
s2 = 'The red man is in the big boat.'
cond_word = 'blue'
cond = encoder.vectors[cond_word]

print('')
print('Sentence: ', s1)
print('Conditioning Context: ', cond_word)

encoder.forward_pass('')
condition(encoder, decoder, s1, s2, cond)


s1 = 'A shirtless man sleeps in his blue boat out on the open waters.'
s2 = 'The red man is in the big boat.'
cond_word = 'fishing'
cond = encoder.vectors[cond_word]

print('')
print('Sentence: ', s1)
print('Conditioning Context: ', cond_word)

encoder.forward_pass('')
condition(encoder, decoder, s1, s2, cond)
        
        
s1 = 'A shirtless man sleeps in his blue boat out on the open waters.'
s2 = 'The red man is in the big boat.'
cond_word = 'sleep'
cond = encoder.vectors[cond_word]

print('')
print('Sentence: ', s1)
print('Conditioning Context: ', cond_word)

encoder.forward_pass('')
condition(encoder, decoder, s1, s2, cond)

s1 = 'A shirtless man sleeps in his blue boat out on the open waters.'
s2 = 'The red man is in the big boat.'
cond_word = 'boat'
cond = encoder.vectors[cond_word]

print('')
print('Sentence: ', s1)
print('Conditioning Context: ', cond_word)

encoder.forward_pass('')
condition(encoder, decoder, s1, s2, cond)


s1 = 'A mother and daughter walk along the side of a bridge.'
s2 = 'Two people are walking.'
cond_sen = 'How many people are walking?'

encoder.forward_pass(cond_sen)
cond = encoder.get_root_embedding()

print('')
print('Sentence: ', s1)
print('Conditioning Context: ', cond_sen)

condition(encoder, decoder, s1, s2, cond)

s1 = 'A mother and daughter walk along the side of a bridge.'
s2 = 'The mother and daughter walk together.'
cond_sen = 'Are the mother and daughter walking?'

encoder.forward_pass(cond_sen)
cond = encoder.get_root_embedding()

print('')
print('Sentence: ', s1)
print('Conditioning Context: ', cond_sen)

condition(encoder, decoder, s1, s2, cond)

s1 = 'A mother and daughter walk along the side of a bridge.'
s2 = 'The bridge is over a river.'
cond_sen = 'What is the bridge over?'

encoder.forward_pass(cond_sen)
cond = encoder.get_root_embedding()

print('')
print('Sentence: ', s1)
print('Conditioning Context: ', cond_sen)

condition(encoder, decoder, s1, s2, cond)


Sentence:  A shirtless man sleeps in his blue boat out on the open waters.
Conditioning Context:  water
Predicted Entailment:  a shirtless man is in the blue water .

Sentence:  A shirtless man sleeps in his blue boat out on the open waters.
Conditioning Context:  blue
Predicted Entailment:  a blue man is in the blue boat .

Sentence:  A shirtless man sleeps in his blue boat out on the open waters.
Conditioning Context:  fishing
Predicted Entailment:  a shirtless man fishing in the blue water .

Sentence:  A shirtless man sleeps in his blue boat out on the open waters.
Conditioning Context:  sleep
Predicted Entailment:  a shirtless man sleeps in the blue water .

Sentence:  A shirtless man sleeps in his blue boat out on the open waters.
Conditioning Context:  boat
Predicted Entailment:  a shirtless boat boat in the blue boat .

Sentence:  A mother and daughter walk along the side of a bridge.
Conditioning Context:  How many people are walking?
Predicted Entailment:  two people are wal
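
In `condition`, the context vector is simply added to the root embedding before decoding. One way to see why this can steer word choice is to look at how the sum moves relative to candidate word vectors. The sketch below uses made-up 2-dimensional vectors and cosine similarity; it illustrates vector addition only and makes no claim about the decoder's internals.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def add(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]


# Made-up 2-d vectors for illustration (assumptions).
word_vectors = {'boat': [1.0, 0.0], 'water': [0.0, 1.0]}
root = [1.0, 0.2]              # stand-in for a sentence embedding
cond = word_vectors['water']   # conditioning context

before = cosine(root, word_vectors['water'])
after = cosine(add(root, cond), word_vectors['water'])
```

In this toy setup `after` is larger than `before`: adding the context vector pulls the combined embedding toward the conditioning word.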

In [195]:
s1 = 'Several runners compete in a road race.'
s2 = 'the dog ran quickly to the beach.'

def sub_predict(encoder, decoder, s1, s2):
    
    encoder.forward_pass(s1)
    decoder.forward_pass(s2, encoder.get_root_embedding())

    true = [node.lower_ for node in decoder.tree]
    predicted = [node.pword for node in decoder.tree]

    print('Sentence: ', s1)
    print('Predicted Entailment: ', ' '.join(predicted))
    print('')    

# sub_predict(encoder, decoder, s1, s2)
    
# s1 = 'Several runners compete in a road race.'
# s2 = 'The people are outside.'

# sub_predict(encoder, decoder, s1, s2)

# s1 = 'Several runners compete in a road race.'
# s2 = 'Some people run in a race.'

# sub_predict(encoder, decoder, s1, s2)

# s1 = 'Several runners compete in a road race.'
# s2 = 'Some runner walks.'

# sub_predict(encoder, decoder, s1, s2)


# s1 = 'The runners compete outside in a race.'
# s2 = 'The runners move quickly.'

# sub_predict(encoder, decoder, s1, s2)

# s1 = 'The runners are outside.'
# s2 = 'The runners do not walking.'

# sub_predict(encoder, decoder, s1, s2)


s1 = 'Some kids are wrestling on an inflatable raft.'
s2 = 'the boy is on the beach.'
sub_predict(encoder, decoder, s1, s2)

s2 = 'the kids are outside.'
sub_predict(encoder, decoder, s1, s2)

s2 = 'Some kids wrestle outside in the sun.'
sub_predict(encoder, decoder, s1, s2)

s2 = 'The kids are with an inflatable raft.'
sub_predict(encoder, decoder, s1, s2)

s2 = 'young kids wrestle with each other.'
sub_predict(encoder, decoder, s1, s2)

s2 = 'old children play all over the water.'
sub_predict(encoder, decoder, s1, s2)

s2 = 'the kids wrestle with an fierce determination.'
sub_predict(encoder, decoder, s1, s2)

s1 = 'Several kids are all on a raft.'
sub_predict(encoder, decoder, s1, s2)

s2 = 'They raft on three kids.'
sub_predict(encoder, decoder, s1, s2)

s2 = 'a rafts used in the match.'
sub_predict(encoder, decoder, s1, s2)

s2 = 'the kids are in the water.'
sub_predict(encoder, decoder, s1, s2)

Sentence:  Some kids are wrestling on an inflatable raft.
Predicted Entailment:  some kids are on a raft .

Sentence:  Some kids are wrestling on an inflatable raft.
Predicted Entailment:  some kids are together .

Sentence:  Some kids are wrestling on an inflatable raft.
Predicted Entailment:  some kids are together on a raft .

Sentence:  Some kids are wrestling on an inflatable raft.
Predicted Entailment:  some kids are on a inflatable raft .

Sentence:  Some kids are wrestling on an inflatable raft.
Predicted Entailment:  several kids are on a raft .

Sentence:  Some kids are wrestling on an inflatable raft.
Predicted Entailment:  several kids are all on a raft .

Sentence:  Some kids are wrestling on an inflatable raft.
Predicted Entailment:  some kids are on a inflatable raft .

Sentence:  Several kids are all on a raft.
Predicted Entailment:  the kids are on a several raft .

Sentence:  Several kids are all on a raft.
Predicted Entailment:  kids are on one raft .

Sentence:  Sev

In [279]:
s1 = 'Two people pose for the camera.'
s2 = 'Two people pose for a picture.'
sub_predict(encoder, decoder, s1, s2)

s1 = 'One person poses for the camera.'
sub_predict(encoder, decoder, s1, s2)

s1 = 'A man poses for the camera.'
sub_predict(encoder, decoder, s1, s2)

s1 = 'A man is posing for the camera.'
s2 = 'A man lays for the camera.'
sub_predict(encoder, decoder, s1, s2)


s1 = 'A man is posing on the grass.'
s2 = 'A man lays for the camera.'
sub_predict(encoder, decoder, s1, s2)


s1 = 'A man is sleeping on the grass.'
s2 = 'The man is laying down to sleep.'
sub_predict(encoder, decoder, s1, s2)

Sentence:  Two people pose for the camera.
Predicted Entailment:  two people pose for the camera .

Sentence:  One person poses for the camera.
Predicted Entailment:  one person poses for a camera .

Sentence:  A man poses for the camera.
Predicted Entailment:  one man poses for the camera .

Sentence:  A man is posing for the camera.
Predicted Entailment:  a man posing for the camera .

Sentence:  A man is posing on the grass.
Predicted Entailment:  a man is on the grass .

Sentence:  A man is sleeping on the grass.
Predicted Entailment:  a man is is down on grass .



In [269]:
for x in train_data[200:220]:
    print(x.sentence1)
    print(x.sentence2)
    print('')

Some children are playing jump rope.
Children are jumping rope.

A large golden dog sniffing the butt of a white dog
Two animals getting to know each other.

A sumo wrestler with a brown belt is pushing another wrestler in a bout.
Two sumo wrestlers compete in a match.

Two large dogs greet other while their owners watch.
the dogs see each other

Woman with green sweater and sunglasses smiling
A woman with a green sweater has a happy expression.

A woman with dark hair is wearing a green sweater.
A woman is there.

A smiling lady in a green jacket at a public gathering.
A happy woman smiling

A woman in a green jacket and black sunglasses outside in a crowd.
A woman is outside.

A climber is making his way up a snowy mountainside.
A climber is ascending

a lone person jumping through the air from one snowy mountain to another.
A person jumps in the air.

A man doing tricks in the snow.
The man is outside.

Two Asian people sit at a blue table in a food court.
Two people are seated toge

In [131]:
s1 = 'The little boy is jumping into a puddle on the street.'
s2 = 'A boy plays with my favorite puddle'
sub_predict(encoder, decoder, s1, s2)

# s1 = 'An animal is jumping to catch an object.'
# s2 = 'A white moose walks.'
s1 = 'Young players engage in the sport of Water polo while others watch.'
s2 = 'A man jumps in the big puddle'
sub_predict(encoder, decoder, s1, s2)

# s2 = 'A white moose walks.'
s1 = 'Some kids are wrestling on an inflatable raft.'
s2 = 'The bed is dirty'
s3 = 'The kids are wrestling'
sub_predict(encoder, decoder, s1, s3)

Sentence:  The little boy is jumping into a puddle on the street.
Predicted Entailment:  the boy is in his wet puddle

Sentence:  Young players engage in the sport of Water polo while others watch.
Predicted Entailment:  the players watch in a other water

Sentence:  Some kids are wrestling on an inflatable raft.
Predicted Entailment:  some kids are are



In [120]:
for item in decoder.tree:
    print(item.lower_, item.dep_, item.head, item.pword)
    

a det man some
man nsubj stares kids
stares ROOT stares are
at prep stares on
some det puddle a
puddle pobj at raft


In [132]:
for item in decoder.tree:
    probs = np.copy(item.probs)
    idx = np.argmax(probs)
    indices = np.argpartition(probs.flatten(), -3)[-3:]
    print(decoder.idx_to_wrd[item.dep_][idx], [(decoder.idx_to_wrd[item.dep_][x], probs[x]) for x in indices])

some [('a', array([ 0.02479785])), ('the', array([ 0.35729147])), ('some', array([ 0.61051977]))]
kids [('people', array([ 0.02372272])), ('kids', array([ 0.68255007])), ('children', array([ 0.26362851]))]
are [('is', array([ 0.00296623])), ('were', array([ 0.00402924])), ('are', array([ 0.99073111]))]
are [('playing', array([ 0.0363415])), ('kids', array([ 0.03651994])), ('are', array([ 0.49121337]))]
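
`np.argpartition` returns the indices of the k largest entries but does not guarantee any order among them. If a ranked list is wanted, the selected indices can be sorted afterwards. A minimal sketch on a made-up probability vector (the vocabulary and values are illustrative, and `top_k` is a name introduced here):

```python
import numpy as np


def top_k(probs, k):
    """Indices of the k largest probabilities, highest first."""
    probs = np.asarray(probs).flatten()
    candidates = np.argpartition(probs, -k)[-k:]  # top k, in no set order
    return candidates[np.argsort(probs[candidates])[::-1]]


words = ['a', 'the', 'some', 'two', 'several']  # illustrative vocabulary
probs = np.array([0.02, 0.30, 0.55, 0.05, 0.08])

ranked = [(words[i], float(probs[i])) for i in top_k(probs, 3)]
```

Partitioning first and sorting only the k survivors avoids a full sort over the sub-vocabulary.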


In [17]:
print(subvocabs['aux'])

{'wearing', 'ware', 'does', 'park', 'jump', 'considers', 'snow', 'Player', 'calmly', 'feels', 'Is', 'saliva', 'turned', 'Will', 'bungee', 'outdoors', 'match', 'looks', 'enjoy', 'be', 'swimming', 'outside', 'seafarer', 'crew', 'hate', 'balancing', 'laying', 'wanna', 'been', 'pep', 'TO', 'help', 'will', 'ins', 'to', 'birds', 'band', 'might', 'Can', 'goes', 'adult', 'prepares', 'likes', 'do', 'work', 'would', 'Men', 'should', 'id', 'pretend', 'microwave', 'unknowing', 'rock', 'like', 'doing', 'am', 'sits', 'is', 'busy', 's', 'sunshine', 'willing', 'walks', 'skateboarded', 'para', 'sleeping', 'sitting', 'building', 'competing', 'being', 'woman', 'ARE', 'Waves', 'avoiding', 'tries', 'ski', 'dressed', 'goers', 'leans', 'orange', 'jeep', 'came', 'must', 'have', 'enjoys', 'ride', 'walk', 'were', 'runs', 'n', 'was', 'WERE', 'did', 'stare', 'getting', 'can', 'both', 'inhaling', 'handicapped', 're', 'are', 'IS', 'may', 'shall', 'has', 'having', 'square', 'climbs', 'hang', 'outdoor', 'helping', 'w

In [18]:
total = 0 
correct = 0

for item in train_data:
    encoder.forward_pass(item.sentence1)
    decoder.forward_pass(item.sentence2, encoder.get_root_embedding())
    
    for node in decoder.tree:
        total += 1
        if node.pword.lower() == node.lower_:
            correct += 1
            
accuracy = float(correct / total)
print(accuracy)

0.7062324425311581


In [19]:
total = 0 
correct = 0

for item in test_data:
    encoder.forward_pass(item.sentence1)
    decoder.forward_pass(item.sentence2, encoder.get_root_embedding())
    
    for node in decoder.tree:
        total += 1
        if node.pword.lower() == node.lower_:
            correct += 1
            
accuracy = float(correct / total)
print(accuracy)

0.6167829757904713
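
The two evaluation cells compute the same per-token accuracy on different splits. The metric itself can be factored into a small function over (predicted, target) token sequences. This is a sketch; `token_accuracy` is a name introduced here, and the example sequences are copied from generation output shown earlier in the notebook.

```python
def token_accuracy(pairs):
    """Fraction of positions where the predicted token matches the target.

    `pairs` is an iterable of (predicted_tokens, target_tokens) with equal
    lengths, mirroring the node-by-node comparison over `decoder.tree`.
    """
    total = 0
    correct = 0
    for predicted, target in pairs:
        for p, t in zip(predicted, target):
            total += 1
            if p.lower() == t.lower():
                correct += 1
    return correct / total if total else 0.0


examples = [
    ('group are are outside .'.split(), 'people are standing outside .'.split()),
    ('a girl is playing .'.split(), 'a girl is playing .'.split()),
]
accuracy = token_accuracy(examples)
```

Because the decoder is given the target sentence's tree, predicted and target sequences always have the same length, so a position-by-position comparison is well defined.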
