# Generating Sentences with TreeRNNs

This notebook goes through a minimal example of encoding one sentence into a distributed representation using a TreeRNN, and then using this distributed representation to generate another sentence using a different TreeRNN in reverse. To start, we'll do some data cleaning to make sure we have a good set of sentence pairs to train on. The main goal here is to remove sentences with misspelled words and other oddities, along with any sentence longer than 15 tokens.

In [1]:
import enchant 
import random
import pickle
import numpy as np

from collections import namedtuple
from pysem.corpora import SNLI
from pysem.networks import DependencyNetwork
from pysem.generatives import EmbeddingGenerator, EncoderDecoder

checker = enchant.Dict('en_US')
TrainingPair = namedtuple('TrainingPair', ['sentence1', 'sentence2', 'label'])

snli = SNLI('/home/pblouw/snli_1.0/')
snli.load_xy_pairs()

def repair(sen):
    tokens = DependencyNetwork.parser(sen)
    if len(tokens) > 15:
        return None
    for token in tokens:
        if not checker.check(token.text):
            return None
    return sen

def clean_data(data):
    clean = []
    for item in data:
        s1 = repair(item.sentence1)
        s2 = repair(item.sentence2)
        if s1 is None or s2 is None:
            continue
        else:
            clean.append(TrainingPair(s1, s2, item.label))
    return clean

def build_vocab(data):
    vocab = set()
    for item in data:
        parse1 = DependencyNetwork.parser(item.sentence1)
        parse2 = DependencyNetwork.parser(item.sentence2)
        
        # set.add is a no-op for existing members, so no membership check is needed
        for p in parse1:
            vocab.add(p.text)

        for p in parse2:
            vocab.add(p.text)

    return sorted(vocab)

In [2]:
clean_dev = clean_data(snli.dev_data[:])
clean_train = clean_data(snli.train_data[:])
clean_test = clean_data(snli.test_data[:])

Next, we'll build a vocab from the set of cleaned sentence pairs. 

In [3]:
data = clean_dev + clean_test + clean_train
vocab = build_vocab(data)

In [4]:
print(len(vocab))

22555


Now we can collect all of the sentence pairs standing in entailment relations to one another.

In [5]:
train_data = [d for d in clean_train if d.label == 'entailment']
test_data = [d for d in clean_test if d.label == 'entailment']
dev_data = [d for d in clean_dev if d.label == 'entailment']

print(len(train_data))
print(len(test_data))
print(len(dev_data))

106288
1666
1701


In [6]:
dim = 300
iters = 1
rate = 0.01

vectors = 'w2v_embeddings.pickle'

with open('w2v_dep_vocabs.pickle', 'rb') as pfile:
    subvocabs = pickle.load(pfile)

encoder = DependencyNetwork(dim=dim, vocab=vocab, pretrained=vectors)
decoder = EmbeddingGenerator(dim=dim, subvocabs=subvocabs, vectors=vectors)

model = EncoderDecoder(encoder=encoder, decoder=decoder, data=train_data)
model.train(iters=iters, rate=rate)

On iteration  0


In [7]:
sample = random.choice(train_data)

print(sample)

model.encode(sample.sentence1)
model.decode(sample.sentence2)

TrainingPair(sentence1='Some people crossing the street with a yellow vehicle in the background.', sentence2='A group of people crossing a street in front of a vehicle.', label='entailment')


'a adults of people doing a animals for bus of a it .'

In [8]:
def compute_accuracy(data, model):
    total = 0 
    correct = 0

    for item in data:
        model.encoder.forward_pass(item.sentence1)
        model.decoder.forward_pass(item.sentence2, model.encoder.get_root_embedding())

        for node in model.decoder.tree:
            total += 1
            if node.pword.lower() == node.lower_:
                correct += 1

    return float(correct / total)

print(compute_accuracy(train_data, model))
print(compute_accuracy(dev_data, model))

0.4622308761261935
0.45877837116154874
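
The accuracy above is word-level: each node in the decoder tree counts as one prediction, and a prediction is correct when the predicted word matches the target word, ignoring case. A minimal sketch of the same metric on plain word lists, with no dependency on `pysem`, might look like the following. The function name `word_accuracy` is hypothetical, and the two toy pairs are hand-tokenized from predictions printed elsewhere in this notebook.

```python
def word_accuracy(pairs):
    """Fraction of positions where the predicted word equals the target word.

    `pairs` is an iterable of (predicted_words, target_words) lists of equal
    length, mirroring the node-by-node comparison in compute_accuracy.
    """
    total = 0
    correct = 0
    for predicted, target in pairs:
        for p, t in zip(predicted, target):
            total += 1
            if p.lower() == t.lower():
                correct += 1
    # Guard against an empty dataset rather than dividing by zero.
    return correct / total if total else 0.0


# Hand-tokenized toy pairs (assumption: one word per tree node).
toy_pairs = [
    (['a', 'dog', 'is', 'jumps', '.'],
     ['A', 'dog', 'is', 'jumping', '.']),
    (['two', 'men', 'gentlemen', 'people', '.'],
     ['Two', 'men', 'are', 'outdoors', '.']),
]

print(word_accuracy(toy_pairs))
```

Seven of the ten positions match here, so a score in the 0.46 range on the full data means that somewhat fewer than half of the words are recovered exactly.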


In [9]:
model.save('enc_model.pickle','dec_model.pickle')

In [10]:
test_model = EncoderDecoder(encoder=None, decoder=None, data=train_data)
test_model.load('enc_model.pickle','dec_model.pickle')

print(compute_accuracy(train_data, test_model))
print(compute_accuracy(dev_data, test_model))

0.4622308761261935
0.45877837116154874


## Simple Entailment Generation Examples

This small amount of data probably isn't enough to generalize outside of the training set, so we'll first check how well the learned decoder is able to generate the entailments it has been trained on.

In [11]:
batch = random.sample(train_data, 5)

for sample in batch:
    model.encode(sample.sentence1)

    print('Sentence: ', sample.sentence1)
    print('Actual Entailment: ', sample.sentence2)
    print('Predicted Entailment: ', model.decode(sample.sentence2))
    print('')

Sentence:  Man giving female a leg up onto tree.
Actual Entailment:  A man is helping a woman into a tree.
Predicted Entailment:  a man is is a something on a roof .

Sentence:  A few people look at an advertisement on a city street.
Actual Entailment:  People are observing an advertisement outdoors.
Predicted Entailment:  group are are a escalator outside .

Sentence:  Two men on top of a roof fixing it.
Actual Entailment:  Two men are outdoors.
Predicted Entailment:  two men gentlemen people .

Sentence:  A group of people are watching fireworks.
Actual Entailment:  the people are watching things explode
Predicted Entailment:  the crowd are gathered eyes play

Sentence:  Two workers working on the gulf.
Actual Entailment:  There are two people.
Predicted Entailment:  there working two people .



## Random Entailment Generation Examples

We can also generate entailments using randomly chosen trees for the decoding network structure, by calling `model.decode()` with no target sentence. This doesn't always work very well.

In [12]:
batch = random.sample(train_data, 5)

for sample in batch:
    model.encode(sample.sentence1)

    print('Sentence: ', sample.sentence1)
    print('Actual Entailment: ', sample.sentence2)
    print('Predicted Entailment: ', model.decode(sample.sentence2))
    print('Random Tree Entailment: ', model.decode())
    print('')

Sentence:  A dog is jumping over a gate.
Actual Entailment:  A dog is jumping.
Predicted Entailment:  a dog is jumps .
Random Tree Entailment:  a dog is jumps its something

Sentence:  Firemen are lowered into a construction area.
Actual Entailment:  Firemen are being lowered.
Predicted Entailment:  group are being rebuilds .
Random Tree Entailment:  the group rebuilds in a bus .

Sentence:  A person in dark blue walks down a path between trees.
Actual Entailment:  There is a person wearing dark blue.
Predicted Entailment:  there walks a people sitting green ride .
Random Tree Entailment:  a woman is walks a green walk .

Sentence:  A hairy man steps out of the swimming pool.
Actual Entailment:  A hairy man steps out of the pool.
Predicted Entailment:  a black man is in in a water .
Random Tree Entailment:  a man is is his waterfall .

Sentence:  An art exhibit featuring portraits of guns and cattle heads.
Actual Entailment:  Pictures are at an art exhibit.
Predicted Entailment:  adult

## Generating Entailment Chains (i.e. Inferential Roles)

We can also generate entailment chains by re-encoding a generated sentence, and then generating a new sentence from the subsequent encoding. This is kind of neat because it allows us to distill what the model has learned into a network of inferential relationships between sentences. Philosophers sometimes argue that the meaning of a sentence is determined by its role or location in such a network.

In [13]:
s1 = 'A black dog with a blue collar is jumping into the water.'
s2 = 'Two police officers are sitting on motorcycles in the road.'
s3 = 'Five people are playing in a gymnasium.'
s4 = 'A man curls up in a blanket on the street.'

sentences = [s1, s2, s3, s4]

for sentence in sentences:
    print('Sentence: ', sentence)
    model.encode(sentence)
    entailment = model.decode()
    print('Predicted Entailment: ', entailment)
    model.encode(entailment)
    print('Next Entailment: ', model.decode())
    print('')

Sentence:  A black dog with a blue collar is jumping into the water.
Predicted Entailment:  a dog man jumping outside .
Next Entailment:  a man is jumping rock sport .

Sentence:  Two police officers are sitting on motorcycles in the road.
Predicted Entailment:  a Motorcycles in vehicle
Next Entailment:  a group bikes a bike .

Sentence:  Five people are playing in a gymnasium.
Predicted Entailment:  there are people outside .
Next Entailment:  there are people being sitting .

Sentence:  A man curls up in a blanket on the street.
Predicted Entailment:  a young man is is is on his something
Next Entailment:  man is drumming on a band on a hat .
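
The two-step loop above generalizes to chains of any length. The sketch below is one way to write that loop, assuming only that the model exposes an `encode(sentence)` call and a no-argument `decode()` call, as used in this notebook. The helper name `entailment_chain` and the `TruncatingStub` class are hypothetical; the stub simply drops the last word at each step so the example runs without a trained model.

```python
def entailment_chain(encode, decode, sentence, steps=3):
    """Repeatedly encode the latest sentence and decode a new one from it."""
    chain = [sentence]
    for _ in range(steps):
        encode(chain[-1])
        chain.append(decode())
    return chain


class TruncatingStub:
    """Hypothetical stand-in for a trained model: decoding drops the last word."""

    def __init__(self):
        self.state = []

    def encode(self, sentence):
        self.state = sentence.split()

    def decode(self):
        words = self.state[:-1] if len(self.state) > 1 else self.state
        return ' '.join(words)


stub = TruncatingStub()
chain = entailment_chain(stub.encode, stub.decode,
                         'a dog is jumping outside', steps=3)
for step, sentence in enumerate(chain):
    print(step, sentence)
```

With the trained model, the equivalent call would be `entailment_chain(model.encode, model.decode, s1)`. Because each step decodes with a randomly chosen tree, longer chains will likely drift further from the original sentence.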



In [14]:
def condition(encoder, decoder, s1, s2, cond):
    encoder.forward_pass(s1)
    decoder.forward_pass(s2, encoder.get_root_embedding() + cond)

    true = [node.lower_ for node in decoder.tree]
    predicted = [node.pword for node in decoder.tree]
    print('Predicted Entailment: ', ' '.join(predicted))
      
s1 = 'A shirtless man sleeps in his blue boat out on the open waters.'
s2 = 'The red man is in the big boat.'
cond_word = 'water'
cond = encoder.vectors[cond_word]

print('')
print('Sentence: ', s1)
print('Conditioning Context: ', cond_word)

encoder.forward_pass('')
condition(encoder, decoder, s1, s2, cond)


Sentence:  A shirtless man sleeps in his blue boat out on the open waters.
Conditioning Context:  water
Predicted Entailment:  a black man waterfall in a low water .
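
The conditioning in the cell above is plain vector addition: the embedding of the conditioning word is added to the encoder's root embedding before decoding. The sketch below uses two-dimensional toy vectors (an assumption made for readability; the model's embeddings are 300-dimensional) to show that the sum is pulled towards the conditioning vector, as measured by cosine similarity. The helper names are hypothetical.

```python
import math


def add_vectors(a, b):
    """Elementwise sum of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError('vectors must have the same length')
    return [x + y for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for encoder.get_root_embedding() and encoder.vectors[cond_word].
root_embedding = [1.0, 0.0]
cond_vector = [0.0, 1.0]

conditioned = add_vectors(root_embedding, cond_vector)

print(cosine(root_embedding, cond_vector))  # similarity before conditioning
print(cosine(conditioned, cond_vector))     # similarity after conditioning
```

This is consistent with the prediction above picking up water-related words once the `water` vector is added, although the toy vectors say nothing about how strongly the real decoder responds.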


## Substitutional Analysis

Finally, it is also possible to examine the effect a given word or phrase has on entailment generation via substitutions. Essentially, this involves looking at the difference made to the most likely entailment when a given word or phrase in the input sentence is replaced with another word or phrase.
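
One way to organize such a comparison is to build the input sentences from a template and collect the prediction for each substituted word. The sketch below assumes a `predict(sentence)` callable that returns a decoded string; `substitute`, `compare_substitutions`, and `toy_predict` are hypothetical names, and `toy_predict` is a stand-in that just echoes the subject so the example runs without a trained model.

```python
def substitute(template, slot, fillers):
    """Return one sentence per filler, replacing `slot` in `template`."""
    return [template.replace(slot, filler) for filler in fillers]


def compare_substitutions(predict, template, slot, fillers):
    """Map each filler to the prediction for its substituted sentence."""
    sentences = substitute(template, slot, fillers)
    return {filler: predict(s) for filler, s in zip(fillers, sentences)}


def toy_predict(sentence):
    """Hypothetical predictor: reuses the second word as the subject."""
    subject = sentence.split()[1]
    return 'a {} is inside'.format(subject)


template = 'A <who> in a beige shirt is sleeping in a car.'
results = compare_substitutions(toy_predict, template, '<who>',
                                ['girl', 'man', 'woman'])

for filler, prediction in results.items():
    print(filler, '->', prediction)
```

With the trained model, `predict` could wrap a call to `model.encode(sentence)` followed by `model.decode(s2)` for a fixed target tree `s2`, which is what the `substitution` function in the next cell does one sentence at a time.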

In [19]:
s2 = 'the dog is on her phone'
s3 = 'the dog is outside'
s4 = 'the dog is selling the bone'
s5 = 'a dog wearing some clothes is indoors'
s6 = 'a dog are inside a car'
s7 = 'the boy is red'
s8 = 'three people are indoors'
s9 = 'a boy is not indoors'

def substitution(model, sentence1, sentence2):
    model.encode(sentence1)

    print('Sentence: ', sentence1)
    print('Predicted Entailment: ', model.decode(sentence2))
    print('')    

sentence = 'A girl in a beige shirt is sleeping in a car.'
substitution(model, s1, s2)

s1 = 'A man in a beige shirt is sleeping in a car.'
substitution(model, s1, s2)

s1 = 'A woman in a beige shirt is sleeping in a car.'
substitution(model, s1, s2)

s1 = 'A boy in a beige shirt is sleeping in a car.'
substitution(model, s1, s3) 

s1 = 'A woman in a beige shirt is sleeping in a car.'
substitution(model, s1, s3)

s1 = 'A man in a beige shirt is driving in a car.'
substitution(model, s1, s4)

s1 = 'A person in a beige shirt is selling her car.'
substitution(model, s1, s4)

s1 = 'A boy in a red shirt is waiting in a store.'
substitution(model, s1, s5)

s1 = 'Some men in red shirts are waiting in a store.'
substitution(model, s1, s6)

s1 = 'Many women in red shirts are waiting in a store.'
substitution(model, s1, s6)

s1 = 'A boy and a girl are waiting in a store.'
substitution(model, s1, s8)

s1 = 'A boy and a girl are waiting in a playground.'
substitution(model, s1, s8)

s1 = 'A boy in a red shirt is sleeping in a car.'
substitution(model, s1, s9)

s1 = 'A boy in a red shirt is waiting in a store.'
substitution(model, s1, s9)

Sentence:  A mother and daughter walk along the side of a bridge.
Predicted Entailment:  a people sitting over their something

Sentence:  A man in a beige shirt is sleeping in a car.
Predicted Entailment:  a man is on his room

Sentence:  A woman in a beige shirt is sleeping in a car.
Predicted Entailment:  a woman is in their room

Sentence:  A boy in a beige shirt is sleeping in a car.
Predicted Entailment:  a man is outside

Sentence:  A woman in a beige shirt is sleeping in a car.
Predicted Entailment:  a woman is outside

Sentence:  A man in a beige shirt is driving in a car.
Predicted Entailment:  a man is is a something

Sentence:  A person in a beige shirt is selling her car.
Predicted Entailment:  a woman is cleans a something

Sentence:  A boy in a red shirt is waiting in a store.
Predicted Entailment:  a artist wearing a shirt is outside

Sentence:  Some men in red shirts are waiting in a store.
Predicted Entailment:  a men are in a bus

Sentence:  Many women in red shirts 

In [16]:
s1 = 'A fisherman using a cellphone on a boat.'
s2 = 'A man is on the street'
substitution(model, s1, s2)

s1 = 'A Man is eating food next to a child on a bench.'
s2 = 'A man is on the street'
substitution(model, s1, s2)

s1 = 'A shirtless man skateboards on a ledge.'
s2 = 'A man is on the street'
substitution(model, s1, s2)

s1 = 'A man wearing a hat and boots is digging for something in the snow.'
s2 = 'A man is on the street'
substitution(model, s1, s2)

s1 = 'A man is on a boat.'
s2 = 'A man is outside'
substitution(model, s1, s2)

s1 = 'A man is on a bench.'
s2 = 'A man is outside'
substitution(model, s1, s2)

s1 = 'A man is on a skateboard.'
s2 = 'A man is outside'
substitution(model, s1, s2)

s1 = 'A man is in the snow.'
s2 = 'A man is outside'
substitution(model, s1, s2)


Sentence:  A girl in a beige shirt is sleeping in a car.
Predicted Entailment:  a man rehabbing on a boat

Sentence:  A girl in a beige shirt is sleeping in a car.
Predicted Entailment:  a man grass on a something

Sentence:  A girl in a beige shirt is sleeping in a car.
Predicted Entailment:  a man skateboarder on a roof

Sentence:  A girl in a beige shirt is sleeping in a car.
Predicted Entailment:  a man holds in a water

Sentence:  A girl in a beige shirt is sleeping in a car.
Predicted Entailment:  a man is outside

Sentence:  A girl in a beige shirt is sleeping in a car.
Predicted Entailment:  a man is outside

Sentence:  A girl in a beige shirt is sleeping in a car.
Predicted Entailment:  a man is outside

Sentence:  A girl in a beige shirt is sleeping in a car.
Predicted Entailment:  a man is outside



In [17]:
s1 = 'Some kids are wrestling on an inflatable raft.'
s2 = 'the boy is on the beach.'
substitution(model, s1, s2)

s2 = 'the kids are outside.'
substitution(model, s1, s2)

s2 = 'Some kids wrestle outside in the sun.'
substitution(model, s1, s2)

s2 = 'The kids are with an inflatable raft.'
substitution(model, s1, s2)

s2 = 'young kids wrestle with each other.'
substitution(model, s1, s2)

s2 = 'old children play all over the water.'
substitution(model, s1, s2)

s2 = 'the kids wrestle with an fierce determination.'
substitution(model, s1, s2)

s1 = 'Several kids are all on a raft.'
substitution(model, s1, s2)

s2 = 'They raft on three kids.'
substitution(model, s1, s2)

s2 = 'a rafts used in the match.'
substitution(model, s1, s2)

s2 = 'the kids are in the water.'
substitution(model, s1, s2)

Sentence:  A girl in a beige shirt is sleeping in a car.
Predicted Entailment:  the children playing in a playing .

Sentence:  A girl in a beige shirt is sleeping in a car.
Predicted Entailment:  the children playing outside .

Sentence:  A girl in a beige shirt is sleeping in a car.
Predicted Entailment:  the children playing outside in a playing .

Sentence:  A girl in a beige shirt is sleeping in a car.
Predicted Entailment:  the children playing in a outdoor playing .

Sentence:  A girl in a beige shirt is sleeping in a car.
Predicted Entailment:  young children playing in a playing .

Sentence:  A girl in a beige shirt is sleeping in a car.
Predicted Entailment:  young children playing skateboards in a playing .

Sentence:  A girl in a beige shirt is sleeping in a car.
Predicted Entailment:  the children playing in a outdoor playing .

Sentence:  A girl in a beige shirt is sleeping in a car.
Predicted Entailment:  the adults are on a outdoor other .

Sentence:  A girl in a beige 

In [20]:
def condition(model, s1, s2, condition, sen=None):
    if sen: 
        model.encoder.forward_pass(condition)
        cond = model.encoder.get_root_embedding()
    else:
        cond = model.encoder.vectors[condition]
    
    model.encode(s1)
    model.decoder.forward_pass(s2, model.encoder.get_root_embedding() + cond)

    predicted = [node.pword for node in model.decoder.tree]
    print('Sentence: ', s1)
    print('Conditioning Context: ', condition)
    print('Predicted Entailment: ', ' '.join(predicted))
    print('')
      
s1 = 'A shirtless man sleeps in his blue boat out on the open waters.'
s2 = 'The red man is in the big boat.'
cond_word = 'water'
condition(model, s1, s2, cond_word)
        
s1 = 'A shirtless man sleeps in his blue boat out on the open waters.'
s2 = 'The red man is in the big boat.'
cond_word = 'blue'
condition(model, s1, s2, cond_word)

s1 = 'A shirtless man sleeps in his blue boat out on the open waters.'
s2 = 'The red man is in the big boat.'
cond_word = 'fishing'
condition(model, s1, s2, cond_word)

        
s1 = 'A shirtless man sleeps in his blue boat out on the open waters.'
s2 = 'The red man is in the big boat.'
cond_word = 'sleep'
condition(model, s1, s2, cond_word)

s1 = 'A shirtless man sleeps in his blue boat out on the open waters.'
s2 = 'The red man is in the big boat.'
cond_word = 'boat'
condition(model, s1, s2, cond_word)


s1 = 'A mother and daughter walk along the side of a bridge.'
s2 = 'Two people are walking.'
cond_sen = 'How many people are walking?'
condition(model, s1, s2, cond_sen, sen=True)

s1 = 'A mother and daughter walk along the side of a bridge.'
s2 = 'The mother and daughter walk together.'
cond_sen = 'Are the mother and daughter walking?'
condition(model, s1, s2, cond_sen, sen=True)

s1 = 'A mother and daughter walk along the side of a bridge.'
s2 = 'The bridge is over a river.'
cond_sen = 'What is the bridge over?'
condition(model, s1, s2, cond_sen, sen=True)

Sentence:  A shirtless man sleeps in his blue boat out on the open waters.
Conditioning Context:  water
Predicted Entailment:  a black man waterfall in a low water .

Sentence:  A shirtless man sleeps in his blue boat out on the open waters.
Conditioning Context:  blue
Predicted Entailment:  a older man sits in a sunny boat .

Sentence:  A shirtless man sleeps in his blue boat out on the open waters.
Conditioning Context:  fishing
Predicted Entailment:  a older man fishing in a low boat .

Sentence:  A shirtless man sleeps in his blue boat out on the open waters.
Conditioning Context:  sleep
Predicted Entailment:  a black man sleeping in a blue boat .

Sentence:  A shirtless man sleeps in his blue boat out on the open waters.
Conditioning Context:  boat
Predicted Entailment:  a big man kayak in a sunny boat .

Sentence:  A mother and daughter walk along the side of a bridge.
Conditioning Context:  How many people are walking?
Predicted Entailment:  two adults are walking .

Sentence:  