## Inferential Role Semantics for Natural Language

This notebook illustrates how to use recursive neural networks to generate and manipulate inferential roles for natural language expressions. The basic idea is to use one neural network to encode a sentence into an embedding, and then use another neural network to decode the sentence's inferential consequences from this embedding.
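
As a rough illustration of the encode-then-decode idea, here is a toy sketch in plain Python. It is not the `pysem` implementation: the tree format, the weight matrices (`W_enc`, `W_out`), the tiny vocab, and the helper names are all assumptions made up for this example, and the weights are random rather than trained.

```python
import math
import random

random.seed(0)
dim = 4
toy_vocab = ['a', 'boy', 'sleeps', 'person', 'rests']  # hypothetical toy vocab

def rand_vec(n):
    return [random.uniform(-1, 1) for _ in range(n)]

word_vecs = {w: rand_vec(dim) for w in toy_vocab}  # untrained word embeddings
W_enc = [rand_vec(dim) for _ in range(dim)]        # toy encoder weights
W_out = [rand_vec(dim) for _ in toy_vocab]         # toy decoder output weights

def matvec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def encode(tree):
    """Recursively embed a (word, children) tree and return the root embedding."""
    word, children = tree
    total = list(word_vecs[word])
    for child in children:
        total = [t + c for t, c in zip(total, encode(child))]
    return [math.tanh(x) for x in matvec(W_enc, total)]

def decode_word(embedding):
    """Softmax distribution over the toy vocab for a single decoder node."""
    scores = matvec(W_out, embedding)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]

# "a boy sleeps" as a dependency-style tree rooted at the verb
tree = ('sleeps', [('boy', [('a', [])])])
root = encode(tree)
probs = decode_word(root)
print(len(root), toy_vocab[probs.index(max(probs))])
```

With random weights the predicted word is meaningless; the point is only the shape of the computation: a tree goes in, one fixed-size vector comes out, and a second network turns that vector into word predictions.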

First, we'll define some functions to do some basic preprocessing on the SNLI dataset.

In [1]:
import enchant 
import random
import pickle
import numpy as np

from collections import namedtuple
from pysem.corpora import SNLI
from pysem.networks import DependencyNetwork
from pysem.generatives import EmbeddingGenerator, EncoderDecoder

checker = enchant.Dict('en_US')
TrainingPair = namedtuple('TrainingPair', ['sentence1', 'sentence2', 'label'])

snli = SNLI('/Users/peterblouw/corpora/snli_1.0/') # modify this for your SNLI path
snli.load_xy_pairs()

def repair(sen):
    tokens = DependencyNetwork.parser(sen)
    if len(tokens) > 15:
        return None
    for token in tokens:
        if not checker.check(token.text):
            return None
    return sen

def clean_data(data):
    clean = []
    for item in data:
        s1 = repair(item.sentence1)
        s2 = repair(item.sentence2)
        # skip the pair if either sentence was rejected by repair()
        if s1 is None or s2 is None:
            continue
        clean.append(TrainingPair(s1, s2, item.label))
   
    return clean

def build_vocab(data):
    vocab = set()
    for item in data:
        # a set ignores duplicates, so no membership check is needed
        for sentence in (item.sentence1, item.sentence2):
            for t in DependencyNetwork.parser(sentence):
                vocab.add(t.text)

    return sorted(vocab)

In [2]:
clean_dev = clean_data(snli.dev_data)
clean_train = clean_data(snli.train_data)
clean_test = clean_data(snli.test_data)

Next, we'll build a vocab from the set of cleaned sentence pairs. The number of items in the vocab can vary slightly depending on which version of the spaCy dependency parser is being used.

In [3]:
data = clean_dev + clean_test + clean_train
vocab = build_vocab(data)

print(len(vocab))

22495


Now we can collect all of the sentence pairs standing in entailment relations to one another.

In [4]:
train_data = [d for d in clean_train if d.label == 'entailment']
test_data = [d for d in clean_test if d.label == 'entailment']
dev_data = [d for d in clean_dev if d.label == 'entailment']

print(len(train_data))
print(len(test_data))
print(len(dev_data))

106246
1666
1700


To train a model on the example entailment pairs from SNLI, we can do the following:

In [6]:
dim = 300

# dependency-specific vocabs to include relevant words for a particular POS.
with open('depdict.pickle', 'rb') as pfile:
    subvocabs = pickle.load(pfile) 

# build the encoder and decoder networks, then train on the entailment pairs
encoder = DependencyNetwork(dim=dim, vocab=vocab)
decoder = EmbeddingGenerator(dim=dim, subvocabs=subvocabs)

learned_model = EncoderDecoder(encoder=encoder, decoder=decoder, data=train_data)
learned_model.train(iters=1, rate=0.0006, batchsize=10)

This is slow, so we can also load model parameters that have been previously generated:

In [7]:
model = EncoderDecoder(encoder=None, decoder=None, data=train_data)
model.load('encoder.pickle','decoder.pickle')

In [8]:
sample = random.choice(test_data)

model.encode(sample.sentence1)
model.decode(sample.sentence2)

'boys are playing outside in a fountain .'

In [9]:
def compute_accuracy(data, model):
    total = 0 
    correct = 0

    for item in data:
        model.encoder.forward_pass(item.sentence1)
        model.decoder.forward_pass(item.sentence2, model.encoder.get_root_embedding())

        for node in model.decoder.tree:
            total += 1
            if node.pword.lower() == node.lower_:
                correct += 1

    return float(correct / total)

print(compute_accuracy(train_data, model))
print(compute_accuracy(dev_data, model))

0.7056433181328429
0.6013361169102296
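
The accuracy above is a word-level score: the fraction of decoder nodes whose predicted word matches the word actually found at that node. The standalone sketch below shows the same arithmetic on plain word lists. The function name `word_accuracy` and the pre-tokenized, node-aligned input format are assumptions made for this example, not part of `pysem`.

```python
def word_accuracy(pairs):
    """pairs: iterable of (predicted_words, actual_words), aligned word by word."""
    total = 0
    correct = 0
    for predicted, actual in pairs:
        for p, a in zip(predicted, actual):
            total += 1
            if p.lower() == a.lower():
                correct += 1
    # guard against an empty input rather than dividing by zero
    return correct / total if total else 0.0

toy_pairs = [
    (['a', 'woman', 'eating', '.'], ['A', 'lady', 'eats', '.']),
    (['two', 'people', 'are', 'running', '.'], ['Two', 'people', 'are', 'running', '.']),
]
print(word_accuracy(toy_pairs))  # 7 of the 9 aligned words match
```

Note that this kind of score counts near-synonyms such as "woman" and "lady" as errors, so it understates how often a generated sentence is a sensible entailment.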


## Simple Entailment Generation Examples

This small amount of data probably isn't enough to generalize outside of the training set, so we'll first check how well the learned decoder is able to generate the entailments it has been trained on.

In [14]:
batch = random.sample(train_data, 5)

for sample in batch:
    model.encode(sample.sentence1)

    print('Sentence: ', sample.sentence1)
    print('Actual Entailment: ', sample.sentence2)
    print('Predicted Entailment: ', model.decode(sample.sentence2))
    print('')

Sentence:  A woman with a red purse in an orange shirt sitting and eating.
Actual Entailment:  A lady eats.
Predicted Entailment:  a woman eating .

Sentence:  People are walking in the rain with umbrellas and raincoats around the fountain.
Actual Entailment:  Some individuals are walking in the rain
Predicted Entailment:  the people are walking in the rain

Sentence:  Two men on a balcony look up and point at something.
Actual Entailment:  Two men on a balcony point at something.
Predicted Entailment:  two are at a balcony something at something .

Sentence:  Two friends run in a competitive race.
Actual Entailment:  Two people are running.
Predicted Entailment:  two people are running .

Sentence:  The man in the black jacket is walking past the dilapidated doorways.
Actual Entailment:  A man is walking through a doorway.
Predicted Entailment:  a man is walking past the street .



In [15]:
dec_samp = random.choice(train_data)
dec_tree = dec_samp.sentence2

sen = "The young man in colorful shorts is barefoot."
model.encode(sen)
print('Sentence: ', sen)
print('Predicted Entailment: ', model.decode(dec_tree))
print('')

sen = "A young man sleeping next to a dog."
model.encode(sen)
print('Sentence: ', sen)
print('Predicted Entailment: ', model.decode(dec_tree))
print('')

sen = "The 3 dogs are cruising down the street."
model.encode(sen)
print('Sentence: ', sen)
print('Predicted Entailment: ', model.decode(dec_tree))
print('')

sen = "Woman reading a book with a grocery tote."
model.encode(sen)
print('Sentence: ', sen)
print('Predicted Entailment: ', model.decode(dec_tree))
print('')

sen = "A man laughing while at a restaurant."
model.encode(sen)
print('Sentence: ', sen)
print('Predicted Entailment: ', model.decode(dec_tree))
print('')

sen = "Two individuals use a photo kiosk."
model.encode(sen)
print('Sentence: ', sen)
print('Predicted Entailment: ', model.decode(dec_tree))
print('')

sen = "A man pulling items on a cart."
model.encode(sen)
print('Sentence: ', sen)
print('Predicted Entailment: ', model.decode(dec_tree))
print('')

sen = "Three people are riding a carriage pulled by four horses."
model.encode(sen)
print('Sentence: ', sen)
print('Predicted Entailment: ', model.decode(dec_tree))
print('')

Sentence:  The young man in colorful shorts is barefoot.
Predicted Entailment:  the young man wearing outside .

Sentence:  A young man sleeping next to a dog.
Predicted Entailment:  a young man sleeping next .

Sentence:  The 3 dogs are cruising down the street.
Predicted Entailment:  the small dogs are outside .

Sentence:  Woman reading a book with a grocery tote.
Predicted Entailment:  a old woman reading outside .

Sentence:  A man laughing while at a restaurant.
Predicted Entailment:  a happy man laughing indoors .

Sentence:  Two individuals use a photo kiosk.
Predicted Entailment:  the few people are together .

Sentence:  A man pulling items on a cart.
Predicted Entailment:  a full man pulling outside .

Sentence:  Three people are riding a carriage pulled by four horses.
Predicted Entailment:  a several horses riding together .



## Random Entailment Generation Examples

We can also generate entailments using randomly chosen trees for the decoding network structure. This doesn't always work very well.

In [16]:
batch = random.sample(train_data, 5)

for sample in batch:
    model.encode(sample.sentence1)

    print('Sentence: ', sample.sentence1)
    print('Actual Entailment: ', sample.sentence2)
    print('Predicted Entailment: ', model.decode(sample.sentence2))
    print('Random Tree Entailment: ', model.decode())
    print('')

Sentence:  A little boy in a green shirt is holding a large snake.
Actual Entailment:  A boy is holding a snake.
Predicted Entailment:  the boy is holding a snake .
Random Tree Entailment:  the boy holding boy .

Sentence:  A man in a gray and orange shirt stands in front of a streetlight.
Actual Entailment:  A streetlight exists on a road.
Predicted Entailment:  a man stands in a streetlight .
Random Tree Entailment:  man is stands .

Sentence:  A woman sits on a green bench while reading a book.
Actual Entailment:  A human sitting
Predicted Entailment:  a reading sitting
Random Tree Entailment:  a reading sitting a book .

Sentence:  An older man reads a newspaper in front of a store.
Actual Entailment:  A man is standing outside.
Predicted Entailment:  the man is is outside .
Random Tree Entailment:  man is up a newspaper in a store

Sentence:  A woman in a robe reading a book.
Actual Entailment:  A woman in a robe is reading.
Predicted Entailment:  a woman in a robe is reading .
Ra

## Generating Entailment Chains

We can also generate entailment chains by re-encoding a generated sentence and then generating a new sentence from the resulting encoding. This is kind of neat because it allows us to distill what the model has learned into a network of inferential relationships between sentences.

In [27]:
s1 = 'A black dog with a blue collar is jumping into the water.'
s2 = 'Two police officers are sitting on motorcycles in the road.'
s3 = 'Five people are playing in a gymnasium.'
s4 = 'A man curls up in a blanket on the street.'

sentences = [s1, s2, s3, s4]

for sentence in sentences:
    print('Sentence: ', sentence)
    model.encode(sentence)
    entailment = model.decode()
    print('Predicted Entailment: ', entailment)
    model.encode(entailment)
    print('Next Entailment: ', model.decode())
    print('')

Sentence:  A black dog with a blue collar is jumping into the water.
Predicted Entailment:  a dog jumping into the water .
Next Entailment:  a dog jumping wet .

Sentence:  Two police officers are sitting on motorcycles in the road.
Predicted Entailment:  officers are down on the crowded road .
Next Entailment:  the officers of people are on the crowded road .

Sentence:  Five people are playing in a gymnasium.
Predicted Entailment:  people are are game .
Next Entailment:  the people are are .

Sentence:  A man curls up in a blanket on the street.
Predicted Entailment:  a man is in the blanket in the blanket .
Next Entailment:  is laying outside .
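
The two-step loop used here generalizes to chains of any length. The sketch below wraps it in a helper; `entailment_chain` is a hypothetical name, and it assumes only that the model exposes `encode(sentence)` and a `decode()` that returns a string, as in the cells of this notebook. `StubModel` is a stand-in so the snippet runs without trained parameters; it is not a real model and its outputs are not entailments.

```python
def entailment_chain(model, sentence, steps=2):
    """Repeatedly encode the latest sentence and decode the next one from it."""
    chain = []
    current = sentence
    for _ in range(steps):
        model.encode(current)
        current = model.decode()
        chain.append(current)
    return chain

class StubModel:
    """Hypothetical stand-in: 'decodes' by dropping the first word of the input."""
    def encode(self, sentence):
        self._last = sentence

    def decode(self, tree=None):
        words = self._last.split()
        return ' '.join(words[1:]) if len(words) > 1 else self._last

chain = entailment_chain(StubModel(), 'a black dog jumps', steps=3)
print(chain)
```

With the trained model, passing `model` in place of `StubModel()` should reproduce the chains printed above when `steps=2`, up to any randomness in how `decode()` picks a tree.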



## Substitutional Analysis

It is also possible to examine the effect a given word or phrase has on entailment generation via substitutions. Essentially, this involves looking at the difference made to the most likely entailment when a given word or phrase in the input sentence is replaced with another word or phrase.

In [28]:
# we'll use these sentences to generate decoding trees (note that just the parse is used)
s2 = 'the dog is walking on her phone'
s3 = 'the dog is outside'
s4 = 'the dog is selling the bone'
s5 = 'a dog wearing some clothes is indoors'
s6 = 'a dog is inside a car'
s7 = 'the dog is furry'
s8 = 'two dogs are alone'
s9 = 'The dog is not outdoors'

def substitution(model, sentence1, sentence2):
    model.encode(sentence1)

    print('Sentence: ', sentence1)
    print('Predicted Entailment: ', model.decode(sentence2))
    print('')    

s1 = 'A boy in a beige shirt is sleeping in a car.'
substitution(model, s1, s2)
    
s1 = 'A girl in a beige shirt is sleeping in a car.'
substitution(model, s1, s2)

s1 = 'A man in a beige shirt is sleeping in a car.'
substitution(model, s1, s2)

s1 = 'A woman in a beige shirt is sleeping in a car.'
substitution(model, s1, s2)

s1 = 'A boy in a beige shirt is sleeping in a car.'
substitution(model, s1, s3) 

s1 = 'A woman in a beige shirt is sleeping in a car.'
substitution(model, s1, s3)

s1 = 'A man in a beige shirt is driving in a car.'
substitution(model, s1, s4)

s1 = 'A person in a beige shirt is selling her car.'
substitution(model, s1, s4)

s1 = 'A boy in a red shirt is waiting in a store.'
substitution(model, s1, s5)

s1 = 'Some men in red shirts are waiting in a store.'
substitution(model, s1, s6)

s1 = 'Many women in red shirts are waiting in a store.'
substitution(model, s1, s6)

s1 = 'A girl and a boy are waiting inside a store.'
substitution(model, s1, s8)

s1 = 'A girl and a boy are waiting inside a park.'
substitution(model, s1, s8)

s1 = 'A boy is in the car.'
substitution(model, s1, s9)

s1 = 'A boy is in the store.'
substitution(model, s1, s9)

Sentence:  A boy in a beige shirt is sleeping in a car.
Predicted Entailment:  a boy is sleeping in his car

Sentence:  A girl in a beige shirt is sleeping in a car.
Predicted Entailment:  a girl is sleeping in her car

Sentence:  A man in a beige shirt is sleeping in a car.
Predicted Entailment:  a man is sleeping in his car

Sentence:  A woman in a beige shirt is sleeping in a car.
Predicted Entailment:  a woman is sleeping in her car

Sentence:  A boy in a beige shirt is sleeping in a car.
Predicted Entailment:  a boy sleeping indoors

Sentence:  A woman in a beige shirt is sleeping in a car.
Predicted Entailment:  a woman sleeping inside

Sentence:  A man in a beige shirt is driving in a car.
Predicted Entailment:  a man is driving a car

Sentence:  A person in a beige shirt is selling her car.
Predicted Entailment:  a person is selling a car

Sentence:  A boy in a red shirt is waiting in a store.
Predicted Entailment:  a boy wearing a shirt is indoors

Sentence:  Some men in red s
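
The repeated calls above all follow one pattern: hold the sentence frame and the decoding tree fixed, swap a single word, and compare the predictions. A small helper can make that comparison explicit. In this sketch `compare_substitutions` is a hypothetical name, the `{}` template convention is an assumption of the example, and `StubModel` is a placeholder that only echoes the substituted word so the snippet runs on its own.

```python
def compare_substitutions(model, template, fillers, tree_sentence):
    """Map each filler word to the entailment predicted for template.format(filler)."""
    results = {}
    for filler in fillers:
        premise = template.format(filler)
        model.encode(premise)
        results[filler] = model.decode(tree_sentence)
    return results

class StubModel:
    """Hypothetical stand-in: returns the second word of the premise in upper case."""
    def encode(self, sentence):
        self._subject = sentence.split()[1]

    def decode(self, tree_sentence=None):
        return self._subject.upper()

results = compare_substitutions(
    StubModel(),
    'A {} in a beige shirt is sleeping in a car.',
    ['boy', 'girl', 'man', 'woman'],
    'the dog is walking on her phone',
)
for filler, predicted in results.items():
    print(filler, '->', predicted)
```

Returning a dict rather than printing inside the loop makes it easy to check which words in the prediction change when only the substituted word changes.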

## Mapping Multiple Sentences to a Common Description

Here we can draw inferences that connect a group of sentences to a single sentence that they all entail.

In [29]:
s1 = 'A fisherman using a cellphone on a boat.'
s2 = 'A man is on the street'
substitution(model, s1, s2)

s1 = 'A Man is eating food next to a child on a bench.'
s2 = 'A man is on the street'
substitution(model, s1, s2)

s1 = 'A shirtless man skateboards on a ledge.'
s2 = 'A man is on the street'
substitution(model, s1, s2)

s1 = 'A man wearing a hat and boots is digging for something in the snow.'
s2 = 'A man is on the street'
substitution(model, s1, s2)

s1 = 'A man is on a boat.'
s2 = 'A man is outside'
substitution(model, s1, s2)

s1 = 'A man is on a bench.'
s2 = 'A man is outside'
substitution(model, s1, s2)

s1 = 'A man is on a skateboard.'
s2 = 'A man is outside'
substitution(model, s1, s2)

s1 = 'A man is in the snow.'
s2 = 'A man is outside'
substitution(model, s1, s2)


Sentence:  A fisherman using a cellphone on a boat.
Predicted Entailment:  a person is on a boat

Sentence:  A Man is eating food next to a child on a bench.
Predicted Entailment:  a man is on a bench

Sentence:  A shirtless man skateboards on a ledge.
Predicted Entailment:  a man is on a skateboard

Sentence:  A man wearing a hat and boots is digging for something in the snow.
Predicted Entailment:  a man digging in the snow

Sentence:  A man is on a boat.
Predicted Entailment:  a man is outside

Sentence:  A man is on a bench.
Predicted Entailment:  a man is outside

Sentence:  A man is on a skateboard.
Predicted Entailment:  a man is outside

Sentence:  A man is in the snow.
Predicted Entailment:  a man is outside
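
One way to check which premises land on the same description is to group them by predicted entailment. The sketch below does this with a hypothetical helper, `group_by_entailment`, assuming the same `encode`/`decode` interface used in this notebook. `StubModel` and its keyword rule are invented purely so the example runs without the trained model.

```python
from collections import defaultdict

def group_by_entailment(model, premises, tree_sentence):
    """Group premises by the entailment the model predicts for each of them."""
    groups = defaultdict(list)
    for premise in premises:
        model.encode(premise)
        groups[model.decode(tree_sentence)].append(premise)
    return dict(groups)

class StubModel:
    """Hypothetical stand-in using a keyword rule instead of a trained decoder."""
    def encode(self, sentence):
        self._last = sentence.lower()

    def decode(self, tree_sentence=None):
        indoors = ('store', 'kitchen')
        if any(word in self._last for word in indoors):
            return 'a man is inside'
        return 'a man is outside'

premises = [
    'A man is on a boat.',
    'A man is on a bench.',
    'A man is in a store.',
]
groups = group_by_entailment(StubModel(), premises, 'A man is outside')
for entailment, members in groups.items():
    print(entailment, '<-', members)
```

A group with many members corresponds to a description that many different sentences are mapped onto, which is the pattern the outputs above show for "a man is outside".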



Here's another example of building out an inferential role using a single starting sentence:

In [30]:
s1 = 'Some kids are wrestling on an inflatable raft.'
s2 = 'the boy is on the beach.'
substitution(model, s1, s2)

s2 = 'the kids are outside.'
substitution(model, s1, s2)

s2 = 'Some kids wrestle outside in the sun.'
substitution(model, s1, s2)

s2 = 'The kids are with an inflatable raft.'
substitution(model, s1, s2)

s2 = 'The kids wrestle together.'
substitution(model, s1, s2)

s2 = 'young kids wrestle with each other.'
substitution(model, s1, s2)

s2 = 'old children play all over the water.'
substitution(model, s1, s2)

s2 = 'Some kids are with each other.'
substitution(model, s1, s2)

s2 = 'The kids play on a raft under the water.'
substitution(model, s1, s2)

s1 = 'Several kids are around on a raft.'
substitution(model, s1, s2)

s2 = 'They raft on three kids.'
substitution(model, s1, s2)

s2 = 'a rafts used in the match.'
substitution(model, s1, s2)

s2 = 'at least two kids are outside.'
substitution(model, s1, s2)

s1 = 'Some kids are around.'
substitution(model, s1, s2)

s2 = 'More than one kid is wet.'
substitution(model, s1, s2)

s2 = 'Those kids are not very pleased.'
substitution(model, s1, s2)

Sentence:  Some kids are wrestling on an inflatable raft.
Predicted Entailment:  some kids are on a raft .

Sentence:  Some kids are wrestling on an inflatable raft.
Predicted Entailment:  some kids are around .

Sentence:  Some kids are wrestling on an inflatable raft.
Predicted Entailment:  some kids are around on a raft .

Sentence:  Some kids are wrestling on an inflatable raft.
Predicted Entailment:  some kids are on a inflatable raft .

Sentence:  Some kids are wrestling on an inflatable raft.
Predicted Entailment:  some kids are around .

Sentence:  Some kids are wrestling on an inflatable raft.
Predicted Entailment:  several kids are on a raft .

Sentence:  Some kids are wrestling on an inflatable raft.
Predicted Entailment:  several kids are around on a raft .

Sentence:  Some kids are wrestling on an inflatable raft.
Predicted Entailment:  some kids are on a raft .

Sentence:  Some kids are wrestling on an inflatable raft.
Predicted Entailment:  some kids are on a raft of the

## Conditioned Inferences

It is also possible to constrain the decoding process to selectively navigate the inferential role associated with a particular linguistic expression.

In [31]:
def condition(model, s1, s2, context, sen=None):
    # embed the conditioning context: a whole sentence if sen is set, otherwise a single word
    if sen: 
        model.encoder.forward_pass(context)
        cond = model.encoder.get_root_embedding()
    else:
        cond = model.encoder.vectors[context]
    
    # decode from the sum of the sentence embedding and the conditioning embedding
    model.encode(s1)
    model.decoder.forward_pass(s2, model.encoder.get_root_embedding() + cond)

    predicted = [node.pword for node in model.decoder.tree]
    print('Sentence: ', s1)
    print('Conditioning Context: ', context)
    print('Predicted Entailment: ', ' '.join(predicted))
    print('')
      
s1 = 'A person wearing a red shirt is falling off a white surfboard.'
s2 = 'A person is falling into the water.'
cond_word = 'surf'
condition(model, s1, s2, cond_word)
       
cond_word = 'ocean'
condition(model, s1, s2, cond_word)

cond_word = 'swim'
condition(model, s1, s2, cond_word)

cond_word = 'fall'
condition(model, s1, s2, cond_word)

cond_word = 'white'
condition(model, s1, s2, cond_word)

s1 = "A man is steering his ship out at sea."
s2 = "A man sleeps in the ocean."
cond_word = 'water'
condition(model, s1, s2, cond_word)
       
cond_word = 'fish'
condition(model, s1, s2, cond_word)

s2 = "A man sleeps in the ocean."
cond_word = 'sails'
condition(model, s1, s2, cond_word)

cond_word = 'steering'
condition(model, s1, s2, cond_word)

cond_word = 'voyage'
condition(model, s1, s2, cond_word)

cond_word = 'sea'
condition(model, s1, s2, cond_word)

s1 = 'A mother and daughter walk along the side of a bridge.'
s2 = 'Two people are walking.'
cond_sen = 'How many people are walking?'
condition(model, s1, s2, cond_sen, sen=True)

s1 = 'A mother and daughter walk along the side of a bridge.'
s2 = 'The mother and daughter walk together.'
cond_sen = 'Are the mother and daughter walking?'
condition(model, s1, s2, cond_sen, sen=True)

s1 = 'A mother and daughter walk along the side of a bridge.'
s2 = 'They are above the water.'
cond_sen = 'What are the mother and daughter doing?'
condition(model, s1, s2, cond_sen, sen=True)

s1 = 'A mother and daughter walk along the side of a bridge.'
s2 = 'This bridge is quite tall.'
cond_sen = 'How tall is the bridge?'
condition(model, s1, s2, cond_sen, sen=True)

s1 = 'A mother and daughter walk along the side of a bridge.'
s2 = 'Two people are together'
cond_sen = 'Are two people with one another?'
condition(model, s1, s2, cond_sen, sen=True)

s1 = 'A mother and daughter walk along the side of a bridge.'
s2 = 'mother is taller.'
cond_sen = 'Who is taller?'
condition(model, s1, s2, cond_sen, sen=True)

Sentence:  A person wearing a red shirt is falling off a white surfboard.
Conditioning Context:  surf
Predicted Entailment:  a surfer is surfing on a surfboard .

Sentence:  A person wearing a red shirt is falling off a white surfboard.
Conditioning Context:  ocean
Predicted Entailment:  a person is is on the ocean .

Sentence:  A person wearing a red shirt is falling off a white surfboard.
Conditioning Context:  swim
Predicted Entailment:  a person is swim off a surfboard .

Sentence:  A person wearing a red shirt is falling off a white surfboard.
Conditioning Context:  fall
Predicted Entailment:  a person is falls off a air .

Sentence:  A person wearing a red shirt is falling off a white surfboard.
Conditioning Context:  white
Predicted Entailment:  a person is wearing off a surfboard .

Sentence:  A man is steering his ship out at sea.
Conditioning Context:  water
Predicted Entailment:  a man is in the water .

Sentence:  A man is steering his ship out at sea.
Conditioning Context:

In [35]:
s1 = 'A mother and daughter walk along the street.'
s2 = 'One person is walking.'
cond_sen = 'How many people are walking?'
condition(model, s1, s2, cond_sen, sen=True)

s1 = 'A mother and daughter walk along the street.'
s2 = 'The mother is out for a walk.'
cond_sen = 'What are the people doing?'
condition(model, s1, s2, cond_sen, sen=True)

s1 = 'A mother and daughter walk along the street.'
s2 = 'The people are above some water.'
cond_sen = 'Where are the people?'
condition(model, s1, s2, cond_sen, sen=True)

s1 = 'A mother and daughter walk along the street.'
s2 = 'This bridge is quite tall.'
cond_sen = 'How fast are the people walking?'
condition(model, s1, s2, cond_sen, sen=True)

s1 = 'A mother and daughter walk along the street.'
s2 = 'The woman is in the center of the street'
cond_sen = 'Where on the bridge are the people?'
condition(model, s1, s2, cond_sen, sen=True)

s1 = 'A mother and daughter walk along the street.'
s2 = 'The bridge is above the water.'
cond_sen = 'What is the street over?'
condition(model, s1, s2, cond_sen, sen=True)

Sentence:  A mother and daughter walk along the street.
Conditioning Context:  How many people are walking?
Predicted Entailment:  two people are walking .

Sentence:  A mother and daughter walk along the street.
Conditioning Context:  What are the people doing?
Predicted Entailment:  a people are outside with the street .

Sentence:  A mother and daughter walk along the street.
Conditioning Context:  Where are the people?
Predicted Entailment:  the people are on the street .

Sentence:  A mother and daughter walk along the street.
Conditioning Context:  How fast are the people walking?
Predicted Entailment:  a people walking very present .

Sentence:  A mother and daughter walk along the street.
Conditioning Context:  Where on the bridge are the people?
Predicted Entailment:  the people are on the other of the water

Sentence:  A mother and daughter walk along the street.
Conditioning Context:  What is the street over?
Predicted Entailment:  the people are down the street .
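
The conditioning in `condition` works by adding the embedding of the context to the sentence's root embedding before decoding. The toy sketch below shows why a simple vector sum can steer the output. The two-dimensional vectors, the two-word vocab, and the dot-product "decoder" are all assumptions invented for illustration and have nothing to do with the trained parameters.

```python
def add(u, v):
    return [a + b for a, b in zip(u, v)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# hypothetical word embeddings on two made-up axes
toy_vecs = {'water': [1.0, 0.0], 'street': [0.0, 1.0]}

def nearest(vec):
    """Toy decoder: pick the word whose embedding has the largest dot product."""
    return max(toy_vecs, key=lambda w: dot(toy_vecs[w], vec))

root = [0.4, 0.6]                 # pretend sentence embedding, slightly closer to 'street'
cond = toy_vecs['water']          # conditioning context
print(nearest(root))              # street
print(nearest(add(root, cond)))   # water
```

The sum shifts the embedding toward the region associated with the context, so words related to the context become more likely at each decoder node. In the real model the shift passes through the decoder's learned weights, so the effect is less direct than in this sketch.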



In [43]:
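# use the last ten test pairs; random.sample(test_data, 10) would draw a random batch instead
batch = test_data[-10:]
for sample in batch:
    model.encode(sample.sentence1)
    # common_parses is not defined in the cells shown here; it is assumed to hold items whose first element is a decoding tree
    tree = random.sample(common_parses, 1)[0][0]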

    print('Sentence: ', sample.sentence1)
    print('Actual Entailment: ', sample.sentence2)
    print('Predicted Entailment: ', model.decode(tree))
    print('')

Sentence:  A couple sits in the grass.
Actual Entailment:  People are outside.
Predicted Entailment:  a couple are in the grass .

Sentence:  A woman is petting a dog outside.
Actual Entailment:  A person and an animal are interacting out of doors.
Predicted Entailment:  two woman is is .

Sentence:  Two men in karate gear using fighting sticks.
Actual Entailment:  two karate men sparring with sticks
Predicted Entailment:  the men are fighting

Sentence:  A white duck is spreading its wings while sitting on the water.
Actual Entailment:  There is an animal in the water.
Predicted Entailment:  a duck is duck .

Sentence:  A group of girls jumping over another girl who is laying on the floor.
Actual Entailment:  A group of girl is playing.
Predicted Entailment:  the girls are are of the girls .

Sentence:  A woman in a teal apron prepares a meal at a restaurant.
Actual Entailment:  A woman in restaurant
Predicted Entailment:  a woman prepares at the restaurant

Sentence:  Two men in oran