# Moving from words to sentences

### What is the most basic thing we want to be able to do with more than just word-level information?

We propose that natural language inference (NLI) is a good domain for testing whether a model uses relational information between words. Humans are good at it and give predictable answers, and reaching the right answer requires concrete relational information between words.

### Datasets that require increasing amounts of non-local information

Several tasks now target sentence representations that go beyond bag-of-words (BOW). Sentiment analysis and paraphrase datasets go slightly beyond it, but most models get much of their performance on these from word-level information. While sentence representations do outperform BOW on them, it is unclear exactly where they improve.

Natural language inference is a useful domain in which we can propose challenges that require increasingly complex compositionality and are therefore more diagnostic for what is being learnt and what isn't.

We present a set of natural language inference datasets on which humans perform predictably well and which cannot be solved from word-level information alone.
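To make the word-level claim concrete for the order-swapping pairs, here is a minimal sketch showing that a premise and its swapped hypothesis have the same bag of words, so any model that only sees token counts cannot tell them apart. The helper name `bag_of_words` and the whitespace tokenization are assumptions for illustration, not part of the original notebook.

```python
from collections import Counter

def bag_of_words(sentence):
    """Return token counts for a whitespace-tokenized, lowercased sentence."""
    return Counter(sentence.lower().split())

# One pair in the style of the symmetric/non-symmetric verb dataset
premise = "the boy is behind the old woman ."
hypothesis = "the old woman is behind the boy ."

# The sentences differ, but their token counts are identical
same_bag = bag_of_words(premise) == bag_of_words(hypothesis)
print(same_bag)
```

This only covers the datasets built by swapping arguments or clauses. The adjective and negation datasets change individual words, so their bags differ slightly.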

### Choosing a vocabulary
We chose the SNLI dataset vocabulary, so that we could benchmark on the InferSent model that was trained end-to-end on natural language inference with this dataset. This assumes GloVe word embeddings.

NOTE : I haven't actually checked if these examples are within the vocab, but that's easy to do.
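The check mentioned in the note could look roughly like the sketch below. It assumes a GloVe-style text file in which each line starts with the word followed by its vector. The function names `load_vocab` and `out_of_vocab`, and the toy vocabulary, are hypothetical.

```python
def load_vocab(path):
    """Collect the first space-separated field of each line (the word in GloVe text format)."""
    with open(path, encoding="utf-8") as f:
        return {line.split(" ", 1)[0] for line in f if line.strip()}

def out_of_vocab(sentences, vocab):
    """Return the sorted list of whitespace tokens in `sentences` that are missing from `vocab`."""
    tokens = {tok for s in sentences for tok in s.split()}
    return sorted(tokens - set(vocab))

# Hypothetical stand-in for the real vocabulary, just to show the call
toy_vocab = {"the", "boy", "girl", "meets", "."}
missing = out_of_vocab(
    ["the boy meets the girl . ", "the boy overtakes the girl . "],
    toy_vocab,
)
print(missing)
```

In practice one would pass `sents_A + sents_B` from each cell below and the set returned by `load_vocab` for whichever embedding file is in use.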


In [25]:
import numpy as np
id2label = {0:'CONTRADICTION', 1:'NEUTRAL', 2:'ENTAILMENT'}
label2id = {'CONTRADICTION': 0, 'NEUTRAL':1, 'ENTAILMENT':2}


### Kinds of examples:

#### Requiring word-level information about symmetry

A. **Symmetric vs non-symmetric verbs (over subject-object):**

In [26]:
# Insensitive to tense for now (?)

v_ents = ['meets', 'resembles', 'is near to', 'is far from', 'converses with']#, 'is beside', 'collides with']
v_cons = ['overtakes', 'gives the hat to', 'takes the bag from', 'is behind', 'is in front of']
# also 'causes'
v_neus = ['watches', 'ignores', 'hits', 'hugs', 'shoves']#, 'admires', 'talks to']

# Perhaps I should select from the most commonly used verbs...? And get rid of preposition phrase based ones....? 
# Overlaps with comparatives slightly otherwise

nps = ["the woman in the black shirt", "the boy in the red shorts", 
       "the fat man", "the old man holding an umbrella", "the tall girl", "the old woman", 
       "the man wearing a hat", "the girl carrying a basket", "the girl", "the boy"]

sents_A = []
sents_B = []
labels = []

vs = {"ENTAILMENT": v_ents,
     "CONTRADICTION": v_cons,
     "NEUTRAL": v_neus}

for np1 in nps:
    for np2 in nps:
        for key in vs:
            for v in vs[key]:
                if (np1 != np2):
                    sents_A.append(np1 + " " + v + " " + np2 + ' . ')
                    sents_B.append(np2 + " " + v + " " + np1 + ' . ')
                    labels.append(key)
                    
#                     # self-rep
#                     sents_A.append(np1 + " " + v + " " + np2 + ' . ')
#                     sents_B.append(np1 + " " + v + " " + np2 + ' . ')
#                     labels.append('ENTAILMENT')
                    



# Write one sentence per line; the context manager closes each file
with open("testData/true/s1.verb", 'w') as f:
    f.write("\n".join(sents_A))
with open("testData/true/s2.verb", 'w') as f:
    f.write("\n".join(sents_B))
np.savetxt("testData/true/labels.verb", [label2id[x] for x in labels], fmt='%i')

print("Total: ", len(labels), "\n")

N = 5

temp = np.random.randint(0, len(labels), N)
for i in temp:
    print(sents_A[i])
    print(sents_B[i])
    print(labels[i])
    print("\n")
    
#FUTURE
# Give people on Mturk a noun phrase + verb and ask them to fill it out with a phrase
# such that the converse makes sense (for asym_v's)
# We can bootstrap noun phrases that people give us for more sentences

Total:  1350 

the boy in the red shorts is behind the old woman . 
the old woman is behind the boy in the red shorts . 
CONTRADICTION


the woman in the black shirt hugs the boy . 
the boy hugs the woman in the black shirt . 
NEUTRAL


the girl hugs the boy in the red shorts . 
the boy in the red shorts hugs the girl . 
NEUTRAL


the tall girl takes the bag from the fat man . 
the fat man takes the bag from the tall girl . 
CONTRADICTION


the old woman overtakes the boy in the red shorts . 
the boy in the red shorts overtakes the old woman . 
CONTRADICTION
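
The three files written by the cell above (`s1.verb`, `s2.verb`, `labels.verb`) can be read back and scored against a model's predictions. The following is a minimal sketch, assuming one sentence per line and one integer label per line as written above. `load_split`, `accuracy`, and the toy predictions are hypothetical names and data, and the demo writes to a temporary directory rather than `testData/true`.

```python
import os
import tempfile

def load_split(directory, suffix):
    """Load (premise, hypothesis, label_id) triples for one generated dataset."""
    def read_lines(name):
        path = os.path.join(directory, name + "." + suffix)
        with open(path, encoding="utf-8") as f:
            # Drop empty lines such as the trailing newline left by np.savetxt
            return [line for line in f.read().split("\n") if line.strip()]
    s1 = read_lines("s1")
    s2 = read_lines("s2")
    labels = [int(x) for x in read_lines("labels")]
    assert len(s1) == len(s2) == len(labels), "files are misaligned"
    return list(zip(s1, s2, labels))

def accuracy(predicted, gold):
    """Fraction of positions where the predicted label id equals the gold label id."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

# Toy files in the same layout as the generated ones
with tempfile.TemporaryDirectory() as d:
    toy = [("s1", "a meets b . \nb hits a . "),
           ("s2", "b meets a . \na hits b . "),
           ("labels", "2\n1\n")]
    for name, text in toy:
        with open(os.path.join(d, name + ".verb"), "w", encoding="utf-8") as f:
            f.write(text)
    examples = load_split(d, "verb")

gold = [label for _, _, label in examples]
predictions = [2, 2]  # hypothetical model output: always ENTAILMENT
print(accuracy(predictions, gold))
```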




B. **Temporal ordering**

In [27]:
tws = ["after", "before", "while", "as"]
infs = ["CONTRADICTION", "CONTRADICTION", "ENTAILMENT", "ENTAILMENT"]
# vps = ['sat down', 'walked in', 'stood up']
# nps = ["the woman in the black shirt", "the boy in the red shorts", 
#        "the smiling man"]
vps = ['sat down', 'walked in', 'stood up', 'shouted loudly', 'frowned angrily']
nps = ["the woman in the black shirt",
       "the fat man", "the old man holding an umbrella", 
       "the thin woman","the girl", "the boy"]

sents_A = []
sents_B = []
labels = []

for vp1 in vps:
    for vp2 in vps:
        for np1 in nps:
            for np2 in nps:
                for w, inf in zip(tws, infs):
                    if np1 != np2 and vp1 != vp2:
                        sents_A.append(np1 + " " + vp1 + " " + w + " " + np2 + ' ' + vp2 + ' . ')
                        sents_B.append(np2 + " " + vp2 + " " + w + " " + np1 + ' ' + vp1 + ' . ')
                        labels.append(inf)
                        
#                         # equalize numbers of each type
#                         u = np.random.uniform()
#                         if (u > 0.5):
#                             sents_A.append(np1 + " " + vp1 + " " + w + " " + np2 + ' ' + vp2 + ' . ')
#                             sents_B.append(np2 + " " + vp1 + " " + w + " " + np1 + ' ' + vp2 + ' . ')
#                             labels.append("NEUTRAL")
                
#                 # self-rep
#                 sents_A.append(np1 + " " + w + " " + np2 + ' . ')
#                 sents_B.append(np1 + " " + w + " " + np2 + ' . ')
#                 labels.append('ENTAILMENT')
                    



# Write one sentence per line; the context manager closes each file
with open("testData/true/s1.temp", 'w') as f:
    f.write("\n".join(sents_A))
with open("testData/true/s2.temp", 'w') as f:
    f.write("\n".join(sents_B))
np.savetxt("testData/true/labels.temp", [label2id[x] for x in labels], fmt='%i')

print("Total: ", len(labels), "\n")

N = 5
temp = np.random.randint(0, len(labels), N)
for i in temp:
    print(sents_A[i])
    print(sents_B[i])
    print(labels[i])
    print("\n")

# FUTURE
# Give people on Mturk a noun phrase + "after" and ask them to fill it out, 
# permute order and include with "before, as and while"

Total:  2400 

the woman in the black shirt shouted loudly before the fat man stood up . 
the fat man stood up before the woman in the black shirt shouted loudly . 
CONTRADICTION


the woman in the black shirt walked in while the girl sat down . 
the girl sat down while the woman in the black shirt walked in . 
ENTAILMENT


the old man holding an umbrella walked in as the fat man shouted loudly . 
the fat man shouted loudly as the old man holding an umbrella walked in . 
ENTAILMENT


the old man holding an umbrella shouted loudly while the thin woman stood up . 
the thin woman stood up while the old man holding an umbrella shouted loudly . 
ENTAILMENT


the boy stood up after the woman in the black shirt walked in . 
the woman in the black shirt walked in after the boy stood up . 
CONTRADICTION




#### Requiring bi-gram compositionality

A. Modifiers (adjectives)

In [28]:
vps = ['meets', 'resembles', 'watches', 'ignores', 'hits', 'hugs', 'shoves', 'admires', 'talks to', 'collides with']
# vps = ['meets', 'resembles', 'watches', 'ignores', 'hits']

adjs_temp = {'pos': ['tall', 'cheerful', 'big', 'fat', 'clean', 'happy'],
          'neg': ['short', 'grumpy', 'small', 'thin', 'dirty', 'sad']}
# adjs = {'pos': ['tall', 'big', 'fat'],
#           'neg': ['short', 'small', 'thin']}

adjs = {}
adjs['pos'] = adjs_temp['pos'] + adjs_temp['neg']
adjs['neg'] = adjs_temp['neg'] + adjs_temp['pos']

nps = ["woman in the black shirt",
       "man wearing a suit", "old man holding an umbrella",
       "girl", "boy"]


sents_A = []
sents_B = []
labels = []

for vp in vps:
    for np1 in nps:
        for np2 in nps:
            if (np1 != np2):
                for p, n in zip(adjs['pos'], adjs['neg']):
                    
                    sents_A.append('The ' +  np1 + ' who is ' + p + ', ' + vp + ' the ' + np2 + ' who is ' + n + ' . ')
                    sents_B.append('The ' +  np1 + ', ' + vp + ' the ' + np2 + ' who is ' + n + ' . ')
                    labels.append('ENTAILMENT')
                    
                    sents_A.append('The ' +  np1 + ' who is ' + p + ', ' + vp + ' the ' + np2 + ' who is ' + n + ' . ')
                    sents_B.append('The ' + np1 + ' who is ' + n + ', '+ vp + ' the ' + np2 + ' . ')
                    labels.append('CONTRADICTION')
                
                    
        
        
# Write one sentence per line; the context manager closes each file
with open("testData/true/s1.adjr", 'w') as f:
    f.write("\n".join(sents_A))
with open("testData/true/s2.adjr", 'w') as f:
    f.write("\n".join(sents_B))
np.savetxt("testData/true/labels.adjr", [label2id[x] for x in labels], fmt='%i')

print("Total: ", len(labels), "\n")

N = 5
temp = np.random.randint(0, len(labels), N)
for i in temp:
    print(sents_A[i])
    print(sents_B[i])
    print(labels[i])
    print("\n")


# FUTURE
# Find all sentences in s2 from multiNLI that have two non consecutive adjectives in them and swap them, 
# check with Mturk for label - because they won't be opposites like here

Total:  4800 

The man wearing a suit who is big, shoves the boy who is small . 
The man wearing a suit, shoves the boy who is small . 
ENTAILMENT


The man wearing a suit who is dirty, hits the woman in the black shirt who is clean . 
The man wearing a suit, hits the woman in the black shirt who is clean . 
ENTAILMENT


The boy who is short, ignores the woman in the black shirt who is tall . 
The boy, ignores the woman in the black shirt who is tall . 
ENTAILMENT


The man wearing a suit who is cheerful, admires the old man holding an umbrella who is grumpy . 
The man wearing a suit who is grumpy, admires the old man holding an umbrella . 
CONTRADICTION


The boy who is sad, hugs the woman in the black shirt who is happy . 
The boy who is happy, hugs the woman in the black shirt . 
CONTRADICTION




B. Modifiers that negate - if and only if

In [29]:
connec = ['when', 'if']
phe = {'pos': ['it rains', 'there is a lot of snow', 'the wind does blow very hard',
              'there are many clouds', 'the sun is not shining', 'the air is damp'],
      'neg' : ['it does not rain', 'there is not a lot of snow', 'the wind does not blow very hard',
              'there are not many clouds', 'the sun is shining', 'the air is not damp']}

con = {'pos': ['the trees do look beautiful', 'it is very cold', 'everyone does feel sad',
              'the roads are dangerous', 'it is better to stay home', 'the dogs do not go outside'],
      'neg' : ['the trees do not look beautiful', 'it is not very cold', 'everyone does not feel sad',
              'the roads are not dangerous', 'it is not better to stay home', 'the dogs do go outside']}

sents_A = []
sents_B = []
labels = []


for conn in connec:
    for p_i in np.arange(len(phe['pos'])):
        for c_i in np.arange(len(con['pos'])):
        
            pcon = con['pos'][c_i]
            ncon = con['neg'][c_i]
        
            pphe = phe['pos'][p_i]
            nphe = phe['neg'][p_i]
        
            sents_A.append(pcon + " " + conn + " " + pphe + ' . ')
            sents_B.append(ncon + " " + conn + " " + pphe + ' . ')
            labels.append('CONTRADICTION')
        
            sents_A.append(pcon + " " + conn + " " + pphe + ' . ')
            sents_B.append(pcon + " " + conn + " " + nphe + ' . ')
            labels.append('NEUTRAL')
        
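            # self-rep/rephrase: use the current connective so the 'if' pass
            # does not duplicate the 'when' pairs
            sents_A.append(pcon + " " + conn + " " + pphe + ' . ')
            sents_B.append(conn.capitalize() + ' ' + pphe + ', ' + pcon + ' . ')
            labels.append('ENTAILMENT')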
        
    #         # two nots
    #         sents_A.append(pcon + " when " + pphe + ' . ')
    #         sents_B.append(ncon + " when " + nphe + ' . ')
    #         labels.append('NEUTRAL')
        
    #         # Rephrase
    #         sents_A.append(pcon + " when " + pphe + ' . ')
    #         sents_B.append("When " + pphe + ' , ' + ncon + ' . ')
    #         labels.append('CONTRADICTION')
        
    #         sents_A.append(pcon + " when " + pphe + ' . ')
    #         sents_B.append("When " + nphe + ' , ' + pcon + ' . ')
    #         labels.append('NEUTRAL')
        
    

# Write one sentence per line; the context manager closes each file
with open("testData/true/s1.ncon", 'w') as f:
    f.write("\n".join(sents_A))
with open("testData/true/s2.ncon", 'w') as f:
    f.write("\n".join(sents_B))
np.savetxt("testData/true/labels.ncon", [label2id[x] for x in labels], fmt='%i')

print("Total: ", len(labels), "\n")

N = 5
temp = np.random.randint(0, len(labels), N)
for i in temp:
    print(sents_A[i])
    print(sents_B[i])
    print(labels[i])
    print("\n")



Total:  216 

it is very cold if the sun is not shining . 
it is not very cold if the sun is not shining . 
CONTRADICTION


the trees do look beautiful when it rains . 
the trees do not look beautiful when it rains . 
CONTRADICTION


the trees do look beautiful if the wind does blow very hard . 
the trees do not look beautiful if the wind does blow very hard . 
CONTRADICTION


it is very cold when there is a lot of snow . 
it is very cold when there is not a lot of snow . 
NEUTRAL


it is very cold when the sun is not shining . 
it is very cold when the sun is shining . 
NEUTRAL




C. With but/however/whereas discourse markers

Could also add although?

In [30]:
# Generate discourse marked examples
discs = ['however', 'but', 'whereas']
nps = ["the woman in the black shirt",
       "the fat man", "the old man holding an umbrella", "the girl", "the boy", 
       "the man wearing a hat", "the girl carrying a basket"]

vps = ['sit down', 'walk in', 'stand up', 'shout loudly', 'frown angrily']

sents_A = []
sents_B = []
labels = []

for disc in discs:
    for np1 in nps:
        for np2 in nps:
            if (np1 != np2):
                for vp in vps:
                    sents_A.append(np1 + " does " + vp + " , " + disc + " " + np2 + ' does not ' + vp + ' . ')
                    sents_B.append(np1 + " does " + vp + ' . ')
                    labels.append('ENTAILMENT')
                    
                    sents_A.append(np1 + " does " + vp + " , " + disc + " " + np2 + ' does not ' + vp + ' . ')
                    sents_B.append(np1 + " does not " + vp + ' . ')
                    labels.append('CONTRADICTION')
                    
                    sents_A.append(np1 + " does " + vp + " , " + disc + " " + np2 + ' does not ' + vp + ' . ')
                    sents_B.append(np2 + " does " + vp + ' . ')
                    labels.append('CONTRADICTION')
                    
                    sents_A.append(np1 + " does " + vp + " , " + disc + " " + np2 + ' does not ' + vp + ' . ')
                    sents_B.append(np2 + " does not " + vp + ' . ')
                    labels.append('ENTAILMENT')
                    
                    sents_A.append(np1 + " does not " + vp + " , " + disc + " " + np2 + ' does ' + vp + ' . ')
                    sents_B.append(np1 + " does " + vp + ' . ')
                    labels.append('CONTRADICTION')
                    
                    sents_A.append(np1 + " does not " + vp + " , " + disc + " " + np2 + ' does ' + vp + ' . ')
                    sents_B.append(np1 + " does not " + vp + ' . ')
                    labels.append('ENTAILMENT')
                    
                    sents_A.append(np1 + " does not " + vp + " , " + disc + " " + np2 + ' does ' + vp + ' . ')
                    sents_B.append(np2 + " does " + vp + ' . ')
                    labels.append('ENTAILMENT')
                    
                    sents_A.append(np1 + " does not " + vp + " , " + disc + " " + np2 + ' does ' + vp + ' . ')
                    sents_B.append(np2 + " does not " + vp + ' . ')
                    labels.append('CONTRADICTION')
                    
                                    



# Write one sentence per line; the context manager closes each file
with open("testData/true/s1.subjv", 'w') as f:
    f.write("\n".join(sents_A))
with open("testData/true/s2.subjv", 'w') as f:
    f.write("\n".join(sents_B))
np.savetxt("testData/true/labels.subjv", [label2id[x] for x in labels], fmt='%i')

print("Total: ", len(labels), "\n")

N = 5
temp = np.random.randint(0, len(labels), N)
for i in temp:
    print(sents_A[i])
    print(sents_B[i])
    print(labels[i])
    print("\n")


Total:  5040 

the girl carrying a basket does sit down , however the man wearing a hat does not sit down . 
the girl carrying a basket does sit down . 
ENTAILMENT


the boy does not frown angrily , but the woman in the black shirt does frown angrily . 
the boy does frown angrily . 
CONTRADICTION


the girl carrying a basket does not frown angrily , but the woman in the black shirt does frown angrily . 
the woman in the black shirt does not frown angrily . 
CONTRADICTION


the woman in the black shirt does sit down , however the man wearing a hat does not sit down . 
the man wearing a hat does not sit down . 
ENTAILMENT


the fat man does not frown angrily , however the man wearing a hat does frown angrily . 
the man wearing a hat does frown angrily . 
ENTAILMENT




#### Requiring bigram compositionality as well as symmetry understanding

A. Comparatives

In [31]:
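# should I do taller and not taller? Or taller and shorter?
# NOTE: 'not better dressed' is a negation, not an antonym. "A is not better
# dressed than B" does not strictly entail "B is better dressed than A" (they
# could be equally well dressed), so the labels below are only approximate
# for that pair.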

comps_p = {'pos': ['taller', 'more cheerful', 'more tired', 'better dressed'],
          'neg': ['shorter', 'less cheerful', 'less tired', 'not better dressed']}

comps_o = {'pos': ['bigger','heavier', 'more expensive'],
           'neg' : ['smaller', 'lighter', 'cheaper']}

comps_t = {'pos': ['longer'], 
           'neg': ['shorter']}

nps_p = ["the woman in the black shirt", "the boy", 
       "the fat man", "the man holding an umbrella", "the girl", "the old woman"]
nps_o = ['the brown table', 'the metal chair', 'the green cabinet', 'the wooden dresser', 
         'the book case', 'the old hat stand']
nps_t = ['the art film', 'the classical music concert', 'the theatre performance']

sents_A = []
sents_B = []
labels = []

nps = {'obj' : nps_o,
      'pers': nps_p,
      'time': nps_t}

comps = {'obj' : comps_o,
      'pers': comps_p,
      'time': comps_t}

for key in nps:
    for np1 in nps[key]:
        for np2 in nps[key]:
            for p, n in zip(comps[key]['pos'], comps[key]['neg']):
                if (np1 != np2):
                    
#                     # words not exactly the same - one word difference
                    
                    sents_A.append(np1 + " is " + p + " than " + np2 + ' . ')
                    sents_B.append(np2 + " is " + n + " than " + np1 + ' . ')
                    labels.append('ENTAILMENT')
                    
                    sents_A.append(np1 + " is " + n + " than " + np2 + ' . ')
                    sents_B.append(np2 + " is " + p + " than " + np1 + ' . ')
                    labels.append('ENTAILMENT')

                    sents_A.append(np1 + " is " + p + " than " + np2 + ' . ')
                    sents_B.append(np1 + " is " + n + " than " + np2 + ' . ')
                    labels.append('CONTRADICTION')
                    
                    sents_A.append(np1 + " is " + n + " than " + np2 + ' . ')
                    sents_B.append(np1 + " is " + p + " than " + np2 + ' . ')
                    labels.append('CONTRADICTION')
                    
                    sents_A.append(np1 + " is " + p + " than " + np2 + ' . ')
                    sents_B.append(np2 + " is " + p + " than " + np1 + ' . ')
                    labels.append('CONTRADICTION')
                    
                    sents_A.append(np1 + " is " + n + " than " + np2 + ' . ')
                    sents_B.append(np2 + " is " + n + " than " + np1 + ' . ')
                    labels.append('CONTRADICTION')
                    
#                     # Self - rep
                    sents_A.append(np1 + " is " + p + " than " + np2 + ' . ')
                    sents_B.append(np1 + " is " + p + " than " + np2 + ' . ')
                    labels.append('ENTAILMENT')
                    
                    sents_A.append(np1 + " is " + n + " than " + np2 + ' . ')
                    sents_B.append(np1 + " is " + n + " than " + np2 + ' . ')
                    labels.append('ENTAILMENT')




# Write one sentence per line; the context manager closes each file
with open("testData/true/s1.comp", 'w') as f:
    f.write("\n".join(sents_A))
with open("testData/true/s2.comp", 'w') as f:
    f.write("\n".join(sents_B))
np.savetxt("testData/true/labels.comp", [label2id[x] for x in labels], fmt='%i')

print("Total: ", len(labels), "\n")

N = 5
temp = np.random.randint(0, len(labels), N)
for i in temp:
    print(sents_A[i])
    print(sents_B[i])
    print(labels[i])
    print("\n")

#FUTURE
# Might be easier to just give Mturkers the _np_ is _"more"_ _adj_ _np_ framework and ask to fill?


Total:  1728 

the old woman is shorter than the boy . 
the boy is taller than the old woman . 
ENTAILMENT


the old hat stand is more expensive than the metal chair . 
the metal chair is more expensive than the old hat stand . 
CONTRADICTION


the girl is better dressed than the man holding an umbrella . 
the girl is not better dressed than the man holding an umbrella . 
CONTRADICTION


the green cabinet is bigger than the old hat stand . 
the old hat stand is bigger than the green cabinet . 
CONTRADICTION


the girl is more tired than the woman in the black shirt . 
the woman in the black shirt is less tired than the girl . 
ENTAILMENT
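
Each generator emits a fixed mix of labels. The comparatives loop, for example, appends four ENTAILMENT and four CONTRADICTION pairs per combination and no NEUTRAL ones. Before reading model accuracies, it may help to know what always predicting the most frequent label would score. The sketch below computes that; `majority_baseline` is a hypothetical helper, and the toy list only mirrors the 4:4 mix rather than loading the real labels.

```python
from collections import Counter

def majority_baseline(labels):
    """Return (most common label, accuracy of always predicting that label)."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)

# Toy labels mirroring the 4:4 ENTAILMENT/CONTRADICTION mix of the comparatives loop
toy_labels = ["ENTAILMENT"] * 4 + ["CONTRADICTION"] * 4
label, acc = majority_baseline(toy_labels)
print(label, acc)
```

Passing the `labels` list from any cell above gives the corresponding baseline for that dataset.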




#### Requiring knowledge of part of speech changes (?)

A. Verb-noun distinction

        You can model the train ; 
        you can train the model
        (NEUTRAL)

B. Phrase structure/finding objects (?)

        Two men are sitting on the hay making rice ; 
        Two men are sitting on the rice making hay
        (NEUTRAL)

C. Symmetric connectives:

These are all entailment though, so they are less interesting: even BOW should get them right if it predicts entailment whenever the two bags of words are equal.
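For completeness, a generator for this case could follow the same pattern as the cells above. This is only a sketch: the notebook does not name the connectives, so `and` and `or` are assumptions, as are the function name `symmetric_connective_pairs` and the example clauses.

```python
def symmetric_connective_pairs(clauses, connectives=("and", "or")):
    """Build (premise, hypothesis, label) triples by swapping clauses around a connective."""
    pairs = []
    for c1 in clauses:
        for c2 in clauses:
            if c1 != c2:
                for conn in connectives:
                    premise = c1 + " " + conn + " " + c2 + " . "
                    hypothesis = c2 + " " + conn + " " + c1 + " . "
                    # Swapping the clauses is assumed not to change the meaning
                    pairs.append((premise, hypothesis, "ENTAILMENT"))
    return pairs

# Hypothetical clauses in the style of the earlier datasets
clauses = ["the boy sits down", "the girl stands up"]
pairs = symmetric_connective_pairs(clauses)
for premise, hypothesis, label in pairs[:2]:
    print(premise)
    print(hypothesis)
    print(label)
```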