# Step 0: setting up the Colab environment

Upload the [Logic-and-Language-Final-Project](https://github.com/madeofstardust/Logic-and-Language-Final-Project) folder to your Google Drive, then give Google Colab access to your Google Drive.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


The variable "path" should point to the location of your Logic-and-Language-Final-Project folder.

In [2]:
import os
path="/content/drive/MyDrive/Logic-and-Language-Final-Project-main"
os.chdir(path)

# Step 1: preparing the data structure.

## Loading the data
The examples we use in this project come from this repository: https://github.com/kovvalsky/LangPro
They have been saved to a txt file after a slight modification (replacing "\" with "\_"), which makes them easier to load into Python.

The "examples" list contains the individual examples from the Prolog file "".

In [3]:
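examples = []
with open("SICK_trial_ccg.txt", "r") as f:
    curr_example = ""
    for line in f:
        if line == '\n':
            # a blank line closes the current example
            examples.append(curr_example)
            curr_example = ""
        else:
            curr_example += line
    # keep the last example if the file does not end with a blank line
    if curr_example:
        examples.append(curr_example)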
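
The grouping step can be tried without the data file. The sketch below applies the same idea to an in-memory string; the helper name `split_examples` and the sample text are made up for illustration and are not real derivations. It skips empty blocks and keeps a final block even when no blank line follows it.

```python
def split_examples(text: str) -> list:
    """Group consecutive non-blank lines into blocks separated by blank lines."""
    blocks, current = [], ""
    for line in text.splitlines(keepends=True):
        if line == "\n":
            if current:            # ignore runs of blank lines
                blocks.append(current)
            current = ""
        else:
            current += line
    if current:                    # final block without a trailing blank line
        blocks.append(current)
    return blocks

# Placeholder text standing in for two derivations.
sample = "first a\nfirst b\n\nsecond a\n"
print(split_examples(sample))
```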

## Building the tree structure

In [4]:
from TreeNode import TreeNode
from build_tree_utils import get_line_dict, build_the_tree

The function "reload_tree" creates trees from the list of examples. It is also useful for debugging, as it creates a bare tree without any parameters specified (only the parent and child relationships are created).
The number of examples that are loaded (15) is fixed inside the function.

In [5]:
# input: ccg derivation in string format
# output: tree
examples_roots = {}
def reload_tree():
    examples_roots = {}
    for i in range(0, 15):
        line_dict = get_line_dict(examples[i]) # ccg derivation -> lines
        roots, leaf = build_the_tree(line_dict) # lines -> tree
        examples_roots[i] = (examples[i], roots, leaf)
  
    return examples_roots

examples_roots = reload_tree()

At this point, the representation of each example is saved in the examples_roots dictionary:
* the keys are derivation ids;
* the first value is the Prolog derivation in string format;
* the second one is the list of roots of the derivation; and
* the third one is the leaf (which should, in the end, represent the final meaning of the derivation).

# Step 2: Getting First Order Logic Meaning Representation

We defined the lexical semantics in a dictionary, which uses POS tags, types and some exceptions as keys. See the file lexical_semantic_rules.py for details.
The function "get_compositional_semantics" traverses the tree and sets the compositional semantics of each node.

In [6]:
from lexical_semantic_rules import lexical_semantic_rules 
from get_semantics import get_compositional_semantics, get_lexical_semantics 

from nltk.sem.logic import *
read_expr = Expression.fromstring

The function "derivation_to_FOL" takes the id of a derivation as an argument and returns the First Order Logic formula representing the meaning of this derivation.

In [7]:
# input: id of derivation
# output: the FOL representation of the derivation
def derivation_to_FOL(id, and_index = -1):
  line_dict = get_line_dict(examples[id - 1]) # ccg derivation -> lines
  roots, leaf = build_the_tree(line_dict) # lines -> tree
  examples_roots = (examples[id - 1], roots, leaf)

  and_node = -1
  # get lexical semantics for every word
  for i in range(0,len(examples_roots[1])):
    if (examples_roots[1][i].get_word() == 'and'):
      and_node = i
    get_lexical_semantics(examples_roots[1][i])
  
  get_compositional_semantics(examples_roots[2])

  # return FOL
  lr = examples_roots[2].get_lexical_semantics()
  fol = read_expr(f"({lr})(True)").simplify()
  return str(fol)

# Step 3: Get relations from WordNet and model them as FOL formulas


In [8]:
import nltk
from nltk.corpus import wordnet as wn
from nltk.corpus.reader.wordnet import Synset
from typing import List
nltk.download('wordnet')
nltk.download('omw-1.4')
import re

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


Define the is_a relationship based on hypernym relationships from WordNet:

In [9]:
def is_a(w1: str, w2: str):
    '''
    return True if w1 is more specific than w2, i.e. some synset of w2
    is reachable from a synset of w1 by following hypernym links
    return False otherwise
    '''
    w1list = wn.synsets(w1)
    w2list = wn.synsets(w2)
    for syn in w1list:
        # breadth-first search over the hypernyms of this sense of w1
        queue = [syn]
        seen = {syn}
        while len(queue) > 0:
            start = queue.pop(0)
            for e in start.hypernyms():
                if e in seen:
                    continue
                if e in w2list:
                    return True
                seen.add(e)
                queue.append(e)
    return False
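
The search in is_a can be illustrated without WordNet. The sketch below runs the same breadth-first walk on a small hand-written graph; `TOY_HYPERNYMS` and `is_a_toy` are hypothetical names, and the graph is invented data rather than anything taken from WordNet.

```python
from collections import deque

# Toy hypernym graph (invented data): each key points to its
# direct, more general terms.
TOY_HYPERNYMS = {
    "hound": ["dog"],
    "dog": ["canine", "domestic_animal"],
    "canine": ["carnivore"],
    "domestic_animal": ["animal"],
    "carnivore": ["animal"],
    "animal": [],
}

def is_a_toy(specific: str, general: str, graph=TOY_HYPERNYMS) -> bool:
    """Breadth-first search upwards from `specific`, looking for `general`."""
    queue = deque([specific])
    seen = {specific}
    while queue:
        node = queue.popleft()
        for parent in graph.get(node, []):
            if parent == general:
                return True
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False

print(is_a_toy("hound", "animal"))  # True
print(is_a_toy("animal", "hound"))  # False
```

The relation is directional: the more specific word reaches the more general one, but not the other way round.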

In [10]:
def are_synonyms(w1: str, w2: str):
    ''' 
    return true if w1 and w2 are synonyms
    return false otherwise
    '''
    synonyms = [] 
    for syn in wn.synsets(w1): 
      for lm in syn.lemmas(): 
        synonyms.append(lm.name()) 
    if w2 in synonyms:
      return True
    else:
      return False

In [11]:
def are_antonyms(w1: str, w2: str):
    ''' 
    return true if w1 and w2 are antonyms
    return false otherwise
    '''
    for syn in wn.synsets(w1):
      for lm in syn.lemmas():
          if lm.antonyms():
              if w2 == lm.antonyms()[0].name():
                return True
    return False

In [12]:
knowledge_base = {}
with open("knowledge.txt", "r") as f:
  lines = [line for line in f]
  
for line in lines:
  res = re.findall(r'is_a\((.*?)\)', line)
  if len(res) == 1:
    res = res[0].split(', ')
  if len(res) == 2:
    knowledge_base[res[0]] = res[1]
    knowledge_base[res[1]] = res[0]

print(knowledge_base)

{'lark': 'bird', 'bird': 'lark', 'hound': 'dog', 'dog': 'animal', 'animal': 'dog', 'cat': 'animal', 'student': 'person', 'person': 'cyclist', 'bachelor': 'man', 'man': 'cowboy', 'woman': 'lady', 'human': 'person', 'european': 'person', 'fly': 'hover', 'hover': 'move', 'move': 'run', 'walk': 'move', 'run': 'move', 'obtain': 'receive', 'receive': 'obtain', 'build': 'finish', 'finish': 'build', 'kiss': 'touch', 'touch': 'kiss', 'snore': 'sleep', 'sleep': 'snore', "'girl'": "'young woman'", "'young woman'": "'girl'", "'boy'": "'young man'", "'young man'": "'boy'", "'polish'": "'clean'", "'clean'": "'polish'", "'trek'": "'hike'", "'hike'": "'trek'", "'dash'": "'jump'", "'jump'": "'dive'", "'bounce'": "'jump'", "'dive'": "'jump'", "'run'": "'sprint'", "'sprint'": "'run'", "'look'": "'stare'", "'stare'": "'look'", "'bikini'": "'swimming suite'", "'swimming suite'": "'bikini'", 'climb': 'climb_up', 'climb_up': 'climb', 'play': 'strum', 'practice': 'play', 'note': 'paper', 'paper': 'sheet', 'fi
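
In the dictionary above every key holds a single value and each pair is stored in both directions, so a later fact about the same word replaces an earlier one. As a possible alternative, the sketch below keeps the direction of each fact and collects all general terms per word. The helper name `parse_is_a_facts` and the sample lines are assumptions; the lines only imitate the `is_a(x, y)` shape that the regular expression in the cell above expects.

```python
import re
from collections import defaultdict

IS_A_PATTERN = re.compile(r"is_a\((.*?)\)")

def parse_is_a_facts(lines):
    """Map each specific term to the set of more general terms listed for it."""
    parents = defaultdict(set)
    for line in lines:
        for match in IS_A_PATTERN.findall(line):
            parts = [p.strip() for p in match.split(",")]
            if len(parts) == 2:
                specific, general = parts
                parents[specific].add(general)
    return dict(parents)

# Made-up sample lines, not the contents of knowledge.txt.
sample_lines = ["is_a(hound, dog).", "is_a(dog, animal).", "is_a(cat, animal)."]
facts = parse_is_a_facts(sample_lines)
print(facts["dog"])       # {'animal'}
print("animal" in facts)  # False: the direction of each fact is preserved
```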

We only search for word relations between the lemmas in the FOL formulas. We use the function below to extract the lemmas from a FOL formula.

In [13]:
# extract words from FOL
def fol_to_word(s):
  # remove numbers
  s = re.sub(r'[0-9]+', '', s)
  # remove the quantifier keywords 'exists' and 'all' (whole words only,
  # so that predicates such as 'ball' or 'small' are left intact)
  s = re.sub(r'\b(exists|all)\b', '', s)
  # "&" to "("
  s = s.replace("&", "(")
  # split "("
  l = s.split("(")
  ll = []
  for i in range(0,len(l)):
    if '.' in l[i]:
      ll.append(l[i])
    elif ')' in l[i]:
      ll.append(l[i])
    elif 'argu' in l[i]:
      ll.append(l[i])
    elif 'True' in l[i]:
      ll.append(l[i])
  s = list(set(l) - set(ll))
  return s
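
To make the goal of this extraction concrete, here is a small regex-based sketch of the same idea. It is not the function above: `predicate_names` is a hypothetical helper that collects the identifiers directly followed by "(" and drops the argu* role predicates and the True placeholder. The formula is the premise of problem 116 from the trial set.

```python
import re

def predicate_names(fol: str) -> set:
    """Collect the identifiers that are directly followed by '(' in a formula string."""
    names = set(re.findall(r"([A-Za-z_]+)\d*\(", fol))
    # drop the role predicates (argu1, argu2, ...) and the True placeholder
    return {n for n in names if not n.startswith("argu") and n != "True"}

formula = ("exists x.(player(x) & exists z22.(ball(z22) & "
           "exists e.(throw(e) & argu1(e,x) & argu2(e,z22) & True(e))))")
print(sorted(predicate_names(formula)))  # ['ball', 'player', 'throw']
```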

In [14]:
def get_knowledge(hypothesis: str, premises: List[str]):
  add_premises = []
  add_fol = []
  for premise in premises:
    p_word = premise.split(' ')
    h_word = hypothesis.split(' ')
    same_word = set(p_word).intersection(set(h_word))
    p_word = set(p_word) - same_word
    h_word = set(h_word) - same_word
    # print(p_word,h_word)
    for w1 in p_word:
      for w2 in h_word:
        # check the knowledge base and WordNet for an is_a relation between w1 and w2
        if is_a(w1, w2) or (w1 in knowledge_base.keys() and knowledge_base[w1] == w2):
          add_premises.append(f'{w1} is a {w2}')
          add_fol.append(f'all x.({w1}(x) -> {w2}(x))')
        # check WordNet for a synonym relation between w1 and w2
        if are_synonyms(w1, w2):
          add_premises.append(f'{w1} is equal to {w2}')
          add_fol.append(f'all x.({w1}(x) <-> {w2}(x))')
        # check WordNet for an antonym relation between w1 and w2
        if are_antonyms(w1, w2):
          add_premises.append(f'{w1} is not a {w2}')
          add_fol.append(f'all x.({w1}(x) -> -{w2}(x))')
          add_fol.append(f'all x.({w2}(x) -> -{w1}(x))')
  return add_premises, add_fol
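
The mapping from a lexical relation to FOL axioms used in get_knowledge can be looked at in isolation. In the sketch below, the helper name `relation_to_fol` and the relation labels "is_a", "synonym" and "antonym" are hypothetical; the formula templates are the ones from the cell above.

```python
def relation_to_fol(relation: str, w1: str, w2: str) -> list:
    """Return the FOL axiom strings for one lexical relation between two predicates."""
    if relation == "is_a":
        return [f"all x.({w1}(x) -> {w2}(x))"]
    if relation == "synonym":
        return [f"all x.({w1}(x) <-> {w2}(x))"]
    if relation == "antonym":
        # antonyms exclude each other in both directions
        return [f"all x.({w1}(x) -> -{w2}(x))",
                f"all x.({w2}(x) -> -{w1}(x))"]
    raise ValueError(f"unknown relation: {relation}")

print(relation_to_fol("is_a", "tree", "plant"))
# ['all x.(tree(x) -> plant(x))']
```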

# Step 4: Use a theorem prover to detect the inference label

We decided to use Prover9, which needs to be downloaded separately.

In [15]:
os.chdir("/content")

In [16]:
%%bash
prover9_file_name="p9m4-v05.tar.gz"
[[ ${prover9_file_name} =~ (.+)\.tar\.gz ]]
prover9_folder_name=${BASH_REMATCH[1]}
if [[ ! -d ${prover9_folder_name} ]]; then
  curl -L "https://www.cs.unm.edu/~mccune/prover9/gui/$prover9_file_name" -o ${prover9_file_name}
  tar -xvzf ${prover9_file_name}
  mv ${prover9_folder_name} 'prover9'
  rm ${prover9_file_name}
fi

In [17]:
def prover9_prove(conclusion: str, premises: List[str] = [], path=r"/content/prover9/bin") -> bool:
    """ 
    Given a conclusion and a list of premises, asks Prover9 whether
    the premises entail the conclusion.
    Returns a boolean value.
    """
    str2exp = nltk.sem.Expression.fromstring
    c = str2exp(conclusion)
    ps = [ str2exp(p) for p in premises ] 
    prover9 = nltk.Prover9()
    if path: prover9.config_prover9(path) 
    return prover9.prove(c, ps)

## A short test:

In [18]:
premise = derivation_to_FOL(15)
hypothesis = derivation_to_FOL(16)

In [19]:
def get_inference_label(hypothesis: str, premise: List[str]):
  try:
    # contradiction: the premises prove the negation of the hypothesis
    contradiction = prover9_prove('-(' + hypothesis + ')', premise)
    # entailment: the premises prove the hypothesis itself
    entailment = prover9_prove(hypothesis, premise)
    # print(f'Contradiction is {contradiction} and entailment is {entailment}')
    # heuristic: formulas that differ only by a single '-' are treated as contradictory
    if abs(len(premise[0]) - len(hypothesis)) == 1 and '-' in list(set(premise[0]) ^ set(hypothesis)):
      return 'no'
    if contradiction and not entailment:
      return 'no'
    if entailment and not contradiction:
      return 'yes'
    else: 
      return 'unknown'
  except Exception:
    print("Could not find a proof: return 'unknown'")
    return 'unknown'
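
Apart from the single-'-' shortcut, the label depends only on the outcome of the two proof attempts. The sketch below spells out that decision table with plain booleans, so it runs without Prover9; `label_from_proofs` is a hypothetical helper, not part of the project.

```python
def label_from_proofs(proves_hypothesis: bool, proves_negation: bool) -> str:
    """Map the outcome of the two proof attempts to an inference label."""
    if proves_hypothesis and not proves_negation:
        return "yes"       # entailment
    if proves_negation and not proves_hypothesis:
        return "no"        # contradiction
    return "unknown"       # neither proved, or both (inconsistent premises)

for outcome in [(True, False), (False, True), (False, False), (True, True)]:
    print(outcome, "->", label_from_proofs(*outcome))
```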

In [20]:
get_inference_label(hypothesis,[premise,"all x.(tree(x) -> plant(x))"])

'yes'

# Step 5: evaluation

In [21]:
import pandas as pd

## Trial set

You can run our model on other datasets (for example, the SICK train set) by modifying the values of ccg_path and sen_path: ccg_path is the path of the CCG derivation file, and sen_path is the path of the problem file.

In [22]:
ccg_path = path + "/SICK_trial_ccg.txt"
sen_path = path + "/SICK_trial_sen.txt"

In [23]:
'''
run on data set. catch the error when running.
input: sick dataset file name + ccg derivation file name
output: data(dataframe) + error
data(dataframe): problem_id(int), p_sen(str), h_sen(str), p_dev(int), h_dev(int), label(str), p_fol(str), h_fol(str), add_p(list of str), add_fol(list of fol), result(str)
'''
data = pd.DataFrame(columns=['problem_id','p_sen','h_sen', 'p_dev', 'h_dev', 'label', 'add_p', 'add_fol', 'p_fol', 'h_fol', 'result'])
data.set_index('problem_id')
error_cnt = {'LogicalExpressionException':0, "UnboundLocalError":0, "OtherError":0}

# ccg_derivation txt file ---> examples
# example[x] ---> the xth derivation in LOLA_SICK_corrected_derivations.txt
examples = []
with open(ccg_path, "r") as f:
    lines = [line for line in f]
    curr_example = ""
    for line in lines:
        if line == '\n':
            examples.append(curr_example)
            curr_example = ""
        else:
            curr_example += line

In [24]:
with open(sen_path, "r") as file:
  line = [l for l in file]
  for i in range(0,len(line),3):
    # print(line[i])
    problem_id = int(line[i].split('=')[1].strip())
    p_sen = line[i+1].split('(')[1].split(',')[4][2:-4]
    h_sen = line[i+2].split('(')[1].split(',')[4][2:-4]
    p_dev = int(line[i+1].split('(')[1].split(',')[0])
    h_dev = int(line[i+2].split('(')[1].split(',')[0])
    label = line[i+1].split('(')[1].split(',')[3][2:-1]

    # add_p, add_fol = get_knowledge(h_sen, [p_sen])
    try: 
      p_fol = derivation_to_FOL(p_dev)
      h_fol = derivation_to_FOL(h_dev)
    except LogicalExpressionException as e:
      print(f"{problem_id}: can't get a FOL formula based on the given lexical semantics")
      error_cnt["LogicalExpressionException"] += 1
      result = "unknown"
      data.loc[problem_id] = [problem_id,p_sen,h_sen,p_dev,h_dev,label,'', '', '', '',result]
    except UnboundLocalError as e:
      print(f"{problem_id}: missing lexical semantics")
      error_cnt["UnboundLocalError"] += 1
      result = "unknown"
      data.loc[problem_id] = [problem_id,p_sen,h_sen,p_dev,h_dev,label,'', '', '', '',result]
    except:
      print(f"{problem_id}: other problems")
      error_cnt["OtherError"] += 1
      result = "unknown"
      data.loc[problem_id] = [problem_id,p_sen,h_sen,p_dev,h_dev,label,'', '', '', '',result]
    else:
      add_p, add_fol = get_knowledge(' '.join(fol_to_word(h_fol)), [' '.join(fol_to_word(p_fol))])
      if 'patient' in p_fol and 'by' in p_fol:
        add_fol.append('all x all y (patient(x,y) <-> argu2(x,y))')
        add_fol.append('all x all y (by(x,y) <-> argu1(x,y))')
      elif 'patient' in h_fol and 'by' in h_fol:
        add_fol.append('all x all y (patient(x,y) <-> argu2(x,y))')
        add_fol.append('all x all y (by(x,y) <-> argu1(x,y))')
      result = get_inference_label(h_fol,[p_fol] + add_fol)
      data.loc[problem_id] = [problem_id,p_sen,h_sen,p_dev,h_dev,label,add_p,add_fol,p_fol,h_fol,result]

4: can't get a FOL formula based on the given lexical semantics
105: can't get a FOL formula based on the given lexical semantics
197: can't get a FOL formula based on the given lexical semantics
218: can't get a FOL formula based on the given lexical semantics
219: can't get a FOL formula based on the given lexical semantics
236: can't get a FOL formula based on the given lexical semantics
253: can't get a FOL formula based on the given lexical semantics
285: can't get a FOL formula based on the given lexical semantics
317: can't get a FOL formula based on the given lexical semantics
384: can't get a FOL formula based on the given lexical semantics
394: can't get a FOL formula based on the given lexical semantics
417: can't get a FOL formula based on the given lexical semantics
450: can't get a FOL formula based on the given lexical semantics
520: can't get a FOL formula based on the given lexical semantics
526: can't get a FOL formula based on the given lexical semantics
592: can't g

In [25]:
from sklearn.metrics import confusion_matrix
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score

# result analysis
# accuracy(total)
print("correct problems: ",len(data[data['label'] == data['result']]))
print("total problems: ",len(data))
print("total accuracy: ",len(data[data['label'] == data['result']])/len(data))
print("\n")
# accuracy(translated problems)
translate_data = data[data['p_fol'] != '']
print("correct problems: ",len(translate_data[translate_data['label'] == translate_data['result']]))
print("translated problems: ",len(translate_data))
print("accuracy(translated problems): ",len(translate_data[translate_data['label'] == translate_data['result']])/len(translate_data))
print("\n")
# label distribution
print("the distribution of gold labels:")
print(data['label'].value_counts(normalize = True))
print("\nthe distribution of labels we get:")
print(data['result'].value_counts(normalize = True))
# confusion matrix
print("\nconfusion matrix")
print("contradiction neutral entailment")
print(confusion_matrix(data['label'], data['result']))
print("\n")
# precision
print("precise: ")
print(precision_score(data['label'], data['result'], average=None))
print("\n")
# recall
print("recall: ")
print(recall_score(data['label'], data['result'], average=None)) 

correct problems:  341
total problems:  500
total accuracy:  0.682


correct problems:  214
translated problems:  272
accuracy(translated problems):  0.7867647058823529


the distribution of gold labels:
unknown    0.564
yes        0.288
no         0.148
Name: label, dtype: float64

the distribution of labels we get:
unknown    0.864
yes        0.108
no         0.028
Name: result, dtype: float64

confusion matrix
contradiction neutral entailment
[[ 10  64   0]
 [  3 278   1]
 [  1  90  53]]


precise: 
[0.71428571 0.64351852 0.98148148]


recall: 
[0.13513514 0.9858156  0.36805556]
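
The scores above can be re-derived by hand from the confusion matrix. The sketch below assumes the scikit-learn convention (rows are gold labels, columns are predicted labels, classes in sorted order no, unknown, yes) and uses only the numbers printed above.

```python
# Confusion matrix copied from the trial-set output
# (rows: gold label, columns: predicted label; order: no, unknown, yes).
matrix = [
    [10, 64, 0],
    [3, 278, 1],
    [1, 90, 53],
]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(3))
accuracy = correct / total

# precision: diagonal divided by the column sum
precision = [matrix[i][i] / sum(row[i] for row in matrix) for i in range(3)]
# recall: diagonal divided by the row sum
recall = [matrix[i][i] / sum(matrix[i]) for i in range(3)]

print(accuracy)                          # 0.682
print([round(p, 4) for p in precision])  # [0.7143, 0.6435, 0.9815]
print([round(r, 4) for r in recall])     # [0.1351, 0.9858, 0.3681]
```

The low recall for "no" and "yes" together with the high recall for "unknown" reflects how often the system falls back to "unknown".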


In [26]:
# problems we labeled wrong
wrong_data = data[data['label'] != data['result']]

print("wrong problems: \n")
for i in range(0,len(wrong_data)):
    print(wrong_data.iloc[i]['problem_id'], "label: ", wrong_data.iloc[i]['label'], "   result: ", wrong_data.iloc[i]['result'])
    print("premise: ", wrong_data.iloc[i]['p_sen'])
    print("add_premise: ", wrong_data.iloc[i]['add_fol'])
    print("hypothesis: ", wrong_data.iloc[i]['h_sen'])
    print("\n")

wrong problems: 

4 label:  no    result:  unknown
premise:  The young boys are playing outdoors and the man is smiling nearby
add_premise:  
hypothesis:  There is no boy playing outdoors and there is no man smiling


218 label:  yes    result:  unknown
premise:  A girl in white is dancing
add_premise:  
hypothesis:  A girl is wearing white clothes and is dancing


219 label:  no    result:  unknown
premise:  There is no girl in white dancing
add_premise:  
hypothesis:  A girl in white is dancing


253 label:  yes    result:  unknown
premise:  A hiker is on top of the mountain and is doing a joyful dance
add_premise:  
hypothesis:  A hiker is on top of the mountain and is dancing


384 label:  yes    result:  unknown
premise:  A white and tan dog is running through the tall and green grass
add_premise:  
hypothesis:  A white and tan dog is running through a field


592 label:  no    result:  unknown
premise:  A woman is taking off a cl
add_premise:  
hypothesis:  A woman is putting on 

In [27]:
# examples of problems we translated successfully

print("translated problems: \n")
for i in range(0,int(len(translate_data)/10)):
  print(translate_data.iloc[i]['problem_id'], "gold label: ", translate_data.iloc[i]['label'], "   result: ", translate_data.iloc[i]['result'])
  print("premise: ", translate_data.iloc[i]['p_sen'])
  print("premise fol: ", translate_data.iloc[i]['p_fol'])
  print("add_premise: ", translate_data.iloc[i]['add_fol'])
  print("hypothesis: ", translate_data.iloc[i]['h_sen'])
  print("hypothesis fol: ", translate_data.iloc[i]['h_fol'])
  print("\n")

translated problems: 

24 gold label:  unknown    result:  unknown
premise:  A person in a black jacket is doing tricks on a motorbike
premise fol:  exists x.(person(x) & exists z13.(exists z12.(black(z12) & argu1(z13,z12) & jacket(z13)) & in(x,z13) & exists z14.(trick(z14) & exists e.(do(e) & argu1(e,x) & argu2(e,z14) & exists z14.(motorbike(z14) & on(e,z14) & True(e))))))
add_premise:  ['all x.(motorbike(x) -> ride(x))']
hypothesis:  A skilled person is riding a bicycle on one wheel
hypothesis fol:  exists x.(exists z15.(skilled(z15) & argu1(x,z15) & person(x)) & exists z17.(bicycle(z17) & exists e.(ride(e) & argu1(e,x) & argu2(e,z17) & exists z17.(exists z16.(one(z16) & argu1(z17,z16) & wheel(z17)) & on(e,z17) & True(e)))))


116 gold label:  unknown    result:  unknown
premise:  A player is throwing the ball
premise fol:  exists x.(player(x) & exists z22.(ball(z22) & exists e.(throw(e) & argu1(e,x) & argu2(e,z22) & True(e))))
add_premise:  ['all x.(player(x) -> match(x))']
hypothes

## Test set

In [28]:
ccg_path = path + "/SICK_test_ccg.txt"
sen_path = path + "/SICK_test_sen.txt"

In [29]:
'''
run on data set. catch the error when running.
input: sick dataset file name + ccg derivation file name
output: data(dataframe) + error
data(dataframe): problem_id(int), p_sen(str), h_sen(str), p_dev(int), h_dev(int), label(str), p_fol(str), h_fol(str), add_p(list of str), add_fol(list of fol), result(str)
'''
data = pd.DataFrame(columns=['problem_id','p_sen','h_sen', 'p_dev', 'h_dev', 'label', 'add_p', 'add_fol', 'p_fol', 'h_fol', 'result'])
data.set_index('problem_id')
error_cnt = {'LogicalExpressionException':0, "UnboundLocalError":0, "OtherError":0}

# ccg_derivation txt file ---> examples
# example[x] ---> the xth derivation in LOLA_SICK_corrected_derivations.txt
examples = []
with open(ccg_path, "r") as f:
    lines = [line for line in f]
    curr_example = ""
    for line in lines:
        if line == '\n':
            examples.append(curr_example)
            curr_example = ""
        else:
            curr_example += line

In [30]:
with open(sen_path, "r") as file:
  line = [l for l in file]
  for i in range(0,len(line),3):
    # print(line[i])
    problem_id = int(line[i].split('=')[1].strip())
    p_sen = line[i+1].split('(')[1].split(',')[4][2:-4]
    h_sen = line[i+2].split('(')[1].split(',')[4][2:-4]
    p_dev = int(line[i+1].split('(')[1].split(',')[0])
    h_dev = int(line[i+2].split('(')[1].split(',')[0])
    label = line[i+1].split('(')[1].split(',')[3][2:-1]

    # add_p, add_fol = get_knowledge(h_sen, [p_sen])
    try: 
      p_fol = derivation_to_FOL(p_dev)
      h_fol = derivation_to_FOL(h_dev)
    except LogicalExpressionException as e:
      print(f"{problem_id}: can't get a FOL formula based on the given lexical semantics")
      error_cnt["LogicalExpressionException"] += 1
      result = "unknown"
      data.loc[problem_id] = [problem_id,p_sen,h_sen,p_dev,h_dev,label,'', '', '', '',result]
    except UnboundLocalError as e:
      print(f"{problem_id}: missing lexical semantics")
      error_cnt["UnboundLocalError"] += 1
      result = "unknown"
      data.loc[problem_id] = [problem_id,p_sen,h_sen,p_dev,h_dev,label,'', '', '', '',result]
    except:
      print(f"{problem_id}: other problems")
      error_cnt["OtherError"] += 1
      result = "unknown"
      data.loc[problem_id] = [problem_id,p_sen,h_sen,p_dev,h_dev,label,'', '', '', '',result]
    else:
      add_p, add_fol = get_knowledge(' '.join(fol_to_word(h_fol)), [' '.join(fol_to_word(p_fol))])
      if 'patient' in p_fol and 'by' in p_fol:
        add_fol.append('all x all y (patient(x,y) <-> argu2(x,y))')
        add_fol.append('all x all y (by(x,y) <-> argu1(x,y))')
      elif 'patient' in h_fol and 'by' in h_fol:
        add_fol.append('all x all y (patient(x,y) <-> argu2(x,y))')
        add_fol.append('all x all y (by(x,y) <-> argu1(x,y))')
      result = get_inference_label(h_fol,[p_fol] + add_fol)
      data.loc[problem_id] = [problem_id,p_sen,h_sen,p_dev,h_dev,label,add_p,add_fol,p_fol,h_fol,result]

6: can't get a FOL formula based on the given lexical semantics
8: can't get a FOL formula based on the given lexical semantics
13: can't get a FOL formula based on the given lexical semantics
15: can't get a FOL formula based on the given lexical semantics
16: can't get a FOL formula based on the given lexical semantics
17: can't get a FOL formula based on the given lexical semantics
20: can't get a FOL formula based on the given lexical semantics
27: can't get a FOL formula based on the given lexical semantics
31: can't get a FOL formula based on the given lexical semantics
32: can't get a FOL formula based on the given lexical semantics
33: can't get a FOL formula based on the given lexical semantics
34: can't get a FOL formula based on the given lexical semantics
36: can't get a FOL formula based on the given lexical semantics
37: can't get a FOL formula based on the given lexical semantics
38: can't get a FOL formula based on the given lexical semantics
39: can't get a FOL formula

In [31]:
from sklearn.metrics import confusion_matrix
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score

# result analysis
# accuracy(total)
print("correct problems: ",len(data[data['label'] == data['result']]))
print("total problems: ",len(data))
print("total accuracy: ",len(data[data['label'] == data['result']])/len(data))
print("\n")
# accuracy(translated problems)
translate_data = data[data['p_fol'] != '']
print("correct problems: ",len(translate_data[translate_data['label'] == translate_data['result']]))
print("translated problems: ",len(translate_data))
print("accuracy(translated problems): ",len(translate_data[translate_data['label'] == translate_data['result']])/len(translate_data))
print("\n")
# label distribution
print("the distribution of gold labels:")
print(data['label'].value_counts(normalize = True))
print("\nthe distribution of labels we get:")
print(data['result'].value_counts(normalize = True))
# confusion matrix
print("\nconfusion matrix")
print("contradiction neutral entailment")
print(confusion_matrix(data['label'], data['result']))
print("\n")
# precision
print("precise: ")
print(precision_score(data['label'], data['result'], average=None))
print("\n")
# recall
print("recall: ")
print(recall_score(data['label'], data['result'], average=None)) 

correct problems:  3471
total problems:  4927
total accuracy:  0.7044854881266491


correct problems:  2205
translated problems:  2804
accuracy(translated problems):  0.786376604850214


the distribution of gold labels:
unknown    0.566876
yes        0.286990
no         0.146134
Name: label, dtype: float64

the distribution of labels we get:
unknown    0.845545
yes        0.116704
no         0.037751
Name: result, dtype: float64

confusion matrix
contradiction neutral entailment
[[ 168  548    4]
 [  17 2754   22]
 [   1  864  549]]


precise: 
[0.90322581 0.66106577 0.95478261]


recall: 
[0.23333333 0.98603652 0.38826025]


In [32]:
# problems we labeled wrong
wrong_data = data[data['label'] != data['result']]

print("wrong problems: \n")
for i in range(0,len(wrong_data)):
    print(wrong_data.iloc[i]['problem_id'], "label: ", wrong_data.iloc[i]['label'], "   result: ", wrong_data.iloc[i]['result'])
    print("premise: ", wrong_data.iloc[i]['p_sen'])
    print("add_premise: ", wrong_data.iloc[i]['add_fol'])
    print("hypothesis: ", wrong_data.iloc[i]['h_sen'])
    print("\n")

Streaming output truncated to the last 5000 lines.


4008 label:  yes    result:  unknown
premise:  A man is recklessly climbing a rope
add_premise:  
hypothesis:  A man is climbing up a rope


4016 label:  no    result:  unknown
premise:  The woman is putting down the kangaroo
add_premise:  
hypothesis:  The woman is picking up the kangaroo


4018 label:  yes    result:  unknown
premise:  The woman is picking up a baby kangaroo
add_premise:  
hypothesis:  The lady is picking up the kangaroo


4022 label:  no    result:  unknown
premise:  The woman is not picking up a baby kangaroo
add_premise:  
hypothesis:  The woman is picking up the kangaroo


4024 label:  yes    result:  unknown
premise:  The woman is picking up a baby kangaroo
add_premise:  
hypothesis:  The woman is picking up the kangaroo


4025 label:  yes    result:  unknown
premise:  A man is breaking tiles with his hands
add_premise:  []
hypothesis:  Tiles are being broken with his hands by a man


4027 label:  yes    result:  unknow

In [33]:
# examples of problems we translated successfully

print("translated problems: \n")
for i in range(0,int(len(translate_data)/10)):
  print(translate_data.iloc[i]['problem_id'], "gold label: ", translate_data.iloc[i]['label'], "   result: ", translate_data.iloc[i]['result'])
  print("premise: ", translate_data.iloc[i]['p_sen'])
  print("premise fol: ", translate_data.iloc[i]['p_fol'])
  print("add_premise: ", translate_data.iloc[i]['add_fol'])
  print("hypothesis: ", translate_data.iloc[i]['h_sen'])
  print("hypothesis fol: ", translate_data.iloc[i]['h_fol'])
  print("\n")

translated problems: 

7 gold label:  unknown    result:  unknown
premise:  A group of boys in a yard is playing and a man is standing in the background
premise fol:  (exists x.(group(x) & exists z2182.(boy(z2182) & exists z2181.(yard(z2181) & in(z2182,z2181) & of(x,z2182) & exists z2184.(play(z2184) & argu1(z2184,x) & True(z2184))))) & exists x.(man(x) & exists z2186.(stand(z2186) & argu1(z2186,x) & exists z2185.(background(z2185) & in(z2186,z2185) & True(z2186)))))
add_premise:  []
hypothesis:  The young boys are playing outdoors and the man is smiling nearby
hypothesis fol:  (exists x.(exists z2187.(young(z2187) & argu1(x,z2187) & boy(x)) & exists z2188.(outdoors(z2188) & exists e.(play(e) & argu1(e,x) & argu2(e,z2188) & True(e)))) & exists x.(man(x) & exists z2190.(smile(z2190) & argu1(z2190,x) & exists s.(nearby(s) & argu1(z2190,s) & True(z2190)))))


10 gold label:  yes    result:  unknown
premise:  A brown dog is attacking another animal in front of the tall man in pants
premise