## Extract Information from Winograd Schema XML

This notebook will extract linguistic information from Winograd Schema XML files and generate when-then rules for the engine. 

This isn't a perfect process, but rather is intended to demonstrate how to read existing academic data sets into the engine and how to begin using those data sets to construct rules. For example, this information could be used to generate a set of "plausible choices" given a premise.

Note - You'll need the "request" library installed in your environment:
pip install requests

## Grab the Winograd Schema XML

In [3]:
# importing element tree
import xml.etree.ElementTree as ET 
import requests
from os.path import exists

file_cache_location = "..\\..\\temp\\WSCollection.xml"
file_exists = exists(file_cache_location)

if file_exists == False:
    print("Downloading schema file")
    URL = "https://cs.nyu.edu/~davise/papers/WinogradSchemas/WSCollection.xml"
    response = requests.get(URL)
    with open(file_cache_location, "wb") as temp_file:
        temp_file.write(response.content)
else:
    print("Using cache")

Using cache


## Parse the File

In [4]:
tree = ET.parse(file_cache_location) 

root = tree.getroot() 
num = 0

rules = []

for elem in root.findall('schema'):

    # num = num + 1
    # if (num < 30 or num > 34): continue

    premise = str.strip(elem.find("text/txt1").text)
    pron = str.strip(elem.find("text/pron").text)
    focus = str.strip(elem.find("text/txt2").text)

    premise = premise.lower()
    premise = str.replace(premise, " because", "")
    premise = str.replace(premise, ", but", "")
    

    focus = str.replace(focus, ".","")

    print("Premise:", premise)
    print("Focus:", pron, focus)

    quote1 = elem.find("quote/quote1")
    quote1 = str.strip(quote1.text) if quote1 else ""
    qpron = str.strip(elem.find("quote/pron").text)
    quote2 = elem.find("quote/quote2")
    quote2 = str.strip(quote2.text) if quote2 else ""
    print("Referent:",quote1,qpron,quote2)

    correct_answer = str.strip(elem.find("correctAnswer").text.lower())
    correct_answer = str.replace(correct_answer,".", "")
    correct_answer_idx = ord(correct_answer) - ord("a") + 1
    # print(correct_answer, correct_answer_idx)

    answers = elem.findall("answers/answer")
    answer = answers[correct_answer_idx-1]
    answer_text = str.strip(answer.text)
    print("Answer:", answer_text)

    answer_text = answer_text.lower()
    when_clause = premise.replace(answer_text, "?x")
    then_clause = "?x " + focus

    when_clause = str.strip(when_clause)
    then_clause = str.strip(then_clause)

    print("#when:", when_clause)
    print("#then:", then_clause)

    rule = {"#when": when_clause, "#then": then_clause}
    rules.append(rule)

    print("")

Premise: the city councilmen refused the demonstrators a permit
Focus: they feared violence
Referent:  they 
Answer: The city councilmen
#when: ?x refused the demonstrators a permit
#then: ?x feared violence

Premise: the city councilmen refused the demonstrators a permit
Focus: they advocated violence
Referent:  they 
Answer: The demonstrators
#when: the city councilmen refused ?x a permit
#then: ?x advocated violence

Premise: the trophy doesn't fit into the brown suitcase
Focus: it is too large
Referent:  it 
Answer: the trophy
#when: ?x doesn't fit into the brown suitcase
#then: ?x is too large

Premise: the trophy doesn't fit into the brown suitcase
Focus: it is too small
Referent:  it 
Answer: the suitcase
#when: the trophy doesn't fit into the brown suitcase
#then: ?x is too small

Premise: joan made sure to thank susan for all the help
Focus: she had recieved
Referent:  she 
Answer: Joan
#when: ?x made sure to thank susan for all the help
#then: ?x had recieved

Premise: joan m

## Inspect the Rules

In [6]:
import pprint

for rule in rules:
    pprint.pprint(rule, sort_dicts=False)

{'#when': '?x refused the demonstrators a permit',
 '#then': '?x feared violence'}
{'#when': 'the city councilmen refused ?x a permit',
 '#then': '?x advocated violence'}
{'#when': "?x doesn't fit into the brown suitcase", '#then': '?x is too large'}
{'#when': "the trophy doesn't fit into the brown suitcase",
 '#then': '?x is too small'}
{'#when': '?x made sure to thank susan for all the help',
 '#then': '?x had recieved'}
{'#when': 'joan made sure to thank ?x for all the help',
 '#then': '?x had given'}
{'#when': '?x tried to call george on the phone',
 '#then': "?x wasn't successful"}
{'#when': 'paul tried to call ?x on the phone', '#then': "?x wasn't available"}
{'#when': '?x asked the witness a question',
 '#then': '?x was reluctant to repeat it'}
{'#when': 'the lawyer asked ?x a question',
 '#then': '?x was reluctant to answer it'}
{'#when': '?x zoomed by the school bus', '#then': '?x was going so fast'}
{'#when': 'the delivery truck zoomed by ?x', '#then': '?x was going so slow'}

## Test a Few Assertions

In [7]:
import os, sys
sys.path.insert(1, os.path.abspath('..\\..'))

from thoughts.rules_engine import RulesEngine
import pprint

engine = RulesEngine()

engine.load_rules_from_list(rules)

result = engine.process_assertions(["the doctors arrived after the police"])
pprint.pprint(result)

result = engine.process_assertions(["my cousin refused the demonstrators a permit"])
pprint.pprint(result)

result = engine.process_assertions(["tom couldn't lift his son"])
pprint.pprint(result)

result = engine.process_assertions(["alex tried to call george on the phone"])
pprint.pprint(result)


['the doctors were coming from so far away']
['my cousin feared violence']
['tom was so weak']
["alex wasn't successful"]
