# Advanced Question Analysis

The goal of this assignment is to write a more flexible version of the interactive QA system. As in the previous assignment, the system should be able to take a question in natural language (English) as input, analyse the question, and generate a SPARQL query for it.

## Assignment: Additional requirements

* Make sure that your system can analyse at least two more question types, e.g. questions that start with *which* or *when*, or questions where the property is expressed by a verb.
* Apart from the techniques introduced last week (matching tokens on the basis of their lemma or part-of-speech), also include at least one pattern where you use the dependency relations to find the relevant property or entity in the question. 
* Include 10 examples of questions that your system can handle and that illustrate that you cover additional question types.

## Examples

Here is a non-representative list of questions and question types to consider. See the list with all questions for more examples.

* For what movie did Leonardo DiCaprio win an Oscar?
* How long is Pulp Fiction?
* How many episodes does Twin Peaks have?
* In what capital was the film The Fault in Our Stars filmed?
* In what year was The Matrix released?
* When did Alan Rickman die?
* Where was Morgan Freeman born?
* Which actor played Aragorn in Lord of the Rings?
* Which actors played the role of James Bond?
* Who directed The Shawshank Redemption?
* Which movies are directed by Alice Wu?


In [2]:
import spacy

nlp = spacy.load("en_core_web_sm") # this loads the model for analysing English text
                   

## Dependency Analysis with Spacy

All the functionality of Spacy, as in the last assignment, is still available for doing question analysis. 

In addition, also use the dependency relations assigned by spacy. Note that a dependency relation is a directed, labeled arc between two tokens in the input. In the example below, the system detects that *movies* (lemma *movie*) is the subject of the passive sentence (with label nsubjpass), and that the head of which this subject is a dependent is the word *directed* with lemma *direct*. The auxiliary *are* (lemma *be*) is itself a dependent of *directed*, with label auxpass.


In [3]:
question = 'Which movies are directed by Alice Wu?'

parse = nlp(question) # parse the input 

for word in parse : # iterate over the token objects 
    print(word.lemma_, word.pos_, word.dep_, word.head.lemma_)


which DET det movie
movie NOUN nsubjpass direct
be AUX auxpass direct
direct VERB ROOT direct
by ADP agent direct
Alice PROPN compound Wu
Wu PROPN pobj by
? PUNCT punct direct
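
As a minimal sketch of a dependency-based pattern for passive questions, the rows printed above can be used directly. So that it runs without loading a model, the parse is hard-coded as `(text, lemma, dep, head_index)` tuples. This tuple layout and the helper names `subtree_text` and `passive_pattern` are illustrative assumptions, not part of spaCy; with a real parse the same information comes from `word.text`, `word.lemma_`, `word.dep_` and `word.head.i`.

```python
# Rows for 'Which movies are directed by Alice Wu?', copied from the output above.
# Each row is (text, lemma, dep, index of head); this layout is an assumption
# made for illustration, not a spaCy data structure.
ROWS = [
    ("Which", "which", "det", 1),
    ("movies", "movie", "nsubjpass", 3),
    ("are", "be", "auxpass", 3),
    ("directed", "direct", "ROOT", 3),
    ("by", "by", "agent", 3),
    ("Alice", "Alice", "compound", 6),
    ("Wu", "Wu", "pobj", 4),
    ("?", "?", "punct", 3),
]


def subtree_text(rows, index):
    """Join the token at `index` and all of its descendants in sentence order."""
    members = {index}
    changed = True
    while changed:  # keep adding tokens whose head is already in the subtree
        changed = False
        for i, (_, _, _, head) in enumerate(rows):
            if head in members and i not in members:
                members.add(i)
                changed = True
    return " ".join(rows[i][0] for i in sorted(members))


def passive_pattern(rows):
    """Return (property lemma, entity text) for 'Which X are VERB by Y?'."""
    prop, entity = None, None
    for i, (_, lemma, dep, head) in enumerate(rows):
        if dep == "ROOT" and lemma != "be":
            prop = lemma
        # The entity is the object of the preposition that heads the agent phrase.
        if dep == "pobj" and rows[head][2] == "agent":
            entity = subtree_text(rows, i)
    return prop, entity


print(passive_pattern(ROWS))  # ('direct', 'Alice Wu')
```

Anchoring the entity on the `agent` and `pobj` labels, rather than on capitalisation, is what makes this a dependency pattern: it would still pick the right phrase if the name were written in lower case.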


## Phrases

You can also match with the full phrase that is the subject of the sentence, or that stands in any other dependency relation, using the `subtree` attribute of a token.


In [4]:
def phrase(word) :
    children = []
    for child in word.subtree :
        children.append(child.text)
    return " ".join(children)
        
for word in parse:
    if word.dep_ == 'nsubjpass' or word.dep_ == 'agent' :
        phrase_text = phrase(word)
        print(phrase_text)
        

Which movies
by Alice Wu
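
The phrases above still contain the question word and the preposition, which are rarely useful in a search string. One possible clean-up, sketched here on hard-coded `(text, pos)` pairs taken from the earlier parse output for the same question, is to drop leading tokens that carry a function-word tag. The helper name `strip_function_words` and the choice of tags to skip are assumptions for illustration; with a real parse the pairs would come from `word.text` and `word.pos_` for each token in `word.subtree`.

```python
# POS tags treated as function words at the start of a phrase (an assumed choice).
SKIP_POS = {"DET", "ADP", "PRON"}


def strip_function_words(tagged_phrase):
    """Drop leading (text, pos) pairs whose tag is in SKIP_POS and join the rest."""
    start = 0
    while start < len(tagged_phrase) and tagged_phrase[start][1] in SKIP_POS:
        start += 1
    return " ".join(text for text, _ in tagged_phrase[start:])


# The two phrases printed above, with the tags from the earlier parse output.
subject = [("Which", "DET"), ("movies", "NOUN")]
agent = [("by", "ADP"), ("Alice", "PROPN"), ("Wu", "PROPN")]

print(strip_function_words(subject))  # movies
print(strip_function_words(agent))    # Alice Wu
```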


## Visualisation

For a quick understanding of what the parser does and how it assigns part-of-speech tags, entities, etc., you can also visualise parse results. Below, the entity visualiser and the dependency visualiser are demonstrated. 
This code is for illustration only; it is not part of the assignment. 

In [13]:
from spacy import displacy

question = "Which parts does a film have?"

parse = nlp(question)

displacy.render(parse, jupyter=True, style="ent")

displacy.render(parse, jupyter=True, style="dep")

## Assignment 4

- "Who/What was/is/were (the) X of Y?" [What is the country of origin of Black Mirror?]
- "Y was/is/were X by whom/what?" [Inception was directed by whom?]
- "Who/What X Y?" [Who directed The Shawshank Redemption?]
- "Which movies are X by Y?" [Which movies are directed by Alice Wu?]
- "When did Y X?" [When did Alan Rickman die?]
- "By whom/what was/is/were Y X?" [By whom was Tarzan directed?]

In [1]:
import spacy
nlp = spacy.load("en_core_web_sm") # this loads the model for analysing English text

def getPropertyAndEntity(parse) :
    propRange = []
    prop = ""
    entityRange = [] 
    for i in range(len(parse)) : # iterate over the token objects 
        word = parse[i]
    
        if word.dep_ == "ROOT" and word.lemma_ != "be":
            # Set root as property when it isn't a form of to be
            prop = word.text
    
        if word.text.istitle():
            # Check if word starts with uppercase letter for entities
            if i != 0 or (word.pos_ != "PRON" and word.pos_ != "DET" and word.pos_ != "ADV" and word.pos_ != "ADP"):
                # If it isn't one of the question words (Who/What/Which/When)
                entityRange.append(i)
        elif word.pos_ == "NOUN" or word.pos_ == "VERB":
            # Properties are nouns or verbs
            # Guard i > 0: parse[-1] would wrap around to the last token
            if i > 0 and parse[i-1].pos_ == "ADJ" :
                # Also add adjectives of the properties
                propRange.append(i-1)
            propRange.append(i)

    if prop == "" and propRange:
        # Fall back to the lemma of the span from the first to the last property token
        prop = parse[propRange[0]:propRange[-1]+1].lemma_

    entity = ""
    if entityRange:
        # The entity is the span from the first to the last capitalised token
        entity = parse[entityRange[0]:entityRange[-1]+1].text
    return prop, entity

In [2]:
import requests

def findEntityID(entity) :
    entityID = []
    url = 'https://www.wikidata.org/w/api.php'
    params = {'action':'wbsearchentities',
              'language':'en',
              'format':'json',
              'search':entity}
    data = requests.get(url,params).json()
    # An error response has no 'search' key, so fall back to an empty list
    for result in data.get('search', []):
        entityID.append(result['id'])
    return entityID
        
def findPropertyID(prop) :
    propID = []
    url = 'https://www.wikidata.org/w/api.php'
    params = {'action':'wbsearchentities',
              'type': 'property',
              'language':'en',
              'format':'json',
              'search':prop}
    data = requests.get(url,params).json()
    # An error response has no 'search' key, so fall back to an empty list
    for result in data.get('search', []):
        propID.append(result['id'])
    return propID

In [3]:
import requests

# Get answer to query
def answerQuery(query) : 
    answer = []
    url = 'https://query.wikidata.org/sparql'
    results = requests.get(url, params={'query': query, 'format': 'json'}).json()
    
    # ASK queries return a single 'boolean'; SELECT queries return 'bindings'
    if 'boolean' in results:
        answer.append(results['boolean'])
    else:
        for item in results['results']['bindings']:
            for var in item :
                answer.append(item[var]['value'])
    return answer
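
The response handling can also be separated from the network call, which makes it testable offline. The sketch below assumes the standard SPARQL JSON results layout (a top-level `boolean` for ASK queries, `results.bindings` for SELECT queries). The helper name `extract_values` and the sample dictionaries are illustrative; the samples are hand-written, not captured responses.

```python
def extract_values(results):
    """Flatten a SPARQL JSON result into a list of plain values."""
    # ASK queries return a single truth value under 'boolean'.
    if "boolean" in results:
        return [results["boolean"]]
    # SELECT queries return one binding (dict of variables) per result row.
    values = []
    for binding in results.get("results", {}).get("bindings", []):
        for variable in binding:
            values.append(binding[variable]["value"])
    return values


# Hand-written samples in the SPARQL JSON shape (not real endpoint output).
select_sample = {
    "head": {"vars": ["answerLabel"]},
    "results": {"bindings": [
        {"answerLabel": {"type": "literal", "value": "Christopher Nolan"}},
    ]},
}
ask_sample = {"head": {}, "boolean": True}

print(extract_values(select_sample))  # ['Christopher Nolan']
print(extract_values(ask_sample))     # [True]
```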

In [4]:
def getAnswer(propID, entityID) :
    answer = []
    # Try every combination of property and entity until an answer is given
    for i in range(len(propID)) :
        for j in range(len(entityID)) :
            entity = entityID[j]
            prop = propID[i]
            
            # Construct the query
            query = 'SELECT ?answerLabel WHERE { wd:' + entity + ' wdt:' + prop + ''' ?answer . SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } }'''
            # Construct the opposite query
            query2 = 'SELECT ?answerLabel WHERE { ?answer wdt:' + prop + ' wd:' + entity + ''' . SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } }'''
            
            # Get the answer to the question
            answer = answerQuery(query)
            if answer != [] :
                return answer
            
            # Try to get the answer when using the opposite query
            answer = answerQuery(query2)
            if answer != [] :
                return answer
    
    return answer
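
Building the two query strings in a small helper would keep `getAnswer` shorter and lets the strings be inspected without contacting the endpoint. This is a sketch only: `build_queries` is a hypothetical helper, and the IDs in the usage lines are arbitrary placeholders rather than the IDs of any particular film or property.

```python
LABEL_SERVICE = 'SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }'


def build_queries(entity_id, prop_id):
    """Return the forward and reversed SELECT queries for one entity/property pair."""
    # Forward: the entity is the subject of the triple.
    forward = (
        f"SELECT ?answerLabel WHERE {{ wd:{entity_id} wdt:{prop_id} ?answer . "
        f"{LABEL_SERVICE} }}"
    )
    # Reversed: the entity is the object of the triple.
    reverse = (
        f"SELECT ?answerLabel WHERE {{ ?answer wdt:{prop_id} wd:{entity_id} . "
        f"{LABEL_SERVICE} }}"
    )
    return forward, reverse


# Placeholder IDs, chosen only to show the shape of the generated strings.
forward, reverse = build_queries("Q1", "P1")
print(forward)
print(reverse)
```

The reversed form is what allows questions such as "Which movies are directed by Alice Wu?" to be answered, where the named entity is the value of the property rather than its subject.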

In [5]:
def answerQuestion(question) :
    # Parse question
    parse = nlp(question)
    
    # Get the property and entity
    prop, entity = getPropertyAndEntity(parse)
    #print("prop:", prop)
    #print("entity:", entity)
    
    # Find the entity id from wikidata
    entityID = findEntityID(entity)
        
    # Find the property id from wikidata
    propID = findPropertyID(prop)
    
    # Get the answer to the question
    answer = getAnswer(propID, entityID)
                
    return answer

In [6]:
# Test question from input
question = input('Please ask a question\n')
answer = answerQuestion(question)
print(answer)

Please ask a question
"Inception was directed by whom?"
['Christopher Nolan']


In [17]:
# 10 questions that the function can answer
questions = [
    "Inception was directed by whom?",
    "Who directed The Shawshank Redemption?",
    "What is the country of origin of Black Mirror?",
    "Which movies are directed by Alice Wu?",
    "When did Alan Rickman die?",
    "Who is the composer of Lord of The Rings?",
    "Which movies did Christopher Nolan direct?",
    "What seasons does Twin Peaks have?",
    "When was Morgan Freeman born?",
    "By whom was Tarzan directed?"
]

index = 9
print(questions[index])
answer = answerQuestion(questions[index])
print(answer)

By whom was Tarzan directed?
['Chris Buck', 'Kevin Lima']
