# Question Answering Machine

### Final Project for AMLPP

### Carlos Grandet & Hector Salvador

The following is a demo of a question answering machine (QAM) that we built for a machine learning class. The QAM uses several NLP tools, such as the Stanford POS Tagger, Word2Vec, and BM25.

In [1]:
# First step, load the relevant libraries and classes

from question_processing import Question
from synonyms import Synonyms
from Word2VecModel import W2V_Model 
from spanish_tagger import Spanish_Postagger
from query import Query
import json

# Then name the different files you will use to make the code work

tagfile = 'stanford-postagger-full-2016-10-31/models/spanish.tagger'
jarfile = 'stanford-postagger-full-2016-10-31/stanford-postagger.jar'

# Data for synonyms API
APIfile = 'http://store.apicultur.com/api/sinonimosporpalabra/1.0.0/' 
token = 'f7JE_2svUVwP5ARGfw8aQhnLXlga'

# Data for w2v model
w2vfile = 'SBW-vectors-300-min5.txt'

# Data for question_type
question_json = 'Data/question_type.json'
stopwords = 'Data/stopwords.json'


### Create a Query object, which lets you ask questions, build queries, and find related words

Creating the query can take a while, because it loads a Word2Vec model that is over 2.5 GB. We advise running the cell below only once.

In [2]:
query = Query(APIfile, token, tagfile, jarfile, 
    w2vfile)

Once the query has been created, we can start asking questions.

In [3]:
question = "¿Cuál es el castigo por homicidio?"

# set_question loads a question and a JSON file listing the words that signal
# each question type (e.g. time: years, period, days)
query.set_question(question, question_json)

# get_query builds the set of query terms for the question
query.get_query(stopwords)


http://store.apicultur.com/api/sinonimosporpalabra/1.0.0/homicidio
Bearer f7JE_2svUVwP5ARGfw8aQhnLXlga


You can access the query built for that question, and the question type, through the following attributes:

In [13]:
print(query.query)
print(query.qtype)

{'delito', 'culposo', 'homicidio', 'homicido', 'crimen'}
['años']
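The query above looks like the question's content words expanded with synonyms. The following is a minimal sketch of that idea, not the actual `Query.get_query` implementation: `build_query`, `toy_stopwords`, and `toy_synonyms` are hypothetical names, and the toy data stands in for `Data/stopwords.json` and the synonyms API.

```python
import re


def build_query(question, stopwords, synonyms):
    """Return a set of query terms: content words plus their synonyms."""
    # Lowercase and keep runs of letters, including accented Spanish characters.
    tokens = re.findall(r"[a-záéíóúüñ]+", question.lower())
    # Drop stopwords to keep only the content words.
    content = [t for t in tokens if t not in stopwords]
    terms = set(content)
    # Expand each content word with its known synonyms, if any.
    for word in content:
        terms.update(synonyms.get(word, []))
    return terms


# Hypothetical toy data (assumption): not the project's real stopword list or API output.
toy_stopwords = {"cuál", "es", "el", "por"}
toy_synonyms = {"homicidio": ["crimen", "delito"]}

terms = build_query("¿Cuál es el castigo por homicidio?", toy_stopwords, toy_synonyms)
print(sorted(terms))
```

The real class evidently does more than this sketch: for instance, 'castigo' does not appear in the notebook's query, whereas the sketch keeps it.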


You can also add words to the query, if you want

In [5]:
query.add_words(["asesinato", "imprudencial"])

NameError: name 'list_noun' is not defined

You can also remove words

In [9]:
query.remove_words(["asesinato", "doloso"])

In [10]:
query.query

{'crimen', 'culposo', 'delito', 'homicidio', 'homicido'}

Finally, you can use Word2Vec to find other words that could match your query, or to spot words that do not belong in it.

In [7]:
query.W2V.find_concepts(positive = ["rey", "mujer"], negative = ["hombre"], top_n = 1)

['reina']

In [8]:
query.W2V.intruder(["rana", "pato", "simio", "pelota"])

'pelota'
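To illustrate what these two calls compute, here is a self-contained sketch over hand-made toy vectors. It is an assumption about the general technique (vector arithmetic plus cosine similarity), not the code inside `W2V_Model`; the functions below only reuse the names `find_concepts` and `intruder` for readability, and the five-dimensional vectors are invented for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def find_concepts(vectors, positive, negative, top_n=1):
    """Rank words by similarity to sum(positive) - sum(negative)."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for word in positive:
        target = [t + x for t, x in zip(target, vectors[word])]
    for word in negative:
        target = [t - x for t, x in zip(target, vectors[word])]
    # Exclude the input words themselves from the candidates.
    exclude = set(positive) | set(negative)
    candidates = [w for w in vectors if w not in exclude]
    ranked = sorted(candidates, key=lambda w: cosine(vectors[w], target), reverse=True)
    return ranked[:top_n]


def intruder(vectors, words):
    """Return the word with the lowest mean similarity to the other words."""
    def mean_similarity(word):
        others = [o for o in words if o != word]
        return sum(cosine(vectors[word], vectors[o]) for o in others) / len(others)
    return min(words, key=mean_similarity)


# Toy vectors (assumption). Dimensions: royal, male, female, animal, object.
toy_vectors = {
    "rey": [1, 1, 0, 0, 0],
    "reina": [1, 0, 1, 0, 0],
    "hombre": [0, 1, 0, 0, 0],
    "mujer": [0, 0, 1, 0, 0],
    "rana": [0, 0, 0, 1, 0],
    "pato": [0, 0, 0, 0.9, 0.1],
    "simio": [0, 0, 0, 0.8, 0],
    "pelota": [0, 0, 0, 0.1, 1],
}

print(find_concepts(toy_vectors, positive=["rey", "mujer"], negative=["hombre"]))
print(intruder(toy_vectors, ["rana", "pato", "simio", "pelota"]))
```

With a real model, libraries such as gensim expose comparable operations (`KeyedVectors.most_similar` and `KeyedVectors.doesnt_match`); whether `W2V_Model` wraps them is not stated in this notebook.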

### Once you have a query, the next step is to use information retrieval to find the relevant documents
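The introduction lists BM25 among the tools used, so a natural way to rank documents against the query is BM25 scoring. Below is a minimal sketch of the standard Okapi BM25 formula over pre-tokenized documents. It is not the project's retrieval code: `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, and the toy documents are assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.5, b=0.75):
    """Score each tokenized document against the query terms with Okapi BM25."""
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    scores = []
    for doc in documents:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF variant that is always non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation with document-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Toy corpus (assumption): each document is already a list of lowercase tokens.
toy_documents = [
    ["el", "homicidio", "culposo", "se", "castiga", "con", "prisión"],
    ["el", "contrato", "de", "arrendamiento"],
    ["delito", "de", "robo"],
]
toy_query = {"homicidio", "delito", "crimen"}

scores = bm25_scores(toy_query, toy_documents)
# Rank document indices from most to least relevant.
ranking = sorted(range(len(toy_documents)), key=lambda i: scores[i], reverse=True)
print(scores, ranking)
```

Documents that share no term with the query score zero, and among matching documents the length normalization favors shorter ones when term frequencies are equal.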