# Description

This is a question answering system written in Python. It can only answer 'Who', 'What', 'Where' and 'When' questions. The system is not domain specific; it answers in complete sentences, but only when the user's question matches one of its predefined formats. If it finds a relevant sentence on Wikipedia, it returns an answer built from that sentence. Otherwise, it returns the default answer "I am sorry, I don't know the answer.".



# Example

Welcome to Question Answering system

This is a QA system by Team1. It will try to answer questions that start with Who, What, When or Where.

Please enter exit to leave the Q&A session

Who is Joe Biden? (question)

Joe Biden is the 46th and current president of the United States. (system answer)

who was APJ Abdul Kalam? (question)

APJ Abdul Kalam was an Indian aerospace scientist who served as the 11th President of India from 2002 to 2007. (system answer)

When was Beyonce born? (question)

Beyonce Born was born on January 4 1954. (system answer)

Where is United States of America? (question)

United America is a country primarily located in North America. (system answer)
 
bye

Thank you for your time! Goodbye.


# Algorithm

Step 1. The system welcomes the user with a predefined welcome message.

Step 2. The system enters a while loop and decides the next course of action based on the user input.

        - Option 1: If the user wants to quit:
                    The system responds with "Thank you for your time! Goodbye." and exits the program.
                    
        - Option 2: If the user wants to continue the conversation:

                    Step 1. The user enters a question, and the system matches it against the predefined format using a regular expression.

                    Step 2. If the question is not a 'Who', 'What', 'Where' or 'When' question, the system returns "Please only ask Who, What, When 
                    and Where question."

                    Step 3. If the question matches the expected format, the system performs the steps below:

                            a. Tokenize the words, check the POS tags and store the noun phrase and the verb phrase separately.
                            b. 'Who', 'What' and 'Where' questions share one function to find the answer. 'When' questions use 
                            a different function.               
                            c. Search Wikipedia for the noun phrase using the wikipedia library.
                            d. Search the page's sentences for a relevant answer using the verb phrase and regular expressions.
                            e. Return the relevant answer to the user.
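
The format check in Step 1 of Option 2 can be sketched in isolation. This is a minimal illustration rather than the notebook's exact code: the helper name `parse_question`, the constant names and the shortened verb list are assumptions made for the example.

```python
import re

# Question words and auxiliary verbs accepted by this illustrative pattern
# (a shortened list chosen for the sketch).
WH_WORDS = ["who", "what", "when", "where"]
AUX_VERBS = ["is", "was", "are", "were", "did", "does"]

QUESTION_PATTERN = re.compile(
    r"^\s*(" + "|".join(WH_WORDS) + r")\s+(" + "|".join(AUX_VERBS) + r")\s+(.*)$",
    re.IGNORECASE,
)


def parse_question(question):
    """Return (question word, verb, rest) or None if the format does not match."""
    match = QUESTION_PATTERN.match(question)
    if match is None:
        return None
    question_word, verb, rest = match.groups()
    return question_word.lower(), verb.lower(), rest.rstrip(" ?")


print(parse_question("Who is Joe Biden?"))        # ('who', 'is', 'Joe Biden')
print(parse_question("Tell me about Joe Biden"))  # None
```

Anchoring the pattern at the start of the input means a question word that appears in the middle of a sentence does not count as a match.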

In [317]:
# Importing libraries

import spacy
import nltk
import re
import en_core_web_sm
import wikipedia

from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.corpus import wordnet
import logging
import sys

In [319]:
# Creating logging object



log_file = sys.argv[1]

logging.basicConfig(filename = log_file, level = logging.DEBUG, format = '%(message)s')



In [311]:
# The function will parse the user question and return the answer back to user

def get_answer(question):
    
    wh_question_list = ['who','what','when','where']
    question_list = r'('+'|'.join(wh_question_list)+')'  
    
    verb_list = ['is','am','you','was','did','are','were','has','have','does']
    verb_list_temp = r'('+"|".join(verb_list)+')'
    
    rest_body = r'(.*)'
    
    parseQuestion = re.search( " ".join([question_list,verb_list_temp,rest_body]), question, re.IGNORECASE) #Checking the question format
    logging.info("Parsed question is: %s", parseQuestion)
    
    if(parseQuestion): #If question format matches correctly, function will go to the rest part
        
        question_type = parseQuestion[1]  # Checking QuestionType
        verb_type = parseQuestion[2]      # Storing Verb from the question
        rest_body = parseQuestion[3]      # Storing remaining body
        
        ## Extracting the noun and the verb phrase from the rest body    
        pos_tags = nltk.pos_tag(nltk.word_tokenize(rest_body))
        
        ## Getting pronoun or noun phrase from the rest_body and storing in noun_phrase variable
        noun_phrase = " ".join(word[0] for word in pos_tags if word[1]=='NN' or word[1]=='NNS' or word[1]=='NNP')
        
        ## Getting only verb phrase from the rest_body and storing in verb_phrase variable
        verb_phrase = " ".join(word[0] for word in pos_tags if word[1]=='VBN' or word[1]=='VBD')
        
        if(question_type.lower() != 'when'):  # If the question type is who, what or where
            answer = get_answer_what_question(noun_phrase, verb_phrase, verb_type)
            return answer
        
        else: # If question type is when
            answer = get_answer_when_question(noun_phrase, verb_phrase, verb_type,question )
            return answer
    
    elif(re.match(question_list, question.strip(), re.IGNORECASE)): # Question starts with a Wh-word but does not match the expected format
        statement = 'Please enter valid question.'
        return statement
        
    else: # If question is other than 'Wh' questions.
        statement = 'Please only ask Who, What, When and Where question.'
        return statement
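
The tag filter above explains why the sample answer for "Where is United States of America?" begins with "United America": only tokens tagged `NN`, `NNS` or `NNP` are kept. The sketch below reproduces that filter on hand-written tags, so it runs without NLTK. The tags and the helper name `extract_phrases` are assumptions for illustration; a real tagger may label the tokens differently.

```python
# Hand-written (token, tag) pairs standing in for nltk.pos_tag output.
# The tags are assumed for illustration; a real tagger may differ.
tagged = [("United", "NNP"), ("States", "NNPS"), ("of", "IN"), ("America", "NNP")]

NOUN_TAGS = {"NN", "NNS", "NNP"}
VERB_TAGS = {"VBN", "VBD"}


def extract_phrases(tagged_tokens):
    """Join tokens whose tag is in the noun set and, separately, the verb set."""
    noun_phrase = " ".join(tok for tok, tag in tagged_tokens if tag in NOUN_TAGS)
    verb_phrase = " ".join(tok for tok, tag in tagged_tokens if tag in VERB_TAGS)
    return noun_phrase, verb_phrase


print(extract_phrases(tagged))  # ('United America', '')
```

Under these assumed tags, adding `NNPS` to the noun set would keep "States", although "of" would still be dropped.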

In [314]:
# Function defined to find out answer for 'when' question

def get_answer_when_question(noun_phrase,verb_phrase, verb_type, question):
    
    wiki_findings = []  #List to store the findings from the wiki pages
    
    birth_list = r'(birthday|birth day|birth|born)'  # Birth day patterns
    death_list = r'(died|die|death|killed|passed away)'   # Death patterns
    
    if(re.search(birth_list,question,re.IGNORECASE)):  # Branch for finding birth date
        sentences = wikisearch(noun_phrase) or [] #Calling wikisearch function (it may return None)
        
        for sentence in sentences:
            if(re.search(verb_phrase, sentence, re.IGNORECASE) is not None):
                wiki_findings.append(sentence)
                            
        # Pattern to match a birth date such as Jan 12, 1990, mmddyyyy or 12 Jan 2001 (no re.VERBOSE flag, so the spaces around '|' are literal)
        birth_pattern = r'''((\d{1,2}\d{1,2}\d{2,4}\W*) | (\d{1,2} (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\w* \d{2,4}\W*) | ((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\w* \d{1,2},? \d{2,4}\W*) | (\d{4} to \d{4}\W*))'''
        
        if(len(wiki_findings)>0):
            finding = wiki_findings[0]
            logging.info('\n Finding is:\n %s',finding)
            birth_date = re.search(birth_pattern,finding, re.IGNORECASE)
        
        
            if(birth_date is not None):  # If birthdate matches with the birth_pattern
            
                try:
                    birth_date = birth_date.group(1)
                    name = noun_phrase.split()
            
                    if ')' in birth_date: # Checking if the birth date contains ')'
                        birth_date = birth_date.split(')')
                        birth_date=birth_date[0]

                    # Defining the answer that is sent back to the user (works for names of any length)
                    answer = " ".join(word.capitalize() for word in name) + ' was born on ' + birth_date.replace(',', '').strip() + "."
                    return answer

                except Exception:
                    return "I am sorry, I don't know the answer."
            
            else:
                return "I am sorry, I don't know the answer."
              
            
        else:
            return "I am sorry, I don't know the answer."
            
            
     
    
    elif(re.search(death_list, question, re.IGNORECASE)): # Branch for finding death date
        
        sentences = wikisearch(noun_phrase) or [] # Calling wikisearch function (it may return None)
        
        # Pattern to match a birth-to-death date range, e.g. "Jan 12, 1990 – Mar 3, 2001" or "12 Jan 1990 – 3 Mar 2001"
        death_pattern = r'''(((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\w* \d{1,2},? \d{2,4}\s*–\s*(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\w* \d{1,2}(,)?) \d{2,4}|(\d{1,2} (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\w* \d{2,4}\s*–\s*\d{1,2} (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\w* \d{2,4}))'''
        
        if(len(sentences)>0):
            wiki_findings = sentences[0] # Using only the first sentence to find the answer
            logging.info('\n Finding is:\n %s',wiki_findings)#Logging the result
     
         
        
            death_date = re.search(death_pattern,wiki_findings, re.IGNORECASE) # Searching the death pattern in wikipedia result using regex

            if(death_date is not None): # If death date matches the pattern

                try:
                    name = noun_phrase.split()

                    date = death_date.group(0).split('–')  # The match looks like "February 22, 1732 – December 14, 1799"; the last part, December 14, 1799, is the death date

                    # Defining the answer that is sent back to the user (works for names of any length)
                    answer = " ".join(word.capitalize() for word in name) + ' died on ' + date[-1].strip() + "."
                    return answer

                except Exception:
                    return "I am sorry, I don't know the answer."

            else:
                return "I am sorry, I don't know the answer."
            
        else:
            return "I am sorry, I don't know the answer."
        
    else: # other than birth and death dates
        
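        #Search for synonyms of verb_phrase
        syns = [lemma.name().replace('_', ' ') for syn in wordnet.synsets(verb_phrase) for lemma in syn.lemmas()]
        
        sentences = wikisearch(noun_phrase) or [] #Calling wikisearch function (it may return None)

        # Joining the verb phrase and its synonyms into one alternation
        search_pattern = '(' + '|'.join(re.escape(term) for term in [verb_phrase] + syns if term) + ')'

        for sentence in sentences:
            if(re.search(search_pattern, sentence, re.IGNORECASE) is not None):
                wiki_findings.append(sentence)     #Storing sentences that mention the verb phrase or a synonym
                logging.info('%s \n',wiki_findings)      #Logging the result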
                
        if(len(wiki_findings)>0):
            
                try:
                    finding = wiki_findings[0]
                    match = re.search(r'\d{4}',finding, re.IGNORECASE)

                    if(match is not None):
                        answer = noun_phrase[:1].upper() + noun_phrase[1:] + " " + verb_type + " "+ verb_phrase + " in " + match.group(0) + "."
                        return answer

                    else:
                        return "I am sorry, I don't know the answer."
                    
                except:
                    return "I am sorry, I don't know the answer."
        else:
            return "I am sorry, I don't know the answer."
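
Date patterns like the ones above are easier to read and maintain with `re.VERBOSE`, where whitespace inside the pattern is ignored and each alternative can carry a comment. The following is a minimal sketch under that assumption, not the notebook's pattern: `MONTH`, `DATE_PATTERN` and `first_date` are hypothetical names, and only two date layouts are covered.

```python
import re

MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*"

# With re.VERBOSE, spaces and comments inside the pattern are ignored,
# so literal whitespace has to be written as \s.
DATE_PATTERN = re.compile(
    rf"""
    {MONTH}\s+\d{{1,2}},?\s+\d{{4}}     # e.g. February 22, 1732
    |
    \d{{1,2}}\s+{MONTH}\s+\d{{4}}       # e.g. 15 October 1931
    """,
    re.IGNORECASE | re.VERBOSE,
)


def first_date(sentence):
    """Return the first date-like substring in the sentence, or None."""
    match = DATE_PATTERN.search(sentence)
    return match.group(0) if match else None


print(first_date("George Washington (February 22, 1732 – December 14, 1799) was ..."))
# February 22, 1732
```

Because the search stops at the first match, a sentence that lists a lifespan yields the birth date. Taking the last match instead would be one way to get the death date.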


In [298]:
# Function to find out answer for what, where and who question

def get_answer_what_question(noun_phrase,verb_phrase, verb_type):

    wiki_findings = []  #List to store all the findings from the wiki pages
        
    if(noun_phrase): # If a noun is present in the question, Wikipedia is searched using the noun words
        sentences = wikisearch(noun_phrase)  # Calling wikisearch function and passing noun(s) in the arguments

        answer=[]  # It will store the answer that needs to be returned to the user
            
        if(verb_phrase and sentences is not None): # If verb_phrase exists and sentences is not None
            for sentence in sentences:
                if(re.search(verb_phrase, sentence, re.IGNORECASE) is not None): 
                    wiki_findings.append(sentence)   # Storing all the sentences that match the verb phrase passed in the question
                    logging.info('\n Wiki finding is:\n %s',wiki_findings)
                
            if(len(wiki_findings)>0):
                finding = wiki_findings[0] # Using only first result
                logging.info('\n Finding is:\n %s',finding) 

                match = re.search(r'(.*)'+ '( '  + verb_phrase + ' )' + '(.*)',finding)
                    
                try:
                    if(match is not None):
                        match_result = match.group(3)

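                        # forming answer; only the first letter is upper-cased because str.capitalize() would lowercase the rest of the name
                        answer = noun_phrase[:1].upper() + noun_phrase[1:] + " " + verb_type + " "+ verb_phrase + " " + match_result 
                        return answer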
                    
                    elif(verb_type and sentences is not None): # If verb_type exists and sentences is not None
                        wiki_findings=[]
                        
                        for sentence in sentences:
                            if(re.search(verb_type, sentence, re.IGNORECASE) is not None): #Search with verb_type
                                wiki_findings.append(sentence)   # Storing all the sentences that match the verb type passed in the question
                                logging.info('\n Wiki finding is:\n %s',wiki_findings)
                                
                        if(len(wiki_findings)>0):
                            finding = wiki_findings[0]
                            logging.info('\n Finding is:\n %s',finding)
                                
                            match = re.search(r'(.*)'+ '( '  + verb_type + ' )' + '(.*)',finding)
                            
                            if(match is not None):
                                # forming answer; only the first letter is upper-cased because str.capitalize() would lowercase the rest of the name
                                answer= noun_phrase[:1].upper() + noun_phrase[1:] + " " + verb_type + " " + match.group(3) 
                                return answer
                                
                            else:
                                return "I am sorry, I don't know the answer." 
                        else:
                                return "I am sorry, I don't know the answer."
                    else:
                        return "I am sorry, I don't know the answer."
                        
                except Exception:
                    return "I am sorry, I don't know the answer."
                        
            else:
                return "I am sorry, I don't know the answer."
      
            
        elif(verb_type and sentences is not None): # If verb_type exists and sentences is not None
            for sentence in sentences:
                if(re.search(verb_type, sentence, re.IGNORECASE) is not None): #Search with verb_type
                    wiki_findings.append(sentence)   # Storing all the sentences that match the verb type passed in the question
                    logging.info('\n Wiki finding is:\n %s',wiki_findings)
                        
            if(len(wiki_findings)>0):
                finding = wiki_findings[0]
                logging.info('\n Finding is:\n %s',finding)

                match = re.search(r'(.*)'+ '( '  + verb_type + ' )' + '(.*)',finding)
                    
                try:
                    if(match is not None):
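                        # forming answer; only the first letter is upper-cased because str.capitalize() would lowercase the rest of the name
                        answer= noun_phrase[:1].upper() + noun_phrase[1:] + " " + verb_type + " " + match.group(3) 
                        return answer
                        
                    else:
                        return "I am sorry, I don't know the answer." 
                        
                except Exception:
                    return "I am sorry, I don't know the answer."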
                    
            else:
                return "I am sorry, I don't know the answer."  
                                
        else: 
            return "I am sorry, I don't know the answer."
    
    else: 
        return "I am sorry, I don't know the answer."
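
The core idea of the function above is to reuse the part of a Wikipedia sentence that follows the verb as the predicate of the answer. A minimal sketch of that step, with a hypothetical helper `build_answer` and a hard-coded sentence in place of a Wikipedia lookup:

```python
import re


def build_answer(subject, verb, sentence):
    """Reuse the part of the sentence after the verb as the answer's predicate."""
    match = re.search(r"\b" + re.escape(verb) + r"\b\s+(.*)", sentence)
    if match is None:
        return None
    return f"{subject} {verb} {match.group(1)}"


sentence = "Ada Lovelace was an English mathematician and writer."
print(build_answer("Ada Lovelace", "was", sentence))
# Ada Lovelace was an English mathematician and writer.
```

The `\b` word boundaries keep a short verb such as "is" from matching inside a longer word like "this", and `re.escape` guards against verbs that contain regex metacharacters.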

In [299]:
# The function is used to fetch data from wikipedia using wikipedia library

def wikisearch(searchterms):
    
    wiki_result = wikipedia.search(searchterms)
    
    if not wiki_result:  # No matching page: return an empty list so callers can iterate safely
        return []
    
    try:
        firstresult = wiki_result[0]  # Extracting information from the first page
        logging.info("The first result from wikipedia: %s",firstresult)  # Logging first result got from wikipedia
        
        # Extracting content from the first matching page
        page_content = wikipedia.page(firstresult, auto_suggest=False).content
        
    except wikipedia.DisambiguationError: # The first title is ambiguous, so fall back to the second result
        if len(wiki_result) < 2:
            return []
        
        firstresult = wiki_result[1]  # Extracting information from the second page
        logging.info("The second result from wikipedia: %s",firstresult)  # Logging second result got from wikipedia
        
        try:
            # Extracting content from the second matching page
            page_content = wikipedia.page(firstresult, auto_suggest=False).content
        except wikipedia.exceptions.WikipediaException: # The second page could not be loaded either
            return []
        
    except wikipedia.PageError: # The first title does not resolve to a page
        return []
    
    # Tokenizing the page into sentences and returning them to the calling function
    return sent_tokenize(page_content) if page_content else []
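
The fallback idea in `wikisearch` can be generalised to "try each search result until one resolves". The sketch below shows that control flow offline. It is not the notebook's code: `AmbiguousTitle` stands in for `wikipedia.DisambiguationError`, and `first_usable_page`, `fake_fetch` and `FAKE_PAGES` are hypothetical names backed by fake data so the example runs without network access.

```python
class AmbiguousTitle(Exception):
    """Stand-in for wikipedia.DisambiguationError in this offline sketch."""


def first_usable_page(titles, fetch):
    """Return the content of the first title that fetch() can resolve, else ''."""
    for title in titles:
        try:
            return fetch(title)
        except AmbiguousTitle:
            continue  # ambiguous title: try the next search result
    return ""


# Fake data so the sketch runs without network access.
FAKE_PAGES = {"Mercury (planet)": "Mercury is the first planet from the Sun."}


def fake_fetch(title):
    if title not in FAKE_PAGES:
        raise AmbiguousTitle(title)
    return FAKE_PAGES[title]


print(first_usable_page(["Mercury", "Mercury (planet)"], fake_fetch))
# Mercury is the first planet from the Sun.
```

Returning an empty string instead of raising keeps the caller's "I don't know" path simple, at the cost of hiding why the lookup failed; logging the skipped titles would recover that information.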

In [300]:
# The main function of system

def main():
    
    exit_words = ['bye','Good bye','exit','quit']
    
    Flag = True
    
    # Welcome screen message
    print("Welcome to Question Answering system")
    print('''This is a QA system by Team1. It will try to answer questions that start with Who, What, When or Where.''')
    print("Please enter exit to leave the Q&A session")
    logging.info("==========QA-System Start==========")


    while(Flag==True):  
        user_question = input() # Taking user question
        
        logging.info("\n User Question: %s ",user_question) # Logging user question
        
        exitword = re.fullmatch(r'\s*(good\s*)?(bye|exit|quit)\W*', user_question, re.IGNORECASE)  # Exit only when the whole input is an exit word, so a question such as "What is Brexit?" is still answered
    
        if exitword:
            print("Thank you for your time! Goodbye.")
            logging.info("Thank you for your time! Goodbye.")
            
            Flag = False
            
        else:
            response = get_answer(user_question)
            
            logging.info("\n System response: %s",response) # logging answer
            print(response)
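
Exit handling is easiest to reason about when it is tested on its own. A minimal sketch, assuming that only inputs consisting entirely of an exit word should end the session; `EXIT_PATTERN` and `is_exit_command` are hypothetical names:

```python
import re

EXIT_PATTERN = re.compile(r"\s*(?:good\s*)?(?:bye|exit|quit)\W*", re.IGNORECASE)


def is_exit_command(text):
    """True only when the whole input is an exit word, ignoring case and punctuation."""
    return EXIT_PATTERN.fullmatch(text) is not None


print(is_exit_command("bye"))              # True
print(is_exit_command("Good bye!"))        # True
print(is_exit_command("What is Brexit?"))  # False
```

Using `fullmatch` rather than `search` is the design choice that matters here: a substring search would treat any question containing "exit" or "quit" as a request to leave.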

In [320]:
main()

Welcome to Question Answering system
This is a QA system by Team1. It will try to answer questions that start with Who, What, When or Where.
Please enter exit to leave the Q&A session


 Who is Emily Dickinson?






I am sorry, I don't know the answer.


 bye


Thank you for your time! Goodbye.
