Group Programming Assignment Two-QA System <br>
Kori Fogle, Michaela Herrick, Jackson Holland, Ishaan Indoori <br>
AIT 526 <br>
6/18/2025 <br>

<b> Problem to be Solved </b> <br>
The goal of this program is to create a question answering system that runs from the command line. Questions must begin with "who," "what," "when," or "where," and each answer must repeat a portion of the question back to the user. The system must respond gracefully to unclear questions and to questions for which no answer can be found. It must also write a log file and let users exit the chat.
 <br>
<b> Algorithm and Flow</b><br>
Our program uses a class that contains several functions to accomplish the above goal. First, we load all relevant packages; in particular, the program uses a Wikipedia API to retrieve answers. We then initialize a log file and give the user a way to exit. The program uses regular expressions and if statements to parse user input. The user's input is categorized into one of five types of questions: a who question, a what question, a where question, a when question, or an unclear/unanswerable question. All questions except unclear/unanswerable ones are then queried using the Wikipedia API, and the answer returned to the user rephrases part of the original question. When the program is given a question it cannot answer, it alerts the user with a phrase such as "I'm sorry, I don't quite know the answer." Users can exit the program by typing the word "exit." The program also writes a log file that can be reviewed after the program ends to see the input and output. Finally, error handling addresses specific errors, such as pages not loading or too many possible answers being returned.
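
The classification and rephrasing steps described above can be sketched roughly as follows. This is a minimal illustration, not the full program: the names `classify_question`, `extract_subject`, and `rephrase` are hypothetical, the pattern set covers only a few phrasings, and the fact is passed in by hand instead of being retrieved from Wikipedia.

```python
import re

QUESTION_WORDS = ("who", "what", "when", "where")

# Hypothetical, deliberately small pattern set: one pattern per question word.
SUBJECT_PATTERNS = {
    "who": r"who (?:is|was|are) (.+)",
    "what": r"what (?:is|was|are) (.+)",
    "when": r"when (?:is|was) (.+) born",
    "where": r"where (?:is|was|are) (.+)",
}

# Hypothetical answer templates that repeat the subject of the question.
ANSWER_TEMPLATES = {
    "who": "{subject} is {fact}.",
    "what": "{subject} is {fact}.",
    "when": "{subject} was born on {fact}.",
    "where": "{subject} is located in {fact}.",
}


def classify_question(question):
    """Return the leading question word, or None for an unclear question."""
    words = question.strip().lower().split()
    if words and words[0] in QUESTION_WORDS:
        return words[0]
    return None


def extract_subject(question):
    """Return the subject of the question, or None if no pattern matches."""
    qtype = classify_question(question)
    if qtype is None:
        return None
    match = re.match(SUBJECT_PATTERNS[qtype], question.strip(), re.IGNORECASE)
    if match is None:
        return None
    # Drop surrounding spaces and a trailing question mark.
    return match.group(1).strip(" ?")


def rephrase(question, fact):
    """Build an answer that repeats part of the question back to the user."""
    qtype = classify_question(question)
    subject = extract_subject(question)
    if qtype is None or subject is None:
        return "I'm sorry, I don't quite know the answer."
    return ANSWER_TEMPLATES[qtype].format(subject=subject.title(), fact=fact)


print(rephrase("When was George Washington born?", "February 22, 1732"))
print(rephrase("How tall is the Eiffel Tower?", "unknown"))
```

Keeping the question word, the subject pattern, and the answer template in parallel dictionaries is one possible design; it makes it easy to see which phrasings are supported for each question type.
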
<br>
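
For "when" questions, the answer depends on pulling a date out of the retrieved text. The sketch below shows one way this could be done with a regular expression. The function name `first_date` and the sample sentences are written by hand for illustration only; they are not output from the program, and the pattern is an assumption that handles only month-name dates and bare four-digit years.

```python
import re

# Matches a month-name date such as "February 22, 1732", or a bare four-digit year.
# Note that any four-digit number is treated as a year, which can produce false matches.
DATE_PATTERN = re.compile(r"\b([A-Z][a-z]+ \d{1,2}, \d{4}|\d{4})\b")


def first_date(text):
    """Return the first date-like string found in the text, or None."""
    match = DATE_PATTERN.search(text)
    return match.group(1) if match else None


# Hand-written sample sentences used only to exercise the pattern.
person_text = ("George Washington (February 22, 1732 - December 14, 1799) "
               "was the first president of the United States.")
event_text = "The war began in 1914."

print(first_date(person_text))
print(first_date(event_text))
print(first_date("No dates are mentioned here."))
```
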
<b> Example of Input and Output</b><br>

<b> Usage Instructions</b><br>
<ol>
<li>Execute the program</li>
<li>You'll be asked to input a name for the log file. Name this file as you would any other file. It will be accessible in the same working directory as this program.</li>
<li>Instructions will prompt you to start all questions with who, what, when, or where.</li>
<li>Ask a question and wait for the response</li>
<li>Stop asking questions by typing the word "exit."</li>
</ol><br>
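
As a rough illustration of the logging step, the sketch below appends each question and answer to a plain-text file using the same `Question:` / `Answer:` layout as the program's `log_question` method. The helper names `append_to_log` and `read_log` and the file name `qa_log.txt` are assumptions made for this example, and a temporary directory stands in for the working directory.

```python
import os
import tempfile


def append_to_log(logfile, question, answer):
    """Append one question/answer pair to the log, followed by a blank line."""
    with open(logfile, "a", encoding="utf-8") as log:
        log.write(f"Question: {question}\n")
        log.write(f"Answer: {answer}\n\n")


def read_log(logfile):
    """Return the full contents of the log file as a string."""
    with open(logfile, encoding="utf-8") as log:
        return log.read()


# A temporary directory is used here so the example leaves no file behind
# in the working directory.
log_path = os.path.join(tempfile.mkdtemp(), "qa_log.txt")
append_to_log(log_path, "who is ada lovelace", "n/a")
append_to_log(log_path, "what is python", "n/a")
log_text = read_log(log_path)
print(log_text)
```

Opening the file in append mode means earlier entries survive across questions, so the log can be reviewed in full after the chat ends.
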

<b> References</b><br>
Anon. 2024. “Python Regex Cheat Sheet.” GeeksforGeeks. Retrieved May 31, 2025 (https://www.geeksforgeeks.org/python-regex-cheat-sheet/).<br>
Dib, Firas. n.d. “Regex101 - Online Regex Editor and Debugger.” @Regex101.(https://regex101.com/).<br>
Gadiraju, Sai Surya. n.d. “Program Assignment 2 Demo Video.”<br>
Jurafsky, Daniel, and James H. Martin. 2025. Speech and Language Processing (3rd Ed. Draft).<br>
Liao, Duoduo. n.d. "Tips and Hints- QA System Programming Assignment-2."


In [None]:
#Load relevant packages and en_core_web_sm
import sys
import wikipedia
import wikipedia.exceptions as wiki_exceptions
import nltk
from nltk.tokenize import sent_tokenize
import re
import spacy

nlp = spacy.load("en_core_web_sm")

In [None]:
#Much of the code below is derived from (Liao, n.d.) and (Gadiraju, n.d.).
#Create a class to hold functions
class QA_System:
    #Function to set up logging
    def __init__(self, logfile): 
        self.logfile = logfile
    #Function to set up greeting and user input, indicate way to exit chat    
    def run(self):
        print("*** This is a QA system. I will try to answer questions that start with Who, What, When, and Where. Type Exit to quit. ***")
        while True:
            try:
                #clean user input and make it all lowercase
                question = input ("*?>").strip().lower()
                if question == "exit":
                    print("Thank you, goodbye")
                    break
                self.answer_question(question)
            except Exception as e:
                print(f"An error occurred: {e}")
                continue
    #Function that processes the user's question with spaCy, identifies the question type, and returns output to the user
    def answer_question(self, question):
        #Parse the question so its named entities are available as a fallback search term
        doc = nlp(question)
        question_type = self.identify_question_type(question)
        if question_type is None:
            print("I'm sorry I don't quite know the answer to this question.")
            self.log_question(question, "n/a")
            return 
        refined_query = self.extract_context(question)
        if not refined_query:
            refined_query = self.extract_dynamic_entity(doc, question_type)
        if refined_query: 
            print(f"Trying to search Wikipedia for the question: {refined_query}")
            self.search_wikipedia(refined_query, question_type, question)
        else: #If there is no answer found
            print("I'm sorry, but I was unable to find an answer. Make sure you've phrased your question correctly.")
            self.log_question(question, "Couldnt find answer")

    #Function to determine if the question is who, what, when, or where based on its first word
    def identify_question_type(self, question):
        question_lower = question.lower()
        if question_lower.startswith("who"):
            return "Who"
        elif question_lower.startswith("what"):
            return "What"
        elif question_lower.startswith("when"):
            return "When"
        elif question_lower.startswith("where"):
            return "Where"
        return None

    #Function containing regular expressions corresponding to the question types to get subjects from user input
    def extract_context(self, question):
        #Each pattern is paired with the group to return: 1 is the captured subject, 0 is the whole matched question.
        #Patterns are tried in order, so the more specific ones are listed before the general ones.
        patterns = [
            #Who
            (r'Who (?:is|was|are) (.+)', 1),
            #The verb matters for these questions, so the whole question is used as the search term
            (r'Who (?:owns|founded|leads|led|made|makes|created|invented|discovered|wrote) (.+)', 0),
            #What
            (r'What (?:is|was) (.+) age', 1),
            (r'What (?:is|was) (.+)', 1),
            (r'What (.+)', 1),
            #When
            (r'When (?:is|was) (.+) born', 1),
            (r'When (?:is|was) (.+) birthday', 1),
            (r'When did (.+)', 1),
            (r'When (.+) born', 1),
            (r'When (.+) birthday', 1),
            #Where
            (r'Where (?:is|was|are|did) (.+)', 1),
            (r'Where (.+)', 1)]

        for pattern, group in patterns:
            match = re.match(pattern, question, re.IGNORECASE)
            if match:
                #Drop surrounding spaces and a trailing question mark from the search term
                return match.group(group).strip(" ?")
        return None
    
    #Function to determine the entity of a question
    def extract_dynamic_entity(self, doc, question_type):
        entities = [ent.text for ent in doc.ents if ent.label_ in {"PERSON", "ORG", "GPE", "DATE"}]
        if entities:
            return " ".join(entities)
        return re.sub(r'[^a-zA-Z0-9\s]', '', doc.text).strip().lower()

    #Function to search Wikipedia and build an answer from a five-sentence summary of the first result found
    def search_wikipedia(self, query, question_type, question):
        try:
            search_results = wikipedia.search(query)
            if search_results:
                summary = wikipedia.summary(search_results[0], sentences = 5)
                #print(f"test: {summary}")
                meaningful_summary = self.summarize_text(summary, question_type, query)
                if meaningful_summary:
                    print(f"=> {meaningful_summary}")
                    self.log_question(question, meaningful_summary)
                else:
                    print("I am sorry I cannot seem to find the answer.")
                    self.log_question(question, "Couldnt find answer")
            else:
                print("I am sorry I cannot seem to find the answer.")
                self.log_question(question, "Couldnt find answer.")
        #Other possible errors and messages to user when they occur
        except wiki_exceptions.DisambiguationError as e:
            print("This question is rather ambiguous, but here are some possible answers.")
            for option in e.options[:5]:
                print(f"-{option}")
            self.log_question(question, "Ambiguous question.")   
        except wiki_exceptions.PageError:
            print("Unfortunately I could not find a page on that topic.")
            self.log_question(question, "No Pages Found")
        except wiki_exceptions.HTTPTimeoutError:
            print("There's a network error, check your internet connection and try again.")
            self.log_question(question,"Time out error")

    #Function to take the information from the Wikipedia summary and shape it into an answer based on the question type
    def summarize_text(self, text, question_type, query):
        sentences = sent_tokenize(text)
        results = []
        # Use relevant patterns to search for the correct information based on the question type
        if question_type == "Who":
            persons = self.clean_display_name(query)

            for sentence in sentences:
                if persons in sentence or "was" in sentence or "is" in sentence:
                    #Return the first two sentences joined into a single string
                    return " ".join(sentences[0:2])

            return "No info found"
        
        elif question_type == "What":
            return sentences[0] # For "What", return the first sentence as a general definition
  
        elif question_type == "When":
            clean_name = self.clean_display_name(query)
            doc = nlp(query)
            is_person = any(ent.label_ == "PERSON" for ent in doc.ents)
            if is_person:
                birth_match = re.search(r'\b([A-Z][a-z]+ \d{1,2}(?:, \d{4})?)', text)
                if birth_match:
                    name = query.title()
                    date = birth_match.group(1)
                    return f"{name} was born on {date}."
            else:
                match = re.search(
                r'(?:started|began|occurred|took place|was fought|was held|broke out|commenced)(?: in| on)? ([A-Z][a-z]+ \d{1,2}, \d{4}|\d{4})', text)
                if match:
                    date = match.group(1)
                    return f"{clean_name} began in {date}."
                date_match = re.search(r'\b(?:in )?(\d{4})\b', text)
                if date_match:
                    return f"{clean_name} occurred in {date_match.group(1)}."
            return "Date or time information not found."
       
        elif question_type == "Where":
            for sentence in sentences:
                if "GPE" in [ent.label_ for ent in nlp(sentence).ents]:
                    return sentence
            return "Location information not found."

    #Function to add questions and answer to log file
    def log_question(self, question, answer):
        with open(self.logfile, 'a', encoding='utf-8') as log:
            log.write(f"Question: {question}\n")
            log.write(f"Answer: {answer}\n\n")

    #Function to strip and clean user input, extract a portion of the question to be used when returning an answer to the user.
    def clean_display_name(self, query):
        query = re.sub(r'\b(start(ed)?|begin|began|occur(red)?|happen(ed)?|was|did|when)\b', '', query, flags=re.IGNORECASE)
        query = re.sub(r'\s+', ' ', query).strip("? ").strip()
        return query.title()

        
#Function to execute the q/a chat
def main():
    #First ask user to name the log file
    log_filename = input("Enter the name of the log file: ").strip()
    try:
        qa_system= QA_System(log_filename)
        qa_system.run()
    except Exception as ex:
        print(ex)
    finally:
        print("Log is saved.")
if __name__ == "__main__": 
    main()