# FAQ based Q&A system prototype
Version: 1.0 <br>
Developer: Anirban Saha. <br>

**Drawbacks:**


*   Uses ElasticSearch Cloud. It is free version. It expires by 20.03.2021.
*   A few approaches are rudimentary and primitive.

**Future work includes:**


*   Using ElasticSearch in local machine.
*   Scraping FIN websites, create knowledge base.

*   Making a textual entailment model. It would take user's query, a FAQ question and see if the query entails the existing FAQ question. If yes, then it should return the answer.


*   Bettering the approach towards understanding intent - "Asking for link".
*   Creation of better knowledge.







# Importing and downloading stuff
For this section, please run the cells individually. Also please take a note of the comments.

In [1]:
import spacy
#!python -m spacy download en_core_web_lg
# After downloading en_core_eb_lg, please restart runtime.

In [2]:
!pip install transformers
!pip install elasticsearch
!pip install elastic_app_search



In [3]:
import pandas as pd 
import requests
import warnings
warnings.filterwarnings('ignore')
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline
from elasticsearch import helpers, Elasticsearch 
from elastic_app_search import Client
from nltk.tokenize import word_tokenize

# Loading the datasets.
Loads the datasets to local variable. Nothing fancy. Pretty straightforward.

In [4]:
"""
Description: Takes download url, filename. Downloads it. Saves. 
"""
def download_file(download_url,filename):
  response = requests.get(download_url)
  with open('/content/'+filename+'.csv', 'wb') as f:
      f.write(response.content)
  return '/content/'+filename+'.csv'

In [5]:
"""
Description: loads data from file.
"""
def load_links(file_path): 
  links = pd.read_csv(file_path) 
  return links

In [6]:
def load_faq(file_path):   
  faq_data = pd.read_csv(file_path) 
  return faq_data

In [7]:
file_path_faq = download_file("https://www.anirbansaha.com/wp-content/uploads/2021/03/faq.csv","faq")
file_path_links = download_file("https://www.anirbansaha.com/wp-content/uploads/2021/03/links.csv","links")

In [8]:
"""
Description: Loads the data and saves it in local variable for use. 
"""
links = load_links(file_path_links)
faq_data = load_faq(file_path_faq)

# Connecting to ElasticSearch Cloud
Connects to the ElasticSearch Cloud. If you are replicating it, please take a note of the base endpoint. <br>


*   Create account, do stuff, following this: https://youtu.be/mIHYcxe70fc
*   I have converted csv files to json, uploaded to "documents" section of ElasticSearch Enterprise Search.
*   CSV --> Json: https://csvjson.com/ 



In [9]:
config = { "appsearch":{
              "base_endpoint":"2aa49783e67340e585db1d090ca796d0.ent-search.eastus2.azure.elastic-cloud.com/api/as/v1",
              "api_key":"private-3dd7pg3n5dr5nmitvn8647i6"
              }
          }
#note: do not forget to add the "/api/as/v1" at the end of the endpoint. 
client = Client(
   base_endpoint=config['appsearch']['base_endpoint'],
   api_key=config['appsearch']['api_key'],
   use_https=True)
engine_name = "inf-faq"

# The set of Questions we would primarily test this with.

In [10]:
questions = ["Give me the link to the mentors",
             "Where can I find the podcasts about student jobs?",
             "Who can sign a care-of letter?",
             "If I lose my student id card, where should i report?",
             "What documents do i need for visa extension?",
             "Where can I print documents?",
             "which is the computer science faculty building?",
             "when should we apply for jobs?",
             "When will I get a student card?",
             "Where can I buy coffee?",
             "What’s a “care-of letter”?",
             "What is cold rent?",
             "How can I find accommodation?",
             "What is FIN?",
             "Where will I find English speaking Doctors?",
             "Where will i find doctors who speak English?",
             "how should i choose my subjects?",
             "Where can i get a list of subjects offered by the university?",
             "Which health insurance should i take?"
             ]

# The Huggingface RoBERTa stuff
Nothing fancy. Code is self explanatory.

In [11]:
model_name = "deepset/roberta-base-squad2"
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)

Some weights of RobertaModel were not initialized from the model checkpoint at deepset/roberta-base-squad2 and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [12]:
def fetching_searchterm(passage):  
  passage = str(passage).replace("**",",")
  query = "What is he searching for?"
  QA_input = {
    'question': query,
    'context': passage
  }
  ans = nlp(QA_input)  
  if ans['score']>0.32:
    return(ans)
  else:
    return""

In [13]:
def ask_question_huggingfaceRoberta(user_question, passage):  
  passage = str(passage).replace("**",",")
  QA_input = {
    'question': user_question,
    'context': passage
  }
  ans = nlp(QA_input)  
  return(ans)

# The NLP stuff
Again, nothing fancy. Takes two sentences, calculates similarity (word movers distance). 

In [14]:
nlp_sim = spacy.load('en_core_web_lg')

In [15]:
def similarity_spacy(question1, question2):
    text1 = nlp_sim(question1)
    text2 = nlp_sim(question2)
    simi = text1.similarity(text2)
    return simi 

# The Q&A stuff
**find_probable_answer_from_faq**<br>
Takes user query, tries to find closest match in the FAQ questions. If there is a close match, it returns the answer.

**understanding_question_link**<br>
Tries to understand if the user is asking for a link. This piece of code is far from being perfect. I need help in this.

**get_link_answer**<br>
If we understand that the user is asking for a link, we check if we have the link. If we have, we return the link.

**answer_query**<br>
This is the main fancy program. I am adding inline comments to the code. Please check that.

In [73]:
def find_probable_answer_from_faq(query):
  selected_question = ""
  answer = ""
  highest_score = 0
  for index, row in faq_data.iterrows(): 
    similarity_score = similarity_spacy(row['question'], query) 
    if similarity_score>highest_score:
      answer = row['answers']
      question= row['question']
      highest_score = similarity_score
  
  if highest_score>0.9446: #Do not change this number.  
    return answer
  else: 
    return "" 

In [74]:
def preprocess_for_searchterm(query):
  query = query.replace("catalogue", "catalog")
  query = query.replace("examination", "exam")
  query = query.replace("module handbook", "modulehandbook")
  query = query.replace("module hand book", "modulehandbook")
  query = query.replace("module catalog", "modulehandbook")
  query = query.replace("modulhandbuch", "modulehandbook")
  query = query.replace("si@fin videos", "SI@FIN_videos")
  query = query.replace("exam office", "exam_office")
  query = query.replace("comic strips", "comics")
  return query

In [75]:
def fetching_interest(passage):  
  query = "What is he searching for?"
  QA_input = {
    'question': query,
    'context': passage
  }
  ans = nlp(QA_input) 
  if ans['score']>0.32:
    return(ans['answer'])
  else:
    return""

In [76]:
def understanding_question_link(query): 
  query = query.lower()
  query = preprocess_for_searchterm(query) 
  search_term = fetching_interest(query)
  return search_term

  if search_term == "":
    #do all the tamasha -_- 
    stopwords = ["a", "an", "the"]
    asking_for_link = False 
    asking = ['give','send', "what is", "search"]
    for ask in asking:
      if ask in query.split("link")[0]:
        asking_for_link = True #wrong logic

    if asking_for_link == True: 
      case = 0
      if "link to" in query or "link of" in query or "link for" in query: case = 1
      if "'s link" in query or "s link" in query: case = 2

      

      if case == 1:
        text_tokens = query.split("link")[1].strip().split(" ")
        tokens_without_sw = [word for word in text_tokens if not word in stopwords]
        return tokens_without_sw[1]
      
      if case == 2:
        topic = query.split("link")[0].strip().split(" ")[-1]
        topic = topic.replace("'s","")
        return topic
    
    return "not found"

In [77]:
understanding_question_link("search a list of courses offered by the university?")

''

In [78]:
def get_link_answer(query):
  topic = understanding_question_link(query) 
  try:
    if len(links[links['Description'].str.match(topic)])>0:
      return links[links['Description'].str.match(topic)].iloc[0]['Link']
  except:
    status = "pending"
  return ""

In [79]:
def preprocess(query):
  changes = {"course list":"module handbook",
             "list of courses":"module handbook",
             "subject":"course",
             "website link":"link",
             "website":"link",
             "si@fin":"SI@FIN",
             "SI@FIN videos":"SI@FIN_videos",
             "course videos":"SI@FIN_videos"
             }
  for key in changes:
    query = query.replace(key,changes[key]) 
  return query

In [90]:
def understand_intent(query, num):
  query = query.lower()
  intent_search = {"where is the":"search",
                    "where can I find the":"search",
                    "where will i find":"search",
                    "where can i get":"search",
                    "where will i get":"search" 
                  }
  searchterm = fetching_interest(query) 
  if (searchterm) and num == 0:
    return "search" 
  for key in intent_search:
    if key in query and num == 0: 
      return intent_search[key]
    if key in query and num == 1:  
      return query.replace(key, intent_search[key]) 
  if num==0: return ""
  if num==1: return query

In [86]:
def answer_query(query):
  #basic preprocessing. Very primitive work. Needs improvement. 
  query = preprocess(query) 
  intent = understand_intent(query, 0) 
  if (intent):print("intent: "+intent)
  #Checks for exact matches with the questions in FAQ.
  #Most probably, it will not be the case. 
  try:
    result_exact_match = faq_data[faq_data['question'].str.match(query)]
    if len(result_exact_match)>0: 
      return "found by exact match.", result_exact_match.iloc[0]['answers'].replace("**",",")
  except:
    status = "paining." #because i do not know what to do. LOL.
  
  #Checks if the user might be wanting a link as an answer.
  #If yes, and if the link exists, then it returns the link.
  #The exhaustive the list of links, the better the responses.
  #Currently, this is in a primitive state. 
  if "link" in query or intent == "search":
    temp_query = understand_intent(query, 1)
    answer = get_link_answer(temp_query)
    if len(answer)>0:
      return "Retrieved link.", answer  

  #Tries to find the best match in FAQ queries by ElasticSearch.
  #If yes, returns the entire passage as answer. 
  data = client.search(engine_name, query, {}) 
  score_passage = data['results'][0]['_meta']['score']
  if score_passage > 100: 
    return "found by similarity match by Elastic Search.", data['results'][0]['answers']['raw'].replace("**",",") 


  #Checks for the closest match in FAQ questions.
  #In case there is a match, it returns the answer. 
  answer = find_probable_answer_from_faq(query) 
  if len(answer) > 0: 
    return "found by similarity match.", answer.replace("**",",") + "\nInfo: (This answer is retrieved using similarity search from FAQ. In case this is not the answer you are looking for, please rephrase your question or ask a mentor.)"
  
  #This is the last case and I suppose this would be the most used case.
  #If the question is not matched, it will try searching for an answer from the literature.
  #The responses of this is not very accurate always. We do not want to give inaccurate answers to users.
  #To avoid this, we should do the following:
  # * keep adding questions to the repository
  # * write exhaustive answers in simple english sentences. Use less reference words 
  #.    like he, she, it, this, that. 
  answer = ask_question_huggingfaceRoberta(query, data['results'][0]['answers']['raw'])
  if answer['score'] > 0.01:
    return "found by RoBERTa.", answer['answer'].replace("**",",") + "\nInfo:(This answer might be wrong. Please consult with a mentor or the faculty.)"
  
  return "please consult a mentor."+answer_query("give me the link to mentors")

# The Moment of Truth.
It uses the existing set of questions. Tries to find answers to it.


In [96]:
# For the purpose of this notebook, we have a fixed set of questions.
# Ideally user_question is a query by the user. 
for query in questions:
  print('Users question: ' + query)
  explanation, final_answer = answer_query(query) 
  print(final_answer) 
  print("*"*50)

Users question: Give me the link to the mentors
intent: search
https://www.inf.ovgu.de/inf/en/Study/Being+a+student/Incoming/Mentors-p-5082.html
**************************************************
Users question: Where can I find the podcasts about student jobs?
Anirban Saha (mentor; July 2018 - July 2021) has made a series of podcasts where you’ll find students of M.Sc. Digital Engineering and Data and Knowledge Engineering share their experience doing student jobs in and near Magdeburg. Here is the link: https://www.anirbansaha.com/podcasts/student-jobs-datascience-students-magdeburg/
**************************************************
Users question: Who can sign a care-of letter?
your friend
Info:(This answer might be wrong. Please consult with a mentor or the faculty.)
**************************************************
Users question: If I lose my student id card, where should i report?
intent: search
If you have lost your student card, please report it immediately at the Campus Ser

# Q&A System.
Based on the FAQ for the incoming international students.<br>
Please ask full sentence questions, the way you would talk to a human being. -_- <br> and relevant to your onboarding in OVGU.<br>
Anirban Saha <br>
05.03.2021

In [97]:
query = input("query: ")
explanation, answer = answer_query(query)
print(answer)
if explanation == "found by RoBERTa.": print("Explanation: "+explanation)

query: today is a sunday. Where can i find doctors?
intent: search
There are some pharmacies which are open on Sundays, but they vary as they take turns. You can find information at https://m.aponet.de/notdienstsuche.html). You can also always call 116117, they will tell you.
Info: (This answer is retrieved using similarity search from FAQ. In case this is not the answer you are looking for, please rephrase your question or ask a mentor.)
