# Question Answering Bot using Chatterbot

ChatterBot is a Python library for generating automated responses to a user’s input. It uses a selection of machine learning algorithms to produce different types of responses, which makes it easy for developers to build chatbots and automate conversations with users. For more details about ChatterBot, see its [documentation page](https://chatterbot.readthedocs.io/en/stable/).

## 1. Import libraries

In [1]:
import pandas as pd

from chatterbot import ChatBot
from chatterbot.trainers import ListTrainer

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

from fuzzywuzzy import fuzz

[nltk_data] Downloading package punkt to /Users/macbook/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/macbook/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /Users/macbook/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [2]:
vaccine_faq = pd.read_csv('datasets/Vaccine FAQ.csv', encoding = "ISO-8859-1", index_col = 0).reset_index()
covid_faq = pd.read_csv('datasets/COVID FAQ.csv', encoding = "ISO-8859-1", index_col = 0).reset_index()

In [3]:
vaccine_faq.head(2)

| Unnamed: 0 | Topic | Source | Date Accessed | Question | Answer | Source: |
|---|---|---|---|---|---|---|
| 0 | Vaccines General Info | Department of Health | 3/31/2021 | How do vaccines differ? | Vaccines differ in their composition and how t... | https://doh.gov.ph/node/28189 |
| 1 | Vaccines General Info | Department of Health | 3/31/2021 | How does a flu vaccine differ from a COVID vac... | Vaccines differ in their composition and how t... | |


In [4]:
covid_faq.head(2)

| Unnamed: 0 | Topic | Source | Last Updated | Question | Answer |
|---|---|---|---|---|---|
| 0 | COVID-19 Disease | World Health Organization | 3/29/2021 | What is COVID-19? | COVID-19 is the disease caused by a new corona... |
| 1 | Symptoms | World Health Organization | 3/29/2021 | What are the symptoms of COVID-19? | The most common symptoms of COVID-19 are fever... |


## 2. Formatting the datasets

### 2.1 Create functions

Create a function for removing punctuation marks

In [5]:
def remove_punctuation(sentence):
    # Punctuation to strip; the backslash is escaped so the string literal is valid Python
    punc = "!()-[]{};:'\\,<>./?@#$%^&*_~"
    
    for char in sentence: 
        if char in punc: 
            sentence = sentence.replace(char, "")
            
    return sentence
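For comparison, `str.translate` can delete every punctuation character in a single pass. This sketch swaps in Python's `string.punctuation`, an assumption that covers a slightly wider set than the hand-written string above (it also strips `"`, `+`, `=`, `|` and the backtick); `remove_punctuation_fast` is a hypothetical name.

```python
import string

# Built once: maps every ASCII punctuation character to None (i.e. delete it)
PUNCT_TABLE = str.maketrans('', '', string.punctuation)

def remove_punctuation_fast(sentence: str) -> str:
    """Delete all ASCII punctuation characters in one pass."""
    return sentence.translate(PUNCT_TABLE)

print(remove_punctuation_fast("What is COVID-19?"))  # What is COVID19
```

Because hyphens are removed, "COVID-19" becomes "COVID19", which is why a `covid19` variant still has to be normalized later.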

Create a function for lemmatizing words

In [6]:
def lemmatize(sentence):
    lemmatizer = WordNetLemmatizer()
    word_tokens = word_tokenize(sentence) 
    
    lemmatized_sentence = ''
    
    for word in word_tokens:
        lemma = lemmatizer.lemmatize(word)
        lemmatized_sentence = lemmatized_sentence + str(lemma) + ' '
        
    return lemmatized_sentence.rstrip()

Create a function for removing stopwords, keeping the question words "where", "when" and "why"

In [7]:
def remove_stopwords(sentence):
    # NLTK's English stopwords, minus question words worth keeping
    stop_words = set(stopwords.words('english')) - {'where', 'when', 'why'}
    word_tokens = word_tokenize(sentence)

    # Keep only the tokens that are not stopwords, joined back with single spaces
    return ' '.join(w for w in word_tokens if w not in stop_words)
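The set subtraction is what keeps "where", "when" and "why" in the processed questions, presumably because those words change what is being asked. A minimal sketch with a tiny hypothetical stopword set (`SAMPLE_STOPWORDS`) standing in for NLTK's much longer list, and whitespace splitting in place of `word_tokenize`:

```python
# Hypothetical stand-in for NLTK's English stopword list (the real list is much longer)
SAMPLE_STOPWORDS = {'what', 'is', 'the', 'are', 'a', 'of', 'where', 'when', 'why'}

# Question words to keep, mirroring the set subtraction in the cell above
KEEP = {'where', 'when', 'why'}
STOP = SAMPLE_STOPWORDS - KEEP

def filter_stopwords(tokens):
    """Drop stopwords but keep the question words in KEEP."""
    return [t for t in tokens if t not in STOP]

print(filter_stopwords('when is the vaccine available'.split()))
# ['when', 'vaccine', 'available']
```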

Create a function that maps the different spellings of "COVID-19" (and the words "corona", "coronavirus" and "virus") to the single token "covid"

In [8]:
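def replace_tags(sentence):
    # Longer keys come first: replacing 'corona' before 'coronavirus' would turn
    # 'coronavirus' into 'covidvirus' and then, via 'virus', into 'covidcovid'
    tags = {'coronavirus': 'covid', 'covid-19': 'covid', 'covid 19': 'covid',
            'covid19': 'covid', 'corona': 'covid', 'virus': 'covid'}
    
    # Dicts keep insertion order, so replacements run in the order listed
    for old, new in tags.items():
        sentence = sentence.replace(old, new)
    
    return sentence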
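With plain `str.replace`, each substitution runs on the output of the previous one, so key order matters: a dictionary that lists 'corona' before 'coronavirus' rewrites "coronavirus" to "covidvirus" and then, via 'virus', to "covidcovid". A regular expression with word boundaries avoids both problems. `replace_in_order` and `normalize_covid` are hypothetical helpers for illustration:

```python
import re

def replace_in_order(sentence, tags):
    """Apply substring replacements one after another, in dict order."""
    for old, new in tags.items():
        sentence = sentence.replace(old, new)
    return sentence

naive = {'corona': 'covid', 'coronavirus': 'covid', 'virus': 'covid'}
print(replace_in_order('coronavirus vaccine', naive))  # covidcovid vaccine

# Whole-word alternatives only, so 'corona' cannot match inside 'coronavirus'
# and 'virus' cannot match inside 'antivirus'
TAG_PATTERN = re.compile(r'\b(?:coronavirus|covid 19|covid-19|covid19|corona|virus)\b')

def normalize_covid(sentence):
    """Replace every whole-word COVID-19 variant with 'covid'."""
    return TAG_PATTERN.sub('covid', sentence)

print(normalize_covid('coronavirus vaccine'))  # covid vaccine
print(normalize_covid('antivirus software'))   # antivirus software
```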

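Chain the steps into a single preprocessing function: the text is lowercased, stripped of punctuation, lemmatized, filtered for stopwords and finally normalized for COVID-19 variants.

In [9]: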
def preprocessor(sentence):
    return replace_tags(remove_stopwords(lemmatize(remove_punctuation(sentence.lower()))))
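The nested call reads inside out. A small left-to-right `pipeline` helper (hypothetical, not part of the notebook) lists the steps in the order they run; the demo uses simple string methods as stand-ins for the NLTK-based steps:

```python
from functools import reduce

def pipeline(*steps):
    """Return a function that applies `steps` to its input from left to right."""
    def run(text):
        return reduce(lambda acc, step: step(acc), steps, text)
    return run

# Stand-in steps; with the notebook's functions this would be
# pipeline(str.lower, remove_punctuation, lemmatize, remove_stopwords, replace_tags)
demo = pipeline(str.lower, str.strip, lambda s: s.replace('?', ''))
print(demo('  What is COVID-19?  '))  # what is covid-19
```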

### 2.2 Process the datasets

Turn the dataframes into flat lists that alternate question and answer (question 1, answer 1, question 2, answer 2, ...)

In [10]:
covid_list=[]
vaccine_list=[]
for i in range(0, len(vaccine_faq)):
    vaccine_list.append(vaccine_faq['Question'][i])
    vaccine_list.append(vaccine_faq['Answer'][i])
for i in range(0, len(covid_faq)):
    covid_list.append(covid_faq['Question'][i])
    covid_list.append(covid_faq['Answer'][i])
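The two loops interleave questions and answers into one flat list. The same layout can be built with `zip` and `itertools.chain`; `interleave` is a hypothetical helper:

```python
from itertools import chain

def interleave(questions, answers):
    """Flatten parallel question/answer sequences into [q0, a0, q1, a1, ...]."""
    return list(chain.from_iterable(zip(questions, answers)))

print(interleave(['Q1', 'Q2'], ['A1', 'A2']))  # ['Q1', 'A1', 'Q2', 'A2']
```

With the dataframes this would read `covid_list = interleave(covid_faq['Question'], covid_faq['Answer'])`.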

Preprocess the questions and keep the answers unchanged

In [11]:
covidfaqclean=[]
vaccinefaqclean=[]
for i in range(0,len(covid_list)//2):
    covidfaqclean.append(preprocessor(covid_list[2*i]).rstrip())
    covidfaqclean.append(covid_list[2*i+1])
for i in range(0,len(vaccine_list)//2):
    vaccinefaqclean.append(preprocessor(vaccine_list[2*i]).rstrip())
    vaccinefaqclean.append(vaccine_list[2*i+1])
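The index arithmetic (`2*i`, `2*i+1`) can be replaced by checking each item's position: even positions are questions and odd positions are answers. A compact sketch, with `clean_questions` as a hypothetical name and `str.lower` standing in for `preprocessor`:

```python
def clean_questions(qa_list, preprocess):
    """Apply `preprocess` to even positions (questions) and keep odd positions (answers)."""
    return [preprocess(item) if i % 2 == 0 else item
            for i, item in enumerate(qa_list)]

print(clean_questions(['What Is COVID?', 'A Disease.'], str.lower))
# ['what is covid?', 'A Disease.']
```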

## 3. Creating ChatBot Instances

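ChatBot instance for the COVID dataset. It stores its statements in its own SQLite database and runs three `BestMatch` logic adapters, each with a different similarity measure; `read_only=True` stops the bot from learning from users' questions.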

In [12]:
covidchatbot = ChatBot('COVIDBot', read_only = True,
    storage_adapter='chatterbot.storage.SQLStorageAdapter',
    logic_adapters=[      
        {'import_path': 'chatterbot.logic.BestMatch',
         'default_response': 'I am sorry, but I do not understand. I am still learning.',
         'statement_comparison_function': 'chatterbot.comparisons.jaccard_similarity',
         'maximum_similarity_threshold': 0.95},
        {'import_path': 'chatterbot.logic.BestMatch',
         'default_response': 'I am sorry, but I do not understand. I am still learning.',
         'statement_comparison_function': 'chatterbot.comparisons.levenshtein_distance',
         'maximum_similarity_threshold': 0.95},
        {'import_path': 'chatterbot.logic.BestMatch',
         'default_response': 'I am sorry, but I do not understand. I am still learning.',
         'statement_comparison_function': 'chatterbot.comparisons.synset_distance',
         'maximum_similarity_threshold': 0.95}
    ],
    database_uri='sqlite:///coviddatabase.sqlite3',
    trainer='chatterbot.trainers.ListTrainer'
)

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/macbook/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt to /Users/macbook/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/macbook/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
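Each `BestMatch` adapter compares the input with known statements using its own similarity function. As I understand the ChatterBot docs, when several logic adapters are configured the response with the highest confidence wins, and `maximum_similarity_threshold` lets the search stop once a close enough statement is found. The sketch below imitates only the matching step in plain Python; `jaccard_similarity`, `sequence_similarity` and `best_match` are simplified stand-ins, not ChatterBot's implementations, and ChatterBot would then return a stored reply to the matched statement rather than the statement itself:

```python
from difflib import SequenceMatcher

def jaccard_similarity(a: str, b: str) -> float:
    """Share of distinct words the two strings have in common (0.0 to 1.0)."""
    words_a, words_b = set(a.split()), set(b.split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0

def sequence_similarity(a: str, b: str) -> float:
    """Character-level similarity ratio from difflib (0.0 to 1.0)."""
    return SequenceMatcher(None, a, b).ratio()

def best_match(query, candidates, comparators, max_threshold=0.95):
    """Return (candidate, score) with the highest score across all comparators.

    Stops early once a score reaches max_threshold, loosely mirroring
    BestMatch's maximum_similarity_threshold option.
    """
    best, best_score = None, 0.0
    for compare in comparators:
        for candidate in candidates:
            score = compare(query, candidate)
            if score > best_score:
                best, best_score = candidate, score
            if best_score >= max_threshold:
                return best, best_score
    return best, best_score

known = ['covid symptom', 'when covid vaccine ready distribution', 'covid vaccine safe']
print(best_match('covid vaccine safe', known, [jaccard_similarity, sequence_similarity]))
# ('covid vaccine safe', 1.0)
```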


ChatBot instance for the vaccine dataset

In [13]:
vaccinechatbot = ChatBot('VaccineBot', read_only = True,
    storage_adapter='chatterbot.storage.SQLStorageAdapter',
    logic_adapters=[      
        {'import_path': 'chatterbot.logic.BestMatch',
         'default_response': 'I am sorry, but I do not understand. I am still learning.',
         'statement_comparison_function': 'chatterbot.comparisons.jaccard_similarity',
         'maximum_similarity_threshold': 0.95},
        {'import_path': 'chatterbot.logic.BestMatch',
         'default_response': 'I am sorry, but I do not understand. I am still learning.',
         'statement_comparison_function': 'chatterbot.comparisons.levenshtein_distance',
         'maximum_similarity_threshold': 0.95},
        {'import_path': 'chatterbot.logic.BestMatch',
         'default_response': 'I am sorry, but I do not understand. I am still learning.',
         'statement_comparison_function': 'chatterbot.comparisons.synset_distance',
         'maximum_similarity_threshold': 0.95}
    ],
    database_uri='sqlite:///vaccinedatabase.sqlite3',
    trainer='chatterbot.trainers.ListTrainer'
)

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/macbook/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt to /Users/macbook/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/macbook/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
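The two `ChatBot` cells differ only in the bot name and database file. A hypothetical helper, `make_bot_config`, can build the shared keyword arguments once; it only assembles the same settings as a dictionary and does not import ChatterBot:

```python
DEFAULT_RESPONSE = 'I am sorry, but I do not understand. I am still learning.'
COMPARISONS = [
    'chatterbot.comparisons.jaccard_similarity',
    'chatterbot.comparisons.levenshtein_distance',
    'chatterbot.comparisons.synset_distance',
]

def make_bot_config(db_name, threshold=0.95):
    """Keyword arguments shared by both ChatBot instances (same values as the cells above)."""
    return {
        'read_only': True,
        'storage_adapter': 'chatterbot.storage.SQLStorageAdapter',
        # One BestMatch adapter per comparison function
        'logic_adapters': [
            {'import_path': 'chatterbot.logic.BestMatch',
             'default_response': DEFAULT_RESPONSE,
             'statement_comparison_function': comparison,
             'maximum_similarity_threshold': threshold}
            for comparison in COMPARISONS
        ],
        'database_uri': f'sqlite:///{db_name}.sqlite3',
        'trainer': 'chatterbot.trainers.ListTrainer',
    }

config = make_bot_config('coviddatabase')
print(len(config['logic_adapters']), config['database_uri'])
# 3 sqlite:///coviddatabase.sqlite3
```

The bots would then be created with, for example, `ChatBot('COVIDBot', **make_bot_config('coviddatabase'))`.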


## 4. Training

In [14]:
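# Uncomment to clear previously stored statements before retraining;
# otherwise running the training cells again may add duplicate entries
#covidchatbot.storage.drop()
#vaccinechatbot.storage.drop()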

Train the COVID bot on the processed COVID FAQ

In [15]:
covid_training_data = covidfaqclean
covidtrainer = ListTrainer(covidchatbot)
covidtrainer.train(covid_training_data)

List Trainer: [####################] 100%


Train the vaccine bot on the processed vaccine FAQ

In [16]:
vaccine_training_data = vaccinefaqclean
vaccinetrainer = ListTrainer(vaccinechatbot)
vaccinetrainer.train(vaccine_training_data)

List Trainer: [####################] 100%
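ChatterBot's documentation describes the list passed to `ListTrainer.train` as a conversation, with each statement treated as a response to the one before it. Training on one long `[q0, a0, q1, a1, ...]` list therefore also records `q1` as a reply to `a0`, which may explain why the session at the end answers a question with another (preprocessed) question. This plain-Python sketch shows which (prompt, reply) links each approach produces; `conversation_pairs` is a hypothetical helper that models only this linking, not ChatterBot's storage, and it assumes each `train()` call starts a new conversation:

```python
def conversation_pairs(statements):
    """(prompt, reply) links made when a list is treated as one conversation."""
    return list(zip(statements, statements[1:]))

faq = ['q0', 'a0', 'q1', 'a1']

# One train() call over the whole list also links each answer to the next question
print(conversation_pairs(faq))
# [('q0', 'a0'), ('a0', 'q1'), ('q1', 'a1')]

# One call per question/answer pair keeps only the intended links
per_pair = [link
            for start in range(0, len(faq), 2)
            for link in conversation_pairs(faq[start:start + 2])]
print(per_pair)
# [('q0', 'a0'), ('q1', 'a1')]
```

In ChatterBot terms the per-pair variant would be roughly `for q, a in zip(covidfaqclean[0::2], covidfaqclean[1::2]): covidtrainer.train([q, a])`, run after clearing the stored statements (an untested suggestion).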


## 5. Getting recommended questions and feedback

Create a function that suggests five related FAQ questions, ranked by fuzzy string similarity to the user's question

In [17]:
def recommend_questions(question_input, dataset):
    # Score every FAQ question against the user's question (0-100)
    scores = [fuzz.token_set_ratio(q, question_input) for q in dataset['Question']]
    # Rank question positions from most to least similar
    ranked = sorted(enumerate(scores), key=lambda x: x[1], reverse=True)
    # Skip the closest match (likely the question just answered) and keep the next five
    question_indices = [i for i, _ in ranked[1:6]]
    recommended = dataset[['Question', 'Answer']].iloc[question_indices]
    print("Looking for another answer? Try these questions:")
    print(recommended.to_string(header=False, index=False))
    return recommended
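The ranking relies on fuzzywuzzy's `token_set_ratio`, which compares the words two strings share with the words each has on its own, so word order and extra words matter little. The sketch below re-creates the idea with `difflib` and is simplified (no punctuation handling, and fuzzywuzzy may use python-Levenshtein for the underlying ratio), so its scores can differ from the library's. `simple_token_set_ratio` and `recommend` are hypothetical names; like the cell above, `recommend` skips the single closest match:

```python
from difflib import SequenceMatcher

def ratio(a, b):
    """Similarity of two strings on a 0-100 scale."""
    return round(100 * SequenceMatcher(None, a, b).ratio())

def simple_token_set_ratio(a, b):
    """Simplified token-set similarity: compare shared words with each side's extras."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    common = ' '.join(sorted(words_a & words_b))
    with_a = f"{common} {' '.join(sorted(words_a - words_b))}".strip()
    with_b = f"{common} {' '.join(sorted(words_b - words_a))}".strip()
    return max(ratio(common, with_a), ratio(common, with_b), ratio(with_a, with_b))

def recommend(query, questions, k=5):
    """Return up to k questions ranked by similarity, skipping the single best match."""
    scores = [simple_token_set_ratio(q, query) for q in questions]
    # sorted() is stable, so tied questions keep their original order
    ranked = sorted(range(len(questions)), key=lambda i: scores[i], reverse=True)
    return [questions[i] for i in ranked[1:k + 1]]

questions = ['covid vaccine side effects',
             'covid vaccine side effects in children',
             'travel rules']
print(simple_token_set_ratio('covid vaccines', 'what are the available covid vaccines'))  # 100
print(recommend('covid vaccine side effects', questions))
# ['covid vaccine side effects in children', 'travel rules']
```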

Create a function that asks the user for yes/no feedback

In [18]:
def get_feedback():
    # Keep asking until the user answers with a clear yes or no
    while True:
        text = input().strip().lower()
        if text in ('yes', 'y'):
            return True
        if text in ('no', 'n'):
            return False
        print('Please type either "Yes" or "No"')
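Plain substring checks are fragile for yes/no parsing: "I don't know" contains "no" inside "know". Separating the parsing from `input()` also makes it easy to test. `parse_feedback` and `substring_feedback` are hypothetical helpers; the second is a naive substring-based parser kept only for contrast:

```python
from typing import Optional

def parse_feedback(text: str) -> Optional[bool]:
    """Map a reply to True (yes), False (no) or None (unrecognized)."""
    answer = text.strip().lower()
    if answer in {'yes', 'y'}:
        return True
    if answer in {'no', 'n'}:
        return False
    return None

def substring_feedback(text: str) -> Optional[bool]:
    """Naive parser that looks for 'yes' or 'no' anywhere in the text."""
    t = text.lower()
    if 'yes' in t:
        return True
    if 'no' in t:
        return False
    return None

print(substring_feedback("I don't know"))  # False ('know' contains 'no')
print(parse_feedback("I don't know"))      # None
```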

## 6. QA Process

In [None]:
# Each category maps to its bot, its FAQ dataframe and its trainer
bots = {
    'covid': (covidchatbot, covid_faq, covidtrainer),
    'vaccine': (vaccinechatbot, vaccine_faq, vaccinetrainer),
}

while True:
    category = input('\nChoose category (COVID or Vaccine): ').strip().lower()
    if category == 'end':
        break
    if category not in bots:
        print('Please type either "COVID" or "Vaccine" (or "end" to stop).')
        continue

    bot, faq, trainer = bots[category]
    question = input('\nQuestion: ')
    if question.strip().lower() == 'end':
        break

    # Preprocess the question exactly like the training questions
    response = bot.get_response(preprocessor(question))
    print('\nResponse: ' + str(response))
    print('Confidence: ' + str(response.confidence))
    print("\n")
    # Recommend questions from the same FAQ as the chosen bot
    recommend_questions(question, faq)
    print('\nI am still learning. Does the response answer your question? Please type yes or no.')
    # Uncomment to let users correct wrong answers
    #if get_feedback() is False:
    #    correct_response = input('Please input correct response: ')
    #    trainer.train([preprocessor(question), correct_response])
    #    print('Response added to bot!')


Choose category (COVID or Vaccine): vaccine

Question: what vaccines are available?

Response: when covid vaccine ready distribution
Confidence: 0.02


Looking for another answer? Try these questions:
What are the candidate vaccines that can become available once they pass FDA requirements and are granted EUA.                                                                                                Our current vaccine portfolio consists of eight vaccines - Pfizer-BioNTech, Oxford AstraZeneca, Sinovac CoronaVac, Gamaleya Sputnik V, Bharat BioTech, Moderna, Novavax, and Janssen. As of March 05, 2021, only Pfizer, AstraZeneca, and Sinovac have been issued an Emergency Use Authorization (EUA) by the Philippine FDA.
                                                                     What are the available COVID-19 vaccines? The government is currently in the initial phase of vaccine rollout with the availability of Sinovac and AstraZeneca vaccines in the country. Likewise, the countr