# Emotionally Aware Chatbot (EDAIC)
Emotion is one of the basic human instincts, and emotion detection plays a vital role in textual analysis. People's expressions and emotional states have become a leading research topic. A chatbot is a software program that uses artificial intelligence, and chatbots are trained for emotion detection and many other purposes. In this paper, a chatbot is created to converse with humans and infer their emotional states using machine learning techniques.

EDAIC = Emotion Detection & Artificial Intelligent Chatbot

**Course No:** CSE4250

**Course Name:** Project & Thesis II

**Supervisor**


*   Md Khairul Hasan

**Team Members**

*   160204107 - Nowshin Rumali
*   170104061 - Amin Ahmed Toshib
*   170104116 - Rejone-E-Rasul Hridoy
*   170104118 - Mehedi Hasan Sami


In [None]:
# Basic Libraries
import pandas as pd
import numpy as np

# Visualization libraries
import seaborn as sns
import matplotlib.pyplot as plt

# Text Libraries
import nltk 
import string
import re

# Feature Extraction Libraries
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.model_selection import train_test_split

# Classifier Model libraries
from sklearn.linear_model import SGDClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
import xgboost as xgb
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn import tree
from sklearn.neural_network import MLPClassifier
# from sklearn.pipeline import Pipeline

# Performance metrics libraries
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import f1_score
from sklearn.metrics import classification_report
from sklearn.metrics import ConfusionMatrixDisplay

# other
import pickle
import os
import random
from scipy.spatial import distance
from datetime import datetime
!pip install pendulum
!pip install nameparser
import pendulum
from nameparser.parser import HumanName
from nltk.corpus import wordnet
import warnings
warnings.filterwarnings("ignore")

# Drive Mount
from google.colab import drive
drive.mount('/content/drive')
root_path = "/content/drive/MyDrive/CSE/4.1/CSE4100 - Project & Thesis-I/Emotion Detection Chatbot Papers/4.2/"
resource_root_path = "/content/drive/MyDrive/CSE/4.2/CSE4238 - Soft Computing Lab/Project - Emotion Detection from Twitter Text/"


# 1. Dataset

### 1.1 Chatbot Dataset

In [None]:
df_chatbot = pd.read_csv(root_path+'Chatbot Dataset_v12.11.csv',encoding='ISO-8859-1')
df_chatbot = df_chatbot.dropna(axis=0)
df_chatbot

Unnamed: 0,User,Chatbot,Intent
1,Hello,Hi <HUMAN> how are you?,Greeting
2,Hi,Hello <HUMAN> how are you?,Greeting
3,Hola,Hi <HUMAN> how are you?,Greeting
4,Hi there,Hi <HUMAN> how are you?,Greeting
5,Hya there,Hi <HUMAN> how are you?,Greeting
...,...,...,...
2363,Today I meditated for 30 minutes and I feel am...,I am glad you felt better after meditating,Surprise_Amazed
2364,I broke my foot,I am sorry to hear your foot broke,Health
2365,I broke my foot,I am sorry to hear you broke your foot,Health
2366,My boss gave me priase in front of the group a...,I am glad your work was praised,Happy_Excited_Joy


In [None]:
df_emotion = pd.read_csv(root_path+'text_emotions_neutral.csv')
df_emotion

Unnamed: 0,content,sentiment
0,i didnt feel humiliated,sadness
1,i can go from feeling so hopeless to so damned...,sadness
2,im grabbing a minute to post i feel greedy wrong,anger
3,i am ever feeling nostalgic about the fireplac...,love
4,i am feeling grouchy,anger
...,...,...
24995,Yeah. Did you know that in Nevada there is a...,Neutral
24996,"I wonder why, not many have had facial hair a...",Neutral
24997,"That is sad, it is bad that we really wind up...",Neutral
24998,Same here. Since 1900 the taller candidate h...,Neutral


# 2. Preprocessing Data

In [None]:
class Preprocessing:
    def __init__(self,Remove_stopwords=True):
        self.emojis = pd.read_csv(resource_root_path+'emojis.txt',sep=',',header=None)
        self.emojis_dict = {i:j for i,j in zip(self.emojis[0],self.emojis[1])}
        self.pattern = '|'.join(sorted(re.escape(k) for k in self.emojis_dict))
        nltk.download('stopwords')
        nltk.download('wordnet')
        self.rmv_stopword = Remove_stopwords

    def replace_emojis(self, text):
        text = re.sub(self.pattern,lambda m: self.emojis_dict.get(m.group(0)), text, flags=re.IGNORECASE)
        return text

    def remove_punct(self, text):
        text = self.replace_emojis(text)
        text  = "".join([char for char in text if char not in string.punctuation])
        text = re.sub('[0-9]+', '', text)
        return text

    def tokenization(self, text):
        text = text.lower()
        text = re.split(r'\W+', text)  # raw string avoids an invalid-escape warning
        return text

    def remove_stopwords(self, text):
        stopword = nltk.corpus.stopwords.words('english')
        stopword.extend(['yr', 'year', 'woman', 'man', 'girl','boy','one', 'two', 'sixteen', 'yearold', 'fu', 'weeks', 'week',
              'treatment', 'associated', 'patients', 'may','day', 'case','old','u','n','didnt','ive','ate','feel','keep'
                ,'brother','dad','basic','im',''])
        
        text = [word for word in text if word not in stopword]
        return text

    def lemmatizer(self, text):
        wn = nltk.WordNetLemmatizer()
        text = [wn.lemmatize(word) for word in text]
        return text

    def clean_text(self, text):
        text = self.remove_punct(text)
        text = self.tokenization(text)
        if self.rmv_stopword:
            text = self.remove_stopwords(text)
        text = self.lemmatizer(text)
        return text

In [None]:
preprocess = Preprocessing(Remove_stopwords=False)

df_test = df_chatbot['User'].apply(lambda x: preprocess.clean_text(x))
df_test

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.


1                                                [hello]
2                                                   [hi]
3                                                 [hola]
4                                            [hi, there]
5                                           [hya, there]
                             ...                        
621    [i, even, feel, surprised, if, it, dark, outside]
622    [i, have, chose, something, for, myself, that,...
623            [the, thunderstorm, really, surprise, me]
624    [i, am, amazed, by, nature, and, amazed, by, l...
625                      [wow, thats, really, amazing, ]
Name: User, Length: 596, dtype: object
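
The last row of the output ends with an empty token. The most likely cause is that the cleaned text ends in whitespace: `re.split` then returns a trailing empty string, and with `Remove_stopwords=False` nothing filters it out (the custom stopword list is what contains `''`). The sketch below reproduces the effect with the standard library only and shows one possible way to drop such tokens. The helper `tokenize_nonempty` is hypothetical and not part of the notebook.

```python
import re

# Example input with a trailing space, as can remain after punctuation removal.
text = "wow thats really amazing "

# Splitting on non-word characters leaves an empty string at the end.
tokens = re.split(r"\W+", text)
print(tokens)  # ['wow', 'thats', 'really', 'amazing', '']

def tokenize_nonempty(text):
    """Lower-case, split on non-word characters, and discard empty tokens."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]

print(tokenize_nonempty(text))  # ['wow', 'thats', 'really', 'amazing']
```

Filtering at tokenization time keeps the empty string out of the vocabulary regardless of the stopword setting.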

# 3. Feature Extraction

In [None]:
class FeatureExtraction:
    def __init__(self,rmv_stopword=True):
        self.rmv_stopword = rmv_stopword
        self.preprocess = Preprocessing(self.rmv_stopword)
        self.countVectorizer1 = CountVectorizer(analyzer=self.preprocess.clean_text)
        self.tfidf_transformer_xtrain = TfidfTransformer()
        self.tfidf_transformer_xtest = TfidfTransformer()

    def get_features(self, X_train, X_test):
        # countVectorizer1 = CountVectorizer(analyzer=self.preprocess.clean_text)
        countVector1 = self.countVectorizer1.fit_transform(X_train)

        countVector2 = self.countVectorizer1.transform(X_test)

        # tfidf_transformer_xtrain = TfidfTransformer()
        x_train = self.tfidf_transformer_xtrain.fit_transform(countVector1)

        # Reuse the IDF weights learned on the training set; never refit on test data
        x_test = self.tfidf_transformer_xtrain.transform(countVector2)

        return x_train, x_test

    def get_processed_text(self, input_str):
        # Vectorize a single input with the vocabulary and IDF weights learned during training
        return self.tfidf_transformer_xtrain.transform(self.countVectorizer1.transform([input_str]))
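
The usual convention for TF-IDF features is to learn the vocabulary and the IDF weights from the training text only, then reuse them unchanged for any later text. Below is a minimal pure-Python sketch of that convention, assuming a whitespace tokenizer and the smoothed IDF formula that scikit-learn's `TfidfTransformer` applies by default. The helper names `fit_tfidf` and `transform_tfidf` and the example sentences are hypothetical.

```python
import math
from collections import Counter

def fit_tfidf(train_docs):
    """Learn the vocabulary and smoothed IDF weights from training text only."""
    tokenized = [doc.lower().split() for doc in train_docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many training documents each token occurs.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1 for tok in vocab}
    return vocab, idf

def transform_tfidf(doc, vocab, idf):
    """Weight one document with the learned IDF; unseen words are ignored."""
    counts = Counter(doc.lower().split())
    vec = [counts[tok] * idf[tok] for tok in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    # L2-normalize unless the document shares no word with the vocabulary.
    return [v / norm for v in vec] if norm else vec

# Illustrative sentences, not taken from the dataset.
train = ["i feel happy today", "i feel sad", "happy birthday"]
vocab, idf = fit_tfidf(train)
new_vec = transform_tfidf("i feel amazing", vocab, idf)
print(dict(zip(vocab, (round(v, 3) for v in new_vec))))
```

The word "amazing" never occurs in the training text, so it has no column and contributes nothing to the vector. The vector length always equals the training vocabulary size, which is what a trained classifier expects.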


## 3.1 Train Test Split

In [None]:
X_train_ed, X_test_ed, y_train_ed, y_test_ed = train_test_split(df_emotion['content'], df_emotion['sentiment'],test_size=0.3, random_state = 116)
X_train_cb, X_test_cb, y_train_cb, y_test_cb = train_test_split(df_chatbot['User'], df_chatbot['Intent'],test_size=0.25, random_state = 16)

fe_cb = FeatureExtraction(rmv_stopword=False)
fe_ed = FeatureExtraction(rmv_stopword=True)

x_train_ed, x_test_ed = fe_ed.get_features(X_train_ed, X_test_ed)
x_train_cb, x_test_cb = fe_cb.get_features(X_train_cb, X_test_cb)

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [None]:
x_train_ed.shape

(17500, 14646)

# 4. Models
1. Support Vector Machine (SVM)
2. Logistic Regression
3. Random Forest Classifier
4. XGBoost Classifier
5. Multinomial Naive Bayes
6. Decision Tree Classifier
7. Multi-Layer Perceptron (MLP)

**Performance Metrics:**
1.   **Accuracy** = $\frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}}$
2.   **Precision** = $\frac{\text{TP}}{\text{TP} + \text{FP}}$
3.   **Recall** = $\frac{\text{TP}}{\text{TP} + \text{FN}}$
4.   **F1-score** = $\frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
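
As a worked illustration of these formulas, the sketch below computes them by hand for a small three-class example and macro-averages precision, recall and F1, which is the averaging mode (`average='macro'`) passed to the scikit-learn metric functions in the `Models` class. The labels and predictions are invented for illustration, and only the standard library is used.

```python
# Toy labels, invented for illustration only.
y_true = ["joy", "joy", "sadness", "anger", "sadness", "joy"]
y_pred = ["joy", "sadness", "sadness", "anger", "joy", "joy"]

def per_class_counts(y_true, y_pred, label):
    """Return (TP, FP, FN) with `label` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    return tp, fp, fn

def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp, fp, fn = per_class_counts(y_true, y_pred, label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    # In the multi-class case, accuracy is the fraction of correct predictions.
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

accuracy, macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred)
print(round(accuracy, 3), round(macro_p, 3), round(macro_r, 3), round(macro_f1, 3))
```

Macro averaging gives every class the same influence regardless of how many examples it has, so a rare class that is predicted badly lowers the score as much as a frequent one.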

In [None]:
class Models:
    def __init__(self, X_train, Y_train, X_test, Y_test, model_name='cb'):
        self.x_train = X_train
        self.x_test = X_test
        self.y_test = Y_test
        self.y_train = Y_train
        self.chatbot_model_file = root_path+'/Models/Chatbot Models_7 models.pkl'
        self.emotion_model_file = root_path+'/Models/Emotion Detection Models_7 models.pkl'

        self.chatbot_summary_file = root_path+'/Models/Chatbot Models Summary_7 models.pkl'
        self.emotion_summary_file = root_path+'/Models/Emotion Detection Models Summary_7 models.pkl'
        self.model_name = model_name    # cb = chatbot model, ed = emotion detection model

        self.svm = SGDClassifier()
        self.logisticRegr = LogisticRegression()
        self.rfc = RandomForestClassifier(n_estimators=1, random_state=0)
        self.xgbc = XGBClassifier(max_depth=16, n_estimators=1000,nthread = 6)
        self.mnb = MultinomialNB()
        self.dt = tree.DecisionTreeClassifier()
        self.mlp = MLPClassifier(random_state=5, max_iter=300)

        self.svm_summary = {}
        self.lr_summary = {}
        self.rfc_summary = {}
        self.xgbc_summary = {}
        self.mnb_summary = {}
        self.dt_summary = {}
        self.mlp_summary = {}

    def load_models(self):
        if self.model_name == 'ed':
            if os.path.isfile(self.emotion_model_file):
                with open(self.emotion_model_file,'rb') as f:
                    self.svm, self.logisticRegr, self.rfc, self.xgbc, self.mnb, self.dt, self.mlp = pickle.load(f)

                with open(self.emotion_summary_file,'rb') as f:
                    self.svm_summary, self.lr_summary, self.rfc_summary, self.xgbc_summary, self.mnb_summary, self.dt_summary, self.mlp_summary = pickle.load(f)
                    print('Emotion Detection Models retrived from Disk successfully')
                    return self.svm, self.logisticRegr, self.rfc, self.xgbc, self.mnb, self.dt, self.mlp
            else:
                self.train_models()
                self.save_models()
                return self.svm, self.logisticRegr, self.rfc, self.xgbc, self.mnb, self.dt, self.mlp
        elif self.model_name == 'cb':
            if os.path.isfile(self.chatbot_model_file):
                with open(self.chatbot_model_file,'rb') as f:
                    self.svm, self.logisticRegr, self.rfc, self.xgbc, self.mnb, self.dt, self.mlp = pickle.load(f)

                with open(self.chatbot_summary_file,'rb') as f:
                    self.svm_summary, self.lr_summary, self.rfc_summary, self.xgbc_summary, self.mnb_summary, self.dt_summary, self.mlp_summary = pickle.load(f)
                    print('Chabot Models retrived from Disk successfully')
                    return self.svm, self.logisticRegr, self.rfc, self.xgbc, self.mnb, self.dt, self.mlp
            else:
                self.train_models()
                self.save_models()
                return self.svm, self.logisticRegr, self.rfc, self.xgbc, self.mnb, self.dt, self.mlp
    def train_models(self):
        print('-----Model Training-----')
        print('Training SVM...')
        self.SVM()
        print('Training Logistic Regression...')
        self.LR()
        print('Training Random Forest...')
        self.RFC()
        print('Training XGBoost...')
        self.XGBC()
        print('Training Multinomial Naive Bayes...')
        self.MNB()
        print('Training Decision Tree...')
        self.DT()
        print('Training Multi-Layer Perceptron Model...')
        self.MLP()
        print('Successfully Trained All Models')

        return self.svm, self.logisticRegr, self.rfc, self.xgbc, self.mnb, self.dt, self.mlp
          

    def SVM(self):
        self.svm.fit(self.x_train, self.y_train)
        y_pred = self.svm.predict(self.x_test)

        svm_acc = round(accuracy_score(y_pred, self.y_test)*100,3)
        svm_prec = round(precision_score(self.y_test, y_pred, average='macro')*100,3)
        svm_recal = round(recall_score(self.y_test, y_pred, average='macro')*100,3)
        svm_cm = confusion_matrix(self.y_test,y_pred)
        svm_f1 = round(f1_score(self.y_test, y_pred, average='macro')*100,3)
        self.svm_summary['Accuracy'] = svm_acc
        self.svm_summary['Precision'] = svm_prec
        self.svm_summary['Recall'] = svm_recal
        self.svm_summary['F1'] = svm_f1
        self.svm_summary['CM'] = svm_cm
    
    def LR(self):
        self.logisticRegr.fit(self.x_train, self.y_train)

        y_pred = self.logisticRegr.predict(self.x_test)

        lr_acc = round(accuracy_score(y_pred, self.y_test)*100,3)
        lr_prec = round(precision_score(self.y_test, y_pred, average='macro')*100,3)
        lr_recal = round(recall_score(self.y_test, y_pred, average='macro')*100,3)
        lr_cm = confusion_matrix(self.y_test,y_pred)
        lr_f1 = round(f1_score(self.y_test, y_pred, average='macro')*100,3)
        self.lr_summary['Accuracy'] = lr_acc
        self.lr_summary['Precision'] = lr_prec
        self.lr_summary['Recall'] = lr_recal
        self.lr_summary['F1'] = lr_f1
        self.lr_summary['CM'] = lr_cm

    def RFC(self):
        self.rfc.fit(self.x_train, self.y_train)

        y_pred = self.rfc.predict(self.x_test)

        rfc_acc = round(accuracy_score(y_pred, self.y_test)*100,3)
        rfc_prec = round(precision_score(self.y_test, y_pred, average='macro')*100,3)
        rfc_recal = round(recall_score(self.y_test, y_pred, average='macro')*100,3)
        rfc_cm = confusion_matrix(self.y_test,y_pred)
        rfc_f1 = round(f1_score(self.y_test, y_pred, average='macro')*100,3)
        self.rfc_summary['Accuracy'] = rfc_acc
        self.rfc_summary['Precision'] = rfc_prec
        self.rfc_summary['Recall'] = rfc_recal
        self.rfc_summary['F1'] = rfc_f1
        self.rfc_summary['CM'] = rfc_cm

    def XGBC(self):
        self.xgbc.fit(self.x_train,self.y_train)
        y_pred = self.xgbc.predict(self.x_test)

        xgbc_acc = round(accuracy_score(y_pred, self.y_test)*100,3)
        xgbc_prec = round(precision_score(self.y_test, y_pred, average='macro')*100,3)
        xgbc_recal = round(recall_score(self.y_test, y_pred, average='macro')*100,3)
        xgbc_cm = confusion_matrix(self.y_test,y_pred)
        xgbc_f1 = round(f1_score(self.y_test, y_pred, average='macro')*100,3)
        self.xgbc_summary['Accuracy'] = xgbc_acc
        self.xgbc_summary['Precision'] = xgbc_prec
        self.xgbc_summary['Recall'] = xgbc_recal
        self.xgbc_summary['F1'] = xgbc_f1
        self.xgbc_summary['CM'] = xgbc_cm

    def MNB(self):
        self.mnb.fit(self.x_train, self.y_train)

        y_pred = self.mnb.predict(self.x_test)

        mnb_acc = round(accuracy_score(y_pred, self.y_test)*100,3)
        mnb_prec = round(precision_score(self.y_test, y_pred, average='macro')*100,3)
        mnb_recal = round(recall_score(self.y_test, y_pred, average='macro')*100,3)
        mnb_cm = confusion_matrix(self.y_test,y_pred)
        mnb_f1 = round(f1_score(self.y_test, y_pred, average='macro')*100,3)
        self.mnb_summary['Accuracy'] = mnb_acc
        self.mnb_summary['Precision'] = mnb_prec
        self.mnb_summary['Recall'] = mnb_recal
        self.mnb_summary['F1'] = mnb_f1
        self.mnb_summary['CM'] = mnb_cm

    def DT(self):
        self.dt.fit(self.x_train, self.y_train)
        y_pred = self.dt.predict(self.x_test)

        dt_acc = round(accuracy_score(y_pred, self.y_test)*100,3)
        dt_prec = round(precision_score(self.y_test, y_pred, average='macro')*100,3)
        dt_recal = round(recall_score(self.y_test, y_pred, average='macro')*100,3)
        dt_cm = confusion_matrix(self.y_test,y_pred)
        dt_f1 = round(f1_score(self.y_test, y_pred, average='macro')*100,3)
        self.dt_summary['Accuracy'] = dt_acc
        self.dt_summary['Precision'] = dt_prec
        self.dt_summary['Recall'] = dt_recal
        self.dt_summary['F1'] = dt_f1
        self.dt_summary['CM'] = dt_cm

    def MLP(self):
        self.mlp.fit(self.x_train, self.y_train)
        y_pred = self.mlp.predict(self.x_test)

        mlp_acc = round(accuracy_score(y_pred, self.y_test)*100,3)
        mlp_prec = round(precision_score(self.y_test, y_pred, average='macro')*100,3)
        mlp_recal = round(recall_score(self.y_test, y_pred, average='macro')*100,3)
        mlp_cm = confusion_matrix(self.y_test,y_pred)
        mlp_f1 = round(f1_score(self.y_test, y_pred, average='macro')*100,3)
        self.mlp_summary['Accuracy'] = mlp_acc
        self.mlp_summary['Precision'] = mlp_prec
        self.mlp_summary['Recall'] = mlp_recal
        self.mlp_summary['F1'] = mlp_f1
        self.mlp_summary['CM'] = mlp_cm

    def model_summary(self):
        return self.svm_summary, self.lr_summary, self.rfc_summary, self.xgbc_summary, self.mnb_summary, self.dt_summary, self.mlp_summary

    def save_models(self):
      if self.model_name == 'ed':
          with open(self.emotion_model_file, 'wb') as f:
              pickle.dump([self.svm, self.logisticRegr, self.rfc, self.xgbc, self.mnb, self.dt, self.mlp], f)

          with open(self.emotion_summary_file, 'wb') as f:
              pickle.dump([self.svm_summary, self.lr_summary, self.rfc_summary, self.xgbc_summary, self.mnb_summary, self.dt_summary, self.mlp_summary], f)

          print('Emotion Detection Models saved successfully in the disk')
      elif self.model_name == 'cb':
          with open(self.chatbot_model_file, 'wb') as f:
              pickle.dump([self.svm, self.logisticRegr, self.rfc, self.xgbc, self.mnb, self.dt, self.mlp], f)

          with open(self.chatbot_summary_file, 'wb') as f:
              pickle.dump([self.svm_summary, self.lr_summary, self.rfc_summary, self.xgbc_summary, self.mnb_summary, self.dt_summary, self.mlp_summary], f)

          print('Chatbot Models saved successfully in the disk')

In [None]:
chatbot_models = Models(x_train_cb, y_train_cb, x_test_cb, y_test_cb, model_name='cb')
emotion_models = Models(x_train_ed, y_train_ed, x_test_ed, y_test_ed, model_name='ed')

# svm_cb, logisticRegr_cb, rfc_cb, xgbc_cb, mnb_cb, dt_cb = chatbot_models.train_models()
# svm_summary_cb, lr_summary_cb, rfc_summary_cb, xgbc_summary_cb, mnb_summary_cb, dt_summary_cb = chatbot_models.model_summary()

## Load the models

In [None]:
svm_cb, logisticRegr_cb, rfc_cb, xgbc_cb, mnb_cb, dt_cb, mlp_cb = chatbot_models.load_models()
svm_summary_cb, lr_summary_cb, rfc_summary_cb, xgbc_summary_cb, mnb_summary_cb, dt_summary_cb, mlp_summary_cb = chatbot_models.model_summary()

svm_ed, logisticRegr_ed, rfc_ed, xgbc_ed, mnb_ed, dt_ed, mlp_ed = emotion_models.load_models()
svm_summary_ed, lr_summary_ed, rfc_summary_ed, xgbc_summary_ed, mnb_summary_ed, dt_summary_ed, mlp_summary_ed = emotion_models.model_summary()

Chabot Models retrived from Disk successfully
Emotion Detection Models retrived from Disk successfully


## Train the models

In [None]:
svm_cb, logisticRegr_cb, rfc_cb, xgbc_cb, mnb_cb, dt_cb, mlp_cb = chatbot_models.train_models()
svm_summary_cb, lr_summary_cb, rfc_summary_cb, xgbc_summary_cb, mnb_summary_cb, dt_summary_cb, mlp_summary_cb = chatbot_models.model_summary()
chatbot_models.save_models()

svm_ed, logisticRegr_ed, rfc_ed, xgbc_ed, mnb_ed, dt_ed, mlp_ed = emotion_models.train_models()
svm_summary_ed, lr_summary_ed, rfc_summary_ed, xgbc_summary_ed, mnb_summary_ed, dt_summary_ed, mlp_summary_ed = emotion_models.model_summary()
emotion_models.save_models()

print('Accuracy of Emotion Detection Model')
print('SVM:',svm_summary_ed['Accuracy'])
print('Logistic Regression:',lr_summary_ed['Accuracy'])
print('Random Forest:',rfc_summary_ed['Accuracy'])
print('XGBoost:',xgbc_summary_ed['Accuracy'])
print('Naive Bayes:',mnb_summary_ed['Accuracy'])
print('Decision Tree:',dt_summary_ed['Accuracy'])
print('MLP:',mlp_summary_ed['Accuracy'])

print('\n\nAccuracy of Chatbot Model')
print('SVM:',svm_summary_cb['Accuracy'])
print('Logistic Regression:',lr_summary_cb['Accuracy'])
print('Random Forest:',rfc_summary_cb['Accuracy'])
print('XGBoost:',xgbc_summary_cb['Accuracy'])
print('Naive Bayes:',mnb_summary_cb['Accuracy'])
print('Decision Tree:',dt_summary_cb['Accuracy'])
print('MLP:',mlp_summary_cb['Accuracy'])

-----Model Training-----
Training SVM...
Training Logistic Regression...
Training Random Forest...
Training XGBoost...
Training Multinomial Naive Bayes...
Training Decision Tree...
Training Multi-Layer Perceptron Model...
Successfully Trained All Models
Chatbot Models saved successfully in the disk
-----Model Training-----
Training SVM...
Training Logistic Regression...
Training Random Forest...
Training XGBoost...
Training Multinomial Naive Bayes...
Training Decision Tree...
Training Multi-Layer Perceptron Model...
Successfully Trained All Models
Emotion Detection Models saved successfully in the disk
Accuracy of Emotion Detection Model
SVM: 88.653
Logistic Regression: 86.12
Random Forest: 66.867
XGBoost: 88.84
Naive Bayes: 71.587
Decision Tree: 76.107
MLP: 85.307


Accuracy of Chatbot Model
SVM: 70.543
Logistic Regression: 53.876
Random Forest: 44.186
XGBoost: 51.744
Naive Bayes: 41.085
Decision Tree: 49.806
MLP: 70.736


In [None]:
print('Accuracy')
print('SVM:',svm_summary_ed['Accuracy'])
print('Logistic Regression:',lr_summary_ed['Accuracy'])
print('Random Forest:',rfc_summary_ed['Accuracy'])
print('XGBoost:',xgbc_summary_ed['Accuracy'])
print('Naive Bayes:',mnb_summary_ed['Accuracy'])
print('Decision Tree:',dt_summary_ed['Accuracy'])
print('MLP:',mlp_summary_ed['Accuracy'])

Accuracy
SVM: 88.533
Logistic Regression: 86.12
Random Forest: 66.867
XGBoost: 88.84
Naive Bayes: 71.587
Decision Tree: 75.973


# 5. Prediction

In [None]:
class Chatbot:
    def __init__(self):
        accuracies = np.array([svm_summary_cb['Accuracy'], lr_summary_cb['Accuracy'], rfc_summary_cb['Accuracy'], xgbc_summary_cb['Accuracy'], 
             mnb_summary_cb['Accuracy'], dt_summary_cb['Accuracy'], mlp_summary_cb['Accuracy']])
        norm_accuracy = accuracies - min(accuracies)
        self.model_weight = norm_accuracy/sum(norm_accuracy)
        self.Intents = df_chatbot['Intent'].unique()
        self.Human_name = 'Hridoy'

    def response_generate(self, text, intent_name):
        reply = self.respond(text, intent_name)
        return reply

    def cosine_distance_countvectorizer_method(self, s1, s2):    
        # sentences to list
        allsentences = [s1 , s2]

        # text to vector
        vectorizer = CountVectorizer()
        all_sentences_to_vector = vectorizer.fit_transform(allsentences)

        text_to_vector_v1 = all_sentences_to_vector.toarray()[0].tolist()
        text_to_vector_v2 = all_sentences_to_vector.toarray()[1].tolist()

        # distance of similarity
        cosine = distance.cosine(text_to_vector_v1, text_to_vector_v2)
        return round((1-cosine),2)

    def respond(self, text, intent_name):
        # Group the candidate replies of this intent by their similarity to the user's text
        replies = {}
        dataset = df_chatbot[df_chatbot['Intent']==intent_name]
        for _, row in dataset.iterrows():
            sim = self.cosine_distance_countvectorizer_method(text, row['User'])
            replies.setdefault(sim, []).append(row['Chatbot'])

        # Pick a random reply among those tied for the highest similarity
        d1 = sorted(replies.items(), key = lambda pair:pair[0],reverse=True)
        return random.choice(d1[0][1])


    def extract_best_intent(self, list_intent_pred):
        intent_scores = {}
        for intent in self.Intents:
            intent_scores[intent] = 0.0   
        for i in range(len(list_intent_pred)):
            intent_scores[list_intent_pred[i]] += self.model_weight[i]
        si = sorted(intent_scores.items(), key = lambda pair:pair[1],reverse=True)[:6]
        return si[0][0], round(si[0][1],2)

    def get_human_names(self, text):
        person_list = []
        person_names=person_list
        tokens = nltk.tokenize.word_tokenize(text)
        pos = nltk.pos_tag(tokens)
        sentt = nltk.ne_chunk(pos, binary = False)

        person = []
        name = ""
        for subtree in sentt.subtrees(filter=lambda t: t.label() == 'PERSON'):
            for leaf in subtree.leaves():
                person.append(leaf[0])
            if len(person) > 0: #avoid grabbing lone surnames
                for part in person:
                    name += part + ' '
                if name[:-1] not in person_list:
                    person_list.append(name[:-1])
                name = ''
            person = []
        # print (person_list)
        return person_list

    def replace_tag(self, text):
        text = text.replace('<HUMAN>',self.Human_name)

        # get current time in Bangladesh (12-hour clock)
        BDT = pendulum.timezone('Asia/Dhaka')
        now = datetime.now(BDT)
        hrs = now.hour % 12 or 12                  # 0 -> 12 am, 13 -> 1 pm
        am_pm = 'pm' if now.hour >= 12 else 'am'   # noon counts as pm

        # day-month-year hour:minute, with the minutes zero-padded
        current_time = f'{now.day}-{now.month}-{now.year} {hrs}:{now.minute:02d} {am_pm}'
        text = text.replace('<TIME>',current_time)
        return text

    def chatbot_reply(self, text):
        processed_text = fe_cb.get_processed_text(text)

        if self.get_human_names(text):
            self.Human_name = self.get_human_names(text)[0]

        print('Intent using SVM: ',end = '')
        svm_intent = svm_cb.predict(processed_text)[0]
        lr_intent = logisticRegr_cb.predict(processed_text)[0]
        dt_intent = dt_cb.predict(processed_text)[0]
        mnb_intent = mnb_cb.predict(processed_text)[0]
        xgbc_intent = xgbc_cb.predict(processed_text)[0]
        rfc_intent = rfc_cb.predict(processed_text)[0]
        mlp_intent = mlp_cb.predict(processed_text)[0]
        print(svm_intent)
        
        print('Intent using Logistic Regression: ',end = '')
        print(lr_intent)
        print('Intent using Decision Tree: ',end = '')
        print(dt_intent)
        print('Intent using Naive Bayes: ',end = '')
        print(mnb_intent)
        print('Intent using XGBoost: ',end = '')
        print(xgbc_intent)
        print('Intent using Random Forest: ',end = '')
        print(rfc_intent)
        print('Intent using Multi-Layer Perceptron: ',end = '')
        print(mlp_intent)


        # generating reply
        list_intent = [svm_intent, lr_intent, rfc_intent, xgbc_intent, mnb_intent, dt_intent, mlp_intent]
        best_intent, prob = self.extract_best_intent(list_intent)
        print('Best Intent:',best_intent,':',prob)

        reply = "I don't understand. Please be specific" if prob < 0.4 else self.response_generate(text, best_intent)

        reply = self.replace_tag(reply)
        print('EDAIC:',reply)
        print()
        return reply, prob, best_intent
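
The retrieval step in `respond` compares the user's message with every stored `User` utterance of the predicted intent and answers with a reply paired with the most similar one. The sketch below illustrates that idea with bag-of-words cosine similarity in plain Python. The helper names are hypothetical, and of the three example rows only the first comes from the dataset preview; the other two are invented.

```python
import math
from collections import Counter

def cosine_similarity(s1, s2):
    """Cosine similarity between the word-count vectors of two sentences."""
    c1, c2 = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(c1[w] * c2[w] for w in c1)
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty sentence shares nothing with any other
    return dot / (norm1 * norm2)

def best_reply(message, rows):
    """Return the reply whose stored utterance is most similar to the message."""
    return max(rows, key=lambda row: cosine_similarity(message, row[0]))[1]

# (user utterance, chatbot reply) pairs for one intent.
rows = [
    ("I broke my foot", "I am sorry to hear you broke your foot"),
    ("I have a headache", "I hope your headache goes away soon"),
    ("I caught a cold", "Get some rest and drink warm fluids"),
]
reply = best_reply("my foot is broken", rows)
print(reply)
```

Unlike the notebook, which picks randomly among replies tied for the highest similarity, this sketch simply returns the first best match. It also returns 0 for empty input instead of producing an undefined value.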

In [None]:
class Emotion:
    def __init__(self):
        self.Emotions = df_emotion['sentiment'].unique()
        accuracies = np.array([svm_summary_ed['Accuracy'], lr_summary_ed['Accuracy'], rfc_summary_ed['Accuracy'], xgbc_summary_ed['Accuracy'], 
             mnb_summary_ed['Accuracy'], dt_summary_ed['Accuracy'], mlp_summary_ed['Accuracy']])
        norm_accuracy = accuracies - min(accuracies)
        self.emotion_model_weight = norm_accuracy/sum(norm_accuracy)

    def extract_best_emotion(self, list_emotion_pred):
        emotion_scores = {}
        for emotions in self.Emotions:
            emotion_scores[emotions] = 0.0   
        for i in range(len(list_emotion_pred)):
            emotion_scores[list_emotion_pred[i]] += self.emotion_model_weight[i]
        se = sorted(emotion_scores.items(), key = lambda pair:pair[1],reverse=True)
        return se[0][0], round(se[0][1],2)

    def detect_emotion(self, text):
        processed_text = fe_ed.get_processed_text(text)

        svm_emotion = svm_ed.predict(processed_text)[0]
        lr_emotion = logisticRegr_ed.predict(processed_text)[0]
        dt_emotion = dt_ed.predict(processed_text)[0]
        mnb_emotion = mnb_ed.predict(processed_text)[0]
        xgbc_emotion = xgbc_ed.predict(processed_text)[0]
        rfc_emotion = rfc_ed.predict(processed_text)[0]
        mlp_emotion = mlp_ed.predict(processed_text)[0]

        list_emotion_pred = [svm_emotion, lr_emotion, rfc_emotion, xgbc_emotion, mnb_emotion, dt_emotion, mlp_emotion]
        best_emotion, prob = self.extract_best_emotion(list_emotion_pred)
        print('Best Emotion:',best_emotion,':',prob)

        print('Emotion using SVM: ',end = '')
        print(svm_emotion)
        print('Emotion using Logistic Regression: ',end = '')
        print(lr_emotion)
        print('Emotion using Decision Tree: ',end = '')
        print(dt_emotion)
        print('Emotion using Naive Bayes: ',end = '')
        print(mnb_emotion)
        print('Emotion using XGBoost: ',end = '')
        print(xgbc_emotion)
        print('Emotion using Random Forest: ',end = '')
        print(rfc_emotion)
        print('Emotion using Multi-Layer Perceptron ',end = '')
        print(mlp_emotion)
        print()
        return best_emotion, prob

In [None]:
chatbot = Chatbot()
emotion = Emotion()

In [None]:
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping chunkers/maxent_ne_chunker.zip.
[nltk_data] Downloading package words to /root/nltk_data...
[nltk_data]   Unzipping corpora/words.zip.


True

In [None]:
msg = "I am very much excited about my birthday"

_, _, _ = chatbot.chatbot_reply(msg)

_, _ = emotion.detect_emotion(msg)

Intent using SVM: Info
Intent using Logistic Regression: Sad
Intent using Decision Tree: Angry_Frustrated
Intent using Naive Bayes: Sad
Intent using XGBoost: Farewell
Intent using Random Forest: Farewell
Intent using Multi-Layer Perceptron: Info
Best Intent: Info : 0.63
EDAIC: What are you going to do?

Best Emotion:  Neutral : 0.95
Emotion using SVM:  Neutral
Emotion using Logistic Regression:  Neutral
Emotion using Decision Tree:  Neutral
Emotion using Naive Bayes: sadness
Emotion using XGBoost:  Neutral
Emotion using Random Forest:  Neutral
Emotion using Multi-Layer Perceptron  Neutral



### Determining Model Weights using the basics of Ensemble Learning

In [None]:
accuracies = np.array([svm_summary_cb['Accuracy'], lr_summary_cb['Accuracy'], rfc_summary_cb['Accuracy'], xgbc_summary_cb['Accuracy'], 
             mnb_summary_cb['Accuracy'], dt_summary_cb['Accuracy'], mlp_summary_cb['Accuracy']])
norm_accuracy = accuracies - min(accuracies)
chatbot_model_weight = norm_accuracy/sum(norm_accuracy)

print('SVM:',svm_summary_cb['Accuracy'])
print('Logistic regression:',lr_summary_cb['Accuracy'])
print('Random forest:',rfc_summary_cb['Accuracy'])
print('XGBoost:',xgbc_summary_cb['Accuracy'])
print('Naive Bayes:',mnb_summary_cb['Accuracy'])
print('Decision Tree:',dt_summary_cb['Accuracy'])
print('MLP:',mlp_summary_cb['Accuracy'])
chatbot_model_weight

SVM: 59.167
Logistic regression: 44.167
Random forest: 30.833
XGBoost: 35.0
Naive Bayes: 24.167
Decision Tree: 38.333
MLP: 61.667


array([0.28188298, 0.16107599, 0.05368663, 0.08724681, 0.        ,
       0.11409012, 0.30201748])
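
The weighting rule shifts every accuracy by the lowest one and divides by the sum of the shifted values, so the weakest model receives weight 0 and the weights sum to 1. The plain-Python sketch below re-derives the printed array from the seven printed accuracies. `accuracy_weights` is a hypothetical helper name, and the equal-weight fallback for the all-tied case is an added assumption, not something the notebook does.

```python
def accuracy_weights(accuracies):
    """Min-shifted, sum-normalized weights; the weakest model gets 0."""
    lowest = min(accuracies)
    shifted = [a - lowest for a in accuracies]
    total = sum(shifted)
    if total == 0:
        # Assumption: if all models tie, use equal weights instead of dividing by zero.
        return [1 / len(accuracies)] * len(accuracies)
    return [s / total for s in shifted]

# Chatbot accuracies in the order SVM, LR, RF, XGBoost, NB, DT, MLP.
chatbot_accuracies = [59.167, 44.167, 30.833, 35.0, 24.167, 38.333, 61.667]
weights = accuracy_weights(chatbot_accuracies)
print([round(w, 4) for w in weights])
```

Because Naive Bayes has the lowest accuracy here, its vote carries no weight at all in the ensemble, while SVM and MLP together hold more than half of the total weight.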

In [None]:
accuracies = np.array([svm_summary_ed['Accuracy'], lr_summary_ed['Accuracy'], rfc_summary_ed['Accuracy'], xgbc_summary_ed['Accuracy'], 
             mnb_summary_ed['Accuracy'], dt_summary_ed['Accuracy'], mlp_summary_ed['Accuracy']])
norm_accuracy = accuracies - min(accuracies)
emotion_model_weight = norm_accuracy/sum(norm_accuracy)


print('SVM:',svm_summary_ed['Accuracy'])
print('Logistic regression:',lr_summary_ed['Accuracy'])
print('Random forest:',rfc_summary_ed['Accuracy'])
print('XGBoost:',xgbc_summary_ed['Accuracy'])
print('Naive Bayes:',mnb_summary_ed['Accuracy'])
print('Decision Tree:',dt_summary_ed['Accuracy'])
print('MLP:',mlp_summary_ed['Accuracy'])
emotion_model_weight

SVM: 88.52
Logistic regression: 86.12
Random forest: 66.867
XGBoost: 88.84
Naive Bayes: 71.587
Decision Tree: 75.893
MLP: 85.307


array([0.22777047, 0.20252459, 0.        , 0.23113659, 0.04965024,
       0.09494556, 0.19397255])
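
With these weights, each model's prediction adds its weight to the label it voted for, and the label with the largest total wins. The sketch below reproduces that vote for the sample message "I am very much excited about my birthday", assuming the seven emotion predictions shown in the sample output and the weight array printed above. `weighted_vote` is a hypothetical helper name.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Sum each model's weight under the label it predicted; return the winner."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    best = max(scores, key=scores.get)
    return best, round(scores[best], 2)

# Order: SVM, LR, RF, XGBoost, NB, DT, MLP.
weights = [0.22777047, 0.20252459, 0.0, 0.23113659,
           0.04965024, 0.09494556, 0.19397255]
predictions = ["Neutral", "Neutral", "Neutral", "Neutral",
               "sadness", "Neutral", "Neutral"]

label, score = weighted_vote(predictions, weights)
print(label, score)  # Neutral 0.95
```

Only Naive Bayes disagrees, and its weight is about 0.05, so the winning score is about 0.95, which matches the "Best Emotion" line in the sample output.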