This is an example of one of the many things you can do with this dataset. Here we will make a simple program to predict the emotion a person is feeling based on their message. We will start by importing the necessary libraries.

In [None]:
#Import numpy for linear algebra purposes
import numpy as np
#Import pandas to read the file
import pandas as pd
#Import a function to split our data for evaluation purposes
from sklearn.model_selection import train_test_split
#Import necessary functions to process the input message
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline
#For this model, we will be using KNeighborsClassifier
from sklearn.neighbors import KNeighborsClassifier

Next, we have to process our inputs to create an X and y for our model to learn from. We do not have to worry about vectorizing our inputs, as the pipeline will do that for us. We will also split our data using train_test_split so we can evaluate how well our model performs.

In [None]:
#Read the csv file
df = pd.read_csv('/kaggle/input/chatbot-dataset-topical-chat/topical_chat.csv')
#X will be our message, and y will be the emotion
X = df['message']
y = df['sentiment']
#Split our data for evaluation purposes, with our test size as 30% of the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

Now, it is time to actually make the model. Here we can use a Pipeline which vectorizes the input, applies TF-IDF weighting to it, and then passes the result to KNeighborsClassifier. We are using KNeighborsClassifier because it is very quick to fit (fitting mostly just stores the training data, though prediction can be slower on large datasets), and we will use 1 neighbor.

In [None]:
#We will create our model to vectorize the text so it can be interpreted by KNeighborsClassifier
model = Pipeline([
    ("vect", CountVectorizer()),
    ("tfid", TfidfTransformer()),
    ("algorithm", KNeighborsClassifier(n_neighbors=1))
])
#We will fit our pipeline to X_train and y_train to test it out
model.fit(X_train, y_train)
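
To get a feel for what the first two pipeline steps produce before the classifier sees anything, here is a minimal sketch run on three made-up messages. The `messages` list is an assumption for illustration and is not taken from the dataset; the vectorizer and transformer use their default settings, as in the pipeline above.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical toy messages, not taken from the dataset
messages = ["I love this", "I hate this", "love love love"]

# Step 1: CountVectorizer turns each message into a row of word counts.
# With default settings it lowercases the text and ignores
# one-character tokens such as "I".
vect = CountVectorizer()
counts = vect.fit_transform(messages)

# The learned vocabulary, in the same (alphabetical) order as the columns
vocab = sorted(vect.vocabulary_)
print(vocab)
print(counts.toarray())

# Step 2: TfidfTransformer reweights the counts so that words appearing in
# many messages count for less, then scales each row to unit length.
tfidf = TfidfTransformer().fit_transform(counts)
print(tfidf.toarray().round(2))
```

Each row of the final matrix is the vector that KNeighborsClassifier compares against the stored training rows when it looks for the nearest neighbor.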

Now, it is time for the evaluation of the model. Before finding out the actual score of the model, we will create a function that can use the model to predict results, as we want to toy around with what we just created.

In [None]:
#This function uses our model to predict based on text 
def predict_emotion(text):
    return model.predict([text])[0]

#Here, we are testing out the model and its responses
print(predict_emotion("I'm glad to hear that you are doing good! I love our conversation as of right now!"))
print(predict_emotion("You disgust me you horrible creature"))
print(predict_emotion("Please don't hurt me, I feel scared by you"))

Now that we've toyed around with it, it's time to actually evaluate how well our model is doing.

In [None]:
#Convert our score to a whole-number percentage (0-100) and then to a string
score = model.score(X_test, y_test) * 100
score = str(int(score))
#Display the score 
print("Model accuracy: " + score + "%")
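
For a classifier, `score` returns the mean accuracy, which is the fraction of test messages whose predicted label matches the true label. The sketch below spells that calculation out by hand on made-up labels. The `accuracy_percent` helper and the label lists are hypothetical and only for illustration. It also shows that `int()` truncates the percentage rather than rounding it.

```python
def accuracy_percent(y_true, y_pred):
    """Return the share of predictions that match the true labels, as a percentage."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true) * 100

# Hypothetical labels for illustration only
y_true = ["happy", "angry", "fearful"]
y_pred = ["happy", "angry", "happy"]

pct = accuracy_percent(y_true, y_pred)
print(int(pct))    # int() drops the decimal part
print(round(pct))  # round() goes to the nearest whole number
```

Here two of three predictions are correct, so `int(pct)` gives 66 while `round(pct)` gives 67. If you want the displayed accuracy rounded instead of truncated, `round` may be the better choice.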

There are many ways that our model could be improved. The most obvious one is cleaning the text further, for example by lowercasing it and removing certain punctuation (CountVectorizer already does a basic version of both by default). You could also train a neural network with pretrained embeddings to improve the accuracy. I will leave that for other people to experiment with.
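
As a starting point for the text cleaning idea, here is a minimal sketch of a cleaning function. The name `clean_text` and the choice to strip every ASCII punctuation character are assumptions. You may want to keep some punctuation, such as apostrophes or exclamation marks, if it carries emotional signal.

```python
import string

def clean_text(text):
    """Lowercase the text and remove ASCII punctuation."""
    lowered = text.lower()
    # A translation table that deletes every character in string.punctuation
    table = str.maketrans("", "", string.punctuation)
    return lowered.translate(table)

cleaned = clean_text("I'm GLAD to hear that!")
print(cleaned)
```

This prints `im glad to hear that`. One way to try it in the pipeline would be to pass the function through the `preprocessor` argument of CountVectorizer, keeping in mind that a custom preprocessor replaces the vectorizer's built-in lowercasing step.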