# Emotion Classifier Using mlpack
This notebook explores using mlpack to classify the emotion expressed in a piece of text. The training data is the [Emotions dataset for NLP](https://www.kaggle.com/datasets/praveengovi/emotions-dataset-for-nlp) from Kaggle.

In [1]:
import mlpack
import pandas as pd
import numpy as np

In [2]:
# Load emotions dataset for NLP from Kaggle
df = pd.read_csv("data/train.txt", delimiter = ";", names = ["text", "label"])

In [3]:
# Split the labels
labels = df["label"]
dataset = df.drop("label", axis = 1)

In [4]:
labels.value_counts()

joy         5362
sadness     4666
anger       2159
fear        1937
love        1304
surprise     572
Name: label, dtype: int64
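
The counts above show a skewed label distribution. As a rough illustration, the sketch below recomputes each class's share from the printed counts. The numbers are hard-coded here rather than read from `df`, and the names `counts`, `shares`, and `imbalance_ratio` are used only for this example.

```python
# Class counts copied from the value_counts() output above.
counts = {
    "joy": 5362,
    "sadness": 4666,
    "anger": 2159,
    "fear": 1937,
    "love": 1304,
    "surprise": 572,
}

total = sum(counts.values())

# Fraction of the training file that each label accounts for.
shares = {label: n / total for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label:<9}{share:6.1%}")

# Ratio between the most and least frequent classes.
imbalance_ratio = max(counts.values()) / min(counts.values())
print(f"largest / smallest class: {imbalance_ratio:.1f}x")
```

With the rarest class roughly an order of magnitude smaller than the most common one, overall accuracy alone may hide weak performance on `surprise` and `love`.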

In [9]:
import spacy
from spacy import cli

# Download the medium English pipeline (only needed once per environment), then load it
cli.download("en_core_web_md")
nlp = spacy.load("en_core_web_md")

✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_md')


In [5]:
# Preprocessing function applied to each text entry
import re

def preprocess(text):
    # Collapse runs of spaces into a single space
    text = re.sub(" +", " ", text)
    # Convert the text to lower case
    text = text.lower()
    # Tokenize with the spaCy pipeline loaded as `nlp`
    doc = nlp(text)
    # Drop stop words and punctuation, and lemmatize the remaining tokens
    filtered_words = [token.lemma_ for token in doc if not token.is_stop and not token.is_punct]

    return " ".join(filtered_words)
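
One detail of the first step in `preprocess()` is worth spelling out: the pattern `" +"` collapses runs of the space character only, so tabs, newlines, and leading or trailing spaces are left in place. The sketch below contrasts it with a `\s+` pattern followed by `strip()`. The helpers `collapse_spaces` and `normalize_whitespace` and the `sample` string are hypothetical and exist only for this comparison.

```python
import re

def collapse_spaces(text):
    # Same substitution as in preprocess(): only runs of the space character shrink.
    return re.sub(" +", " ", text)

def normalize_whitespace(text):
    # Alternative: treat any whitespace run (spaces, tabs, newlines) as one space
    # and trim both ends.
    return re.sub(r"\s+", " ", text).strip()

# Made-up input with mixed whitespace.
sample = "  i feel   greedy\t\twrong \n"

print(repr(collapse_spaces(sample)))
print(repr(normalize_whitespace(sample)))
```

Whether the difference matters depends on the input; the rows displayed in this notebook appear to be single lines separated by plain spaces.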

In [10]:
df["preprocessed"] = df["text"].apply(preprocess)

In [11]:
df

index,text,label,preprocessed
0,i didnt feel humiliated,sadness,not feel humiliate
1,i can go from feeling so hopeless to so damned...,sadness,feel hopeless damn hopeful care awake
2,im grabbing a minute to post i feel greedy wrong,anger,m grab minute post feel greedy wrong
3,i am ever feeling nostalgic about the fireplac...,love,feel nostalgic fireplace know property
4,i am feeling grouchy,anger,feel grouchy
...,...,...,...
15995,i just had a very brief time in the beanbag an...,sadness,brief time beanbag say anna feel like beat
15996,i am now turning and i feel pathetic that i am...,sadness,turn feel pathetic wait table sub teaching degree
15997,i feel strong and good overall,joy,feel strong good overall
15998,i feel like this was such a rude comment and i...,anger,feel like rude comment m glad t


In [12]:
# First we will use TF-IDF features with a multinomial Naive Bayes classifier
# Imports
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split

In [21]:
features = df["preprocessed"]
labels = df["label"]
# Encode the string labels as integer class ids
labels = labels.map({"joy": 0, "anger": 1, "fear": 2, "sadness": 3, "love": 4, "surprise": 5})
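
Because the classifier will be trained on integer ids, its predictions will also be integers. A minimal sketch of decoding them back to emotion names is shown below; it reuses the mapping from the cell above, while `label_to_id`, `id_to_label`, and the `predicted_ids` list are hypothetical names and values introduced here.

```python
# Same mapping as passed to labels.map() above.
label_to_id = {"joy": 0, "anger": 1, "fear": 2, "sadness": 3, "love": 4, "surprise": 5}

# Invert it so integer predictions can be turned back into label names.
id_to_label = {class_id: name for name, class_id in label_to_id.items()}

# Hypothetical model output, used only to demonstrate the lookup.
predicted_ids = [3, 0, 5]
decoded = [id_to_label[class_id] for class_id in predicted_ids]
print(decoded)
```

Note that `Series.map` with a dictionary yields `NaN` for any value missing from the dictionary, so a misspelled or unexpected label would surface as a missing value rather than an error.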

In [22]:
# Split the data into training and test sets (80% / 20%).
X_train, X_test, Y_train, Y_test = train_test_split(features, labels, test_size=0.2, random_state=1000)

In [23]:
X_train

5887     go work feel agitated lazy transform state yel...
12564           think d get mark question make feel clever
3195                        try let anxiety feel unwelcome
13295         feel weepy today sure feel well tomorrow xxx
14214                                    feel dazed desert
                               ...                        
15611                                hate feel like stupid
3776                                feel utterly devastate
6215     report dear soul energy feel strange today won...
4695          love smell make feel invigorated fresh happy
9651     think fair life want feel sincere connection p...
Name: preprocessed, Length: 12800, dtype: object

In [24]:
Y_train

5887     1
12564    0
3195     3
13295    3
14214    5
        ..
15611    3
3776     3
6215     2
4695     0
9651     0
Name: label, Length: 12800, dtype: int64
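
The `Length: 12800` reported for both `X_train` and `Y_train` follows from `test_size=0.2` applied to the 16,000 rows in the file. The arithmetic can be sketched as follows; the variable names are chosen for this example only.

```python
import math

n_samples = 16000   # sum of the label counts printed earlier
test_size = 0.2     # value passed to train_test_split

# For a float test_size, scikit-learn rounds the test share up
# and gives the remainder to the training set.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```

The split above is not stratified, so class proportions can differ slightly between the two parts; passing `stratify=labels` to `train_test_split` is one way to keep them aligned.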

In [36]:
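# These imports are added so the names resolve; `params` is assumed to be defined in a cell not shown here.
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
clf = RandomizedSearchCV(RandomForestClassifier(), params)
vectorizer = TfidfVectorizer()
x_train = vectorizer.fit_transform(X_train)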

In [37]:
x_train

<12800x10569 sparse matrix of type '<class 'numpy.float64'>'
	with 99244 stored elements in Compressed Sparse Row format>
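
To make the contents of this matrix less opaque, here is a from-scratch sketch of the weighting that `TfidfVectorizer` applies under its default settings: raw term counts, a smoothed inverse document frequency `ln((1 + n) / (1 + df)) + 1`, and L2 normalization of each row. The three-document corpus is made up, and tokenization is simplified to `str.split`, which differs from the vectorizer's own tokenizer (by default it drops single-character tokens).

```python
import math
from collections import Counter

# Toy corpus in the style of the preprocessed column (made up for illustration).
docs = ["feel grouchy", "feel strong good", "feel hopeless feel hopeful"]
tokenized = [doc.split() for doc in docs]

# Sort the vocabulary alphabetically to fix a column order.
vocab = sorted({tok for toks in tokenized for tok in toks})
n_docs = len(tokenized)

# Document frequency: number of documents that contain each term.
df_counts = Counter(tok for toks in tokenized for tok in set(toks))

# Smoothed inverse document frequency.
idf = {t: math.log((1 + n_docs) / (1 + df_counts[t])) + 1 for t in vocab}

def tfidf_row(tokens):
    # Term count times IDF, then scale the row to unit Euclidean length.
    tf = Counter(tokens)
    raw = [tf[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]

rows = [tfidf_row(toks) for toks in tokenized]

print(vocab)
for row in rows:
    print([round(v, 3) for v in row])
```

In this toy corpus `feel` occurs in every document, so it receives the smallest IDF and contributes less to each row than the rarer, more distinctive words.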

In [43]:
x_train[0]

<1x10569 sparse matrix of type '<class 'numpy.float64'>'
	with 10 stored elements in Compressed Sparse Row format>
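
The two summaries above give enough information to estimate how sparse the features are. The following sketch does the arithmetic using only the numbers printed by the notebook; the variable names are illustrative.

```python
n_rows, n_cols = 12800, 10569   # shape reported for x_train
stored = 99244                  # stored elements reported for x_train

# Share of matrix cells that hold an explicitly stored value.
density = stored / (n_rows * n_cols)

# Average number of stored vocabulary terms per document.
avg_terms_per_doc = stored / n_rows

print(f"density: {density:.4%}")
print(f"average stored terms per document: {avg_terms_per_doc:.2f}")
```

Well under 0.1% of the cells are stored, which is why a sparse representation is used; converting the matrix to a dense array would allocate about 135 million floats.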

In [44]:
type(x_train)

scipy.sparse._csr.csr_matrix
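
For readers unfamiliar with the Compressed Sparse Row (CSR) layout named in these outputs, the pure-Python sketch below builds the three arrays the format is based on (`data`, `indices`, `indptr`) from a tiny dense matrix. The helpers `to_csr` and `csr_row` and the `dense` example are hypothetical, written only to illustrate the layout; SciPy's own implementation is not shown here.

```python
def to_csr(dense):
    # data: nonzero values in row-major order
    # indices: column index of each stored value
    # indptr: indptr[i]:indptr[i + 1] is the slice of data/indices for row i
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)
                indices.append(col)
        indptr.append(len(data))
    return data, indices, indptr

def csr_row(data, indices, indptr, i):
    # Return row i as a {column: value} mapping of its stored entries.
    start, end = indptr[i], indptr[i + 1]
    return dict(zip(indices[start:end], data[start:end]))

# Made-up 3x4 matrix with one empty row.
dense = [
    [0, 0.5, 0, 0.9],
    [0, 0, 0, 0],
    [0.7, 0, 0, 0.7],
]

data, indices, indptr = to_csr(dense)
print(data, indices, indptr)
print(csr_row(data, indices, indptr, 2))
```

Slicing a single row, as in `x_train[0]`, only needs the two neighbouring `indptr` entries, which is part of why row access is cheap in this format.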