<a href="https://colab.research.google.com/github/samyamaryal/Emotion-Classifier/blob/main/emotion_detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
import pandas as pd
import numpy as np
import tensorflow
import sklearn

In [3]:
import nltk
# The NLTK data package includes a pre-trained Punkt tokenizer for English.
# punkt had to be manually downloaded using the command below
# same with stopwords
nltk.download('punkt')
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
nltk.download('stopwords')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

In [4]:
df = pd.read_csv('ISEAR.csv')

In [5]:
df.head()

Unnamed: 0,0,joy,On days when I feel close to my partner and other friends. \nWhen I feel at peace with myself and also experience a close \ncontact with people whom I regard greatly.
0,1,fear,Every time I imagine that someone I love or I ...
1,2,anger,When I had been obviously unjustly treated and...
2,3,sadness,When I think about the short time that we live...
3,4,disgust,At a gathering I found myself involuntarily si...
4,5,shame,When I realized that I was directing the feeli...


In [6]:
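# Rename the columns. Note that read_csv treated the first record (a 'joy' entry) as the header row,
# so that record is lost here; passing header=None to pd.read_csv would keep it.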
df.columns = ['No', 'emotion', 'word']
df.head()

Unnamed: 0,No,emotion,word
0,1,fear,Every time I imagine that someone I love or I ...
1,2,anger,When I had been obviously unjustly treated and...
2,3,sadness,When I think about the short time that we live...
3,4,disgust,At a gathering I found myself involuntarily si...
4,5,shame,When I realized that I was directing the feeli...


In [7]:
emotion_labels = df['emotion']
sentences = df['word']

We need to turn every sentence into a numeric vector. Before that, let us preprocess the text.

DATAFRAME PREPROCESSING DONE

CORPUS PREPROCESSING
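
The corpus preprocessing steps in this notebook (lowercasing, punctuation removal, tokenization, stopword removal) can be sketched end to end on a single sentence. This is a minimal illustration only: `preprocess`, `TOY_STOP_WORDS`, and the example sentence are made up for the sketch, a plain whitespace split stands in for `word_tokenize`, and the tiny stop word set stands in for the NLTK English list.

```python
import string

# Assumption: a tiny hand-picked stop word set, standing in for nltk's English list
TOY_STOP_WORDS = {"i", "a", "the", "and", "was", "had", "to", "at", "my", "with"}

def preprocess(sentence, stop_words=TOY_STOP_WORDS):
    """Lowercase, strip punctuation, split on whitespace, drop stop words."""
    lowered = sentence.lower()
    no_punct = "".join(ch for ch in lowered if ch not in string.punctuation)
    tokens = no_punct.split()  # whitespace split, a simplification of word_tokenize
    return " ".join(tok for tok in tokens if tok not in stop_words)

# Illustrative sentence, not taken from the dataset
example = "Once I quarrelled with my sister, and I was angry!"
print(preprocess(example))
```

Running all four steps inside one function makes it easy to apply the same cleaning to new text at prediction time.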

In [8]:
#Lowercase

sentences = sentences.apply(lambda x: x.lower())
sentences

0       every time i imagine that someone i love or i ...
1       when i had been obviously unjustly treated and...
2       when i think about the short time that we live...
3       at a gathering i found myself involuntarily si...
4       when i realized that i was directing the feeli...
                              ...                        
7440    last week i had planned to play tennis and had...
7441    when i was ill and had to stay at the hospital...
7442    a few days back i was waiting for the bus at t...
7443    a few days back i had a tutorial class and the...
7444    once i quarrelled with my sister and after thi...
Name: word, Length: 7445, dtype: object

In [9]:
#Punctuation removal

import string
print(string.punctuation)
def removepunctuation(sentence):
  # Keep every character that is not punctuation, then concatenate the survivors using "join"
  punctuationfree = "".join([i for i in sentence if i not in string.punctuation])
  return punctuationfree
sentences = sentences.apply(removepunctuation)

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
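
The character-by-character filter above can also be written with `str.translate`, which deletes every punctuation character using a precomputed table. This is a sketch of an equivalent alternative, not the approach used in this notebook; `PUNCT_TABLE` and `remove_punctuation_fast` are names introduced here for illustration, and the input sentence is made up.

```python
import string

# Translation table that maps every ASCII punctuation character to None (deletes it)
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def remove_punctuation_fast(sentence):
    """Return the sentence with all ASCII punctuation characters removed."""
    return sentence.translate(PUNCT_TABLE)

cleaned = remove_punctuation_fast("when i was ill, i had to stay at the hospital...")
print(cleaned)
```

Both versions only know about the ASCII characters in `string.punctuation`, so typographic quotes or dashes would pass through unchanged.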


In [10]:
#Tokenization

sentences = sentences.apply(word_tokenize)

In [None]:
print(sentences)

In [12]:
stop_words = stopwords.words('english')
len(stop_words)

179

In [56]:
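# There are a lot of stopwords in this dataset. Let us remove them using the stopword list from nltk.


def stopwordremoval(words):
  # Build the result locally so that repeated calls do not keep appending to a shared global list
  stop_set = set(stop_words)  # set membership tests are faster than scanning a list
  prepro = []
  for i in words:
    post_removal = [word for word in i if word not in stop_set]
    # One-element list per sentence, so pd.DataFrame(prepro) yields a single column named 0
    prepro.append([" ".join(post_removal)])
  return prepro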

print(sentences.shape)
print(stopwordremoval(sentences))

(7445,)


In [None]:
# Check an example from sentences, where the stop words are still present
sentences[1]

In [108]:
prepro = []
prepro = stopwordremoval(sentences)
print(prepro)



CORPUS PREPROCESSED, NOW WE VECTORIZE

In [110]:
len(prepro)

7445

In [111]:
preprocessed_sentences = pd.DataFrame(prepro)

In [112]:
preprocessed_sentences

Unnamed: 0,0
0,every time imagine someone love could contact ...
1,obviously unjustly treated possibility elucida...
2,think short time live relate periods life thin...
3,gathering found involuntarily sitting next two...
4,realized directing feelings discontent partner...
...,...
7440,last week planned play tennis booked tennis co...
7441,ill stay hospital period time
7442,days back waiting bus bus stop getting bus pre...
7443,days back tutorial class teacher randomly assi...


In [113]:
# Now that we have both the preprocessed sentences and the emotion labels, we can vectorize the text and build the neural network.

In [114]:
from tensorflow.keras import layers

In [115]:
from sklearn.feature_extraction.text import TfidfVectorizer

In [116]:
preprocessed_sentences[0]

0       every time imagine someone love could contact ...
1       obviously unjustly treated possibility elucida...
2       think short time live relate periods life thin...
3       gathering found involuntarily sitting next two...
4       realized directing feelings discontent partner...
                              ...                        
7440    last week planned play tennis booked tennis co...
7441                        ill stay hospital period time
7442    days back waiting bus bus stop getting bus pre...
7443    days back tutorial class teacher randomly assi...
7444     quarrelled sister deliberately messed belongings
Name: 0, Length: 7445, dtype: object

In [117]:
# TRIAL CODE BLOCK TO VERIFY THE WORKING OF TFIDF VECTORIZER

from sklearn.feature_extraction.text import TfidfVectorizer
corpus = ['This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)
vectorizer.get_feature_names_out()

trial = X.toarray()
trialdf = pd.DataFrame(trial)
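
To make the numbers in the trial matrix less opaque, here is a hand-rolled sketch of the computation for the same four documents. It assumes the `TfidfVectorizer` defaults (lowercasing, tokens of two or more word characters, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, raw term counts, L2-normalized rows). The helper `tfidf_row` and the variable names are introduced here for illustration.

```python
import math
import re
from collections import Counter

corpus = ['This is the first document.',
          'This document is the second document.',
          'And this is the third one.',
          'Is this the first document?']

# Same token rule as the vectorizer's default: lowercase, words of 2+ characters
tokenized = [re.findall(r"\b\w\w+\b", doc.lower()) for doc in corpus]
vocab = sorted({tok for doc in tokenized for tok in doc})
n_docs = len(corpus)

# Document frequency: in how many documents each token appears
doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
# Smoothed inverse document frequency
idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1 for tok in vocab}

def tfidf_row(tokens):
    """Raw count times IDF for every vocabulary entry, then L2 normalization."""
    counts = Counter(tokens)
    raw = [counts[tok] * idf[tok] for tok in vocab]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]

manual = [tfidf_row(doc) for doc in tokenized]
print(vocab)
print([round(v, 4) for v in manual[0]])
```

Words that occur in every document ("this", "is", "the") get the minimum IDF of 1, while rarer words such as "first" are weighted more heavily.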

In [118]:
vectorizer = TfidfVectorizer(lowercase=False, max_features = 5000)
vectorized_sentences = vectorizer.fit_transform(preprocessed_sentences[0])

In [134]:
type(vectorized_sentences)

scipy.sparse._csr.csr_matrix

In [135]:
print(vectorized_sentences)

  (0, 1052)	0.2706243205455566
  (0, 1420)	0.273557360792798
  (0, 2017)	0.33745825310514527
  (0, 3863)	0.34901780453196474
  (0, 895)	0.35352527284599955
  (0, 941)	0.22230904359096507
  (0, 2398)	0.2639451423445508
  (0, 4041)	0.22741500419359323
  (0, 2022)	0.4302771385582494
  (0, 4419)	0.1875362193020916
  (0, 1426)	0.31645386045587054
  (1, 3048)	0.5197396041744989
  (1, 4542)	0.44379473047473633
  (1, 4729)	0.48891570980771504
  (1, 2729)	0.5421055337834225
  (2, 4761)	0.2769245233190164
  (2, 2330)	0.22996630864267117
  (2, 3582)	0.38037425004914266
  (2, 2361)	0.2573063474455763
  (2, 3921)	0.5565041414757597
  (2, 4390)	0.5022654140591568
  (2, 4419)	0.31236160631210647
  (3, 2405)	0.30271549343046456
  (3, 889)	0.3120807851194871
  (3, 2778)	0.3583415081946123
  :	:
  (7443, 753)	0.12530899893910308
  (7443, 3127)	0.14829633255952007
  (7443, 2435)	0.14201567847612775
  (7443, 1187)	0.16322962786996137
  (7443, 602)	0.2122036538120647
  (7443, 4608)	0.19718673075403317
  (7

In [136]:
# Convert the sparse TF-IDF matrix to a dense NumPy array
x_input = vectorized_sentences.toarray()

LABEL RESHAPING

In [122]:
labels_reshaped = emotion_labels.values.reshape(-1, 1)

In [123]:
from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder(sparse_output=True)
emotion_labels_new = ohe.fit_transform(labels_reshaped).toarray()

In [124]:
ohe.categories_

[array(['anger', 'disgust', 'fear', 'guilt', 'joy', 'sadness', 'shame'],
       dtype=object)]
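
For intuition, one-hot encoding can be sketched in plain Python: each label becomes a row with a single 1.0 in the position of its category, with categories sorted alphabetically as in `ohe.categories_`. The helpers `one_hot_encode` and `decode` are hypothetical names for this sketch, and the sample uses the five labels visible in `df.head()`, so it only has five categories rather than seven.

```python
def one_hot_encode(labels):
    """Return (sorted categories, list of one-hot rows) for a list of string labels."""
    categories = sorted(set(labels))  # alphabetical order, like ohe.categories_
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for label in labels:
        row = [0.0] * len(categories)
        row[index[label]] = 1.0
        rows.append(row)
    return categories, rows

def decode(row, categories):
    """Map a one-hot (or probability) row back to the label with the largest value."""
    return categories[row.index(max(row))]

sample = ["fear", "anger", "sadness", "disgust", "shame"]  # labels shown in df.head()
categories, encoded = one_hot_encode(sample)
print(categories)
print(encoded[0])
print(decode(encoded[0], categories))
```

The same `decode` idea applies to model predictions: taking the position of the largest output and looking it up in the category list recovers an emotion name.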

In [125]:
labels = ohe.categories_
label_columns = np.array(labels).ravel()

In [126]:
len(x_input)

7445

In [137]:
finaldf = pd.DataFrame(x_input)

In [138]:
finaldf

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,4990,4991,4992,4993,4994,4995,4996,4997,4998,4999
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7440,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7441,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7442,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7443,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [128]:
EPOCHS = 25
VALSPLIT = 0.2

In [129]:
emotion_labels_new.shape

(7445, 7)

**NEURAL NETWORK PORTION**


In [145]:
output_dim = 7


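model = tensorflow.keras.Sequential([
  layers.InputLayer(input_shape = (5000, )),
  # ReLU keeps the hidden layers non-linear; without an activation the stack collapses to one linear map
  layers.Dense(125, activation = 'relu'),
  layers.Dense(25, activation = 'relu'),
  # softmax turns the 7 outputs into class probabilities, which categorical_crossentropy expects by default
  layers.Dense(output_dim, activation = 'softmax')])
model.compile(optimizer = 'adam',
              loss = 'categorical_crossentropy',
              metrics = ['accuracy'])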
hist = model.fit(x = finaldf, 
                 y = emotion_labels_new,
                 batch_size = 32,
                 epochs = EPOCHS,
                 validation_split = VALSPLIT)

Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25
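
The `categorical_crossentropy` loss compares a one-hot label with a vector of class probabilities, and by default Keras assumes the model output already is such a vector. The plain-Python sketch below shows how softmax produces those probabilities from raw scores and how the loss is then computed for one sample. The logits are hypothetical values chosen for illustration, not outputs of the model trained here.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Negative log-probability assigned to the true class (one-hot y_true)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_prob))

# Hypothetical raw outputs for the 7 emotion classes
logits = [2.0, -1.0, 0.5, 0.0, -0.5, 1.0, -2.0]
y_true = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # one-hot label for the first class

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
print(round(sum(probs), 6))
print(loss)
```

The loss shrinks toward 0 as the probability assigned to the true class approaches 1, which is what training tries to achieve.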
