
# Emotions detection using neural networks
This notebook demonstrates a step-by-step implementation of a neural network model that classifies text into six emotion classes.

## 1. Data Loading
Dataset used in the project: https://www.kaggle.com/datasets/nelgiriyewithana/emotions

In [None]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("nelgiriyewithana/emotions")

print("Path to dataset files:", path)

Downloading from https://www.kaggle.com/api/v1/datasets/download/nelgiriyewithana/emotions?dataset_version_number=1...


100%|██████████| 15.7M/15.7M [00:00<00:00, 52.5MB/s]

Extracting files...





Path to dataset files: /root/.cache/kagglehub/datasets/nelgiriyewithana/emotions/versions/1


In [None]:
import os
import pandas as pd

# Build the CSV path from the download location instead of hard-coding it
df = pd.read_csv(os.path.join(path, "text.csv"), index_col=0)
df

Unnamed: 0,text,label
0,i just feel really helpless and heavy hearted,4
1,ive enjoyed being able to slouch about relax a...,0
2,i gave up my internship with the dmrg and am f...,4
3,i dont know i feel so lost,0
4,i am a kindergarten teacher and i am thoroughl...,4
...,...,...
416804,i feel like telling these horny devils to find...,2
416805,i began to realize that when i was feeling agi...,3
416806,i feel very curious be why previous early dawn...,5
416807,i feel that becuase of the tyranical nature of...,3


In [None]:
# Label meanings
emotions = {0:"sadness", 1:"joy", 2:"love", 3:"anger", 4:"fear", 5:"surprise"}
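
Before splitting, it can help to check how evenly the six labels are represented. The following is a minimal sketch of that check; `label_distribution` and `sample_labels` are hypothetical names, and the toy labels are made up for illustration rather than taken from the dataset.

```python
from collections import Counter

# Same mapping as in the notebook
emotions = {0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"}

def label_distribution(labels):
    """Return {emotion_name: fraction_of_samples} for a sequence of label ids."""
    counts = Counter(labels)
    total = len(labels)
    return {emotions[label]: counts[label] / total for label in sorted(counts)}

# Hypothetical toy labels, for illustration only
sample_labels = [4, 0, 4, 0, 4, 2, 3, 5, 3, 1]
distribution = label_distribution(sample_labels)
print(distribution)
```

On the real data, `df["label"].map(emotions).value_counts(normalize=True)` should give the same kind of summary.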

## 2. Data Preprocessing

Splitting data into train and test datasets

In [None]:
from sklearn.model_selection import train_test_split
import numpy as np

TEST_SIZE = 0.2

train_text, test_text, train_labels, test_labels = train_test_split(df["text"].to_numpy(), df["label"].to_numpy(), test_size = TEST_SIZE, random_state = 42)

len(train_text), len(test_text), len(train_labels), len(test_labels)

(333447, 83362, 333447, 83362)
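
The sizes above can be reproduced by hand. This sketch assumes that a fractional `test_size` is rounded up and that the remainder goes to training, which is consistent with the numbers printed above; `split_sizes` is a hypothetical helper, not part of scikit-learn.

```python
from math import ceil

def split_sizes(n_samples, test_size):
    """Sizes of the train and test parts for a fractional test_size."""
    n_test = ceil(test_size * n_samples)  # test part is rounded up
    n_train = n_samples - n_test          # the remainder goes to training
    return n_train, n_test

n_total = 333447 + 83362  # total number of rows, taken from the output above
print(split_sizes(n_total, 0.2))
```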

In [None]:
# Looking into dataset
train_text[:10], train_labels[:10]

(array(['ive blabbed on enough for tonight im tired and ive been feeling pretty crappy from this kentucky weather',
        'i woke up really early this morning and drove in and i just feel ecstatic about everything getting your photo taken people wanting you to wear their clothes i love all of it',
        'i feel i never gave myself a rest day after the megabrick because i was feeling stubborn and belligerent and my legs are waaaaaaay tired i keep pressing on with the scheduled workouts ignoring the numbers watch for the most part and trying to keep disappointment far off my radar',
        'i am feeling restless teary flat sad and strange today',
        'i feel like im doomed before ive even began',
        'i feel agitated i want to do stuff well that s totally fine',
        'i feel a lot of positive intention behind it',
        'i feel ashamed with such prolific exc',
        'i start to feel lonely again',
        'i want it to be understood well received i want it to feel com

Encoding label values

In [None]:
from sklearn.preprocessing import OneHotEncoder

one_hot = OneHotEncoder(sparse_output=False)
train_labels_one_hot = one_hot.fit_transform(train_labels.reshape(-1, 1))
# Reuse the categories learned from the training labels; do not refit on test data
test_labels_one_hot = one_hot.transform(test_labels.reshape(-1, 1))

train_labels_one_hot

array([[1., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0.],
       ...,
       [0., 0., 0., 1., 0., 0.],
       [0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0.]])
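
To make the encoding concrete, here is a plain-Python sketch of the two steps an encoder performs: learning the categories once, then mapping labels to one-hot rows. The function names and toy labels are hypothetical. Learning the categories on the training labels only keeps the column layout fixed even when another split happens to miss a class.

```python
def fit_categories(labels):
    """Learn the sorted set of distinct label values (the 'fit' step)."""
    return sorted(set(labels))

def one_hot_encode(labels, categories):
    """Encode labels as one-hot rows using previously learned categories."""
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for label in labels:
        row = [0.0] * len(categories)
        row[index[label]] = 1.0
        rows.append(row)
    return rows

# Hypothetical toy labels, for illustration only
toy_train = [0, 1, 3, 5, 2, 4]
toy_test = [3, 1]  # does not contain every class

categories = fit_categories(toy_train)
encoded_test = one_hot_encode(toy_test, categories)
print(encoded_test)
```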

Preparing and prefetching datasets for faster training

In [None]:
import tensorflow as tf

train_dataset = tf.data.Dataset.from_tensor_slices((train_text, train_labels_one_hot))
test_dataset = tf.data.Dataset.from_tensor_slices((test_text, test_labels_one_hot))

In [None]:
train_dataset = train_dataset.batch(32).prefetch(tf.data.AUTOTUNE)
test_dataset = test_dataset.batch(32).prefetch(tf.data.AUTOTUNE)
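
With a batch size of 32 and the last partial batch kept (the default for `batch`), the number of batches per pass over each split follows directly from the split sizes. A small sketch of that arithmetic, with `num_batches` as a hypothetical helper:

```python
from math import ceil

BATCH_SIZE = 32

def num_batches(n_samples, batch_size=BATCH_SIZE):
    """Number of batches when the final, smaller batch is not dropped."""
    return ceil(n_samples / batch_size)

train_batches = num_batches(333447)  # training split size from above
test_batches = num_batches(83362)    # test split size from above
print(train_batches, test_batches)
```

These values correspond to the step counts that appear in the training and prediction progress bars later in the notebook.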

Average words per text sample

In [None]:
sum([len(x.split()) for x in train_text])/len(train_text)

19.215338569547786

Number of words that covers 95% of the training samples

In [None]:
np.percentile([len(x.split()) for x in train_text], 95)

41.0
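
The percentile above can be read as "95% of samples are no longer than this many words". The sketch below shows a linearly interpolated percentile and the matching coverage check in plain Python; both helpers and the toy lengths are hypothetical and only meant to illustrate the idea.

```python
def percentile(values, q):
    """Percentile with linear interpolation between neighbouring sorted values."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

def coverage(values, limit):
    """Fraction of samples whose length does not exceed limit."""
    return sum(1 for value in values if value <= limit) / len(values)

# Hypothetical word counts, for illustration only
toy_lengths = [5, 8, 12, 19, 20, 23, 30, 41, 44, 60]
print(percentile(toy_lengths, 50))
print(coverage(toy_lengths, 41))
```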

## 3. Model building and training

Making text vectorizer layer

In [None]:
MAX_VOCAB_LENGTH = 68000
MAX_SEQUENCE_LENGTH = 41

text_vectorizer = tf.keras.layers.TextVectorization(max_tokens = MAX_VOCAB_LENGTH,
                                                   output_sequence_length = MAX_SEQUENCE_LENGTH)

text_vectorizer.adapt(train_text)

In [None]:
words_in_vocab = text_vectorizer.get_vocabulary()
top_5_words = words_in_vocab[:5]
bottom_5_words = words_in_vocab[-5:]
print(f"Number of words in vocab: {len(words_in_vocab)}")
print(f"Top 5 most common words: {top_5_words}")
print(f"Bottom 5 least common words: {bottom_5_words}")

Number of words in vocab: 67951
Top 5 most common words: ['', '[UNK]', 'i', 'feel', 'and']
Bottom 5 least common words: ['aaaaah', 'aaaaaand', 'aaaaaaaall', 'aaaaaaaaaaaaaaaaggghhhh', 'aaaa']


In [None]:
import random

random_text = random.choice(train_text)
print(f"Original text:\n{random_text}\nVectorized text:")
text_vectorizer([random_text])

Original text:
i feel i am so un useful on sunday nights i feel i do more chasing around of my children than socializing with the teens
Vectorized text:


<tf.Tensor: shape=(1, 41), dtype=int64, numpy=
array([[   2,    3,    2,   24,   15, 1798,  576,   30, 1306, 1411,    2,
           3,    2,   39,   37, 3799,  133,   10,   11,  402,   93, 6933,
          25,    6, 4759,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0]])>
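
Conceptually, the vectorizer lowercases the text, strips punctuation, splits on whitespace, maps each word to an integer id, and pads or truncates to a fixed length. The sketch below imitates those steps in plain Python. It is a simplification rather than the layer's actual implementation: the regular expression only approximates the punctuation stripping, and `toy_vocab` is a hypothetical three-word vocabulary that reuses the ids printed above.

```python
import re

PAD_ID, UNK_ID = 0, 1  # reserved ids: padding and out-of-vocabulary

def vectorize(text, vocab, sequence_length):
    """Map text to a fixed-length list of integer token ids."""
    tokens = re.sub(r"[^\w\s]", "", text.lower()).split()
    ids = [vocab.get(token, UNK_ID) for token in tokens]
    ids = ids[:sequence_length]                           # truncate long texts
    return ids + [PAD_ID] * (sequence_length - len(ids))  # pad short texts

# Hypothetical vocabulary, for illustration only
toy_vocab = {"i": 2, "feel": 3, "and": 4}
padded = vectorize("I feel tired, and I feel lost!", toy_vocab, 10)
truncated = vectorize("i feel i feel", toy_vocab, 3)
print(padded)
print(truncated)
```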

Making text embedding layer

In [None]:
text_embedding = tf.keras.layers.Embedding(input_dim=MAX_VOCAB_LENGTH,
                             output_dim=128)

In [None]:
random_text = random.choice(train_text)
print(f"Original text:\n{random_text}\nVectorized text:")
print(text_vectorizer([random_text]))
print("Text embedding:")
text_embedding(text_vectorizer(random_text))

Original text:
im terribly disappointed and yet i feel ludicrous saying so its a damn good excuse his father is having heart trouble may need repeat surgery
Vectorized text:
tf.Tensor(
[[  17 1261  405    4  218    2    3 2148  341   15   81    7  832  110
  1714  103  888   22  150  234 1419  213  104 2285 2137    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0]], shape=(1, 41), dtype=int64)
Text embedding:


<tf.Tensor: shape=(41, 128), dtype=float32, numpy=
array([[ 0.01029282,  0.02558963, -0.03156801, ...,  0.02796421,
         0.02906498, -0.00204965],
       [-0.00565909, -0.02650976,  0.00992889, ..., -0.03016945,
         0.00012017,  0.04670367],
       [-0.02311485,  0.01984605,  0.04970229, ...,  0.0373924 ,
        -0.03912692, -0.03616142],
       ...,
       [ 0.02211178,  0.00935607,  0.04463864, ..., -0.03909401,
         0.04629716,  0.04271359],
       [ 0.02211178,  0.00935607,  0.04463864, ..., -0.03909401,
         0.04629716,  0.04271359],
       [ 0.02211178,  0.00935607,  0.04463864, ..., -0.03909401,
         0.04629716,  0.04271359]], dtype=float32)>
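
An embedding layer is essentially a lookup table: each token id selects one row of a trainable matrix. That is why the last rows in the output above are identical, since they all correspond to the padding id 0. A plain-Python sketch of the lookup, where the table size, the seed, and the helper names are assumptions made for illustration:

```python
import random

def make_embedding_table(vocab_size, dim, seed=42):
    """Create a table of small random vectors, one row per token id."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(vocab_size)]

def embed(token_ids, table):
    """Look up one vector per token id."""
    return [table[token_id] for token_id in token_ids]

table = make_embedding_table(vocab_size=10, dim=4)
vectors = embed([2, 3, 0, 0], table)  # two words followed by two padding tokens
print(vectors[2] == vectors[3])
```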

### Experimentation with different model architectures

Dense model

In [None]:
tf.random.set_seed(42)

inputs = tf.keras.layers.Input(shape=(1,), dtype="string")
x = text_vectorizer(inputs)
x = text_embedding(x)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
x = tf.keras.layers.Dense(64, activation="relu")(x)
outputs = tf.keras.layers.Dense(len(emotions), activation="softmax")(x)

model_0 = tf.keras.Model(inputs, outputs)

model_0.compile(loss="categorical_crossentropy",
                optimizer = tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

model_0.summary()

checkpoint_filepath = '/ckpt/model_0_best.model.keras'
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    monitor='val_accuracy',
    mode='max',
    save_best_only=True)

model_0.fit(train_dataset,
            epochs = 5,
            validation_data = test_dataset,
            callbacks = [model_checkpoint_callback])



Epoch 1/5
10421/10421 ━━━━━━━━━━━━━━━━━━━━ 108s 10ms/step - accuracy: 0.7504 - loss: 0.6508 - val_accuracy: 0.8984 - val_loss: 0.2070
Epoch 2/5
10421/10421 ━━━━━━━━━━━━━━━━━━━━ 141s 10ms/step - accuracy: 0.9084 - loss: 0.1863 - val_accuracy: 0.8991 - val_loss: 0.2075
Epoch 3/5
10421/10421 ━━━━━━━━━━━━━━━━━━━━ 100s 10ms/step - accuracy: 0.9179 - loss: 0.1623 - val_accuracy: 0.8967 - val_loss: 0.2131
Epoch 4/5
10421/10421 ━━━━━━━━━━━━━━━━━━━━ 103s 10ms/step - accuracy: 0.9261 - loss: 0.1479 - val_accuracy: 0.8946 - val_loss: 0.2284
Epoch 5/5
10421/10421 ━━━━━━━━━━━━━━━━━━━━ 103s 10ms/step - accuracy: 0.9319 - loss: 0.1369 - val_accuracy: 0.8921 - val_loss: 0.2493


<keras.src.callbacks.history.History at 0x7db10c21c8d0>
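
In the dense model, the pooling layer turns the sequence of word vectors into a single vector by averaging over the sequence axis. The sketch below shows that operation on a toy sequence in plain Python; `global_average_pool` is a hypothetical helper. Since no mask is configured on the embedding layer in this notebook, padding positions would be included in the average as well.

```python
def global_average_pool(sequence):
    """Average a list of equal-length vectors element-wise."""
    n = len(sequence)
    return [sum(column) / n for column in zip(*sequence)]

# Three "word vectors" of dimension 2, for illustration only
toy_sequence = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled = global_average_pool(toy_sequence)
print(pooled)
```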

LSTM model

In [None]:
tf.random.set_seed(42)

inputs = tf.keras.layers.Input(shape = (1,), dtype="string")
x = text_vectorizer(inputs)
x = text_embedding(x)  # NOTE: same layer instance as in model_0, so its weights continue from that training run
x = tf.keras.layers.LSTM(64)(x)
x = tf.keras.layers.Dense(64, activation="relu")(x)
outputs = tf.keras.layers.Dense(len(emotions), activation="softmax")(x)

model_1 = tf.keras.Model(inputs, outputs)

model_1.compile(loss="categorical_crossentropy",
                optimizer = tf.keras.optimizers.Adam(),
                metrics = ["accuracy"])

checkpoint_filepath = '/ckpt/model_1_best.model.keras'
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    monitor='val_accuracy',
    mode='max',
    save_best_only=True)

model_1.fit(train_dataset,
            epochs = 5,
            validation_data = test_dataset,
            callbacks = [model_checkpoint_callback])


Epoch 1/5
10421/10421 ━━━━━━━━━━━━━━━━━━━━ 129s 12ms/step - accuracy: 0.8810 - loss: 0.2973 - val_accuracy: 0.9321 - val_loss: 0.1150
Epoch 2/5
10421/10421 ━━━━━━━━━━━━━━━━━━━━ 135s 13ms/step - accuracy: 0.9454 - loss: 0.0941 - val_accuracy: 0.9340 - val_loss: 0.1208
Epoch 3/5
10421/10421 ━━━━━━━━━━━━━━━━━━━━ 131s 12ms/step - accuracy: 0.9521 - loss: 0.0836 - val_accuracy: 0.9317 - val_loss: 0.1356
Epoch 4/5
10421/10421 ━━━━━━━━━━━━━━━━━━━━ 142s 12ms/step - accuracy: 0.9557 - loss: 0.0776 - val_accuracy: 0.9328 - val_loss: 0.1382
Epoch 5/5
10421/10421 ━━━━━━━━━━━━━━━━━━━━ 136s 13ms/step - accuracy: 0.9577 - loss: 0.0750 - val_accuracy: 0.9356 - val_loss: 0.1429


<keras.src.callbacks.history.History at 0x7db18bb20c90>
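
With `save_best_only=True` and `mode='max'`, a checkpoint is written only when the monitored value improves on the best value seen so far. The sketch below replays that rule on the `val_accuracy` values logged above; `epochs_that_save` is a hypothetical helper that mirrors the rule, not the callback's actual code.

```python
def epochs_that_save(val_accuracies):
    """Return the epochs that would trigger a save, and the best value seen."""
    best = float("-inf")
    saved = []
    for epoch, value in enumerate(val_accuracies, start=1):
        if value > best:  # strict improvement is required
            best = value
            saved.append(epoch)
    return saved, best

# val_accuracy per epoch, copied from the training logs above
dense_val_accuracy = [0.8984, 0.8991, 0.8967, 0.8946, 0.8921]
lstm_val_accuracy = [0.9321, 0.9340, 0.9317, 0.9328, 0.9356]

print(epochs_that_save(dense_val_accuracy))
print(epochs_that_save(lstm_val_accuracy))
```

Under this rule the dense model's checkpoint comes from epoch 2, while the LSTM's comes from epoch 5.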

Bidirectional LSTM model

In [None]:
tf.random.set_seed(42)

inputs = tf.keras.layers.Input(shape = (1,), dtype="string")
x = text_vectorizer(inputs)
x = text_embedding(x)  # NOTE: same layer instance as in the previous models, so its weights are already trained
x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64))(x)
outputs = tf.keras.layers.Dense(len(emotions), activation="softmax")(x)

model_2 = tf.keras.Model(inputs, outputs)

model_2.compile(loss="categorical_crossentropy",
                optimizer = tf.keras.optimizers.Adam(),
                metrics = ["accuracy"])

checkpoint_filepath = '/ckpt/model_2_best.model.keras'
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    monitor='val_accuracy',
    mode='max',
    save_best_only=True)

model_2.fit(train_dataset,
            epochs = 5,
            validation_data = test_dataset,
            callbacks = [model_checkpoint_callback])

Epoch 1/5
10421/10421 ━━━━━━━━━━━━━━━━━━━━ 160s 15ms/step - accuracy: 0.9299 - loss: 0.1703 - val_accuracy: 0.9303 - val_loss: 0.1449
Epoch 2/5
10421/10421 ━━━━━━━━━━━━━━━━━━━━ 197s 15ms/step - accuracy: 0.9589 - loss: 0.0784 - val_accuracy: 0.9323 - val_loss: 0.1558
Epoch 3/5
10421/10421 ━━━━━━━━━━━━━━━━━━━━ 157s 15ms/step - accuracy: 0.9623 - loss: 0.0708 - val_accuracy: 0.9324 - val_loss: 0.1628
Epoch 4/5
10421/10421 ━━━━━━━━━━━━━━━━━━━━ 146s 14ms/step - accuracy: 0.9647 - loss: 0.0664 - val_accuracy: 0.9321 - val_loss: 0.1706
Epoch 5/5
10421/10421 ━━━━━━━━━━━━━━━━━━━━ 146s 14ms/step - accuracy: 0.9671 - loss: 0.0628 - val_accuracy: 0.9305 - val_loss: 0.1835


<keras.src.callbacks.history.History at 0x7db0f4608c90>
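
The bidirectional wrapper runs one LSTM forward and one backward over the sequence, so it roughly doubles the recurrent parameters. The sketch below works through the count, assuming the standard LSTM formulation with four gates and a bias term, an input dimension of 128 (the embedding size used above), and 64 units. `lstm_params` is a hypothetical helper.

```python
def lstm_params(input_dim, units):
    """Trainable parameters of a standard LSTM layer with bias.

    Each of the 4 gates has an input kernel (input_dim x units),
    a recurrent kernel (units x units) and a bias (units).
    """
    return 4 * (units * (input_dim + units) + units)

EMBEDDING_DIM = 128  # output_dim of the embedding layer
UNITS = 64           # LSTM units used in the models above

unidirectional = lstm_params(EMBEDDING_DIM, UNITS)
bidirectional = 2 * unidirectional  # forward and backward copies
print(unidirectional, bidirectional)
```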

## 4. Viewing and analyzing predictions

In [None]:
best_model = tf.keras.models.load_model("/ckpt/model_1_best.model.keras")

In [None]:
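def make_preds(model):
  """Return the predicted class index for every sample in test_dataset."""
  pred_probs = model.predict(test_dataset)  # shape: (num_samples, num_classes)
  print(pred_probs.shape)
  pred_classes = np.argmax(pred_probs, axis=1)  # most probable class per row
  return pred_classes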

In [None]:
preds = make_preds(best_model)
preds[:100]

2606/2606 ━━━━━━━━━━━━━━━━━━━━ 9s 4ms/step
(83362, 6)


array([0, 0, 3, 0, 1, 2, 1, 1, 5, 0, 3, 0, 4, 1, 0, 2, 4, 0, 1, 5, 4, 4,
       2, 0, 1, 0, 1, 3, 0, 0, 1, 0, 1, 0, 4, 1, 0, 1, 0, 0, 2, 0, 1, 0,
       3, 3, 4, 1, 0, 4, 2, 4, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 3, 4, 2,
       0, 0, 2, 0, 0, 4, 3, 3, 3, 3, 0, 0, 5, 2, 3, 0, 2, 4, 1, 1, 1, 0,
       3, 1, 2, 4, 1, 1, 3, 0, 0, 0, 0, 0])

In [None]:
from sklearn.metrics import accuracy_score

accuracy_score(test_labels, preds)

0.93559415561047
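
Overall accuracy hides how each emotion performs individually. A confusion matrix and per-class recall give a finer view. The plain-Python sketch below uses hypothetical toy labels with three classes; the helper names are illustrative.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[true][predicted] += 1
    return matrix

def per_class_recall(matrix):
    """Share of each true class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]

# Hypothetical toy labels, for illustration only
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]

matrix = confusion_matrix(toy_true, toy_pred, n_classes=3)
recall = per_class_recall(matrix)
print(matrix)
print(recall)
```

On the real predictions, `sklearn.metrics.confusion_matrix(test_labels, preds)` and `sklearn.metrics.classification_report(test_labels, preds)` provide the same kind of breakdown.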

In [None]:
import random

def view_predictions(test_text, test_labels, preds):
  ind = random.randint(0, len(test_text)-10)
  for i in range(ind, ind+10):
    print("Text:")
    print(test_text[i])
    print(f"Emotion: {emotions[test_labels[i]]}, prediction: {emotions[preds[i]]}")
    print("\n")

In [None]:
view_predictions(test_text, test_labels, preds)

Text:
i make these kinds of cakes i feel more confident and every time the cakes looks better and more professional
Emotion: joy, prediction: joy


Text:
i feel stupid and my sense of self is very low
Emotion: sadness, prediction: sadness


Text:
i feel as if working at banana republic is allowing me to get a bit more outgoing and meet a lot of new people
Emotion: joy, prediction: joy


Text:
ive been wondering if im getting anywhere being the impatient soul i am but woke in the night with a deeper understanding of something which has sobered me up a lot yet at the same time it feels hopeful
Emotion: joy, prediction: joy


Text:
i was feeling pretty pleased too until i realized these problems i didnt have anything whatsoever for mad eye moody
Emotion: joy, prediction: joy


Text:
i kind of feel fearful of starting
Emotion: fear, prediction: fear


Text:
i usually joke around to deal with my health but the way i am feeling lately i have been very depressed because it seems like i get on

## 5. Using the model

In [None]:
def predict_on_sentence(model, sent):
  pred_probs = model.predict(tf.data.Dataset.from_tensor_slices([[sent]]))
  return emotions[np.argmax(pred_probs)]

In [None]:
predict_on_sentence(best_model, "i won a lottery!")

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 31ms/step


'joy'
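
Returning only the top label discards how confident the model is. A possible extension is to report the few most probable emotions together with their probabilities. The sketch below shows the ranking step on a made-up probability vector; `top_k_emotions` and `toy_probs` are hypothetical and do not come from the trained model.

```python
# Same mapping as in the notebook
emotions = {0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"}

def top_k_emotions(probabilities, k=3):
    """Return the k most probable (emotion, probability) pairs, best first."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return [(emotions[index], probability) for index, probability in ranked[:k]]

# Made-up softmax output for a single sentence, for illustration only
toy_probs = [0.05, 0.70, 0.15, 0.04, 0.03, 0.03]
top_three = top_k_emotions(toy_probs)
print(top_three)
```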