## Emotion Classifier

A neural-network classifier that assigns a piece of text to one of six basic emotions: anger, fear, joy, love, sadness, or surprise.

Dataset: 
https://github.com/dair-ai/emotion_dataset 

The data has already been largely preprocessed using the technique described in this paper: https://www.aclweb.org/anthology/D18-1404/

Data dictionary:

- text: string 
- emotion: class label (one of the six emotions listed above)

In [205]:
import warnings
warnings.filterwarnings('ignore')

import pandas as pd

from sklearn import preprocessing
from sklearn.model_selection import train_test_split

import tensorflow as tf
from tensorflow.data import Dataset
from tensorflow.keras import losses
from tensorflow.keras.layers import Dense, TextVectorization, Activation
from tensorflow.keras.models import Sequential

### Load and read the data

In [206]:
!wget https://www.dropbox.com/s/ikkqxfdbdec3fuj/test.txt
!wget https://www.dropbox.com/s/1pzkadrvffbqw6o/train.txt
!wget https://www.dropbox.com/s/2mzialpsgf9k5l3/val.txt

--2022-01-03 02:39:10--  https://www.dropbox.com/s/ikkqxfdbdec3fuj/test.txt
Resolving www.dropbox.com (www.dropbox.com)... 162.125.2.18, 2620:100:6017:18::a27d:212
Connecting to www.dropbox.com (www.dropbox.com)|162.125.2.18|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /s/raw/ikkqxfdbdec3fuj/test.txt [following]
--2022-01-03 02:39:10--  https://www.dropbox.com/s/raw/ikkqxfdbdec3fuj/test.txt
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uc78ffa455ac780cfe6e68249a20.dl.dropboxusercontent.com/cd/0/inline/BdAl33fj72_-CqnFVUxlmeT8ZZTMwqj-BY91oFZwOtIF0GBH2L2bJ80nVyhaQp7sAYPz9u5_r3ARg08UJoAuqpZM4E7l7LONnYkFS4Gg3pLDn07A_Z1LSbgwgxFP1LbXTyUPWjP0ln-auLe5VEDfpWRV/file# [following]
--2022-01-03 02:39:11--  https://uc78ffa455ac780cfe6e68249a20.dl.dropboxusercontent.com/cd/0/inline/BdAl33fj72_-CqnFVUxlmeT8ZZTMwqj-BY91oFZwOtIF0GBH2L2bJ80nVyhaQp7sAYPz9u5_r3ARg08UJoAuqpZM4E7l7LONnYkFS4

In [207]:
!mkdir -p emotion_data
!mv *.txt emotion_data


In [208]:
train_path = "emotion_data/train.txt"
test_path = "emotion_data/test.txt"
val_path = "emotion_data/val.txt"

In [209]:
data = pd.read_csv(train_path, sep=";", header=None, names=['text', 'emotion'],
                               engine="python")
data.emotion.unique()

array(['sadness', 'anger', 'love', 'surprise', 'fear', 'joy'],
      dtype=object)
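
To make the file layout concrete, here is a minimal sketch using only the standard library. It assumes each line holds one `text;emotion` record, as the `sep=";"` argument above implies. The two sample lines are taken from the training data, and `parse_records` is a hypothetical helper, not part of the notebook.

```python
import csv
import io

# Two sample records in the assumed "text;emotion" layout.
sample = "i didnt feel humiliated;sadness\ni am feeling grouchy;anger\n"

def parse_records(handle):
    """Yield (text, emotion) tuples from a semicolon-separated file object."""
    for text, emotion in csv.reader(handle, delimiter=";"):
        yield text, emotion

# io.StringIO stands in for an open file such as emotion_data/train.txt.
records = list(parse_records(io.StringIO(sample)))
print(records)
```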

In [210]:
data.head()

|   | text | emotion |
|---|------|---------|
| 0 | i didnt feel humiliated | sadness |
| 1 | i can go from feeling so hopeless to so damned... | sadness |
| 2 | im grabbing a minute to post i feel greedy wrong | anger |
| 3 | i am ever feeling nostalgic about the fireplac... | love |
| 4 | i am feeling grouchy | anger |


In [211]:
data.count()

text       16000
emotion    16000
dtype: int64

In [214]:
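# LabelEncoder maps each distinct emotion name to an integer code.
# Classes are sorted alphabetically, so 'anger' -> 0, ..., 'surprise' -> 5.
label_encoder = preprocessing.LabelEncoder()

# Replace the string labels in the 'emotion' column with their integer codes.
data['emotion'] = label_encoder.fit_transform(data['emotion'])

data['emotion'].unique()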

array([4, 0, 3, 5, 1, 2])
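
The integer codes above follow from sorting the distinct emotion names alphabetically. The sketch below reproduces that mapping in plain Python, without scikit-learn. The names `name_to_code` and `code_to_name` are illustrative and do not appear in the notebook.

```python
# Emotion names in the order data.emotion.unique() reported them.
emotions = ['sadness', 'anger', 'love', 'surprise', 'fear', 'joy']

# Sort the distinct names and number them, mirroring what LabelEncoder does.
classes = sorted(set(emotions))
name_to_code = {name: code for code, name in enumerate(classes)}
code_to_name = {code: name for name, code in name_to_code.items()}

codes = [name_to_code[name] for name in emotions]
print(codes)            # [4, 0, 3, 5, 1, 2]
print(code_to_name[4])  # sadness
```

With the fitted encoder itself, `label_encoder.inverse_transform` performs the same reverse lookup.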

In [215]:
text = data.text
labels = data.emotion
data.head()

|   | text | emotion |
|---|------|---------|
| 0 | i didnt feel humiliated | 4 |
| 1 | i can go from feeling so hopeless to so damned... | 4 |
| 2 | im grabbing a minute to post i feel greedy wrong | 0 |
| 3 | i am ever feeling nostalgic about the fireplac... | 3 |
| 4 | i am feeling grouchy | 0 |


### Data Split

In [216]:
SEED = 100

In [217]:
X = data['text']
labels = data['emotion']

# Create training and validation sets with an 80-20 split.
X_train, X_validation, y_train, y_validation = train_test_split(X, labels, test_size=0.2, random_state=SEED)

# Split the 20% validation portion 50-50 to obtain a holdout set for testing (80-10-10 overall).
X_validation, X_test, y_validation, y_test = train_test_split(X_validation, y_validation, test_size=0.5, random_state=SEED)

print(X_train.shape)
print(X_validation.shape)
print(y_train.shape)
print(y_validation.shape)
print(X_test.shape)
print(y_test.shape)

(12800,)
(1600,)
(12800,)
(1600,)
(1600,)
(1600,)
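
The shapes above can be checked by hand. This is a small arithmetic sketch, assuming that a fractional `test_size` is rounded up to a whole number of samples and the remainder goes to the training side. `split_sizes` is a hypothetical helper used only for illustration.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size (assumed rounded up)."""
    n_test = math.ceil(n_samples * test_size)
    return n_samples - n_test, n_test

# First split: 16000 rows, 20% held out.
n_train, n_holdout = split_sizes(16000, 0.2)
# Second split: the held-out rows are divided 50-50 into validation and test.
n_validation, n_test = split_sizes(n_holdout, 0.5)

print(n_train, n_validation, n_test)  # 12800 1600 1600
```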


### Prepare data for training

In [218]:
"""
Dataset.from_tensor_slices creates a tf.data.Dataset that iterates over the
rows of its input.
To train a model, you need (inputs, labels) pairs, so each dataset below is
built from a (text, emotion) tuple.
"""

AUTOTUNE = tf.data.AUTOTUNE
BATCH_SIZE = 32
BUFFER_SIZE = 2000

# train dataset
train_numeric_ds = Dataset.from_tensor_slices((X_train, y_train))

# Keras models consume batches of examples, here batches of (text, emotion) pairs.
# Shuffle individual examples before batching so that batch contents change between epochs.
# prefetch overlaps data preprocessing and model execution while training.
train_numeric_ds = train_numeric_ds.shuffle(BUFFER_SIZE).batch(BATCH_SIZE).prefetch(AUTOTUNE)

# val dataset (evaluation does not depend on example order, so no shuffling is needed)
val_numeric_ds = Dataset.from_tensor_slices((X_validation, y_validation))
val_numeric_ds = val_numeric_ds.batch(BATCH_SIZE).prefetch(AUTOTUNE)

# test dataset
test_numeric_ds = Dataset.from_tensor_slices((X_test, y_test))
test_numeric_ds = test_numeric_ds.batch(BATCH_SIZE).prefetch(AUTOTUNE)

print(train_numeric_ds.element_spec)

for text, emotion in train_numeric_ds.take(1):
    print("Sentence: ", text.numpy())
    print("Label:", emotion.numpy())

(TensorSpec(shape=(None,), dtype=tf.string, name=None), TensorSpec(shape=(None,), dtype=tf.int64, name=None))
Sentence:  [b'i feel like if you get something really cool you could easily turn it into a finished piece but that s kind of up to what you get out of the two hours'
 b'i know when i have had a crappy day and didn t feel productive i feel lousy and sleepy in the evening'
 b'i am not holding in my anger but i am holding it back so that i can still choose with a clearer mind and can feel it without executing someone for something petty'
 b'i feel like i liked it but at the same time i feel let down'
 b'i feel jealous with them why they can'
 b'i feel thats a valuable piece of consumer knowledge and one item of many ive added to my good to know stores'
 b'i didn t think that it would come that fast or would come at all but i suppose it is because i feel cranky today'
 b'i feel so honored to know all of you'
 b'i feel tortured every moment and theres nowhere i can go to get away fr
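
The order of `shuffle` and `batch` in a pipeline matters. The pure-Python sketch below illustrates the difference on a toy list: batching first only reorders whole batches, while shuffling first mixes individual examples across batches. It ignores the finite shuffle buffer that tf.data uses, and the `batch` helper is hypothetical.

```python
import random

def batch(items, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

examples = list(range(12))
rng = random.Random(100)

# Batch first, then shuffle: only the order of whole batches changes.
batches_then_shuffle = batch(examples, 4)
rng.shuffle(batches_then_shuffle)

# Shuffle first, then batch: individual examples are mixed across batches.
shuffled = examples[:]
rng.shuffle(shuffled)
shuffle_then_batches = batch(shuffled, 4)

print(batches_then_shuffle)
print(shuffle_then_batches)
```

In the first result every batch is still a run of four consecutive examples, so the model would see the same batch contents in every epoch.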

### Vectorize

In [219]:
VOCAB_SIZE = 1000

binary_vectorize_layer = TextVectorization(
    max_tokens=VOCAB_SIZE,
    output_mode='binary')

binary_vectorize_layer.adapt(train_numeric_ds.map(lambda text, labels: text))

In [220]:
def binary_vectorize_text(text, label):
  text = tf.expand_dims(text, -1)
  return binary_vectorize_layer(text), label

In [221]:
# Apply the TextVectorization layer created earlier to the training, validation, and test sets.

binary_train_ds = train_numeric_ds.map(binary_vectorize_text)
binary_val_ds = val_numeric_ds.map(binary_vectorize_text)
binary_test_ds = test_numeric_ds.map(binary_vectorize_text)
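
To show what a binary (multi-hot) encoding looks like, here is a simplified pure-Python sketch. It assumes lowercasing and whitespace tokenization only, and it reserves index 0 for out-of-vocabulary tokens. The real `TextVectorization` layer also strips punctuation and may differ in details. `build_vocab` and `multi_hot` are hypothetical helpers.

```python
from collections import Counter

def build_vocab(texts, max_tokens):
    """Keep the most frequent tokens, with index 0 reserved for unknown words."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    most_common = [tok for tok, _ in counts.most_common(max_tokens - 1)]
    return ["[UNK]"] + most_common

def multi_hot(text, vocab):
    """Return a 0/1 vector marking which vocabulary entries occur in text."""
    index = {tok: i for i, tok in enumerate(vocab)}
    vec = [0] * len(vocab)
    for tok in text.lower().split():
        vec[index.get(tok, 0)] = 1  # unknown tokens fall back to index 0
    return vec

corpus = ["i feel sad", "i feel happy", "i am happy"]
vocab = build_vocab(corpus, max_tokens=4)
print(vocab)                               # ['[UNK]', 'i', 'feel', 'happy']
print(multi_hot("i feel grouchy", vocab))  # [1, 1, 1, 0]
```

Each sentence becomes a fixed-length vector regardless of its word count, which is why a single `Dense` layer can consume it directly.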

### Modelling

In [222]:
binary_model = Sequential([Dense(6)])  # one output unit (logit) per emotion label

binary_model.compile(
    loss=losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer='adam',
    metrics=['accuracy'])

history = binary_model.fit(
    binary_train_ds, validation_data=binary_val_ds, epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [223]:
print("Linear model on binary vectorized data:")
print(binary_model.summary())

Linear model on binary vectorized data:
Model: "sequential_18"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_25 (Dense)            (None, 6)                 6006      
                                                                 
Total params: 6,006
Trainable params: 6,006
Non-trainable params: 0
_________________________________________________________________
None
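
The parameter count in the summary can be verified by hand: a dense layer has one weight per (input, unit) pair plus one bias per unit. The helper below is a hypothetical illustration, with 1000 inputs coming from `VOCAB_SIZE`.

```python
def dense_param_count(n_inputs, n_units, use_bias=True):
    """Number of trainable parameters in a fully connected layer."""
    return n_inputs * n_units + (n_units if use_bias else 0)

# 1000 vocabulary features feeding 6 output units.
print(dense_param_count(1000, 6))  # 6006
```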


In [224]:
binary_loss, binary_accuracy = binary_model.evaluate(binary_test_ds)

print("Binary model accuracy: {:2.2%}".format(binary_accuracy))

Binary model accuracy: 78.75%


### Model Export

In [225]:
"""
You applied tf.keras.layers.TextVectorization to the dataset before feeding text to the model. 

To make the model capable of processing raw strings (for example, to simplify deploying it), you include the TextVectorization layer inside the model.
Create a new model using the weights you have just trained:
"""

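# softmax turns the six logits into class probabilities that sum to 1,
# which is what SparseCategoricalCrossentropy(from_logits=False) expects.
export_model = Sequential(
    [binary_vectorize_layer, binary_model,
     Activation('softmax')])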

export_model.compile(
    loss=losses.SparseCategoricalCrossentropy(from_logits=False),
    optimizer='adam',
    metrics=['accuracy'])

# Test it with `test_numeric_ds`, which yields raw strings
loss, accuracy = export_model.evaluate(test_numeric_ds)
print("Accuracy: {:2.2%}".format(accuracy))

Accuracy: 78.75%
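
The exported model reports the same accuracy as the logit-only model because a monotonic activation does not change which class scores highest. The sketch below checks this on a hypothetical logit vector in plain Python. All function names are illustrative.

```python
import math

def softmax(logits):
    """Convert logits to probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    """Squash each logit independently into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

# Hypothetical logits for the six emotion classes.
logits = [0.3, -1.2, 2.5, 0.0, 1.1, -0.4]

print(argmax(logits), argmax(softmax(logits)), argmax(sigmoid(logits)))  # 2 2 2
```

Only softmax yields scores that sum to 1 across the classes, so it is the usual choice when the outputs are read as class probabilities.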


In [226]:
"""
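A function that returns, for each example in a batch, the integer class label with the maximum score.
The integer labels are mapped to emotion names later via the EMOTIONS dictionary.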
"""
class_values = tf.constant([0, 1, 2, 3, 4, 5])

def get_string_labels(predicted_scores_batch):
  predicted_int_labels = tf.argmax(predicted_scores_batch, axis=1)
  predicted_labels = tf.gather(class_values, predicted_int_labels)
  return predicted_labels

### Run inference on new data

In [227]:
EMOTIONS = {
    0: 'anger',
    1: 'fear',
    2: 'joy',
    3: 'love',
    4: 'sadness',
    5: 'surprise' 
}

In [228]:
"""
Now, the model can take raw strings as input and predict a score for each label using Model.predict. 
"""

inputs = [
    "i can't escape the tears after the loss of my pet", 
    "i am ever feeling nostalgic about the house"
]
predicted_scores = export_model.predict(inputs)
predicted_labels = get_string_labels(predicted_scores)
for sentence, label in zip(inputs, predicted_labels):  # avoid shadowing the built-in input()
  print("User text: ", sentence)
  print("Predicted label: ", EMOTIONS[label.numpy()])

User text:  i can't escape the tears after the loss of my pet
Predicted label:  sadness
User text:  i am ever feeling nostalgic about the house
Predicted label:  love
