# 🤖 Building an Emotion Classifier with GRUs

### 📘 Welcome to Your 2-Hour Practical Guide!

In this session, we'll apply our knowledge of Gated Recurrent Units (GRUs) to a real-world problem: **classifying emotions in tweets**. We will build a model that can read a tweet and predict whether the emotion is joy, sadness, anger, fear, love, or surprise.

This is a multi-class text classification task, and it's a perfect job for a GRU because understanding emotion requires understanding the sequence and context of words.

--- 

### 🎯 Learning Objectives for Today:

By the end of this 2-hour session, you will be able to:
1.  **Load and prepare** a real-world text dataset using Pandas.
2.  **Preprocess text data** for a neural network using a Tokenizer.
3.  **Convert text labels** into a format the model can understand (one-hot encoding).
4.  **Build a GRU-based model** for multi-class classification using TensorFlow/Keras.
5.  **Train your model** on the emotions dataset.
6.  **Evaluate its performance** and use it to predict emotions on new sentences.

## Topic 1: Setup and Importing Libraries 🛠️

First things first, let's import all the tools we'll need for our project. We'll be using:
- **TensorFlow & Keras:** For building and training our GRU model.
- **Pandas:** For loading and manipulating our data.
- **NumPy:** For numerical operations.
- **Scikit-learn:** For processing our labels.
- **Hugging Face Datasets:** For downloading the emotion dataset (imported in Topic 2).

In [17]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, Dense, Dropout
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

print("✅ Libraries imported successfully!")

✅ Libraries imported successfully!


## Topic 2: Loading and Exploring the Dataset 📂

We will use a popular public dataset of tweets, each labeled with one of six emotions. We load it from the Hugging Face Hub with the `datasets` library (install it with `pip install datasets` if needed) and convert the splits to pandas DataFrames. This makes our notebook easy to run anywhere!

The labels are stored as integers from 0 to 5 in a column named `label`, which we rename to `emotion`.

In [2]:
from datasets import load_dataset
import pandas as pd  # Optional: for DataFrame inspection

# Load the full dataset (includes train, validation, test splits automatically)
dataset = load_dataset("dair-ai/emotion")

# Convert the train and test splits to DataFrames
# Note: the label column is 'label' (integers 0-5), not 'emotion'
df_train = pd.DataFrame(dataset['train'])
df_test = pd.DataFrame(dataset['test'])

# Rename 'label' to 'emotion' so the column name says what it holds
df_train = df_train.rename(columns={'label': 'emotion'})
df_test = df_test.rename(columns={'label': 'emotion'})

# Quick inspection
print(df_train.head())
print(df_train['emotion'].value_counts())
print(f"Train shape: {df_train.shape}, Test shape: {df_test.shape}")

                                                text  emotion
0                            i didnt feel humiliated        0
1  i can go from feeling so hopeless to so damned...        0
2   im grabbing a minute to post i feel greedy wrong        3
3  i am ever feeling nostalgic about the fireplac...        2
4                               i am feeling grouchy        3
emotion
1    5362
0    4666
3    2159
4    1937
2    1304
5     572
Name: count, dtype: int64
Train shape: (16000, 2), Test shape: (2000, 2)


Let's check the distribution of emotions in our training data to see what we're working with.

In [3]:
print("Emotion Distribution:")
print(df_train['emotion'].value_counts())

Emotion Distribution:
emotion
1    5362
0    4666
3    2159
4    1937
2    1304
5     572
Name: count, dtype: int64


In [4]:
# Emotion label mapping for the dair-ai/emotion dataset
id2label = {
    0: "sadness",
    1: "joy",
    2: "love",
    3: "anger",
    4: "fear",
    5: "surprise"
}

label2id = {v: k for k, v in id2label.items()}

# Display the mapping
print("Emotion Codes:")
for code, emotion in id2label.items():
    print(f"  {code} → {emotion}")

print("\nReverse mapping (label2id):", label2id)

Emotion Codes:
  0 → sadness
  1 → joy
  2 → love
  3 → anger
  4 → fear
  5 → surprise

Reverse mapping (label2id): {'sadness': 0, 'joy': 1, 'love': 2, 'anger': 3, 'fear': 4, 'surprise': 5}


### 🎯 Practice Task

Write one line of code to get the first tweet in the training data that has the emotion 'fear'.

**Hint:** The `emotion` column holds integer codes, so filter on the code for fear: `df_train[df_train['emotion'] == label2id['fear']]`.
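The filtering pattern can be tried on a tiny stand-in first. This is a minimal sketch: `toy_df` and its three rows are made up for illustration, and the code `4` for fear is taken from the mapping printed earlier in this topic.

```python
import pandas as pd

# Hypothetical miniature of df_train: text plus integer emotion codes
toy_df = pd.DataFrame({
    "text": ["i feel great today", "i am scared of the dark", "i feel terrified"],
    "emotion": [1, 4, 4],
})

fear_code = 4  # id2label[4] == "fear"

# The boolean mask keeps only the fear rows; .iloc[0] takes the first match by position
first_fear = toy_df[toy_df["emotion"] == fear_code]["text"].iloc[0]
print(first_fear)
```

Using `.iloc[0]` rather than `[0]` matters here: after filtering, the surviving rows keep their original index labels, so label `0` may no longer exist.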

In [5]:
# Your code here to find and print the first 'fear' tweet

## Topic 3: Text Preprocessing & Tokenization 🧹

A neural network can't understand words directly. We need to convert our text into numbers. This process involves two main steps:

1.  **Tokenization:** We'll create a vocabulary of all the unique words in our training data. Then, we'll assign a unique integer to each word. For example, `{'the': 1, 'love': 2, 'i': 3, ...}`.
2.  **Sequencing:** We'll convert each tweet into a sequence of these integers.
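To make the two steps concrete, here is a plain-Python sketch of the idea on a made-up three-sentence corpus. It is a simplification, not the Keras `Tokenizer` itself (which also strips punctuation, among other things); the toy corpus and the helper `to_sequence` are assumptions for illustration.

```python
from collections import Counter

# Toy corpus (made up for illustration)
texts = ["i feel happy", "i feel sad", "i love you"]

# 1. Tokenization: count words, then give the most frequent words the smallest ids.
#    Id 0 is left free for padding and id 1 is reserved for out-of-vocabulary words.
counts = Counter(word for text in texts for word in text.lower().split())
word_index = {"<OOV>": 1}
for rank, (word, _) in enumerate(counts.most_common(), start=2):
    word_index[word] = rank

# 2. Sequencing: replace each word with its id, falling back to the OOV id
def to_sequence(text):
    return [word_index.get(word, 1) for word in text.lower().split()]

print(to_sequence("i feel happy"))   # [2, 3, 4]
print(to_sequence("i feel angry"))   # [2, 3, 1]  ('angry' was never seen, so it maps to <OOV>)
```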

In [6]:
# Set parameters for tokenization
vocab_size = 10000  # We will only consider the top 10,000 most frequent words
oov_token = "<OOV>" # A special token for words not in our vocabulary

# Initialize the Keras Tokenizer
tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_token)

# Build the vocabulary based on the training text
tokenizer.fit_on_texts(df_train['text'])

# --- Convert text to sequences of integers ---
X_train_sequences = tokenizer.texts_to_sequences(df_train['text'])
X_test_sequences = tokenizer.texts_to_sequences(df_test['text'])

print("Original Tweet:")
print(df_train['text'][0])
print("\nTweet after Tokenization (converted to numbers):")
print(X_train_sequences[0])

Original Tweet:
i didnt feel humiliated

Tweet after Tokenization (converted to numbers):
[2, 139, 3, 679]


### 🎯 Practice Task

The tokenizer has a `word_index` attribute that holds the vocabulary map. Print the first 10 items of `tokenizer.word_index` to see what it looks like.

**Hint:** You can't slice a dictionary directly, but you can loop through its items.

In [7]:
# Your code here to see the first 10 words in the vocabulary
word_index = tokenizer.word_index
print(list(word_index.items())[:10])

[('<OOV>', 1), ('i', 2), ('feel', 3), ('and', 4), ('to', 5), ('the', 6), ('a', 7), ('feeling', 8), ('that', 9), ('of', 10)]


## Topic 4: Padding Sequences & Preparing Labels 📏

Our sequences of numbers have different lengths because tweets have different lengths. Keras, however, needs every sequence in a batch to have the same length so that the batch can be stored as one rectangular array.

### Padding
We will use **padding** to make all sequences the same length. We'll add zeros to the end of shorter sequences until every sequence reaches a fixed `maxlen` (100 here); sequences longer than that are cut off at the end.
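A small NumPy sketch of what end-padding and end-truncation do. The helper `pad_post` and the toy sequences are hypothetical; in the notebook itself this job is done by Keras's `pad_sequences`.

```python
import numpy as np

def pad_post(sequences, maxlen):
    """Right-pad with zeros and truncate from the end."""
    padded = np.zeros((len(sequences), maxlen), dtype=np.int32)
    for row, seq in enumerate(sequences):
        kept = seq[:maxlen]              # drop tokens beyond maxlen
        padded[row, :len(kept)] = kept   # remaining positions stay 0
    return padded

toy_sequences = [[2, 139, 3, 679], [2, 3], [5, 6, 7, 8, 9, 10, 11]]
toy_padded = pad_post(toy_sequences, maxlen=5)
print(toy_padded)
# [[  2 139   3 679   0]
#  [  2   3   0   0   0]
#  [  5   6   7   8   9]]
```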

### Preparing Labels
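We also need to put our emotion labels into a format the model can train on. In this dataset the labels are already integers from 0 to 5 (see the mapping in Topic 2), so this takes two steps:
1.  **Label Encoding:** `LabelEncoder` maps each distinct label to an integer index. Because our labels are already 0 to 5, it leaves them unchanged, but the step would be required if the labels were strings such as 'joy' or 'sadness'.
2.  **One-Hot Encoding:** Convert each integer into a binary vector. This is the standard format for multi-class classification with the `categorical_crossentropy` loss.
    - `0` (sadness) becomes `[1, 0, 0, 0, 0, 0]`
    - `1` (joy) becomes `[0, 1, 0, 0, 0, 0]`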
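One-hot encoding can be sketched in a couple of lines of NumPy. The three example codes are arbitrary picks, named according to the dataset's mapping from Topic 2 (0 = sadness, 1 = joy, 5 = surprise).

```python
import numpy as np

num_classes = 6
codes = np.array([0, 1, 5])  # sadness, joy, surprise in this dataset's coding

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype=np.float32)[codes]
print(one_hot)

# argmax undoes the encoding and recovers the integer codes
recovered = one_hot.argmax(axis=1)
print(recovered)
```

The `argmax` round trip is the same operation used later to turn the model's six output probabilities back into a single predicted class.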

In [8]:
# --- Padding the Sequences ---
max_length = 100 # Maximum length of a sequence
padding_type = 'post' # Add padding at the end
trunc_type = 'post' # Truncate from the end if longer than max_length

X_train_padded = pad_sequences(X_train_sequences, maxlen=max_length, padding=padding_type, truncating=trunc_type)
X_test_padded = pad_sequences(X_test_sequences, maxlen=max_length, padding=padding_type, truncating=trunc_type)

print("--- Padded Sequences ---")
print(f"Shape of training data: {X_train_padded.shape}")
print(f"Example padded sequence:\n{X_train_padded[0]}")

# --- Preparing the Labels ---
le = LabelEncoder()
y_train_encoded = le.fit_transform(df_train['emotion'])
y_test_encoded = le.transform(df_test['emotion'])

# One-Hot Encode
y_train_onehot = tf.keras.utils.to_categorical(y_train_encoded, num_classes=6)
y_test_onehot = tf.keras.utils.to_categorical(y_test_encoded, num_classes=6)

print("\n--- Prepared Labels ---")
print(f"Original label: {df_train['emotion'][0]}")
print(f"Encoded label: {y_train_encoded[0]}")
print(f"One-hot encoded label:\n{y_train_onehot[0]}")

# For later use, store the class codes in index order (integers here; id2label maps them to names)
emotion_labels = le.classes_
print(f"\nEmotion classes: {emotion_labels}")

--- Padded Sequences ---
Shape of training data: (16000, 100)
Example padded sequence:
[  2 139   3 679   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0]

--- Prepared Labels ---
Original label: 0
Encoded label: 0
One-hot encoded label:
[1. 0. 0. 0. 0. 0.]

Emotion classes: [0 1 2 3 4 5]


## Topic 5: Building the GRU Model 🧠

Time to build our GRU model! It will have three main parts:

1.  **Embedding Layer:** This layer learns a dense vector representation for each word in our vocabulary. During training these vectors come to capture meaning, so words like 'happy' and 'joyful' tend to end up with similar vectors.
2.  **GRU Layers:** This is the core of our model. Two stacked GRU layers process the sequence of word vectors and learn patterns related to different emotions. The first layer uses `return_sequences=True` so that the second receives a full sequence rather than a single vector.
3.  **Dense Output Layer:** This layer takes the output from the last GRU and makes the final prediction. Because we have 6 emotions, it has 6 neurons and uses the `softmax` activation function to output a probability for each emotion.
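To show what "processing the sequence of word vectors" means, here is a NumPy sketch of a single GRU time step applied across a short sequence. It follows the textbook gate equations; Keras's default GRU arranges the reset gate and biases slightly differently, so treat this as an illustration rather than a reimplementation. The tiny sizes and the random weights and inputs are assumptions chosen only for the demo.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, params):
    """One GRU time step: x is the current word vector, h the previous hidden state."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x + Uz @ h + bz)             # update gate: how much old state to keep
    r = sigmoid(Wr @ x + Ur @ h + br)             # reset gate: how much old state feeds the candidate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h) + bh)  # candidate state
    return z * h + (1.0 - z) * h_cand             # blend old state and candidate

rng = np.random.default_rng(0)
embedding_dim, units, timesteps = 4, 3, 5  # tiny sizes, chosen only for the demo

params = []
for _ in range(3):  # one (W, U, b) triple per gate: update, reset, candidate
    params += [rng.normal(size=(units, embedding_dim)),
               rng.normal(size=(units, units)),
               np.zeros(units)]

h = np.zeros(units)
for x in rng.normal(size=(timesteps, embedding_dim)):  # random stand-ins for word vectors
    h = gru_step(x, h, params)

print(h.shape)  # (3,)
```

The hidden state `h` is carried from word to word, which is how earlier words can influence the final vector that the `Dense` layer classifies.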

In [13]:
# Model parameters
embedding_dim = 64
gru_units = 128

model = Sequential([
    # 1. Embedding Layer
    Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_length=max_length),
    
    # 2. GRU Layer
    GRU(units=gru_units, return_sequences=True),
    GRU(units=256, return_sequences=False),
    
    # We can add a Dropout layer to prevent overfitting
    Dropout(0.2),
    
    # 3. Dense Output Layer
    Dense(6, activation='softmax') # 6 units for 6 emotions, softmax for multi-class probability
])

# Compile the model
model.compile(
    loss='categorical_crossentropy', # Use this loss for multi-class, one-hot encoded labels
    optimizer='adam',
    metrics=['accuracy']
)

model.summary()

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_4 (Embedding)     (None, 100, 64)           640000    
                                                                 
 gru_7 (GRU)                 (None, 100, 128)          74496     
                                                                 
 gru_8 (GRU)                 (None, 256)               296448    
                                                                 
 dropout_4 (Dropout)         (None, 256)               0         
                                                                 
 dense_4 (Dense)             (None, 6)                 1542      
                                                                 
Total params: 1,012,486
Trainable params: 1,012,486
Non-trainable params: 0
_________________________________________________________________
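
The parameter counts in the summary above can be reproduced by hand. The sketch below assumes the Keras default GRU layout, in which each of the three gates has an input kernel, a recurrent kernel and two bias vectors; the helper `gru_params` is written here for illustration.

```python
def gru_params(input_dim, units):
    # 3 gates, each with an input kernel, a recurrent kernel and two bias vectors
    return 3 * (units * (input_dim + units) + 2 * units)

embedding = 10000 * 64         # vocab_size * embedding_dim
gru_1 = gru_params(64, 128)    # reads 64-dim word vectors
gru_2 = gru_params(128, 256)   # reads the 128-dim outputs of the first GRU
dense = 256 * 6 + 6            # weights + biases

total = embedding + gru_1 + gru_2 + dense
print(embedding, gru_1, gru_2, dense, total)
# 640000 74496 296448 1542 1012486
```

Well over half of the parameters sit in the embedding table, which is typical for small text models with a large vocabulary.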


### 🎯 Practice Task

Explain in one sentence why the final `Dense` layer has `6` units and uses the `'softmax'` activation function.

## Topic 6: Training the Model 🚂

Now we feed our prepared data into the model and let it learn! We will train it for a few epochs. An **epoch** is one complete pass through the entire training dataset.

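We also provide our test data as `validation_data`. This lets us monitor how the model performs on data it is not trained on after each epoch, which helps us spot overfitting. Strictly speaking, the validation split that ships with the dataset is the better choice for this, because it keeps the test set untouched until the final evaluation; this guide reuses the test set for both.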

In [14]:
num_epochs = 10

print("🚀 Starting model training...")

history = model.fit(
    X_train_padded, 
    y_train_onehot, 
    epochs=num_epochs, 
    validation_data=(X_test_padded, y_test_onehot),
    verbose=2 # Show one line per epoch
)

print("\n✅ Training complete!")

🚀 Starting model training...
Epoch 1/10
500/500 - 100s - loss: 1.5857 - accuracy: 0.3271 - val_loss: 1.5618 - val_accuracy: 0.3475 - 100s/epoch - 200ms/step
Epoch 2/10
500/500 - 98s - loss: 1.5808 - accuracy: 0.3311 - val_loss: 1.5646 - val_accuracy: 0.3475 - 98s/epoch - 197ms/step
Epoch 3/10
500/500 - 109s - loss: 1.5795 - accuracy: 0.3298 - val_loss: 1.5635 - val_accuracy: 0.2905 - 109s/epoch - 219ms/step
Epoch 4/10
500/500 - 107s - loss: 1.5786 - accuracy: 0.3325 - val_loss: 1.5616 - val_accuracy: 0.3475 - 107s/epoch - 213ms/step
Epoch 5/10
500/500 - 117s - loss: 1.5783 - accuracy: 0.3322 - val_loss: 1.5603 - val_accuracy: 0.3475 - 117s/epoch - 234ms/step
Epoch 6/10
500/500 - 106s - loss: 1.5784 - accuracy: 0.3327 - val_loss: 1.5596 - val_accuracy: 0.3475 - 106s/epoch - 213ms/step
Epoch 7/10
500/500 - 105s - loss: 1.5772 - accuracy: 0.3351 - val_loss: 1.5596 - val_accuracy: 0.3475 - 105s/epoch - 211ms/step
Epoch 8/10
500/500 - 126s - loss: 1.5776 - accuracy: 0.3344 - val_loss: 1.
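
The training accuracy in this log barely moves from about 0.33. A useful reference point is the score of a "model" that always predicts the most frequent class. A quick check, using the training class counts printed in Topic 2:

```python
# Training-set class counts from the value_counts() output in Topic 2
train_counts = {1: 5362, 0: 4666, 3: 2159, 4: 1937, 2: 1304, 5: 572}

total = sum(train_counts.values())
majority_class = max(train_counts, key=train_counts.get)
baseline_accuracy = train_counts[majority_class] / total

print(f"Majority class: {majority_class}, baseline accuracy: {baseline_accuracy:.4f}")
# Majority class: 1, baseline accuracy: 0.3351
```

A training accuracy of roughly 0.33 is therefore about what always answering class 1 (joy) would give, which suggests the network has not yet learned anything useful from the text.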

### 🎯 Practice Task

Training a model can take time. If you want to train for longer, what single number would you change in the code cell above? What do you think would happen to the training and validation accuracy if you set it to `20`?

## Topic 7: Evaluating the Model & Making Predictions 📊

After training, let's see how well our model performs on the test set. The model was never trained on these tweets, though they did serve as validation data above.

In [13]:
# Evaluate the model on the test data
loss, accuracy = model.evaluate(X_test_padded, y_test_onehot)

print(f"\nTest Accuracy: {accuracy * 100:.2f}%")


Test Accuracy: 34.75%
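
An accuracy this low for six classes, and identical to the validation accuracy seen in most epochs, is consistent with the model predicting one emotion for every tweet. One plausible cause, offered as a hypothesis rather than a diagnosis: the tweets are padded at the end to 100 tokens, so the GRU reads a long run of zeros after the real words before it produces its final state. Keras's `Embedding` layer accepts `mask_zero=True`, which tells downstream recurrent layers to skip positions whose token id is 0. Below is a minimal sketch of that variant; the name `masked_model` and the small GRU size are illustrative choices, and it has not been trained here, so no accuracy is claimed.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, Dense

vocab_size, embedding_dim, max_length = 10000, 64, 100  # same values as in this guide

masked_model = Sequential([
    # mask_zero=True marks token id 0 as padding so the GRU ignores those time steps
    Embedding(input_dim=vocab_size, output_dim=embedding_dim, mask_zero=True),
    GRU(32),  # illustrative size
    Dense(6, activation="softmax"),
])
masked_model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# Two fake padded tweets: a few real token ids followed by zeros
batch = np.zeros((2, max_length), dtype="int32")
batch[0, :4] = [2, 139, 3, 679]
batch[1, :2] = [2, 3]

probs = masked_model.predict(batch, verbose=0)
print(probs.shape)  # (2, 6)
```

Another option along the same lines is `padding='pre'` in `pad_sequences`, which places the zeros before the words so that the real tokens are the last thing the GRU reads.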


Now for the most exciting part: let's write our own sentences and see what emotion our model predicts!

In [None]:
# Function to predict emotion of a custom sentence
def predict_emotion(sentence):
    # 1. Convert to sequence
    sequence = tokenizer.texts_to_sequences([sentence])
    # 2. Pad the sequence
    padded = pad_sequences(sequence, maxlen=max_length, padding=padding_type, truncating=trunc_type)
    # 3. Make a prediction
    prediction = model.predict(padded)
    # 4. Get the class code with the highest probability and map it to its emotion name
    predicted_class_index = np.argmax(prediction)
    return id2label[int(emotion_labels[predicted_class_index])]

# --- Let's test it! ---
my_sentence_1 = "I am so happy and excited about the trip tomorrow"
my_sentence_2 = "I feel so alone and lost right now"
my_sentence_3 = "that is an awful thing to say"

print(f"Sentence: '{my_sentence_1}' -> Predicted Emotion: {predict_emotion(my_sentence_1)}")
print(f"Sentence: '{my_sentence_2}' -> Predicted Emotion: {predict_emotion(my_sentence_2)}")
print(f"Sentence: '{my_sentence_3}' -> Predicted Emotion: {predict_emotion(my_sentence_3)}")
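The last step of `predict_emotion`, turning six probabilities into one emotion, can be checked in isolation. The probability vector below is made up for illustration; `id2label` is the mapping from Topic 2.

```python
import numpy as np

id2label = {0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"}

# Made-up softmax output for one sentence, shape (1, 6) like model.predict returns
prediction = np.array([[0.05, 0.70, 0.10, 0.05, 0.05, 0.05]])

# Index of the largest probability, then its human-readable name
class_index = int(np.argmax(prediction, axis=1)[0])
print(class_index, id2label[class_index], f"{prediction[0, class_index]:.2f}")
# 1 joy 0.70
```

Printing the winning probability alongside the label is a cheap way to see how confident the model is, which helps when testing tricky sentences.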

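For comparison, the next cells rebuild the same architecture with LSTM layers in place of the GRU layers and train it the same way.

In [15]: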
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense

# Model parameters
embedding_dim = 64
lstm_units_1 = 128
lstm_units_2 = 256

model = Sequential([
    # 1. Embedding Layer
    Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_length=max_length),
    
    # 2. First LSTM Layer (returns sequences for the next LSTM)
    LSTM(units=lstm_units_1, return_sequences=True),
    
    # 3. Second LSTM Layer (last LSTM, no need to return sequences)
    LSTM(units=lstm_units_2, return_sequences=False),
    
    # Optional Dropout to prevent overfitting
    Dropout(0.2),
    
    # 4. Dense Output Layer
    Dense(6, activation='softmax')  # 6 classes for emotions
])

# Compile the model
model.compile(
    loss='categorical_crossentropy',  # multi-class classification
    optimizer='adam',
    metrics=['accuracy']
)

model.summary()


Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_5 (Embedding)     (None, 100, 64)           640000    
                                                                 
 lstm (LSTM)                 (None, 100, 128)          98816     
                                                                 
 lstm_1 (LSTM)               (None, 256)               394240    
                                                                 
 dropout_5 (Dropout)         (None, 256)               0         
                                                                 
 dense_5 (Dense)             (None, 6)                 1542      
                                                                 
Total params: 1,134,598
Trainable params: 1,134,598
Non-trainable params: 0
_________________________________________________________________


In [None]:
num_epochs = 10

print("🚀 Starting model training...")

history = model.fit(
    X_train_padded, 
    y_train_onehot, 
    epochs=num_epochs, 
    validation_data=(X_test_padded, y_test_onehot),
    verbose=2 # Show one line per epoch
)

print("\n✅ Training complete!")

🚀 Starting model training...
Epoch 1/10
500/500 - 8040s - loss: 1.5843 - accuracy: 0.3295 - val_loss: 1.5682 - val_accuracy: 0.3475 - 8040s/epoch - 16s/step
Epoch 2/10
500/500 - 103s - loss: 1.5799 - accuracy: 0.3307 - val_loss: 1.5610 - val_accuracy: 0.3475 - 103s/epoch - 206ms/step
Epoch 3/10
500/500 - 106s - loss: 1.5787 - accuracy: 0.3332 - val_loss: 1.5628 - val_accuracy: 0.3475 - 106s/epoch - 213ms/step
Epoch 4/10
500/500 - 230s - loss: 1.5781 - accuracy: 0.3316 - val_loss: 1.5594 - val_accuracy: 0.3475 - 230s/epoch - 461ms/step
Epoch 5/10
500/500 - 109s - loss: 1.5780 - accuracy: 0.3314 - val_loss: 1.5585 - val_accuracy: 0.3475 - 109s/epoch - 219ms/step
Epoch 6/10
500/500 - 118s - loss: 1.5775 - accuracy: 0.3319 - val_loss: 1.5690 - val_accuracy: 0.2905 - 118s/epoch - 236ms/step
Epoch 7/10
500/500 - 123s - loss: 1.5779 - accuracy: 0.3300 - val_loss: 1.5604 - val_accuracy: 0.3475 - 123s/epoch - 246ms/step
Epoch 8/10
500/500 - 126s - loss: 1.5776 - accuracy: 0.3328 - val_loss: 

## 🎓 Final Revision Assignment

Great job today! Here are a few tasks to help you solidify your understanding of building GRU models for text classification.

---

**1. Conceptual Question:** Why is a GRU (or any RNN) a better choice for this emotion classification task than a simple feed-forward neural network that doesn't consider word order?

**2. Short Answer:** What is the purpose of the `Embedding` layer? What does it do to the integer-encoded sequences?

**3. Coding Task 1: Hyperparameter Tuning**

Go back to the model-building cell (Topic 5) and change some hyperparameters. Try one of the following:
- Change the `embedding_dim` to `128`.
- Change the `gru_units` to `64`.

Re-run the training and evaluation cells. Does the test accuracy improve or get worse? Document your findings.

**4. Coding Task 2: Predict Your Own Emotions**

In the final prediction cell (Topic 7), add three of your own sentences to test the model. Try to make them tricky! Do the model's predictions match the emotions you were trying to convey?

**5. Coding Task 3: Build a Deeper Model**

The model in Topic 5 already stacks two GRU layers. Try a variant with a smaller second layer and dropout between the layers. Remember that the first GRU layer needs `return_sequences=True` so it passes its full output sequence to the next layer.

```python
# Example of a stacked GRU architecture
model = Sequential([
    Embedding(...),
    GRU(units=128, return_sequences=True), # First layer
    Dropout(0.2),
    GRU(units=64), # Second layer
    Dropout(0.5),
    Dense(6, activation='softmax')
])
```
Does this deeper model perform better?

## 🎉 You've built your first end-to-end NLP model! Congratulations!