# Emotion Classification by Karan Deepak Kapadia

This notebook implements an emotion classification system using a deep learning model built **entirely from scratch**.  
The dataset (`Emotion-Dataset.csv`) consists of **text samples labeled with emotions**.

### **Key Features**
- Text preprocessing (cleaning, tokenization, encoding)  
- Custom deep learning model (**LSTM-based classifier**)  
- Training with optimized hyperparameters  
- Performance evaluation and visualization  

Each **code cell** is preceded by a **text explanation**, as required by the submission guidelines.


## 1. Base Model

### Setup & Imports

Import the necessary libraries, set the random seeds, and select the GPU if one is available. Seeding makes runs repeatable (some GPU operations can still be nondeterministic), and running on the GPU speeds up training.

In [9]:
import os
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split
from collections import Counter

# Random seeds for reproducibility.
seed = 420
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)

# Device setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Using device: cuda
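
The seeding step can also be extended to the `DataLoader` shuffle order. The sketch below is one possible arrangement and is not part of the original notebook: `set_seed` and `shuffled_order` are hypothetical helper names, and the cuDNN flags are an assumption that trades some speed for repeatability.

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def set_seed(seed: int) -> None:
    """Seed Python, NumPy, and PyTorch (hypothetical helper)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # Ask cuDNN for deterministic kernels; this can slow training down.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


def shuffled_order(seed: int, n: int = 10) -> list:
    """Return the order in which a shuffling DataLoader visits n items."""
    generator = torch.Generator()
    generator.manual_seed(seed)
    dataset = TensorDataset(torch.arange(n))
    loader = DataLoader(dataset, batch_size=n, shuffle=True, generator=generator)
    (batch,) = next(iter(loader))
    return batch.tolist()


set_seed(420)
# The same generator seed reproduces the same shuffle order.
print(shuffled_order(420) == shuffled_order(420))
```

Passing a seeded `generator` to the training `DataLoader` would make the batch order independent of any other random calls made earlier in the notebook.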


### Load and Inspect Dataset

The CSV file (`Emotion-Dataset.csv`) is loaded, keeping only relevant columns (`Text` and `Emotion`).  
This step also checks for missing values and dataset shape.


In [10]:
# Load the CSV 
df = pd.read_csv("Emotion-Dataset.csv")

# Display the first few rows and column names.
print(df.head())
print("Column names:", df.columns)

# Keep only the essential columns.
df = df[['Text', 'Emotion']]

# Remove rows with missing values.
df.dropna(inplace=True)
print("Missing values:\n", df.isnull().sum())
print(f"Dataset shape: {df.shape}")


                                                Text   Emotion  Unnamed: 2  \
0  i could get depressed about feeling isolated b...   Sadness         NaN   
1  i am so thankful that though things are a bit ...      Fear         NaN   
2  i remember one day years ago when the kids wer...     Anger         NaN   
3  i feel so funny he have no topic to chat with ...  Surprise         NaN   
4  i didnt take it personally but i could feel so...      Fear         NaN   

   Unnamed: 3  Unnamed: 4  Unnamed: 5  Unnamed: 6  
0         NaN         NaN         NaN         NaN  
1         NaN         NaN         NaN         NaN  
2         NaN         NaN         NaN         NaN  
3         NaN         NaN         NaN         NaN  
4         NaN         NaN         NaN         NaN  
Column names: Index(['Text', 'Emotion', 'Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4',
       'Unnamed: 5', 'Unnamed: 6'],
      dtype='object')
Missing values:
 Text       0
Emotion    0
dtype: int64
Dataset shape: (150000,

### Normalize & Encode Emotion Labels

The emotion labels are **converted to lowercase** to ensure consistency (e.g., merging `"Fear"` and `"fear"`).  
Then, each label is assigned a unique numeric ID.


In [11]:
# Normalize by converting all emotion labels to lowercase.
df['Emotion'] = df['Emotion'].str.lower()

# Define the mapping (after normalization, only one version of fear remains).
emotion_mapping = {
    'anger': 0,
    'fear': 1,
    'joy': 2,
    'sadness': 3,
    'surprise': 4
}

df['emotion_label'] = df['Emotion'].map(emotion_mapping)
print("Final Emotion Mapping:", emotion_mapping)


Final Emotion Mapping: {'anger': 0, 'fear': 1, 'joy': 2, 'sadness': 3, 'surprise': 4}
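
`Series.map` yields `NaN` for any label that is missing from the mapping, so it can be worth checking coverage before encoding. The following is a minimal pure-Python sketch of that check; `encode_labels` is a hypothetical helper and the sample labels are made up for illustration.

```python
emotion_mapping = {"anger": 0, "fear": 1, "joy": 2, "sadness": 3, "surprise": 4}


def encode_labels(labels, mapping):
    """Lowercase each label and map it to its ID, failing loudly on unknown labels."""
    normalized = [label.lower() for label in labels]
    unknown = sorted(set(normalized) - set(mapping))
    if unknown:
        raise ValueError(f"Labels missing from mapping: {unknown}")
    return [mapping[label] for label in normalized]


# Made-up labels for illustration; mixed casing collapses to one ID per emotion.
print(encode_labels(["Fear", "fear", "Joy", "Sadness"], emotion_mapping))
# → [1, 1, 2, 3]
```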


### Split Data into Training & Testing Sets

The data is split **80/20** into training and test sets. Passing `stratify=df['emotion_label']` keeps the class proportions similar in both sets; it does not balance the classes themselves.



In [12]:
train_texts, test_texts, train_labels, test_labels = train_test_split(
    df['Text'], df['emotion_label'], test_size=0.2, random_state=seed, stratify=df['emotion_label']
)

print(f"Training samples: {len(train_texts)}")
print(f"Test samples: {len(test_texts)}")


Training samples: 120000
Test samples: 30000
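
Stratification amounts to splitting each class 80/20 on its own and then combining the pieces. The sketch below illustrates the idea in plain Python. It is not scikit-learn's implementation; `stratified_split` is a hypothetical helper and the toy labels are made up.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_size=0.2, seed=420):
    """Return (train_indices, test_indices) with per-class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# Made-up, imbalanced toy labels: 50 "joy", 30 "sadness", 20 "fear".
labels = ["joy"] * 50 + ["sadness"] * 30 + ["fear"] * 20
train_idx, test_idx = stratified_split(labels)

# Both subsets keep the 5:3:2 class ratio of the full label list.
print(Counter(labels[i] for i in train_idx))
print(Counter(labels[i] for i in test_idx))
```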


### Build Vocabulary & Encode Text

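The model needs numerical inputs, so each text is **tokenized** (lowercased and split on whitespace) and converted into an **integer sequence**.  
A vocabulary is built from the training samples only, so words that appear only in the test set map to `<UNK>`. Sequences are **padded or truncated** to a fixed length of 50 tokens.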

In [13]:
def tokenize(text):
    return text.lower().split()

def build_vocab(texts, min_freq=1):
    counter = Counter()
    for text in texts:
        tokens = tokenize(text)
        counter.update(tokens)
    # Reserve indices for <PAD> and <UNK>
    vocab = {"<PAD>": 0, "<UNK>": 1}
    for word, freq in counter.items():
        if freq >= min_freq:
            vocab[word] = len(vocab)
    return vocab

# Build vocabulary from training texts
vocab = build_vocab(train_texts, min_freq=1)
print("Vocabulary size:", len(vocab))

def encode_text(text, vocab):
    tokens = tokenize(text)
    return [vocab.get(token, vocab["<UNK>"]) for token in tokens]

# Set a fixed maximum sequence length.
max_seq_len = 50

def pad_sequence(seq, max_length):
    if len(seq) < max_length:
        return seq + [0] * (max_length - len(seq))
    return seq[:max_length]

# Encode and pad the training and test texts.
encoded_train = [pad_sequence(encode_text(text, vocab), max_seq_len) for text in train_texts]
encoded_test = [pad_sequence(encode_text(text, vocab), max_seq_len) for text in test_texts]

Vocabulary size: 49548
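
To make the encoding concrete, the sketch below runs the same steps on two made-up sentences with a short maximum length. The sentences, `max_length` values, and the combined `encode_and_pad` helper are assumptions for illustration; the logic mirrors the cell above.

```python
from collections import Counter


def tokenize(text):
    return text.lower().split()


def build_vocab(texts, min_freq=1):
    counter = Counter(token for text in texts for token in tokenize(text))
    vocab = {"<PAD>": 0, "<UNK>": 1}
    for word, freq in counter.items():
        if freq >= min_freq:
            vocab[word] = len(vocab)
    return vocab


def encode_and_pad(text, vocab, max_length):
    ids = [vocab.get(token, vocab["<UNK>"]) for token in tokenize(text)]
    # Append padding, then cut to max_length (this also truncates long inputs).
    return (ids + [vocab["<PAD>"]] * max_length)[:max_length]


# Made-up training sentences, for illustration only.
toy_vocab = build_vocab(["i feel happy", "i feel so sad today"])
print(toy_vocab)
# → {'<PAD>': 0, '<UNK>': 1, 'i': 2, 'feel': 3, 'happy': 4, 'so': 5, 'sad': 6, 'today': 7}

# "angry" was never seen in training, so it maps to <UNK> (1); the rest is padding (0).
print(encode_and_pad("I feel angry", toy_vocab, max_length=6))
# → [2, 3, 1, 0, 0, 0]

# Sequences longer than max_length are truncated.
print(encode_and_pad("i feel so sad today", toy_vocab, max_length=3))
# → [2, 3, 5]
```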


### Create PyTorch Dataset & DataLoaders

A **PyTorch Dataset** serves the encoded sequences & labels.  
DataLoaders allow efficient **batch processing** during training.

In [14]:
class TextDataset(Dataset):
    def __init__(self, encoded_texts, labels):
        self.encoded_texts = encoded_texts
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return torch.tensor(self.encoded_texts[idx], dtype=torch.long), torch.tensor(self.labels.iloc[idx], dtype=torch.long)

# Reset indices for labels to maintain alignment.
train_labels = train_labels.reset_index(drop=True)
test_labels = test_labels.reset_index(drop=True)

train_dataset = TextDataset(encoded_train, train_labels)
test_dataset = TextDataset(encoded_test, test_labels)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
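
With `batch_size=32`, the number of batches per epoch follows from the split sizes printed earlier. The short calculation below is a sketch of that arithmetic; it relies on the `DataLoader` default of keeping the final, smaller batch (`drop_last=False`).

```python
import math

batch_size = 32
train_samples, test_samples = 120_000, 30_000  # sizes printed by the split cell

# DataLoader keeps a final, smaller batch unless drop_last=True is set.
train_batches = math.ceil(train_samples / batch_size)
test_batches = math.ceil(test_samples / batch_size)
last_test_batch = test_samples - (test_batches - 1) * batch_size

print(train_batches, test_batches, last_test_batch)
# → 3750 938 16
```

These counts are what `len(train_loader)` and `len(test_loader)` would return for this configuration.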

### Baseline Model
We first define a simple LSTM-based classifier. This baseline model consists of an Embedding layer, a single unidirectional LSTM layer, and a fully connected (Linear) layer. No additional regularization (such as dropout) is applied.

In [15]:
class LSTMEmotionClassifier(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_classes):
        super(LSTMEmotionClassifier, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)
    
    def forward(self, x):
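        embeds = self.embedding(x)            # (batch, seq_len, embedding_dim)
        lstm_out, _ = self.lstm(embeds)       # (batch, seq_len, hidden_dim)
        last_output = lstm_out[:, -1, :]      # output at the final position, which is padding for short texts
        logits = self.fc(last_output)         # (batch, num_classes)
        return logits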

# Hyperparameters for baseline model.
embedding_dim = 128
hidden_dim = 256
vocab_size = len(vocab)
num_classes = len(emotion_mapping)

baseline_model = LSTMEmotionClassifier(vocab_size, embedding_dim, hidden_dim, num_classes).to(device)
print(baseline_model)


LSTMEmotionClassifier(
  (embedding): Embedding(49548, 128)
  (lstm): LSTM(128, 256, batch_first=True)
  (fc): Linear(in_features=256, out_features=5, bias=True)
)
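
The layer shapes printed above determine the number of trainable parameters. The sketch below works through that arithmetic in plain Python; the helper names are hypothetical, and the LSTM formula assumes PyTorch's layout of four gates with separate input and recurrent bias vectors.

```python
def embedding_params(vocab_size, embedding_dim):
    return vocab_size * embedding_dim


def lstm_params(input_dim, hidden_dim):
    # Four gates, each with input weights, recurrent weights, and two bias vectors
    # (PyTorch keeps separate b_ih and b_hh).
    return 4 * hidden_dim * (input_dim + hidden_dim + 2)


def linear_params(in_features, out_features):
    return in_features * out_features + out_features


counts = {
    "embedding": embedding_params(49548, 128),
    "lstm": lstm_params(128, 256),
    "fc": linear_params(256, 5),
}
print(counts, sum(counts.values()))
# → {'embedding': 6342144, 'lstm': 395264, 'fc': 1285} 6738693
```

The embedding table accounts for about 94% of the parameters, which is a direct consequence of building the vocabulary with `min_freq=1`.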


### Train Baseline Model
The baseline model is trained for 20 epochs using the Adam optimizer (learning rate 0.001, weight decay 1e-4) and cross-entropy loss. Training loss and accuracy are printed after each epoch.

In [16]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(baseline_model.parameters(), lr=0.001, weight_decay=1e-4)

num_epochs = 20
baseline_train_losses = []
baseline_train_accuracies = []

print("Training Baseline Model:")
for epoch in range(num_epochs):
    baseline_model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    
    for batch_x, batch_y in train_loader:
        batch_x = batch_x.to(device)
        batch_y = batch_y.to(device)
        
        optimizer.zero_grad()
        outputs = baseline_model(batch_x)
        loss = criterion(outputs, batch_y)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        _, preds = torch.max(outputs, 1)
        correct += (preds == batch_y).sum().item()
        total += batch_y.size(0)
    
    epoch_loss = running_loss / len(train_loader)
    epoch_acc = 100 * correct / total
    baseline_train_losses.append(epoch_loss)
    baseline_train_accuracies.append(epoch_acc)
    
    print(f"Epoch {epoch+1}/{num_epochs} - Loss: {epoch_loss:.3f}, Accuracy: {epoch_acc:.2f}%")


Training Baseline Model:
Epoch 1/20 - Loss: 1.492, Accuracy: 31.76%
Epoch 2/20 - Loss: 1.407, Accuracy: 38.29%
Epoch 3/20 - Loss: 1.315, Accuracy: 43.55%
Epoch 4/20 - Loss: 1.308, Accuracy: 43.84%
Epoch 5/20 - Loss: 1.306, Accuracy: 43.96%
Epoch 6/20 - Loss: 1.304, Accuracy: 44.01%
Epoch 7/20 - Loss: 1.303, Accuracy: 44.05%
Epoch 8/20 - Loss: 1.302, Accuracy: 44.05%
Epoch 9/20 - Loss: 1.301, Accuracy: 44.09%
Epoch 10/20 - Loss: 1.300, Accuracy: 44.13%
Epoch 11/20 - Loss: 1.300, Accuracy: 44.16%
Epoch 12/20 - Loss: 1.299, Accuracy: 44.11%
Epoch 13/20 - Loss: 1.299, Accuracy: 44.12%
Epoch 14/20 - Loss: 1.298, Accuracy: 44.18%
Epoch 15/20 - Loss: 1.299, Accuracy: 44.17%
Epoch 16/20 - Loss: 1.298, Accuracy: 44.17%
Epoch 17/20 - Loss: 1.298, Accuracy: 44.25%
Epoch 18/20 - Loss: 1.298, Accuracy: 44.22%
Epoch 19/20 - Loss: 1.298, Accuracy: 44.26%
Epoch 20/20 - Loss: 1.297, Accuracy: 44.20%
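
A useful reference point for reading these numbers is what a model would score if it ignored the text entirely. A uniform guess over five classes has a cross-entropy of ln 5, and a model that always outputs the class frequencies has a loss equal to the entropy of the label distribution and an accuracy equal to the majority-class share. The sketch below computes both; `prior_baseline` is a hypothetical helper and the toy labels are made up.

```python
import math
from collections import Counter


def prior_baseline(labels):
    """Accuracy and cross-entropy of a model that always predicts the class frequencies."""
    counts = Counter(labels)
    total = sum(counts.values())
    probs = [count / total for count in counts.values()]
    majority_accuracy = 100 * max(probs)
    entropy = -sum(p * math.log(p) for p in probs)
    return majority_accuracy, entropy


# Cross-entropy of a uniform guess over the five emotions.
print(round(math.log(5), 3))
# → 1.609

# Made-up label IDs for illustration; in the notebook, pass train_labels instead.
toy_labels = [2] * 5 + [3] * 3 + [0] * 2
accuracy, entropy = prior_baseline(toy_labels)
print(f"{accuracy:.1f}% {entropy:.3f}")
# → 50.0% 1.030
```

If the values for `train_labels` come out near the plateau seen above (loss about 1.30, accuracy about 44%), that would suggest the baseline is mostly predicting class frequencies rather than using the text.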


### Evaluate Baseline Model and Print Sample Predictions

We now evaluate the baseline model on the test set. The final test accuracy is printed, followed by five randomly chosen test samples showing the sample index, true emotion, and predicted emotion.

In [17]:
baseline_model.eval()
correct = 0
total = 0
all_preds = []
all_true = []

with torch.no_grad():
    for batch_x, batch_y in test_loader:
        batch_x = batch_x.to(device)
        batch_y = batch_y.to(device)
        outputs = baseline_model(batch_x)
        _, preds = torch.max(outputs, 1)
        total += batch_y.size(0)
        correct += (preds == batch_y).sum().item()
        all_preds.extend(preds.cpu().numpy())
        all_true.extend(batch_y.cpu().numpy())

baseline_test_accuracy = 100 * correct / total
print(f"Baseline Model Final Test Accuracy: {baseline_test_accuracy:.2f}%\n")

# Create a reverse mapping for printing
inv_emotion_mapping = {v: k for k, v in emotion_mapping.items()}

# Print sample predictions (choose 5 random samples).
print("Sample Test Predictions (Baseline Model):")
sample_indices = random.sample(range(len(test_dataset)), 5)
for idx in sample_indices:
    sequence, true_label = test_dataset[idx]
    with torch.no_grad():
        input_tensor = sequence.unsqueeze(0).to(device)
        outputs = baseline_model(input_tensor)
        _, pred = torch.max(outputs, 1)
    
    true_emotion = inv_emotion_mapping[true_label.item()]
    pred_emotion = inv_emotion_mapping[pred.item()]
    print(f"Sample {idx}: True: {true_emotion}, Predicted: {pred_emotion}")

Baseline Model Final Test Accuracy: 44.36%

Sample Test Predictions (Baseline Model):
Sample 863: True: anger, Predicted: joy
Sample 22056: True: joy, Predicted: joy
Sample 25603: True: joy, Predicted: joy
Sample 11943: True: sadness, Predicted: anger
Sample 8932: True: sadness, Predicted: sadness
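
The evaluation loop collects `all_true` and `all_preds`, which is enough to break the accuracy down by class. The sketch below shows one way to do that in plain Python; `confusion_matrix` and `per_class_recall` are hypothetical helpers, and the demo input is just the five sample predictions printed above.

```python
def confusion_matrix(true_ids, pred_ids, num_classes):
    """matrix[t][p] counts samples with true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_ids, pred_ids):
        matrix[int(t)][int(p)] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true class predicted correctly (None if the class is absent)."""
    return [row[i] / sum(row) if sum(row) else None for i, row in enumerate(matrix)]


# The five sample predictions printed above, as (true, predicted) label IDs:
# anger/joy, joy/joy, joy/joy, sadness/anger, sadness/sadness.
all_true_demo = [0, 2, 2, 3, 3]
all_preds_demo = [2, 2, 2, 0, 3]

matrix = confusion_matrix(all_true_demo, all_preds_demo, num_classes=5)
print(per_class_recall(matrix))
# → [0.0, None, 1.0, 0.5, None]
```

In the notebook, calling `confusion_matrix(all_true, all_preds, num_classes)` on the full test set would show which emotions the model confuses most often.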


## 2. Improved Model

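For the improved model, we enhance the architecture by:
- using a bidirectional LSTM,
- adding dropout on the embedding output (the `dropout=0.3` argument passed to `nn.LSTM` has no effect with a single layer, because PyTorch applies it only between stacked layers),
- widening the fully connected layer to accept the concatenated forward and backward outputs.

This improved model is expected to generalize better.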

In [19]:
class LSTMEmotionClassifierImproved(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_classes):
        super(LSTMEmotionClassifierImproved, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.dropout_embed = nn.Dropout(0.2)
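        # Bidirectional LSTM. Note: dropout=0.3 has no effect with a single layer;
        # PyTorch applies it only between stacked layers and emits a warning here.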
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True, bidirectional=True, dropout=0.3)
        self.fc = nn.Linear(hidden_dim * 2, num_classes)  # hidden_dim*2 because of bidirection
        
    def forward(self, x):
        embeds = self.embedding(x)
        embeds = self.dropout_embed(embeds)
        lstm_out, _ = self.lstm(embeds)
        # Output at the last time step: the forward half has read the whole (padded)
        # sequence, while the backward half has read only the final position.
        last_output = lstm_out[:, -1, :]
        logits = self.fc(last_output)
        return logits

improved_model = LSTMEmotionClassifierImproved(vocab_size, 128, 256, num_classes).to(device)
print(improved_model)

LSTMEmotionClassifierImproved(
  (embedding): Embedding(49548, 128)
  (dropout_embed): Dropout(p=0.2, inplace=False)
  (lstm): LSTM(128, 256, batch_first=True, dropout=0.3, bidirectional=True)
  (fc): Linear(in_features=512, out_features=5, bias=True)
)
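
Both models in this notebook read `lstm_out[:, -1, :]`, which for short texts is a padding position. A common alternative is to pack the padded batch so the LSTM skips padding, and to concatenate the final hidden state of each direction. The sketch below is a hypothetical variant and not the model trained in this notebook; `PackedBiLSTMClassifier` is an illustrative name, and it assumes padding uses index 0 and appears only at the end of each sequence.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


class PackedBiLSTMClassifier(nn.Module):
    """Hypothetical variant: skips padding and uses both final hidden states."""

    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_classes, pad_idx=0):
        super().__init__()
        self.pad_idx = pad_idx
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(hidden_dim * 2, num_classes)

    def forward(self, x):
        # Count real tokens per row; clamp so an all-padding row still has length 1.
        lengths = (x != self.pad_idx).sum(dim=1).clamp(min=1).cpu()
        packed = pack_padded_sequence(
            self.embedding(x), lengths, batch_first=True, enforce_sorted=False
        )
        _, (h_n, _) = self.lstm(packed)
        # h_n[0] is the forward state after the last real token,
        # h_n[1] is the backward state after reaching the first token.
        final = torch.cat([h_n[0], h_n[1]], dim=1)
        return self.fc(final)


torch.manual_seed(0)
model = PackedBiLSTMClassifier(vocab_size=20, embedding_dim=8, hidden_dim=16, num_classes=5)
model.eval()

# The same two made-up sequences, padded to length 6 and to length 10.
short = torch.tensor([[4, 7, 2, 0, 0, 0], [5, 3, 0, 0, 0, 0]])
long = torch.cat([short, torch.zeros(2, 4, dtype=torch.long)], dim=1)

with torch.no_grad():
    # The logits should not depend on how much padding was appended.
    print(torch.allclose(model(short), model(long), atol=1e-5))
```

Whether this variant would change the results reported here is untested; it only removes the dependence of the output on the amount of padding.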


### Train Improved Model

The improved model is trained with the same settings as the baseline model (Adam optimizer with learning rate 0.001 and weight decay 1e-4, cross-entropy loss, 20 epochs). Training loss and accuracy are printed after each epoch.


In [20]:
criterion = nn.CrossEntropyLoss()
optimizer_improved = optim.Adam(improved_model.parameters(), lr=0.001, weight_decay=1e-4)

num_epochs_improved = 20
improved_train_losses = []
improved_train_accuracies = []

print("Training Improved Model:")
for epoch in range(num_epochs_improved):
    improved_model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    
    for batch_x, batch_y in train_loader:
        batch_x = batch_x.to(device)
        batch_y = batch_y.to(device)
        
        optimizer_improved.zero_grad()
        outputs = improved_model(batch_x)
        loss = criterion(outputs, batch_y)
        loss.backward()
        optimizer_improved.step()
        
        running_loss += loss.item()
        _, preds = torch.max(outputs, 1)
        correct += (preds == batch_y).sum().item()
        total += batch_y.size(0)
    
    epoch_loss = running_loss / len(train_loader)
    epoch_acc = 100 * correct / total
    improved_train_losses.append(epoch_loss)
    improved_train_accuracies.append(epoch_acc)
    
    print(f"Epoch {epoch+1}/{num_epochs_improved} - Loss: {epoch_loss:.3f}, Accuracy: {epoch_acc:.2f}%")

Training Improved Model:
Epoch 1/20 - Loss: 1.399, Accuracy: 38.75%
Epoch 2/20 - Loss: 1.325, Accuracy: 43.29%
Epoch 3/20 - Loss: 1.314, Accuracy: 43.62%
Epoch 4/20 - Loss: 1.310, Accuracy: 43.88%
Epoch 5/20 - Loss: 1.308, Accuracy: 43.98%
Epoch 6/20 - Loss: 1.287, Accuracy: 44.74%
Epoch 7/20 - Loss: 0.932, Accuracy: 59.03%
Epoch 8/20 - Loss: 0.421, Accuracy: 85.52%
Epoch 9/20 - Loss: 0.205, Accuracy: 93.08%
Epoch 10/20 - Loss: 0.144, Accuracy: 94.08%
Epoch 11/20 - Loss: 0.098, Accuracy: 95.08%
Epoch 12/20 - Loss: 0.092, Accuracy: 95.15%
Epoch 13/20 - Loss: 0.090, Accuracy: 95.16%
Epoch 14/20 - Loss: 0.088, Accuracy: 95.26%
Epoch 15/20 - Loss: 0.088, Accuracy: 95.30%
Epoch 16/20 - Loss: 0.086, Accuracy: 95.34%
Epoch 17/20 - Loss: 0.085, Accuracy: 95.41%
Epoch 18/20 - Loss: 0.084, Accuracy: 95.47%
Epoch 19/20 - Loss: 0.087, Accuracy: 95.36%
Epoch 20/20 - Loss: 0.086, Accuracy: 95.33%
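
The per-epoch lists recorded by both training loops can be plotted to compare the two models side by side. The sketch below is one possible layout; `plot_training_curves` is a hypothetical helper, the `Agg` backend is an assumption for running outside a notebook, and the demo values are the first three epochs copied from the logs above.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: render off-screen; drop this line inside a notebook
import matplotlib.pyplot as plt


def plot_training_curves(losses, accuracies):
    """Plot per-epoch loss and accuracy; both arguments map a model name to a list."""
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    for name, values in losses.items():
        ax_loss.plot(range(1, len(values) + 1), values, marker="o", label=name)
    for name, values in accuracies.items():
        ax_acc.plot(range(1, len(values) + 1), values, marker="o", label=name)
    ax_loss.set(xlabel="Epoch", ylabel="Training loss")
    ax_acc.set(xlabel="Epoch", ylabel="Training accuracy (%)")
    ax_loss.legend()
    ax_acc.legend()
    fig.tight_layout()
    return fig


# First three epochs copied from the training logs; in the notebook, pass
# baseline_train_losses, improved_train_losses, and the matching accuracy lists.
fig = plot_training_curves(
    {"baseline": [1.492, 1.407, 1.315], "improved": [1.399, 1.325, 1.314]},
    {"baseline": [31.76, 38.29, 43.55], "improved": [38.75, 43.29, 43.62]},
)
```

Inside a notebook the returned figure is displayed inline; elsewhere it can be written to disk with `fig.savefig(...)`.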


### Evaluate Improved Model and Print Sample Predictions

We evaluate the improved model on the test set. Final test accuracy is printed, and a few sample test predictions are displayed in text format.


In [21]:
improved_model.eval()
correct = 0
total = 0
all_preds = []
all_true = []

with torch.no_grad():
    for batch_x, batch_y in test_loader:
        batch_x = batch_x.to(device)
        batch_y = batch_y.to(device)
        outputs = improved_model(batch_x)
        _, preds = torch.max(outputs, 1)
        total += batch_y.size(0)
        correct += (preds == batch_y).sum().item()
        all_preds.extend(preds.cpu().numpy())
        all_true.extend(batch_y.cpu().numpy())

improved_test_accuracy = 100 * correct / total
print(f"Improved Model Final Test Accuracy: {improved_test_accuracy:.2f}%\n")

# Print sample predictions 
print("Sample Test Predictions (Improved Model):")
sample_indices = random.sample(range(len(test_dataset)), 5)
for idx in sample_indices:
    sequence, true_label = test_dataset[idx]
    with torch.no_grad():
        input_tensor = sequence.unsqueeze(0).to(device)
        outputs = improved_model(input_tensor)
        _, pred = torch.max(outputs, 1)
    
    true_emotion = inv_emotion_mapping[true_label.item()]
    pred_emotion = inv_emotion_mapping[pred.item()]
    print(f"Sample {idx}: True: {true_emotion}, Predicted: {pred_emotion}")

Improved Model Final Test Accuracy: 95.41%

Sample Test Predictions (Improved Model):
Sample 13195: True: sadness, Predicted: sadness
Sample 3132: True: joy, Predicted: joy
Sample 25032: True: fear, Predicted: fear
Sample 21747: True: sadness, Predicted: sadness
Sample 22213: True: anger, Predicted: anger


## 3. Discussion and Findings

The emotion detection system was implemented using two custom deep learning models built entirely from scratch: a baseline and an improved version.

**Baseline Model:**  
- The baseline LSTM-based classifier achieved a final test accuracy of **44.36%**.  
- This model uses a single unidirectional LSTM with no dropout. Its training accuracy plateaued at about 44% from epoch 3 onward, so it underfit rather than overfit. Because inputs are right-padded to 50 tokens and the classifier reads the LSTM output at the last position, the relevant signal has to survive a long run of padding tokens, which may have contributed to the plateau. In this setup, the basic architecture was not sufficient for robust emotion classification.

**Improved Model:**  
- The improved model, which uses a bidirectional LSTM and dropout on the embeddings, achieved a final test accuracy of **95.41%**.  
- By processing text in both forward and backward directions, the model captures more context. Its test accuracy (95.41%) is close to its final training accuracy (95.33%), which indicates little overfitting.

| Model | Test accuracy |
|--------------|------------|
| **Base** | **44.36%** |
| **Improved** | **95.41%** |


**Summary of Findings:**  
- The gap of about 51 percentage points between the baseline model (44.36%) and the improved model (95.41%) suggests that bidirectional context and regularization matter for this task.
- The baseline never moved past about 44% training accuracy in 20 epochs, whereas the improved model stayed near that level for six epochs and then climbed to about 95%.
- For this dataset, modest architectural changes made a large difference in how well the model learned from the text.

Overall, the findings demonstrate that a carefully tailored deep learning approach, built without pre-trained components, can achieve high accuracy in emotion detection, provided that the model processes context in both directions and is well regularized.
