## Sentiment Classification Tutorial: Classical vs. Deep Learning Models (IMDB Reviews)

This tutorial adapts the text classification framework to a binary sentiment analysis task using the IMDB Movie Reviews dataset. We will classify movie reviews into two categories: 'positive' or 'negative'.

We will compare the performance of:

* **Bag-of-Words (BoW) Classifier using TF-IDF features.**

* **Gated Recurrent Unit (GRU) from PyTorch.**

* **Bi-directional GRU (Bi-GRU) from PyTorch.**


#### Dataset: IMDB-Dataset.csv (first 10,000 reviews)

#### Target: Binary Classification (2 classes: Positive, Negative)

### In this Notebook, we will focus on GRU and Bi-GRU.

## 1. Setup and Data Loading

We import necessary libraries and load the IMDB dataset, converting the categorical sentiment labels into numerical format.

In [2]:
# %%
# Core data science and NLP libraries
import numpy as np
import pandas as pd
import re
import os

# Scikit-learn for classical ML
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# PyTorch for Deep Learning
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from torch.nn.utils.rnn import pad_sequence # Utility for sequence padding

# Tqdm for progress bars (helpful during training)
from tqdm.notebook import tqdm

from google.colab import drive
drive.mount('/content/drive')

# --- CRITICAL IMPORTS FOR TPU/XLA ---
try:
    import torch_xla.core.xla_model as xm
    # Note: parallel_loader and xla_multiprocessing are not needed for single core
    XLA_AVAILABLE = True
except ImportError:
    XLA_AVAILABLE = False
    print("WARNING: torch_xla not found. Running on CPU/CUDA fallback.")
# --- END XLA IMPORTS ---

# Set a random seed for reproducibility
SEED = 42
np.random.seed(SEED)
torch.manual_seed(SEED)

# Set device for PyTorch operations
if XLA_AVAILABLE:
    # Use xm.xla_device() to get the primary TPU core device
    DEVICE = xm.xla_device()
    N_DEVICES = 1 # Force single device count
    print(f"Using Single XLA Device: {DEVICE}")
elif torch.cuda.is_available():
    torch.backends.cudnn.benchmark = True
    torch.backends.cudnn.enabled = True
    torch.cuda.manual_seed_all(SEED)
    DEVICE = torch.device('cuda')
    print(f'Using device: {DEVICE}')
elif torch.backends.mps.is_available():
    DEVICE = torch.device('mps')
    print(f'Using device: {DEVICE}')
else:
    DEVICE = torch.device('cpu')
    print(f'Using device: {DEVICE}')

Mounted at /content/drive
Using device: cuda


In [3]:
# --- Load the IMDB Dataset ---

FILE_PATH = '/content/drive/MyDrive/Colab Notebooks/IMDB-Dataset.csv'
df = pd.read_csv(FILE_PATH)


# 1. Label Encoding: Convert 'positive' to 1 and 'negative' to 0
# The 'sentiment' column is the raw target string
df['target'] = df['sentiment'].apply(lambda x: 1 if x == 'positive' else 0)

X = df['review'].values
y = df['target'].values

# Use only first 10,000 samples for quicker experimentation
X = X[:10000]
y = y[:10000]

# 2. Split Data into Training and Testing Sets
# We use a 90/10 split for training and testing (test_size=0.1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.1,
    random_state=SEED,
    stratify=y # Ensure equal class distribution in both splits
)

target_names = ['negative', 'positive'] # Map numerical targets back to names (0, 1)

print(f"\nTraining Samples: {len(X_train)}")
print(f"Testing Samples: {len(X_test)}")
print(f"Total Classes: {len(target_names)}")
print(f"Classes: {target_names}")
print("-" * 50)
print(f"Example Data Point (Class: {target_names[y_train[0]]}):\n{X_train[0][:300]}...\n")
# %%


Training Samples: 9000
Testing Samples: 1000
Total Classes: 2
Classes: ['negative', 'positive']
--------------------------------------------------
Example Data Point (Class: negative):
I found Super Troopers only mildly amusing at best (seemed like a glorified Police Academy ripoff to me), and I rented this movie in hopes of it being better. It wasn't.<br /><br />The writing is absolutely horrible and the pacing of this film is even worse. It doesn't feel like a whole lot happens ...



### 3. **Preprocessing for Deep Learning (PyTorch)**

The text data is tokenized, vocabulary is built, and sequences are padded to prepare them for the recurrent neural networks.

The pipeline follows the standard NLP preprocessing flow:
* Tokenization: $f_{tok}: \text{String} \to \{t_1, t_2, ... t_n\}$
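* Vocabulary Construction: Building a one-to-one mapping between the vocabulary $V$ and the indices $\{0, \dots, |V|-1\}$, keeping only the most frequent tokens (word frequencies are heavy-tailed, as described by Zipf's law).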
* Numericalization: Mapping observed tokens to indices for Embedding layer lookups.
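
The three steps above can be illustrated on a toy corpus. The sketch below is a minimal, self-contained illustration in plain Python; the two-document corpus, the frequency threshold of 2, and the `toy_` names are made up for this example and are not part of the notebook's pipeline.

```python
import re
from collections import Counter

# Hypothetical two-document corpus, used only for illustration.
toy_corpus = ["Great movie, great cast!<br />Loved it.", "Bad movie. Bad script."]

def toy_tokenize(text):
    text = re.sub(r'<br />', ' ', text)        # drop HTML line breaks
    text = re.sub(r'[^a-zA-Z\s]', '', text)    # keep letters and whitespace only
    return text.lower().split()                # lowercase + whitespace tokenization

# Count token frequencies over the whole corpus.
counts = Counter(tok for doc in toy_corpus for tok in toy_tokenize(doc))

# Reserve indices 0 and 1 for the special tokens, then add tokens by frequency.
toy_vocab = {'<unk>': 0, '<pad>': 1}
for token, count in counts.most_common():
    if count >= 2:                             # toy frequency threshold
        toy_vocab[token] = len(toy_vocab)

def toy_numericalize(text):
    # Tokens missing from the vocabulary fall back to the <unk> index.
    return [toy_vocab.get(tok, toy_vocab['<unk>']) for tok in toy_tokenize(text)]

print(toy_vocab)
print(toy_numericalize("Great script, bad movie"))
```

Here "script" occurs only once in the toy corpus, so it is excluded from the vocabulary and maps to `<unk>` (index 0) at numericalization time.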

In [6]:
from collections import Counter

# --- Deep Learning Preprocessing and Tokenization --- #
# --- Configuration --- #
MAX_VOCAB_SIZE = 10000
MIN_WORD_FREQ = 5
SPECIAL_TOKENS = {'<unk>': 0, '<pad>': 1}

def preprocess_and_tokenize(text):
    """
    Cleans text and converts to a list of tokens.
    Theory: Reduces dimensionality by normalizing case and removing noise (HTML/Punctuation).
    """
    # 1. Remove HTML line-break tags (<br />); other tags are not handled here
    text = re.sub(r'<br />', ' ', text)

    # 2. Filter non-alphabetic chars
    # Keeps the feature space dense by removing low-signal symbols.
    text = re.sub(r'[^a-zA-Z\s]', '', text)

    # 3. Lowercase and split (Whitespace Tokenization)
    return text.lower().split()

def build_vocab(corpus, max_size, min_freq):
    """
    Constructs mappings based on token frequency.
    """
    # Flatten corpus and count frequencies: O(N)
    # We tokenize once here to build statistics.
    token_counts = Counter()
    for doc in corpus:
        token_counts.update(preprocess_and_tokenize(doc))

    # Initialize vocab with special tokens
    # <unk>: Handles Out-Of-Vocabulary (OOV) words.
    # <pad>: Used for batching variable-length sequences.
    vocab = SPECIAL_TOKENS.copy()

    # Sort by frequency (descending) so the most frequent words are added first
    sorted_tokens = token_counts.most_common()

    for token, count in sorted_tokens:
        # Stop if vocab is full
        if len(vocab) >= max_size:
            break

        # Stop once counts fall below the frequency threshold (long-tail truncation)
        if count < min_freq:
            break

        if token not in vocab:
            vocab[token] = len(vocab)

    return vocab

def numericalize(text, vocab):
    """
    Maps tokens to integer indices $w \to i$.
    """
    tokens = preprocess_and_tokenize(text)
    # Get index or default to <unk> (0)
    return [vocab.get(token, vocab['<unk>']) for token in tokens]

# --- Execution ---

# 1. Build Vocabulary
# X_train is an array of raw review strings
token_to_idx = build_vocab(X_train, MAX_VOCAB_SIZE, MIN_WORD_FREQ)
idx_to_token = {idx: token for token, idx in token_to_idx.items()}

print(f"Vocabulary Size: {len(token_to_idx)}")

# 2. Numericalize Data (Vectorization)
# Converts text to distinct integer tensors.
# Note: Result is a list of variable-length tensors (Ragged Tensor).
# For batch training, these must be padded later (e.g., using pad_sequence).
X_train_seq = [torch.tensor(numericalize(doc, token_to_idx), dtype=torch.long) for doc in X_train]
X_test_seq  = [torch.tensor(numericalize(doc, token_to_idx), dtype=torch.long) for doc in X_test]

Vocabulary Size: 10000


In [7]:
# %%
# --- Deep Learning Preprocessing and Tokenization --- #

# 3. PyTorch Dataset Class
class TextDataset(Dataset):
    def __init__(self, sequences, labels):
        self.sequences = sequences
        self.labels = torch.tensor(labels, dtype=torch.long)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.sequences[idx], self.labels[idx]

# 4. Collate Function for Padding (Crucial for Batch Processing)
def collate_batch(batch):
    # Separate sequences and labels
    sequences, labels = zip(*batch)

    # Pad sequences to the length of the longest sequence in the batch
    sequences_padded = pad_sequence(sequences, batch_first=True, padding_value=token_to_idx['<pad>'])

    # Stack the per-sample label tensors into a single batch tensor
    labels = torch.stack(labels)
    return sequences_padded.to(DEVICE), labels.to(DEVICE)

# Create datasets
train_dataset = TextDataset(X_train_seq, y_train)
test_dataset = TextDataset(X_test_seq, y_test)

# Define Hyperparameters and DataLoaders
BATCH_SIZE = 32

train_loader = DataLoader(
    train_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    collate_fn=collate_batch
)
test_loader = DataLoader(
    test_dataset,
    batch_size=BATCH_SIZE,
    shuffle=False,
    collate_fn=collate_batch
)

# Check a batch from the DataLoader
for batch_sequences, batch_labels in train_loader:
    print(f"Batch Sequences Shape: {batch_sequences.shape}") # (batch_size, seq_length)
    print(f"Batch Labels Shape: {batch_labels.shape}")       # (batch_size,)
    break

Batch Sequences Shape: torch.Size([32, 645])
Batch Labels Shape: torch.Size([32])
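
The shape `[32, 645]` means the longest review in this particular batch has 645 tokens, and every shorter review is right-padded with the `<pad>` index up to that length. The following plain-Python sketch mimics that behaviour on a tiny batch; `pad_batch` is a hypothetical helper written for illustration, not the `pad_sequence` implementation.

```python
def pad_batch(sequences, pad_idx):
    """Right-pad each sequence with pad_idx to the longest length in the batch."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_idx] * (max_len - len(seq)) for seq in sequences]

# Three toy index sequences of lengths 3, 1, and 2.
batch = [[4, 8, 15], [16], [23, 42]]
padded = pad_batch(batch, pad_idx=1)

for row in padded:
    print(row)
```

Because padding is done per batch, the sequence dimension varies from batch to batch, depending on the longest review drawn.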


### 4. Classifiers 2 & 3: GRU and Bi-directional GRU (PyTorch)

GRU networks process a review token by token and carry a hidden state forward, so they can capture sequential dependencies between words. This matters for sentiment, where earlier words can set the tone for later ones.
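
For reference, the recurrence computed by a single GRU layer at time step $t$ can be written as follows. The notation follows the PyTorch `nn.GRU` documentation: $x_t$ is the embedded token, $h_{t-1}$ the previous hidden state, $\sigma$ the logistic sigmoid, and $\odot$ the element-wise product.

$$
\begin{aligned}
r_t &= \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr}) \\
z_t &= \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{t-1} + b_{hz}) \\
n_t &= \tanh\big(W_{in} x_t + b_{in} + r_t \odot (W_{hn} h_{t-1} + b_{hn})\big) \\
h_t &= (1 - z_t) \odot n_t + z_t \odot h_{t-1}
\end{aligned}
$$

The update gate $z_t$ decides how much of the previous state is kept, and the reset gate $r_t$ decides how much of it feeds into the candidate state $n_t$. A bi-directional GRU runs a second, independent recurrence over the reversed sequence, and the classifier concatenates both final states, which is why its linear layer takes `hidden_dim * 2` inputs in that case.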

In [12]:
# %%
# --- PyTorch GRU Model Definition ---

class TextClassifierRNN(nn.Module):
    """
    A customizable Recurrent Neural Network for Text Classification
    using either a standard GRU or a Bi-directional GRU layer.
    """
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, num_layers, dropout, bidirectional):
        super().__init__()

        # 1. Embedding Layer: Converts word indices into dense vectors.
        # Dropout is applied to the embedding output in forward(); padding_idx keeps the <pad> vector at zero.
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=token_to_idx['<pad>'])

        # 2. GRU Layer: The core sequence processing unit.
        self.rnn = nn.GRU(
            input_size=embedding_dim,
            hidden_size=hidden_dim,
            num_layers=num_layers,
            bidirectional=bidirectional, # Toggles between GRU and Bi-GRU
            dropout=dropout,
            batch_first=True
        )

        # 3. Fully Connected (FC) Layer for Classification
        fc_input_dim = hidden_dim * 2 if bidirectional else hidden_dim
        self.fc = nn.Linear(fc_input_dim, output_dim)

        # 4. Dropout for regularization
        self.dropout = nn.Dropout(dropout)

        self.bidirectional = bidirectional
        self.num_layers = num_layers
        self.hidden_dim = hidden_dim

    def forward(self, text):
        # text shape: [batch size, seq len]

        # 1. Embedding
        embedded = self.dropout(self.embedding(text))
        # embedded shape: [batch size, seq len, embedding dim]

        # 2. GRU Forward Pass
        output, hidden = self.rnn(embedded)

        # 3. Global Pooling / Final State Extraction
        if self.bidirectional:
            # Concatenate the final forward and backward hidden states from the last layer
            # hidden shape: [num_layers * num_directions, batch size, hidden dim]
            last_hidden_forward = hidden[-2, :, :]
            last_hidden_backward = hidden[-1, :, :]
            final_hidden = torch.cat((last_hidden_forward, last_hidden_backward), dim=1)
        else:
            # Use the final hidden state of the last layer
            final_hidden = hidden[-1, :, :]

        # 4. Classification Layer
        prediction = self.fc(final_hidden)

        return prediction
# %%
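
One detail of the final-state extraction is worth noting: because batches are right-padded, the forward direction's final hidden state is computed after the GRU has also stepped through the `<pad>` positions of shorter reviews. A common alternative is to pack the batch so the recurrence stops at each sequence's true length. The sketch below is a minimal illustration under assumed toy dimensions (vocabulary of 20, embedding size 8, hidden size 16); it is not the model used in this notebook.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

PAD_IDX = 1  # same <pad> index as in SPECIAL_TOKENS
torch.manual_seed(0)

# Toy layers; the sizes are assumptions chosen only for this illustration.
embedding = nn.Embedding(20, 8, padding_idx=PAD_IDX)
gru = nn.GRU(8, 16, batch_first=True)

# Two sequences: the second has true length 2 and is right-padded to length 4.
batch = torch.tensor([[5, 6, 7, 8], [9, 10, PAD_IDX, PAD_IDX]])
lengths = (batch != PAD_IDX).sum(dim=1).cpu()  # lengths must live on the CPU

with torch.no_grad():
    embedded = embedding(batch)

    # Packed: the recurrence stops at each sequence's true length.
    packed = pack_padded_sequence(embedded, lengths, batch_first=True, enforce_sorted=False)
    _, hidden_packed = gru(packed)

    # Padded: the recurrence also runs over the <pad> positions.
    _, hidden_padded = gru(embedded)

    # Reference: the short sequence on its own, without any padding.
    _, hidden_short = gru(embedding(batch[1:, :2]))

print(torch.allclose(hidden_packed[-1, 1], hidden_short[-1, 0], atol=1e-6))
print(torch.allclose(hidden_padded[-1, 1], hidden_short[-1, 0], atol=1e-6))
```

With packing, the final state of the short sequence matches the state obtained by running it unpadded, whereas the padded run keeps updating the state on `<pad>` inputs. Whether this accounts for the slow start of the unidirectional model in the run below is not established here.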



### Generic Training and Evaluation Functions

In [13]:
# %%
# --- Training and Evaluation Helpers ---

def train_epoch(model, iterator, optimizer, criterion, grad_thresh):
    """Performs a single training epoch."""
    epoch_loss = 0
    model.train() # Set model to training mode

    for text, labels in tqdm(iterator, desc="Training"):
        optimizer.zero_grad()

        # Move data to the single device (TPU or CPU/GPU)
        text, labels = text.to(DEVICE), labels.to(DEVICE)

        predictions = model(text)
        loss = criterion(predictions, labels)
        loss.backward()

        # Gradient clipping to mitigate the exploding gradient problem.
        nn.utils.clip_grad_norm_(model.parameters(), grad_thresh)

        if XLA_AVAILABLE:
            # XLA specific optimization step
            xm.optimizer_step(optimizer)
            xm.mark_step() # Signal end of computation step to XLA
        else:
            optimizer.step()


        # Loss aggregation (standard, non-distributed)
        epoch_loss += loss.item() * len(text)

    return epoch_loss / len(iterator.dataset)

def evaluate(model, iterator, criterion):
    """Evaluates the model on the test set."""
    epoch_loss = 0
    all_predictions = []
    all_labels = []
    model.eval() # Set model to evaluation mode

    with torch.no_grad(): # Disable gradient calculation
        for text, labels in tqdm(iterator, desc="Evaluating"):
            # Move data to the single device (TPU or CPU/GPU)
            text, labels = text.to(DEVICE), labels.to(DEVICE)

            predictions = model(text)
            loss = criterion(predictions, labels)

            # Collect predicted classes and labels for metric computation
            _, predicted_classes = torch.max(predictions, 1)
            all_predictions.extend(predicted_classes.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())
            epoch_loss += loss.item() * len(text)

    avg_loss = epoch_loss / len(iterator.dataset)
    accuracy = accuracy_score(all_labels, all_predictions)
    f1_weighted = f1_score(all_labels, all_predictions, average='weighted')

    return avg_loss, accuracy, f1_weighted, all_predictions, all_labels

def run_experiment(model, name, train_loader, test_loader, num_epochs, learning_rate, grad_thresh):
    """Main experiment runner function."""
    print(f"\n--- Running Experiment: {name} ---")

    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    criterion = nn.CrossEntropyLoss() # Standard loss for multi-class classification

    best_f1 = 0

    for epoch in range(num_epochs):
        train_loss = train_epoch(model, train_loader, optimizer, criterion, grad_thresh)
        test_loss, test_acc, test_f1, _, _ = evaluate(model, test_loader, criterion)

        print(f'Epoch: {epoch+1:02} | Train Loss: {train_loss:.4f} | Test Loss: {test_loss:.4f} | Test Acc: {test_acc:.4f} | Test F1: {test_f1:.4f}')

        if test_f1 > best_f1:
            best_f1 = test_f1
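            # Save the best model state dict (xm.save on XLA, torch.save otherwise).
            # Caveat: the checkpoint is selected on the test set, so the final test
            # metrics are optimistically biased; a held-out validation split avoids this.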
            save_func = xm.save if XLA_AVAILABLE else torch.save
            save_func(model.state_dict(), f'{name}_best_model.pt')

    print(f"Loading best model weights for {name}...")
    # Load the state dict to ensure we use the best performing model
    if os.path.exists(f'{name}_best_model.pt'):
        model.load_state_dict(torch.load(f'{name}_best_model.pt'))

    # Final Evaluation
    test_loss, test_acc, test_f1, y_pred, y_true = evaluate(model, test_loader, criterion)

    print(f"\nFINAL PERFORMANCE ({name}):")
    print(f"Accuracy: {test_acc:.4f}, Weighted F1-Score: {test_f1:.4f}")
    print("\nClassification Report:")
    print(classification_report(y_true, y_pred, target_names=target_names))

    # The parameters total_params_gru and total_params_bi_gru are calculated
    # and stored in the next cells. We return only the metrics here.
    return {'Accuracy': test_acc, 'F1-Score': test_f1}
# %%
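
The gradient clipping step in `train_epoch` rescales all gradients by a common factor whenever their global L2 norm exceeds `grad_thresh`. The following plain-Python sketch shows the arithmetic on a flat list of numbers; `clip_by_global_norm` is a hypothetical helper for illustration, with the small `eps` term assumed to mirror the one used by `clip_grad_norm_`.

```python
import math

def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale grads so their global L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    # The scale factor is capped at 1.0, so small gradients are left unchanged.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm

# A gradient with norm 5.0 is scaled down to (approximately) norm 1.0.
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
norm_after = math.sqrt(sum(g * g for g in clipped))
print(norm_before, norm_after)

# A gradient already within the threshold passes through untouched.
unchanged, _ = clip_by_global_norm([0.3, 0.4], max_norm=1.0)
print(unchanged)
```

Clipping by the global norm preserves the direction of the update and only limits its magnitude, which is the relevant property for recurrent networks trained over long sequences.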


### 4.1. Classifier 2: Unidirectional GRU


In [14]:
# %%
# --- GRU Model Hyperparameters ---
VOCAB_SIZE = len(token_to_idx)
EMBEDDING_DIM = 128
HIDDEN_DIM = 256
OUTPUT_DIM = 2 # TWO CLASSES: Negative (0) and Positive (1)
NUM_LAYERS = 2
DROPOUT = 0.15
NUM_EPOCHS = 25
LEARNING_RATE = 0.001
GRAD_THRESH = 1.0

# Store results for comparison
results = {}

# 1. Instantiate the Unidirectional GRU Model
gru_model = TextClassifierRNN(
    vocab_size=VOCAB_SIZE,
    embedding_dim=EMBEDDING_DIM,
    hidden_dim=HIDDEN_DIM,
    output_dim=OUTPUT_DIM,
    num_layers=NUM_LAYERS,
    dropout=DROPOUT,
    bidirectional=False # UNIDIRECTIONAL
).to(DEVICE)


# Check total trainable parameters
total_params_gru = sum(p.numel() for p in gru_model.parameters() if p.requires_grad)
print(f"Unidirectional GRU Model Parameters: {total_params_gru:,}")

# 2. Run the Experiment
gru_results = run_experiment(
    gru_model,
    "GRU",
    train_loader,
    test_loader,
    NUM_EPOCHS,
    LEARNING_RATE,
    GRAD_THRESH
)
results['GRU'] = gru_results
# %%


Unidirectional GRU Model Parameters: 1,971,714

--- Running Experiment: GRU ---


Epoch: 01 | Train Loss: 0.6945 | Test Loss: 0.6948 | Test Acc: 0.4960 | Test F1: 0.3449
Epoch: 02 | Train Loss: 0.6923 | Test Loss: 0.6967 | Test Acc: 0.4920 | Test F1: 0.3278
Epoch: 03 | Train Loss: 0.6920 | Test Loss: 0.6969 | Test Acc: 0.5010 | Test F1: 0.3461
Epoch: 04 | Train Loss: 0.6885 | Test Loss: 0.7008 | Test Acc: 0.5060 | Test F1: 0.3600
Epoch: 05 | Train Loss: 0.6884 | Test Loss: 0.6961 | Test Acc: 0.4940 | Test F1: 0.3406
Epoch: 06 | Train Loss: 0.6855 | Test Loss: 0.7011 | Test Acc: 0.5080 | Test F1: 0.3594
Epoch: 07 | Train Loss: 0.6849 | Test Loss: 0.7033 | Test Acc: 0.4920 | Test F1: 0.3380
Epoch: 08 | Train Loss: 0.6810 | Test Loss: 0.7018 | Test Acc: 0.5040 | Test F1: 0.3574
Epoch: 09 | Train Loss: 0.6818 | Test Loss: 0.6996 | Test Acc: 0.4940 | Test F1: 0.3390
Epoch: 10 | Train Loss: 0.6808 | Test Loss: 0.7060 | Test Acc: 0.5030 | Test F1: 0.3504
Epoch: 11 | Train Loss: 0.6763 | Test Loss: 0.7076 | Test Acc: 0.5050 | Test F1: 0.3513
Epoch: 12 | Train Loss: 0.6787 | Test Loss: 0.7126 | Test Acc: 0.4930 | Test F1: 0.3368
Epoch: 13 | Train Loss: 0.6795 | Test Loss: 0.7064 | Test Acc: 0.5050 | Test F1: 0.3563
Epoch: 14 | Train Loss: 0.6765 | Test Loss: 0.7090 | Test Acc: 0.5050 | Test F1: 0.3513
Epoch: 15 | Train Loss: 0.6827 | Test Loss: 0.7009 | Test Acc: 0.4900 | Test F1: 0.4878
Epoch: 16 | Train Loss: 0.6823 | Test Loss: 0.6971 | Test Acc: 0.5090 | Test F1: 0.3598
Epoch: 17 | Train Loss: 0.6793 | Test Loss: 0.6978 | Test Acc: 0.5110 | Test F1: 0.3641
Epoch: 18 | Train Loss: 0.6974 | Test Loss: 0.7095 | Test Acc: 0.5080 | Test F1: 0.3610
Epoch: 19 | Train Loss: 0.6109 | Test Loss: 0.4925 | Test Acc: 0.7730 | Test F1: 0.7729
Epoch: 20 | Train Loss: 0.3890 | Test Loss: 0.3906 | Test Acc: 0.8280 | Test F1: 0.8269
Epoch: 21 | Train Loss: 0.2196 | Test Loss: 0.4487 | Test Acc: 0.8180 | Test F1: 0.8153
Epoch: 22 | Train Loss: 0.1236 | Test Loss: 0.3972 | Test Acc: 0.8570 | Test F1: 0.8568
Epoch: 23 | Train Loss: 0.0747 | Test Loss: 0.6650 | Test Acc: 0.8190 | Test F1: 0.8157
Epoch: 24 | Train Loss: 0.0439 | Test Loss: 0.5732 | Test Acc: 0.8510 | Test F1: 0.8506
Epoch: 25 | Train Loss: 0.0392 | Test Loss: 0.5263 | Test Acc: 0.8740 | Test F1: 0.8739
Loading best model weights for GRU...


FINAL PERFORMANCE (GRU):
Accuracy: 0.8740, Weighted F1-Score: 0.8739

Classification Report:
              precision    recall  f1-score   support

    negative       0.89      0.85      0.87       497
    positive       0.86      0.90      0.88       503

    accuracy                           0.87      1000
   macro avg       0.87      0.87      0.87      1000
weighted avg       0.87      0.87      0.87      1000



### 4.2. Classifier 3: Bi-directional GRU

In [15]:
# %%
# --- Bi-directional GRU Model Hyperparameters ---
# Same as the unidirectional GRU, except 'bidirectional' is True and NUM_EPOCHS is 15 instead of 25.
VOCAB_SIZE = len(token_to_idx)
EMBEDDING_DIM = 128
HIDDEN_DIM = 256
OUTPUT_DIM = 2 # TWO CLASSES: Negative (0) and Positive (1)
NUM_LAYERS = 2
DROPOUT = 0.15
NUM_EPOCHS = 15
LEARNING_RATE = 0.001
GRAD_THRESH = 1.0

# 1. Instantiate the Bi-directional GRU Model
bi_gru_model = TextClassifierRNN(
    vocab_size=VOCAB_SIZE,
    embedding_dim=EMBEDDING_DIM,
    hidden_dim=HIDDEN_DIM,
    output_dim=OUTPUT_DIM,
    num_layers=NUM_LAYERS,
    dropout=DROPOUT,
    bidirectional=True # BI-DIRECTIONAL
).to(DEVICE)

total_params_bi_gru = sum(p.numel() for p in bi_gru_model.parameters() if p.requires_grad)
print(f"Bi-directional GRU Model Parameters: {total_params_bi_gru:,}")

# 2. Run the Experiment
bi_gru_results = run_experiment(
    bi_gru_model,
    "BiGRU",
    train_loader,
    test_loader,
    NUM_EPOCHS,
    LEARNING_RATE,
    GRAD_THRESH
)
results['Bi-GRU'] = bi_gru_results
# %%


Bi-directional GRU Model Parameters: 3,056,642

--- Running Experiment: BiGRU ---


Epoch: 01 | Train Loss: 0.6739 | Test Loss: 0.6015 | Test Acc: 0.6900 | Test F1: 0.6892
Epoch: 02 | Train Loss: 0.6080 | Test Loss: 1.1008 | Test Acc: 0.6660 | Test F1: 0.6399
Epoch: 03 | Train Loss: 0.4239 | Test Loss: 0.3564 | Test Acc: 0.8400 | Test F1: 0.8389
Epoch: 04 | Train Loss: 0.2761 | Test Loss: 0.3208 | Test Acc: 0.8680 | Test F1: 0.8678
Epoch: 05 | Train Loss: 0.1824 | Test Loss: 0.3316 | Test Acc: 0.8760 | Test F1: 0.8759
Epoch: 06 | Train Loss: 0.1249 | Test Loss: 0.3687 | Test Acc: 0.8720 | Test F1: 0.8718
Epoch: 07 | Train Loss: 0.0740 | Test Loss: 0.4083 | Test Acc: 0.8830 | Test F1: 0.8828
Epoch: 08 | Train Loss: 0.0660 | Test Loss: 0.4377 | Test Acc: 0.8690 | Test F1: 0.8687
Epoch: 09 | Train Loss: 0.0323 | Test Loss: 0.4745 | Test Acc: 0.8710 | Test F1: 0.8710
Epoch: 10 | Train Loss: 0.0399 | Test Loss: 0.6130 | Test Acc: 0.8660 | Test F1: 0.8649
Epoch: 11 | Train Loss: 0.0247 | Test Loss: 0.5925 | Test Acc: 0.8760 | Test F1: 0.8760
Epoch: 12 | Train Loss: 0.0184 | Test Loss: 0.6284 | Test Acc: 0.8680 | Test F1: 0.8680
Epoch: 13 | Train Loss: 0.0149 | Test Loss: 0.6866 | Test Acc: 0.8640 | Test F1: 0.8639
Epoch: 14 | Train Loss: 0.0207 | Test Loss: 0.6842 | Test Acc: 0.8740 | Test F1: 0.8740
Epoch: 15 | Train Loss: 0.0225 | Test Loss: 0.6108 | Test Acc: 0.8710 | Test F1: 0.8710
Loading best model weights for BiGRU...


FINAL PERFORMANCE (BiGRU):
Accuracy: 0.8830, Weighted F1-Score: 0.8828

Classification Report:
              precision    recall  f1-score   support

    negative       0.91      0.84      0.88       497
    positive       0.86      0.92      0.89       503

    accuracy                           0.88      1000
   macro avg       0.89      0.88      0.88      1000
weighted avg       0.89      0.88      0.88      1000



In [16]:
# Print the Final Results
print("\nFINAL RESULTS:")
for model_name, metrics in results.items():
    print(f"{model_name}:")
    for metric_name, metric_value in metrics.items():
        print(f"  {metric_name}: {metric_value:.4f}")


FINAL RESULTS:
GRU:
  Accuracy: 0.8740
  F1-Score: 0.8739
Bi-GRU:
  Accuracy: 0.8830
  F1-Score: 0.8828
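
When comparing the two models, it helps to keep in mind that the Bi-GRU has roughly 1.5 times as many trainable parameters as the unidirectional GRU. Both reported counts can be reproduced by hand from the hyperparameters. The sketch below is a plain-Python calculation; `gru_classifier_params` is a hypothetical helper, and it assumes the layer layout of `TextClassifierRNN` (embedding, stacked GRU with two bias vectors per gate, one linear output layer).

```python
def gru_classifier_params(vocab_size, emb_dim, hidden_dim, output_dim,
                          num_layers, bidirectional):
    """Count trainable parameters of an embedding + GRU + linear classifier."""
    directions = 2 if bidirectional else 1
    total = vocab_size * emb_dim  # embedding table

    for layer in range(num_layers):
        # The first layer reads embeddings; deeper layers read the previous layer's output.
        in_dim = emb_dim if layer == 0 else hidden_dim * directions
        # 3 gates, each with input weights, hidden weights, and two bias vectors.
        per_direction = 3 * hidden_dim * (in_dim + hidden_dim) + 2 * 3 * hidden_dim
        total += directions * per_direction

    # Linear layer on the (concatenated) final hidden state.
    total += hidden_dim * directions * output_dim + output_dim
    return total

gru_count = gru_classifier_params(10000, 128, 256, 2, 2, bidirectional=False)
bi_gru_count = gru_classifier_params(10000, 128, 256, 2, 2, bidirectional=True)
print(f"{gru_count:,}")     # 1,971,714
print(f"{bi_gru_count:,}")  # 3,056,642
```

Since the 1,280,000 embedding parameters are shared by both configurations, the difference comes entirely from the recurrent and output layers.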
