**Introduction**

This project addresses the challenge of toxic comment classification by implementing and comparing three deep learning models: LSTM, GRU, and BERT. The goal is to identify and categorize toxic comments based on six categories: toxic, severe_toxic, obscene, threat, insult, and identity_hate.

**Dataset**

The project uses the Kaggle "Toxic Comment Classification Challenge" dataset (Jigsaw toxic data), a large collection of Wikipedia comments labeled with a binary value for each toxicity category. The dataset is provided in two files, `train.csv` and `test.csv`. `train.csv` contains the comments and their labels and is used for training and validating the models. `test.csv` contains comments without labels; the trained models are run on it to produce predictions for unseen data.

**Models**

Three distinct models are employed for the classification task:

**LSTM:** The LSTM model leverages its ability to capture long-range dependencies in text. Its architecture comprises a bidirectional LSTM with multiple layers, designed for handling sequential data with complex relationships. The training process uses AdamW optimization with learning rate scheduling and monitors both loss and Macro-F1 score for evaluation. Early stopping is implemented to prevent overfitting, while model checkpoints and learning curves aid in performance analysis.

**GRU**: The GRU model tackles toxic comment classification by leveraging its efficient Gated Recurrent Unit architecture. It captures contextual information effectively through bidirectional processing and employs techniques like layer normalization and attention mechanisms. AdamW optimization with learning rate scheduling guides the training, with a focus on minimizing loss and maximizing Macro-F1 score. Early stopping helps to avoid overfitting, and model checkpoints and learning curves provide insights into performance.

**BERT**: The BERT model starts from pretrained weights (the code loads the `distilbert-base-uncased` checkpoint). The base model layers are frozen and only the classification head is trained, using AdamW optimization with learning rate scheduling. Performance is tracked with training/validation loss and Macro-F1 score, with early stopping to prevent overfitting. Detailed error analysis is conducted by saving false positive/negative cases for review. Model checkpoints, learning curves, and error analysis summaries are generated for comprehensive evaluation.

**Training and Evaluation** (/learning_curves, /checkpoints)

Each model is trained on the training data and evaluated on a validation set using metrics such as loss and Macro-F1 score. Early stopping is implemented to prevent overfitting. Model checkpoints are saved to preserve the best-performing models. Learning curves are generated to visualize the training progress and saved. Models are tested on test data and predictions are saved.

**Error Analysis** (/error_analysis)

Detailed error analysis is performed by saving false-positive and false-negative cases for review. This analysis helps to understand each model's weaknesses and potential biases.

**Annotations**

Ten manually annotated comments are tested. These annotations are not meant to provide a comprehensive evaluation.

**Results and Comparison**

The three models are compared primarily on macro-averaged F1. Because the dataset is imbalanced, macro averaging is preferred: it computes F1 per class and averages the results without weighting by class frequency, so every class counts equally regardless of its proportion. A comparison plot is generated to visualize the results. The evaluation implements cross-validation with detailed metrics tracking, and the system generates comparative visualizations and statistical analyses of the performance differences between the models.
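
To make the effect of macro averaging concrete, the following minimal sketch computes per-label F1 and then the macro and micro averages from raw counts. The helper names and the counts are made up for illustration and are not taken from the project code.

```python
def f1_from_counts(tp, fp, fn):
    """F1 for one label from true positives, false positives, false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_and_micro_f1(counts):
    """counts maps a label name to a (tp, fp, fn) tuple."""
    per_label = {name: f1_from_counts(*c) for name, c in counts.items()}
    # Macro: plain mean of the per-label scores, ignoring label frequency.
    macro = sum(per_label.values()) / len(per_label)
    # Micro: pool the counts over all labels, then compute a single F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1_from_counts(tp, fp, fn)
    return per_label, macro, micro


# Hypothetical counts: a frequent label handled well, a rare label missed entirely.
example = {"toxic": (90, 10, 10), "threat": (0, 0, 5)}
per_label, macro, micro = macro_and_micro_f1(example)
print(per_label, round(macro, 3), round(micro, 3))
```

In this made-up case the micro average stays high because the frequent label dominates the pooled counts, while the macro average drops to half of the frequent label's score because the missed rare label counts equally.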

**Steps done:**

-> Tokenization using the transformers library, including adding special tokens, padding/truncation, and creating attention masks ✅

1- The raw text is split into individual words or subwords (tokens) using the BERT vocabulary.

2- Special tokens are added.

3- The tokens are converted to numerical IDs.

4- The sequences are padded or truncated to a fixed length.

5- An attention mask is created to identify the valid tokens.
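
The five steps above can be sketched without any library, as a rough illustration only. The vocabulary and its IDs are invented for this example, and a whitespace split stands in for the real WordPiece tokenizer. Truncation is applied before the special tokens are added so that the closing `[SEP]` is never cut off.

```python
# Toy vocabulary with made-up IDs; the real BERT vocabulary is far larger.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
         "you": 4, "are": 5, "great": 6}


def encode(text, max_len=8):
    # Step 1: split into tokens (whitespace split stands in for WordPiece).
    tokens = text.lower().split()
    # Step 4 (truncation part): leave room for the two special tokens.
    tokens = tokens[:max_len - 2]
    # Step 2: add special tokens.
    tokens = ["[CLS]"] + tokens + ["[SEP]"]
    # Step 3: convert tokens to IDs, mapping unknown words to [UNK].
    input_ids = [VOCAB.get(tok, VOCAB["[UNK]"]) for tok in tokens]
    # Step 5: real tokens get mask value 1, padding gets 0.
    attention_mask = [1] * len(input_ids)
    # Step 4 (padding part): pad both lists to the fixed length.
    pad = max_len - len(input_ids)
    input_ids += [VOCAB["[PAD]"]] * pad
    attention_mask += [0] * pad
    return {"input_ids": input_ids, "attention_mask": attention_mask}


enc = encode("You are great")
print(enc)
```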


---

-> Train/validation/test split and evaluation ✅

Models are trained on train set

Evaluation on validation set during training

Final testing on test set

Cross-validation ensures robust evaluation
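
The training functions in this notebook take the first 80% of the rows for training and the last 20% for validation, without shuffling. A shuffled split with a fixed seed is a common alternative. The sketch below shows one way to do it; `split_indices` is a hypothetical helper, not part of the project code.

```python
import random


def split_indices(n_rows, val_fraction=0.2, seed=42):
    """Return (train_indices, val_indices) after a reproducible shuffle."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split repeatable
    n_val = int(n_rows * val_fraction)
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_indices(10)
print(len(train_idx), len(val_idx))
```

The two index lists could then be passed to `.iloc[...]` on the comment and label frames in the same way the sequential indices are used.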

---


-> Curves and metrics ✅

Learning curves are plotted for both loss and F1 score

Training and validation metrics are tracked

Classification reports with precision, recall, F1

Plots saved as PNG files

---

-> Overfitting control ✅

Dropout implemented (LSTM and GRU)

Early stopping with patience=5 (with epochs=5, as in the runs here, this threshold cannot end training before the final epoch)

Weight initialization strategies

Gradient clipping
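
As a rough illustration of the patience rule, the sketch below tracks the best validation score and signals a stop once a given number of consecutive epochs bring no improvement. The `EarlyStopping` class and the score history are assumptions made for this example; the notebook itself keeps a plain counter inside the training loop.

```python
class EarlyStopping:
    """Stop when the monitored score has not improved for `patience` epochs in a row."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, score):
        """Record one epoch's score; return True if training should stop."""
        if score > self.best:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation F1 values; patience=2 keeps the example short.
stopper = EarlyStopping(patience=2)
history = [0.20, 0.29, 0.28, 0.27, 0.35]
stopped_at = next((i + 1 for i, s in enumerate(history) if stopper.step(s)), None)
print(stopped_at)
```

With these values the stop is signalled at epoch 4, so the later improvement at epoch 5 is never seen. This is the usual trade-off when choosing a small patience.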

---

-> Hyperparameter experimentation ✅

AdamW optimizer (learning rate set to 2e-5 in the training functions)

Learning rate scheduling with ReduceLROnPlateau

Early stopping criteria
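
The idea behind reduce-on-plateau scheduling can be sketched in a few lines. This is a simplified stand-in written for illustration, not PyTorch's implementation: it ignores the improvement threshold and cooldown options of `ReduceLROnPlateau`. The class name `PlateauLR` and the loss values are hypothetical. The factor of 0.1 matches the PyTorch default, and patience=2 matches the value used in the notebook.

```python
class PlateauLR:
    """Multiply the learning rate by `factor` after more than `patience` epochs without a lower loss."""

    def __init__(self, lr, factor=0.1, patience=2):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs > self.patience:
            self.lr *= self.factor
            self.bad_epochs = 0
        return self.lr


# Made-up validation losses: two improvements, then three epochs without one.
sched = PlateauLR(lr=2e-5, patience=2)
lrs = [sched.step(loss) for loss in [0.10, 0.09, 0.09, 0.10, 0.11, 0.08]]
print(lrs)
```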

---

-> Transfer learning/pretraining ✅

BERT model uses pretrained weights

Fine-tuning approach for BERT

Custom embedding layer for LSTM/GRU

---

-> Checkpointing ✅

Checkpoint saving for models

Early stopping implemented

---

-> Error analysis ✅

Detailed error analysis is done in the `analyze_errors` function: false positives and false negatives are tracked, and error analysis reports are saved for each model.
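
As a hedged sketch of what tracking false positives and false negatives per label can look like, the snippet below groups misclassified texts by label. `collect_errors` is a hypothetical helper written for this example and is not the notebook's `analyze_errors` function; the texts and labels are invented.

```python
def collect_errors(texts, y_true, y_pred, label_names):
    """Group misclassified texts by label into false positives and false negatives."""
    errors = {name: {"false_positives": [], "false_negatives": []}
              for name in label_names}
    for text, true_row, pred_row in zip(texts, y_true, y_pred):
        for name, t, p in zip(label_names, true_row, pred_row):
            if p == 1 and t == 0:
                errors[name]["false_positives"].append(text)
            elif p == 0 and t == 1:
                errors[name]["false_negatives"].append(text)
    return errors


# Made-up multi-label data: one row per comment, one column per label.
labels = ["toxic", "insult"]
texts = ["comment a", "comment b", "comment c"]
y_true = [[1, 0], [0, 0], [1, 1]]
y_pred = [[1, 1], [0, 0], [0, 1]]
errors = collect_errors(texts, y_true, y_pred, labels)
print(errors)
```

A structure like this can be written to disk per model, for example as JSON or CSV, so that the misclassified comments can be reviewed by hand.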

In [2]:
import os

from google.colab import drive

# Mount Google Drive
drive.mount('/content/drive')

# Create base directory for model outputs
base_output_dir = '/content/drive/My Drive/project_toxic_classifier'
os.makedirs(base_output_dir, exist_ok=True)

Mounted at /content/drive


In [3]:
"""
Toxic Comment Classification

This module implements toxic comment classification using three different models:

- LSTM: Bidirectional with 2 layers, 256 hidden units
- GRU:  256 hidden units with a self-attention layer
- BERT: Pre-trained DistilBERT (distilbert-base-uncased) with classification head

"""

import os
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import KFold
from sklearn.metrics import f1_score, classification_report
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    get_linear_schedule_with_warmup
)
from tqdm import tqdm
import matplotlib.pyplot as plt
from collections import defaultdict
import torch.nn.functional as F

class ToxicDataset(Dataset):
    """Custom Dataset for toxic comment classification.

    Attributes:
        texts (pd.Series): Input text data
        labels (pd.DataFrame): Binary labels for toxic categories
        tokenizer: BERT tokenizer
        max_len (int): Maximum sequence length
    """
    def __init__(self, texts, labels, tokenizer, max_len=128):
        self.texts = texts.reset_index(drop=True)
        self.labels = labels.reset_index(drop=True)
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        # Get text and convert to string
        text = str(self.texts.iloc[idx])
        # Convert labels to tensor
        label = torch.tensor(self.labels.iloc[idx].values, dtype=torch.float)

        # Tokenize text with padding and truncation
        # (calling the tokenizer directly is the recommended replacement for encode_plus)
        encoding = self.tokenizer(
            text,
            add_special_tokens=True,  # Add [CLS] and [SEP]
            max_length=self.max_len,
            padding='max_length',
            truncation=True,
            return_tensors='pt'
        )

        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'labels': label,
            'text': text
        }

class LSTMClassifier(nn.Module):
    """LSTM model for toxic comment classification.

    Architecture:
        1. Embedding layer
        2. Bidirectional LSTM
        3. Fully connected output layer
    """
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, n_layers, bidirectional, dropout):
        super().__init__()

        # Embedding layer
        self.embedding = nn.Embedding(vocab_size, embedding_dim)

        # LSTM layer
        self.lstm = nn.LSTM(
            embedding_dim,
            hidden_dim,
            num_layers=n_layers,
            bidirectional=bidirectional,
            dropout=dropout if n_layers > 1 else 0,
            batch_first=True
        )

        # Initialize LSTM weights
        for name, param in self.lstm.named_parameters():
            if 'weight' in name:
                # Xavier initialization for weights
                nn.init.xavier_uniform_(param)
            elif 'bias' in name:
                # Zero initialization for biases
                nn.init.zeros_(param)

        # Output layer
        self.fc = nn.Linear(hidden_dim * 2 if bidirectional else hidden_dim, output_dim)
        nn.init.xavier_uniform_(self.fc.weight)
        nn.init.zeros_(self.fc.bias)

        # Dropout layer
        self.dropout = nn.Dropout(dropout)

    def forward(self, input_ids, attention_mask):
        # Apply embedding and dropout
        embedded = self.dropout(self.embedding(input_ids))

        # Pass through LSTM (the input is not packed, so padded positions are processed too)
        output, (hidden, cell) = self.lstm(embedded)

        # Handle bidirectional output
        if self.lstm.bidirectional:
            # Concatenate forward and backward hidden states
            hidden = torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim=1)
        else:
            hidden = hidden[-1,:,:]

        # Apply output layer with sigmoid activation
        return torch.sigmoid(self.fc(self.dropout(hidden)))

class GRUClassifier(nn.Module):
    """GRU model for toxic comment classification.

    1. GRU cells instead of vanilla RNN for better gradient flow
    2. Residual connections between layers
    3. Layer normalization
    4. Improved dropout strategy
    5. Attention mechanism
    6. Better weight initialization
    """
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, n_layers, dropout):
        super().__init__()

        # Embedding layer with improved initialization
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        nn.init.xavier_uniform_(self.embedding.weight)

        # GRU instead of RNN for better gradient flow
        self.gru = nn.GRU(
            embedding_dim,
            hidden_dim,
            num_layers=n_layers,
            dropout=dropout if n_layers > 1 else 0,
            batch_first=True,
            bidirectional=True  # Enable bidirectional for better context capture
        )

        # Layer normalization for stable training
        self.layer_norm = nn.LayerNorm(hidden_dim * 2)

        # Attention mechanism
        self.attention = nn.Sequential(
            nn.Linear(hidden_dim * 2, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
            nn.Softmax(dim=1)
        )

        # Output layers with residual connection
        self.fc1 = nn.Linear(hidden_dim * 2, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)

        # Improved dropout with different rates
        self.dropout_embed = nn.Dropout(dropout)
        self.dropout_hidden = nn.Dropout(dropout * 0.5)

        # Initialize GRU weights
        for name, param in self.gru.named_parameters():
            if 'weight' in name:
                nn.init.orthogonal_(param)  # Orthogonal initialization for better gradient flow
            elif 'bias' in name:
                nn.init.zeros_(param)

        # Initialize attention and FC layers
        for layer in [self.fc1, self.fc2] + list(self.attention.children()):
            if isinstance(layer, nn.Linear):
                nn.init.xavier_uniform_(layer.weight)
                nn.init.zeros_(layer.bias)

    def forward(self, input_ids, attention_mask):
        # Apply embedding and dropout
        embedded = self.dropout_embed(self.embedding(input_ids))

        # Pass through GRU
        output, hidden = self.gru(embedded)

        # Apply attention
        attention_weights = self.attention(output)
        context = torch.sum(attention_weights * output, dim=1)

        # Apply layer normalization
        context = self.layer_norm(context)

        # First dense layer with residual connection
        dense1 = self.fc1(context)
        dense1 = F.relu(dense1)
        dense1 = self.dropout_hidden(dense1)

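        # Residual connection, applied only when shapes match. Here fc1 maps
        # hidden_dim * 2 -> hidden_dim, so the shapes differ and the branch is skipped.
        if dense1.shape == context.shape:
            dense1 = dense1 + context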

        # Output layer
        output = self.fc2(dense1)
        return torch.sigmoid(output)


def train_epoch(model, train_loader, criterion, optimizer, device, epoch, total_epochs):
    """Train model for one epoch.

    Args:
        model: Neural network model
        train_loader: DataLoader for training data
        criterion: Loss function
        optimizer: Optimization algorithm
        device: Device to run on (cuda/cpu)
        epoch: Current epoch number
        total_epochs: Total number of epochs

    Returns:
        tuple: (average loss, macro F1 score)
    """
    model.train()
    total_loss = 0
    all_predictions = []
    all_labels = []

    # Progress bar for training
    progress_bar = tqdm(train_loader, desc=f'Epoch {epoch}/{total_epochs} [Train]')

    for batch in progress_bar:
        # Move batch to device
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)

        # Forward pass
        optimizer.zero_grad()
        outputs = model(input_ids, attention_mask)
        loss = criterion(outputs, labels)

        # Backward pass
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()

        # Track metrics
        total_loss += loss.item()
        predictions = (outputs.detach().cpu().numpy() > 0.5).astype(int)
        all_predictions.extend(predictions)
        all_labels.extend(labels.cpu().numpy())

        # Update progress bar
        progress_bar.set_postfix({'loss': f'{loss.item():.4f}'})

    # Calculate average metrics
    avg_loss = total_loss / len(train_loader)
    macro_f1 = f1_score(np.array(all_labels), np.array(all_predictions), average='macro')

    return avg_loss, macro_f1

def evaluate(model, val_loader, criterion, device):
    """Evaluate model on validation data.

    Args:
        model: Neural network model
        val_loader: DataLoader for validation data
        criterion: Loss function
        device: Device to run on (cuda/cpu)

    Returns:
        tuple: (average loss, macro F1 score, predictions, true labels)
    """
    model.eval()
    total_loss = 0
    all_predictions = []
    all_labels = []

    with torch.no_grad():
        for batch in tqdm(val_loader, desc='Validation'):
            # Move batch to device
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['labels'].to(device)

            # Forward pass
            outputs = model(input_ids, attention_mask)
            loss = criterion(outputs, labels)

            # Track metrics
            total_loss += loss.item()
            predictions = (outputs.cpu().numpy() > 0.5).astype(int)
            all_predictions.extend(predictions)
            all_labels.extend(labels.cpu().numpy())

    # Calculate average metrics
    avg_loss = total_loss / len(val_loader)
    macro_f1 = f1_score(np.array(all_labels), np.array(all_predictions), average='macro')

    return avg_loss, macro_f1, all_predictions, all_labels

def plot_learning_curves(train_losses, val_losses, train_f1s, val_f1s, model_name):
    """Plot training and validation learning curves.

    Args:
        train_losses: List of training losses
        val_losses: List of validation losses
        train_f1s: List of training F1 scores
        val_f1s: List of validation F1 scores
        model_name: Name of the model for plot title
    """
    plt.figure(figsize=(12, 5))

    # Plot loss curves
    plt.subplot(1, 2, 1)
    plt.plot(train_losses, label='Train Loss')
    plt.plot(val_losses, label='Validation Loss')
    plt.title(f'{model_name} Learning Curves - Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()

    # Plot F1 score curves
    plt.subplot(1, 2, 2)
    plt.plot(train_f1s, label='Train F1')
    plt.plot(val_f1s, label='Validation F1')
    plt.title(f'{model_name} Learning Curves - F1 Score')
    plt.xlabel('Epoch')
    plt.ylabel('Macro F1')
    plt.legend()

    plt.tight_layout()
    plt.savefig(f'{model_name.lower()}_learning_curves.png')
    plt.close()


def train_model(model_class, model_params, train_data, labels, tokenizer,
                   epochs=5, batch_size=32, device='cuda'):
    """Train model and save to output directory."""

    # Create model-specific directory
    model_dir = os.path.join(base_output_dir, model_class.__name__)
    os.makedirs(model_dir, exist_ok=True)
    checkpoints_dir = os.path.join(model_dir, 'checkpoints')
    os.makedirs(checkpoints_dir, exist_ok=True)
    plots_dir = os.path.join(model_dir, 'learning_curves')
    os.makedirs(plots_dir, exist_ok=True)

    # Split data 80/20 into train/validation (sequential, no shuffling)
    train_size = int(0.8 * len(train_data))
    train_idx = list(range(train_size))
    val_idx = list(range(train_size, len(train_data)))

    # Prepare data
    X_train = train_data.iloc[train_idx]
    y_train = labels.iloc[train_idx]
    X_val = train_data.iloc[val_idx]
    y_val = labels.iloc[val_idx]

    # Create data loaders
    train_dataset = ToxicDataset(X_train, y_train, tokenizer)
    val_dataset = ToxicDataset(X_val, y_val, tokenizer)

    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=batch_size)

    # Initialize model and training components
    model = model_class(**model_params).to(device)
    criterion = nn.BCELoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', patience=2)

    # Training tracking variables
    best_val_f1 = 0
    patience = 5
    patience_counter = 0
    train_losses = []
    val_losses = []
    train_f1s = []
    val_f1s = []

    # Training loop
    for epoch in range(epochs):
        # Train for one epoch
        train_loss, train_f1 = train_epoch(model, train_loader, criterion, optimizer,
                                       device, epoch + 1, epochs)
        train_losses.append(train_loss)
        train_f1s.append(train_f1)

        # Evaluate on validation set
        val_loss, val_f1, predictions, true_labels = evaluate(model, val_loader,
                                                          criterion, device)
        val_losses.append(val_loss)
        val_f1s.append(val_f1)

        # Learning rate scheduling
        scheduler.step(val_loss)

        # Print epoch results
        print(f'\nEpoch {epoch + 1}/{epochs}:')
        print(f'Train Loss: {train_loss:.4f}, Train Macro-F1: {train_f1:.4f}')
        print(f'Val Loss: {val_loss:.4f}, Val Macro-F1: {val_f1:.4f}')

        # Check for improvement
        if val_f1 > best_val_f1:
            best_val_f1 = val_f1
            # Save best model
            checkpoint_path = os.path.join(
                checkpoints_dir,
                f'{model_class.__name__.lower()}.pt'
            )
            torch.save({
                'epoch': epoch,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                'val_f1': val_f1,
            }, checkpoint_path)
            patience_counter = 0

            # Print detailed metrics
            print('\nClassification Report:')
            print(classification_report(true_labels, predictions,
                                   target_names=toxic_columns))
        else:
            patience_counter += 1

        # Early stopping check
        if patience_counter >= patience:
            print(f'\nEarly stopping triggered after {epoch + 1} epochs')
            break

    # Plot learning curves
    plot_path = os.path.join(
        plots_dir,
        f"{model_class.__name__}_learning_curves.png"
    )
    plt.figure(figsize=(12, 5))

    # Plot loss curves
    plt.subplot(1, 2, 1)
    plt.plot(train_losses, label='Train Loss')
    plt.plot(val_losses, label='Validation Loss')
    plt.title(f'{model_class.__name__} Learning Curves - Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()

    # Plot F1 score curves
    plt.subplot(1, 2, 2)
    plt.plot(train_f1s, label='Train F1')
    plt.plot(val_f1s, label='Validation F1')
    plt.title(f'{model_class.__name__} Learning Curves - F1 Score')
    plt.xlabel('Epoch')
    plt.ylabel('Macro F1')
    plt.legend()

    plt.tight_layout()
    plt.savefig(plot_path)
    plt.close()

    return best_val_f1

In [6]:
def train_bert(train_texts, train_labels):
    """Finetune BERT model and save to output directory."""

    # Create BERT-specific directory
    bert_dir = os.path.join(base_output_dir, 'BERT')
    os.makedirs(bert_dir, exist_ok=True)
    checkpoints_dir = os.path.join(bert_dir, 'checkpoints')
    os.makedirs(checkpoints_dir, exist_ok=True)
    plots_dir = os.path.join(bert_dir, 'learning_curves')
    os.makedirs(plots_dir, exist_ok=True)
    error_analysis_dir = os.path.join(bert_dir, 'error_analysis')
    os.makedirs(error_analysis_dir, exist_ok=True)

    # Split data 80/20 into train/validation (sequential, no shuffling)
    train_size = int(0.8 * len(train_texts))
    train_idx = list(range(train_size))
    val_idx = list(range(train_size, len(train_texts)))

    # Prepare data
    X_train = train_texts.iloc[train_idx]
    y_train = train_labels.iloc[train_idx]
    X_val = train_texts.iloc[val_idx]
    y_val = train_labels.iloc[val_idx]

    # Create data loaders
    train_dataset = ToxicDataset(X_train, y_train, tokenizer)
    val_dataset = ToxicDataset(X_val, y_val, tokenizer)

    train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=16)

    # Initialize the pretrained model (DistilBERT checkpoint) with a multi-label classification head
    model = AutoModelForSequenceClassification.from_pretrained(
        'distilbert-base-uncased',
        num_labels=len(toxic_columns),
        problem_type="multi_label_classification"
    )

    # Freeze base model layers and only train classification head
    for param in model.base_model.parameters():
        param.requires_grad = False

    for param in model.pre_classifier.parameters():
        param.requires_grad = True
    for param in model.classifier.parameters():
        param.requires_grad = True

    model.to(device)

    # Initialize training components
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', patience=2)
    criterion = nn.BCEWithLogitsLoss()

    # Training tracking variables
    best_val_f1 = 0
    patience = 5
    patience_counter = 0
    num_epochs = 5
    train_losses = []
    val_losses = []
    train_f1s = []
    val_f1s = []

    # Training loop
    for epoch in range(num_epochs):
        # Training phase
        model.train()
        total_loss = 0
        all_train_predictions = []
        all_train_labels = []

        for batch in tqdm(train_loader, desc=f'Epoch {epoch + 1}/{num_epochs} [Train]'):
            # Get batch data
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['labels'].to(device)

            # Forward pass
            optimizer.zero_grad()
            outputs = model(input_ids, attention_mask=attention_mask).logits
            loss = criterion(outputs, labels)

            # Backward pass
            loss.backward()
            optimizer.step()

            # Track metrics
            total_loss += loss.item()
            predictions = (torch.sigmoid(outputs.detach().cpu()) > 0.5).float()
            all_train_predictions.extend(predictions.numpy())
            all_train_labels.extend(labels.cpu().numpy())

        # Calculate training metrics
        avg_train_loss = total_loss / len(train_loader)
        train_f1 = f1_score(np.array(all_train_labels), np.array(all_train_predictions), average='macro')
        train_losses.append(avg_train_loss)
        train_f1s.append(train_f1)

        # Validation phase
        model.eval()
        total_val_loss = 0
        all_val_predictions = []
        all_val_labels = []

        with torch.no_grad():
            for batch in tqdm(val_loader, desc='Validation'):
                # Get batch data
                input_ids = batch['input_ids'].to(device)
                attention_mask = batch['attention_mask'].to(device)
                labels = batch['labels'].to(device)

                # Forward pass
                outputs = model(input_ids, attention_mask=attention_mask).logits
                loss = criterion(outputs, labels)
                total_val_loss += loss.item()

                # Track metrics
                predictions = (torch.sigmoid(outputs.cpu()) > 0.5).float()
                all_val_predictions.extend(predictions.numpy())
                all_val_labels.extend(labels.cpu().numpy())

        # Calculate validation metrics
        avg_val_loss = total_val_loss / len(val_loader)
        val_f1 = f1_score(np.array(all_val_labels), np.array(all_val_predictions), average='macro')
        val_losses.append(avg_val_loss)
        val_f1s.append(val_f1)

        # Print epoch results
        print(f'\nEpoch {epoch + 1}/{num_epochs}:')
        print(f'Train Loss: {avg_train_loss:.4f}, Train Macro-F1: {train_f1:.4f}')
        print(f'Val Loss: {avg_val_loss:.4f}, Val Macro-F1: {val_f1:.4f}')

        # Learning rate scheduling
        scheduler.step(avg_val_loss)

        # Check for improvement
        if val_f1 > best_val_f1:
            best_val_f1 = val_f1
            # Save best model
            checkpoint_path = os.path.join(
                checkpoints_dir,
                'bert_model.pt'
            )
            torch.save({
                'epoch': epoch,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                'val_f1': val_f1,
            }, checkpoint_path)
            patience_counter = 0

            # Print detailed metrics
            print('\nClassification Report:')
            print(classification_report(np.array(all_val_labels),
                                   np.array(all_val_predictions),
                                   target_names=toxic_columns))

            # Perform error analysis
            analyze_errors(model, val_loader, toxic_columns, device, model_type='bert')
        else:
            patience_counter += 1

        # Early stopping check
        if patience_counter >= patience:
            print(f'\nEarly stopping triggered after {epoch + 1} epochs')
            break

    # Plot learning curves
    plot_path = os.path.join(
        plots_dir,
        "BERT_learning_curves.png"
    )
    plt.figure(figsize=(12, 5))

    # Plot loss curves
    plt.subplot(1, 2, 1)
    plt.plot(train_losses, label='Train Loss')
    plt.plot(val_losses, label='Validation Loss')
    plt.title('BERT Learning Curves - Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()

    # Plot F1 score curves
    plt.subplot(1, 2, 2)
    plt.plot(train_f1s, label='Train F1')
    plt.plot(val_f1s, label='Validation F1')
    plt.title('BERT Learning Curves - F1 Score')
    plt.xlabel('Epoch')
    plt.ylabel('Macro F1')
    plt.legend()

    plt.tight_layout()
    plt.savefig(plot_path)
    plt.close()

    return best_val_f1



In [7]:
if __name__ == "__main__":
    # Load data
    print("Loading data...")
    train = pd.read_csv("train.csv")
    test = pd.read_csv("test.csv")

    print("Training data shape:", train.shape)
    print("Test data shape:", test.shape)

    # Analyze class distribution
    toxic_columns = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']

    for column in toxic_columns:
        print(f"\nDistribution for {column}:")
        print(train[column].value_counts())
        print(f"Percentage of positive class: {train[column].mean()*100:.2f}%")

    # Device configuration
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"Using device: {device}")

    # Initialize tokenizer
    tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

    # Model parameters
    model_params = {
        'vocab_size': tokenizer.vocab_size,
        'embedding_dim': 300,
        'hidden_dim': 256,
        'output_dim': len(toxic_columns),
        'n_layers': 2,
        'dropout': 0.5
    }

    # Add bidirectional parameter for LSTM
    lstm_params = {**model_params, 'bidirectional': True}



Loading data...
Training data shape: (159571, 8)
Test data shape: (153164, 2)

Distribution for toxic:
toxic
0    144277
1     15294
Name: count, dtype: int64
Percentage of positive class: 9.58%

Distribution for severe_toxic:
severe_toxic
0    157976
1      1595
Name: count, dtype: int64
Percentage of positive class: 1.00%

Distribution for obscene:
obscene
0    151122
1      8449
Name: count, dtype: int64
Percentage of positive class: 5.29%

Distribution for threat:
threat
0    159093
1       478
Name: count, dtype: int64
Percentage of positive class: 0.30%

Distribution for insult:
insult
0    151694
1      7877
Name: count, dtype: int64
Percentage of positive class: 4.94%

Distribution for identity_hate:
identity_hate
0    158166
1      1405
Name: count, dtype: int64
Percentage of positive class: 0.88%
Using device: cuda



LSTM: The LSTM model addresses toxic comment classification with its ability to capture long-range dependencies in text. Its architecture features a bidirectional LSTM with multiple layers, designed for handling sequential data with complex relationships. The training process uses AdamW optimization with learning rate scheduling and monitors both loss and Macro-F1 score for evaluation. Early stopping is implemented to prevent overfitting, while model checkpoints and learning curves aid in performance analysis.

In [None]:
    # Train models
    print("\nTraining LSTM model...")
    lstm_score = train_model(LSTMClassifier, lstm_params,
                         train['comment_text'], train[toxic_columns],
                         tokenizer)



Training LSTM model...


Epoch 1/5 [Train]: 100%|██████████| 3990/3990 [03:47<00:00, 17.51it/s, loss=0.0246]
Validation: 100%|██████████| 998/998 [00:39<00:00, 25.13it/s]



Epoch 1/5:
Train Loss: 0.1289, Train Macro-F1: 0.0840
Val Loss: 0.0967, Val Macro-F1: 0.1963





Classification Report:
               precision    recall  f1-score   support

        toxic       0.85      0.29      0.44      3037
 severe_toxic       0.00      0.00      0.00       311
      obscene       0.80      0.24      0.37      1669
       threat       0.00      0.00      0.00        92
       insult       0.76      0.24      0.37      1582
identity_hate       0.00      0.00      0.00       305

    micro avg       0.82      0.24      0.37      6996
    macro avg       0.40      0.13      0.20      6996
 weighted avg       0.73      0.24      0.36      6996
  samples avg       0.03      0.02      0.02      6996



Epoch 2/5 [Train]: 100%|██████████| 3990/3990 [03:46<00:00, 17.63it/s, loss=0.0574]
Validation: 100%|██████████| 998/998 [00:39<00:00, 24.98it/s]



Epoch 2/5:
Train Loss: 0.0936, Train Macro-F1: 0.2413
Val Loss: 0.0875, Val Macro-F1: 0.2891





Classification Report:
               precision    recall  f1-score   support

        toxic       0.94      0.33      0.49      3037
 severe_toxic       0.71      0.08      0.14       311
      obscene       0.87      0.43      0.57      1669
       threat       0.00      0.00      0.00        92
       insult       0.80      0.39      0.52      1582
identity_hate       0.00      0.00      0.00       305

    micro avg       0.88      0.34      0.49      6996
    macro avg       0.55      0.21      0.29      6996
 weighted avg       0.83      0.34      0.48      6996
  samples avg       0.03      0.02      0.03      6996



Epoch 3/5 [Train]: 100%|██████████| 3990/3990 [03:51<00:00, 17.24it/s, loss=0.0976]
Validation: 100%|██████████| 998/998 [00:45<00:00, 21.72it/s]



Epoch 3/5:
Train Loss: 0.0844, Train Macro-F1: 0.3116
Val Loss: 0.0820, Val Macro-F1: 0.3571





Classification Report:
               precision    recall  f1-score   support

        toxic       0.95      0.41      0.57      3037
 severe_toxic       0.63      0.20      0.31       311
      obscene       0.88      0.54      0.67      1669
       threat       0.00      0.00      0.00        92
       insult       0.79      0.49      0.60      1582
identity_hate       0.00      0.00      0.00       305

    micro avg       0.87      0.42      0.57      6996
    macro avg       0.54      0.27      0.36      6996
 weighted avg       0.83      0.42      0.56      6996
  samples avg       0.04      0.03      0.03      6996



Epoch 4/5 [Train]: 100%|██████████| 3990/3990 [03:58<00:00, 16.70it/s, loss=0.0030]
Validation: 100%|██████████| 998/998 [00:39<00:00, 25.19it/s]



Epoch 4/5:
Train Loss: 0.0769, Train Macro-F1: 0.3476
Val Loss: 0.0766, Val Macro-F1: 0.3836





Classification Report:
               precision    recall  f1-score   support

        toxic       0.94      0.47      0.63      3037
 severe_toxic       0.64      0.18      0.28       311
      obscene       0.84      0.65      0.73      1669
       threat       0.00      0.00      0.00        92
       insult       0.76      0.59      0.66      1582
identity_hate       0.00      0.00      0.00       305

    micro avg       0.85      0.50      0.63      6996
    macro avg       0.53      0.31      0.38      6996
 weighted avg       0.81      0.50      0.61      6996
  samples avg       0.04      0.04      0.04      6996



Epoch 5/5 [Train]: 100%|██████████| 3990/3990 [03:48<00:00, 17.46it/s, loss=0.0860]
Validation: 100%|██████████| 998/998 [00:40<00:00, 24.75it/s]



Epoch 5/5:
Train Loss: 0.0713, Train Macro-F1: 0.3688
Val Loss: 0.0698, Val Macro-F1: 0.4225

Classification Report:
               precision    recall  f1-score   support

        toxic       0.92      0.53      0.67      3037
 severe_toxic       0.60      0.31      0.41       311
      obscene       0.81      0.72      0.76      1669
       threat       0.00      0.00      0.00        92
       insult       0.72      0.65      0.69      1582
identity_hate       0.00      0.00      0.00       305

    micro avg       0.82      0.56      0.67      6996
    macro avg       0.51      0.37      0.42      6996
 weighted avg       0.78      0.56      0.65      6996
  samples avg       0.04      0.05      0.04      6996





GRU: The model uses GRU cells for better gradient flow and processes each comment bidirectionally to capture context from both directions. It also employs techniques like layer normalization and attention mechanisms. AdamW optimization with learning rate scheduling guides the training, with a focus on minimizing loss and maximizing Macro-F1 score. Early stopping helps to avoid overfitting, and model checkpoints and learning curves provide insights into performance.

In [None]:
    print("\nTraining GRU model...")
    gru_score = train_model(GRUClassifier, model_params,
                        train['comment_text'], train[toxic_columns],
                        tokenizer)



Training GRU model...


Epoch 1/5 [Train]: 100%|██████████| 3990/3990 [03:48<00:00, 17.43it/s, loss=0.0025]
Validation: 100%|██████████| 998/998 [00:38<00:00, 25.92it/s]



Epoch 1/5:
Train Loss: 0.0807, Train Macro-F1: 0.3390
Val Loss: 0.0504, Val Macro-F1: 0.3822





Classification Report:
               precision    recall  f1-score   support

        toxic       0.85      0.73      0.79      3037
 severe_toxic       0.50      0.01      0.01       311
      obscene       0.83      0.74      0.79      1669
       threat       0.00      0.00      0.00        92
       insult       0.72      0.70      0.71      1582
identity_hate       0.00      0.00      0.00       305

    micro avg       0.81      0.65      0.72      6996
    macro avg       0.48      0.36      0.38      6996
 weighted avg       0.75      0.65      0.69      6996
  samples avg       0.06      0.06      0.06      6996



Epoch 2/5 [Train]: 100%|██████████| 3990/3990 [03:38<00:00, 18.23it/s, loss=0.0503]
Validation: 100%|██████████| 998/998 [00:40<00:00, 24.91it/s]



Epoch 2/5:
Train Loss: 0.0492, Train Macro-F1: 0.4422
Val Loss: 0.0476, Val Macro-F1: 0.4580





Classification Report:
               precision    recall  f1-score   support

        toxic       0.88      0.71      0.79      3037
 severe_toxic       0.61      0.23      0.34       311
      obscene       0.85      0.75      0.80      1669
       threat       0.00      0.00      0.00        92
       insult       0.73      0.73      0.73      1582
identity_hate       0.73      0.05      0.10       305

    micro avg       0.83      0.66      0.74      6996
    macro avg       0.63      0.41      0.46      6996
 weighted avg       0.81      0.66      0.72      6996
  samples avg       0.06      0.06      0.06      6996



Epoch 3/5 [Train]: 100%|██████████| 3990/3990 [03:39<00:00, 18.20it/s, loss=0.0023]
Validation: 100%|██████████| 998/998 [00:38<00:00, 25.63it/s]



Epoch 3/5:
Train Loss: 0.0447, Train Macro-F1: 0.4891
Val Loss: 0.0453, Val Macro-F1: 0.4680





Classification Report:
               precision    recall  f1-score   support

        toxic       0.84      0.77      0.81      3037
 severe_toxic       0.63      0.16      0.26       311
      obscene       0.83      0.79      0.81      1669
       threat       1.00      0.01      0.02        92
       insult       0.79      0.66      0.72      1582
identity_hate       0.76      0.11      0.19       305

    micro avg       0.82      0.69      0.75      6996
    macro avg       0.81      0.42      0.47      6996
 weighted avg       0.82      0.69      0.73      6996
  samples avg       0.07      0.06      0.06      6996



Epoch 4/5 [Train]: 100%|██████████| 3990/3990 [03:39<00:00, 18.17it/s, loss=0.0002]
Validation: 100%|██████████| 998/998 [00:38<00:00, 25.73it/s]



Epoch 4/5:
Train Loss: 0.0418, Train Macro-F1: 0.5257
Val Loss: 0.0476, Val Macro-F1: 0.5210





Classification Report:
               precision    recall  f1-score   support

        toxic       0.86      0.74      0.80      3037
 severe_toxic       0.52      0.42      0.47       311
      obscene       0.84      0.78      0.81      1669
       threat       0.80      0.04      0.08        92
       insult       0.77      0.68      0.72      1582
identity_hate       0.79      0.15      0.25       305

    micro avg       0.82      0.69      0.75      6996
    macro avg       0.76      0.47      0.52      6996
 weighted avg       0.81      0.69      0.74      6996
  samples avg       0.06      0.06      0.06      6996



Epoch 5/5 [Train]: 100%|██████████| 3990/3990 [03:39<00:00, 18.20it/s, loss=0.0009]
Validation: 100%|██████████| 998/998 [00:38<00:00, 25.79it/s]



Epoch 5/5:
Train Loss: 0.0390, Train Macro-F1: 0.5689
Val Loss: 0.0503, Val Macro-F1: 0.5331

Classification Report:
               precision    recall  f1-score   support

        toxic       0.89      0.71      0.79      3037
 severe_toxic       0.64      0.16      0.25       311
      obscene       0.88      0.74      0.80      1669
       threat       0.82      0.15      0.26        92
       insult       0.78      0.67      0.72      1582
identity_hate       0.69      0.26      0.38       305

    micro avg       0.85      0.65      0.74      6996
    macro avg       0.78      0.45      0.53      6996
 weighted avg       0.84      0.65      0.73      6996
  samples avg       0.06      0.06      0.06      6996
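
The GRU log above shows validation loss bottoming out at epoch 3 while validation Macro-F1 keeps rising through epoch 5, so the metric chosen for early stopping matters. The sketch below is a generic patience-based early stopper, not the notebook's own implementation (which is not shown in this section); the class name `EarlyStopper` and the patience of 2 are assumptions made for illustration.

```python
class EarlyStopper:
    """Stop when the monitored value has not improved for `patience` epochs."""

    def __init__(self, patience=2, mode="min"):
        self.patience = patience
        self.sign = 1.0 if mode == "min" else -1.0  # turn "max" into a minimization
        self.best = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, value):
        """Record one epoch; return True if training should stop."""
        score = self.sign * value
        if score < self.best:
            self.best, self.best_epoch, self.bad_epochs = score, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Validation values copied from the GRU log above (epochs 1-5).
val_loss = [0.0504, 0.0476, 0.0453, 0.0476, 0.0503]
val_macro_f1 = [0.3822, 0.4580, 0.4680, 0.5210, 0.5331]

loss_stopper = EarlyStopper(patience=2, mode="min")
f1_stopper = EarlyStopper(patience=2, mode="max")
for epoch, (loss, f1) in enumerate(zip(val_loss, val_macro_f1), start=1):
    stop_on_loss = loss_stopper.step(epoch, loss)
    stop_on_f1 = f1_stopper.step(epoch, f1)
    print(f"epoch {epoch}: stop_on_loss={stop_on_loss} stop_on_f1={stop_on_f1}")

print("best epoch by loss:", loss_stopper.best_epoch)
print("best epoch by Macro-F1:", f1_stopper.best_epoch)
```

Under these assumptions, a stopper that watches validation loss would keep the epoch 3 checkpoint, while one that watches Macro-F1 would keep epoch 5.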





DistilBERT: Fine-tuning involves freezing the initial model layers, adapting the final classification head, and using AdamW optimization with learning rate scheduling. Performance is tracked with training/validation loss and Macro-F1 score, with early stopping to prevent overfitting. Detailed error analysis is conducted by saving false positive and false negative cases for review. Model checkpoints, learning curves, and error analysis summaries are generated for comprehensive evaluation.

In [None]:
    print("\nFine tuning DistilBERT model...")
    bert_score = train_bert(train['comment_text'], train[toxic_columns])


Fine tuning DistilBERT model...


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/5 [Train]: 100%|██████████| 7979/7979 [08:33<00:00, 15.55it/s]
Validation: 100%|██████████| 1995/1995 [02:04<00:00, 16.05it/s]



Epoch 1/5:
Train Loss: 0.0888, Train Macro-F1: 0.2626
Val Loss: 0.0652, Val Macro-F1: 0.3479

Classification Report:
               precision    recall  f1-score   support

        toxic       0.85      0.55      0.67      3037
 severe_toxic       0.67      0.11      0.18       311
      obscene       0.82      0.52      0.64      1669
       threat       0.00      0.00      0.00        92
       insult       0.78      0.48      0.59      1582
identity_hate       0.00      0.00      0.00       305

    micro avg       0.82      0.48      0.60      6996
    macro avg       0.52      0.28      0.35      6996
 weighted avg       0.77      0.48      0.59      6996
  samples avg       0.05      0.04      0.04      6996



Analyzing errors: 100%|██████████| 1995/1995 [02:07<00:00, 15.69it/s]
Epoch 2/5 [Train]: 100%|██████████| 7979/7979 [08:23<00:00, 15.85it/s]
Validation: 100%|██████████| 1995/1995 [02:04<00:00, 16.08it/s]



Epoch 2/5:
Train Loss: 0.0642, Train Macro-F1: 0.3666
Val Loss: 0.0618, Val Macro-F1: 0.3750

Classification Report:
               precision    recall  f1-score   support

        toxic       0.85      0.57      0.69      3037
 severe_toxic       0.68      0.15      0.25       311
      obscene       0.81      0.59      0.68      1669
       threat       0.00      0.00      0.00        92
       insult       0.76      0.53      0.62      1582
identity_hate       1.00      0.00      0.01       305

    micro avg       0.81      0.52      0.63      6996
    macro avg       0.68      0.31      0.38      6996
 weighted avg       0.81      0.52      0.61      6996
  samples avg       0.05      0.04      0.05      6996



Analyzing errors: 100%|██████████| 1995/1995 [02:11<00:00, 15.16it/s]
Epoch 3/5 [Train]: 100%|██████████| 7979/7979 [09:13<00:00, 14.41it/s]
Validation: 100%|██████████| 1995/1995 [02:18<00:00, 14.45it/s]



Epoch 3/5:
Train Loss: 0.0619, Train Macro-F1: 0.3783
Val Loss: 0.0599, Val Macro-F1: 0.3966

Classification Report:
               precision    recall  f1-score   support

        toxic       0.82      0.62      0.71      3037
 severe_toxic       0.63      0.20      0.31       311
      obscene       0.80      0.61      0.70      1669
       threat       0.00      0.00      0.00        92
       insult       0.74      0.58      0.65      1582
identity_hate       1.00      0.01      0.02       305

    micro avg       0.79      0.56      0.65      6996
    macro avg       0.67      0.34      0.40      6996
 weighted avg       0.79      0.56      0.63      6996
  samples avg       0.05      0.05      0.05      6996



Analyzing errors: 100%|██████████| 1995/1995 [02:16<00:00, 14.65it/s]
Epoch 4/5 [Train]: 100%|██████████| 7979/7979 [08:37<00:00, 15.43it/s]
Validation: 100%|██████████| 1995/1995 [02:09<00:00, 15.41it/s]



Epoch 4/5:
Train Loss: 0.0608, Train Macro-F1: 0.3860
Val Loss: 0.0589, Val Macro-F1: 0.4069

Classification Report:
               precision    recall  f1-score   support

        toxic       0.83      0.61      0.71      3037
 severe_toxic       0.62      0.24      0.35       311
      obscene       0.83      0.59      0.69      1669
       threat       0.00      0.00      0.00        92
       insult       0.77      0.55      0.64      1582
identity_hate       0.90      0.03      0.06       305

    micro avg       0.81      0.54      0.65      6996
    macro avg       0.66      0.34      0.41      6996
 weighted avg       0.80      0.54      0.63      6996
  samples avg       0.05      0.05      0.05      6996



Analyzing errors: 100%|██████████| 1995/1995 [02:06<00:00, 15.74it/s]
Epoch 5/5 [Train]: 100%|██████████| 7979/7979 [08:28<00:00, 15.71it/s]
Validation: 100%|██████████| 1995/1995 [02:05<00:00, 15.95it/s]



Epoch 5/5:
Train Loss: 0.0601, Train Macro-F1: 0.4013
Val Loss: 0.0584, Val Macro-F1: 0.4038


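The classification reports and Macro-F1 values above summarize each model. After five epochs, validation Macro-F1 reached 0.4225 for the LSTM, 0.5331 for the GRU, and 0.4038 for DistilBERT. Model checkpoints and training/validation curves are also saved in model-specific folders.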

**Test and cross validation**

In [8]:
# Load models
lstm_model_path = '/content/drive/My Drive/project_toxic_classifier/LSTMClassifier/checkpoints/lstmclassifier.pt'
gru_model_path = '/content/drive/My Drive/project_toxic_classifier/GRUClassifier/checkpoints/gruclassifier.pt'
bert_model_path = '/content/drive/My Drive/project_toxic_classifier/BERT/checkpoints/bert_model.pt'

# Remove 'bidirectional' for GRU as it's already handled in the GRUClassifier
gru_params = {k: v for k, v in model_params.items() if k != 'bidirectional'}

# Initialize models
lstm_model = LSTMClassifier(**lstm_params).to(device)
gru_model = GRUClassifier(**gru_params).to(device)  # Use gru_params
bert_model = AutoModelForSequenceClassification.from_pretrained(
    'distilbert-base-uncased', num_labels=6, problem_type="multi_label_classification"
).to(device)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [9]:
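# Load saved model states (map_location lets GPU-trained checkpoints load on CPU as well)
lstm_model.load_state_dict(torch.load(lstm_model_path, map_location=device)['model_state_dict'])
gru_model.load_state_dict(torch.load(gru_model_path, map_location=device)['model_state_dict'])
bert_model.load_state_dict(torch.load(bert_model_path, map_location=device)['model_state_dict'])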




<All keys matched successfully>

In [10]:
class ToxicPredictionDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_len=128):
        self.texts = texts.reset_index(drop=True)
        self.labels = labels.reset_index(drop=True) if labels is not None else None
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = str(self.texts.iloc[idx])

        if self.labels is not None:
            label = torch.tensor(self.labels.iloc[idx].values, dtype=torch.float)
        else:
            label = torch.zeros(6)  #  6 labels

        encoding = self.tokenizer.encode_plus(
            text,
            add_special_tokens=True,
            max_length=self.max_len,
            padding='max_length',
            truncation=True,
            return_tensors='pt'
        )

        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'labels': label,
            'text': text
        }

In [None]:
def predict(model, test_loader, device, threshold=0.6):
    """
    Makes single-label predictions: each text is assigned at most one
    category (the most probable one), and only if that probability
    exceeds the threshold. Texts below the threshold get no label.
    """
    model.eval()
    all_predictions = []

    with torch.no_grad():
        for batch in tqdm(test_loader):
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)

            # Handle different model architectures
            if isinstance(model, (LSTMClassifier, GRUClassifier)):
                outputs = model(input_ids, attention_mask)
            else:  # BERT
                outputs = model(input_ids, attention_mask=attention_mask).logits

            # Get probabilities
            probabilities = torch.sigmoid(outputs.cpu()).numpy()

            # Initialize predictions array
            batch_predictions = np.zeros_like(probabilities, dtype=int)

            # For each example, find highest probability class
            for i in range(len(probabilities)):
                max_prob_idx = np.argmax(probabilities[i])
                # Only assign class if probability exceeds threshold
                if probabilities[i][max_prob_idx] > threshold:
                    batch_predictions[i][max_prob_idx] = 1

            all_predictions.extend(batch_predictions)

    return np.array(all_predictions)

# Main evaluation code
if __name__ == "__main__":
    # Device configuration
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"Using device: {device}")

    # Load test data
    test_data = pd.read_csv("test.csv")
    print(f"Test data shape: {test_data.shape}")

    # Create test dataset
    test_dataset = ToxicPredictionDataset(
        test_data['comment_text'],
        None,  # No labels for test set
        tokenizer
    )
    test_loader = DataLoader(test_dataset, batch_size=32)

    # Evaluate each model
    for model_name, model in [('LSTM', lstm_model), ('GRU', gru_model), ('BERT', bert_model)]:
        print(f"\nMaking predictions with {model_name}...")

        # Get predictions
        predictions = predict(model, test_loader, device)

        # Create prediction DataFrame
        prediction_df = pd.DataFrame(predictions, columns=toxic_columns)
        prediction_df['id'] = test_data['id']
        prediction_df['comment_text'] = test_data['comment_text']
        prediction_df = prediction_df[['id','comment_text'] + toxic_columns]

        # Save predictions
        predictions2_dir = os.path.join(base_output_dir, 'Predictions')
        os.makedirs(predictions2_dir, exist_ok=True)
        output_path = os.path.join(predictions2_dir, f'{model_name}_predictions.csv')
        prediction_df.to_csv(output_path, index=False)
        print(f"Saved predictions to {output_path}")

        # Print some statistics
        print("\nPrediction statistics:")
        for col in toxic_columns:
            pos_count = prediction_df[col].sum()
            print(f"{col}: {pos_count} positive predictions ({pos_count/len(prediction_df)*100:.2f}%)")

Using device: cuda
Test data shape: (153164, 2)

Making predictions with LSTM...


100%|██████████| 4787/4787 [02:37<00:00, 30.44it/s]


Saved predictions to /content/drive/My Drive/project_toxic_classifier/Predictions/LSTM_predictions.csv

Prediction statistics:
toxic: 19202 positive predictions (12.54%)
severe_toxic: 0 positive predictions (0.00%)
obscene: 0 positive predictions (0.00%)
threat: 0 positive predictions (0.00%)
insult: 0 positive predictions (0.00%)
identity_hate: 0 positive predictions (0.00%)

Making predictions with GRU...


100%|██████████| 4787/4787 [02:43<00:00, 29.27it/s]


Saved predictions to /content/drive/My Drive/project_toxic_classifier/Predictions/GRU_predictions.csv

Prediction statistics:
toxic: 28745 positive predictions (18.77%)
severe_toxic: 0 positive predictions (0.00%)
obscene: 1760 positive predictions (1.15%)
threat: 30 positive predictions (0.02%)
insult: 3 positive predictions (0.00%)
identity_hate: 35 positive predictions (0.02%)

Making predictions with BERT...


100%|██████████| 4787/4787 [08:53<00:00,  8.96it/s]


Saved predictions to /content/drive/My Drive/project_toxic_classifier/Predictions/BERT_predictions.csv

Prediction statistics:
toxic: 18852 positive predictions (12.31%)
severe_toxic: 0 positive predictions (0.00%)
obscene: 62 positive predictions (0.04%)
threat: 0 positive predictions (0.00%)
insult: 0 positive predictions (0.00%)
identity_hate: 3 positive predictions (0.00%)


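The predictions of all three models are heavily skewed toward the `toxic` class.

The LSTM model predicts only the `toxic` class (12.54% of test comments) and nothing else.

The GRU model shows a slightly broader distribution: it also predicts `obscene` (1.15%) and a handful of `threat`, `insult`, and `identity_hate` cases.

DistilBERT shows a similar bias toward `toxic`, with only 62 `obscene` and 3 `identity_hate` predictions.

Two factors contribute. First, `predict` assigns at most one label per comment (the most probable class, and only above the 0.6 threshold), which tends to favor `toxic` whenever several labels apply. Second, the training data is heavily imbalanced.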

Class Distribution on train data:

toxic: 9.58% positive

severe_toxic: 1.00% positive

obscene: 5.29% positive

threat: 0.30% positive

insult: 4.94% positive

identity_hate: ~0.88% positive
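
One common way to counter this kind of imbalance is to weight the positive term of the binary cross-entropy loss per label. The sketch below derives such weights from the positive rates listed above as the negative-to-positive ratio. It is an illustration under that assumption, not the weighting used in this project, and the helper name `positive_class_weights` is hypothetical. In PyTorch, values like these could be passed as `pos_weight` to `torch.nn.BCEWithLogitsLoss`.

```python
# Positive rates from the class distribution above (fraction of training comments).
positive_rates = {
    "toxic": 0.0958,
    "severe_toxic": 0.0100,
    "obscene": 0.0529,
    "threat": 0.0030,
    "insult": 0.0494,
    "identity_hate": 0.0088,
}


def positive_class_weights(rates):
    """Return the negative-to-positive ratio for each label (hypothetical helper)."""
    return {label: (1.0 - p) / p for label, p in rates.items()}


weights = positive_class_weights(positive_rates)
for label, weight in weights.items():
    print(f"{label}: pos_weight = {weight:.1f}")
```

Rare labels such as `threat` receive the largest weights, so missed positives for them cost more during training. Very large weights can hurt precision, so they are often capped in practice.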

Each model's predictions are saved as a CSV file in the `Predictions` folder.
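
Because `predict` keeps only the single most probable label, a comment that is both `toxic` and `obscene` can never receive both labels. The sketch below contrasts that rule with independent per-label thresholding, the usual decoding for multi-label outputs, on made-up probabilities. The function names and the example values are assumptions for illustration, and plain Python lists stand in for the NumPy arrays used in the notebook.

```python
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def decode_single_label(probs, threshold=0.6):
    """Mirror of the rule in predict(): at most one label per comment."""
    out = [0] * len(probs)
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] > threshold:
        out[best] = 1
    return out


def decode_multi_label(probs, threshold=0.5):
    """Independent thresholding: every label above the threshold is kept."""
    return [1 if p > threshold else 0 for p in probs]


# Made-up sigmoid outputs for one comment (not taken from any model run).
example = [0.93, 0.10, 0.88, 0.02, 0.71, 0.04]

single = decode_single_label(example)
multi = decode_multi_label(example)
print("single-label:", [name for name, v in zip(LABELS, single) if v])
print("multi-label: ", [name for name, v in zip(LABELS, multi) if v])
```

With the single-label rule only `toxic` survives, while independent thresholding also keeps `obscene` and `insult`. Independent thresholding at 0.5 is also what `analyze_errors` uses on the validation data.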



**Error analysis**

The `analyze_errors` function evaluates a trained model on the validation data, computes per-category confusion counts, precision, recall, and F1, and writes a summary with example misclassifications to a `<model_type>_analysis.txt` file in that model's `error_analysis` folder.




In [4]:
import os
import pandas as pd
import numpy as np
from collections import defaultdict
from tqdm import tqdm
import torch
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score, accuracy_score
from torch.utils.data import DataLoader

In [None]:
train_size = int(0.8 * len(train))
X_val = train['comment_text'].iloc[train_size:]
y_val = train[toxic_columns].iloc[train_size:]


val_dataset = ToxicDataset(X_val, y_val, tokenizer)

val_loader = DataLoader(val_dataset, batch_size=32)

In [None]:

def analyze_errors(model, val_loader, toxic_columns, device, model_type):
    """
    Enhanced error analysis for all model types (BERT, LSTM, GRU) with improved output handling.

    Args:
        model: The trained model (LSTM, GRU, or BERT)
        val_loader: DataLoader for the validation data
        toxic_columns: List of toxic category names
        device: Device to run on (cuda/cpu)
        model_type: Type of model ('lstm', 'gru', or 'bert')

    Returns:
        dict: Category-wise metrics
    """
    model.eval()
    errors = {
        'false_positives': defaultdict(list),
        'false_negatives': defaultdict(list)
    }
    all_true_labels = []
    all_predictions = []

    with torch.no_grad():
        for batch in tqdm(val_loader, desc=f"Analyzing errors for {model_type}"):
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['labels'].cpu().numpy()
            texts = batch['text']

            # Model forward pass
            outputs = model(input_ids, attention_mask)

            # Handle different model outputs
            if isinstance(outputs, torch.Tensor):  # LSTM/GRU
                raw_output = outputs.cpu()
                if raw_output.min() < 0 or raw_output.max() > 1:
                    predictions = (torch.sigmoid(raw_output) > 0.5).int().numpy()
                else:
                    predictions = (raw_output > 0.5).int().numpy()
            else:  # BERT
                raw_output = outputs.logits.cpu()
                predictions = (torch.sigmoid(raw_output) > 0.5).int().numpy()

            all_true_labels.extend(labels)
            all_predictions.extend(predictions)

            # Error analysis
            for idx, (pred, true, text) in enumerate(zip(predictions, labels, texts)):
                for col_idx, col in enumerate(toxic_columns):
                    error_info = {
                        'text': text,
                        'predicted': pred.tolist(),
                        'true': true.tolist()
                    }

                    if pred[col_idx] == 1 and true[col_idx] == 0:
                        errors['false_positives'][col].append(error_info)
                    elif pred[col_idx] == 0 and true[col_idx] == 1:
                        errors['false_negatives'][col].append(error_info)

    # Calculate metrics
    all_true_labels = np.array(all_true_labels)
    all_predictions = np.array(all_predictions)

    category_metrics = {}
    for col_idx, col in enumerate(toxic_columns):
        true_labels_col = all_true_labels[:, col_idx]
        predictions_col = all_predictions[:, col_idx]

        # labels=[0, 1] keeps the matrix 2x2 even if a column contains a single class
        tn, fp, fn, tp = confusion_matrix(true_labels_col, predictions_col, labels=[0, 1]).ravel()
        precision = precision_score(true_labels_col, predictions_col, zero_division=0)
        recall = recall_score(true_labels_col, predictions_col, zero_division=0)
        f1 = f1_score(true_labels_col, predictions_col, zero_division=0)

        category_metrics[col] = {
            'TP': tp,
            'TN': tn,
            'FP': fp,
            'FN': fn,
            'Precision': precision,
            'Recall': recall,
            'F1-Score': f1
        }

    # Save results
    model_folder_name = model.__class__.__name__
    output_dir = os.path.join(base_output_dir, model_folder_name, 'error_analysis')
    os.makedirs(output_dir, exist_ok=True)
    output_filename = os.path.join(output_dir, f'{model_type}_analysis.txt')

    # Generate summary
    summary = [f"=== Error Analysis for {model_type} ===\n"]

    for category in toxic_columns:
        fp = errors['false_positives'][category]
        fn = errors['false_negatives'][category]
        metrics = category_metrics[category]

        summary.append(f"\n{category.upper()}:")
        summary.append(f"Total false positives: {len(fp)} ({metrics['FP']})")
        summary.append(f"Total false negatives: {len(fn)} ({metrics['FN']})")
        summary.append(f"True positives: {metrics['TP']}")
        summary.append(f"True negatives: {metrics['TN']}")
        summary.append(f"Precision: {metrics['Precision']:.4f}")
        summary.append(f"Recall: {metrics['Recall']:.4f}")
        summary.append(f"F1-Score: {metrics['F1-Score']:.4f}")

        if fp:
            summary.append("\nFalse Positive Examples (up to 3):")
            for example in fp[:3]:
                summary.append(f"- Text: {example['text'][:100]}...")

        if fn:
            summary.append("\nFalse Negative Examples (up to 3):")
            for example in fn[:3]:
                summary.append(f"- Text: {example['text'][:100]}...")

    with open(output_filename, 'w') as f:
        f.write("\n".join(summary))

    return category_metrics

In [None]:
# LSTM:
analyze_errors(lstm_model, val_loader, toxic_columns, device, model_type='lstm')

# GRU:
analyze_errors(gru_model, val_loader, toxic_columns, device, model_type='gru')

# BERT:
analyze_errors(bert_model, val_loader, toxic_columns, device, model_type='bert')

Analyzing errors for lstm: 100%|██████████| 998/998 [00:50<00:00, 19.65it/s]
Analyzing errors for gru: 100%|██████████| 998/998 [00:50<00:00, 19.93it/s]
Analyzing errors for bert: 100%|██████████| 998/998 [01:58<00:00,  8.45it/s]


{'toxic': {'TP': 1866,
  'TN': 28491,
  'FP': 387,
  'FN': 1171,
  'Precision': 0.8282290279627164,
  'Recall': 0.6144221270991109,
  'F1-Score': 0.7054820415879017},
 'severe_toxic': {'TP': 75,
  'TN': 31558,
  'FP': 46,
  'FN': 236,
  'Precision': 0.6198347107438017,
  'Recall': 0.24115755627009647,
  'F1-Score': 0.3472222222222222},
 'obscene': {'TP': 980,
  'TN': 30049,
  'FP': 197,
  'FN': 689,
  'Precision': 0.832625318606627,
  'Recall': 0.5871779508687837,
  'F1-Score': 0.6886858749121574},
 'threat': {'TP': 0,
  'TN': 31823,
  'FP': 0,
  'FN': 92,
  'Precision': 0.0,
  'Recall': 0.0,
  'F1-Score': 0.0},
 'insult': {'TP': 874,
  'TN': 30071,
  'FP': 262,
  'FN': 708,
  'Precision': 0.7693661971830986,
  'Recall': 0.5524652338811631,
  'F1-Score': 0.6431199411331862},
 'identity_hate': {'TP': 9,
  'TN': 31609,
  'FP': 1,
  'FN': 296,
  'Precision': 0.9,
  'Recall': 0.029508196721311476,
  'F1-Score': 0.05714285714285714}}
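
The dictionary shown is what the notebook displays for the last call in the cell, the `bert` analysis. As a quick consistency check, the sketch below recomputes precision, recall, and F1 from the confusion counts and averages the six F1 values into a Macro-F1. The helper name `prf_from_counts` is hypothetical, and the counts are copied from the output above.

```python
def prf_from_counts(tp, fp, fn):
    """Precision, recall, and F1 from confusion counts; 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return precision, recall, f1


# (TP, FP, FN) per category, copied from the output above.
counts = {
    "toxic": (1866, 387, 1171),
    "severe_toxic": (75, 46, 236),
    "obscene": (980, 197, 689),
    "threat": (0, 0, 92),
    "insult": (874, 262, 708),
    "identity_hate": (9, 1, 296),
}

f1_scores = {}
for name, (tp, fp, fn) in counts.items():
    precision, recall, f1 = prf_from_counts(tp, fp, fn)
    f1_scores[name] = f1
    print(f"{name}: precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")

macro_f1 = sum(f1_scores.values()) / len(f1_scores)
print(f"Macro-F1: {macro_f1:.4f}")
```

The recomputed Macro-F1 is about 0.407, which lines up with the DistilBERT validation Macro-F1 of 0.4069 logged at epoch 4 and suggests the saved checkpoint comes from that epoch.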

**Annotations**

Ten comments are manually annotated, each with at most one category, and then classified by the three models.

In [11]:
# Sample data
data = {
    'comment_text': [
        "Yo bitch Ja Rule is more succesful then you'll ever be whats up with you and hating you sad motherfuc...",
        "== From RfC == \n\n The title is fine as it is, IMO.",
        "COCKSUCKER BEFORE YOU PISS AROUND ON MY WORK",
        "Bye! \n\nDon't look, we're not your friends anymore.",
        "REDIRECT Talk:Voydan Pop Georgiev- Chernomorets",
        "I hope you die a slow and painful death, you worthless...",
        "Don't argue with me or I will kill you!",
        "FUCK YOUR FILTHY MOTHER IN THE ASS, DRY!",
        "Your kind is ruining this country, you worthless gay!",
        "You are a stupid idiot.",
    ],
    'toxic': [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    'severe_toxic': [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    'obscene': [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    'threat': [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    'insult': [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    'identity_hate': [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
}

# Create a Pandas DataFrame
df = pd.DataFrame(data)

# Save the DataFrame to a CSV file
df.to_csv('annotated_test_10.csv', index=False)

print("CSV file 'annotated_test_10.csv' created successfully.")

CSV file 'annotated_test_10.csv' created successfully.


In [22]:
annotated_data = pd.read_csv('annotated_test_10.csv')
display(annotated_data)


Unnamed: 0,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate
0,Yo bitch Ja Rule is more succesful then you'll...,1,0,0,0,0,0
1,== From RfC == \n\n The title is fine as it is...,0,0,0,0,0,0
2,COCKSUCKER BEFORE YOU PISS AROUND ON MY WORK,0,0,1,0,0,0
3,"Bye! \n\nDon't look, we're not your friends an...",0,0,0,0,0,0
4,REDIRECT Talk:Voydan Pop Georgiev- Chernomorets,0,0,0,0,0,0
5,"I hope you die a slow and painful death, you w...",0,1,0,0,0,0
6,Don't argue with me or I will kill you!,0,0,0,1,0,0
7,"FUCK YOUR FILTHY MOTHER IN THE ASS, DRY!",0,0,0,0,0,0
8,"Your kind is ruining this country, you worthle...",0,0,0,0,0,1
9,You are a stupid idiot.,0,0,0,0,1,0


In [25]:
# Create dataset for prediction
annotated_dataset = ToxicPredictionDataset(
    annotated_data['comment_text'],
    None,  # No need to provide labels for prediction
    tokenizer
)
annotated_loader = DataLoader(annotated_dataset, batch_size=32)

# Function to make predictions with single category assignment
def predict(model, test_loader, device, threshold=0.6):
    """
    Makes single-label predictions: each text is assigned to at most one
    category (the one with the highest probability), and only if that
    probability exceeds the threshold. Returns a DataFrame with the same
    columns as annotated_test_10.csv.
    """
    model.eval()
    all_predictions = []
    all_comment_texts = []  # To store comment texts

    with torch.no_grad():
        for batch in tqdm(test_loader):
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            comment_texts = batch['text']  # Get comment texts from the batch

            # Handle different model architectures
            if isinstance(model, (LSTMClassifier, GRUClassifier)):
                outputs = model(input_ids, attention_mask)
            else:  # BERT
                outputs = model(input_ids, attention_mask=attention_mask).logits

            # Get probabilities
            probabilities = torch.sigmoid(outputs.cpu()).numpy()

            # Initialize predictions array
            batch_predictions = np.zeros_like(probabilities, dtype=int)

            # For each example, find highest probability class
            for i in range(len(probabilities)):
                max_prob_idx = np.argmax(probabilities[i])
                # Assign the top class only if its probability exceeds the
                # threshold; otherwise the row stays all zeros.
                if probabilities[i][max_prob_idx] > threshold:
                    batch_predictions[i][max_prob_idx] = 1

            all_predictions.extend(batch_predictions)
            all_comment_texts.extend(comment_texts)

    # Create prediction DataFrame with comment_text first
    results = {
        'comment_text': all_comment_texts
    }
    # Add prediction columns
    for i, col in enumerate(toxic_columns):
        results[col] = [pred[i] for pred in all_predictions]

    prediction_df = pd.DataFrame(results)

    # Each row already has at most one predicted category, because the loop
    # above sets only the argmax entry; no further post-processing is needed.

    return prediction_df

# Make predictions for each model
models = {'LSTM': lstm_model, 'GRU': gru_model, 'BERT': bert_model}
for model_name, model in models.items():
    print(f"\nMaking predictions with {model_name}...")
    predictions = predict(model, annotated_loader, device)

    # Display predictions (comment_text will already be first)
    print(f"\n{model_name} Predictions:")
    display(predictions)


Making predictions with LSTM...


100%|██████████| 1/1 [00:00<00:00, 44.92it/s]


LSTM Predictions:





Unnamed: 0,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate
0,Yo bitch Ja Rule is more succesful then you'll...,1,0,0,0,0,0
1,== From RfC == \n\n The title is fine as it is...,0,0,0,0,0,0
2,COCKSUCKER BEFORE YOU PISS AROUND ON MY WORK,1,0,0,0,0,0
3,"Bye! \n\nDon't look, we're not your friends an...",0,0,0,0,0,0
4,REDIRECT Talk:Voydan Pop Georgiev- Chernomorets,0,0,0,0,0,0
5,"I hope you die a slow and painful death, you w...",1,0,0,0,0,0
6,Don't argue with me or I will kill you!,1,0,0,0,0,0
7,"FUCK YOUR FILTHY MOTHER IN THE ASS, DRY!",1,0,0,0,0,0
8,"Your kind is ruining this country, you worthle...",1,0,0,0,0,0
9,You are a stupid idiot.,1,0,0,0,0,0



Making predictions with GRU...


100%|██████████| 1/1 [00:00<00:00, 60.29it/s]


GRU Predictions:





Unnamed: 0,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate
0,Yo bitch Ja Rule is more succesful then you'll...,1,0,0,0,0,0
1,== From RfC == \n\n The title is fine as it is...,0,0,0,0,0,0
2,COCKSUCKER BEFORE YOU PISS AROUND ON MY WORK,1,0,0,0,0,0
3,"Bye! \n\nDon't look, we're not your friends an...",0,0,0,0,0,0
4,REDIRECT Talk:Voydan Pop Georgiev- Chernomorets,0,0,0,0,0,0
5,"I hope you die a slow and painful death, you w...",1,0,0,0,0,0
6,Don't argue with me or I will kill you!,0,0,0,1,0,0
7,"FUCK YOUR FILTHY MOTHER IN THE ASS, DRY!",1,0,0,0,0,0
8,"Your kind is ruining this country, you worthle...",1,0,0,0,0,0
9,You are a stupid idiot.,1,0,0,0,0,0



Making predictions with BERT...


100%|██████████| 1/1 [00:00<00:00, 18.46it/s]



BERT Predictions:


Unnamed: 0,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate
0,Yo bitch Ja Rule is more succesful then you'll...,1,0,0,0,0,0
1,== From RfC == \n\n The title is fine as it is...,0,0,0,0,0,0
2,COCKSUCKER BEFORE YOU PISS AROUND ON MY WORK,1,0,0,0,0,0
3,"Bye! \n\nDon't look, we're not your friends an...",0,0,0,0,0,0
4,REDIRECT Talk:Voydan Pop Georgiev- Chernomorets,0,0,0,0,0,0
5,"I hope you die a slow and painful death, you w...",1,0,0,0,0,0
6,Don't argue with me or I will kill you!,1,0,0,0,0,0
7,"FUCK YOUR FILTHY MOTHER IN THE ASS, DRY!",1,0,0,0,0,0
8,"Your kind is ruining this country, you worthle...",1,0,0,0,0,0
9,You are a stupid idiot.,1,0,0,0,0,0
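
To compare the three prediction tables with the annotations, one option is an exact-match rate: the fraction of comments whose full label row equals the annotated row. The sketch below is illustrative only; the helpers `one_hot` and `exact_match_rate` are hypothetical names, and the category indices are transcribed by hand from the tables above rather than read from the notebook's DataFrames.

```python
import numpy as np

# Label columns in the order used throughout the notebook.
toxic_columns = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Annotated category index per comment (-1 means no category),
# transcribed from annotated_test_10.csv as displayed above.
annotated = [0, -1, 2, -1, -1, 1, 3, -1, 5, 4]

# Predicted category index per comment, transcribed from the three tables above.
predicted = {
    "LSTM": [0, -1, 0, -1, -1, 0, 0, 0, 0, 0],
    "GRU":  [0, -1, 0, -1, -1, 0, 3, 0, 0, 0],
    "BERT": [0, -1, 0, -1, -1, 0, 0, 0, 0, 0],
}

def one_hot(indices, n_classes):
    """Turn per-row category indices into a 0/1 matrix; -1 leaves the row all zeros."""
    out = np.zeros((len(indices), n_classes), dtype=int)
    for row, idx in enumerate(indices):
        if idx >= 0:
            out[row, idx] = 1
    return out

def exact_match_rate(y_true, y_pred):
    """Fraction of rows where every label matches the annotation."""
    return float(np.mean(np.all(y_true == y_pred, axis=1)))

y_true = one_hot(annotated, len(toxic_columns))
rates = {
    name: exact_match_rate(y_true, one_hot(idx, len(toxic_columns)))
    for name, idx in predicted.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.0%} of the 10 comments match the annotation exactly")
```

With the real DataFrames, the same function could be applied to `annotated_data[toxic_columns].values` and `predictions[toxic_columns].values`. On these ten comments the models agree with the annotation mainly on the non-toxic rows and on the comment annotated as `toxic`; only GRU recovers the `threat` label. A plausible reason is that the single-label rule keeps only the highest-probability category, which for most abusive comments is the general `toxic` class.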
