# Toxic Comment Detector

This repository contains the final project for CSCI 4050U – Machine Learning.
The goal is to detect toxic online comments and assign one or more toxicity labels to each comment.

Given a raw text comment, the models predict six binary labels:

- `toxic`
- `severe_toxic`
- `obscene`
- `threat`
- `insult`
- `identity_hate`

We compare a classical machine-learning baseline to two neural networks, then use the best model in an interactive demo script.

# Project Overview

Online platforms are filled with user-generated comments, some of which can be toxic or harmful.
This project builds and compares several models for multi-label toxic comment classification:

- A classical baseline using bag-of-words style features (TF-IDF) + Logistic Regression.
- A simple neural network with word embeddings and an MLP classifier.
- A Bi-LSTM neural network that captures word order and context.

Each model outputs six binary labels for a given comment, allowing a single comment to be tagged with multiple toxicity types at once.
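
As a rough illustration of what "six binary labels" means in practice, the sketch below turns a vector of per-label probabilities into a list of tags by thresholding each label independently. The helper name `labels_from_probs` and the probability values are made up for this example; they are not part of the project code or real model output.

```python
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def labels_from_probs(probs, threshold=0.5):
    """Return the names of all labels whose probability meets the threshold.

    Hypothetical helper for illustration; each label is decided on its own,
    so zero, one, or several labels can be returned for one comment.
    """
    if len(probs) != len(LABELS):
        raise ValueError("expected one probability per label")
    return [name for name, p in zip(LABELS, probs) if p >= threshold]


# Made-up probabilities for illustration only (not model output).
example_probs = [0.91, 0.08, 0.77, 0.02, 0.64, 0.03]
print(labels_from_probs(example_probs))  # ['toxic', 'obscene', 'insult']
```

Because the labels are thresholded independently, the result can also be empty, which corresponds to a comment with no toxicity tags.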



# Dataset

This project is designed around a dataset in the style of the Kaggle “Toxic Comment Classification Challenge”:

- Each row: a user comment (text) + six binary labels, any number of which can be positive.
- Labels match the six classes listed above.
- The scripts below create an 80/20 train/validation split from the original training data, stratified on the `toxic` label.
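
To make the row layout concrete, here is a small sketch that parses rows in this shape using only the standard library. The column names follow the ones used by the training code (`comment_text` plus the six label columns); the two rows themselves are invented placeholders, not taken from the dataset.

```python
import csv
import io

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Two invented rows in the assumed column layout (placeholders, not real data).
SAMPLE_CSV = """comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate
"Thanks for the helpful edit!",0,0,0,0,0,0
"placeholder for an abusive comment",1,0,1,0,1,0
"""


def read_rows(handle):
    """Yield (text, label_vector) pairs from an open CSV file handle."""
    for row in csv.DictReader(handle):
        yield row["comment_text"], [int(row[name]) for name in LABELS]


rows = list(read_rows(io.StringIO(SAMPLE_CSV)))
for text, labels in rows:
    # Collect the names of the labels set to 1 for this comment.
    active = [name for name, value in zip(LABELS, labels) if value == 1]
    print(text, "->", active or "clean")
```

The second row shows the multi-label case: one comment carries three positive labels at once.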

# Models

We compare three main approaches:

1. Model 0 – Baseline (TF-IDF + Logistic Regression)
    - Vectorizes text into TF-IDF features.
    - Trains one Logistic Regression classifier per label via a One-vs-Rest wrapper.
    - Fast to train and serves as a strong, interpretable baseline. 
2. Model 1 – Embedding + Average Pooling + MLP
    - Uses a learned embedding layer to map tokens to dense vectors.
    - Averages token embeddings over the sequence.
    - Feeds the pooled vector into a small MLP (fully-connected network) for prediction.
    - Captures more semantic information than TF-IDF while staying simple and efficient.
3. Model 2 – Embedding + Bi-LSTM (Best Model)
    - Uses embeddings followed by a Bi-LSTM to capture word order and context from both directions.
    - The final forward and backward hidden states are concatenated and passed through a fully-connected layer for multi-label classification.
    - Chosen as the best of the three models; its saved weights drive the interactive demo.
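
For readers unfamiliar with the TF-IDF features used by Model 0, the following is a minimal sketch of a textbook variant (term frequency times `log(N / df)`) in plain Python. It is only meant to show the idea; scikit-learn's `TfidfVectorizer` applies IDF smoothing and L2 normalization by default, so its numbers will differ. The function name `tfidf` and the toy sentences are assumptions made for this example.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df).

    Textbook variant for illustration, not scikit-learn's exact formula.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


toy_docs = ["you are great", "you are awful", "great work"]
toy_weights = tfidf(toy_docs)

# In the second document, the rarer word "awful" outweighs the common "you".
print(toy_weights[1]["awful"] > toy_weights[1]["you"])  # True
```

The point of the weighting is that words shared by many comments contribute little, while words concentrated in a few comments dominate the feature vector.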

# Model 0 – Baseline (TF-IDF + Logistic Regression)

In [None]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import f1_score, classification_report

df = pd.read_csv("data/raw/train.csv")

label_columns = [
    "toxic",
    "severe_toxic",
    "obscene",
    "threat",
    "insult",
    "identity_hate",
]

texts = df["comment_text"].fillna("").tolist()
y = df[label_columns].values

X_train_texts, X_val_texts, y_train, y_val = train_test_split(
    texts,
    y,
    test_size=0.2,
    random_state=42,
    stratify=y[:, 0],
)

vectorizer = TfidfVectorizer(
    max_features=100_000,      
    ngram_range=(1, 2),
    stop_words="english",
    lowercase=True,
)

X_train = vectorizer.fit_transform(X_train_texts)  
X_val = vectorizer.transform(X_val_texts)

base_clf = LogisticRegression(
    solver="liblinear",
    max_iter=1000,
    class_weight="balanced",
)

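# One binary classifier per label, fit on the (n_samples, 6) indicator matrix.
clf = OneVsRestClassifier(base_clf)

clf.fit(X_train, y_train)

# Per-label probabilities, thresholded at 0.5 to get binary predictions.
y_proba = clf.predict_proba(X_val)
y_pred = (y_proba >= 0.5).astype(int)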

print("F1 micro:", f1_score(y_val, y_pred, average="micro"))
print("F1 macro:", f1_score(y_val, y_pred, average="macro"))
print(classification_report(y_val, y_pred, target_names=label_columns))
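
The two F1 scores printed above aggregate the six labels differently. Micro-F1 pools true positives, false positives and false negatives over all labels before computing a single score, so frequent labels dominate. Macro-F1 computes F1 per label and averages the results, so a rare label counts as much as a frequent one. The sketch below reproduces both in plain Python on a tiny invented example with two labels; the function names are hypothetical and the data is not from the project.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Compute (micro_f1, macro_f1) for multi-label 0/1 matrices given as lists."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        tp = fp = fn = 0
        for t_row, p_row in zip(y_true, y_pred):
            if p_row[j] == 1 and t_row[j] == 1:
                tp += 1
            elif p_row[j] == 1:
                fp += 1
            elif t_row[j] == 1:
                fn += 1
        counts.append((tp, fp, fn))

    # Macro: average of per-label F1 scores.
    macro = sum(f1_from_counts(*c) for c in counts) / n_labels
    # Micro: one F1 score from the pooled counts.
    micro = f1_from_counts(
        sum(c[0] for c in counts),
        sum(c[1] for c in counts),
        sum(c[2] for c in counts),
    )
    return micro, macro


# Invented example: label 0 is common and always right, label 1 is rare and missed.
toy_true = [[1, 0], [1, 0], [1, 0], [0, 1]]
toy_pred = [[1, 0], [1, 0], [1, 0], [0, 0]]
micro, macro = micro_macro_f1(toy_true, toy_pred)
print(f"micro={micro:.3f} macro={macro:.3f}")  # micro=0.857 macro=0.500
```

On imbalanced data like this, a large gap between the two scores usually signals that the rare labels are being predicted poorly.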

# Model 1 – Embedding + Average Pooling + MLP

In [None]:
import os
import re
import collections
import numpy as np
import pandas as pd

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, classification_report

DATA_CSV_PATH = os.path.join("data", "raw", "train.csv")

MAX_VOCAB_SIZE = 20000
MAX_LEN = 100
EMBED_DIM = 128
HIDDEN_DIM = 128
BATCH_SIZE = 64
EPOCHS = 5
LEARNING_RATE = 1e-3

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

LABEL_COLUMNS = [
    "toxic",
    "severe_toxic",
    "obscene",
    "threat",
    "insult",
    "identity_hate",
]


if not os.path.exists(DATA_CSV_PATH):
    raise FileNotFoundError(
        f"Could not find {DATA_CSV_PATH}. "
        f"Make sure train.csv is at data/raw/train.csv"
    )

df = pd.read_csv(DATA_CSV_PATH)

texts = df["comment_text"].fillna("").tolist()
y = df[LABEL_COLUMNS].values.astype(np.float32)

print("Total samples:", len(texts))
print("Label matrix shape:", y.shape)

X_train_texts, X_val_texts, y_train, y_val = train_test_split(
    texts,
    y,
    test_size = 0.2,
    random_state = 42,
    stratify = y[:,0],
)

print("Train samples:", len(X_train_texts))
print("Val samples:", len(X_val_texts))


def simple_tokenize(text):
    """
    Very simple whitespace + punctuation tokenizer.
    Lowercases and splits on non-letter characters.
    """
    text = text.lower()
    # replace non-letters with space
    text = re.sub(r"[^a-z]+", " ", text)
    tokens = text.strip().split()
    return tokens

counter = collections.Counter()
for t in X_train_texts:
    counter.update(simple_tokenize(t))

most_common = counter.most_common(MAX_VOCAB_SIZE - 2)
itos = ["<pad>", "<unk>"] + [w  for (w,_) in most_common]
stoi = {w : i for i, w in enumerate(itos)}

PAD_IDX = stoi["<pad>"]
UNK_IDX = stoi["<unk>"]

vocab_size = len(itos)
print("Vocab size:", vocab_size)


def encode_text(text, max_len=MAX_LEN):
    """
    Convert raw text to a fixed-length list of token IDs.
    Unknown words -> UNK_IDX, pad/truncate to max_len.
    """
    tokens = simple_tokenize(text)
    ids = [stoi.get(tok, UNK_IDX) for tok in tokens]

    if len(ids) < max_len:
        ids = ids + [PAD_IDX] * (max_len - len(ids))
    else:
        ids = ids[:max_len]

    return np.array(ids, dtype = np.int64)


class ToxicCommentsDataset(Dataset):
    def __init__(self, texts, labels, max_len=MAX_LEN):
        self.texts = texts
        self.labels = labels
        self.max_len = max_len

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        input_ids = encode_text(self.texts[idx], self.max_len)
        label_vec = self.labels[idx]
        return torch.tensor(input_ids, dtype=torch.long), \
               torch.tensor(label_vec, dtype=torch.float32)
    
train_ds = ToxicCommentsDataset(X_train_texts, y_train, max_len = MAX_LEN)
val_ds = ToxicCommentsDataset(X_val_texts, y_val, max_len = MAX_LEN)

train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle = True)
val_loader = DataLoader(val_ds, batch_size=BATCH_SIZE)

print("Batches (train):", len(train_loader))
print("Batches (val):", len(val_loader))

class ToxicAvgEmbModel(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_labels, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.fc1 = nn.Linear(embed_dim, hidden_dim)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.5)
        self.fc2 = nn.Linear(hidden_dim, num_labels)

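    def forward(self, input_ids):
        emb = self.embedding(input_ids)   # (batch, seq_len, embed_dim)
        # Mean over the sequence axis; padding positions count toward the average.
        avg_emb = emb.mean(dim = 1)       # (batch, embed_dim)
        x = self.fc1(avg_emb)
        x = self.relu(x)
        x = self.dropout(x)
        logits = self.fc2(x)              # (batch, num_labels), raw scores
        return logits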
    
model = ToxicAvgEmbModel(
    vocab_size=vocab_size,
    embed_dim=EMBED_DIM,
    hidden_dim=HIDDEN_DIM,
    num_labels=len(LABEL_COLUMNS),
    pad_idx=PAD_IDX,
).to(DEVICE)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr = LEARNING_RATE)

def train_one_epoch(model, loader, optimizer, criterion, device):
    model.train()
    total_loss = 0.0

    for input_ids, labels in loader:
        input_ids = input_ids.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()
        logits = model(input_ids)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()

        total_loss += loss.item() * input_ids.size(0)
    
    avg_loss = total_loss / len(loader.dataset)
    return avg_loss

def evaluate(model, loader, criterion, device, threshold = 0.5):
    model.eval()
    total_loss  = 0.0
    all_true = []
    all_pred = []

    with torch.no_grad():
        for input_ids, labels in loader:
            input_ids = input_ids.to(device)
            labels = labels.to(device)

            logits = model(input_ids)
            loss = criterion(logits, labels)
            total_loss += loss.item() * input_ids.size(0)

            probs = torch.sigmoid(logits).cpu().numpy()
            all_pred.append(probs)
            all_true.append(labels.cpu().numpy())

    avg_loss = total_loss / len(loader.dataset)

    y_true = np.vstack(all_true)
    y_scores = np.vstack(all_pred)
    y_hat = (y_scores >= threshold).astype(int)

    f1_micro = f1_score(y_true, y_hat, average = "micro", zero_division=0)
    f1_macro = f1_score(y_true, y_hat, average="macro", zero_division=0)

    return avg_loss, f1_micro, f1_macro, y_hat, y_true

best_val_f1 = 0.0

for epoch in range(1, EPOCHS + 1):
    train_loss = train_one_epoch(model, train_loader, optimizer, criterion, DEVICE)
    # evaluate() returns predictions before ground truth, so unpack in that order.
    val_loss, f1_micro, f1_macro, y_hat, y_true = evaluate(
        model, val_loader, criterion, DEVICE
    )
    print(
        f"Epoch {epoch:02d} | "
        f"train loss: {train_loss:.4f} | "
        f"val loss: {val_loss:.4f} | "
        f"F1 micro: {f1_micro:.4f} | "
        f"F1 macro: {f1_macro:.4f}"
    )
    # Checkpoint inside the loop so every epoch is compared with the best so far.
    if f1_micro > best_val_f1:
        best_val_f1 = f1_micro
        os.makedirs("saved_models", exist_ok=True)
        save_path = os.path.join("saved_models", "model1_avgemb.pt")
        torch.save(model.state_dict(), save_path)
        print(f"  -> New best model saved to {save_path}")

print("\nClassification report (last epoch):")
print(classification_report(
    y_true, y_hat, target_names=LABEL_COLUMNS, zero_division=0
))
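
One detail of the pooling step in `ToxicAvgEmbModel` is worth spelling out. Every comment is padded to `MAX_LEN`, and `emb.mean(dim=1)` divides by the full padded length. Since the padding row of an `nn.Embedding` with `padding_idx` is a zero vector, short comments end up with averages that are scaled down. A possible alternative, not what the code above does, is to average only over non-padding positions. The sketch below contrasts the two on plain Python lists so it runs without PyTorch; the function names and the toy vectors are assumptions made for illustration.

```python
PAD_IDX = 0


def plain_mean(vectors):
    """Average over every position, padding included (like mean over dim=1)."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def masked_mean(token_ids, vectors, pad_idx=PAD_IDX):
    """Average only the vectors at non-padding positions."""
    kept = [v for tok, v in zip(token_ids, vectors) if tok != pad_idx]
    if not kept:
        # An all-padding input has nothing to average; return a zero vector.
        return [0.0] * len(vectors[0])
    dim = len(kept[0])
    return [sum(v[d] for v in kept) / len(kept) for d in range(dim)]


# Toy sequence: two real tokens followed by two padding positions.
toy_ids = [5, 7, PAD_IDX, PAD_IDX]
toy_vectors = [[2.0, 4.0], [4.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

print(plain_mean(toy_vectors))            # [1.5, 1.0]
print(masked_mean(toy_ids, toy_vectors))  # [3.0, 2.0]
```

In PyTorch the same idea would typically be expressed by multiplying the embeddings with a `(input_ids != pad_idx)` mask and dividing by the per-row count of real tokens. Whether it improves the scores here would need to be checked empirically.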

# Model 2 – Embedding + Bi-LSTM

In [None]:
import os
import re
import collections
import numpy as np
import pandas as pd

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, classification_report

DATA_CSV_PATH = os.path.join("data", "raw", "train.csv")

MAX_VOCAB_SIZE = 20000
MAX_LEN = 100
EMBED_DIM = 128
HIDDEN_DIM = 128
NUM_LAYERS = 1
BATCH_SIZE = 64
EPOCHS = 5
LEARNING_RATE = 1e-3

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

LABEL_COLUMNS = [
    "toxic",
    "severe_toxic",
    "obscene",
    "threat",
    "insult",
    "identity_hate",
]


if not os.path.exists(DATA_CSV_PATH):
    raise FileNotFoundError(
        f"Could not find {DATA_CSV_PATH}. "
        f"Make sure train.csv is at data/raw/train.csv"
    )

df = pd.read_csv(DATA_CSV_PATH)

texts = df["comment_text"].fillna("").tolist()
y = df[LABEL_COLUMNS].values.astype(np.float32)

print("Total samples:", len(texts))
print("Label matrix shape:", y.shape)

X_train_texts, X_val_texts, y_train, y_val = train_test_split(
    texts,
    y,
    test_size = 0.2,
    random_state = 42,
    stratify = y[:,0],
)

print("Train samples:", len(X_train_texts))
print("Val samples:", len(X_val_texts))


def simple_tokenize(text):
    """
    Very simple whitespace + punctuation tokenizer.
    Lowercases and splits on non-letter characters.
    """
    text = text.lower()
    # replace non-letters with space
    text = re.sub(r"[^a-z]+", " ", text)
    tokens = text.strip().split()
    return tokens

counter = collections.Counter()
for t in X_train_texts:
    counter.update(simple_tokenize(t))

most_common = counter.most_common(MAX_VOCAB_SIZE - 2)
itos = ["<pad>", "<unk>"] + [w  for (w,_) in most_common]
stoi = {w : i for i, w in enumerate(itos)}

PAD_IDX = stoi["<pad>"]
UNK_IDX = stoi["<unk>"]

vocab_size = len(itos)
print("Vocab size:", vocab_size)


def encode_text(text, max_len=MAX_LEN):
    """
    Convert raw text to a fixed-length list of token IDs.
    Unknown words -> UNK_IDX, pad/truncate to max_len.
    """
    tokens = simple_tokenize(text)
    ids = [stoi.get(tok, UNK_IDX) for tok in tokens]

    if len(ids) < max_len:
        ids = ids + [PAD_IDX] * (max_len - len(ids))
    else:
        ids = ids[:max_len]

    return np.array(ids, dtype = np.int64)


class ToxicCommentsDataset(Dataset):
    def __init__(self, texts, labels, max_len=MAX_LEN):
        self.texts = texts
        self.labels = labels
        self.max_len = max_len

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        input_ids = encode_text(self.texts[idx], self.max_len)
        label_vec = self.labels[idx]
        return torch.tensor(input_ids, dtype=torch.long), \
               torch.tensor(label_vec, dtype=torch.float32)
    
train_ds = ToxicCommentsDataset(X_train_texts, y_train, max_len = MAX_LEN)
val_ds = ToxicCommentsDataset(X_val_texts, y_val, max_len = MAX_LEN)

train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle = True)
val_loader = DataLoader(val_ds, batch_size=BATCH_SIZE)

print("Batches (train):", len(train_loader))
print("Batches (val):", len(val_loader))

class ToxicBiLSTMModel(nn.Module):
    def __init__(
            self,
            vocab_size,
            embed_dim,
            hidden_dim,
            num_layers,
            num_labels,
            pad_idx = 0,

    ):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.lstm = nn.LSTM(
            embed_dim,
            hidden_dim,
            num_layers = num_layers,
            batch_first = True,
            bidirectional = True,
        )
        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(2 * hidden_dim, num_labels)

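    def forward(self, input_ids):
        emb = self.embedding(input_ids)     # (batch, seq_len, embed_dim)
        outputs, (h_n, c_n) = self.lstm(emb)
        # h_n: (num_layers * 2, batch, hidden_dim); the last two entries are the
        # final layer's forward and backward hidden states.
        h_forward = h_n[-2,:,:]
        h_backward = h_n[-1,:,:]
        h_cat = torch.cat([h_forward, h_backward], dim = 1)   # (batch, 2 * hidden_dim)
        x = self.dropout(h_cat)
        logits = self.fc(x)
        return logits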
    
model = ToxicBiLSTMModel(
    vocab_size = vocab_size,
    embed_dim = EMBED_DIM,
    hidden_dim = HIDDEN_DIM,
    num_layers = NUM_LAYERS,
    num_labels = len(LABEL_COLUMNS),
    pad_idx = PAD_IDX, 
).to(DEVICE)

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr = LEARNING_RATE)

def train_one_epoch(model, loader, optimizer, criterion, device):
    model.train()
    total_loss = 0.0

    for input_ids, labels in loader:
        input_ids = input_ids.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()
        logits = model(input_ids)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()

        total_loss += loss.item() * input_ids.size(0)
    
    avg_loss = total_loss / len(loader.dataset)
    return avg_loss

def evaluate(model, loader, criterion, device, threshold = 0.5):
    model.eval()
    total_loss  = 0.0
    all_true = []
    all_pred = []

    with torch.no_grad():
        for input_ids, labels in loader:
            input_ids = input_ids.to(device)
            labels = labels.to(device)

            logits = model(input_ids)
            loss = criterion(logits, labels)
            total_loss += loss.item() * input_ids.size(0)

            probs = torch.sigmoid(logits).cpu().numpy()
            all_pred.append(probs)
            all_true.append(labels.cpu().numpy())

    avg_loss = total_loss / len(loader.dataset)

    y_true = np.vstack(all_true)
    y_scores = np.vstack(all_pred)
    y_hat = (y_scores >= threshold).astype(int)

    f1_micro = f1_score(y_true, y_hat, average = "micro", zero_division=0)
    f1_macro = f1_score(y_true, y_hat, average="macro", zero_division=0)

    return avg_loss, f1_micro, f1_macro, y_hat, y_true

best_val_f1 = 0.0

for epoch in range(1, EPOCHS + 1):
    train_loss = train_one_epoch(model, train_loader, optimizer, criterion, DEVICE)
    # evaluate() returns predictions before ground truth, so unpack in that order.
    val_loss, f1_micro, f1_macro, y_hat, y_true = evaluate(
        model, val_loader, criterion, DEVICE
    )
    print(
        f"Epoch {epoch:02d} | "
        f"train loss: {train_loss:.4f} | "
        f"val loss: {val_loss:.4f} | "
        f"F1 micro: {f1_micro:.4f} | "
        f"F1 macro: {f1_macro:.4f}"
    )
    # Checkpoint inside the loop so every epoch is compared with the best so far.
    if f1_micro > best_val_f1:
        best_val_f1 = f1_micro
        os.makedirs("saved_models", exist_ok=True)
        # The file name (leading space included) must match the path the demo loads.
        save_path = os.path.join("saved_models", " model2_bilstm.pt")
        torch.save(model.state_dict(), save_path)
        print(f"  -> New best model saved to {save_path}")

print("\nClassification report (last epoch):")
print(classification_report(
    y_true, y_hat, target_names=LABEL_COLUMNS, zero_division=0
))
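
To see what the tokenizer and encoder shared by Model 1 and Model 2 actually produce, the sketch below runs the same logic on a tiny hand-written vocabulary. The vocabulary `toy_itos` and the helper name `encode` are hypothetical and exist only for this example; the real vocabulary is built from the training split, and the real `encode_text` returns a NumPy array rather than a list.

```python
import re

PAD_IDX, UNK_IDX = 0, 1

# Hypothetical five-entry vocabulary, for illustration only.
toy_itos = ["<pad>", "<unk>", "you", "are", "great"]
toy_stoi = {word: idx for idx, word in enumerate(toy_itos)}


def simple_tokenize(text):
    """Lowercase, replace every run of non a-z characters with a space, split."""
    return re.sub(r"[^a-z]+", " ", text.lower()).split()


def encode(text, max_len):
    """Map tokens to ids, then truncate or right-pad to exactly max_len."""
    ids = [toy_stoi.get(tok, UNK_IDX) for tok in simple_tokenize(text)]
    ids = ids[:max_len]
    return ids + [PAD_IDX] * (max_len - len(ids))


print(simple_tokenize("don't u 2"))             # ['don', 't', 'u']
print(encode("You are GREAT!!!", 6))            # [2, 3, 4, 0, 0, 0]
print(encode("you are not great at all", 4))    # [2, 3, 1, 4]
```

Two consequences follow from this design. Digits, punctuation and non-ASCII letters are discarded, so obfuscated spellings can fall back to `<unk>`. The padding ids are also appended on the right and fed through the Bi-LSTM like any other token; packing the sequences (for example with `torch.nn.utils.rnn.pack_padded_sequence`) is one option if that turns out to matter.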

# Interactive CLI Demo Using Best Model

In [None]:
import os
import re
import collections
import numpy as np
import pandas as pd

import torch
import torch.nn as nn

from sklearn.model_selection import train_test_split

DATA_CSV_PATH = os.path.join("data", "raw", "train.csv")

MAX_VOCAB_SIZE = 20000
MAX_LEN = 100
EMBED_DIM = 128
HIDDEN_DIM = 128
NUM_LAYERS = 1

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

LABEL_COLUMNS = [
    "toxic",
    "severe_toxic",
    "obscene",
    "threat",
    "insult",
    "identity_hate",
]


if not os.path.exists(DATA_CSV_PATH):
    raise FileNotFoundError(
        f"Could not find {DATA_CSV_PATH}. "
        f"Make sure train.csv is at data/raw/train.csv"
    )

df = pd.read_csv(DATA_CSV_PATH)

texts = df["comment_text"].fillna("").tolist()
y = df[LABEL_COLUMNS].values.astype(np.float32)

# Same split and seed as Model 2, so the rebuilt vocabulary matches the one used in training.
X_train_texts, X_val_texts, y_train, y_val = train_test_split(
    texts,
    y,
    test_size=0.2,
    random_state=42,
    stratify=y[:, 0],
)

def simple_tokenize(text):
    """
    Same simple tokenizer as model2.py
    """
    text = text.lower()
    text = re.sub(r"[^a-z]+", " ", text)
    tokens = text.strip().split()
    return tokens


counter = collections.Counter()
for t in X_train_texts:
    counter.update(simple_tokenize(t))

most_common = counter.most_common(MAX_VOCAB_SIZE - 2)

itos = ["<pad>", "<unk>"] + [w for (w, _) in most_common]
stoi = {w: i for i, w in enumerate(itos)}

PAD_IDX = stoi["<pad>"]
UNK_IDX = stoi["<unk>"]
vocab_size = len(itos)


def encode_text(text, max_len=MAX_LEN):
    """
    Convert text to fixed-length list of token IDs.
    """
    tokens = simple_tokenize(text)
    ids = [stoi.get(tok, UNK_IDX) for tok in tokens]

    if len(ids) < max_len:
        ids = ids + [PAD_IDX] * (max_len - len(ids))
    else:
        ids = ids[:max_len]

    return np.array(ids, dtype=np.int64)

class ToxicBiLSTMModel(nn.Module):
    def __init__(
        self,
        vocab_size,
        embed_dim,
        hidden_dim,
        num_layers,
        num_labels,
        pad_idx=0,
    ):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.lstm = nn.LSTM(
            embed_dim,
            hidden_dim,
            num_layers=num_layers,
            batch_first=True,
            bidirectional=True,
        )
        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, input_ids):
        emb = self.embedding(input_ids)     # (batch, seq_len, embed_dim)
        outputs, (h_n, c_n) = self.lstm(emb)
        # last layer's forward and backward hidden states
        h_forward = h_n[-2, :, :]
        h_backward = h_n[-1, :, :]
        h_cat = torch.cat([h_forward, h_backward], dim=1)
        x = self.dropout(h_cat)
        logits = self.fc(x)
        return logits
    
model_path = os.path.join("saved_models", " model2_bilstm.pt")
if not os.path.exists(model_path):
    raise FileNotFoundError(
        f"Could not find {model_path}. "
        f"Train model2.py first so it saves the weights."
    )

model = ToxicBiLSTMModel(
    vocab_size=vocab_size,
    embed_dim=EMBED_DIM,
    hidden_dim=HIDDEN_DIM,
    num_layers=NUM_LAYERS,
    num_labels=len(LABEL_COLUMNS),
    pad_idx=PAD_IDX,
).to(DEVICE)

state_dict = torch.load(model_path, map_location=DEVICE)
model.load_state_dict(state_dict)
model.eval()

print("Loaded model2_bilstm.pt")
print("Vocab size:", vocab_size)
print("Ready for interactive demo.\n")

def predict_comment(text, threshold=0.5):
    """
    Encode text, run model, return label probabilities and predictions.
    """
    ids = encode_text(text, MAX_LEN)
    input_ids = torch.tensor(ids, dtype=torch.long).unsqueeze(0).to(DEVICE)  # (1, seq_len)

    with torch.no_grad():
        logits = model(input_ids)
        probs = torch.sigmoid(logits).cpu().numpy()[0]  # shape: (6,)

    predictions = (probs >= threshold).astype(int)
    return probs, predictions


def pretty_print_predictions(text, probs, preds, threshold=0.5):
    print("\nInput comment:")
    print(text)
    print("\nPredicted labels (threshold = {:.2f}):".format(threshold))
    for label, p, pred in zip(LABEL_COLUMNS, probs, preds):
        status = "YES" if pred == 1 else "no"
        print(f"  {label:13s}  ->  {status:3s}  (p = {p:.3f})")
    print("-" * 50)


if __name__ == "__main__":
    print("Type a comment to classify. Type 'quit' or 'exit' to stop.\n")

    while True:
        user_input = input("Your comment: ").strip()
        if user_input.lower() in {"quit", "exit"}:
            print("Exiting demo.")
            break

        if not user_input:
            print("Please type something.\n")
            continue

        probs, preds = predict_comment(user_input, threshold=0.5)
        pretty_print_predictions(user_input, probs, preds, threshold=0.5)
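
The demo rebuilds the vocabulary from `train.csv` using the same split and seed as training, because the saved weights are only meaningful if every token maps to the same id it had during training. One possible alternative, which is not part of this project, is to store the vocabulary list next to the weights and reload it at demo time. The sketch below shows that idea with the standard `json` module; the function names and the `vocab.json` file name are assumptions.

```python
import json
import os
import tempfile


def save_vocab(itos, path):
    """Write the index-to-token list to a JSON file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(itos, f)


def load_vocab(path):
    """Read the token list back and rebuild the token-to-index mapping."""
    with open(path, encoding="utf-8") as f:
        itos = json.load(f)
    stoi = {word: idx for idx, word in enumerate(itos)}
    return itos, stoi


# Round-trip a toy vocabulary through a temporary directory.
toy_itos = ["<pad>", "<unk>", "you", "are"]
with tempfile.TemporaryDirectory() as tmp_dir:
    vocab_path = os.path.join(tmp_dir, "vocab.json")  # hypothetical file name
    save_vocab(toy_itos, vocab_path)
    loaded_itos, loaded_stoi = load_vocab(vocab_path)

print(loaded_itos == toy_itos, loaded_stoi["you"])  # True 2
```

With a saved vocabulary the demo would no longer need the training CSV at all, at the cost of one more file that has to stay in sync with the weights.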