# Deep Learning Paradigm: Bidirectional GRU
I selected a **Bidirectional GRU** as the neural network paradigm for this problem. While Naive Bayes relies on simple word counts, this Recurrent Neural Network (RNN) processes text as a sequence, allowing it to capture context and word order. This offers a distinct trade-off compared to the statistical approach:

- **Efficiency:** It is computationally expensive. While Naive Bayes trained instantly, this model needed significantly more time (approx. 40 minutes) for just 2 epochs of training.
- **Performance:** The computational cost buys a large gain in predictive power. As detailed below, the Bi-GRU outperforms Naive Bayes on every metric reported, particularly on the difficult, rare classes.
- **Context Awareness:** Unlike the TF-IDF model, which only checks which words are present and so has no notion that "not" modifies "good" in "not good", the Bi-GRU reads the tokens in order and can learn that "not" negates the word after it.
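To make the word-order point concrete, here is a minimal sketch that uses plain Python `collections.Counter` as a stand-in for a unigram bag-of-words. Any TF-IDF weighting is computed from these same per-document counts, so it inherits the same blindness to order. The two example sentences are made up for illustration.

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Unigram counts: records which words occur and how often, but not their order."""
    return Counter(text.lower().replace(",", " ").split())

a = "good, not bad"
b = "bad, not good"

# Identical counts, so any unigram TF-IDF vector built from them is identical too.
print(bag_of_words(a) == bag_of_words(b))  # True

# A sequence model instead sees two different token orders.
print(a.replace(",", "").split() == b.replace(",", "").split())  # False
```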

In [29]:
import os

import numpy as np
import pandas as pd

from tensorflow.keras.models import Model, load_model
from tensorflow.keras.layers import Input, Dense, Embedding, Bidirectional, GRU, GlobalMaxPool1D, Dropout
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

from sklearn.metrics import roc_auc_score
from sklearn.metrics import f1_score

In [30]:
train = pd.read_csv('data/train.csv')
list_classes = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
y = train[list_classes].values
list_sentences = train["comment_text"].fillna("_na_").values

In the statistical approach, I used TF-IDF to weight words by rarity. Here, I use word embeddings instead: each word is mapped to a 100-dimensional pre-trained GloVe vector. This lets the model exploit semantic relationships: words used in similar contexts, such as "stupid" and "idiot", tend to have nearby vectors, whereas TF-IDF treats them as completely unrelated (orthogonal) features. I limit the vocabulary to the 20,000 most frequent words and pad or truncate every comment to a fixed length of 200 tokens.
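The orthogonality claim can be checked with a few lines of NumPy. The dense vectors below are made-up 4-dimensional toy values, not real GloVe entries; they only illustrate that dense embeddings can place related words close together, while one-hot (TF-IDF-style) features always keep distinct words at a cosine similarity of zero.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# One-hot features over a toy 3-word vocabulary ["stupid", "idiot", "weather"]:
# distinct words always get orthogonal vectors.
one_hot = np.eye(3)
print(cosine(one_hot[0], one_hot[1]))  # 0.0

# Hypothetical dense embeddings (illustrative numbers, NOT GloVe values).
toy_embeddings = {
    "stupid": np.array([0.8, 0.1, -0.3, 0.5]),
    "idiot": np.array([0.7, 0.2, -0.4, 0.6]),
    "weather": np.array([-0.2, 0.9, 0.4, -0.1]),
}
print(cosine(toy_embeddings["stupid"], toy_embeddings["idiot"]) > 0.9)    # True
print(cosine(toy_embeddings["stupid"], toy_embeddings["weather"]) < 0.2)  # True
```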

In [31]:
max_features = 20000
maxlen = 200
embed_size = 100

tokenizer = Tokenizer(num_words=max_features)
tokenizer.fit_on_texts(list(list_sentences))
list_tokenized_train = tokenizer.texts_to_sequences(list_sentences)
X_t = pad_sequences(list_tokenized_train, maxlen=maxlen)
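By default, Keras' `pad_sequences` both pads and truncates at the *start* of each sequence (`padding='pre'`, `truncating='pre'`), so a comment longer than `maxlen` keeps its last 200 tokens and a shorter one is left-filled with index 0. The NumPy function below is a simplified re-implementation of just those defaults for illustration; `pad_pre` is a hypothetical name, not a Keras API.

```python
import numpy as np

def pad_pre(sequences, maxlen):
    """Mimic pad_sequences' defaults: left-pad with 0 and keep the LAST maxlen tokens."""
    out = np.zeros((len(sequences), maxlen), dtype="int32")
    for row, seq in enumerate(sequences):
        kept = list(seq)[-maxlen:]  # truncating='pre' drops tokens from the beginning
        if kept:                    # guard: an empty slice would otherwise target the whole row
            out[row, -len(kept):] = kept  # padding='pre' leaves the zeros on the left
    return out

print(pad_pre([[5, 8], [1, 2, 3, 4, 5, 6]], maxlen=4))
# [[0 0 5 8]
#  [3 4 5 6]]
```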

In [32]:
embeddings_index = {}
with open('data/glove.6B.100d.txt', encoding='utf8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs

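word_index = tokenizer.word_index
# One row per possible token index; row 0 is the padding index and stays zero.
# Sized to max_features so it always matches Embedding(max_features, embed_size) below.
embedding_matrix = np.zeros((max_features, embed_size))
for word, i in word_index.items():
    if i >= max_features:
        continue  # the Tokenizer only emits indices below num_words
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector  # words absent from GloVe keep an all-zero row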
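Words missing from GloVe keep an all-zero row, so it may be worth checking how much of the kept vocabulary actually received a pre-trained vector. `glove_coverage` below is a hypothetical helper, shown on toy dictionaries; in the notebook one would pass `tokenizer.word_index`, `embeddings_index`, and `max_features`.

```python
def glove_coverage(word_index, embeddings_index, max_features):
    """Fraction of kept vocabulary words (index < max_features) that have a pre-trained vector."""
    kept = [w for w, i in word_index.items() if i < max_features]
    hits = sum(1 for w in kept if w in embeddings_index)
    return hits / len(kept) if kept else 0.0

# Toy inputs: index 3 falls outside max_features=3, and "u" has no vector.
toy_word_index = {"you": 1, "u": 2, "idiot": 3}
toy_embeddings = {"you": [0.1, 0.2], "idiot": [0.3, 0.4]}
print(glove_coverage(toy_word_index, toy_embeddings, max_features=3))  # 0.5
```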

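I define a Keras model that starts with an Embedding layer initialized with the GloVe weights (and left trainable, so the vectors can be fine-tuned). This feeds a Bidirectional GRU with 128 units per direction, which reads the comment both forwards and backwards and returns the concatenated 256-dimensional hidden state at every time step. GlobalMaxPool1D then keeps, for each of those 256 features, its maximum over the 200 time steps, so the strongest signal of toxicity anywhere in the comment survives. The pooled vector passes through a 50-unit ReLU Dense layer (followed by 10% dropout) and finally a 6-unit sigmoid layer, which outputs one independent probability per label, as multi-label classification requires.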
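The shape bookkeeping after the recurrent layer can be sketched in NumPy. This is not the trained model: the Bi-GRU output is replaced by random numbers of the shape that layer would emit, `(batch, 200, 256)`, the Dense weights are random, and dropout is omitted as it would be at inference time. It only shows what GlobalMaxPool1D and the sigmoid head compute.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, maxlen, gru_units = 2, 200, 128

# Stand-in for Bidirectional(GRU(128, return_sequences=True)) output:
# forward and backward states concatenated -> 2 * 128 = 256 features per time step.
seq_out = rng.normal(size=(batch, maxlen, 2 * gru_units))

# GlobalMaxPool1D: max over the time axis, one value per feature.
pooled = seq_out.max(axis=1)                       # (2, 256)

# Dense(50, relu) with random weights (illustration only).
w1, b1 = rng.normal(scale=0.05, size=(256, 50)), np.zeros(50)
hidden = np.maximum(pooled @ w1 + b1, 0.0)         # (2, 50)

# Dense(6, sigmoid): six independent probabilities, one per label (they need not sum to 1).
w2, b2 = rng.normal(scale=0.05, size=(50, 6)), np.zeros(6)
probs = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))  # (2, 6)

print(pooled.shape, hidden.shape, probs.shape)  # (2, 256) (2, 50) (2, 6)
```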

In [33]:
inp = Input(shape=(maxlen,))
x = Embedding(max_features, embed_size, weights=[embedding_matrix])(inp)
x = Bidirectional(GRU(128, return_sequences=True, dropout=0.1, recurrent_dropout=0.1))(x)
x = GlobalMaxPool1D()(x)
x = Dense(50, activation="relu")(x)
x = Dropout(0.1)(x)
x = Dense(6, activation="sigmoid")(x)
model = Model(inputs=inp, outputs=x)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
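As a worked check on model size, the parameter counts follow directly from the layer sizes. The GRU formula assumes Keras' default `reset_after=True`, under which each of the three gates has an input kernel, a recurrent kernel, and two bias vectors; if that assumption holds, `model.summary()` should report the same total. Since the embedding is trainable, it accounts for over 90% of the weights.

```python
max_features, embed_size, units = 20000, 100, 128

embedding = max_features * embed_size                     # 2,000,000
# GRU with reset_after=True: 3 gates x (input kernel + recurrent kernel + 2 biases)
gru_one_direction = 3 * units * (embed_size + units + 2)  # 88,320
bi_gru = 2 * gru_one_direction                            # 176,640
dense_50 = (2 * units) * 50 + 50                          # 12,850
dense_6 = 50 * 6 + 6                                      # 306

total = embedding + bi_gru + dense_50 + dense_6
print(f"{total:,}")  # 2,189,796
```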

In [None]:
os.makedirs("models", exist_ok=True)

history = model.fit(
    X_t, 
    y, 
    batch_size=32, 
    epochs=2, 
    validation_split=0.1, 
)

model.save("models/toxic_model.keras")
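Keras documents that `validation_split` holds out the *last* fraction of the samples, taken before any shuffling, so the 10% used for validation here is simply the tail of `X_t`. The boundary arithmetic below is a sketch of how I understand Keras computes it (`int(n * (1 - split))`); note that it can differ by one row from `int(n * split)`. The size `n = 15` is a toy value.

```python
def keras_style_split(n: int, validation_split: float):
    """Return (train_rows, val_rows) for a tail split at int(n * (1 - validation_split))."""
    split_at = int(n * (1 - validation_split))
    return split_at, n - split_at

n = 15
train_rows, val_rows = keras_style_split(n, 0.1)
print(train_rows, val_rows)  # 13 2
print(int(n * 0.1))          # 1 (a tail of this size would miss one validation row)
```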

In [None]:
model = load_model("models/toxic_model.keras")



In [36]:
test = pd.read_csv('data/test.csv')
list_classes = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
list_sentences_test = test["comment_text"].fillna("_na_").values
y_test_true = test[list_classes].values

list_tokenized_test = tokenizer.texts_to_sequences(list_sentences_test)
X_test = pad_sequences(list_tokenized_test, maxlen=maxlen)

y_test_pred = model.predict(X_test)

[1m2000/2000[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m120s[0m 60ms/step


I achieved a Macro AUC of 0.9802, a significant improvement over the Naive Bayes score of 0.9315. The largest gain is in the threat class, which rose from 0.90 with Naive Bayes to 0.99 here. This suggests that the neural network is much better at ranking the rare, context-dependent classes that the statistical model struggled with.
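ROC AUC is threshold-free: it equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting half. That is how a class can reach a near-perfect AUC yet a modest F1 at a single fixed threshold. A minimal pure-Python sketch of this pairwise definition, on made-up scores (quadratic in the number of examples, so for illustration only):

```python
def pairwise_auc(y_true, scores):
    """ROC AUC via its ranking definition: P(score of a positive > score of a negative)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up scores: every positive outranks every negative, so AUC is 1.0 ...
y_true = [0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.40]
print(pairwise_auc(y_true, scores))  # 1.0

# ... yet a 0.5 threshold flags none of the positives.
print([int(s > 0.5) for s in scores])  # [0, 0, 0, 0, 0]
```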

In [None]:
mean_auc = roc_auc_score(y_test_true, y_test_pred, average='macro')

print(mean_auc)

per_class_auc_scores = roc_auc_score(y_test_true, y_test_pred, average=None)

auc_report = pd.DataFrame({
    'Class': list_classes,
    'AUC_ROC_Score': per_class_auc_scores
})

auc_report_sorted = auc_report.sort_values(by='AUC_ROC_Score', ascending=False)
print(auc_report_sorted.to_string(index=False))

0.9801604154054294
        Class  AUC_ROC_Score
       threat       0.990243
 severe_toxic       0.989750
identity_hate       0.981485
      obscene       0.977440
       insult       0.975145
        toxic       0.966900
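With `average='macro'`, `roc_auc_score` takes the unweighted mean of the per-class AUCs, so every label counts equally no matter how rare it is. Averaging the six values printed above reproduces the reported score:

```python
per_class_auc = {
    "threat": 0.990243, "severe_toxic": 0.989750, "identity_hate": 0.981485,
    "obscene": 0.977440, "insult": 0.975145, "toxic": 0.966900,
}
macro_auc = sum(per_class_auc.values()) / len(per_class_auc)
print(f"{macro_auc:.4f}")  # 0.9802
```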


In [None]:
# Match Keras' validation_split boundary: the validation rows are the tail after int(n * 0.9).
split_at = int(len(X_t) * (1 - 0.1))
X_val = X_t[split_at:]
y_val = y[split_at:]

y_val_proba = model.predict(X_val)

[1m499/499[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m27s[0m 54ms/step


In [None]:
thresholds = np.arange(0.1, 0.9, 0.05)
best_threshold = 0.5
best_f1 = 0

for thresh in thresholds:
    y_val_pred_binary = (y_val_proba > thresh).astype(int)
    
    # One global threshold for all six labels, chosen to maximize micro-averaged F1
    current_f1 = f1_score(y_val, y_val_pred_binary, average='micro')
    
    if current_f1 > best_f1:
        best_f1 = current_f1
        best_threshold = thresh

print(f"\nBest Threshold found: {best_threshold:.2f}")


Best Threshold found: 0.45


By sweeping the decision threshold on the validation split (maximizing micro-averaged F1), I found the best global threshold to be 0.45. Applied to the test set, it gives a Macro F1 of 0.5764 and a Micro F1 of 0.6563.

This confirms that the deep learning approach outperforms the statistical baseline (0.36 Macro F1). While the Naive Bayes model needed a drastic threshold drop to 0.25 to detect the minority classes, the Bi-GRU's best threshold sits close to the default of 0.5. The biggest improvements are in the rare classes: 'threat' rose from 0.00 to 0.42, and 'identity_hate' from 0.13 to 0.60, suggesting that the semantic embeddings capture context that the TF-IDF features lacked.

In [35]:
y_test_pred_binary_simple = (y_test_pred > best_threshold).astype(int)

macro_f1_simple = f1_score(y_test_true, y_test_pred_binary_simple, average='macro')
micro_f1_simple = f1_score(y_test_true, y_test_pred_binary_simple, average='micro')

print(f"Macro F1: {macro_f1_simple:.4f}")
print(f"Micro F1: {micro_f1_simple:.4f}")

simple_f1_scores = f1_score(y_test_true, y_test_pred_binary_simple, average=None)

f1_report_simple = pd.DataFrame({
    'Class': list_classes,
    'F1_Score': simple_f1_scores
})

print(f1_report_simple.sort_values(by='F1_Score', ascending=False).to_string(index=False))



Macro F1: 0.5764
Micro F1: 0.6563
        Class  F1_Score
      obscene  0.676128
       insult  0.666005
        toxic  0.664232
identity_hate  0.595273
 severe_toxic  0.432251
       threat  0.424658
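Macro F1 averages the six per-class F1 scores equally, while micro F1 pools true positives, false positives, and false negatives across all labels before computing a single F1, so frequent labels dominate it. That is why the weaker rare classes ('severe_toxic', 'threat') pull Macro F1 below Micro F1. The first part of the sketch uses made-up counts for two hypothetical labels, not the notebook's confusion counts; the second part averages the six F1 scores printed above.

```python
def f1(tp, fp, fn):
    """F1 from counts: 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0

# Hypothetical per-label counts (TP, FP, FN): one frequent, well-predicted label and one rare, hard one.
counts = {"frequent": (800, 200, 200), "rare": (5, 10, 15)}

macro = sum(f1(*c) for c in counts.values()) / len(counts)
tp, fp, fn = (sum(c[k] for c in counts.values()) for k in range(3))
micro = f1(tp, fp, fn)
print(f"macro={macro:.3f} micro={micro:.3f}")  # macro=0.543 micro=0.791

# The reported Macro F1 is the plain mean of the six per-class scores above.
reported = [0.676128, 0.666005, 0.664232, 0.595273, 0.432251, 0.424658]
print(f"{sum(reported) / len(reported):.4f}")  # 0.5764
```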
