# 📊 Dataset Preparation: Religious Hate Speech Classification

This notebook prepares the dataset used for training a deep learning model to detect **religious hate speech** in online comments.

We use the [`civil_comments`](https://huggingface.co/datasets/civil_comments) dataset from Hugging Face, originally released as part of the [Jigsaw Unintended Bias in Toxicity Classification](https://www.kaggle.com/competitions/jigsaw-unintended-bias-in-toxicity-classification) challenge.

---

## 🧪 Steps in this notebook:

1. **Load dataset** from Hugging Face
2. **Detect religion-related comments** using keyword-based filtering
3. **Apply weak labeling** to define hate speech: `mentions_religion AND toxicity > 0.5`
4. **Split dataset** into train / validation / test sets (stratified)
5. **Handle class imbalance** by upsampling hate comments in the training set only
6. **Save final datasets** to CSV files for downstream training
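
Step 2 relies on whole-word keyword matching. The sketch below shows how `\b` word boundaries behave on a few made-up comments. The shortened keyword list, the compiled `PATTERN`, and the `matches_keyword` helper are illustrative assumptions, not the notebook's full filter.

```python
import re

# Shortened, illustrative keyword list (the notebook uses a longer one).
KEYWORDS = ["muslim", "jew", "god", "religion"]

# One compiled alternation; \b restricts matches to whole words.
PATTERN = re.compile(r"\b(?:" + "|".join(map(re.escape, KEYWORDS)) + r")\b")

def matches_keyword(text: str) -> bool:
    """Return True if any keyword occurs as a whole word, ignoring case."""
    return PATTERN.search(text.lower()) is not None

examples = [
    "My neighbor is Muslim.",        # whole word: match
    "Oh my god, what a game!",       # match, although not about religion
    "Muslims were mentioned here.",  # plural: no match with \b on both sides
    "A godly amount of snow fell.",  # "god" inside a longer word: no match
]
results = [matches_keyword(t) for t in examples]
print(results)
```

Whole-word matching avoids substring false positives such as "godly", but it also skips plurals such as "Muslims" and still accepts casual uses such as "oh my god". The keyword filter is therefore best read as a rough proxy for religious content.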

---

## 📁 Output files:

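All data is saved in the `data/` folder (`../data/` relative to this notebook):
- `train.csv`, `val.csv`, `test.csv` → original distribution (imbalanced); note that the cells shown below do not write these files
- `train_balanced.csv` → training split, upsampled to a 50/50 class balance
- `val_balanced.csv`, `test_balanced.csv` → validation and test splits; despite the `_balanced` suffix, they keep the original (imbalanced) distribution, because upsampling is applied to the training split only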

---


In [17]:
# 🧠 Dataset prep for Religious Hate Detection (data.ipynb)

# ✅ 1. Install datasets package if needed
!pip install datasets --quiet

# ✅ 2. Load dataset
from datasets import load_dataset
import pandas as pd
import re

print("🔄 Loading 'civil_comments' dataset...")
dataset = load_dataset("civil_comments")
df = dataset['train'].to_pandas()

# ✅ 3. Drop rows with missing text
df = df[df['text'].notna()]

# ✅ 4. Define religion-related keywords
religion_keywords = [
    "muslim", "islam", "islamic", "jew", "jewish", "judaism",
    "christian", "christianity", "bible", "jesus", "god", "catholic", "pope",
    "hindu", "hinduism", "buddha", "buddhist", "atheist", "religion", "religious"
]

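def mentions_religion(text):
    """Return True if any keyword appears as a whole word (case-insensitive)."""
    text = str(text).lower()
    # \b matches whole words only, so plurals such as "muslims" are not matched
    return any(re.search(rf"\b{kw}\b", text) for kw in religion_keywords)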

# ✅ 5. Apply religion detection + weak labeling
df['mentions_religion'] = df['text'].apply(mentions_religion)
df['religious_hate'] = (df['mentions_religion']) & (df['toxicity'] > 0.5)
df_filtered = df[df['mentions_religion']].copy()
# 🧹 Drop duplicates to prevent data leakage
df_filtered = df_filtered.drop_duplicates(subset="text").reset_index(drop=True)
df_filtered['label'] = df_filtered['religious_hate'].astype(int)

# ✅ 6. Show basic stats
print("🔢 Label distribution:")
print(df_filtered['label'].value_counts())




🔄 Loading 'civil_comments' dataset...
🔢 Label distribution:
label
0    89372
1     6762
Name: count, dtype: int64


## Dataset Labeling and Class Imbalance

After loading the `civil_comments` dataset and labeling comments that:
- (1) Mention religion (using keyword matching), and
- (2) Have high toxicity scores (`toxicity > 0.5`),

we found that only a small fraction of comments were labeled as **religious hate speech**.

| Label        | Count   | Percent |
|--------------|---------|---------|
| Non-Hate     | 89,372  | ~93%    |
| Hate         | 6,762   | ~7%     |

This class imbalance is a problem for training deep learning models: a model can reach a low loss by predicting the majority class almost everywhere while ignoring the minority class. Our earlier model achieved high accuracy, but very low recall and F1-score on the hate class.
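
To make the accuracy point concrete, the sketch below scores a hypothetical baseline that labels every comment as non-hate. The counts come from the label distribution printed above; the baseline itself is an illustration, not the earlier model mentioned in the text.

```python
# Label counts taken from the distribution printed after deduplication.
n_non_hate = 89_372
n_hate = 6_762

# Hypothetical baseline: predict "non-hate" for every comment.
true_positives = 0            # no hate comment is ever flagged
false_negatives = n_hate      # every hate comment is missed
true_negatives = n_non_hate   # every non-hate comment is correct
false_positives = 0

accuracy = (true_positives + true_negatives) / (n_non_hate + n_hate)
recall = true_positives / (true_positives + false_negatives)

print(f"accuracy    = {accuracy:.3f}")
print(f"hate recall = {recall:.3f}")
```

The baseline reaches roughly 93% accuracy while its recall on the hate class is zero, which is why accuracy alone is a misleading metric for this dataset.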

To address this:
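- We **upsample** the hate class in the training split only (sampling those examples with replacement until both classes have the same size)
- We train on the resulting **balanced training set**
- We keep the validation and test splits at their original, imbalanced distribution for later comparison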

This approach will allow the model to learn more meaningful patterns related to religious hate speech in a balanced setting, and then later be tested on imbalanced, real-world data.
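
The order of splitting and upsampling matters, because upsampling before the split can place copies of the same comment in both train and test. The toy sketch below illustrates this. As assumptions for reproducibility, a deterministic every-fifth-row split stands in for a random stratified split, and cyclic repetition stands in for sampling with replacement; all helper names are hypothetical.

```python
from typing import List, Tuple

Row = Tuple[str, int]  # (text, label); label 1 = hate, 0 = non-hate

def split_every_fifth(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Toy deterministic split: every fifth row goes to the test set."""
    test = rows[::5]
    train = [row for i, row in enumerate(rows) if i % 5 != 0]
    return train, test

def upsample_minority(rows: List[Row]) -> List[Row]:
    """Repeat hate rows cyclically until both classes have the same size."""
    hate = [r for r in rows if r[1] == 1]
    non_hate = [r for r in rows if r[1] == 0]
    repeated = [hate[i % len(hate)] for i in range(len(non_hate))]
    return non_hate + repeated

def shared_texts(train: List[Row], test: List[Row]) -> int:
    """Count distinct texts that occur in both splits."""
    return len({t for t, _ in train} & {t for t, _ in test})

rows = [(f"neutral comment {i}", 0) for i in range(40)]
rows += [(f"hateful comment {i}", 1) for i in range(4)]

# Wrong order: upsample first, then split, so copies land on both sides.
train_a, test_a = split_every_fifth(upsample_minority(rows))
leaked_wrong_order = shared_texts(train_a, test_a)

# Order used in this notebook: split first, then upsample the training split only.
train_b, test_b = split_every_fifth(rows)
train_b = upsample_minority(train_b)
leaked_right_order = shared_texts(train_b, test_b)

print(leaked_wrong_order, leaked_right_order)
```

In this toy setup, every hate comment leaks into both splits when upsampling comes first, while splitting first leaves no shared texts and still yields a balanced training set.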


In [18]:
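# ✅ 7. Train / Val / Test Split
from sklearn.model_selection import train_test_split

# Sanity check: duplicates were dropped in the previous cell, so this should print 0
duplicate_count = df_filtered.duplicated(subset="text").sum()
print(f"🧠 Duplicate comments (same text): {duplicate_count}")

# ✅ Split before upsampling so duplicated hate rows cannot leak into val/test
# (80% train, 10% val, 10% test, stratified by label)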
train_df, temp_df = train_test_split(
    df_filtered, test_size=0.2, stratify=df_filtered["label"], random_state=42
)

val_df, test_df = train_test_split(
    temp_df, test_size=0.5, stratify=temp_df["label"], random_state=42
)

# ✅ Upsample hate in the training set only
df_hate = train_df[train_df["label"] == 1]
df_non_hate = train_df[train_df["label"] == 0]

df_hate_upsampled = df_hate.sample(n=len(df_non_hate), replace=True, random_state=42)
train_balanced_df = pd.concat([df_non_hate, df_hate_upsampled]).sample(frac=1, random_state=42)

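# ✅ Save (only the training split is upsampled; val/test keep the original
# distribution even though their filenames carry the "_balanced" suffix)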
train_balanced_df.to_csv("../data/train_balanced.csv", index=False)
val_df.to_csv("../data/val_balanced.csv", index=False)
test_df.to_csv("../data/test_balanced.csv", index=False)


🧠 Duplicate comments (same text): 0


In [19]:
overlap = set(train_df['text']) & set(test_df['text'])
print(f"⚠️ Overlapping comments: {len(overlap)}")

⚠️ Overlapping comments: 0
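
The check above compares only the train and test splits. A minimal sketch of extending it to every pair of splits follows; `pairwise_overlap` is a hypothetical helper and the toy strings stand in for the `text` columns of `train_df`, `val_df`, and `test_df`.

```python
from itertools import combinations
from typing import Dict, Iterable, Tuple

def pairwise_overlap(splits: Dict[str, Iterable[str]]) -> Dict[Tuple[str, str], int]:
    """Return the number of texts shared by each pair of named splits."""
    as_sets = {name: set(texts) for name, texts in splits.items()}
    return {
        (a, b): len(as_sets[a] & as_sets[b])
        for a, b in combinations(as_sets, 2)
    }

# Toy data; in the notebook the values would be the "text" column of each split.
toy_splits = {
    "train": ["a", "b", "c"],
    "val": ["d", "e"],
    "test": ["f", "c"],  # "c" deliberately leaks from train
}
overlaps = pairwise_overlap(toy_splits)
for pair, count in overlaps.items():
    print(pair, count)
```

Any non-zero count signals leakage between that pair of splits and would be worth investigating before training.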
