# Out-of-Domain Generalization: Hate Speech Detection

This notebook evaluates how well domain-specific classifiers (finance, healthcare, law) handle out-of-domain input, using hate speech as the test case.
Each model architecture is trained on domain classification and is then checked for whether it rejects hate speech as out-of-domain (label 0).

Key aspects evaluated:
- Few-shot transfer capabilities using DSPy
- Model robustness across different datasets

## Model Overview

This notebook evaluates several classification models:
1. **LLM-based**: Using Qwen 2.5 for zero-shot classification
2. **BERT-based**: ModernBERT with NLI approach
3. **Traditional ML**: 
   - fastText for efficient text classification
   - SVM and XGBoost with different embeddings

Each model is evaluated on hate speech detection as an out-of-domain task.

## Setup
Import dependencies and initialize models

In [None]:
import os
import pickle as pkl
import random
import statistics
import time
from functools import partial

import cupy as cp
import fasttext
import numpy as np
import pandas as pd
import onnxruntime as ort

from dotenv import load_dotenv
from fastembed import TextEmbedding
from tqdm import tqdm
from xgboost import XGBClassifier
from transformers import AutoTokenizer

from prompt_classifier.metrics import evaluate_run

load_dotenv()
random.seed(22)
#os.environ["CUDA_VISIBLE_DEVICES"] = "0"

In [2]:
def save_batch_results(batch_results, model_name):
    filename = f'reports/batch_{model_name}.csv'
    
    df = pd.DataFrame(batch_results)
    
    if os.path.exists(filename):
        # If file exists, append without header
        df.to_csv(filename, mode='a', header=False, index=False)
    else:
        # If file doesn't exist, create new with header
        df.to_csv(filename, index=False)

def load_batch_data():
    batch_data = pd.read_csv('data/batch_data.csv')
    return batch_data["prompt"].values.tolist()

In [3]:
print(os.getcwd())
batch_data = load_batch_data()
batch_sizes = [1, 32, 64, 128, 256]

/app


## Load Test Datasets

Load various hate speech datasets for evaluation:
- Jigsaw Toxicity
- OLID
- HateXplain
- TUKE Slovak

In [4]:
domain_data = pd.read_csv("data/domain_eval.csv")
ood_data = pd.read_csv("data/ood_eval.csv")

In [5]:
# jigsaw_splits = {
#     "train": "train_dataset.csv",
#     "validation": "val_dataset.csv",
#     "test": "test_dataset.csv",
# }
# jigsaw_df = pd.read_csv(
#     "hf://datasets/Arsive/toxicity_classification_jigsaw/"
#     + jigsaw_splits["validation"]
# )

# jigsaw_df = jigsaw_df[
#     (jigsaw_df["toxic"] == 1)
#     | (jigsaw_df["severe_toxic"] == 1)
#     | (jigsaw_df["obscene"] == 1)
#     | (jigsaw_df["threat"] == 1)
#     | (jigsaw_df["insult"] == 1)
#     | (jigsaw_df["identity_hate"] == 1)
# ]

# jigsaw_df = jigsaw_df.rename(columns={"comment_text": "prompt"})
# jigsaw_df["label"] = 0
# jigsaw_df = jigsaw_df[["prompt", "label"]]
# jigsaw_df = jigsaw_df.dropna(subset=["prompt"])
# jigsaw_df = jigsaw_df[jigsaw_df["prompt"].str.strip() != ""]

# # Load OLID dataset
# olid_splits = {"train": "train.csv", "test": "test.csv"}
# olid_df = pd.read_csv("hf://datasets/christophsonntag/OLID/" + olid_splits["test"])
# olid_df = olid_df.rename(columns={"cleaned_tweet": "prompt"})
# olid_df["label"] = 0
# olid_df = olid_df[["prompt", "label"]]
# olid_df = olid_df.dropna(subset=["prompt"])
# olid_df = olid_df[olid_df["prompt"].str.strip() != ""]

# # Load hateXplain dataset
# hate_xplain = pd.read_parquet(
#     "hf://datasets/nirmalendu01/hateXplain_filtered/data/train-00000-of-00001.parquet"
# )
# hate_xplain = hate_xplain.rename(columns={"test_case": "prompt"})
# hate_xplain = hate_xplain[(hate_xplain["gold_label"] == "hateful")]
# hate_xplain["label"] = 0
# hate_xplain = hate_xplain[["prompt", "label"]]
# hate_xplain = hate_xplain.dropna(subset=["prompt"])
# hate_xplain = hate_xplain[hate_xplain["prompt"].str.strip() != ""]

# # Load TUKE Slovak dataset
# tuke_sk_splits = {"train": "train.json", "test": "test.json"}
# tuke_sk_df = pd.read_json(
#     "hf://datasets/TUKE-KEMT/hate_speech_slovak/" + tuke_sk_splits["test"],
#     lines=True,
# )
# tuke_sk_df = tuke_sk_df.rename(columns={"text": "prompt"})
# tuke_sk_df = tuke_sk_df[tuke_sk_df["label"] == 0]
# tuke_sk_df = tuke_sk_df[["prompt", "label"]]
# tuke_sk_df = tuke_sk_df.dropna(subset=["prompt"])
# tuke_sk_df = tuke_sk_df[tuke_sk_df["prompt"].str.strip() != ""]


# dkk_all = pd.read_parquet("data/test-00000-of-00001.parquet")
# dkk_all = dkk_all.rename(columns={"text": "prompt"})
# dkk_all["label"] = 0
# dkk_all = dkk_all.dropna(subset=["prompt"])
# dkk_all = dkk_all[dkk_all["prompt"].str.strip() != ""]

# splits = {'train': 'data/train-00000-of-00001.parquet', 'test': 'data/test-00000-of-00001.parquet'}
# web_questions = pd.read_parquet("hf://datasets/Stanford/web_questions/" + splits["test"])

# web_questions['prompt'] = web_questions['question']
# web_questions['label'] = 0
# web_questions['dataset'] = 'web_questions'
# web_questions = web_questions[['prompt', 'label']]

# splits = {'train': 'data/train-00000-of-00001-7ebb9cdef03dd950.parquet', 'test': 'data/test-00000-of-00001-fbd3905b045b12b8.parquet'}
# ml_questions = pd.read_parquet("hf://datasets/mjphayes/machine_learning_questions/" + splits["test"])

# ml_questions['prompt'] = ml_questions['question']
# ml_questions['label'] = 0
# ml_questions['dataset'] = 'machine_learning_questions'
# ml_questions = ml_questions[['prompt', 'label']]

# datasets = {
#     "jigsaw": jigsaw_df,
#     "olid": olid_df,
#     "hate_xplain": hate_xplain,
#     "tuke_sk": tuke_sk_df,
#     "dkk": dkk_all,
#     "web_questions": web_questions,
#     "ml_questions": ml_questions,
# }

## Datasets Overview

We use four major hate speech datasets:

1. **Jigsaw Toxicity**
   - Multi-label toxicity classification
   - Includes toxic, severe_toxic, obscene, threat, insult, identity_hate labels

2. **OLID (Offensive Language Identification Dataset)**
   - Hierarchical labeling of offensive language
   - Focuses on Twitter content

3. **HateXplain**
   - Annotated with rationales for hate speech
   - Includes target community information

4. **TUKE Slovak**
   - Slovak language hate speech dataset
   - Tests cross-lingual generalization
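
The commented-out loading cell keeps a Jigsaw comment when any of its six toxicity flags is set, drops empty prompts, and assigns label 0 (out-of-domain). A minimal sketch of that filter, using plain dictionaries instead of a DataFrame; the helper name `to_ood_samples` and the sample rows are invented for illustration:

```python
TOXICITY_COLUMNS = [
    "toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate",
]


def to_ood_samples(rows):
    """Keep rows with at least one toxicity flag and a non-empty text; label them 0."""
    samples = []
    for row in rows:
        text = row.get("comment_text")
        has_text = bool(text) and text.strip() != ""
        is_flagged = any(row.get(col, 0) == 1 for col in TOXICITY_COLUMNS)
        if has_text and is_flagged:
            # Label 0 marks the sample as belonging to none of the domains.
            samples.append({"prompt": text, "label": 0})
    return samples


# Hypothetical rows, not taken from the real dataset.
rows = [
    {"comment_text": "example insult", "insult": 1},
    {"comment_text": "a neutral comment"},
    {"comment_text": "   ", "toxic": 1},
]
ood_samples = to_ood_samples(rows)
```

Only the first row survives: the second has no toxicity flag and the third has a blank prompt.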

## Embedding Models

Using multiple embedding approaches:
- **BAAI BGE**: Optimized for semantic similarity
- **MiniLM**: Efficient sentence transformers model
- **TF-IDF**: Traditional bag-of-words approach

These embeddings are used with SVM and XGBoost classifiers.

In [6]:
baai_embedding = TextEmbedding(
    model_name="BAAI/bge-small-en-v1.5", 
    providers=["CUDAExecutionProvider"]
)

mini_embedding = TextEmbedding(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    providers=["CUDAExecutionProvider"],
)

# TF-IDF
with open("models/tfidf_finance.pkl", "rb") as f:
    tfidf_finance = pkl.load(f)
with open("models/tfidf_healthcare.pkl", "rb") as f:
    tfidf_healthcare = pkl.load(f)
with open("models/tfidf_law.pkl", "rb") as f:
    tfidf_law = pkl.load(f)

2025-04-24 16:54:00.771001950 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2025-04-24 16:54:00.771054357 [W:onnxruntime:, session_state.cc:1170 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2025-04-24 16:54:01.560910484 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2025-04-24 16:54:01.560986724 [W:onnxruntime:, session_state.cc:1170 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.


## Model Evaluation Process

For each dataset and model combination:
1. Load pre-trained domain classifiers
2. Process test samples through each domain classifier
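3. Combine predictions using OR logic: the combined prediction is 1 if any domain classifier predicts 1, and 0 (out-of-domain, the expected label for hate speech) only if all three predict 0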
4. Calculate metrics:
   - Accuracy, Precision, Recall
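
The combination rule and the metrics can be sketched as follows. This is a dependency-free illustration, not the implementation behind `evaluate_run`; `combine_or` and `binary_metrics` are hypothetical helper names, and the positive class is assumed to be label 1 (in-domain).

```python
def combine_or(finance, healthcare, law):
    """Return 1 when any domain classifier predicts 1, else 0."""
    return [1 if (f or h or l) else 0 for f, h, l in zip(finance, healthcare, law)]


def binary_metrics(predictions, true_labels, positive=1):
    """Compute accuracy, precision and recall for a binary task."""
    pairs = list(zip(predictions, true_labels))
    tp = sum(1 for p, t in pairs if p == positive and t == positive)
    fp = sum(1 for p, t in pairs if p == positive and t != positive)
    fn = sum(1 for p, t in pairs if p != positive and t == positive)
    correct = sum(1 for p, t in pairs if p == t)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        # Guard against empty denominators, which occur on all-negative data.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy example: four hate speech samples (label 0); the "law" classifier
# wrongly claims the second one as in-domain.
preds = combine_or([0, 0, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0])
metrics = binary_metrics(preds, [0, 0, 0, 0])
```

On a test set that contains only label 0, precision and recall for the positive class are degenerate (there are no true positives to find), so accuracy is the informative number there.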

# LLM

In [7]:
# model_name = "ollama_chat/qwen2.5:14b"

# for domain, inference_df in datasets.items():
#     try:
#         llm_classifier_finance = LlmClassifier(
#             api_key="",
#             api_base="http://localhost:11434",
#             model_name=model_name,
#             domain="finance",
#             train_data=inference_df,
#             test_data=inference_df,
#         )

#         llm_classifier_healthcare = LlmClassifier(
#             api_key="",
#             api_base="http://localhost:11434",
#             model_name=model_name,
#             domain="healthcare",
#             train_data=inference_df,
#             test_data=inference_df,
#         )

#         llm_classifier_law = LlmClassifier(
#             api_key="",
#             api_base="http://localhost:11434",
#             model_name=model_name,
#             domain="law",
#             train_data=inference_df,
#             test_data=inference_df,
#         )

#         # Load models
#         llm_classifier_finance.load_model("models/qwen2.5:14b-finance.json")
#         llm_classifier_healthcare.load_model("models/qwen2.5:14b-healthcare.json")
#         llm_classifier_law.load_model("models/qwen2.5:14b-law.json")

#         predictions_llm = []
#         prediction_times_llm = []
#         actuals_llm = []

#         # Get predictions for each prompt
#         for _, row in tqdm(inference_df.iterrows(), total=len(inference_df)):
#             start_time = time.perf_counter_ns()

#             # Get predictions from all models
#             pred_finance = llm_classifier_finance.predict_single(row["prompt"])
#             pred_healthcare = llm_classifier_healthcare.predict_single(row["prompt"])
#             pred_law = llm_classifier_law.predict_single(row["prompt"])

#             end_time = time.perf_counter_ns()
#             prediction_times_llm.append(end_time - start_time)

#             # If all models predict 0, final prediction is 0, otherwise 1
#             predictions_llm.append(
#                 0 if (pred_finance == 0 and pred_healthcare == 0 and pred_law == 0) else 1
#             )
#             actuals_llm.append(row["label"])

#         evaluate_run(
#             predictions=predictions_llm,
#             true_labels=actuals_llm,
#             latency=statistics.mean(prediction_times_llm),
#             domain=domain,
#             embed_model="qwen2.5",
#             model_name=model_name,
#             train_acc=0.0,
#             cost=0.0,
#             training=False,
#         )
#     except Exception as e:
#         print(f"Error running LLM model: {e}")

# ModernBERT

In [8]:
# bert_classifier_finance = ModernBERTNLI(domain="finance")
# bert_classifier_healthcare = ModernBERTNLI(domain="healthcare")
# bert_classifier_law = ModernBERTNLI(domain="law")

# try:
#     # Move models to GPU
#     bert_classifier_finance.classifier.model.to("cuda")
#     bert_classifier_healthcare.classifier.model.to("cuda")
#     bert_classifier_law.classifier.model.to("cuda")

#     for domain, inference_df in datasets.items():
#         predictions_bert = []
#         prediction_times_bert = []
#         actuals_bert = []
#         # Get predictions for each prompt
#         for _, row in tqdm(inference_df.iterrows(), total=len(inference_df)):
#             start_time = time.perf_counter_ns()

#             # Get predictions from all models
#             pred_finance = bert_classifier_finance.predict(row["prompt"])
#             pred_healthcare = bert_classifier_healthcare.predict(row["prompt"])
#             pred_law = bert_classifier_law.predict(row["prompt"])

#             end_time = time.perf_counter_ns()
#             prediction_times_bert.append(end_time - start_time)

#             # If all models predict 0, final prediction is 0, otherwise 1
#             predictions_bert.append(
#                 0 if (pred_finance == 0 and pred_healthcare == 0 and pred_law == 0) else 1
#             )
#             actuals_bert.append(row["label"])

#         # Evaluate once per dataset, inside the loop over datasets
#         evaluate_run(
#             predictions=predictions_bert,
#             true_labels=actuals_bert,
#             latency=statistics.mean(prediction_times_bert),
#             domain=domain,
#             embed_model="BERT",
#             model_name="ModernBERT",
#             train_acc=0.0,
#             cost=0.0,
#             training=False,
#         )
# except Exception as e:
#     print(f"Error running ModernBERT models: {e}")

In [9]:
fasttext_law = fasttext.load_model("models/fastText_law_fasttext.bin")
fasttext_finance = fasttext.load_model("models/fastText_finance_fasttext.bin")
fasttext_healthcare = fasttext.load_model("models/fastText_healthcare_fasttext.bin")




# FastText

In [10]:
batch_results = []
try:
    for batch_size in batch_sizes:
        all_results = []
        num_batches = 0
        # create batches from batch_data
        batches = [
            batch_data[i : i + batch_size] for i in range(0, len(batch_data), batch_size)
        ]
        for batch in batches:
            num_batches += 1
            batch_metrics = {
                "time_taken_law": 0,
                "time_taken_finance": 0,  
                "time_taken_healthcare": 0
            }

            batch_prompt = [prompt.replace("\n", " ") for prompt in batch]
            
            # Time law predictions
            start_time = time.perf_counter()
            law_preds = fasttext_law.predict(batch_prompt)
            batch_metrics['time_taken_law'] += time.perf_counter() - start_time

            # Time finance predictions
            start_time = time.perf_counter()
            finance_preds = fasttext_finance.predict(batch_prompt)
            batch_metrics['time_taken_finance'] += time.perf_counter() - start_time

            # Time healthcare predictions
            start_time = time.perf_counter()
            health_preds = fasttext_healthcare.predict(batch_prompt)
            batch_metrics['time_taken_healthcare'] += time.perf_counter() - start_time

            results = []
            for prediction_finance, prediction_healthcare, prediction_law in zip(
                finance_preds[0], health_preds[0], law_preds[0]
            ):
                # fastText returns labels such as "__label__1"; map them to 0/1
                results.append({
                    'finance': 1 if prediction_finance[0] == "__label__1" else 0,
                    'healthcare': 1 if prediction_healthcare[0] == "__label__1" else 0,
                    'law': 1 if prediction_law[0] == "__label__1" else 0
                })

            batch_results.append({
                "batch_size": batch_size,
                "time_taken_embed": 0,
                "time_taken_law": batch_metrics['time_taken_law'],
                "time_taken_finance": batch_metrics['time_taken_finance'],
                "time_taken_healthcare": batch_metrics['time_taken_healthcare'],
                "results": results,
                "model_name": "fasttext",
                "embedding_model": "fasttext",
                "embedding": False
            })

except Exception as e:
    print(f"FastText error: {e}")
pd.DataFrame(batch_results).to_csv("reports/batch_fasttext.csv", index=False)
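
The batching and timing pattern used in the latency cells can be sketched without any model dependency. In this minimal illustration, `make_batches`, `time_call`, and `dummy_predict` are hypothetical names; `dummy_predict` only stands in for a real classifier such as the fastText models above.

```python
import time


def make_batches(items, batch_size):
    """Split items into consecutive chunks of at most batch_size elements."""
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]


def time_call(fn, batch):
    """Run fn on one batch and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(batch)
    return result, time.perf_counter() - start


def dummy_predict(batch):
    # Stand-in for a real classifier: flags prompts that mention "loan".
    return [1 if "loan" in prompt else 0 for prompt in batch]


prompts = ["loan rates", "hello", "court ruling", "loan terms", "weather"]
batches = make_batches(prompts, 2)
timed = [time_call(dummy_predict, batch) for batch in batches]
```

The last batch may be shorter than `batch_size`, so per-batch timings are best compared after normalizing by the number of prompts in each batch.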

# ML - SVM, XGB

In [None]:
for embedding_model in ["mini", "baai", "tf_idf"]:    
    svm_batch_results = []
    xgb_batch_results = []
    try:
        xgb_law = XGBClassifier(
            tree_method="hist",
            device="cuda",
            seed=22
        )
        xgb_finance = XGBClassifier(
            tree_method="hist",
            device="cuda",
            seed=22
        )
        xgb_healthcare = XGBClassifier(
            tree_method="hist",
            device="cuda",
            seed=22
        )

        xgb_law.load_model(f"models/XGBoost_law_{embedding_model}.json")
        xgb_finance.load_model(f"models/XGBoost_finance_{embedding_model}.json")
        xgb_healthcare.load_model(f"models/XGBoost_healthcare_{embedding_model}.json")

        for batch_size in batch_sizes:
            svm_all_results = []
            xgb_all_results = []
            num_batches = 0

            batches = [
                batch_data[i : i + batch_size] for i in range(0, len(batch_data), batch_size)
            ]
            for batch in batches:
                num_batches += 1

                batch_metrics = {
                    "embed_time": 0,
                    "svm_law_time": 0,
                    "svm_finance_time": 0,
                    "svm_health_time": 0,
                    "xgb_law_time": 0,
                    "xgb_finance_time": 0,
                    "xgb_health_time": 0
                }

                # Time embeddings
                start_time = time.perf_counter()
                if embedding_model == "tf_idf":
                    embeds = tfidf_finance.transform(batch)
                    embeds = embeds.toarray()
                elif embedding_model == "mini":
                    embeds = list(mini_embedding.embed(batch))
                else: # baai
                    embeds = list(baai_embedding.embed(batch))
                batch_metrics['embed_time'] += time.perf_counter() - start_time

                embeds = cp.array(embeds)

                # XGB predictions
                start_time = time.perf_counter()
                xgb_law_preds = xgb_law.predict(embeds)
                batch_metrics['xgb_law_time'] += time.perf_counter() - start_time

                start_time = time.perf_counter()
                xgb_finance_preds = xgb_finance.predict(embeds)
                batch_metrics['xgb_finance_time'] += time.perf_counter() - start_time

                start_time = time.perf_counter()
                xgb_health_preds = xgb_healthcare.predict(embeds)
                batch_metrics['xgb_health_time'] += time.perf_counter() - start_time

                xgb_batch_preds = [1 if (l or f or h) else 0
                                  for l,f,h in zip(xgb_law_preds, xgb_finance_preds, xgb_health_preds)]


                xgb_batch_results.append({
                    "batch_size": batch_size,
                    "time_taken_embed": batch_metrics['embed_time'],
                    "time_taken_law": batch_metrics['xgb_law_time'],
                    "time_taken_finance": batch_metrics['xgb_finance_time'],
                    "time_taken_healthcare": batch_metrics['xgb_health_time'],
                    "results": xgb_batch_preds,
                    "model_name": "xgb",
                    "embedding_model": embedding_model,
                    "embedding": True
                })
        pd.DataFrame(xgb_batch_results).to_csv(f"reports/batch_xgb_{embedding_model}.csv", index=False)

    except Exception as e:
        print(f"Error processing {embedding_model}: {e}")

cuda


Potential solutions:
- Use a data structure that matches the device ordinal in the booster.
- Set the device for booster before call to inplace_predict.




cuda
cuda


In [None]:
for embedding_model in ["mini", "baai", "tf_idf"]:    
    svm_batch_results = []
    xgb_batch_results = []

    try:
        # Load models
        with open(f"models/SVM_law_{embedding_model}.pkl", "rb") as f:
            svm_law = pkl.load(f)
        with open(f"models/SVM_finance_{embedding_model}.pkl", "rb") as f:
            svm_finance = pkl.load(f)
        with open(f"models/SVM_healthcare_{embedding_model}.pkl", "rb") as f:
            svm_healthcare = pkl.load(f)


        for batch_size in batch_sizes:
            num_batches = 0  # counts the batches processed for this batch size
            batches = [
                batch_data[i : i + batch_size] for i in range(0, len(batch_data), batch_size)
            ]
            for batch in batches:
                num_batches += 1

                batch_metrics = {
                    "embed_time": 0,
                    "svm_law_time": 0,
                    "svm_finance_time": 0,
                    "svm_health_time": 0,
                    "xgb_law_time": 0,
                    "xgb_finance_time": 0,
                    "xgb_health_time": 0
                }

                # Time embeddings
                start_time = time.perf_counter()
                if embedding_model == "tf_idf":
                    embeds = tfidf_finance.transform(batch)
                elif embedding_model == "mini":
                    embeds = np.array(list(mini_embedding.embed(batch)))
                else: # baai
                    embeds = np.array(list(baai_embedding.embed(batch)))
                batch_metrics['embed_time'] += time.perf_counter() - start_time

                # Get all predictions and time them
                start_time = time.perf_counter()
                svm_law_preds = svm_law.predict(embeds)
                batch_metrics['svm_law_time'] += time.perf_counter() - start_time

                start_time = time.perf_counter()
                svm_finance_preds = svm_finance.predict(embeds)
                batch_metrics['svm_finance_time'] += time.perf_counter() - start_time

                start_time = time.perf_counter() 
                svm_health_preds = svm_healthcare.predict(embeds)
                batch_metrics['svm_health_time'] += time.perf_counter() - start_time
                # Combine predictions - 0 only if all predict 0
                svm_batch_preds = [1 if (l or f or h) else 0
                                  for l,f,h in zip(svm_law_preds, svm_finance_preds, svm_health_preds)]
               
                # Record per-batch timings and predictions for this batch size
                svm_batch_results.append({
                    "batch_size": batch_size,
                    "time_taken_embed": batch_metrics['embed_time'],
                    "time_taken_law": batch_metrics['svm_law_time'],
                    "time_taken_finance": batch_metrics['svm_finance_time'],
                    "time_taken_healthcare": batch_metrics['svm_health_time'],
                    "results": svm_batch_preds,
                    "model_name": "svm",
                    "embedding_model": embedding_model,
                    "embedding": True
                })

        pd.DataFrame(svm_batch_results).to_csv(f"reports/batch_svm_{embedding_model}.csv", index=False)

    except Exception as e:
        print(f"Error processing {embedding_model}: {e}")

# Tibor

In [None]:
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

In [None]:
# Initialize tokenizer function for batch processing
tokenizer_func = partial(
    tokenizer, padding=True, truncation=True, return_tensors="pt", max_length=512
)

In [None]:
try:
    # Load the ONNX model, preferring the GPU and falling back to the CPU
    mlp_classifier = ort.InferenceSession(
        "models/text_classifier_optimized_int8.onnx",
        providers=['CUDAExecutionProvider', 'CPUExecutionProvider']
    )
    print("Successfully loaded the ONNX model")

except Exception as e:
    print(f"Error loading the ONNX model: {e}")


In [None]:
domain_data = pd.read_csv("data/domain_eval.csv")
ood_data = pd.read_csv("data/ood_eval.csv")

data = {"domain": domain_data, "ood": ood_data}

In [None]:
for domain, inference_df in data.items():
    print(f"\nProcessing {domain} dataset...")
    predictions_mlp = []
    prediction_times_mlp = []
    actuals_mlp = []

    try:
        for _, row in tqdm(inference_df.iterrows(), total=len(inference_df)):
            # Tokenize input text
            start_time = time.perf_counter_ns()
            inputs = tokenizer_func(row["prompt"])

            # Convert to numpy arrays for ONNX
            onnx_inputs = {
                'input_ids': inputs['input_ids'].numpy(),
                'attention_mask': inputs['attention_mask'].numpy(),
            }

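            # Run inference
            pred = mlp_classifier.run(None, onnx_inputs)[0]
            end_time = time.perf_counter_ns()
            prediction_times_mlp.append(end_time - start_time)
            predictions_mlp.append(0 if np.argmax(pred) == 3 else 1)
            # Record the ground-truth label so evaluate_run has labels to compare against
            actuals_mlp.append(row["label"])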

        # Evaluate results
        evaluate_run(
            predictions=predictions_mlp,
            true_labels=actuals_mlp,
            latency=statistics.mean(prediction_times_mlp),
            domain=domain,
            embed_model="MiniLM_L12",
            model_name="MLP_ONNX",
            train_acc=0.0,
            cost=0.0,
            training=False,
        )

    except Exception as e:
        print(f"Error processing {domain} dataset: {e}")
        continue
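
The cell above collapses the ONNX model's multi-class output into a binary decision: class index 3 maps to 0 and every other class maps to 1. A minimal sketch of that mapping in plain Python follows. Treating index 3 as the out-of-domain class is an assumption inferred from the cell above, and `to_binary` is a hypothetical helper name.

```python
OUT_OF_DOMAIN_INDEX = 3  # assumption: class 3 means "none of the domains"


def to_binary(scores, ood_index=OUT_OF_DOMAIN_INDEX):
    """Return 0 when the highest-scoring class is the out-of-domain class, else 1."""
    # Index of the first maximum, matching np.argmax on a flat array.
    best = max(range(len(scores)), key=lambda i: scores[i])
    return 0 if best == ood_index else 1


# Hypothetical score vectors with four classes each.
ood_case = to_binary([0.1, 0.2, 0.3, 2.0])
domain_case = to_binary([3.0, 0.2, 0.3, 2.0])
```

Keeping the class index in a named constant makes the mapping easy to adjust if the model's label order differs.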
