# T5-ParaDetox Pipeline
This notebook mirrors the XDetox_Pipeline structure for direct comparison:

- **Small-batch runs**: choose how many examples to process
- **Dataset picker**: run a single dataset or **all**
- **Same datasets** as XDetox (paradetox, microagressions, sbf, dynabench, jigsaw, appdia)
- **Same evaluation metrics** (BLEU, BERTScore, Perplexity, Toxicity)
- **Same output format** (CSV summaries)

> **Prereqs**: The trained T5 model checkpoint and the datasets must already be available on Drive.

## Setup

In [1]:
#@title Mount Drive & locate project
from google.colab import drive
drive.mount('/content/drive')

import os, glob, re, sys, torch, json, shutil, math, nltk
import numpy as np
import pandas as pd
from tqdm.auto import tqdm
from pathlib import Path

# Set your project base path
PROJECT_BASE = "/content/drive/MyDrive/ds266/w266 - Project"
print("Try PROJECT_BASE:", PROJECT_BASE, "->", os.path.isdir(PROJECT_BASE))

assert os.path.isdir(PROJECT_BASE), f"PROJECT_BASE does not exist: {PROJECT_BASE}"
print("Using PROJECT_BASE:", PROJECT_BASE)

Mounted at /content/drive
Try PROJECT_BASE: /content/drive/MyDrive/ds266/w266 - Project -> True
Using PROJECT_BASE: /content/drive/MyDrive/ds266/w266 - Project


In [2]:
#@title Runtime setup (paths, cache, GPU)
# HuggingFace cache
HF_CACHE = os.path.join(PROJECT_BASE, "cache")
os.environ["TRANSFORMERS_CACHE"] = HF_CACHE
os.environ["WANDB_DISABLED"] = "true"

# Model checkpoint path
T5_CHECKPOINT = os.path.join(PROJECT_BASE, "t5-base-detox-model")

print("PROJECT_BASE:", PROJECT_BASE)
print("T5_CHECKPOINT:", T5_CHECKPOINT)
print("TRANSFORMERS_CACHE:", HF_CACHE)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

PROJECT_BASE: /content/drive/MyDrive/ds266/w266 - Project
T5_CHECKPOINT: /content/drive/MyDrive/ds266/w266 - Project/t5-base-detox-model
TRANSFORMERS_CACHE: /content/drive/MyDrive/ds266/w266 - Project/cache
CUDA available: False


In [3]:
#@title Install dependencies
# No version pins: packages already present in Colab are reused; only accelerate is upgraded (-U)
!pip install -q transformers torch datasets
!pip install -q evaluate sacrebleu bert-score
!pip install -q sentence-transformers
!pip install -q accelerate -U
!pip install -q rouge_score
!pip install -q pandas numpy scikit-learn matplotlib nltk



In [4]:
#@title NLTK data
nltk.download("punkt", quiet=True)
try:
    nltk.download("punkt_tab", quiet=True)
except Exception:
    pass
print("NLTK ready")

NLTK ready


In [5]:
#@title Import libraries
from transformers import (
    T5Tokenizer,
    T5ForConditionalGeneration,
    AutoTokenizer,
    AutoModelForSequenceClassification,
    GPT2Tokenizer,
    GPT2LMHeadModel
)
from torch.utils.data import Dataset
from sentence_transformers import SentenceTransformer
from evaluate import load
from sklearn.model_selection import train_test_split

print("✓ Libraries imported")



✓ Libraries imported


## Dataset Configuration

In [15]:
#@title Data configs (matching XDetox datasets)
# Note: For T5, we don't use XDetox-specific params (alpha_a, etc.)
# But we keep the same dataset paths for consistency

# These paths are relative to the XDetox repo; load_test_data() joins them with DATASET_BASE (set below).
# This assumes the datasets sit in a shared location accessible from both notebooks.

data_configs = {
    "microagressions_test": {
        "data_path": "./datasets/microagressions/test.csv",
        "format": "csv",
    },
    "sbf_test": {
        "data_path": "./datasets/sbf/sbftst.csv",
        "format": "csv",
    },
    "dynabench_test": {
        "data_path": "./datasets/dynabench/db_test.csv",
        "format": "csv",
    },
    "jigsaw_toxic": {
        "data_path": "./datasets/jigsaw_full_30/test_10k_toxic.txt",
        "format": "txt",
    },
    "paradetox": {
        "data_path": "./datasets/paradetox/test_toxic_parallel.txt",
        "format": "txt",
    },
    "appdia_original": {
        "data_path": "./datasets/appdia/original-annotated-data/original-test.tsv",
        "format": "tsv",
    },
    "appdia_discourse": {
        "data_path": "./datasets/appdia/discourse-augmented-data/discourse-test.tsv",
        "format": "tsv",
    },
}

# If datasets are in XDetox repo, point to it:
DATASET_BASE = "/content/drive/MyDrive/ds266/w266 - Project/XDetox"

print(f"✓ {len(data_configs)} test datasets configured:")
for name in data_configs.keys():
    print(f"  - {name}")

✓ 7 test datasets configured:
  - microagressions_test
  - sbf_test
  - dynabench_test
  - jigsaw_toxic
  - paradetox
  - appdia_original
  - appdia_discourse


## Helper Functions

In [25]:
#@title Helper functions for data loading

def _ensure_dir(p):
    Path(p).mkdir(parents=True, exist_ok=True)

def load_test_data(data_type, num_examples=None):
    """
    Load test data from various formats (.txt, .csv, .tsv).
    Returns list of toxic texts as strings.
    """
    if data_type not in data_configs:
        raise ValueError(f"Unknown data_type: {data_type}")

    cfg = data_configs[data_type]
    # normpath removes the leading "./"; lstrip("./") strips characters, not a prefix
    data_path = os.path.normpath(os.path.join(DATASET_BASE, cfg["data_path"]))

    texts = []

    if cfg["format"] == "txt":
        with open(data_path, 'r', encoding='utf-8') as f:
            texts = [line.strip() for line in f if line.strip()]

    elif cfg["format"] == "csv":
        df = pd.read_csv(data_path)
        # Try to find the toxic text column
        if 'text' in df.columns:
            texts = df['text'].tolist()
        elif 'toxic' in df.columns:
            texts = df['toxic'].tolist()
        else:
            texts = df.iloc[:, 0].tolist()  # First column

    elif cfg["format"] == "tsv":
        df = pd.read_csv(data_path, sep='\t')
        if 'text' in df.columns:
            texts = df['text'].tolist()
        else:
            texts = df.iloc[:, 0].tolist()

    # Clean and convert to strings
    cleaned_texts = []
    for text in texts:
        # Skip NaN, None, or non-string/non-numeric values
        if pd.isna(text):
            continue
        # Convert to string
        text_str = str(text).strip()
        # Skip empty strings
        if text_str:
            cleaned_texts.append(text_str)

    # Limit to num_examples if specified
    if num_examples and num_examples > 0:
        cleaned_texts = cleaned_texts[:num_examples]

    return cleaned_texts

def _safe_float(x):
    try:
        return float(x)
    except Exception:
        return float('nan')

print("✓ Helper functions loaded")

✓ Helper functions loaded
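
`load_test_data` falls back to the first column when neither `text` nor `toxic` is present, which can silently return IDs or labels instead of sentences. The sketch below shows one possible stricter lookup. The helper name `pick_text_column` and the example headers are assumptions for illustration, not the column names of the actual datasets.

```python
def pick_text_column(columns, candidates=("text", "toxic")):
    """Return the first column matching a candidate name, case-insensitively.

    Raises instead of guessing, so an unexpected file layout is noticed early.
    """
    lowered = {str(col).strip().lower(): col for col in columns}
    for candidate in candidates:
        if candidate.lower() in lowered:
            return lowered[candidate.lower()]
    raise KeyError(f"None of {list(candidates)} found in columns {list(columns)}")

# Example with made-up headers
print(pick_text_column(["id", "Text", "label"]))
```

With pandas this could be used as `df[pick_text_column(df.columns)].tolist()`, so a file without a recognizable text column stops the run rather than producing metrics on the wrong field.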


## T5 Model Loading

In [26]:
#@title Load T5 model
print(f"Loading T5 model from {T5_CHECKPOINT}...")

t5_tokenizer = T5Tokenizer.from_pretrained(T5_CHECKPOINT)
t5_model = T5ForConditionalGeneration.from_pretrained(T5_CHECKPOINT)
t5_model.eval()

# Move to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
t5_model = t5_model.to(device)

print(f"✓ T5 model loaded on {device}")

Loading T5 model from /content/drive/MyDrive/ds266/w266 - Project/t5-base-detox-model...
✓ T5 model loaded on cpu


## T5 Inference Function

In [27]:
#@title T5 inference function

def t5_detoxify_text(text, model, tokenizer, max_length=128, device="cuda"):
    """
    Generate detoxified text using T5 model.
    """
    input_text = f"detoxify: {text}"
    input_ids = tokenizer.encode(input_text, return_tensors='pt', max_length=max_length, truncation=True)
    input_ids = input_ids.to(device)

    with torch.no_grad():
        outputs = model.generate(
            input_ids,
            max_length=max_length,
            num_beams=5,
            early_stopping=True,
            no_repeat_ngram_size=2
        )

    return tokenizer.decode(outputs[0], skip_special_tokens=True)

def t5_detoxify_batch(texts, model, tokenizer, max_length=128, device="cuda", batch_size=8):
    """
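    Process texts in chunks of batch_size.
    Each text is still generated one at a time; chunking only sets the
    progress-bar granularity.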
    """
    results = []

    for i in tqdm(range(0, len(texts), batch_size), desc="T5 Generation"):
        batch = texts[i:i+batch_size]
        batch_results = []

        for text in batch:
            result = t5_detoxify_text(text, model, tokenizer, max_length, device)
            batch_results.append(result)

        results.extend(batch_results)

    return results

# Test
test_text = "This is a stupid idea"
detoxified = t5_detoxify_text(test_text, t5_model, t5_tokenizer, device=device)
print(f"Test: {test_text}")
print(f"Result: {detoxified}")

Test: This is a stupid idea
Result: This is a bad idea
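
`t5_detoxify_batch` slices the inputs into chunks but still calls the model once per sentence. A possible batched variant is sketched below, assuming padded batches are acceptable for this checkpoint. Outputs under beam search may differ slightly from the per-sentence path, so this is not claimed to be a drop-in replacement. `generate_in_batches` and `make_t5_generate_fn` are hypothetical names; the tokenizer and `generate` arguments mirror the ones used in the cell above.

```python
def generate_in_batches(texts, generate_fn, batch_size=8):
    """Apply generate_fn (list[str] -> list[str]) to texts in fixed-size chunks."""
    results = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        outputs = generate_fn(batch)
        if len(outputs) != len(batch):
            raise ValueError("generate_fn must return one output per input")
        results.extend(outputs)
    return results

def make_t5_generate_fn(model, tokenizer, device="cpu", max_length=128):
    """Build a generate_fn that runs one padded generate() call per chunk."""
    def generate_fn(batch):
        enc = tokenizer(
            [f"detoxify: {text}" for text in batch],
            return_tensors="pt",
            padding=True,       # pad to the longest sentence in the chunk
            truncation=True,
            max_length=max_length,
        ).to(device)
        outputs = model.generate(
            **enc,              # passes input_ids and attention_mask
            max_length=max_length,
            num_beams=5,
            early_stopping=True,
            no_repeat_ngram_size=2,
        )
        return tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return generate_fn

# Stand-in generate_fn so the chunking logic can be checked without a model.
demo = generate_in_batches(
    ["a", "b", "c"], lambda batch: [t.upper() for t in batch], batch_size=2
)
print(demo)
```

In the notebook this might be called as `generate_in_batches(orig_texts, make_t5_generate_fn(t5_model, t5_tokenizer, device), batch_size=8)`.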


## Evaluation Functions

In [28]:
#@title Load evaluation models (matching XDetox metrics)

print("Loading evaluation models...")

# Toxicity classifier
tox_tokenizer = AutoTokenizer.from_pretrained("s-nlp/roberta_toxicity_classifier")
tox_model = AutoModelForSequenceClassification.from_pretrained("s-nlp/roberta_toxicity_classifier")
tox_model.eval()
tox_model = tox_model.to(device)

# Perplexity model (GPT-2)
ppl_tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
ppl_model = GPT2LMHeadModel.from_pretrained("gpt2-medium")
ppl_model.eval()
ppl_model = ppl_model.to(device)
if ppl_tokenizer.pad_token is None:
    ppl_tokenizer.pad_token = ppl_tokenizer.eos_token

# Sentence embeddings
sim_model = SentenceTransformer('all-MiniLM-L6-v2')

# Metrics from evaluate
bleu_metric = load("sacrebleu")
bertscore_metric = load("bertscore")

print("✓ Evaluation models loaded")

Loading evaluation models...


Some weights of the model checkpoint at s-nlp/roberta_toxicity_classifier were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


✓ Evaluation models loaded


In [29]:
#@title Evaluation functions

def compute_toxicity(texts, tokenizer, model, device="cuda", batch_size=32):
    """
    Compute average toxicity score.
    """
    all_scores = []

    for i in range(0, len(texts), batch_size):
        batch = texts[i:i+batch_size]
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True, max_length=512)
        inputs = {k: v.to(device) for k, v in inputs.items()}

        with torch.no_grad():
            outputs = model(**inputs)
            predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
            toxic_probs = predictions[:, 1]  # Label 1 = toxic
            all_scores.extend(toxic_probs.cpu().tolist())

    return np.mean(all_scores)

def compute_perplexity(texts, tokenizer, model, device="cuda"):
    """
    Compute average perplexity.
    """
    perplexities = []

    for text in texts:
        inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
        inputs = {k: v.to(device) for k, v in inputs.items()}

        with torch.no_grad():
            outputs = model(**inputs, labels=inputs["input_ids"])
            loss = outputs.loss
            ppl = torch.exp(loss).item()
            perplexities.append(ppl)

    return np.mean(perplexities)

def compute_bertscore(predictions, references):
    """
    Compute BERTScore.
    """
    result = bertscore_metric.compute(predictions=predictions, references=references, lang="en")
    return np.mean(result['f1'])

def compute_bleu(predictions, references):
    """
    Compute BLEU score.
    """
    # Format references as list of lists
    formatted_refs = [[ref] for ref in references]
    result = bleu_metric.compute(predictions=predictions, references=formatted_refs)
    return result['score'] / 100.0  # Convert to 0-1 range

def evaluate_all(orig_texts, gen_texts, device="cuda"):
    """
    Run all evaluations (matching XDetox metrics).
    """
    results = {}

    print("  Computing toxicity scores...")
    results['toxicity_gen'] = compute_toxicity(gen_texts, tox_tokenizer, tox_model, device)
    results['toxicity_orig'] = compute_toxicity(orig_texts, tox_tokenizer, tox_model, device)

    print("  Computing perplexity...")
    results['perplexity_gen'] = compute_perplexity(gen_texts, ppl_tokenizer, ppl_model, device)
    results['perplexity_orig'] = compute_perplexity(orig_texts, ppl_tokenizer, ppl_model, device)

    print("  Computing BERTScore...")
    results['bertscore'] = compute_bertscore(gen_texts, orig_texts)

    print("  Computing BLEU...")
    results['bleu4'] = compute_bleu(gen_texts, orig_texts)

    return results

print("✓ Evaluation functions defined")

✓ Evaluation functions defined
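
`compute_perplexity` exponentiates the mean token-level cross-entropy that GPT-2 returns as `loss`. Because the labels are shifted by one position, an input that tokenizes to a single token has no prediction targets and the loss becomes NaN; `np.mean` then propagates that NaN to the whole dataset. The sketch below illustrates the formula and one possible guard against such inputs. It uses hypothetical helper names and toy per-token log-probabilities rather than a real model.

```python
import math

def perplexity_from_logprobs(token_logprobs):
    """exp of the negative mean log-probability; NaN when there are no targets."""
    if not token_logprobs:
        return float("nan")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def mean_ignoring_nan(values):
    """Average the non-NaN entries and report how many were skipped."""
    finite = [v for v in values if not math.isnan(v)]
    skipped = len(values) - len(finite)
    mean = sum(finite) / len(finite) if finite else float("nan")
    return mean, skipped

# Toy log-probabilities (natural log), chosen only to make the arithmetic visible.
ppls = [
    perplexity_from_logprobs([math.log(0.5), math.log(0.5)]),  # about 2.0
    perplexity_from_logprobs([]),                              # NaN: no targets
]
mean_ppl, skipped = mean_ignoring_nan(ppls)
print(mean_ppl, skipped)
```

Reporting the number of skipped inputs alongside the mean keeps the guard from hiding a data-loading problem.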


## Main Pipeline Function

In [30]:
#@title T5 detoxify pipeline (matching XDetox interface)

def t5_detoxify_pipeline(
    data_type: str = "paradetox",
    output_folder: str = "t5_colab_run",
    batch_size: int = 8,
    max_length: int = 128,
    num_examples: int = 100,
    overwrite_gen: bool = False,
    run_eval: bool = True,
    overwrite_eval: bool = False,
):
    """
    Run T5 detoxification pipeline on a dataset.

    Mirrors XDetox's detoxify() function interface.
    """
    assert data_type in data_configs, f"Unknown data_type: {data_type}"

    # Create output directory
    base_out_dir = os.path.join(PROJECT_BASE, "data", "t5_outputs", output_folder)
    data_out_dir = os.path.join(base_out_dir, data_type)
    _ensure_dir(data_out_dir)

    orig_path = os.path.join(data_out_dir, "orig.txt")
    gen_path = os.path.join(data_out_dir, "gen.txt")
    stats_path = os.path.join(data_out_dir, "gen_stats.txt")

    # Generate if needed
    if overwrite_gen or not os.path.exists(gen_path):
        print(f"\n[{data_type}] Loading data...")
        orig_texts = load_test_data(data_type, num_examples)
        print(f"  Loaded {len(orig_texts)} examples")

        print(f"  Generating detoxified texts...")
        gen_texts = t5_detoxify_batch(
            orig_texts,
            t5_model,
            t5_tokenizer,
            max_length=max_length,
            device=device,
            batch_size=batch_size
        )

        # Save outputs
        with open(orig_path, 'w', encoding='utf-8') as f:
            for text in orig_texts:
                f.write(text + '\n')

        with open(gen_path, 'w', encoding='utf-8') as f:
            for text in gen_texts:
                f.write(text + '\n')

        print(f"  ✓ Saved outputs to {data_out_dir}")
    else:
        # Cached outputs are reused as-is, even if num_examples has changed
        print(f"\n[{data_type}] Loading existing outputs...")
        with open(orig_path, 'r', encoding='utf-8') as f:
            orig_texts = [line.strip() for line in f]
        with open(gen_path, 'r', encoding='utf-8') as f:
            gen_texts = [line.strip() for line in f]
        print(f"  Loaded {len(gen_texts)} examples")

    # Evaluate if needed
    if run_eval and (overwrite_eval or not os.path.exists(stats_path)):
        print(f"  Running evaluation...")
        results = evaluate_all(orig_texts, gen_texts, device=device)

        # Save stats
        with open(stats_path, 'w') as f:
            for key, value in results.items():
                f.write(f"{key}: {value}\n")

        print(f"  ✓ Saved stats to {stats_path}")
        return results

    elif run_eval:
        # Load existing stats
        print(f"  Loading existing stats...")
        results = {}
        with open(stats_path, 'r') as f:
            for line in f:
                if ': ' in line:
                    key, value = line.strip().split(': ', 1)
                    results[key] = _safe_float(value)
        return results

    return None

print("✓ T5 pipeline function defined")

✓ T5 pipeline function defined
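
The pipeline caches metrics in `gen_stats.txt` as `key: value` lines and parses them back on later runs. The round trip can be sketched in isolation as follows. `write_stats` and `read_stats` are hypothetical names, and the file is written to a temporary directory rather than Drive.

```python
import math
import os
import tempfile

def write_stats(path, results):
    """Write one 'key: value' line per metric."""
    with open(path, "w", encoding="utf-8") as f:
        for key, value in results.items():
            f.write(f"{key}: {value}\n")

def read_stats(path):
    """Parse 'key: value' lines back into floats; unparsable values become NaN."""
    results = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if ": " not in line:
                continue
            key, value = line.strip().split(": ", 1)
            try:
                results[key] = float(value)
            except ValueError:
                results[key] = float("nan")
    return results

with tempfile.TemporaryDirectory() as tmp:
    stats_file = os.path.join(tmp, "gen_stats.txt")
    # Placeholder numbers, not results from the notebook.
    write_stats(stats_file, {"bleu4": 0.5, "perplexity_gen": float("nan")})
    loaded = read_stats(stats_file)

print(loaded)
```

Because the pipeline reuses `gen.txt` and `gen_stats.txt` whenever they exist, changing `num_examples` has no effect on an already processed dataset unless `overwrite_gen=True` and `overwrite_eval=True` are passed.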


## Run Evaluation on All Datasets

In [31]:
#@title Run T5 on multiple datasets

# Configuration
datasets_to_eval = ["paradetox", "microagressions_test", "sbf_test", "dynabench_test"]
num_examples = 200  # Match XDetox
output_folder = "t5_comparison_run"

# Store results
all_results = {}

print("="*80)
print("T5-PARADETOX PIPELINE")
print("="*80)

for dataset_name in datasets_to_eval:
    try:
        results = t5_detoxify_pipeline(
            data_type=dataset_name,
            output_folder=output_folder,
            batch_size=8,
            max_length=128,
            num_examples=num_examples,
            overwrite_gen=False,
            run_eval=True,
            overwrite_eval=False
        )

        if results:
            all_results[dataset_name] = results
            print(f"  ✓ {dataset_name} complete!")

    except Exception as e:
        print(f"  ✗ Error on {dataset_name}: {e}")
        import traceback
        traceback.print_exc()
        continue

print("\n" + "="*80)

T5-PARADETOX PIPELINE

[paradetox] Loading existing outputs...
  Loaded 200 examples
  Loading existing stats...
  ✓ paradetox complete!

[microagressions_test] Loading data...
  Loaded 200 examples
  Generating detoxified texts...



  ✓ Saved outputs to /content/drive/MyDrive/ds266/w266 - Project/data/t5_outputs/t5_comparison_run/microagressions_test
  Running evaluation...
  Computing toxicity scores...
  Computing perplexity...
  Computing BERTScore...


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


  Computing BLEU...
  ✓ Saved stats to /content/drive/MyDrive/ds266/w266 - Project/data/t5_outputs/t5_comparison_run/microagressions_test/gen_stats.txt
  ✓ microagressions_test complete!

[sbf_test] Loading data...
  Loaded 200 examples
  Generating detoxified texts...



  ✓ Saved outputs to /content/drive/MyDrive/ds266/w266 - Project/data/t5_outputs/t5_comparison_run/sbf_test
  Running evaluation...
  Computing toxicity scores...
  Computing perplexity...
  Computing BERTScore...
  Computing BLEU...
  ✓ Saved stats to /content/drive/MyDrive/ds266/w266 - Project/data/t5_outputs/t5_comparison_run/sbf_test/gen_stats.txt
  ✓ sbf_test complete!

[dynabench_test] Loading existing outputs...
  Loaded 200 examples
  Loading existing stats...
  ✓ dynabench_test complete!



## Results Summary

In [32]:
#@title Display results table (matching XDetox format)

if all_results:
    # Convert to DataFrame
    rows = []
    for dataset_name, results in all_results.items():
        row = {'dataset': dataset_name}
        row.update(results)
        rows.append(row)

    df = pd.DataFrame(rows)

    # Reorder columns to match XDetox
    col_order = [
        'dataset',
        'bertscore',
        'bleu4',
        'perplexity_gen',
        'perplexity_orig',
        'toxicity_gen',
        'toxicity_orig',
    ]
    df = df[[col for col in col_order if col in df.columns]]

    # Save to CSV
    summary_csv = os.path.join(PROJECT_BASE, "data", "t5_outputs", output_folder, "t5_summary.csv")
    df.to_csv(summary_csv, index=False)
    print(f"✓ Saved summary to {summary_csv}\n")

    # Display
    print("="*80)
    print("T5-PARADETOX RESULTS SUMMARY")
    print("="*80)
    print(df.to_string(index=False))
    print("="*80)
else:
    print("No results available.")

✓ Saved summary to /content/drive/MyDrive/ds266/w266 - Project/data/t5_outputs/t5_comparison_run/t5_summary.csv

T5-PARADETOX RESULTS SUMMARY
             dataset  bertscore    bleu4  perplexity_gen  perplexity_orig  toxicity_gen  toxicity_orig
           paradetox   0.952658 0.629718      253.352590       409.280498      0.179528       0.979286
microagressions_test   0.981540 0.757424      119.182082       108.895894      0.038149       0.050522
            sbf_test   0.997629 0.000000             NaN              NaN      0.000046       0.000045
      dynabench_test   0.982489 0.862799      327.626058       312.833179      0.381875       0.499826
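
The `sbf_test` row stands out: BLEU is exactly 0, both perplexities are NaN, and toxicity is near zero both before and after generation. A quick check like the one below can flag such rows before they are compared with XDetox. The values are copied from the table above; the function name `suspicious_rows` and the threshold are assumptions for illustration.

```python
import math

# Subset of columns copied from the summary table above.
rows = [
    {"dataset": "paradetox", "bleu4": 0.629718,
     "perplexity_gen": 253.352590, "toxicity_orig": 0.979286},
    {"dataset": "microagressions_test", "bleu4": 0.757424,
     "perplexity_gen": 119.182082, "toxicity_orig": 0.050522},
    {"dataset": "sbf_test", "bleu4": 0.000000,
     "perplexity_gen": float("nan"), "toxicity_orig": 0.000045},
    {"dataset": "dynabench_test", "bleu4": 0.862799,
     "perplexity_gen": 327.626058, "toxicity_orig": 0.499826},
]

def suspicious_rows(rows, min_orig_toxicity=0.001):
    """Map dataset name -> reasons why its metrics look unreliable."""
    flagged = {}
    for row in rows:
        reasons = []
        if math.isnan(row["perplexity_gen"]):
            reasons.append("perplexity is NaN")
        if row["bleu4"] == 0.0:
            reasons.append("BLEU is exactly 0")
        if row["toxicity_orig"] < min_orig_toxicity:
            reasons.append("original text scores as non-toxic")
        if reasons:
            flagged[row["dataset"]] = reasons
    return flagged

flagged = suspicious_rows(rows)
print(flagged)
```

One plausible cause, not confirmed by the notebook, is that the loader fell back to the first CSV column for this dataset and read something other than the sentences.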


## Comparison Notes

This notebook produces results in the same format as XDetox_Pipeline.ipynb:

**Metrics (same as XDetox)**:
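- `bertscore`: BERTScore F1 between generated and original text (semantic similarity)
- `bleu4`: sacreBLEU of generated text against the original text, rescaled to 0-1 (n-gram overlap)
- `perplexity_gen`: Mean GPT-2 (`gpt2-medium`) perplexity of generated text (fluency)
- `perplexity_orig`: Mean GPT-2 (`gpt2-medium`) perplexity of original text
- `toxicity_gen`: Mean toxic-class probability of generated text from `s-nlp/roberta_toxicity_classifier`
- `toxicity_orig`: Mean toxic-class probability of original text from the same classifier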

**Datasets (same as XDetox; only the first four were evaluated in the run above)**:
- paradetox
- microagressions_test
- sbf_test
- dynabench_test
- jigsaw_toxic
- appdia_original
- appdia_discourse

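**To compare with XDetox**:
1. Run XDetox_Pipeline.ipynb with same `num_examples` setting
2. Run this notebook with same `num_examples` setting
3. Compare the CSV summaries side-by-side

**How to read the metrics**:
- Lower `toxicity_gen` = better detoxification
- Higher `bertscore` = better meaning preservation. It is computed against the original toxic text, so an unchanged output also scores high; read it together with `toxicity_gen`.
- Lower `perplexity_gen` = more fluent outputs
- The `sbf_test` row above (BLEU 0, NaN perplexity, near-zero toxicity on the original text) is probably not a valid measurement. It may mean the loader read a non-text column, so check the input file before using that row.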
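
Comparing the two summaries side by side amounts to joining them on `dataset` and looking at per-metric differences. A minimal sketch using only the standard library is given below. The CSV contents are made-up placeholder numbers, not results from either pipeline, and `load_summary` and `metric_deltas` are hypothetical helpers.

```python
import csv
import io

def load_summary(csv_text):
    """Parse a summary CSV into {dataset: {metric: float}}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["dataset"]: {k: float(v) for k, v in row.items() if k != "dataset"}
        for row in reader
    }

def metric_deltas(t5, xdetox):
    """T5 minus XDetox for every dataset and metric present in both summaries."""
    deltas = {}
    for dataset in sorted(t5.keys() & xdetox.keys()):
        shared = t5[dataset].keys() & xdetox[dataset].keys()
        deltas[dataset] = {
            metric: t5[dataset][metric] - xdetox[dataset][metric]
            for metric in sorted(shared)
        }
    return deltas

# Placeholder data for illustration only.
T5_CSV = "dataset,bertscore,toxicity_gen\ndemo,0.75,0.25\n"
XDETOX_CSV = "dataset,bertscore,toxicity_gen\ndemo,0.5,0.5\n"

deltas = metric_deltas(load_summary(T5_CSV), load_summary(XDETOX_CSV))
print(deltas)
```

With the real files, the CSV text would come from `t5_summary.csv` and the corresponding XDetox summary. A negative `toxicity_gen` delta would then mean T5 produced less toxic output on that dataset.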