# AI Text Detector — Fast Training (DistilBERT, ~30-45 min)

**Before running:** `Runtime → Change runtime type → GPU → T4 (or A100) → Save`

## Why the v2 notebook (RoBERTa) likely failed

| Problem | Detail |
|---|---|
| Training timeout | ~385K examples × 4 epochs with RoBERTa + 512 tokens ≈ 3–6 h. A Colab disconnect ends training mid-epoch, leaving a partially fitted or uncalibrated model. |
| Class imbalance | 300K AI samples vs 150K human (≈ 2:1). Without class-weight compensation the model can reach roughly 67–70% accuracy by always predicting AI and never saying 'human'. |
| Slow RAID load | `to_pandas()` on 5.6 M rows uses 8–10 GB RAM and takes 15–20 min just to load the data. |
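| MAX_LEN=512 | Self-attention is O(n²), so the attention step costs about 4× more at 512 tokens than at 256. The rest of the model scales linearly, so the overall slowdown per batch is between 2× and 4×. |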
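
The class-imbalance row can be made concrete with a few lines of arithmetic. The sketch below takes the 300K/150K counts from the table and computes the accuracy of an always-AI baseline, plus the inverse-frequency weights a weighted loss could use. The function names are illustrative and not part of the v2 notebook; the weight formula is the `n_samples / (n_classes * count)` heuristic that scikit-learn calls `balanced`.

```python
from collections import Counter

def majority_baseline_accuracy(counts):
    """Accuracy of a classifier that always predicts the most common class."""
    return max(counts.values()) / sum(counts.values())

def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}

# Counts taken from the table above (label 0 = human, 1 = AI).
counts = Counter({0: 150_000, 1: 300_000})

baseline = majority_baseline_accuracy(counts)
weights = balanced_class_weights(counts)
print(f'Always-AI accuracy: {baseline:.3f}')
print(f'Class weights     : {weights}')
```

A model whose accuracy is not clearly above this baseline has probably learned little beyond the class prior.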

## What this notebook does differently

| | v2 (broken) | v3 (this) |
|---|---|---|
| Base model | RoBERTa-base (125 M) | DistilBERT-base (67 M) — 2× faster |
| Max tokens | 512 | 256 — 4× faster attention |
| Dataset | ~385K (unbalanced) | ~48K HC3, perfectly balanced |
| RAID loading | full 5.6 M rows → pandas | skipped (not needed for baseline) |
| Training time | 3–6 h | **~30–45 min on T4, ~20 min on A100** |
| Class balance | 70 % AI / 30 % human | 50 / 50 enforced |
| Precision | bf16 only (A100) | fp16 (works on T4 AND A100) |
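
As a rough check on the 512 vs 256 comparison, the sketch below counts multiply-adds in one transformer encoder layer. It assumes a hidden size of 768 and a 4× feed-forward expansion (the BERT-base/DistilBERT layer shape) and ignores softmax, layer norm, and memory traffic, so treat it as a back-of-the-envelope estimate and not a benchmark.

```python
def layer_cost(n_tokens, d_model=768, ffn_mult=4):
    """Approximate multiply-adds for one transformer encoder layer."""
    # Q, K, V and output projections: four d x d matrices applied per token.
    projections = 4 * n_tokens * d_model * d_model
    # Feed-forward block: d -> ffn_mult * d -> d.
    feed_forward = 2 * ffn_mult * n_tokens * d_model * d_model
    # Attention scores (Q @ K^T) and weighted sum (scores @ V): quadratic in n_tokens.
    attention = 2 * n_tokens * n_tokens * d_model
    return {'linear': projections + feed_forward, 'attention': attention}

cost_256, cost_512 = layer_cost(256), layer_cost(512)
attention_ratio = cost_512['attention'] / cost_256['attention']
total_ratio = sum(cost_512.values()) / sum(cost_256.values())
print(f'Attention cost ratio (512 vs 256): {attention_ratio:.1f}x')
print(f'Whole-layer cost ratio           : {total_ratio:.2f}x')
```

Under these assumptions the attention term quadruples while the whole-layer ratio lands between 2× and 4×, so the measured speedup from halving `MAX_LEN` depends on how much of the runtime attention accounts for.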

**Output**: a `best-model/` directory compatible with the existing `model_loader.py`.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

import os
SAVE_DIR = '/content/drive/MyDrive/ai-detector-fast'
os.makedirs(SAVE_DIR, exist_ok=True)
print(f'Drive mounted. Model will save to: {SAVE_DIR}')

In [None]:
!pip install -q transformers datasets accelerate evaluate scikit-learn seaborn

In [None]:
# ── Configuration ──────────────────────────────────────────────────────────────
BASE_MODEL   = 'distilbert-base-uncased'  # 67M params — ~2x faster than roberta-base
MAX_LEN      = 256    # 4x faster attention than 512; most text fits in 256 tokens
EPOCHS       = 3
BATCH_SIZE   = 32
LR           = 3e-5
WEIGHT_DECAY = 0.01
WARMUP_RATIO = 0.06
SEED         = 42

# Cap per class — HC3 has ~24K AI and ~37K human rows after exploding.
# We cap both at PER_CLASS to keep training perfectly balanced.
PER_CLASS = 24_000

print('Config loaded.')
print(f'  Base model : {BASE_MODEL}')
print(f'  Max tokens : {MAX_LEN}')
print(f'  Epochs     : {EPOCHS}')
print(f'  Save dir   : {SAVE_DIR}')

In [None]:
import pandas as pd
import numpy as np

# ── Load HC3 (ChatGPT vs human answers from Reddit/medicine/finance/etc.) ──────
# Loads from HuggingFace as a parquet — much faster than RAID
print('Loading HC3...')
hc3_raw = pd.read_parquet(
    'hf://datasets/Hello-SimpleAI/HC3@refs/convert/parquet/all/train/0000.parquet'
)
print(f'HC3 raw rows: {len(hc3_raw):,}')

# Explode the list columns so each answer is its own row
human_rows = (
    hc3_raw[['human_answers']]
    .explode('human_answers')
    .rename(columns={'human_answers': 'text'})
    .assign(label=0)
)
ai_rows = (
    hc3_raw[['chatgpt_answers']]
    .explode('chatgpt_answers')
    .rename(columns={'chatgpt_answers': 'text'})
    .assign(label=1)
)

# Clean: drop nulls first (astype(str) would turn NaN into the string 'nan'),
# strip whitespace, then drop very short responses (< 15 words)
def clean(df):
    df = df.dropna(subset=['text'])
    df = df.assign(text=df['text'].astype(str).str.strip())
    return df[df['text'].str.split().str.len() >= 15].reset_index(drop=True)

human_rows = clean(human_rows)
ai_rows    = clean(ai_rows)

print(f'After filtering — Human: {len(human_rows):,}  AI: {len(ai_rows):,}')

# ── Balance: undersample majority class to PER_CLASS ──────────────────────────
n = min(len(human_rows), len(ai_rows), PER_CLASS)
human_bal = human_rows.sample(n, random_state=SEED).reset_index(drop=True)
ai_bal    = ai_rows.sample(n, random_state=SEED).reset_index(drop=True)

combined = pd.concat([human_bal, ai_bal], ignore_index=True)
combined = combined.sample(frac=1, random_state=SEED).reset_index(drop=True)

print(f'\n=== Dataset ===')
print(f'Total rows : {len(combined):,}')
print(f'Human (0)  : {(combined["label"]==0).sum():,}')
print(f'AI    (1)  : {(combined["label"]==1).sum():,}')
print('Balance is exactly 50/50 ✓')

In [None]:
from sklearn.model_selection import train_test_split

# 80% train / 10% val / 10% test
train_df, test_df = train_test_split(
    combined, test_size=0.10, random_state=SEED, stratify=combined['label']
)
train_df, val_df = train_test_split(
    train_df, test_size=0.1111, random_state=SEED, stratify=train_df['label']
)  # 0.1111 * 0.90 ≈ 0.10 of total

train_df = train_df.reset_index(drop=True)
val_df   = val_df.reset_index(drop=True)
test_df  = test_df.reset_index(drop=True)

print(f'Train : {len(train_df):,}  |  Val : {len(val_df):,}  |  Test : {len(test_df):,}')
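
The two-stage split above relies on 0.1111 × 0.90 ≈ 0.10. The sketch below checks that arithmetic for a 48,000-row dataset (2 × `PER_CLASS`, assuming both classes reach the cap). The helper name is hypothetical, and it assumes held-out sizes are rounded up, which is how scikit-learn treats a float `test_size`.

```python
import math

def split_sizes(n_rows, test_frac=0.10, val_frac_of_rest=0.1111):
    """Sizes for a two-stage split: carve off test first, then val from the rest."""
    # Assumption: held-out sizes round up, as scikit-learn does for a float test_size.
    n_test = math.ceil(n_rows * test_frac)
    rest = n_rows - n_test
    n_val = math.ceil(rest * val_frac_of_rest)
    return rest - n_val, n_val, n_test

n_train, n_val, n_test = split_sizes(48_000)
total = n_train + n_val + n_test
print(f'train={n_train:,} ({n_train / total:.1%})  '
      f'val={n_val:,} ({n_val / total:.1%})  '
      f'test={n_test:,} ({n_test / total:.1%})')
```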

In [None]:
from datasets import Dataset
from transformers import AutoTokenizer

print(f'Loading tokenizer: {BASE_MODEL}')
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

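def tokenize(batch):
    # No padding here: when processing_class is a tokenizer, Trainer defaults to
    # DataCollatorWithPadding, which pads each batch to its longest sequence.
    # Padding every row to MAX_LEN would cancel the benefit of group_by_length.
    return tokenizer(
        batch['text'],
        truncation=True,
        max_length=MAX_LEN,
    )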

train_ds = Dataset.from_pandas(train_df[['text', 'label']])
val_ds   = Dataset.from_pandas(val_df[['text', 'label']])
test_ds  = Dataset.from_pandas(test_df[['text', 'label']])

# num_proc=2 speeds up tokenization in Colab
train_tok = train_ds.map(tokenize, batched=True, batch_size=1000, num_proc=2)
val_tok   = val_ds.map(tokenize,   batched=True, batch_size=1000, num_proc=2)
test_tok  = test_ds.map(tokenize,  batched=True, batch_size=1000, num_proc=2)

cols = ['input_ids', 'attention_mask', 'label']
train_tok.set_format('torch', columns=cols)
val_tok.set_format('torch',   columns=cols)
test_tok.set_format('torch',  columns=cols)

print('Tokenization done.')
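
How many pad tokens the model processes depends on the padding strategy. The sketch below compares padding every sequence to a fixed length with padding each batch to its longest member, with and without sorting by length first. The token lengths are randomly generated stand-ins, not HC3 statistics, and sorting is a simplification of length-grouped batching.

```python
import random

def padded_tokens(lengths, batch_size, fixed_len=None):
    """Total tokens (real + pad) processed when batching `lengths` in order."""
    total = 0
    for i in range(0, len(lengths), batch_size):
        batch = lengths[i:i + batch_size]
        # Fixed padding uses fixed_len for every row; dynamic uses the batch maximum.
        width = fixed_len if fixed_len is not None else max(batch)
        total += width * len(batch)
    return total

random.seed(42)
max_len, batch_size = 256, 32
# Hypothetical token counts, already truncated to max_len.
lengths = [min(max_len, random.randint(20, 400)) for _ in range(3200)]

fixed = padded_tokens(lengths, batch_size, fixed_len=max_len)
dynamic = padded_tokens(lengths, batch_size)
grouped = padded_tokens(sorted(lengths), batch_size)

print(f'fixed max_length : {fixed:,} tokens')
print(f'dynamic per batch: {dynamic:,} tokens')
print(f'dynamic + grouped: {grouped:,} tokens')
```

The gap between the first and last figure is the work that fixed-length padding spends on pad tokens for this synthetic length distribution.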

In [None]:
import evaluate
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    BASE_MODEL,
    num_labels=2,
    id2label={0: 'human', 1: 'ai'},
    label2id={'human': 0, 'ai': 1},
)

total_params = sum(p.numel() for p in model.parameters())
print(f'Model: {BASE_MODEL}  |  Params: {total_params/1e6:.1f}M')

# ── Metrics ──
acc_metric = evaluate.load('accuracy')
f1_metric  = evaluate.load('f1')

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    acc      = acc_metric.compute(predictions=preds, references=labels)['accuracy']
    f1_mac   = f1_metric.compute(predictions=preds, references=labels, average='macro')['f1']
    f1_ai    = f1_metric.compute(predictions=preds, references=labels, average='binary', pos_label=1)['f1']
    f1_human = f1_metric.compute(predictions=preds, references=labels, average='binary', pos_label=0)['f1']
    return {
        'accuracy' : acc,
        'f1_macro' : f1_mac,
        'f1_ai'    : f1_ai,
        'f1_human' : f1_human,
    }
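
Macro-averaged F1 is a useful selection metric here because it drops sharply when the model collapses to one class, even when accuracy still looks passable. The pure-Python sketch below recomputes it from scratch on toy labels. It is an illustration of the definition, not the `evaluate` implementation, and the toy predictions are made up.

```python
def f1_for_class(labels, preds, positive):
    """F1 score treating `positive` as the positive class."""
    tp = sum(1 for y, p in zip(labels, preds) if y == positive and p == positive)
    fp = sum(1 for y, p in zip(labels, preds) if y != positive and p == positive)
    fn = sum(1 for y, p in zip(labels, preds) if y == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: F1 is 0 by convention
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(labels, preds, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(labels, preds, c) for c in classes) / len(classes)

# Toy data: 4 human (0) and 4 AI (1) examples.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
always_ai = [1] * 8
mostly_right = [0, 0, 0, 1, 1, 1, 1, 0]

print(f'always-AI    macro-F1: {macro_f1(toy_labels, always_ai):.3f}')
print(f'mostly right macro-F1: {macro_f1(toy_labels, mostly_right):.3f}')
```

On this balanced toy set the always-AI predictor scores 50% accuracy but a macro-F1 of only about 0.33, because the human class contributes an F1 of 0.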

In [None]:
from transformers import TrainingArguments, Trainer, EarlyStoppingCallback

args = TrainingArguments(
    output_dir=SAVE_DIR,

    # Learning rate
    learning_rate=LR,
    lr_scheduler_type='cosine',
    warmup_ratio=WARMUP_RATIO,

    # Batch
    per_device_train_batch_size=BATCH_SIZE,
    per_device_eval_batch_size=64,

    # Regularization
    weight_decay=WEIGHT_DECAY,

    # Epochs
    num_train_epochs=EPOCHS,
    eval_strategy='epoch',
    save_strategy='epoch',
    load_best_model_at_end=True,
    metric_for_best_model='f1_macro',

    # Performance: fp16 works on T4 and A100 (bf16 needs an Ampere-or-newer GPU such as the A100)
    fp16=True,
    dataloader_num_workers=2,
    group_by_length=True,   # batches similar-length seqs together → faster

    # Logging
    logging_steps=50,
    report_to='none',
    seed=SEED,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_tok,
    eval_dataset=val_tok,
    processing_class=tokenizer,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)

# ── Time estimate (training steps only: excludes evaluation and checkpoint saves; per-step timings are approximate) ──
steps_per_epoch = len(train_tok) // BATCH_SIZE
total_steps     = steps_per_epoch * EPOCHS
print(f'Train examples : {len(train_tok):,}')
print(f'Steps / epoch  : {steps_per_epoch:,}')
print(f'Total steps    : {total_steps:,}')
print(f'Est. time (T4) : ~{total_steps * 0.22 / 60:.0f} min')
print(f'Est. time (A100): ~{total_steps * 0.10 / 60:.0f} min')
print('\nTrainer ready — run next cell to start training.')
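
The `warmup_ratio` and `lr_scheduler_type='cosine'` settings shape the learning rate over training. The sketch below approximates that shape: linear warmup to the peak, then cosine decay to zero. It is a simplified stand-in written for illustration; the scheduler that `Trainer` builds is authoritative, and the 3,600-step total is an assumed example (1,200 steps per epoch × 3 epochs).

```python
import math

def lr_at(step, total_steps, peak_lr=3e-5, warmup_ratio=0.06):
    """Learning rate at `step`: linear warmup, then cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

example_total = 3600  # assumed: 1,200 steps per epoch x 3 epochs
for step in (0, 108, 216, 1908, 3600):
    print(f'step {step:>4}: lr = {lr_at(step, example_total):.2e}')
```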

In [None]:
# ── TRAIN ─────────────────────────────────────────────────────────────────────
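# Expect ~30-45 min on T4, ~15-20 min on A100.
# Note: with EPOCHS=3, one evaluation per epoch, and early_stopping_patience=2,
# EarlyStoppingCallback cannot end training before the final epoch. Raise EPOCHS
# or lower the patience if you want it to cut training short.
# load_best_model_at_end=True still restores the epoch with the best val f1_macro.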

trainer.train()
print('\nTraining complete!')
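
To see how patience interacts with the number of epochs, the sketch below simulates a patience counter over a list of per-epoch validation scores. It is a simplified model of patience-based early stopping, not the `EarlyStoppingCallback` source, and the scores are hypothetical.

```python
def stop_epoch(scores, patience):
    """Return the 1-based epoch after which early stopping would cut training short,
    or None if training runs through every epoch in `scores`."""
    best, bad_evals = None, 0
    for epoch, score in enumerate(scores, start=1):
        if best is None or score > best:
            best, bad_evals = score, 0   # improvement resets the counter
        else:
            bad_evals += 1               # no improvement over the best so far
        if bad_evals >= patience and epoch < len(scores):
            return epoch
    return None

# Hypothetical validation f1_macro values, one per epoch.
three_epochs = [0.97, 0.96, 0.95]
five_epochs = [0.97, 0.96, 0.95, 0.94, 0.93]

print(stop_epoch(three_epochs, patience=2))
print(stop_epoch(three_epochs, patience=1))
print(stop_epoch(five_epochs, patience=2))
```

With three epochs and a patience of 2 the function returns `None`: the counter can only reach 2 at the final evaluation, when training is ending anyway.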

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import classification_report, confusion_matrix

# ── Full evaluation on held-out test set ──────────────────────────────────────
test_pred = trainer.predict(test_tok)
preds     = np.argmax(test_pred.predictions, axis=-1)
labels    = test_pred.label_ids

print('=== Test Set Results ===')
print(classification_report(labels, preds, target_names=['Human', 'AI'], digits=4))

# Sanity check: macro precision/recall near chance suggests the model collapsed to one class
from sklearn.metrics import precision_recall_fscore_support
p, r, f, _ = precision_recall_fscore_support(labels, preds, average='macro', zero_division=0)
if p < 0.55 or r < 0.55:
    print('\n⚠️  WARNING: low precision or recall; the model may be predicting only one class.')
    print('   Check the confusion matrix. If one column is all zeros, retrain with a lower LR.')

# Confusion matrix
cm = confusion_matrix(labels, preds)
plt.figure(figsize=(5, 4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Human', 'AI'], yticklabels=['Human', 'AI'])
plt.title('Confusion Matrix — Test Set')
plt.ylabel('True')
plt.xlabel('Predicted')
plt.tight_layout()
plt.savefig(f'{SAVE_DIR}/confusion_matrix.png', dpi=150)
plt.show()

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import json

# ── Temperature scaling calibration ───────────────────────────────────────────
# Makes the model's confidence scores better calibrated (less overconfident).
# Calibrated temperature is saved into training_config.json so model_loader.py
# picks it up automatically — no hardcoding needed.

val_pred   = trainer.predict(val_tok)
val_logits = torch.tensor(val_pred.predictions, dtype=torch.float32)
val_labels = torch.tensor(val_pred.label_ids,   dtype=torch.long)

class TemperatureScaler(nn.Module):
    def __init__(self):
        super().__init__()
        self.log_temp = nn.Parameter(torch.zeros(1))
    def forward(self, logits):
        return logits / torch.exp(self.log_temp)

scaler = TemperatureScaler()
opt    = torch.optim.LBFGS([scaler.log_temp], lr=0.1, max_iter=100)

def calib_loss():
    opt.zero_grad()
    loss = F.cross_entropy(scaler(val_logits), val_labels)
    loss.backward()
    return loss

opt.step(calib_loss)

T = float(torch.exp(scaler.log_temp).detach().numpy()[0])
print(f'Calibrated temperature: {T:.4f}')
print('(T > 1 softens the probabilities and reduces overconfidence; T < 1 sharpens them)')
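
The effect of the fitted temperature is easy to see on a single pair of logits. The sketch below applies softmax after dividing by a few temperatures; the logits and temperature values are made up for illustration and are not outputs of this notebook.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature, computed in a numerically stable way."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

example_logits = [0.0, 4.0]  # hypothetical [human, ai] logits
for temperature in (0.5, 1.0, 2.0):
    human_p, ai_p = softmax_with_temperature(example_logits, temperature)
    print(f'T={temperature}: P(ai) = {ai_p:.4f}')
```

Temperature changes the confidence but never the predicted class, because dividing by a positive constant preserves the ordering of the logits.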

In [None]:
# ── Save best model + tokenizer + training config ─────────────────────────────
BEST_MODEL_DIR = f'{SAVE_DIR}/best-model'
os.makedirs(BEST_MODEL_DIR, exist_ok=True)

# Save model weights and tokenizer
trainer.model.save_pretrained(BEST_MODEL_DIR)
tokenizer.save_pretrained(BEST_MODEL_DIR)

# Save temperature + metadata so model_loader.py picks them up
training_config = {
    'temperature' : T,
    'base_model'  : BASE_MODEL,
    'max_length'  : MAX_LEN,
}
config_path = f'{BEST_MODEL_DIR}/training_config.json'
with open(config_path, 'w') as f:
    json.dump(training_config, f, indent=2)

print(f'Model saved to: {BEST_MODEL_DIR}')
print(f'training_config.json: {training_config}')
print()
for fname in sorted(os.listdir(BEST_MODEL_DIR)):
    size = os.path.getsize(f'{BEST_MODEL_DIR}/{fname}') / 1e6
    print(f'  {fname}  ({size:.1f} MB)')
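
`model_loader.py` is not shown in this notebook, so the following is only a hypothetical sketch of how a loader could consume `training_config.json`. The function name `load_training_config` and the fallback temperature of 1.0 are assumptions, and the demo values are illustrative.

```python
import json
import os
import tempfile

def load_training_config(model_dir, default_temperature=1.0):
    """Read training_config.json from model_dir, falling back to a default temperature."""
    config = {'temperature': default_temperature}
    path = os.path.join(model_dir, 'training_config.json')
    if os.path.exists(path):
        with open(path) as fh:
            config.update(json.load(fh))
    return config

# Round-trip demo in a temporary directory (values are illustrative).
with tempfile.TemporaryDirectory() as demo_dir:
    missing = load_training_config(demo_dir)
    with open(os.path.join(demo_dir, 'training_config.json'), 'w') as fh:
        json.dump({'temperature': 1.37,
                   'base_model': 'distilbert-base-uncased',
                   'max_length': 256}, fh)
    loaded = load_training_config(demo_dir)

print(missing)
print(loaded)
```

Falling back to a temperature of 1.0 leaves the logits unscaled, so a model directory without the config file would still load, only without calibration.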

In [None]:
# ── Sanity-check inference ────────────────────────────────────────────────────

def predict_text(text: str, temp: float = T, threshold: float = 0.85):
    m = trainer.model
    m.eval()
    inputs = tokenizer(
        text, return_tensors='pt', truncation=True, max_length=MAX_LEN
    )
    inputs = {k: v.to(m.device) for k, v in inputs.items()}
    with torch.no_grad():
        logits = m(**inputs).logits[0].detach().cpu()
    probs   = torch.softmax(logits / temp, dim=-1).numpy()
    ai_p    = float(probs[1])
    human_p = float(probs[0])
    pred    = (
        'uncertain' if max(ai_p, human_p) < threshold
        else ('ai' if ai_p >= human_p else 'human')
    )
    print(f'  AI: {ai_p:.4f}  Human: {human_p:.4f}  → {pred}')
    return {'ai_prob': ai_p, 'human_prob': human_p, 'pred': pred}


print('--- Should be HUMAN (casual) ---')
predict_text(
    "i was just kinda sitting there and then boom it happened lol idk man it was weird "
    "tbh i dont even know what to say about it, really strange experience overall"
)

print('\n--- Should be HUMAN (news) ---')
predict_text(
    "The Federal Reserve raised interest rates by a quarter point Wednesday, its 10th increase "
    "since March 2022, as officials signaled they may be nearing the end of their aggressive "
    "campaign to bring inflation back under control. The decision was unanimous."
)

print('\n--- Should be AI (formal essay conclusion) ---')
predict_text(
    "In conclusion, the integration of artificial intelligence into modern healthcare systems "
    "presents both significant opportunities and considerable challenges. By leveraging machine "
    "learning algorithms, medical professionals can enhance diagnostic accuracy, streamline "
    "patient care workflows, and ultimately improve patient outcomes. However, it is essential "
    "to address ethical considerations, data privacy concerns, and the need for robust "
    "validation frameworks to ensure the responsible deployment of these transformative technologies."
)

print('\n--- Should be AI (listicle / structured tips) ---')
predict_text(
    "There are several key strategies to improve your productivity. First, prioritize your tasks "
    "using the Eisenhower Matrix to distinguish between urgent and important activities. Second, "
    "implement time-blocking techniques to allocate dedicated periods for focused work. Third, "
    "minimize distractions by creating a dedicated workspace and utilizing productivity applications. "
    "Finally, regularly review your progress and adjust your strategies accordingly."
)

In [None]:
# ── Download the trained model as a zip ───────────────────────────────────────
# Unzip into model-service/model/ and restart the Python service.
# The model_loader.py will pick it up automatically.

import shutil
from google.colab import files

zip_path = '/content/ai-detector-fast-export'
shutil.make_archive(zip_path, 'zip', BEST_MODEL_DIR)

zip_size = os.path.getsize(zip_path + '.zip') / 1e6
print(f'Zip size: {zip_size:.0f} MB')
print('Downloading...')
files.download(zip_path + '.zip')

print()
print('After downloading:')
print('  1. Unzip into model-service/model/')
print('  2. Restart the Python service: uvicorn app:app --host 0.0.0.0 --port 8000 --reload')
print('  3. The model_loader.py will load DistilBERT weights + calibrated temperature automatically.')