# 03 – Temperature Scaling on Validation Set

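This notebook finds the optimal **temperature** $T$ that, when our model’s logits are divided by $T$, minimizes the cross-entropy loss on the validation set. When $T > 1$, this “flattens” the softmax so that extremely confident predictions (e.g. 99.9%) become more moderate (e.g. 90%). Because every logit is divided by the same positive constant, the predicted class (the argmax) is unchanged, so accuracy is unaffected. Only the confidence values move.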
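The effect can be sketched in pure Python, with plain lists standing in for the notebook's tensors. The helper name `softmax_with_temperature` is hypothetical. It divides the logits by $T$ before a numerically stable softmax and prints the predicted class at each temperature: $T < 1$ sharpens the distribution, $T > 1$ flattens it, and the predicted class stays the same.

```python
import math


def softmax_with_temperature(logits, T=1.0):
    """Softmax of logits / T, stabilized by subtracting the largest scaled logit."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [5.0, 1.0]
for T in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, T)
    predicted = max(range(len(probs)), key=probs.__getitem__)
    print(f"T={T}: probs={[round(p, 4) for p in probs]}, predicted class={predicted}")
```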

**Workflow:**
1. Load the fine-tuned `FaceClassifier` (ResNet-18) from `models/best_model_finetuned.pth`.  
2. Build a validation `DataLoader` (same preprocessing as training).  
3. Run a single pass over `val_loader` to collect all logits and labels.  
4. Use `torch.optim.LBFGS` to optimize a single scalar $T$ to minimize  
   $$
   \text{CrossEntropy}\left(\frac{\text{logits}}{T},\; \text{labels}\right)
   $$
5. Save the resulting $T$ to `models/best_temperature.pt`.  
6. Print softmax probabilities for two example logit vectors before and after dividing by $T$.
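Written out for $N$ validation images with logit vectors $z_i$ and labels $y_i$, the objective in step 4 is the mean negative log-likelihood (the default `reduction="mean"` of `F.cross_entropy`):

$$
\mathcal{L}(T) = \frac{1}{N}\sum_{i=1}^{N}\left[\log\sum_{j}\exp\!\left(\frac{z_{ij}}{T}\right) - \frac{z_{i y_i}}{T}\right], \qquad T > 0.
$$

With $\beta = 1/T$, each term is a log-sum-exp of functions linear in $\beta$ minus a linear function of $\beta$, so $\mathcal{L}$ is convex in $\beta$. Any stationary point with $T > 0$ is therefore the global minimum. (Convexity holds in $1/T$, not necessarily in $T$ itself.)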


In [2]:
# Imports, Configuration, and Validation DataLoader

import os, sys, warnings
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
os.environ["OMP_NUM_THREADS"]      = "1"
warnings.filterwarnings("ignore", category=UserWarning)

# If this notebook lives in notebooks/, add ../src so we can import model.py
src_path = os.path.abspath(os.path.join(os.getcwd(), "..", "src"))
if src_path not in sys.path:
    sys.path.append(src_path)

import torch
import torch.nn.functional as F
from torch.optim import LBFGS

from torchvision import transforms
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

from model import FaceClassifier   # now that src/ is on sys.path
import numpy as np

# ── 1) Device ────────────────────────────────────────────────────────────────
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {DEVICE}")

# ── 2) Path to your checkpoint and other constants ────────────────────────────
MODEL_PATH = "../models/best_model_finetuned.pth"   # ← note the "../"
IMG_SIZE    = 224
MEAN        = [0.485, 0.456, 0.406]
STD         = [0.229, 0.224, 0.225]

# ── 3) Load the trained FaceClassifier ────────────────────────────────────────
model = FaceClassifier(backbone="resnet18")
state_dict = torch.load(MODEL_PATH, map_location=DEVICE, weights_only=True)
model.load_state_dict(state_dict)
model.to(DEVICE).eval()
print("✔️ Loaded FaceClassifier checkpoint from", MODEL_PATH)

# ── 4) Path to validation folder (one level up from notebooks/) ───────────────
VAL_DIR = "../data/processed/val"    # ← note the "../"

# ── 5) Define the exact same transforms used in training/notebook ─────────────
val_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(IMG_SIZE),
    transforms.ToTensor(),
    transforms.Normalize(mean=MEAN, std=STD),
])

# ── 6) Create the dataset and loader ──────────────────────────────────────────
val_dataset = ImageFolder(VAL_DIR, transform=val_transform)
val_loader  = DataLoader(
    val_dataset,
    batch_size=64,
    shuffle=False,
    num_workers=4
)

print(f"Validation dataset has {len(val_dataset)} images,")
print(f"  organized into {len(val_dataset.classes)} classes: {val_dataset.classes}")


Using device: cuda
✔️ Loaded FaceClassifier checkpoint from ../models/best_model_finetuned.pth
Validation dataset has 57310 images,
  organized into 2 classes: ['ai', 'real']
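The printed class list also fixes the meaning of the logit columns. By default, torchvision's `ImageFolder` sorts the class subdirectory names and numbers them from 0. As a result, `ai` is label 0 and `real` is label 1, and logit column 0 scores `ai`, assuming the model was trained on the same folder layout. The sketch below mirrors that default mapping with the standard library. `folder_class_to_idx` is a hypothetical helper, and the temporary directory stands in for `data/processed/val/`.

```python
import os
import tempfile


def folder_class_to_idx(root):
    """Mirror ImageFolder's default labelling: sorted subdirectory names -> 0..K-1."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: idx for idx, name in enumerate(classes)}


# Hypothetical throwaway layout standing in for data/processed/val/
with tempfile.TemporaryDirectory() as root:
    for name in ("real", "ai"):            # creation order does not matter
        os.makedirs(os.path.join(root, name))
    mapping = folder_class_to_idx(root)

print(mapping)
```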


In [3]:
# Loop over val_loader to collect raw logits and ground-truth labels

all_logits = []
all_labels = []

with torch.no_grad():
    for images, labels in val_loader:
        images = images.to(DEVICE)       # [batch_size, 3, 224, 224]
        logits = model(images)           # [batch_size, 2]
        all_logits.append(logits.cpu())  # move to CPU
        all_labels.append(labels)        # already on CPU

# Concatenate into single tensors
val_logits = torch.cat(all_logits, dim=0)  # shape: [N_val, 2]
val_labels = torch.cat(all_labels, dim=0)  # shape: [N_val]

print("Collected all logits shape:", val_logits.shape)
print("Collected all labels shape:", val_labels.shape)


Collected all logits shape: torch.Size([57310, 2])
Collected all labels shape: torch.Size([57310])


In [4]:
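# Find optimal temperature T using LBFGS

# 1) Initialize T as a learnable parameter, starting at 2.0
T = torch.nn.Parameter(torch.ones(1) * 2.0, requires_grad=True)

# 2) Create an LBFGS optimizer over [T].
#    Without line_search_fn, each LBFGS iteration moves only lr × the quasi-Newton
#    step, so lr=0.01 with max_iter=50 may stop short of the optimum;
#    line_search_fn="strong_wolfe" is one way to guard against this.
optimizer = LBFGS([T], lr=0.01, max_iter=50)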

def _loss():
    optimizer.zero_grad()
    scaled_logits = val_logits / T            # broadcast: shape [N_val, 2] ÷ [1]
    loss = F.cross_entropy(scaled_logits, val_labels)
    loss.backward()
    return loss

print("Starting LBFGS temperature optimization…")
optimizer.step(_loss)
print("✔️ Done. Optimal temperature T =", float(T.detach()))


Starting LBFGS temperature optimization…
✔️ Done. Optimal temperature T = 1.7086546421051025
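The same objective can also be minimized without autograd, which gives an independent cross-check on the LBFGS result. The sketch below uses a swapped-in technique and is not what this notebook runs. It performs a golden-section search over $\beta = 1/T$, which works because the mean NLL is convex in $\beta$: log-sum-exp is convex, and the remaining term is linear in $\beta$. The helper names `mean_nll` and `fit_temperature`, the search bracket, and the toy data are all assumptions. If its result on the real logits differs noticeably from the LBFGS value, the LBFGS run may not have converged.

```python
import math


def mean_nll(logits, labels, T):
    """Mean cross-entropy of softmax(logits / T), like F.cross_entropy's 'mean' reduction."""
    total = 0.0
    for z, y in zip(logits, labels):
        scaled = [v / T for v in z]
        m = max(scaled)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
        total += log_sum_exp - scaled[y]   # -log softmax(scaled)[y]
    return total / len(labels)


def fit_temperature(logits, labels, beta_lo=0.05, beta_hi=5.0, tol=1e-6):
    """Golden-section search for beta = 1/T on [beta_lo, beta_hi]; returns T."""
    def f(beta):
        return mean_nll(logits, labels, 1.0 / beta)

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0          # about 0.618
    a, b = beta_lo, beta_hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                 # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                       # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 1.0 / ((a + b) / 2.0)


# Hypothetical toy data: every logit gap is 4 (about 98% confidence),
# but only 3 of every 4 predictions are correct, i.e. an overconfident model.
toy_logits = [[4.0, 0.0], [0.0, 4.0], [4.0, 0.0], [0.0, 4.0]] * 25
toy_labels = [0, 1, 0, 0] * 25

T_hat = fit_temperature(toy_logits, toy_labels)
print(f"fitted T = {T_hat:.4f}")
```

On the toy data, every logit gap is 4 but only 75% of predictions are correct. The fitted temperature should therefore land near $4/\ln 3 \approx 3.641$, the value at which a gap of 4 maps to 75% confidence. For the real validation set, `val_logits.tolist()` and `val_labels.tolist()` could be passed in. Pure Python is much slower than the tensor version, but the search needs only a few dozen passes.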


In [5]:
# Save T to 'models/best_temperature.pt'

output_path = "../models/best_temperature.pt"
torch.save(T.detach(), output_path)
print(f"✔️ Saved optimal temperature to '{output_path}'")

✔️ Saved optimal temperature to '../models/best_temperature.pt'
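For later use, the saved scalar can be restored with `torch.load("../models/best_temperature.pt")` (a one-element tensor) and applied as `logits / T` before the softmax. The arithmetic of that inference step is sketched below in pure Python. `calibrated_prediction` is a hypothetical helper, and `CLASS_NAMES` assumes the `['ai', 'real']` order printed by `ImageFolder` above.

```python
import math

CLASS_NAMES = ["ai", "real"]   # assumed ImageFolder order: label 0 = "ai", label 1 = "real"


def calibrated_prediction(logits, T):
    """Return (class name, calibrated confidence) for one image's raw logits."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best]


# T as printed by the LBFGS cell above; in practice it would come from torch.load(...)
label, confidence = calibrated_prediction([5.0, 1.0], T=1.7086546421051025)
print(label, round(confidence, 4))
```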


In [6]:
# Show how dividing by T flattens softmax on example logits

sample_logits = torch.tensor([
    [5.0, 1.0],   # very confident “class 0”
    [1.2, 2.4],   # moderately confident “class 1”
])

print("Sample raw logits:\n", sample_logits.numpy())

with torch.no_grad():
    raw_probs    = F.softmax(sample_logits, dim=1)
    scaled_probs = F.softmax(sample_logits / T, dim=1)

print("\nRaw softmax probabilities:\n", raw_probs.numpy())
print(f"\nSoftmax probabilities after dividing by T = {float(T):.2f}:\n", scaled_probs.numpy())


Sample raw logits:
 [[5.  1. ]
 [1.2 2.4]]

Raw softmax probabilities:
 [[0.98201376 0.01798621]
 [0.23147522 0.7685248 ]]

Softmax probabilities after dividing by T = 1.71:
 [[0.912218   0.08778196]
 [0.33130094 0.6686991 ]]
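With only two classes, the softmax collapses to a sigmoid of the logit gap, $p(\text{class }1) = \sigma\big((z_1 - z_0)/T\big)$. Temperature scaling therefore simply shrinks the gap by a factor of $T$. The gap of 4 in the first example becomes about 2.34, which is why its top probability drops from 0.982 to 0.912. The short sketch below reproduces the printed class-1 probabilities from that formula in pure Python, using a hypothetical helper `p_second_class`.

```python
import math


def p_second_class(logit_0, logit_1, T=1.0):
    """Two-class softmax probability of class 1, i.e. sigmoid((logit_1 - logit_0) / T)."""
    return 1.0 / (1.0 + math.exp(-(logit_1 - logit_0) / T))


T = 1.7086546421051025   # fitted temperature printed above
for z0, z1 in [(5.0, 1.0), (1.2, 2.4)]:
    raw, scaled = p_second_class(z0, z1), p_second_class(z0, z1, T)
    print(f"logits=({z0}, {z1})  p(class 1): raw={raw:.4f}  scaled={scaled:.4f}")
```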


## ✅ Summary: Temperature Scaling Calibration

This notebook performs post-hoc calibration of the `FaceClassifier` using **temperature scaling**, a technique to reduce overconfidence in softmax outputs.

- **Model checkpoint:** `models/best_model_finetuned.pth`
- **Validation set:** 57,310 images from `data/processed/val/`
- **Optimization:** LBFGS minimized  
  $\text{CrossEntropy}(\text{logits} / T,\; \text{labels})$
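- **Optimal temperature $T$ found:** **1.709**
- **Effect:** Softmax probabilities become less extreme (e.g. 0.982 → 0.912 for the sample logits [5, 1]) while predicted classes stay the same. Since $T$ minimizes validation cross-entropy, calibration is expected to improve, though this notebook does not report a calibration metric such as ECE.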

✅ Final temperature was saved to: `models/best_temperature.pt`
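Calibration can be quantified with expected calibration error (ECE). Predictions are grouped into equal-width confidence bins, and ECE is the size-weighted average gap between each bin's accuracy and its mean confidence. Below is a minimal pure-Python sketch, with a hypothetical helper `expected_calibration_error` and an assumed default of 15 bins.

```python
def expected_calibration_error(confidences, correct, n_bins=15):
    """Equal-width-bin ECE: sum over bins of (bin size / N) * |accuracy - mean confidence|."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)   # a confidence of 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(1 for _, ok in members if ok) / len(members)
            ece += len(members) / n * abs(accuracy - mean_conf)
    return ece


# Hypothetical toy predictions: 75% correct, reported first at 98.2% confidence,
# then at 75% confidence (what ideal temperature scaling would produce here).
correct = [True, True, True, False] * 25
print(expected_calibration_error([0.982] * 100, correct))
print(expected_calibration_error([0.75] * 100, correct))
```

In the notebook, the inputs could come from `conf, pred = F.softmax(val_logits / T, dim=1).max(dim=1)` and `correct = pred == val_labels`, computed inside `torch.no_grad()` and converted with `.tolist()`, once for the raw logits and once with the fitted $T$. Because $T$ was fitted on this same validation set, ECE measured there is optimistic. A held-out split gives a fairer estimate.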
