**Week 4: Tuning, Training Controls and Explainability**

## Quick start: run these cells in order

Follow this minimal sequence to get the notebook running. Use the numbered cell values below (these are the notebook cell numbers starting from 1).

1. Cell 2: Imports & device (run to set up libraries and `device`).
2. Cell 4: Load saved tensors & create DataLoaders (loads `*.pt` files and builds `train_loader`, `val_loader`, `test_loader`).
   - If you do not have matching `.pt` files or see a FEATURE_NAMES mismatch, run Cell 21 (see below) to rebuild tensors from the CSV, then re-run Cell 4.
3. Cell 6: Class-balance inspection & `get_pos_weight_tensor` helper (optional; useful when using `pos_weight`).
4. Cell 8: Hyperparameters (`TrainConfig`). Edit training hyperparameters here.
5. Cell 10: Model factory (`TabularFFNN`).
6. Cells 14, 16, 15: Training helpers and entry points:
   - Cell 14: `binary_accuracy_from_logits`
   - Cell 16: `run_epoch`
   - Cell 15: `train_one_run`
   (Run these three so the training functions are available.)
7. Cell 18: Infer `input_dim` from `train_loader` and run training (`train_one_run`).

Explainability (after restoring or training a model):
8. Cell 20: Resolve `FEATURE_NAMES` (reads CSV header when available).
9. Cell 21: (Optional) Rebuild tensors from CSV. Run this if your `.pt` files are missing or misaligned. If you run it, re-run Cell 4 afterwards.
10. Cell 23: Load best checkpoint & rebuild model (`model_explain`).
11. Cell 25: IG config & helpers (builds baseline and IG functions).
12. Cell 27: Run IG global & local visualizations (produces IG plots).
13. Cell 29: SHAP background/sample selection.
14. Cell 31: Run SHAP (DeepExplainer with Kernel fallback) + visuals.

Troubleshooting notes:
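- If you see an IndexError when plotting attributions, it usually means `len(FEATURE_NAMES) != num_features`. Rebuild the tensors (Cell 21, or delete the `.pt` files and re-run the startup imports cell, which rebuilds from the CSV when needed), then re-run Cell 4 to restore alignment.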
- To retrain from scratch, edit hyperparameters in Cell 8 and run Cell 18.

That's it. Run the cells in the order above to reproduce the notebook workflow.

In [None]:
class TrainConfig:
    def __init__(
        self,
        epochs=50,
        batch_size=64,
        learning_rate=0.001,
        weight_decay=1e-5,
        device="cpu",
    ):
        self.epochs = epochs
        self.batch_size = batch_size
        self.learning_rate = learning_rate
        self.weight_decay = weight_decay
        self.device = device


In [None]:
# Load canonical feature order and scaler params (if present)
import os, json

FEATURE_NAMES = globals().get("FEATURE_NAMES", None)
SCALER_PARAMS = globals().get("SCALER_PARAMS", {})
fo_cands = [
    "feature_order.json",
    "artifacts_aligned/feature_order.json",
    "./feature_order.json",
    "../feature_order.json",
    "advanced/submissions/team-members/rajan-hans/feature_order.json",
]
for p in fo_cands:
    if os.path.exists(p):
        try:
            with open(p, "r") as f:
                fo = json.load(f)
            if isinstance(fo, list) and len(fo) > 0:
                FEATURE_NAMES = fo
                print(f"Loaded FEATURE_NAMES from {p} (len={len(FEATURE_NAMES)})")
                break
        except Exception:
            pass

sp_cands = [
    "scaler_params.json",
    "artifacts_aligned/scaler_params.json",
    "./scaler_params.json",
    "../scaler_params.json",
    "advanced/submissions/team-members/rajan-hans/scaler_params.json",
]
for p in sp_cands:
    if os.path.exists(p):
        try:
            with open(p, "r") as f:
                sc = json.load(f)
            if isinstance(sc, dict):
                SCALER_PARAMS = sc
                print(f"Loaded SCALER_PARAMS from {p} (len={len(SCALER_PARAMS)})")
                break
        except Exception:
            pass

globals()["FEATURE_NAMES"] = FEATURE_NAMES
globals()["SCALER_PARAMS"] = SCALER_PARAMS
print("Canonical feature/scale loader complete.")


Loaded FEATURE_NAMES from feature_order.json (len=21)
Loaded SCALER_PARAMS from scaler_params.json (len=0)
Canonical feature/scale loader complete.
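
The loader above walks a list of candidate paths and silently ignores any file that fails to parse. As a minimal sketch of the same search, the helper below reports why a candidate was skipped. The name `load_first_json` and its `allow_empty` flag are hypothetical and not part of the notebook; the temporary file only stands in for `feature_order.json`.

```python
import json
import os
import tempfile


def load_first_json(candidates, expected_type, allow_empty=False):
    """Return the first candidate that exists and parses as JSON of expected_type."""
    for path in candidates:
        if not os.path.exists(path):
            continue
        try:
            with open(path, "r") as f:
                data = json.load(f)
        except (OSError, ValueError) as err:
            # Report the failure instead of swallowing it with a bare `pass`.
            print(f"Skipping {path}: {err!r}")
            continue
        if isinstance(data, expected_type) and (allow_empty or len(data) > 0):
            return data
    return None


# Demonstration with a temporary file standing in for feature_order.json.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "feature_order.json")
    missing_path = os.path.join(tmp, "missing.json")
    with open(demo_path, "w") as f:
        json.dump(["HighBP", "HighChol", "BMI"], f)
    demo_names = load_first_json([missing_path, demo_path], list)
    nothing_found = load_first_json([missing_path], list)

print(demo_names)
```

For the scaler parameters, which the notebook accepts even when empty, the same helper would be called with `expected_type=dict` and `allow_empty=True`.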


In [None]:
# Cell 3: Imports, device, and automatic data-check/rebuild
# Run this cell first to set up imports, `device`, and to ensure the dataset tensors exist and match the CSV header.
# This notebook expects the following files saved in the current working dir (will be auto-created if missing):
#   X_train_tensor.pt, y_train_tensor.pt, X_val_tensor.pt, y_val_tensor.pt, X_test_tensor.pt, y_test_tensor.pt

import os
import time
from dataclasses import dataclass, asdict
from typing import List, Optional, Dict, Any
from contextlib import nullcontext

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader
from sklearn.model_selection import train_test_split

# Device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"🟢 Using device: {device}")

# --- Resolve FEATURE_NAMES from CSV (preferred) or fallback list ---
csv_candidates = [
    "data/diabetes_binary_health_indicators_BRFSS2015.csv",
    "./data/diabetes_binary_health_indicators_BRFSS2015.csv",
    "../data/diabetes_binary_health_indicators_BRFSS2015.csv",
    "diabetes_binary_health_indicators_BRFSS2015.csv",
]
csv_path = next((p for p in csv_candidates if os.path.exists(p)), None)
if csv_path is not None:
    try:
        df_head = pd.read_csv(csv_path, nrows=0)
        FEATURE_NAMES = [c for c in df_head.columns if c != "Diabetes_binary"]
        print("FEATURE_NAMES derived from CSV (len=", len(FEATURE_NAMES), ")")
    except Exception as e:
        print(
            "Failed reading CSV header; using fallback FEATURE_NAMES. Error:", repr(e)
        )
        csv_path = None

if csv_path is None:
    FEATURE_NAMES = [
        "HighBP",
        "HighChol",
        "CholCheck",
        "BMI",
        "Smoker",
        "Stroke",
        "HeartDiseaseorAttack",
        "PhysActivity",
        "Fruits",
        "Veggies",
        "HvyAlcoholConsump",
        "AnyHealthcare",
        "NoDocbcCost",
        "GenHlth",
        "MentHlth",
        "PhysHlth",
        "DiffWalk",
        "Sex",
        "Age",
        "Education",
        "Income",
    ]
    print("Using fallback FEATURE_NAMES (len=", len(FEATURE_NAMES), ")")

# --- Ensure .pt tensors exist and match FEATURE_NAMES; rebuild from CSV if needed ---
pt_files = [
    "X_train_tensor.pt",
    "y_train_tensor.pt",
    "X_val_tensor.pt",
    "y_val_tensor.pt",
    "X_test_tensor.pt",
    "y_test_tensor.pt",
]

need_rebuild = False
# If any file is missing, mark for rebuild
if not all(os.path.exists(p) for p in pt_files):
    print("One or more .pt files missing; will rebuild from CSV if available.")
    need_rebuild = True
else:
    # If all present, verify the feature count matches
    try:
        X_train_tmp = torch.load("X_train_tensor.pt")
        if X_train_tmp.ndim != 2 or X_train_tmp.shape[1] != len(FEATURE_NAMES):
            print(
                "Existing tensors have feature-count mismatch:",
                getattr(X_train_tmp, "shape", None),
                "vs FEATURE_NAMES len=",
                len(FEATURE_NAMES),
            )
            need_rebuild = True
        else:
            print("Found existing .pt tensors and they match FEATURE_NAMES.")
    except Exception as e:
        print(
            "Error loading existing .pt files; will rebuild if possible. Error:",
            repr(e),
        )
        need_rebuild = True

if need_rebuild:
    if csv_path is None:
        raise FileNotFoundError(
            f"Cannot rebuild: CSV not found in candidates {csv_candidates} and .pt files are missing or invalid."
        )

    print("Rebuilding tensors from CSV:", csv_path)
    df = pd.read_csv(csv_path)
    target_col = "Diabetes_binary"
    if target_col not in df.columns:
        raise ValueError(f"Expected target column '{target_col}' in CSV columns")

    # Derive FEATURE_NAMES from CSV to be sure
    FEATURE_NAMES = [c for c in df.columns if c != target_col]

    # Minimal preprocessing: fill numeric NA with column mean
    df_work = df[FEATURE_NAMES + [target_col]].copy()
    for c in FEATURE_NAMES:
        if df_work[c].isna().any():
            # Assign the result back; in-place fillna on a column selection is unreliable in recent pandas
            df_work[c] = df_work[c].fillna(df_work[c].mean())

    X = df_work[FEATURE_NAMES].to_numpy(dtype=np.float32)
    y = df_work[target_col].to_numpy(dtype=np.float32)
    print("Full dataset shape X, y:", X.shape, y.shape)

    # Stratified split: test 15%, val 15%, train rest
    test_frac = 0.15
    val_frac = 0.15
    X_temp, X_test, y_temp, y_test = train_test_split(
        X, y, test_size=test_frac, stratify=y, random_state=42
    )
    val_rel = val_frac / (1.0 - test_frac)
    X_train, X_val, y_train, y_val = train_test_split(
        X_temp, y_temp, test_size=val_rel, stratify=y_temp, random_state=42
    )
    print("Split shapes:")
    print("  X_train, y_train:", X_train.shape, y_train.shape)
    print("  X_val,   y_val:  ", X_val.shape, y_val.shape)
    print("  X_test,  y_test: ", X_test.shape, y_test.shape)

    # Convert to torch tensors
    X_train_tensor = torch.from_numpy(X_train)
    X_val_tensor = torch.from_numpy(X_val)
    X_test_tensor = torch.from_numpy(X_test)

    y_train_tensor = torch.from_numpy(y_train).float()
    y_val_tensor = torch.from_numpy(y_val).float()
    y_test_tensor = torch.from_numpy(y_test).float()

    # Backup existing files then save new tensors
    for fname, var in [
        ("X_train_tensor.pt", X_train_tensor),
        ("y_train_tensor.pt", y_train_tensor),
        ("X_val_tensor.pt", X_val_tensor),
        ("y_val_tensor.pt", y_val_tensor),
        ("X_test_tensor.pt", X_test_tensor),
        ("y_test_tensor.pt", y_test_tensor),
    ]:
        if os.path.exists(fname):
            bak = f"{fname}.bak.{time.strftime('%Y%m%d_%H%M%S')}"
            os.rename(fname, bak)
            print(f"Backed up {fname} -> {bak}")
        torch.save(var, fname)
        print(f"Saved {fname} (shape={tuple(var.shape)})")

    num_features = X_train_tensor.shape[1]
    print(f"🔢 Updated num_features = {num_features}")
    print("Rebuild complete.")
else:
    # Load tensors into workspace variables so subsequent cells can use them
    X_train_tensor = torch.load("X_train_tensor.pt")
    y_train_tensor = torch.load("y_train_tensor.pt")
    X_val_tensor = torch.load("X_val_tensor.pt")
    y_val_tensor = torch.load("y_val_tensor.pt")
    X_test_tensor = torch.load("X_test_tensor.pt")
    y_test_tensor = torch.load("y_test_tensor.pt")
    num_features = X_train_tensor.shape[1]
    print(f"🔢 num_features = {num_features}")

print(
    "Startup data check complete. If you want to force a rebuild, delete existing .pt files and re-run this cell."
)


🟢 Using device: cpu
FEATURE_NAMES derived from CSV (len= 21 )
Found existing .pt tensors and they match FEATURE_NAMES.
🔢 num_features = 21
Startup data check complete. If you want to force a rebuild, delete existing .pt files and re-run this cell.
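
The rebuild path splits the data in two stages, so the second `test_size` has to be expressed relative to what remains after the first split. The arithmetic can be sketched as follows; `split_fractions` is a hypothetical helper used only for illustration.

```python
def split_fractions(test_frac, val_frac):
    """Shares of the full dataset after a two-stage train/val/test split."""
    remaining = 1.0 - test_frac        # share left after carving out the test set
    val_rel = val_frac / remaining     # share of the remainder sent to validation
    val_abs = remaining * val_rel      # validation share of the full dataset
    train_abs = remaining - val_abs    # whatever is left is the training share
    return train_abs, val_abs, test_frac, val_rel


train_abs, val_abs, test_abs, val_rel = split_fractions(0.15, 0.15)
print(f"val_rel={val_rel:.4f} train={train_abs:.2f} val={val_abs:.2f} test={test_abs:.2f}")
```

With the notebook's values (`test_frac = val_frac = 0.15`), the second split uses about 17.6% of the remaining rows, which yields a 70/15/15 split overall.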


Rebuild DataLoaders from saved tensors

In [None]:
X_train_tensor = torch.load("X_train_tensor.pt")
y_train_tensor = torch.load("y_train_tensor.pt")
X_val_tensor = torch.load("X_val_tensor.pt")
y_val_tensor = torch.load("y_val_tensor.pt")
X_test_tensor = torch.load("X_test_tensor.pt")
y_test_tensor = torch.load("y_test_tensor.pt")


# Quick sanity checks
assert X_train_tensor.ndim == 2, "X_train must be [N, num_features]"
assert (
    X_val_tensor.shape[1] == X_train_tensor.shape[1]
), "val features != train features"
assert (
    X_test_tensor.shape[1] == X_train_tensor.shape[1]
), "test features != train features"
assert y_train_tensor.shape[0] == X_train_tensor.shape[0], "y_train length mismatch"
assert y_val_tensor.shape[0] == X_val_tensor.shape[0], "y_val length mismatch"
assert y_test_tensor.shape[0] == X_test_tensor.shape[0], "y_test length mismatch"

num_features = X_train_tensor.shape[1]
print(f"🔢 num_features = {num_features}")


# Build datasets
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
val_dataset = TensorDataset(X_val_tensor, y_val_tensor)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)

# Loader knobs (easy to tweak)
BATCH_SIZE = 64
NUM_WORKERS = min(4, os.cpu_count() or 0)
PIN_MEMORY = device.type == "cuda"
DROP_LAST_TRAIN = True  # helps when using BatchNorm

# Build DataLoaders
train_loader = DataLoader(
    train_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    num_workers=NUM_WORKERS,
    pin_memory=PIN_MEMORY,
    drop_last=DROP_LAST_TRAIN,
)
val_loader = DataLoader(
    val_dataset,
    batch_size=BATCH_SIZE,
    shuffle=False,
    num_workers=NUM_WORKERS,
    pin_memory=PIN_MEMORY,
    drop_last=False,
)
test_loader = DataLoader(
    test_dataset,
    batch_size=BATCH_SIZE,
    shuffle=False,
    num_workers=NUM_WORKERS,
    pin_memory=PIN_MEMORY,
    drop_last=False,
)

print("✅ Data loaded and DataLoaders created.")


🔢 num_features = 21
✅ Data loaded and DataLoaders created.
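
As a rough illustration of what `DROP_LAST_TRAIN = True` costs, the sketch below counts batches with and without `drop_last`. The training-set size is taken from the class-balance output in this notebook; `num_batches` is a hypothetical helper, not a DataLoader API.

```python
import math


def num_batches(n_samples, batch_size, drop_last):
    """Number of batches a loader yields per epoch for the given settings."""
    if drop_last:
        return n_samples // batch_size
    return math.ceil(n_samples / batch_size)


N_TRAIN = 177576  # training rows reported by the class-balance cell
BATCH = 64

kept_batches = num_batches(N_TRAIN, BATCH, drop_last=True)
all_batches = num_batches(N_TRAIN, BATCH, drop_last=False)
dropped_rows = N_TRAIN - kept_batches * BATCH
print(kept_batches, all_batches, dropped_rows)
```

Because the training loader shuffles, the few rows left out differ from epoch to epoch, so no sample is permanently excluded.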


In [None]:
# Cell 5: Quick verification: FEATURE_NAMES vs num_features
# Raises an assertion error early if the CSV-derived feature names and tensors are misaligned.
assert (
    len(FEATURE_NAMES) == num_features
), f"FEATURE_NAMES len={len(FEATURE_NAMES)} vs num_features={num_features}"
print(f"✅ FEATURE_NAMES matches num_features ({num_features})")


✅ FEATURE_NAMES matches num_features (21)


Class balance (optional) & pos_weight helper

In [None]:
# pos_weight for BCEWithLogitsLoss
with torch.no_grad():
    pos_count = (y_train_tensor == 1).sum().item()
    neg_count = (y_train_tensor == 0).sum().item()
    total = len(y_train_tensor)
    pos_ratio = pos_count / total
    print(
        f"📊 Train positives: {pos_count} ({pos_ratio:.3f}), negatives: {neg_count} ({neg_count/total:.3f}), total: {total}"
    )


def get_pos_weight_tensor(y_tensor: torch.Tensor) -> torch.Tensor:
    # pos_weight = N_neg / N_pos (used by BCEWithLogitsLoss)
    pos = (y_tensor == 1).sum().item()
    neg = (y_tensor == 0).sum().item()
    assert pos > 0 and neg > 0, "Need at least one positive and one negative"
    return torch.tensor([neg / pos], dtype=torch.float32, device=device)


📊 Train positives: 24742 (0.139), negatives: 152834 (0.861), total: 177576
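
To make the effect of `pos_weight` concrete, the sketch below plugs the counts printed above into the per-sample binary cross-entropy with logits. It is a plain-Python illustration of the formula, not the PyTorch implementation; `bce_with_logits` is a name introduced here.

```python
import math


def bce_with_logits(logit, target, pos_weight=1.0):
    """Per-sample loss: -[pos_weight * y * log(s) + (1 - y) * log(1 - s)], s = sigmoid(logit)."""
    s = 1.0 / (1.0 + math.exp(-logit))
    return -(pos_weight * target * math.log(s) + (1.0 - target) * math.log(1.0 - s))


pos_count, neg_count = 24742, 152834   # from the class-balance output
pos_weight = neg_count / pos_count     # N_neg / N_pos, as in get_pos_weight_tensor

plain_pos = bce_with_logits(0.0, 1.0)                 # positive sample, no weighting
weighted_pos = bce_with_logits(0.0, 1.0, pos_weight)  # positive sample, weighted
weighted_neg = bce_with_logits(0.0, 0.0, pos_weight)  # negative sample is unaffected
print(f"pos_weight={pos_weight:.3f} plain={plain_pos:.4f} weighted={weighted_pos:.4f}")
```

With roughly six negatives per positive, each positive sample contributes about six times as much loss as it would unweighted, while negatives keep their original weight.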


Hyperparameters (single place to tweak)

In [None]:
from dataclasses import dataclass
from typing import List, Optional
import time


@dataclass
class TrainConfig:
    # Architecture
    hidden_sizes: Optional[List[int]] = None  # e.g. [256, 128, 64]; None is replaced in __post_init__
    dropout: float = 0.35
    use_batchnorm: bool = True
    activation: str = "relu"  # "relu" | "gelu" | "leaky_relu"

    # Optimization
    optimizer: str = "adamw"
    lr: float = 1e-3
    weight_decay: float = 1e-4
    momentum: float = 0.9

    # Training
    epochs: int = 60
    grad_clip: Optional[float] = 1.0
    mixed_precision: bool = True

    # Early Stopping
    early_stopping: bool = True
    patience: int = 10
    min_delta: float = 1e-4

    # LR Scheduler
    scheduler: str = "plateau"
    step_size: int = 12
    gamma: float = 0.5
    cosine_T_max: int = 30
    plateau_factor: float = 0.5
    plateau_patience: int = 3
    plateau_min_lr: float = 1e-6

    # Loss options
    use_pos_weight: bool = False

    # Saving
    save_dir: str = "checkpoints"
    run_name: Optional[str] = None

    # Logging
    print_every: int = 1

    def __post_init__(self):
        if self.hidden_sizes is None:
            self.hidden_sizes = [256, 128, 64]
        if self.run_name is None:
            self.run_name = time.strftime("run_%Y%m%d_%H%M%S")


# create instance
cfg = TrainConfig()
cfg


TrainConfig(hidden_sizes=[256, 128, 64], dropout=0.35, use_batchnorm=True, activation='relu', optimizer='adamw', lr=0.001, weight_decay=0.0001, momentum=0.9, epochs=60, grad_clip=1.0, mixed_precision=True, early_stopping=True, patience=10, min_delta=0.0001, scheduler='plateau', step_size=12, gamma=0.5, cosine_T_max=30, plateau_factor=0.5, plateau_patience=3, plateau_min_lr=1e-06, use_pos_weight=False, save_dir='checkpoints', run_name='run_20250908_215357', print_every=1)
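
Since all hyperparameters live in one dataclass, variants for a small sweep can be derived from a base config. The sketch below uses a trimmed-down, hypothetical `MiniConfig` rather than the full `TrainConfig`, and shows `field(default_factory=...)` as one alternative to the `None` plus `__post_init__` pattern.

```python
from dataclasses import dataclass, field, replace
from typing import List


@dataclass
class MiniConfig:  # hypothetical stand-in for TrainConfig, for illustration only
    hidden_sizes: List[int] = field(default_factory=lambda: [256, 128, 64])
    dropout: float = 0.35
    lr: float = 1e-3


base = MiniConfig()

# Build a small grid of variants without mutating the base config.
variants = [
    replace(base, lr=lr, dropout=dropout)
    for lr in (1e-3, 3e-4)
    for dropout in (0.2, 0.35)
]
for v in variants:
    print(v)
```

Note that `dataclasses.replace` copies every field, so with the notebook's `TrainConfig` each variant would inherit the base `run_name` and write to the same checkpoint folder unless a distinct `run_name` is passed.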

Model factory

In [None]:
class TabularFFNN(nn.Module):
    def __init__(
        self,
        input_dim: int,
        hidden_sizes: List[int],
        output_dim: int = 1,
        dropout: float = 0.3,
        use_batchnorm: bool = True,
        activation: str = "relu",
    ):
        super().__init__()
        acts = {
            "relu": nn.ReLU(),
            "gelu": nn.GELU(),
            "leaky_relu": nn.LeakyReLU(0.01),
        }
        assert activation in acts, f"Unsupported activation: {activation}"
        self.activation = acts[activation]

        layers = []
        prev = input_dim
        for h in hidden_sizes:
            layers.append(nn.Linear(prev, h))
            if use_batchnorm:
                layers.append(nn.BatchNorm1d(h))
            layers.append(self.activation)
            if dropout and dropout > 0:
                layers.append(nn.Dropout(dropout))
            prev = h
        layers.append(nn.Linear(prev, 1))  # 1 logit for binary classification
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x).squeeze(1)  # [B]


Optimizers, schedulers, EarlyStopping, metrics

In [None]:
def make_scheduler(optimizer, cfg: TrainConfig):
    if cfg.scheduler == "none":
        return None
    if cfg.scheduler == "step":
        return torch.optim.lr_scheduler.StepLR(
            optimizer, step_size=cfg.step_size, gamma=cfg.gamma
        )
    if cfg.scheduler == "cosine":
        return torch.optim.lr_scheduler.CosineAnnealingLR(
            optimizer, T_max=cfg.cosine_T_max
        )
    if cfg.scheduler == "plateau":
        return torch.optim.lr_scheduler.ReduceLROnPlateau(
            optimizer,
            mode="min",
            factor=cfg.plateau_factor,
            patience=cfg.plateau_patience,
            min_lr=cfg.plateau_min_lr,
        )
    raise ValueError(f"Unsupported scheduler: {cfg.scheduler}")
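
For intuition about the `step` and `cosine` options, their learning rates follow closed-form curves. The sketch below evaluates those formulas in plain Python with the defaults from `TrainConfig`; the function names are introduced here, and `epoch` means the number of scheduler steps taken so far. The `plateau` option has no closed form because it reacts to the validation loss.

```python
import math


def step_lr(base_lr, epoch, step_size=12, gamma=0.5):
    """Step decay: multiply the rate by gamma once every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


def cosine_lr(base_lr, epoch, t_max=30, eta_min=0.0):
    """Cosine annealing from base_lr down to eta_min over t_max epochs."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2


for epoch in (0, 11, 12, 24, 30):
    print(f"epoch {epoch:2d}  step={step_lr(1e-3, epoch):.6f}  cosine={cosine_lr(1e-3, epoch):.6f}")
```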


Train/validate loops

In [None]:
def binary_accuracy_from_logits(logits, targets, threshold=0.5):
    """
    Computes binary accuracy given logits and targets (float or int, shape [B]).
    """
    probs = torch.sigmoid(logits)
    preds = (probs >= threshold).float()
    targets = targets.float()
    correct = (preds == targets).float().sum()
    return correct.item() / len(targets)
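
Thresholding the sigmoid output at a probability `t` is equivalent to thresholding the raw logit at `log(t / (1 - t))`, so the default 0.5 corresponds to a logit of 0. A small plain-Python sketch, with hypothetical helper names:

```python
import math


def logit_threshold(prob_threshold):
    """Logit value at which sigmoid(logit) equals prob_threshold."""
    return math.log(prob_threshold / (1.0 - prob_threshold))


def predict_from_logit(z, prob_threshold=0.5):
    """Binary prediction made directly in logit space."""
    return int(z >= logit_threshold(prob_threshold))


print(logit_threshold(0.5), round(logit_threshold(0.3), 4))
print(predict_from_logit(-0.5), predict_from_logit(-0.5, prob_threshold=0.3))
```

Lowering the threshold below 0.5 is one way to trade precision for recall on the minority class without retraining.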


In [None]:
def train_one_run(cfg, train_loader, val_loader, input_dim):
    """
    Trains a TabularFFNN model using the provided config and loaders.
    Returns a dict with best val loss and checkpoint path.
    """
    import os
    import copy
    from torch.cuda.amp import GradScaler

    # Model
    model = TabularFFNN(
        input_dim=input_dim,
        hidden_sizes=cfg.hidden_sizes,
        dropout=cfg.dropout,
        use_batchnorm=cfg.use_batchnorm,
        activation=cfg.activation,
    ).to(device)

    # Optimizer
    if cfg.optimizer == "adamw":
        optimizer = torch.optim.AdamW(
            model.parameters(), lr=cfg.lr, weight_decay=cfg.weight_decay
        )
    elif cfg.optimizer == "adam":
        optimizer = torch.optim.Adam(
            model.parameters(), lr=cfg.lr, weight_decay=cfg.weight_decay
        )
    elif cfg.optimizer == "sgd":
        optimizer = torch.optim.SGD(
            model.parameters(),
            lr=cfg.lr,
            weight_decay=cfg.weight_decay,
            momentum=cfg.momentum,
        )
    else:
        raise ValueError(f"Unsupported optimizer: {cfg.optimizer}")

    # Loss
    if cfg.use_pos_weight:
        pos_weight = get_pos_weight_tensor(y_train_tensor)
        criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
    else:
        criterion = nn.BCEWithLogitsLoss()

    # Scheduler
    scheduler = make_scheduler(optimizer, cfg)

    # AMP scaler
    scaler = GradScaler() if cfg.mixed_precision and device.type == "cuda" else None

    # Early stopping
    best_val_loss = float("inf")
    best_state = None
    best_epoch = 0
    patience_counter = 0
    save_dir = os.path.join(cfg.save_dir, cfg.run_name)
    os.makedirs(save_dir, exist_ok=True)
    best_path = os.path.join(save_dir, "best_model.pt")

    for epoch in range(cfg.epochs):
        train_loss, train_acc = run_epoch(
            model, train_loader, criterion, optimizer, cfg, scaler
        )
        val_loss, val_acc = run_epoch(model, val_loader, criterion, None, cfg)

        # Scheduler step
        if scheduler:
            if cfg.scheduler == "plateau":
                scheduler.step(val_loss)
            else:
                scheduler.step()

        if val_loss < best_val_loss - cfg.min_delta:
            best_val_loss = val_loss
            best_state = {
                "model_state": copy.deepcopy(model.state_dict()),
                "cfg": asdict(cfg),
                "input_dim": input_dim,
                "hidden_sizes": cfg.hidden_sizes,
            }
            torch.save(best_state, best_path)
            best_epoch = epoch
            patience_counter = 0
        else:
            patience_counter += 1

        if cfg.print_every and (
            epoch % cfg.print_every == 0 or epoch == cfg.epochs - 1
        ):
            print(
                f"Epoch {epoch+1:3d}/{cfg.epochs} | Train loss: {train_loss:.4f} | Val loss: {val_loss:.4f} | Train acc: {train_acc:.3f} | Val acc: {val_acc:.3f}"
            )

        if cfg.early_stopping and patience_counter >= cfg.patience:
            print(f"Early stopping at epoch {epoch+1}")
            break

    # Load best state for return
    if best_state is not None:
        model.load_state_dict(best_state["model_state"])
    else:
        print("Warning: No improvement during training.")

    return {"best_val_loss": best_val_loss, "best_path": best_path}
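
The early-stopping rule inside `train_one_run` can be isolated and replayed on a list of validation losses. The sketch below mirrors that logic (improvement must exceed `min_delta`; stop after `patience` non-improving epochs) using the first six validation losses from the training log in this notebook. `simulate_early_stopping` is a name introduced for illustration.

```python
def simulate_early_stopping(val_losses, patience=10, min_delta=1e-4):
    """Return (best_epoch, last_epoch_run) as zero-based indices."""
    best = float("inf")
    best_epoch = None
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
        if bad_epochs >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


losses = [0.3152, 0.3120, 0.3135, 0.3128, 0.3117, 0.3119]
print(simulate_early_stopping(losses, patience=2))
print(simulate_early_stopping(losses, patience=10))
```

With a patience of 2 the run would have stopped at the fourth epoch and missed the lower loss at the fifth, which is one argument for the more forgiving default of 10.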


In [None]:
def run_epoch(
    model, loader, criterion, optimizer=None, cfg: TrainConfig = None, scaler=None
):
    is_train = optimizer is not None
    model.train(is_train)

    running_loss, running_acc, n_batches = 0.0, 0.0, 0

    for xb, yb in loader:
        xb = xb.to(device, non_blocking=True).float()
        yb = yb.to(
            device, non_blocking=True
        ).float()  # BCEWithLogits expects float targets
        yb = yb.view(-1)  # Flatten target to shape [B]

        if is_train:
            optimizer.zero_grad(set_to_none=True)

            use_amp = cfg.mixed_precision and device.type == "cuda"
            amp_ctx = (
                torch.autocast(device_type="cuda", dtype=torch.float16)
                if use_amp
                else nullcontext()
            )

            with amp_ctx:
                logits = model(xb)  # [B]
                loss = criterion(logits, yb)  # scalar

            if scaler is not None and use_amp:
                scaler.scale(loss).backward()
                if cfg.grad_clip:
                    scaler.unscale_(optimizer)
                    nn.utils.clip_grad_norm_(model.parameters(), cfg.grad_clip)
                scaler.step(optimizer)
                scaler.update()
            else:
                loss.backward()
                if cfg.grad_clip:
                    nn.utils.clip_grad_norm_(model.parameters(), cfg.grad_clip)
                optimizer.step()
        else:
            with torch.no_grad():
                logits = model(xb)
                loss = criterion(logits, yb)

        acc = binary_accuracy_from_logits(logits, yb)
        running_loss += loss.item()
        running_acc += acc
        n_batches += 1

    return running_loss / n_batches, running_acc / n_batches
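
`run_epoch` averages per-batch values, so a short final batch (possible in `val_loader` and `test_loader`, which use `drop_last=False`) counts as much as a full one. The sketch below contrasts that with a sample-weighted mean; the batch accuracies and sizes are made-up illustrative numbers.

```python
def batch_mean(batch_values):
    """Unweighted mean over batches, as run_epoch computes it."""
    return sum(batch_values) / len(batch_values)


def sample_weighted_mean(batch_values, batch_sizes):
    """Mean over samples: each batch is weighted by its size."""
    total = sum(v * n for v, n in zip(batch_values, batch_sizes))
    return total / sum(batch_sizes)


batch_acc = [0.90, 0.90, 0.50]   # illustrative per-batch accuracies
batch_sizes = [64, 64, 8]        # the last batch is short

unweighted = batch_mean(batch_acc)
weighted = sample_weighted_mean(batch_acc, batch_sizes)
print(f"unweighted={unweighted:.4f} weighted={weighted:.4f}")
```

With thousands of batches per epoch the difference is small, but for exact reporting on the test set the sample-weighted version is the safer choice.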


Infer input_dim, run training (edit hyperparams above)

In [None]:
# Make sure to run all previous cells that define train_one_run and its dependencies before running this cell.
# Infer input_dim from one batch
xb0, yb0 = next(iter(train_loader))
input_dim = xb0.shape[1]
print(f"🔎 Inferred input_dim = {input_dim}")

# (Optional) align drop_last to batchnorm choice
if cfg.use_batchnorm and not DROP_LAST_TRAIN:
    print(
        "⚠️ Consider setting DROP_LAST_TRAIN=True when using BatchNorm to avoid tiny last batches."
    )

result = train_one_run(cfg, train_loader, val_loader, input_dim)
print("Best val loss:", result["best_val_loss"])
print("Best model path:", result["best_path"])


🔎 Inferred input_dim = 21
Epoch   1/60 | Train loss: 0.3327 | Val loss: 0.3152 | Train acc: 0.859 | Val acc: 0.864
Epoch   2/60 | Train loss: 0.3212 | Val loss: 0.3120 | Train acc: 0.863 | Val acc: 0.867
Epoch   3/60 | Train loss: 0.3199 | Val loss: 0.3135 | Train acc: 0.863 | Val acc: 0.865
Epoch   4/60 | Train loss: 0.3192 | Val loss: 0.3128 | Train acc: 0.863 | Val acc: 0.866
Epoch   5/60 | Train loss: 0.3186 | Val loss: 0.3117 | Train acc: 0.864 | Val acc: 0.867
Epoch   6/60 | Train loss: 0.3176 | Val loss: 0.3119 | Train acc: 0.864 | Val acc
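
When reading these accuracies, it helps to compare them with a majority-class baseline. The sketch below uses the training counts from the class-balance cell and assumes the validation split has a similar class ratio, which holds when the splits are stratified as in the rebuild code.

```python
pos_count, neg_count = 24742, 152834   # training counts from the class-balance output

# Accuracy of a model that always predicts the majority (negative) class.
majority_acc = max(pos_count, neg_count) / (pos_count + neg_count)

reported_val_acc = 0.867               # best validation accuracy in the log above
margin = reported_val_acc - majority_acc
print(f"majority baseline={majority_acc:.4f} margin over baseline={margin:.4f}")
```

The margin is well under one percentage point, so accuracy alone says little here; metrics such as recall, precision, or AUC on the positive class would be more informative for this imbalance.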

Install (if needed) & imports for explainers (do not run; this is already done above)

In [None]:
# Resolve FEATURE_NAMES: prefer CSV header if available, otherwise fallback to hardcoded list
import os

csv_path = "data/diabetes_binary_health_indicators_BRFSS2015.csv"
if os.path.exists(csv_path):
    import pandas as pd

    df_head = pd.read_csv(csv_path, nrows=0)
    FEATURE_NAMES = [c for c in df_head.columns if c != "Diabetes_binary"]
else:
    # Fallback list (kept for offline editing)
    FEATURE_NAMES = [
        "HighBP",
        "HighChol",
        "CholCheck",
        "BMI",
        "Smoker",
        "Stroke",
        "HeartDiseaseorAttack",
        "PhysActivity",
        "Fruits",
        "Veggies",
        "HvyAlcoholConsump",
        "AnyHealthcare",
        "NoDocbcCost",
        "GenHlth",
        "MentHlth",
        "PhysHlth",
        "DiffWalk",
        "Sex",
        "Age",
        "Education",
        "Income",
    ]

# Integrated Gradients implemented below; FEATURE_NAMES now available from CSV when possible.


In [None]:
# Cell 21: Rebuild-from-CSV (deprecated shortcut)
# The notebook now performs startup data validation and rebuild in the top cell.
# This cell is kept for compatibility but does not need to be run in normal workflows.
print(
    "Note: startup cell already performs CSV-based rebuild if necessary. Delete .pt files and re-run the top cell to force a rebuild."
)


Load best checkpoint & rebuild the trained model

In [None]:
# Locate best checkpoint path
if "result" in globals() and isinstance(result, dict) and "best_path" in result:
    BEST_PATH = result["best_path"]
    print("Using best model from training:", BEST_PATH)
else:
    # Fallback: set manually if needed
    # BEST_PATH = "checkpoints/run_YYYYMMDD_HHMMSS/best_model.pt"
    raise FileNotFoundError(
        "No result['best_path'] found. Re-run training cell or set BEST_PATH manually."
    )

ckpt = torch.load(BEST_PATH, map_location=device)
cfg_loaded = ckpt.get("cfg", {})
hidden_sizes = ckpt.get("hidden_sizes", [256, 128, 64])
input_dim_ckpt = ckpt.get("input_dim", X_val_tensor.shape[1])

model_explain = TabularFFNN(
    input_dim=input_dim_ckpt,
    hidden_sizes=hidden_sizes,
    dropout=cfg_loaded.get("dropout", 0.0),
    use_batchnorm=cfg_loaded.get("use_batchnorm", True),
    activation=cfg_loaded.get("activation", "relu"),
).to(device)
model_explain.load_state_dict(ckpt["model_state"])
model_explain.eval()
print("✅ Model restored & set to eval().")
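
If the kernel was restarted and `result` is gone, the checkpoint path has to be set by hand. As one possible convenience, the sketch below looks for the most recent `best_model.pt` under the save directory. It assumes run folders keep the default `run_YYYYMMDD_HHMMSS` naming, so that sorting by name is also sorting by time; `find_latest_checkpoint` is a hypothetical helper, and the temporary folders exist only for the demonstration.

```python
import glob
import os
import tempfile


def find_latest_checkpoint(save_dir="checkpoints", filename="best_model.pt"):
    """Return the newest checkpoint path by run-folder name, or None if none exist."""
    pattern = os.path.join(save_dir, "run_*", filename)
    matches = sorted(glob.glob(pattern))  # timestamped names sort chronologically
    return matches[-1] if matches else None


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for run in ("run_20250101_120000", "run_20250908_215357"):
        os.makedirs(os.path.join(tmp, run))
        open(os.path.join(tmp, run, "best_model.pt"), "wb").close()
    latest = find_latest_checkpoint(tmp)
    latest_run = os.path.basename(os.path.dirname(latest))
    empty_lookup = find_latest_checkpoint(os.path.join(tmp, "no_such_dir"))

print(latest_run)
```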


IG config & helpers (global + local attributions)

In [None]:
# --- Config for explainers ---
class ExplainConfig:
    # Integrated Gradients
    ig_steps: int = 64  # 32–128 is common
    ig_batch_size: int = 512  # batch IG for speed
    ig_baseline: str = "mean"  # "zero" | "mean"
    ig_global_samples: int = (
        2000  # how many val rows to aggregate for global importance (None = all)
    )
    top_k: int = 20  # top features to plot


EXPL_CFG = ExplainConfig()

# Build baseline
with torch.no_grad():
    if EXPL_CFG.ig_baseline == "zero":
        ig_baseline_vec = torch.zeros(
            (1, num_features), dtype=torch.float32, device=device
        )
    elif EXPL_CFG.ig_baseline == "mean":
        ig_baseline_vec = X_train_tensor.mean(dim=0, keepdim=True).to(device)
    else:
        raise ValueError("ig_baseline must be 'zero' or 'mean'")


def integrated_gradients(model, inputs, baseline, n_steps=64):
    """
    Custom PyTorch Integrated Gradients for tabular data.
    Args:
        model: PyTorch model
        inputs: [B, F] tensor
        baseline: [1, F] or [B, F] tensor
        n_steps: number of steps for IG
    Returns:
        attributions: [B, F] tensor (on CPU)
    """
    model.eval()
    inputs = inputs.detach()
    baseline = baseline.detach()
    # Expand baseline if needed
    if baseline.shape[0] == 1:
        baseline = baseline.expand(inputs.shape[0], -1)
    # Generate scaled inputs
    alphas = torch.linspace(0, 1, n_steps, device=inputs.device).view(
        -1, 1, 1
    )  # [n_steps, 1, 1]
    baseline = baseline.unsqueeze(0)  # [1, B, F]
    inputs = inputs.unsqueeze(0)  # [1, B, F]
    interpolated = baseline + alphas * (inputs - baseline)  # [n_steps, B, F]
    interpolated = interpolated.requires_grad_()
    grads = []
    for i in range(n_steps):
        x = interpolated[i]  # [B, F]
        x.requires_grad_()
        out = model(x)  # [B]
        out = out.sum()
        grad = torch.autograd.grad(out, x, retain_graph=True)[0]  # [B, F]
        grads.append(grad)
    grads = torch.stack(grads, dim=0)  # [n_steps, B, F]
    avg_grads = grads.mean(dim=0)  # [B, F]
    attributions = (inputs.squeeze(0) - baseline.squeeze(0)) * avg_grads  # [B, F]
    return attributions.detach().cpu()


def compute_ig_batch(model, xb, baseline, n_steps=64):
    """
    xb: [B, F] on device; baseline: [1, F] or [B, F]
    returns attributions [B, F] on CPU
    """
    return integrated_gradients(model, xb, baseline, n_steps)


def ig_global_importance(model, X_tensor, cfg: ExplainConfig):
    """
    Aggregates |IG| over a subset of X_tensor to get global importances.
    """
    N = (
        len(X_tensor)
        if cfg.ig_global_samples is None
        else min(cfg.ig_global_samples, len(X_tensor))
    )
    idx = np.random.permutation(len(X_tensor))[:N]
    X_sub = X_tensor[idx]

    all_attrs = []
    for i in range(0, N, cfg.ig_batch_size):
        xb = X_sub[i : i + cfg.ig_batch_size].to(device).float()
        attrs = compute_ig_batch(model, xb, ig_baseline_vec, n_steps=cfg.ig_steps)
        all_attrs.append(attrs)

    A = torch.cat(all_attrs, dim=0).abs().mean(dim=0).numpy()  # [F]
    return A, idx  # mean |attribution| per feature


def plot_topk_bar(
    importances, feature_names, top_k=20, title="Global Feature Importance (IG)"
):
    imp = np.asarray(importances)
    k = min(top_k, len(imp))
    order = np.argsort(imp)[::-1][:k]
    labels = [feature_names[i] for i in order][::-1]
    vals = imp[order][::-1]
    plt.figure(figsize=(8, 6))
    plt.barh(labels, vals)
    plt.title(title)
    plt.xlabel("Mean |attribution|")
    plt.tight_layout()
    plt.show()
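
A useful sanity check for any Integrated Gradients implementation is completeness: the attributions should sum to `f(x) - f(baseline)`. The sketch below applies the same recipe as `integrated_gradients` (evenly spaced alphas including both endpoints, averaged gradients, scaled by `x - baseline`) to a toy function with a known gradient. The toy function and helper names are assumptions made for illustration.

```python
def toy_model(x):
    """Toy scalar function standing in for the network's logit."""
    return x[0] ** 2 + 3.0 * x[1]


def toy_grad(x):
    """Analytic gradient of toy_model."""
    return [2.0 * x[0], 3.0]


def integrated_gradients_toy(x, baseline, n_steps=64):
    avg_grad = [0.0] * len(x)
    for s in range(n_steps):
        alpha = s / (n_steps - 1)  # same grid as torch.linspace(0, 1, n_steps)
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for j, g in enumerate(toy_grad(point)):
            avg_grad[j] += g / n_steps
    # Scale the averaged gradient by the input's distance from the baseline.
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


x, baseline = [2.0, -1.0], [0.5, 0.0]
attr = integrated_gradients_toy(x, baseline)
gap = sum(attr) - (toy_model(x) - toy_model(baseline))
print(attr, gap)
```

The gap is essentially zero here because the toy gradient is linear along the path. For the trained network the same gap measures approximation error, and it should shrink as `ig_steps` grows.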


Run IG global & local visualizations

In [None]:
# --- Global (validation subset) ---
global_imp_ig, used_idx = ig_global_importance(model_explain, X_val_tensor, EXPL_CFG)
plot_topk_bar(
    global_imp_ig,
    FEATURE_NAMES,
    top_k=EXPL_CFG.top_k,
    title="Global Feature Importance (Integrated Gradients)",
)

# --- Local (choose one sample from validation) ---
SAMPLE_IDX = int(used_idx[0])  # pick first from subset; change as desired
x_local = X_val_tensor[SAMPLE_IDX : SAMPLE_IDX + 1].to(device).float()
attr_local = (
    compute_ig_batch(model_explain, x_local, ig_baseline_vec, n_steps=EXPL_CFG.ig_steps)
    .numpy()
    .squeeze(0)
)  # [F]

# Plot the top-K local contributions ranked by absolute magnitude (bars keep their sign)
k = min(EXPL_CFG.top_k, len(attr_local))
order = np.argsort(np.abs(attr_local))[::-1][:k]
labels = [FEATURE_NAMES[i] for i in order][::-1]
vals = attr_local[order][::-1]

plt.figure(figsize=(8, 6))
plt.barh(labels, vals)
plt.title(f"Local Attribution (IG) - val index {SAMPLE_IDX}")
plt.xlabel("Attribution (signed, logit space)")
plt.tight_layout()
plt.show()

# Inspect model logit/probability for the same sample
with torch.no_grad():
    logit = model_explain(x_local).item()
    prob = torch.sigmoid(torch.tensor(logit)).item()
print(f"🔎 SAMPLE {SAMPLE_IDX} -> logit={logit:.4f}, prob={prob:.4f}")
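
A quick sanity check for Integrated Gradients is the completeness property: the attributions for a sample should sum to roughly `f(x) - f(baseline)` in logit space. The sketch below illustrates this on a small stand-in model with an analytic gradient; `toy_logit`, `toy_grad`, and `ig_riemann` are hypothetical names used only for this illustration and are not the notebook's `integrated_gradients`.

```python
import numpy as np


def toy_logit(x, w, b):
    """Hypothetical smooth model: logit = sum(tanh(w * x)) + b."""
    return np.tanh(w * x).sum(axis=-1) + b


def toy_grad(x, w):
    """Analytic gradient of toy_logit with respect to x."""
    return w * (1.0 - np.tanh(w * x) ** 2)


def ig_riemann(x, baseline, w, n_steps=64):
    """Integrated Gradients via a midpoint Riemann sum along the straight path."""
    alphas = (np.arange(n_steps) + 0.5) / n_steps        # [S] midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)   # [S, F] interpolated inputs
    avg_grads = toy_grad(path, w).mean(axis=0)           # [F] average gradient on the path
    return (x - baseline) * avg_grads                    # [F] attributions


rng = np.random.default_rng(0)
w = rng.normal(size=5)
x = rng.normal(size=5)
baseline = np.zeros(5)

attr = ig_riemann(x, baseline, w, n_steps=64)
gap = toy_logit(x, w, 0.1) - toy_logit(baseline, w, 0.1)
completeness_error = abs(attr.sum() - gap)
print(f"sum(attr)={attr.sum():.6f}  f(x)-f(baseline)={gap:.6f}  error={completeness_error:.2e}")
```

The same comparison can be tried in the notebook by checking `attr_local.sum()` against the difference between the model's logit at `x_local` and at `ig_baseline_vec` (assuming the baseline is a `[1, F]` tensor on `device`). A large gap usually suggests that `ig_steps` is too small.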


SHAP config & background/sample selection

In [None]:
from dataclasses import dataclass


@dataclass
class ShapConfig:
    background_size: int = 200  # background rows for DeepExplainer
    sample_size: int = 1000  # validation samples to explain
    top_k: int = 20


SHAP_CFG = ShapConfig()

# Select background from TRAIN
bg_size = min(SHAP_CFG.background_size, len(X_train_tensor))
bg_idx = np.random.permutation(len(X_train_tensor))[:bg_size]
background = X_train_tensor[bg_idx].to(device).float()

# Select a sample to explain from VAL
sample_size = min(SHAP_CFG.sample_size, len(X_val_tensor))
sample_idx = np.random.permutation(len(X_val_tensor))[:sample_size]
X_sample = X_val_tensor[sample_idx].to(device).float()


Run SHAP (DeepExplainer with Kernel fallback) + visuals

In [None]:
# Safe import of shap - if missing, skip SHAP cells with a clear message
try:
    import shap
except Exception as e:
    shap = None
    print(
        "⚠️ shap not installed; SHAP visualizations will be skipped. Install with: pip install shap"
    )


def shap_global_bar(
    shap_vals, feature_names, top_k=20, title="Global Feature Importance (SHAP)"
):
    # shap_vals: [N, F]
    imp = np.mean(np.abs(shap_vals), axis=0)
    k = min(top_k, len(imp))
    order = np.argsort(imp)[::-1][:k]
    labels = [feature_names[i] for i in order][::-1]
    vals = imp[order][::-1]
    plt.figure(figsize=(8, 6))
    plt.barh(labels, vals)
    plt.title(title)
    plt.xlabel("Mean |SHAP value|")
    plt.tight_layout()
    plt.show()
    return imp


# If shap is not available, skip the heavy computation and notify the user
if shap is None:
    print("Skipping SHAP computations because the 'shap' package is not available.")
else:
    # Try DeepExplainer first (fast for PyTorch NNs), fallback to KernelExplainer if needed
    try:
        explainer = shap.DeepExplainer(model_explain, background)
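        shap_vals = explainer.shap_values(X_sample)  # list or array, depending on shap version
        if isinstance(shap_vals, list):
            shap_vals = shap_vals[0]  # single-output model
        shap_vals = np.array(shap_vals)
        if shap_vals.ndim == 3:  # some shap versions return [N, F, 1] for one output
            shap_vals = shap_vals[..., 0]  # -> [N, F]
        print("✅ SHAP DeepExplainer succeeded.")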
    except Exception as e:
        print(
            "⚠️ DeepExplainer failed, falling back to KernelExplainer. Reason:", repr(e)
        )
        # KernelExplainer expects a function mapping numpy -> model outputs
        model_explain.eval()

        def f_np(x_np):
            with torch.no_grad():
                x_t = torch.from_numpy(x_np).to(device).float()
                out = model_explain(x_t)  # logits
                return out.detach().cpu().numpy()

        # Use a smaller background for KernelExplainer for speed
        bg_np = background[:50].detach().cpu().numpy()
        kexpl = shap.KernelExplainer(f_np, bg_np)
        shap_vals = kexpl.shap_values(
            X_sample.detach().cpu().numpy(), nsamples=100
        )  # tune nsamples for speed/accuracy
        if isinstance(shap_vals, list):
            shap_vals = shap_vals[0]
        shap_vals = np.array(shap_vals)

    # Global bar (top-K)
    _ = shap_global_bar(
        shap_vals,
        FEATURE_NAMES,
        top_k=SHAP_CFG.top_k,
        title="Global Feature Importance (SHAP)",
    )

    # Beeswarm summary (nice global picture)
    try:
        shap.summary_plot(
            shap_vals,
            X_sample.detach().cpu().numpy(),
            feature_names=FEATURE_NAMES,
            show=True,
        )
    except Exception as e:
        print("Could not draw shap.summary_plot():", repr(e))

    # Dependence plot for top feature (relationship shape)
    try:
        top_feat_idx = int(np.argmax(np.mean(np.abs(shap_vals), axis=0)))
        shap.dependence_plot(
            top_feat_idx,
            shap_vals,
            X_sample.detach().cpu().numpy(),
            feature_names=FEATURE_NAMES,
            show=True,
        )
    except Exception as e:
        print("Could not draw shap.dependence_plot():", repr(e))


In [None]:
# Small test: show binary prediction (0/1) and probability from the restored best model
import torch
import pandas as pd

if "model_explain" not in globals():
    raise RuntimeError(
        "model_explain not found. Run the checkpoint/load cell (Cell 23) first."
    )

model_explain.eval()

# How many test samples to show (adjust as desired)
N = 10
X_all = X_test_tensor
y_all = y_test_tensor
N = min(N, len(X_all))

xb = X_all[:N].to(device).float()
with torch.no_grad():
    logits = model_explain(xb)  # shape [N]
    probs = torch.sigmoid(logits).cpu().numpy()
    preds = (probs >= 0.5).astype(int)

true = y_all[:N].cpu().numpy().astype(int)

df = pd.DataFrame(
    {
        "idx": list(range(N)),
        "true": true,
        "probability": probs,
        "pred_binary": preds,
    }
)
print(df.to_string(index=False, float_format="{:.4f}".format))

acc = (preds == true).mean() if N > 0 else float("nan")
print(f"\nAccuracy on these {N} test samples: {acc:.3f}")


# Helper: predict a single sample tensor -> (pred_binary:int, probability:float)
def predict_one(x_tensor):
    x = x_tensor.to(device).float().unsqueeze(0)
    with torch.no_grad():
        logit = model_explain(x).item()
        prob = float(torch.sigmoid(torch.tensor(logit)).item())
        pred = int(prob >= 0.5)
    return pred, prob


# Example: predict first test sample
p, pr = predict_one(X_test_tensor[0])
print(f"\nExample sample 0 -> pred={p}, prob={pr:.4f}")


In [None]:
# Generate random inputs for all FEATURES and predict (Streamlit-ready template)
import numpy as np
import pandas as pd
import torch

if "FEATURE_NAMES" not in globals():
    raise RuntimeError("FEATURE_NAMES not found. Run startup cells first.")
if "X_train_tensor" not in globals():
    raise RuntimeError("X_train_tensor not found. Run data-load cells first.")
if "model_explain" not in globals():
    raise RuntimeError("model_explain not found. Load the checkpoint/model first.")

# Use training data statistics to synthesize realistic random inputs
with torch.no_grad():
    means = X_train_tensor.mean(dim=0).cpu().numpy()
    stds = X_train_tensor.std(dim=0).cpu().numpy()
    mins = X_train_tensor.min(dim=0).values.cpu().numpy()
    maxs = X_train_tensor.max(dim=0).values.cpu().numpy()


def sample_random_inputs(n_samples=1, clamp=True, scale=1.0, seed=None):
    """Generate n_samples random rows using (mean + scale*std*N(0,1)).
    If clamp=True values are clipped to observed min/max to keep them plausible.
    Returns a pandas.DataFrame and a torch.Tensor on CPU.
    """
    if seed is not None:
        np.random.seed(seed)
    z = np.random.randn(n_samples, len(FEATURE_NAMES))
    X_rand = means[None, :] + (scale * stds)[None, :] * z
    if clamp:
        X_rand = np.minimum(np.maximum(X_rand, mins[None, :]), maxs[None, :])
    df = pd.DataFrame(X_rand, columns=FEATURE_NAMES)
    return df, torch.from_numpy(X_rand.astype(np.float32))


# Example: create 3 random samples
df_rand, X_rand_t = sample_random_inputs(n_samples=3, seed=42)
print("Random input samples (first rows):")
print(df_rand.head().to_string(index=False))

# Predict using model_explain
model_explain.eval()
with torch.no_grad():
    xb = X_rand_t.to(device).float()
    logits = model_explain(xb).cpu()
    probs = torch.sigmoid(logits).numpy()
    preds = (probs >= 0.5).astype(int)

out_df = df_rand.copy()
out_df["probability"] = probs
out_df["pred_binary"] = preds

print("Predictions for random inputs:")
print(out_df.to_string(index=False, float_format="{:.4f}".format))


# Helper to produce a dict of feature:value which is convenient for Streamlit inputs
def random_input_dict(seed=None):
    df, _ = sample_random_inputs(n_samples=1, seed=seed)
    return df.iloc[0].to_dict()


# Example streamlit-style dict for one sample
example_dict = random_input_dict(seed=123)
print("Example input dict (Streamlit form source):")
print(example_dict)


# Single-sample prediction helper that accepts a feature-dict
def predict_from_dict(feat_dict):
    x = np.array([feat_dict[f] for f in FEATURE_NAMES], dtype=np.float32)[None, :]
    xt = torch.from_numpy(x).to(device).float()
    with torch.no_grad():
        logit = model_explain(xt).item()
        prob = float(torch.sigmoid(torch.tensor(logit)).item())
        pred = int(prob >= 0.5)
    return {"pred_binary": pred, "probability": prob}


# Demo: predict from the example dict
demo_out = predict_from_dict(example_dict)
print("Demo prediction from example_dict:", demo_out)
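
`predict_from_dict` raises a bare `KeyError` when a feature is missing and passes `NaN` values straight to the model. When the dict comes from a form, it may help to validate it first. The following is a minimal sketch of such a check; `validate_feature_dict` and the example feature names are hypothetical and are not part of the notebook.

```python
import math


def validate_feature_dict(feat_dict, feature_names):
    """Return a list of human-readable problems; an empty list means the dict is usable."""
    problems = []
    expected = set(feature_names)

    # Features the model needs but the dict does not provide
    for name in feature_names:
        if name not in feat_dict:
            problems.append(f"missing feature: {name!r}")

    # Keys that the model does not know about (often a typo in a form field)
    for key in feat_dict:
        if key not in expected:
            problems.append(f"unexpected feature: {key!r}")

    # Values must convert to a finite float
    for name in feature_names:
        if name not in feat_dict:
            continue
        try:
            value = float(feat_dict[name])
        except (TypeError, ValueError):
            problems.append(f"non-numeric value for {name!r}")
            continue
        if not math.isfinite(value):
            problems.append(f"non-finite value for {name!r}")

    return problems


# Hypothetical feature names, for illustration only
names = ["age", "income", "tenure"]
good = {"age": 41, "income": 52000.0, "tenure": 3}
bad = {"age": "n/a", "income": float("nan"), "extra_col": 1}

good_problems = validate_feature_dict(good, names)
bad_problems = validate_feature_dict(bad, names)
print(good_problems)
print(bad_problems)
```

In the notebook this would be called as `validate_feature_dict(example_dict, FEATURE_NAMES)` before `predict_from_dict`, with prediction skipped when the returned list is not empty.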



# ✅ Embed Feature Metadata into Checkpoint

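This cell locates the most recently modified `best_model.pt` under the working directory (edit `ckpt_path` in the cell to target a specific file) and injects:
- `feature_order` from `feature_order.json`
- `scaler_params` from `scaler_params.json`

so downstream apps can load the model and know **exactly** how inputs were prepared. If a sidecar file is missing, the corresponding entry is written as an empty list or dict.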


In [None]:
import os, json, glob, torch

# Locate best_model.pt
ckpt_path = None
candidates = sorted(
    glob.glob("**/best_model.pt", recursive=True), key=os.path.getmtime, reverse=True
)
if candidates:
    ckpt_path = candidates[0]
else:
    if os.path.exists("best_model.pt"):
        ckpt_path = "best_model.pt"

assert (
    ckpt_path is not None
), "Could not find best_model.pt. Please set ckpt_path manually."

print("Patching checkpoint:", ckpt_path)
ckpt = torch.load(ckpt_path, map_location="cpu")

# Load sidecar metadata (each stays empty when its file is missing)
feature_order = []
scaler_params = {}
if os.path.exists("feature_order.json"):
    with open("feature_order.json", "r") as f:
        feature_order = json.load(f)
if os.path.exists("scaler_params.json"):
    with open("scaler_params.json", "r") as f:
        scaler_params = json.load(f)

# Inject
if isinstance(ckpt, dict):
    ckpt["feature_order"] = feature_order
    ckpt["scaler_params"] = scaler_params
    torch.save(ckpt, ckpt_path)
    print("✅ Injected feature_order & scaler_params into checkpoint.")
else:
    print("⚠️ Checkpoint is not a dict; cannot inject metadata safely.")
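
Once the metadata is embedded, a downstream app can use it to order and scale raw inputs before calling the model. The sketch below shows one way this might look. It assumes, purely for illustration, that `scaler_params` has the form `{"mean": [...], "scale": [...]}` aligned with `feature_order`. The actual schema of `scaler_params.json` is not shown in this notebook, and `prepare_row` is a hypothetical helper.

```python
def prepare_row(raw, feature_order, scaler_params):
    """Order raw values by feature_order and standardize them.

    ASSUMPTION: scaler_params looks like {"mean": [...], "scale": [...]},
    with both lists aligned to feature_order. Adjust to the real schema.
    """
    means = scaler_params["mean"]
    scales = scaler_params["scale"]
    if not (len(means) == len(scales) == len(feature_order)):
        raise ValueError("scaler_params and feature_order lengths differ")

    row = []
    for name, mu, sd in zip(feature_order, means, scales):
        if name not in raw:
            raise KeyError(f"missing feature: {name}")
        # Guard against zero scale for constant features
        row.append((float(raw[name]) - mu) / (sd if sd != 0 else 1.0))
    return row


# Stand-in for metadata read back from a patched checkpoint, for example
# ckpt["feature_order"] and ckpt["scaler_params"] after torch.load(...).
meta = {
    "feature_order": ["age", "income"],
    "scaler_params": {"mean": [40.0, 50000.0], "scale": [10.0, 20000.0]},
}

row = prepare_row({"income": 70000, "age": 50}, meta["feature_order"], meta["scaler_params"])
print(row)  # [1.0, 1.0]
```

The key order of the raw dict does not matter, because the row is always built by iterating over `feature_order`. That is the reason for storing the order alongside the weights.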
