# Plot of the Week: Train vs Validation Loss Curves (Overfitting)

**Goal.** Fit a **nonlinear regression** model and plot **training and validation loss against epoch** to diagnose **overfitting**.

Students will **modify one element** (model size, regularization, optimizer hyperparameters, dataset size/noise, etc.) to **prevent overfitting**.

---

## What question does this plot answer?
How do the model’s training and validation losses evolve during training, and **does the gap indicate overfitting**?

## Why is it important?
Loss curves are a fast, reliable diagnostic for:
- **Overfitting vs underfitting**
- **Optimization stability** (learning rate too high/low, divergence, plateaus)
- **Experiment tracking** (comparing runs and hyperparameters)


In [None]:
# If you're running this on Colab, you might need:
# !pip install torch --quiet

import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

# Reproducibility
torch.manual_seed(0)
np.random.seed(0)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

## 1) Create a simple nonlinear regression dataset (simulated)

We’ll learn a noisy nonlinear function: a sum of two sinusoids, `sin(2.5x) + 0.3 cos(6x)`, plus Gaussian noise.
To make overfitting **easy to observe**, we use:
- a relatively **small training set**
- a **high-capacity** model (defined in the next section)
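
Because the validation targets carry Gaussian noise with standard deviation `noise_std`, even a model that recovered the true function exactly would be expected to score a validation MSE of about `noise_std**2` (0.0625 for the default 0.25). The sketch below checks this numerically with the standard library only. The helper names `true_fn` and `noise_floor_mse` are illustrative assumptions, not part of the notebook.

```python
import math
import random

def true_fn(x):
    # Noise-free target, matching the formula used in make_sine_data.
    return math.sin(2.5 * x) + 0.3 * math.cos(6 * x)

def noise_floor_mse(noise_std=0.25, n=20000, x_range=(-3.0, 3.0), seed=0):
    """MSE of the true function against noisy targets (approximately noise_std**2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.uniform(x_range[0], x_range[1])
        y_noisy = true_fn(x) + rng.gauss(0.0, noise_std)
        total += (true_fn(x) - y_noisy) ** 2
    return total / n

floor = noise_floor_mse()
print(f"empirical noise floor: {floor:.4f}  (theory: {0.25 ** 2:.4f})")
```

A validation loss that plateaus near this value is about as good as the data allows; a validation loss far above it leaves room for improvement.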


In [None]:
def make_sine_data(n_train=64, n_val=256, noise_std=0.25, x_range=(-3.0, 3.0)):
    # Train points
    x_train = np.random.uniform(x_range[0], x_range[1], size=(n_train, 1)).astype(np.float32)
    y_train = (np.sin(2.5 * x_train) + 0.3*np.cos(6 * x_train) + noise_std*np.random.randn(n_train, 1)).astype(np.float32)

    # Validation points (denser grid)
    x_val = np.linspace(x_range[0], x_range[1], n_val).reshape(-1, 1).astype(np.float32)
    y_val = (np.sin(2.5 * x_val) + 0.3*np.cos(6 * x_val) + noise_std*np.random.randn(n_val, 1)).astype(np.float32)
    return x_train, y_train, x_val, y_val

x_train, y_train, x_val, y_val = make_sine_data()

plt.figure()
plt.scatter(x_train, y_train, s=18, label="train (noisy samples)")
plt.scatter(x_val, y_val, s=8, alpha=0.35, label="val (noisy samples)")
plt.title("Nonlinear regression dataset")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()

## 2) Define a nonlinear regression model (MLP)

This MLP can easily **overfit** when:
- it has too many hidden units or layers
- it trains for too long
- regularization is weak or absent

**Student knobs** (change these later): `hidden_dim`, `depth`, `dropout`, `weight_decay`.
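
To get a feel for how `hidden_dim` and `depth` drive capacity, it can help to count parameters before training. The sketch below assumes the architecture used in this notebook (`depth` hidden `Linear` layers of equal width followed by one output `Linear`; `Tanh` and `Dropout` add no parameters). The function name `mlp_param_count` is hypothetical.

```python
def mlp_param_count(in_dim=1, hidden_dim=256, depth=4, out_dim=1):
    """Weights + biases of an MLP with `depth` hidden layers of equal width."""
    total = 0
    d = in_dim
    for _ in range(depth):
        total += d * hidden_dim + hidden_dim  # Linear(d, hidden_dim): weights + biases
        d = hidden_dim
    total += d * out_dim + out_dim  # final Linear(d, out_dim)
    return total

for hidden_dim, depth in [(256, 4), (64, 4), (32, 2)]:
    n_params = mlp_param_count(hidden_dim=hidden_dim, depth=depth)
    print(f"hidden_dim={hidden_dim:3d} depth={depth} -> {n_params:6d} parameters")
```

With the baseline settings this gives roughly 198k parameters for 64 training points, which is why the baseline has more than enough capacity to memorize the noise.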


In [None]:
class MLPRegressor(nn.Module):
    def __init__(self, in_dim=1, hidden_dim=256, depth=4, dropout=0.0):
        super().__init__()
        layers = []
        d = in_dim
        for _ in range(depth):
            layers.append(nn.Linear(d, hidden_dim))
            layers.append(nn.Tanh())  # smooth nonlinearity works well for this toy regression
            if dropout > 0:
                layers.append(nn.Dropout(dropout))
            d = hidden_dim
        layers.append(nn.Linear(d, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

# Baseline model (intentionally high-capacity to provoke overfitting)
model = MLPRegressor(hidden_dim=256, depth=4, dropout=0.0).to(device)
sum(p.numel() for p in model.parameters())

## 3) Training loop with logging

We log:
- training loss per epoch
- validation loss per epoch

and then plot them.

**Student knobs**: `lr`, `weight_decay`, `batch_size`, `epochs`.
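
`batch_size` and `epochs` interact: together with the training-set size they fix how many gradient updates a run performs, so changing `batch_size` alone also changes how much optimization the model receives. A small sketch of that arithmetic, assuming the last partial batch is kept (the `DataLoader` default, `drop_last=False`); the helper name `num_updates` is hypothetical.

```python
import math

def num_updates(n_train=64, batch_size=32, epochs=300):
    """Total optimizer steps when every epoch visits each training point once."""
    steps_per_epoch = math.ceil(n_train / batch_size)
    return steps_per_epoch * epochs

print("baseline:      ", num_updates())
print("batch_size=8:  ", num_updates(batch_size=8))   # more, noisier updates
print("batch_size=64: ", num_updates(batch_size=64))  # one full-batch update per epoch
```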


In [None]:
def train_one_run(
    model,
    x_train, y_train, x_val, y_val,
    lr=3e-3,
    weight_decay=0.0,
    batch_size=32,
    epochs=300,
    print_every=50,
):
    # Data
    train_ds = TensorDataset(torch.from_numpy(x_train), torch.from_numpy(y_train))
    val_ds   = TensorDataset(torch.from_numpy(x_val), torch.from_numpy(y_val))
    train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
    val_loader   = DataLoader(val_ds, batch_size=256, shuffle=False)

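    # Loss + optimizer
    # Note: Adam's weight_decay adds an L2 penalty to the gradient;
    # torch.optim.AdamW would apply decoupled weight decay instead.
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)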

    train_losses = []
    val_losses = []

    for epoch in range(1, epochs + 1):
        # ---- train ----
        model.train()
        total = 0.0
        n = 0
        for xb, yb in train_loader:
            xb = xb.to(device)
            yb = yb.to(device)
            pred = model(xb)
            loss = criterion(pred, yb)

            optimizer.zero_grad(set_to_none=True)
            loss.backward()
            optimizer.step()

            total += loss.item() * xb.size(0)
            n += xb.size(0)
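        # Running average over the epoch, taken in train mode while weights are still
        # updating, so it is not measured on exactly the same model as val_loss below.
        train_loss = total / n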

        # ---- val ----
        model.eval()
        total = 0.0
        n = 0
        with torch.no_grad():
            for xb, yb in val_loader:
                xb = xb.to(device)
                yb = yb.to(device)
                pred = model(xb)
                loss = criterion(pred, yb)
                total += loss.item() * xb.size(0)
                n += xb.size(0)
        val_loss = total / n

        train_losses.append(train_loss)
        val_losses.append(val_loss)

        if print_every and (epoch % print_every == 0 or epoch == 1 or epoch == epochs):
            print(f"epoch {epoch:4d} | train MSE {train_loss:.4f} | val MSE {val_loss:.4f}")

    return np.array(train_losses), np.array(val_losses)

# Run a baseline that often overfits
model = MLPRegressor(hidden_dim=256, depth=4, dropout=0.0).to(device)

train_losses, val_losses = train_one_run(
    model,
    x_train, y_train, x_val, y_val,
    lr=3e-3,
    weight_decay=0.0,
    batch_size=32,
    epochs=300,
    print_every=60,
)

## 4) Plot 1: Train vs Validation Loss Curves (minimum requirement)

**Interpretation tip:**  
Overfitting often looks like:
- train loss keeps decreasing
- validation loss bottoms out then starts increasing (or stops improving)
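
The pattern above can also be read off numerically. The sketch below is one possible summary, not the notebook's required method: it reports the best validation epoch, how far validation loss climbed after its minimum, and the final train/validation gap. The function name `overfitting_summary` and the toy curves are hypothetical; the function accepts lists or 1-D NumPy arrays such as those returned by `train_one_run`.

```python
def overfitting_summary(train_losses, val_losses):
    """Summarize a pair of per-epoch loss curves (epochs reported 1-indexed)."""
    best_idx = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return {
        "best_epoch": best_idx + 1,
        "best_val": val_losses[best_idx],
        # How far validation loss climbed back up after its minimum.
        "val_rise": val_losses[-1] - val_losses[best_idx],
        # Generalization gap at the end of training.
        "final_gap": val_losses[-1] - train_losses[-1],
    }

# Hypothetical curves, for illustration only (not results from this notebook).
toy_train = [1.00, 0.50, 0.30, 0.20, 0.10, 0.05]
toy_val = [1.10, 0.60, 0.40, 0.45, 0.55, 0.70]
summary = overfitting_summary(toy_train, toy_val)
print(summary)
```

A large `val_rise` together with a large `final_gap` matches the overfitting pattern described above; values near zero suggest the run stopped before validation loss turned upward.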



In [None]:
plt.figure()
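epochs_axis = np.arange(1, len(train_losses) + 1)  # 1-indexed, matching best_epoch below
plt.plot(epochs_axis, train_losses, label="train loss")
plt.plot(epochs_axis, val_losses, label="validation loss")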
plt.xlabel("epoch")
plt.ylabel("MSE loss")
plt.title("Train vs Validation Loss Curves")
plt.legend()
plt.show()

# Optional: print epoch of best validation performance
best_epoch = int(val_losses.argmin()) + 1
print("Best validation loss at epoch:", best_epoch, "val_loss:", float(val_losses.min()))

## 5) Visualize the fitted function (helps explain overfitting)

This is optional but often makes the story clearer.


In [None]:
# Predict on a dense grid
model.eval()
x_grid = np.linspace(-3, 3, 600).reshape(-1, 1).astype(np.float32)
with torch.no_grad():
    y_pred = model(torch.from_numpy(x_grid).to(device)).cpu().numpy()

plt.figure()
plt.scatter(x_train, y_train, s=18, label="train")
plt.plot(x_grid, np.sin(2.5*x_grid) + 0.3*np.cos(6*x_grid), linewidth=2, label="true function (noise-free)")
plt.plot(x_grid, y_pred, linewidth=2, label="model prediction")
plt.title("Fit visualization (optional)")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()

## 6) Student task: prevent overfitting

**Choose ONE change** and re-run the training cell(s). Then submit your updated **loss curve plot**.

### Suggested edits (pick one)
1. **Regularization**  
   - increase `weight_decay` (e.g., `1e-4`, `1e-3`, `1e-2`)  
   - add `dropout` (e.g., `0.1`–`0.3`)  
2. **Reduce model capacity**  
   - lower `hidden_dim` (e.g., `32`, `64`)  
   - reduce `depth` (e.g., `2`)  
3. **Train less / early stopping (simple version)**  
   - reduce `epochs`  
   - stop training near the best validation epoch  
4. **More data / less noise**  
   - increase `n_train` (e.g., `256`)  
   - reduce `noise_std` (e.g., `0.10`)  
5. **Optimizer dynamics**  
   - change `lr` (try `1e-3` or `1e-4`)  
   - change `batch_size`
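
For option 3, a common way to decide when to stop is a patience rule: stop once validation loss has not improved for a fixed number of epochs. The sketch below applies such a rule to an already-recorded validation curve. It is one possible formulation under stated assumptions; `early_stopping_epoch`, `patience`, and `min_delta` are hypothetical names that do not appear in `train_one_run`.

```python
def early_stopping_epoch(val_losses, patience=20, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training would stop at the first epoch where validation loss has failed to
    improve on its best value by more than `min_delta` for `patience` epochs
    in a row. If that never happens, stop_epoch is the last epoch.
    """
    best = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses), best_epoch

# Hypothetical validation curve, for illustration only.
toy_val = [1.10, 0.60, 0.40, 0.45, 0.55, 0.70, 0.90]
stop_epoch, best_epoch = early_stopping_epoch(toy_val, patience=3)
print(f"stop at epoch {stop_epoch}, best epoch {best_epoch}")
```

In this notebook, the simplest way to use the result is to re-run with `epochs` set near the reported best epoch. Checking the rule inside the training loop and keeping a copy of the best weights is another option, but that would require modifying `train_one_run`.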

### What “success” looks like
A run where **validation loss does not rise** (or rises much less), and the **train–val gap** is smaller.

---

## Required Submission Template

**Upload**:

- the figure (PNG or PDF), or the notebook cell output
- a short caption (1–2 sentences)

**Title:**  
Train vs Validation Loss Curves

**What question does this plot answer?**  
(1 sentence.)

**Description (1–2 sentences):**  
What data / model / comparison is shown?  
What change did you make, and what happened?



### Starter cell for experiments (students edit this)

Edit the hyperparameters below, re-run, and regenerate **Plot 1**.


In [None]:
# =========================
# STUDENTS: EDIT THESE
# =========================
n_train = 64
noise_std = 0.25

hidden_dim = 256
depth = 4
dropout = 0.

lr = 3e-3
weight_decay = 0.
batch_size = 32
epochs = 300

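# Re-seed so that runs with the same settings are repeatable and comparisons
# between experiments are less affected by random variation
np.random.seed(0)
torch.manual_seed(0)

# Recreate data + model, then train
x_train, y_train, x_val, y_val = make_sine_data(n_train=n_train, noise_std=noise_std)
model = MLPRegressor(hidden_dim=hidden_dim, depth=depth, dropout=dropout).to(device)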

train_losses, val_losses = train_one_run(
    model,
    x_train, y_train, x_val, y_val,
    lr=lr,
    weight_decay=weight_decay,
    batch_size=batch_size,
    epochs=epochs,
    print_every=60,
)

# Plot 1 (required)
plt.figure()
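epochs_axis = np.arange(1, len(train_losses) + 1)  # 1-indexed, matching best_epoch below
plt.plot(epochs_axis, train_losses, label="train loss")
plt.plot(epochs_axis, val_losses, label="validation loss")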
plt.xlabel("epoch")
plt.ylabel("MSE loss")
plt.title("Train vs Validation Loss Curves (your run)")
plt.legend()
plt.show()

best_epoch = int(val_losses.argmin()) + 1
print("Best validation loss at epoch:", best_epoch, "val_loss:", float(val_losses.min()))