<a href="https://www.kaggle.com/code/mrafraim/dl-day-17-validation-test-split-in-pytorch?scriptVersionId=288185651" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Day 17: Validation & Test Split in PyTorch

Welcome to Day 17!

Today you'll:
- Understand the difference between training, validation, and test sets
- Learn how to split data properly in PyTorch
- Implement metrics to evaluate model performance
- Prepare for early stopping and hyperparameter tuning


---

# Why Validation & Test Split?

- **Training set:** Used to fit the model  
- **Validation set:** Used to tune hyperparameters, monitor overfitting  
- **Test set:** Used only to evaluate final model performance  
  
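> Never train on the validation or test set, and never use the test set to make modeling decisions. Doing so leaks information about the evaluation data into the model and produces overly optimistic performance estimates.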

# Sample Dataset (PyTorch Tensors)

```python
import torch
from sklearn.model_selection import train_test_split

# 100 samples, 2 features
X = torch.randn(100, 2)
y = X[:, 0] * 2 + X[:, 1] * -3 + torch.randn(100) * 0.5  # Linear relationship + noise
y = y.unsqueeze(1)  # Make it (100,1) for regression
```


```python
X = torch.randn(100, 2)
```

* Purpose: Create a dataset of features for regression.
* **`100`** → number of samples (rows).
* **`2`** → number of features (columns).
* **`torch.randn`** → generates random numbers from a standard normal distribution (mean = 0, std = 1).
* Shape of `X`: `(100, 2)`

Illustrative values of `X` (first 4 rows; the actual values are random and change on every run):

```
[[ 0.5, -1.2],
 [ 0.1,  0.7],
 [-0.3,  0.4],
 [ 1.0, -0.5]]
```



```python
y = X[:, 0] * 2 + X[:, 1] * -3 + torch.randn(100) * 0.5
```

* Purpose: Define a linear relationship between features and target `y`, with some random noise.
* `X[:, 0]` → first column (feature 1)
* `X[:, 1]` → second column (feature 2)
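* Linear formula applied, with $\varepsilon$ denoting the added noise:

$$
y = 2 \cdot X_1 - 3 \cdot X_2 + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0,\, 0.5^2)
$$

* `torch.randn(100) * 0.5` → adds Gaussian noise with std = 0.5 to simulate real-world measurement error.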
* Shape of `y`: `(100,)` → 1D tensor (100 target values).

Illustrative values of `y` (first 4 entries; random, so they will not match the `X` rows shown above):

```
[ 0.73, -1.12,  2.45, 0.01]
```



```python
y = y.unsqueeze(1)
```

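* Purpose: Convert `y` from a 1D tensor to a column vector so it matches the model's output shape `(N, 1)`. With an `(N,)` target, `nn.MSELoss` would broadcast the two shapes to `(N, N)` (PyTorch only warns) and compute the wrong loss.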
* `unsqueeze(1)` → adds a new dimension at index 1.
* Shape of `y` after unsqueeze: `(100, 1)`

Example values after unsqueeze:

```
[[ 0.73],
 [-1.12],
 [ 2.45],
 [ 0.01]]
```

* Each row now corresponds to one sample’s target value.
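
The shape matters because `nn.Linear(2, 1)` returns predictions of shape `(batch, 1)`. A small sketch with hand-picked toy tensors (not the notebook's data) shows what goes wrong when the target stays 1D: `nn.MSELoss` broadcasts `(4, 1)` against `(4,)` to `(4, 4)`, compares every prediction with every target, and only emits a warning.

```python
import warnings

import torch
import torch.nn as nn

criterion = nn.MSELoss()

preds = torch.tensor([[1.0], [2.0], [3.0], [4.0]])  # model output shape: (4, 1)
y_1d = torch.tensor([1.0, 2.0, 3.0, 4.0])            # target shape: (4,)
y_col = y_1d.unsqueeze(1)                             # target shape: (4, 1)

# Correct: shapes match and predictions equal targets, so the loss is 0
loss_col = criterion(preds, y_col).item()

# Pitfall: (4, 1) vs (4,) broadcasts to (4, 4), pairing every prediction
# with every target. PyTorch warns, and the loss is no longer 0.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the size-mismatch UserWarning for this demo
    loss_1d = criterion(preds, y_1d).item()

print(f"(4, 1) target: {loss_col:.2f}")  # 0.00
print(f"(4,) target:   {loss_1d:.2f}")   # 2.50
```

Even with perfect predictions, the 1D target yields a non-zero loss, which is why the `unsqueeze(1)` step is done before training.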

# Train / Validation / Test Split

```python
# First, split into training (70%) and a temporary set (30%) for validation + test
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42)

# Split the temporary set in half: 15% validation, 15% test
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

print(f"Training samples: {X_train.shape[0]}")
print(f"Validation samples: {X_val.shape[0]}")
print(f"Test samples: {X_test.shape[0]}")
```

```
Training samples: 70
Validation samples: 15
Test samples: 15
```
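
The notebook calls scikit-learn's `train_test_split` twice. If you would rather stay inside PyTorch, `torch.utils.data.random_split` can produce the same 70/15/15 proportions on a `TensorDataset` in one call. This is a minimal sketch of that alternative, not what the notebook uses; the data is regenerated so the block runs on its own, and the seed 42 is an arbitrary choice.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Regenerate data with the same recipe as above (values are random)
X = torch.randn(100, 2)
y = (X[:, 0] * 2 + X[:, 1] * -3 + torch.randn(100) * 0.5).unsqueeze(1)
full_dataset = TensorDataset(X, y)

# 70 / 15 / 15 split; a seeded generator makes the split reproducible
generator = torch.Generator().manual_seed(42)
train_ds, val_ds, test_ds = random_split(full_dataset, [70, 15, 15], generator=generator)

print(len(train_ds), len(val_ds), len(test_ds))      # 70 15 15
print(set(train_ds.indices) & set(test_ds.indices))  # set() -> no sample appears in both
```

Each result is a `Subset` that can be passed straight to `DataLoader`. It stores indices into the original dataset instead of copying tensors, and explicit lengths replace the two-step 0.3 / 0.5 arithmetic.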


# Convert to PyTorch Dataset & DataLoader

```python
from torch.utils.data import TensorDataset, DataLoader

# Training dataset and loader (reshuffled every epoch)
train_dataset = TensorDataset(X_train, y_train)
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)

# Validation dataset and loader (no shuffling needed for evaluation)
val_dataset = TensorDataset(X_val, y_val)
val_loader = DataLoader(val_dataset, batch_size=16)

# Test dataset and loader
test_dataset = TensorDataset(X_test, y_test)
test_loader = DataLoader(test_dataset, batch_size=16)
```


```
Raw tensors
   ↓
TensorDataset (pairs X and y)
   ↓
DataLoader (batching + shuffling)
   ↓
Training loop
```
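
To see what the `DataLoader` actually yields, the sketch below uses random stand-in tensors with the same shapes as `X_train` and `y_train` (70 samples). With `batch_size=16`, 70 samples give four full batches and one final batch of 6, which is why the training loop multiplies each batch loss by `xb.size(0)` rather than by a fixed 16.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Random stand-ins with the same shapes as X_train / y_train (70 samples)
X_train = torch.randn(70, 2)
y_train = torch.randn(70, 1)

train_loader = DataLoader(TensorDataset(X_train, y_train), batch_size=16, shuffle=True)

# One batch is an (inputs, targets) pair
first_xb, first_yb = next(iter(train_loader))
print(first_xb.shape, first_yb.shape)  # torch.Size([16, 2]) torch.Size([16, 1])

# Batch sizes over one full pass (drop_last defaults to False)
batch_sizes = [xb.size(0) for xb, _ in train_loader]
print(len(train_loader), batch_sizes)  # 5 [16, 16, 16, 16, 6]
```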


# Simple Linear Model

```python
import torch.nn as nn

# nn.Sequential is an ordered container of layers:
# the input flows through each layer in order automatically.
model = nn.Sequential(
    nn.Linear(2, 1)  # Linear layer: 2 input features → 1 output
)
```
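
`nn.Linear(2, 1)` holds a weight matrix of shape `(1, 2)` and a bias of shape `(1,)`, and computes $\hat{y} = xW^\top + b$. A quick sketch that checks this against a manual matrix multiplication (the input `x` is random and only for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(2, 1))
linear = model[0]  # the single layer inside the Sequential container

print(linear.weight.shape, linear.bias.shape)  # torch.Size([1, 2]) torch.Size([1])

x = torch.randn(5, 2)                        # 5 random samples, 2 features
manual = x @ linear.weight.T + linear.bias   # the affine map nn.Linear applies
print(torch.allclose(model(x), manual))      # True
```

Because the data was generated as $2 X_1 - 3 X_2$ plus noise, you would expect the trained weights to end up near 2 and -3, with a bias near 0.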


# Define Loss & Optimizer

```python
criterion = nn.MSELoss()                                  # Mean squared error loss for regression
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # Stochastic gradient descent, learning rate 0.01
```
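
With no momentum or weight decay, one `optimizer.step()` of `torch.optim.SGD` moves every parameter by $-\text{lr} \times \text{gradient}$. The sketch below checks that on a standalone `nn.Linear(2, 1)` and one random batch, both stand-ins for the notebook's model and data:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(2, 1)  # stand-in for the notebook's nn.Sequential model
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

xb, yb = torch.randn(8, 2), torch.randn(8, 1)  # one random batch

optimizer.zero_grad()
loss = criterion(model(xb), yb)
loss.backward()

# Plain SGD update rule: p_new = p - lr * p.grad (computed before the step)
expected_weight = model.weight.detach() - 0.01 * model.weight.grad
expected_bias = model.bias.detach() - 0.01 * model.bias.grad

optimizer.step()

print(torch.allclose(model.weight.detach(), expected_weight))  # True
print(torch.allclose(model.bias.detach(), expected_bias))      # True
```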

# Training Loop with Validation Check

```python
epochs = 50

for epoch in range(epochs):
    model.train()                               # Training mode (matters for layers like Dropout/BatchNorm)
    train_loss = 0.0
    for xb, yb in train_loader:                 # Iterate over batches (xb = input batch, yb = target batch)
        optimizer.zero_grad()                   # Reset gradients before this batch
        preds = model(xb)                       # Forward pass
        loss = criterion(preds, yb)             # Mean loss over the batch
        loss.backward()                         # Gradients of the loss w.r.t. all model parameters
        optimizer.step()                        # Update weights using the gradients and learning rate
        train_loss += loss.item() * xb.size(0)  # Batch mean × batch size → sum over samples

    train_loss /= len(train_loader.dataset)     # Average loss per training sample

    # Validation
    model.eval()                                # Evaluation mode
    val_loss = 0.0
    with torch.no_grad():                       # No gradients: faster, saves memory; weights are not updated here
        for xb, yb in val_loader:
            preds = model(xb)
            loss = criterion(preds, yb)
            val_loss += loss.item() * xb.size(0)
    val_loss /= len(val_loader.dataset)         # Average loss per validation sample

    if epoch % 10 == 0:
        print(f"Epoch {epoch}: Train Loss = {train_loss:.4f}, Val Loss = {val_loss:.4f}")
```

```
Epoch 0: Train Loss = 10.6214, Val Loss = 10.4218
Epoch 10: Train Loss = 1.6957, Val Loss = 1.6869
Epoch 20: Train Loss = 0.4307, Val Loss = 0.4168
Epoch 30: Train Loss = 0.2349, Val Loss = 0.2115
Epoch 40: Train Loss = 0.1990, Val Loss = 0.1672
```


Both training and validation loss drop quickly in the first epochs and then level off. By epoch 40, the training loss is ~0.20 and the validation loss ~0.17, in the same range as the variance of the injected noise (0.5² = 0.25), which no model can remove. The two losses stay close together, so the model has learned the underlying linear pattern without significant overfitting.
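
The loop above prints losses every 10 epochs but does not keep them. As preparation for early stopping, the sketch below records the per-epoch losses (the data behind a loss-curve plot), remembers the weights from the epoch with the lowest validation loss, and stops once validation loss has not improved for a set number of epochs. The data is regenerated so the block runs on its own, and the helper `average_loss`, the `patience` of 10, and the 200-epoch cap are illustrative assumptions, not values from the original notebook.

```python
import copy

import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

# Regenerate data with the same recipe as above; X is already random,
# so contiguous slices give a random 70/15 train/validation split
X = torch.randn(100, 2)
y = (X[:, 0] * 2 + X[:, 1] * -3 + torch.randn(100) * 0.5).unsqueeze(1)
train_loader = DataLoader(TensorDataset(X[:70], y[:70]), batch_size=16, shuffle=True)
val_loader = DataLoader(TensorDataset(X[70:85], y[70:85]), batch_size=16)

model = nn.Sequential(nn.Linear(2, 1))
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)


def average_loss(loader):
    """Hypothetical helper: mean per-sample loss over a loader, no gradients."""
    model.eval()
    total = 0.0
    with torch.no_grad():
        for xb, yb in loader:
            total += criterion(model(xb), yb).item() * xb.size(0)
    return total / len(loader.dataset)


history = {"train": [], "val": []}  # per-epoch losses, e.g. for plotting loss curves
best_val, best_epoch, best_state = float("inf"), -1, None
patience, bad_epochs = 10, 0        # assumed: stop after 10 epochs without improvement

for epoch in range(200):            # assumed upper bound on epochs
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        criterion(model(xb), yb).backward()
        optimizer.step()

    # Both losses are measured after the epoch, with the current weights
    history["train"].append(average_loss(train_loader))
    history["val"].append(average_loss(val_loader))

    if history["val"][-1] < best_val:
        best_val, best_epoch = history["val"][-1], epoch
        best_state = copy.deepcopy(model.state_dict())  # snapshot of the best weights so far
        bad_epochs = 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break

model.load_state_dict(best_state)  # restore the weights with the lowest validation loss
print(f"Stopped after {len(history['val'])} epochs; best epoch {best_epoch}, val loss {best_val:.4f}")
```

Only the validation loss drives the stopping decision; the test set stays untouched until the final evaluation.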

# Evaluate on Test Set

```python
model.eval()                                   # set model to evaluation mode
test_loss = 0.0
with torch.no_grad():                          # no gradient computation needed
    for xb, yb in test_loader:                 # iterate over test batches
        preds = model(xb)                      # forward pass
        loss = criterion(preds, yb)            # compute batch loss
        test_loss += loss.item() * xb.size(0)  # accumulate total loss
test_loss /= len(test_loader.dataset)          # average loss per sample
print(f"Test Loss: {test_loss:.4f}")
```

```
Test Loss: 0.1757
```


The model achieves a test loss of ~0.176, close to the last logged training (~0.199) and validation (~0.167) losses. This indicates that the model has learned the underlying pattern in the data and generalizes well to unseen samples.
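
Both the validation and test loops accumulate `loss.item() * xb.size(0)` and divide by the dataset size instead of averaging the batch losses. The sketch below, with random stand-in predictions and targets and a deliberately small `batch_size=4` (both assumptions for illustration), shows why: with an uneven last batch, only the weighted version reproduces the loss computed over all samples at once.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)
preds_all = torch.randn(15, 1)    # stand-in predictions for 15 samples
targets_all = torch.randn(15, 1)  # stand-in targets
criterion = nn.MSELoss()

# batch_size=4 gives batches of 4, 4, 4 and 3 samples
loader = DataLoader(TensorDataset(preds_all, targets_all), batch_size=4)

weighted_sum, batch_means = 0.0, []
for pb, tb in loader:
    batch_loss = criterion(pb, tb).item()    # mean over this batch
    weighted_sum += batch_loss * pb.size(0)  # convert back to a sum over samples
    batch_means.append(batch_loss)

weighted = weighted_sum / len(loader.dataset)
naive = sum(batch_means) / len(batch_means)       # treats the short last batch like a full one
exact = criterion(preds_all, targets_all).item()  # mean over all 15 samples at once

print(f"exact={exact:.6f}  weighted={weighted:.6f}  naive={naive:.6f}")
```

`weighted` matches `exact` up to floating-point rounding, while `naive` generally does not.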

# Metrics

## MSE

* MSE measures how far your predictions are from the true values
* Think: “On average, how wrong am I?”

$$
\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
$$

Where:

* $y_i$ = true value
* $\hat{y}_i$ = predicted value
* $N$ = number of samples
* Squaring → penalizes bigger mistakes more heavily

Example:

* True targets: `[2, 4, 6]`
* Model predicts: `[2.5, 3.5, 5.0]`

1. Compute errors $y_i - \hat{y}_i$: `[-0.5, 0.5, 1.0]`
2. Square errors: `[0.25, 0.25, 1.0]`
3. Average: `(0.25+0.25+1)/3 = 0.5`

MSE = 0.5 → small error on average

* Bigger mistakes → squared → much larger contribution to MSE

Note:

* MSE = average squared “distance” between prediction and true value
* Small MSE → model is close to true values
* Large MSE → predictions are far off
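
The worked example above can be checked in a few lines. This sketch computes the formula by hand and compares it with `nn.MSELoss` (the loss used for training here) and scikit-learn's `mean_squared_error` (used later for the test metrics):

```python
import torch
import torch.nn as nn
from sklearn.metrics import mean_squared_error

y_true = torch.tensor([2.0, 4.0, 6.0])
y_pred = torch.tensor([2.5, 3.5, 5.0])

manual = ((y_true - y_pred) ** 2).mean().item()  # the MSE formula above
torch_mse = nn.MSELoss()(y_pred, y_true).item()  # same loss as in training
sk_mse = mean_squared_error(y_true.numpy(), y_pred.numpy())

print(f"{manual:.4f} {torch_mse:.4f} {sk_mse:.4f}")  # 0.5000 0.5000 0.5000
```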

## R² Score
* R² tells you how much of the variation in the data your model explains
* Think: “How much better is my model than just guessing the mean?”

$$
R^2 = 1 - \frac{\text{Error of my model}}{\text{Error of mean model}}
$$

Where:

* **Error of my model** = sum of squared differences between predictions and true values
* **Error of mean model** = sum of squared differences between true values and mean of true values

**Actual formula:**

$$
R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}
$$

where
* $y_i$ → true value
* $\hat{y}_i$ → model prediction
* $\bar{y}$ → mean of true values

Example:

* True targets: `[2, 4, 6]`
* Model predicts: `[2.1, 3.9, 6.2]`
* Mean of targets: `(2+4+6)/3 = 4`
* **Error of model**: $(2-2.1)^2 + (4-3.9)^2 + (6-6.2)^2 = 0.01 + 0.01 + 0.04 = 0.06$ (small, predictions are close to the true values)
* **Error of mean**: $(2-4)^2 + (4-4)^2 + (6-4)^2 = 4 + 0 + 4 = 8$ (bigger, just guessing the mean every time)

$$
R^2 = 1 - \frac{0.06}{8} = 0.9925
$$

* R² ≈ 1 → the model explains almost all of the variance
* R² ≈ 0 → the model is no better than always predicting the mean
* R² < 0 → the model is worse than always predicting the mean
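
The same example can be checked in code: the sketch below computes $R^2$ from the formula with PyTorch tensors, compares it with scikit-learn's `r2_score`, and confirms that a model that always predicts the mean (4) scores exactly 0.

```python
import torch
from sklearn.metrics import r2_score

y_true = torch.tensor([2.0, 4.0, 6.0])
y_pred = torch.tensor([2.1, 3.9, 6.2])

ss_res = ((y_true - y_pred) ** 2).sum()         # error of my model
ss_tot = ((y_true - y_true.mean()) ** 2).sum()  # error of the mean model
r2_manual = (1 - ss_res / ss_tot).item()

r2_sklearn = r2_score(y_true.numpy(), y_pred.numpy())
r2_mean = r2_score(y_true.numpy(), torch.full_like(y_true, 4.0).numpy())  # always predict the mean

print(f"{r2_manual:.4f} {r2_sklearn:.4f} {r2_mean:.4f}")  # 0.9925 0.9925 0.0000
```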



```python
from sklearn.metrics import mean_squared_error, r2_score

# Get predictions on test set
y_test_pred = model(X_test).detach().numpy()  # Get predicted values from the model as a NumPy array (no gradients)
y_test_true = y_test.numpy()                  # Convert true targets to NumPy array for metric computation

mse = mean_squared_error(y_test_true, y_test_pred)
r2 = r2_score(y_test_true, y_test_pred)

print(f"MSE: {mse:.4f}, R2 Score: {r2:.4f}")
```

```
MSE: 0.1757, R2 Score: 0.9796
```


| Step             | Action                         | Output                          |
| ---------------- | ------------------------------ | ------------------------------- |
| `model(X_test)`  | Forward pass through the model | Predictions as a PyTorch tensor |
| `.detach()`      | Detach from the autograd graph | Tensor that no longer tracks gradients |
| `.numpy()`       | Convert to NumPy array         | Array ready for metrics         |
| `y_test.numpy()` | Convert true targets           | NumPy array for comparison      |


The model achieves a low test MSE (0.176, the same value as the test loss above) and a high R² (0.980), meaning it explains about 98% of the variance in the test targets. Together with the consistent validation and test losses, this points to good generalization rather than overfitting.

# Key Takeaways from Day 17

- Validation set helps monitor generalization and prevent overfitting  
- Test set is strictly for final performance evaluation  
- Split the data before training (and before fitting any preprocessing, such as scalers) to avoid data leakage  
- Use loss curves to visualize training vs. validation loss  
- Metrics like MSE, R², or accuracy quantify performance on unseen data

---

<p style="text-align:center; font-size:18px;">
© 2025 Mostafizur Rahman
</p>
