# 07 — Datasets, Transforms, and DataLoader (Vision)
**Goal:** master the input pipeline for image tasks so you can swap datasets and augmentations without touching model code.

You’ll learn:
- How **torchvision.datasets** works (MNIST/CIFAR-10 as examples).
- Why we **normalize** and how **data augmentation** reduces overfitting.
- DataLoader knobs: `batch_size`, `shuffle`, `num_workers`, `pin_memory`, `persistent_workers`.
- A clean train/val split you can reuse.


In [None]:
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Device:', device)

data_root = './data'


## 1) Transforms 101
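- **ToTensor**: converts a PIL image or `uint8` NumPy array (H×W×C, values in `[0,255]`) to a float tensor (C×H×W, values in `[0,1]`).
- **Normalize**: standardizes each channel with `(x - mean) / std`, using one `mean` and one `std` per channel.
- **Augmentation** (train only): random flips, crops, color jitter, etc. Don’t augment validation or test data, so that evaluation stays deterministic.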


In [None]:
# MNIST (grayscale) normalization constants (precomputed on the dataset)
mnist_mean, mnist_std = (0.1307,), (0.3081,)

train_tf = transforms.Compose([
    transforms.RandomAffine(degrees=10, translate=(0.05,0.05), scale=(0.95,1.05)),
    transforms.ToTensor(),
    transforms.Normalize(mnist_mean, mnist_std),
])

test_tf = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mnist_mean, mnist_std),
])
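
The normalization constants above are described as precomputed. As a rough sketch of how such values can be obtained, the hypothetical helper `channel_stats` below takes the per-channel mean and standard deviation over a stack of images already scaled to `[0,1]`. Random data stands in for the real training images here, so the printed numbers are not the MNIST constants.

```python
import torch

def channel_stats(images: torch.Tensor):
    """Per-channel mean and std for a tensor shaped N x C x H x W."""
    mean = images.mean(dim=(0, 2, 3))
    # Population standard deviation, computed explicitly per channel
    var = ((images - mean.view(1, -1, 1, 1)) ** 2).mean(dim=(0, 2, 3))
    return mean, var.sqrt()

# Stand-in for a stack of ToTensor-ed grayscale images (uniform noise in [0, 1])
torch.manual_seed(0)
fake_images = torch.rand(1000, 1, 28, 28)

mean, std = channel_stats(fake_images)
print("mean:", mean.tolist(), "std:", std.tolist())
```

With a real dataset you would stack the un-normalized training images (for example, loaded with only `ToTensor()` as the transform) and pass that tensor in. Compute the statistics on the training split only.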


## 2) Load a dataset
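`download=True` fetches the dataset the first time and reuses the local copy afterwards. To switch to CIFAR-10, replace `datasets.MNIST` with `datasets.CIFAR10` and use three-channel normalization constants.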


In [None]:
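from torch.utils.data import Subset

# Two views of the same training images: augmented for training, plain for validation
train_full = datasets.MNIST(root=data_root, train=True,  transform=train_tf, download=True)
val_full   = datasets.MNIST(root=data_root, train=True,  transform=test_tf,  download=True)
test_ds    = datasets.MNIST(root=data_root, train=False, transform=test_tf,  download=True)

# Make a small validation split from the training set
val_size = 5000
train_size = len(train_full) - val_size
train_ds, val_split = random_split(train_full, [train_size, val_size], generator=torch.Generator().manual_seed(42))

# val_split reads through train_full, so it would get train_tf (augmentation).
# Reuse the same indices on the un-augmented copy instead.
val_ds = Subset(val_full, val_split.indices)

len(train_ds), len(val_ds), len(test_ds)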
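
The seeded generator is what makes the split reusable. A small sketch on a synthetic `TensorDataset` (a stand-in for the real data) shows that the same seed reproduces the same validation indices and that the two parts never overlap. The `split` helper is hypothetical and exists only for this demonstration.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# 100 synthetic samples: one feature each, labels 0..9
ds = TensorDataset(torch.arange(100).float().unsqueeze(1), torch.arange(100) % 10)

def split(seed):
    """Split `ds` 80/20 with a generator seeded by `seed`."""
    g = torch.Generator().manual_seed(seed)
    return random_split(ds, [80, 20], generator=g)

train_a, val_a = split(42)
train_b, val_b = split(42)

print("same seed, same val indices:", val_a.indices == val_b.indices)
print("train and val disjoint:", set(train_a.indices).isdisjoint(val_a.indices))
print("sizes:", len(train_a), len(val_a))
```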


## 3) Build DataLoaders (performance knobs)
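- **batch_size**: larger batches usually raise throughput but use more VRAM.
- **num_workers**: use >0 to load and transform batches in parallel worker processes (try 2 to 4 first).
- **pin_memory=True**: places batches in page-locked host memory, which speeds up host→GPU copies; it brings no benefit on a CPU-only run.
- **persistent_workers=True**: keeps workers alive between epochs instead of restarting them; requires `num_workers > 0`.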
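
To see what `batch_size` does to the shape of an epoch, the sketch below counts batches on a synthetic 1000-sample dataset. It uses `num_workers=0` so it runs anywhere; the dataset and sizes are illustrative only.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic MNIST-shaped data: 1000 images of 1 x 28 x 28, labels 0..9
ds = TensorDataset(torch.randn(1000, 1, 28, 28), torch.randint(0, 10, (1000,)))

results = {}
for bs in (64, 128, 256):
    loader = DataLoader(ds, batch_size=bs, shuffle=True, num_workers=0)
    sizes = [len(yb) for _, yb in loader]      # number of samples in each batch
    results[bs] = (len(loader), sizes[-1])
    print(f"batch_size={bs:>3}: {len(loader)} batches, last batch has {sizes[-1]} samples")
```

The number of batches is `ceil(N / batch_size)`, and the last batch is smaller unless `batch_size` divides `N`. Pass `drop_last=True` if your training code needs every batch to have the same size.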


In [None]:
# Shared settings; pinned memory only helps when batches are copied to a GPU
loader_kwargs = dict(num_workers=2, pin_memory=(device == 'cuda'), persistent_workers=True)
train_loader = DataLoader(train_ds, batch_size=128, shuffle=True,  **loader_kwargs)
val_loader   = DataLoader(val_ds,   batch_size=256, shuffle=False, **loader_kwargs)
test_loader  = DataLoader(test_ds,  batch_size=256, shuffle=False, **loader_kwargs)

for name, loader in [('train', train_loader), ('val', val_loader), ('test', test_loader)]:
    xb, yb = next(iter(loader))
    print(f"{name:>5} batch -> x:{tuple(xb.shape)}  y:{tuple(yb.shape)}")


## 4) Plug-and-play with your model
Drop in any CNN (e.g., the MNIST net from earlier). The loaders are interchangeable as long as shapes match what your model expects.
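
One way to keep model code independent of the data pipeline is a single epoch function that accepts any loader. The sketch below is one possible shape for it, not the loop from the earlier notebook: `run_epoch` is a hypothetical helper, and the linear model and random data are placeholders so the example runs on its own.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

device = 'cuda' if torch.cuda.is_available() else 'cpu'

def run_epoch(model, loader, optimizer=None):
    """One pass over `loader`; trains if an optimizer is given, otherwise only evaluates."""
    training = optimizer is not None
    model.train(training)
    total_loss, correct, seen = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for xb, yb in loader:
            # non_blocking=True lets the copy overlap with compute when the loader pins memory
            xb = xb.to(device, non_blocking=True)
            yb = yb.to(device, non_blocking=True)
            logits = model(xb)
            loss = nn.functional.cross_entropy(logits, yb)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * len(yb)
            correct += (logits.argmax(dim=1) == yb).sum().item()
            seen += len(yb)
    return total_loss / seen, correct / seen

# Placeholders: random MNIST-shaped data and a one-layer model
torch.manual_seed(0)
fake_ds = TensorDataset(torch.randn(256, 1, 28, 28), torch.randint(0, 10, (256,)))
fake_loader = DataLoader(fake_ds, batch_size=64, shuffle=True)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

train_loss, train_acc = run_epoch(model, fake_loader, opt)   # training pass
val_loss, val_acc = run_epoch(model, fake_loader)            # evaluation pass, no gradients
print(f"train loss {train_loss:.3f} acc {train_acc:.3f} | eval loss {val_loss:.3f} acc {val_acc:.3f}")
```

In the notebook you would pass `train_loader` with an optimizer and `val_loader` without one; swapping datasets then only changes how the loaders are built.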

**Quick exercise:** copy your MNIST CNN here, train for 3 epochs using these loaders, then try:
- Doubling `batch_size` (watch VRAM via `nvidia-smi`).
- Increasing `num_workers` (data throughput should improve until CPU cores or disk I/O become the bottleneck).
- Removing augmentation (if train accuracy goes up while val accuracy goes down, the model is overfitting).
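
For the `num_workers` experiment it helps to measure the loader on its own, without a model in the way. The hypothetical helper `batches_per_second` below is a rough sketch of such a measurement; it is shown on a small synthetic loader with `num_workers=0` so it runs anywhere.

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

def batches_per_second(loader, max_batches=50):
    """Iterate up to `max_batches` batches and return the observed batches per second."""
    start = time.perf_counter()
    n = 0
    for n, _ in enumerate(loader, start=1):
        if n >= max_batches:
            break
    elapsed = time.perf_counter() - start
    return n / elapsed

# Synthetic stand-in for a real loader
fake_ds = TensorDataset(torch.randn(2560, 1, 28, 28), torch.randint(0, 10, (2560,)))
fake_loader = DataLoader(fake_ds, batch_size=128, shuffle=True, num_workers=0)

rate = batches_per_second(fake_loader, max_batches=20)
print(f"{rate:.1f} batches/s")
```

When comparing worker counts on the real loaders, keep in mind that the first pass includes worker start-up time, so timing a second pass is likely to give a fairer number.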
