# MNIST MLP With Training, Validation and Test Sets

In this notebook we build a simple MLP for the MNIST dataset. The interesting part is the split of the training data into a training and a validation part by means of two instances of the `SubsetRandomSampler` class. A `DataLoader` cannot be indexed or subset directly, so the split is expressed through its `sampler=` argument, which controls which dataset indices the loader draws from.

In [45]:
import numpy as np
import torch
from torchvision import datasets
import torchvision.transforms as transforms
from torch.utils.data.sampler import SubsetRandomSampler
import torch.nn as nn
import torch.nn.functional as F

This notebook was written to run on my desktop computer. If you want to run it elsewhere, you may have to change the location where the MNIST dataset is downloaded.

In [46]:
# location of the MNIST images
ROOT = '~/.pytorch'
# number of subprocesses to use for data loading
NUM_WORKERS = 0
# how many samples per batch to load
BATCH_SIZE = 32
# fraction of the training set to use as validation
VALID_SIZE = 0.2
# Size of the hidden layers
HIDDEN_SIZE = 512
# Size of the output layer
OUTPUT_SIZE = 10

# Set device: use the GPU when available, otherwise fall back to the CPU
CUDA_DEVICE = 'cuda'
if torch.cuda.is_available():
    device = torch.device(CUDA_DEVICE)
else:
    device = torch.device('cpu')

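**QUESTION**: when we create the training and the validation `DataLoaders` from the same dataset object, both inherit its transformations. How, then, can we apply data augmentation to the training data loader but not to the validation one? Splitting the data into different folders is not strictly necessary: the transform belongs to the dataset rather than to the loader, so one option is to create two dataset objects over the same files, each with its own transform, and to pair each with its own sampler.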
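
One possible way to approach the question above is sketched here in plain Python. It assumes a hypothetical helper, `TransformedSubset` (not part of PyTorch or of this notebook), that wraps an untransformed dataset, restricts it to a list of indices and applies its own transform. The toy `base` list and the `augment` function are placeholders for the real data and augmentation pipeline.

```python
class TransformedSubset:
    """Hypothetical helper: a view on `base` restricted to `indices`,
    applying its own `transform` to the input of each sample."""

    def __init__(self, base, indices, transform=None):
        self.base = base
        self.indices = list(indices)
        self.transform = transform

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        x, y = self.base[self.indices[i]]
        if self.transform is not None:
            x = self.transform(x)
        return x, y


# Toy stand-in for an untransformed dataset of (input, label) pairs.
base = [(float(k), k % 2) for k in range(10)]


def augment(x):
    # Placeholder for a real augmentation pipeline.
    return x + 0.5


train_view = TransformedSubset(base, range(8), transform=augment)
valid_view = TransformedSubset(base, [8, 9])  # no augmentation

print(train_view[0], valid_view[0])  # (0.5, 0) (8.0, 0)
```

With MNIST, `base` would presumably be a dataset created with `transform=None`, and each view would receive its own transform ending in `transforms.ToTensor()`. Since the views expose `__len__` and `__getitem__`, they should be usable as map-style datasets in a `DataLoader`, although this sketch does not test that.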

We start, as usual, from the training and test datasets that `torchvision` provides for MNIST.

In [47]:
# convert data to torch.FloatTensor
transform = transforms.ToTensor()

# choose the training and test datasets
train_data = datasets.MNIST(root=ROOT, train=True,
                            download=True, transform=transform)
test_data = datasets.MNIST(root=ROOT, train=False,
                           download=True, transform=transform)

The code below creates the samplers for the training and the validation sets.

In [48]:
# obtain training indices that will be used for validation
num_train = len(train_data)
indices = list(range(num_train))
np.random.shuffle(indices)
split = int(np.floor(VALID_SIZE * num_train))
train_idx, valid_idx = indices[split:], indices[:split]

# define samplers for obtaining training and validation batches
train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)
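
The logic of the split and of the samplers can be illustrated without PyTorch. The sketch below is a simplified stand-in, not the library implementation: `split_indices` and `SubsetShuffler` are hypothetical names, and the seed is an added assumption that makes the split reproducible. The stand-in class only mirrors the idea of a subset sampler, which yields the given indices in a new random order on every pass.

```python
import random


def split_indices(n, valid_fraction, seed=0):
    """Shuffle range(n) and cut it into (train, valid) index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    split = int(valid_fraction * n)
    return indices[split:], indices[:split]


class SubsetShuffler:
    """Pure-Python stand-in for the sampler idea: each pass yields
    the given indices in a fresh random order, without replacement."""

    def __init__(self, indices, seed=None):
        self.indices = list(indices)
        self.rng = random.Random(seed)

    def __iter__(self):
        order = self.indices[:]
        self.rng.shuffle(order)
        return iter(order)

    def __len__(self):
        return len(self.indices)


train_idx, valid_idx = split_indices(60000, 0.2)
print(len(train_idx), len(valid_idx))        # 48000 12000
print(set(train_idx).isdisjoint(valid_idx))  # True
```

Because the two index lists are disjoint, no image is used for both training and validation, even though both loaders read from the same `train_data` object.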

With these samplers, we can build the three data loaders.

In [49]:
# prepare data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=BATCH_SIZE,
                                           sampler=train_sampler,
                                           num_workers=NUM_WORKERS)
valid_loader = torch.utils.data.DataLoader(train_data, batch_size=BATCH_SIZE,
                                           sampler=valid_sampler,
                                           num_workers=NUM_WORKERS)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=BATCH_SIZE,
                                          num_workers=NUM_WORKERS)

images, labels = next(iter(train_loader))
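
As a quick sanity check on the loaders, the number of batches per epoch follows from the dataset sizes and the batch size. The sketch below uses a hypothetical helper, `num_batches`, and assumes the standard MNIST sizes (60,000 training and 10,000 test images) together with the 80/20 split defined above.

```python
import math


def num_batches(num_samples, batch_size, drop_last=False):
    """Batches per epoch; the last one may be partial unless dropped."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


BATCH_SIZE = 32
train_batches = num_batches(48000, BATCH_SIZE)
valid_batches = num_batches(12000, BATCH_SIZE)
test_batches = num_batches(10000, BATCH_SIZE)
print(train_batches, valid_batches, test_batches)  # 1500 375 313
```

The training and validation subsets divide evenly by 32, so only the test loader ends with a partial batch (16 images).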

We can now define the network architecture.

In [50]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, HIDDEN_SIZE)
        self.fc2 = nn.Linear(HIDDEN_SIZE, HIDDEN_SIZE)
        self.fc3 = nn.Linear(HIDDEN_SIZE, OUTPUT_SIZE)

    def forward(self, x):
        x = x.view(-1, 784)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
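
To get a feel for the size of this network, the parameters can be counted by hand: a fully connected layer with bias holds `in_features * out_features` weights plus `out_features` biases. The helper `linear_params` below is a hypothetical name used only for this illustration.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


HIDDEN_SIZE, OUTPUT_SIZE = 512, 10
layers = [(28 * 28, HIDDEN_SIZE),
          (HIDDEN_SIZE, HIDDEN_SIZE),
          (HIDDEN_SIZE, OUTPUT_SIZE)]
counts = [linear_params(i, o) for i, o in layers]
print(counts, sum(counts))  # [401920, 262656, 5130] 669706
```

Most of the roughly 670,000 parameters sit in the first layer, which maps the 784 flattened pixels to the hidden representation.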

For convenience, we can define a function that runs the typical operations of a training or validation loop and, based on this, create a `fit` function.

In [51]:
def train_valid_loop(dataloader, model, criterion, optimizer, every=10, what='train'):
    # running sums of the summed batch losses and of the correct predictions
    running_loss = 0.0
    running_acc = 0
    batch_num = 0

    for images, labels in dataloader:
        batch_num += 1
        images, labels = images.to(device), labels.to(device)
        logits = model(images)
        loss = criterion(logits, labels)
        if what == 'train':
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # .item() returns a Python number, so no autograd history is accumulated
        running_loss += loss.item()
        _, preds = torch.max(logits, dim=1)
        running_acc += (preds == labels).sum().item()

        if batch_num % every == 0:
            # per-sample averages, assuming every batch so far was full
            print(('Batch N.: {:3}, ' + what.title() + 'Loss: {:.3f}, ' +
                   what.title() + 'Acc.: {:.3f}').format(
                       batch_num,
                       running_loss/(batch_num*dataloader.batch_size),
                       running_acc/(batch_num*dataloader.batch_size)
            ))
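
The printed averages divide by `batch_num * batch_size`, which is exact only while every batch is full. A slightly more general bookkeeping, sketched below in plain Python with made-up numbers, counts the samples actually seen. The function name `running_averages` and the toy batch values are illustrative assumptions, not part of the notebook.

```python
def running_averages(batches):
    """batches: iterable of (summed_loss, num_correct, batch_len).
    Yields the per-sample loss and accuracy after each batch."""
    total_loss, total_correct, seen = 0.0, 0, 0
    for summed_loss, num_correct, batch_len in batches:
        total_loss += summed_loss
        total_correct += num_correct
        seen += batch_len
        yield total_loss / seen, total_correct / seen


# Made-up numbers: two full batches of 32 and a final batch of 16.
toy = [(64.0, 16, 32), (32.0, 24, 32), (8.0, 12, 16)]
for avg_loss, acc in running_averages(toy):
    print(f'{avg_loss:.3f} {acc:.3f}')
# 2.000 0.500
# 1.500 0.625
# 1.300 0.650
```

In this toy case, dividing the final sums by `3 * 32 = 96` instead of the 80 samples actually seen would understate both the loss and the accuracy. With the loaders used here, the training and validation subsets divide evenly by the batch size, so the simpler formula gives the same result.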

The `fit` function runs `train_valid_loop` once per epoch on each loader: with gradient tracking and optimizer steps on the training loader, and inside `torch.no_grad()` on the validation loader.

In [52]:
def fit(epochs, model, criterion, optimizer, every):
    for epoch in range(epochs):
        print(f'--- Epoch: {epoch} ---')
        # Train loop
        model.train()
        print('\n--- Train loop ----\n')
        train_valid_loop(train_loader, model, criterion,
                         optimizer, every, 'train')

        # Validation loop
        model.eval()
        print('\n--- Validation loop ---\n')
        with torch.no_grad():
            train_valid_loop(valid_loader, model, criterion,
                             optimizer, every, what='valid')


We can now instantiate the model, define the loss function and the optimizer, and run the training/validation loop.

In [53]:
model = Net()
model = model.to(device)
criterion = nn.CrossEntropyLoss(reduction='sum')
optimizer = torch.optim.RMSprop(model.parameters())
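
The choice of `reduction='sum'` fits the way the loop reports the loss: the summed batch losses are accumulated and then divided by the number of samples. The plain-Python sketch below illustrates the relation between the summed and the mean cross-entropy on a made-up batch. The function `cross_entropy` is a hand-written illustration of the formula (negative log-softmax of the true class), not the PyTorch implementation.

```python
import math


def cross_entropy(logits, label):
    """Negative log-softmax of the true class for one sample."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# Made-up logits for a batch of three samples and three classes.
batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0], [1.0, 1.0, 1.0]]
batch_labels = [0, 2, 1]

losses = [cross_entropy(z, y) for z, y in zip(batch_logits, batch_labels)]
summed = sum(losses)         # corresponds to reduction='sum'
mean = summed / len(losses)  # corresponds to reduction='mean'
print(f'sum={summed:.4f} mean={mean:.4f}')
```

One consequence worth keeping in mind is that, with a summed loss, the gradient magnitude scales with the batch size, which interacts with the optimizer's learning rate.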

In [54]:
fit(3, model, criterion, optimizer, every=100)

--- Epoch: 0 ---

--- Train loop ----

Batch N.: 100, TrainLoss: 12.142, TrainAcc.: 0.627
Batch N.: 200, TrainLoss: 6.354, TrainAcc.: 0.735
Batch N.: 300, TrainLoss: 4.396, TrainAcc.: 0.776
Batch N.: 400, TrainLoss: 3.398, TrainAcc.: 0.805
Batch N.: 500, TrainLoss: 2.794, TrainAcc.: 0.822
Batch N.: 600, TrainLoss: 2.398, TrainAcc.: 0.834
Batch N.: 700, TrainLoss: 2.107, TrainAcc.: 0.844
Batch N.: 800, TrainLoss: 1.888, TrainAcc.: 0.851
Batch N.: 900, TrainLoss: 1.711, TrainAcc.: 0.859
Batch N.: 1000, TrainLoss: 1.575, TrainAcc.: 0.864
Batch N.: 1100, TrainLoss: 1.461, TrainAcc.: 0.869
Batch N.: 1200, TrainLoss: 1.370, TrainAcc.: 0.872
Batch N.: 1300, TrainLoss: 1.289, TrainAcc.: 0.875
Batch N.: 1400, TrainLoss: 1.218, TrainAcc.: 0.879
Batch N.: 1500, TrainLoss: 1.160, TrainAcc.: 0.881

--- Validation loop ---

Batch N.: 100, ValidLoss: 0.252, ValidAcc.: 0.936
Batch N.: 200, ValidLoss: 0.251, ValidAcc.: 0.935
Batch N.: 300, ValidLoss: 0.266, ValidAcc.: 0.933
--- Epoch: 1 ---

--- Train 