Sam Tempel's homework for session 2.

# Homework: train a Nonlinear Classifier

1. Write some code to train the NonlinearClassifier.
2. Create a data loader for the test data and check your model's accuracy on the test data. 

If you have time, experiment with how to improve the model. Note: training and validation data can be used to compare models, but test data should be saved until the end as a final check of generalization. 

I'm still learning Jupyter notebooks, so apologies if this is weirdly formatted.

First, I'm going to copy/paste the needed code from the lecture notebook.


In [1]:
%matplotlib inline

import torch
import torchvision
from torch import nn

import numpy 
import matplotlib.pyplot as plt
import time

In [2]:
training_data = torchvision.datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=torchvision.transforms.ToTensor()
)

test_data = torchvision.datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=torchvision.transforms.ToTensor()
)


In [3]:
training_data, validation_data = torch.utils.data.random_split(training_data, [0.8, 0.2], generator=torch.Generator().manual_seed(55))
print('MNIST data loaded: train:',len(training_data),' examples, validation: ', len(validation_data), 'examples, test:',len(test_data), 'examples')
print('Input shape', training_data[0][0].shape)


MNIST data loaded: train: 48000  examples, validation:  12000 examples, test: 10000 examples
Input shape torch.Size([1, 28, 28])


In [4]:
batch_size = 128

# The dataloader makes our dataset iterable 
train_dataloader128 = torch.utils.data.DataLoader(training_data, batch_size=batch_size)
val_dataloader128 = torch.utils.data.DataLoader(validation_data, batch_size=batch_size)

In [5]:
batch_size = 256

# The dataloader makes our dataset iterable 
train_dataloader256 = torch.utils.data.DataLoader(training_data, batch_size=batch_size)
val_dataloader256 = torch.utils.data.DataLoader(validation_data, batch_size=batch_size)
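
Batch size also fixes how many optimizer steps one epoch contains, which matters when comparing the two runs at the same learning rate. A quick back-of-the-envelope check, assuming the 48000 training examples reported above (`steps_per_epoch` is just an illustrative name):

```python
import math

train_size = 48000  # training examples after the 80/20 split (from the output above)

# One optimizer step per batch; the last batch may be smaller than batch_size.
steps_per_epoch = {bs: math.ceil(train_size / bs) for bs in (128, 256)}

for bs, steps in steps_per_epoch.items():
    print(f"batch_size={bs}: {steps} optimizer steps per epoch")
# batch_size=128: 375 optimizer steps per epoch
# batch_size=256: 188 optimizer steps per epoch
```

So with the same learning rate, the batch_size=256 model gets roughly half as many weight updates per epoch.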

Alright, this time I'm using the nonlinear model as defined in the lecture notebook.

In [6]:
class NonlinearClassifier(nn.Module):

    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.layers_stack = nn.Sequential(
            nn.Linear(28*28, 50),
            nn.ReLU(),
            ##nn.Dropout(0.2),
            nn.Linear(50, 50),
            nn.ReLU(),
            ##nn.Dropout(0.2),
            nn.Linear(50, 10)
        )
        
    def forward(self, x):
        x = self.flatten(x)
        x = self.layers_stack(x)

        return x
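
A quick sanity check on this architecture: push a dummy batch through it and count the parameters. This is only a sketch; it rebuilds the same layer stack inline with `nn.Sequential` so it runs on its own, and `dummy_batch` and `n_params` are illustrative names.

```python
import torch
from torch import nn

# Same layers as NonlinearClassifier, written inline so this cell is self-contained.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 50), nn.ReLU(),
    nn.Linear(50, 50), nn.ReLU(),
    nn.Linear(50, 10),
)

dummy_batch = torch.rand(4, 1, 28, 28)  # 4 fake MNIST-sized images
logits = model(dummy_batch)             # one row of 10 class scores per image
n_params = sum(p.numel() for p in model.parameters())

print(logits.shape)  # torch.Size([4, 10])
print(n_params)      # 42310 = (784*50 + 50) + (50*50 + 50) + (50*10 + 10)
```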

I'm going to create two nonlinear model objects, one for each batch size. I'll leave the learning rate alone for now.

In [7]:
nonlinear_model128 = NonlinearClassifier()
nonlinear_model256 = NonlinearClassifier()

loss_fn = nn.CrossEntropyLoss()
optimizer128 = torch.optim.SGD(nonlinear_model128.parameters(), lr=0.01)
optimizer256 = torch.optim.SGD(nonlinear_model256.parameters(), lr=0.01)

Looks alright. It seems I can reuse the training and evaluation functions from the lecture notebook and try running some training epochs.

In [8]:
def train_one_epoch(dataloader, model, loss_fn, optimizer):
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        # forward pass
        pred = model(X)
        loss = loss_fn(pred, y)
        
        # backward pass calculates gradients
        loss.backward()
        
        # take one step with these gradients
        optimizer.step()
        
        # resets the gradients 
        optimizer.zero_grad()

In [9]:
def evaluate(dataloader, model, loss_fn):
    # Set the model to evaluation mode - some NN pieces behave differently during training
    # No effect on the plain model, but it matters for the Dropout version used later
    model.eval()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    loss, correct = 0, 0

    # We can save computation and memory by not calculating gradients here - we aren't optimizing 
    with torch.no_grad():
        # loop over all of the batches
        for X, y in dataloader:
            pred = model(X)
            loss += loss_fn(pred, y).item()
            # how many are correct in this batch? Tracking for accuracy 
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    loss /= num_batches
    correct /= size
    
    accuracy = 100*correct
    return accuracy, loss

Alright, let's run the one with batch_size=128 through 10 epochs and see what happens.

I'll also define a function that trains for a given number of epochs.

In [10]:
%%time

def training_session(epochs, dataloader, model, loss_fn, optimizer):
    for j in range(epochs):
        train_one_epoch(dataloader, model, loss_fn, optimizer)

        # checking on the training loss and accuracy once per epoch
        acc, loss = evaluate(dataloader, model, loss_fn)
        print(f"Epoch {j}: training loss: {loss}, accuracy: {acc}")

epochs = 10
training_session(epochs, train_dataloader128, nonlinear_model128, loss_fn, optimizer128)

Epoch 0: training loss: 2.054208186149597, accuracy: 43.4
Epoch 1: training loss: 1.1352396494547525, accuracy: 75.03958333333334
Epoch 2: training loss: 0.6668230414390564, accuracy: 82.23125
Epoch 3: training loss: 0.5226160089969635, accuracy: 85.6625
Epoch 4: training loss: 0.4524427552223206, accuracy: 87.60208333333334
Epoch 5: training loss: 0.4110342762072881, accuracy: 88.60625
Epoch 6: training loss: 0.3838413459857305, accuracy: 89.3375
Epoch 7: training loss: 0.3642228608528773, accuracy: 89.89583333333333
Epoch 8: training loss: 0.3488120183547338, accuracy: 90.31041666666667
Epoch 9: training loss: 0.33597377097606657, accuracy: 90.62291666666667
CPU times: user 36min 15s, sys: 45.2 s, total: 37min
Wall time: 1min 9s


Same thing for the one with batch_size=256.

In [11]:
%%time
training_session(epochs, train_dataloader256, nonlinear_model256, loss_fn, optimizer256)

Epoch 0: training loss: 2.240848493068776, accuracy: 25.96875
Epoch 1: training loss: 2.0775381846630827, accuracy: 37.329166666666666
Epoch 2: training loss: 1.7491438699529527, accuracy: 51.4375
Epoch 3: training loss: 1.3178486551376098, accuracy: 70.375
Epoch 4: training loss: 0.9613915868896119, accuracy: 77.85416666666667
Epoch 5: training loss: 0.7571233054424854, accuracy: 81.19375
Epoch 6: training loss: 0.6360275720028167, accuracy: 83.50416666666666
Epoch 7: training loss: 0.5559374755050274, accuracy: 85.12916666666666
Epoch 8: training loss: 0.5009567452237961, accuracy: 86.40833333333333
Epoch 9: training loss: 0.4625019635608856, accuracy: 87.33333333333333
CPU times: user 35min 57s, sys: 43.8 s, total: 36min 41s
Wall time: 1min 8s


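Doubling the batch size has essentially no effect on speed here (wall time 1min 8s vs. 1min 9s), but training accuracy after 10 epochs is lower (87.3% vs. 90.6%). With the same learning rate, the larger batch takes half as many optimizer steps per epoch, so it is simply further behind. Let's also try the nonlinear classifier with dropout.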

In [12]:
class NonlinearClassifierWithDrops(nn.Module):

    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.layers_stack = nn.Sequential(
            nn.Linear(28*28, 50),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(50, 50),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(50, 10)
        )
        
    def forward(self, x):
        x = self.flatten(x)
        x = self.layers_stack(x)

        return x
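
Dropout is the one layer here that behaves differently under `model.train()` and `model.eval()`. A minimal sketch of that difference on a standalone `nn.Dropout(0.2)` layer (the tensor of ones is just a convenient illustrative input):

```python
import torch
from torch import nn

drop = nn.Dropout(0.2)
x = torch.ones(1000)

drop.train()         # training mode: each element is zeroed with probability 0.2
train_out = drop(x)  # surviving elements are scaled by 1 / (1 - 0.2) = 1.25

drop.eval()          # evaluation mode: dropout passes the input through unchanged
eval_out = drop(x)

print(train_out.unique())        # typically just the two values 0 and 1.25
print(torch.equal(eval_out, x))  # True
```

This is why `evaluate` calls `model.eval()`: the reported losses and accuracies are computed with dropout switched off.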

In [13]:
droppy_model128 = NonlinearClassifierWithDrops()
droppy_model256 = NonlinearClassifierWithDrops()

optimizerD128 = torch.optim.SGD(droppy_model128.parameters(), lr=0.01)
optimizerD256 = torch.optim.SGD(droppy_model256.parameters(), lr=0.01)

In [14]:
%%time
training_session(epochs, train_dataloader128, droppy_model128, loss_fn, optimizerD128)

Epoch 0: training loss: 2.162392141342163, accuracy: 43.51875
Epoch 1: training loss: 1.467673095703125, accuracy: 66.62291666666667
Epoch 2: training loss: 0.9030340145428976, accuracy: 78.1375
Epoch 3: training loss: 0.6756682435671488, accuracy: 83.25625
Epoch 4: training loss: 0.5637385493119558, accuracy: 85.51875
Epoch 5: training loss: 0.4956769909063975, accuracy: 86.85833333333333
Epoch 6: training loss: 0.45151536536216735, accuracy: 87.89375
Epoch 7: training loss: 0.4186584941546122, accuracy: 88.59166666666667
Epoch 8: training loss: 0.39290662050247194, accuracy: 89.17291666666667
Epoch 9: training loss: 0.37290904625256854, accuracy: 89.62708333333333
CPU times: user 36min 34s, sys: 45.4 s, total: 37min 20s
Wall time: 1min 10s


And of course the larger batch size.

In [15]:
%%time
training_session(epochs, train_dataloader256, droppy_model256, loss_fn, optimizerD256)

Epoch 0: training loss: 2.243618130683899, accuracy: 26.49375
Epoch 1: training loss: 2.105739229537071, accuracy: 44.875
Epoch 2: training loss: 1.7990784860671836, accuracy: 56.574999999999996
Epoch 3: training loss: 1.400007629648168, accuracy: 68.34791666666666
Epoch 4: training loss: 1.0643982535347025, accuracy: 75.4875
Epoch 5: training loss: 0.8561332070447029, accuracy: 79.86666666666666
Epoch 6: training loss: 0.730108638710164, accuracy: 82.22708333333333
Epoch 7: training loss: 0.6462055064262228, accuracy: 83.69583333333334
Epoch 8: training loss: 0.5895016921010423, accuracy: 84.88333333333333
Epoch 9: training loss: 0.5472904966866716, accuracy: 85.68124999999999
CPU times: user 36min 28s, sys: 46.2 s, total: 37min 14s
Wall time: 1min 9s


Let's check validation performance for the first version (without dropout), batch_size=128.

In [16]:
acc_val, loss_val = evaluate(val_dataloader128, nonlinear_model128, loss_fn)
print("Validation loss: %.4f, validation accuracy: %.2f%%" % (loss_val, acc_val))

Validation loss: 0.3242, validation accuracy: 90.58%


And the same for batch_size=256.

In [17]:
acc_val, loss_val = evaluate(val_dataloader256, nonlinear_model256, loss_fn)
print("Validation loss: %.4f, validation accuracy: %.2f%%" % (loss_val, acc_val))

Validation loss: 0.4558, validation accuracy: 87.35%


Now the droppy version, batch_size=128.

In [18]:
acc_val, loss_val = evaluate(val_dataloader128, droppy_model128, loss_fn)
print("Validation loss: %.4f, validation accuracy: %.2f%%" % (loss_val, acc_val))

Validation loss: 0.3658, validation accuracy: 89.54%


And the droppy version, batch_size=256.

In [19]:
acc_val, loss_val = evaluate(val_dataloader256, droppy_model256, loss_fn)
print("Validation loss: %.4f, validation accuracy: %.2f%%" % (loss_val, acc_val))

Validation loss: 0.5437, validation accuracy: 85.64%
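
The second homework step asks for a data loader on the test data and an accuracy check. A minimal sketch of how that could look is below. The helper name `accuracy_on` and the stand-in data and model are assumptions made so the cell runs on its own; in this notebook you would pass `test_data` and one of the trained models instead.

```python
import torch
from torch import nn


def accuracy_on(model, dataset, batch_size=128):
    """Return the percentage of examples in `dataset` that `model` classifies correctly."""
    loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size)
    model.eval()  # switch off dropout and other training-only behavior
    correct = 0
    with torch.no_grad():  # no gradients needed when only evaluating
        for X, y in loader:
            correct += (model(X).argmax(1) == y).sum().item()
    return 100 * correct / len(dataset)


# Stand-ins (hypothetical, NOT the notebook's objects): random "images", random
# labels, and an untrained linear model, just to show the call.
fake_images = torch.rand(64, 1, 28, 28)
fake_labels = torch.randint(0, 10, (64,))
fake_test_data = torch.utils.data.TensorDataset(fake_images, fake_labels)
stand_in_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

acc = accuracy_on(stand_in_model, fake_test_data)
print(f"Test accuracy: {acc:.2f}%")
```

In the notebook itself the call would be something like `accuracy_on(nonlinear_model128, test_data)`, run once at the very end so the test set stays a final check of generalization.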


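Cool. On validation data the plain model with batch_size=128 did best (90.58%), and at this epoch count dropout cost one to two points for both batch sizes. Definitely more to explore by changing the learning rate and fiddling with the epoch count, but I'll call it quits here.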