<a href="https://colab.research.google.com/github/olinml2024/notebooks/blob/main/ML24_Assignment08_part_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment 8 Companion Notebook (part 2): Neural Network Implementation in PyTorch

Learning Objectives:
* Implement a multi-layer perceptron in PyTorch
* Create learning curves to understand model training
* Learn about batch-based training

## Recognizing Digits

Recognizing handwritten digits with machine learning is embedded so thoroughly in the discipline's early history that anyone who brings it up at a conference is likely to elicit groans from the audience.  Feel free to groan at us whenever we mention it in class.

We're going to use a digit recognition dataset that is built into sklearn.  It's not the famous MNIST dataset, but it's the same idea.  Once we've loaded the digits, we'll create some models using PyTorch to predict the identity of a digit from its pixels.

In [None]:
from sklearn.datasets import load_digits

digits = load_digits()
print(digits.DESCR)
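
Before modeling, it can help to confirm the shape of the arrays the loader returns.  The cell below is a small sketch of one way to do that; the names `X`, `y`, and `n_classes` are ours and are not used elsewhere in this notebook.

```python
from sklearn.datasets import load_digits
import numpy as np

digits = load_digits()
X = digits['data']      # one row per image, one column per pixel
y = digits['target']    # one integer label per image

# each 8x8 image is stored flattened, so we expect 64 features per row
print(X.shape)
print(y.shape)

# the number of distinct labels tells us how many outputs our models need
n_classes = len(np.unique(y))
print(n_classes)
```

These are the two numbers (64 inputs, 10 classes) that get passed to the model constructor later on.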

Here are some sample images from the dataset.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
fig, axs = plt.subplots(4, 2)
axs = axs.flatten()             # we flatten the axes to make indexing easier
for i in range(8):
    axs[i].imshow(digits['data'][i,:].reshape((8,8)), cmap='gray')
    axs[i].axes.xaxis.set_visible(False)
    axs[i].axes.yaxis.set_visible(False)

plt.show()

Here's the part we will look at in class.



In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from sklearn.model_selection import train_test_split
import numpy as np

class LogisticRegression(nn.Module):
    """ A model that implements a logistic regression classifier. """
    def __init__(self, input_size, num_classes):
        super(LogisticRegression, self).__init__()
        # initialize the model weights
        self.linear = nn.Linear(input_size, num_classes)

    def forward(self, x):
        """ Implement the forward pass of the model. """
        out = self.linear(x)
        # We leave off the softmax here since we are going to use cross-entropy
        # loss, which applies the softmax for us.
        # out = F.softmax(out, dim=1)
        return out

# see if we can run on the GPU (change your runtime to T4 if you want cuda)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# each input is 64 features (8x8 pixels) and there are 10 possible digits
model = LogisticRegression(64, 10).to(device)

X_train, X_test, y_train, y_test = train_test_split(digits['data'],
                                                    digits['target'],
                                                    test_size=0.3,
                                                    random_state=42)

# we need to convert from numpy to pytorch and also move the data to the GPU
# (if we are running on a machine with a GPU)
X_train = torch.from_numpy(X_train.astype(np.float32)).to(device)
y_train = torch.from_numpy(y_train).to(device)
X_test = torch.from_numpy(X_test.astype(np.float32)).to(device)
y_test = torch.from_numpy(y_test).to(device)

# you might need to adjust this if you are not using a GPU instance
n_epochs = 1000
learning_rate = 0.01
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
train_losses = np.zeros((n_epochs,))
test_losses = np.zeros((n_epochs,))
accuracies = np.zeros((n_epochs,))

for epoch in range(n_epochs):
    optimizer.zero_grad()
    outputs = model(X_train)
    loss = criterion(outputs, y_train)
    train_losses[epoch] = loss.item()

    with torch.no_grad():
        test_outputs = model(X_test)
        test_loss = criterion(test_outputs, y_test)
        test_losses[epoch] = test_loss.item()
        accuracies[epoch] = (torch.sum(test_outputs.argmax(dim=1) == y_test) / y_test.shape[0]).item()
    loss.backward()
    optimizer.step()

plt.figure()
plt.plot(range(n_epochs), train_losses, label='training cross-entropy')
plt.plot(range(n_epochs), test_losses, label='testing cross-entropy')
plt.xlabel('step')
plt.legend()
plt.show()

plt.figure()
plt.plot(range(n_epochs), accuracies)
plt.xlabel('step')
plt.ylabel('accuracy')
plt.show()
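
The comment in `forward` says that we can skip the softmax because the cross-entropy loss applies it for us.  Here is a minimal sketch that checks this claim on made-up numbers (the `logits` and `labels` tensors are random placeholders, not outputs of the model above).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)              # placeholder raw outputs: 5 samples, 10 classes
labels = torch.randint(0, 10, (5,))      # placeholder integer class labels

# CrossEntropyLoss takes raw logits and applies a log-softmax internally
loss_from_logits = nn.CrossEntropyLoss()(logits, labels)

# the same quantity by hand: log-softmax followed by negative log-likelihood
loss_by_hand = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(torch.allclose(loss_from_logits, loss_by_hand))

# softmax preserves the ordering of the logits, so the predicted class
# (the argmax) is the same whether or not we apply it
same_prediction = torch.equal(logits.argmax(dim=1),
                              F.softmax(logits, dim=1).argmax(dim=1))
print(same_prediction)
```

This is also why the accuracy calculation in the training loop can take the `argmax` of the raw outputs directly.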

### Part 2 Exercise 1

Using the code above as a starting point, write code to train an MLP on the same dataset.  Your MLP should allow you to specify the number of hidden units to use.  Create a plot that compares the training and test cross entropies for both the logistic regression model and the MLP.  Create a similar plot that shows the same for accuracy.

*Bonus: experiment with different numbers of hidden units and tell us what you find in terms of changes in accuracy or learning time*
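
If you try the bonus, one thing worth tracking is how the number of trainable parameters grows with the number of hidden units.  The sketch below is one possible way to count them; `count_parameters` is a helper we made up for illustration, and the hidden sizes and the use of `nn.Sequential` with a sigmoid are arbitrary choices rather than a required design.

```python
import torch.nn as nn

def count_parameters(model):
    """ Count the trainable parameters (weights and biases) in a model. """
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# logistic regression: a single linear layer from 64 pixels to 10 classes
logistic = nn.Linear(64, 10)
print(count_parameters(logistic))

# illustrative hidden sizes; each MLP has one hidden layer
param_counts = {}
for hidden_size in [10, 50, 200]:
    mlp = nn.Sequential(nn.Linear(64, hidden_size),
                        nn.Sigmoid(),
                        nn.Linear(hidden_size, 10))
    param_counts[hidden_size] = count_parameters(mlp)
    print(hidden_size, param_counts[hidden_size])
```

Each linear layer contributes `inputs * outputs` weights plus `outputs` biases, so the count grows linearly with the hidden size.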

### Solution

In [None]:
# class MLP(nn.Module):
#     def __init__(self, input_size, hidden_size, num_classes):
#         super(MLP, self).__init__()
#         self.linear_1 = nn.Linear(input_size, hidden_size)
#         self.linear_2 = nn.Linear(hidden_size, num_classes)

#     def forward(self, x):
#         out = self.linear_1(x)
#         out = F.sigmoid(out)
#         out = self.linear_2(out)
#         # again: leave this out since our loss function already incorporates it
#         # out = F.softmax(out, dim=1)
#         return out

# model = MLP(64, 50, 10).to(device)

# n_epochs_mlp = 2000
# learning_rate = 0.1
# criterion = nn.CrossEntropyLoss()
# optimizer = optim.SGD(model.parameters(), lr=learning_rate)
# train_losses_mlp = np.zeros((n_epochs_mlp,))
# test_losses_mlp = np.zeros((n_epochs_mlp,))
# accuracies_mlp = np.zeros((n_epochs_mlp,))

# for epoch in range(n_epochs_mlp):
#     optimizer.zero_grad()
#     outputs = model(X_train)
#     loss = criterion(outputs, y_train)
#     train_losses_mlp[epoch] = loss.item()

#     with torch.no_grad():
#         test_outputs = model(X_test)
#         test_loss = criterion(test_outputs, y_test)
#         test_losses_mlp[epoch] = test_loss.item()
#         accuracies_mlp[epoch] = (torch.sum(test_outputs.argmax(dim=1) == y_test) / y_test.shape[0]).item()
#     loss.backward()
#     optimizer.step()

# plt.figure()
# plt.plot(range(n_epochs_mlp), train_losses_mlp, label='train cross-entropy (mlp)')
# plt.plot(range(n_epochs_mlp), test_losses_mlp, label='test cross-entropy (mlp)')
# plt.plot(range(n_epochs), train_losses, label='train cross-entropy (logistic)')
# plt.plot(range(n_epochs), test_losses, label='test cross-entropy (logistic)')
# plt.xlabel('step')
# plt.legend()
# plt.show()

# plt.figure()
# plt.plot(range(n_epochs_mlp), accuracies_mlp, label='accuracy (mlp)')
# plt.plot(range(n_epochs), accuracies, label='accuracy (logistic)')
# plt.xlabel('step')
# plt.legend()
# plt.show()

# print(f"final MLP accuracy: {accuracies_mlp[-1]}")
# print(f"final MLP training loss: {train_losses_mlp[-1]}")

## Optimizing Using Batches

So far we've been using gradient descent where we take the gradient of our loss function computed over *all* of our training data.  There are many situations where this will not work (e.g., if you have a really big dataset, you may not be able to fit all of your data in the computer's memory at one time).

An alternative to processing all of the data at once is to process it in batches.  A batch of data is simply a randomly chosen subset of the training data (e.g., we might choose a set of 100 random images of digits).  In the next exercise you'll modify your code to use batch-based training.
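
One way to write down the difference, using notation that is ours rather than anything defined earlier in this notebook: let $\mathbf{w}$ be the model parameters, $\eta$ the learning rate, $\ell_i(\mathbf{w})$ the loss on the $i$-th training example, $N$ the number of training examples, and $B$ a randomly chosen batch of example indices.

$$
\mathbf{w} \leftarrow \mathbf{w} - \eta \, \nabla_{\mathbf{w}} \frac{1}{N} \sum_{i=1}^{N} \ell_i(\mathbf{w}) \qquad \text{(whole dataset)}
$$

$$
\mathbf{w} \leftarrow \mathbf{w} - \eta \, \nabla_{\mathbf{w}} \frac{1}{|B|} \sum_{i \in B} \ell_i(\mathbf{w}) \qquad \text{(one batch)}
$$

The batch gradient is a noisier estimate of the full gradient, but it only requires looking at $|B|$ examples per step instead of all $N$.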

### Part 2 Exercise 2

Modify the code from Exercise 1 to train using batches.  Your code should take an optimization step for each batch of data.  There are plenty of resources online to help with this, but here is some code that might be helpful (see next cell).  This code wraps the training data in a dataset object and then creates a data loader that will allow us to iterate over batches of data.

Your code should create a plot or some sort of output that helps you understand how batch-based training compares to training on the whole dataset each step.

Make some notes about what you perceive to be different between the two types of training.

In [None]:
trainset = torch.utils.data.TensorDataset(X_train, y_train)
trainloader = torch.utils.data.DataLoader(trainset,
                                          batch_size=50,
                                          shuffle=True)
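
To get a feel for what the data loader hands back, here is a small sketch that iterates over one on synthetic data.  The tensors `X_fake` and `y_fake` are random stand-ins, and the size of 1257 samples is an assumption we chose to be close to a 70% split of the digits dataset.

```python
import torch
import torch.utils.data

# synthetic stand-in for the training set (assumed: 1257 samples, 64 features)
X_fake = torch.randn(1257, 64)
y_fake = torch.randint(0, 10, (1257,))

fake_set = torch.utils.data.TensorDataset(X_fake, y_fake)
fake_loader = torch.utils.data.DataLoader(fake_set, batch_size=50, shuffle=True)

# len() of a DataLoader is the number of batches in one pass over the data
print(len(fake_loader))

batch_sizes = []
for inputs, labels in fake_loader:
    # inputs has shape (batch, 64) and labels has shape (batch,)
    batch_sizes.append(inputs.shape[0])

# every batch holds 50 samples except the last, which holds the remainder
print(batch_sizes[0], batch_sizes[-1])
print(sum(batch_sizes))
```

One pass over the loader visits every sample exactly once, so a single epoch now corresponds to many optimization steps rather than one.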

### Solution

Using batches for training seems to reach high accuracy in far fewer epochs, since each epoch now takes one optimization step per batch rather than a single step on the whole dataset.  The training and test losses continue to decline after the accuracy saturates.
At least for this problem, there doesn't seem to be much of a downside to this approach versus using the whole dataset for each gradient descent step.

In [None]:
# class MLP(nn.Module):
#     def __init__(self, input_size, hidden_size, num_classes):
#         super(MLP, self).__init__()
#         self.linear_1 = nn.Linear(input_size, hidden_size)
#         self.linear_2 = nn.Linear(hidden_size, num_classes)

#     def forward(self, x):
#         out = self.linear_1(x)
#         out = F.sigmoid(out)
#         out = self.linear_2(out)
#         # leave this out since CrossEntropyLoss takes care of it for us
#         # out = F.softmax(out, dim=1)
#         return out

# model = MLP(64, 50, 10).to(device)

# n_epochs_mlp = 100
# learning_rate = 0.05
# criterion = nn.CrossEntropyLoss()
# optimizer = optim.SGD(model.parameters(), lr=learning_rate)
# train_losses_mlp = np.zeros((n_epochs_mlp,))
# test_losses_mlp = np.zeros((n_epochs_mlp,))
# accuracies_mlp = np.zeros((n_epochs_mlp,))

# trainset = torch.utils.data.TensorDataset(X_train, y_train)
# trainloader = torch.utils.data.DataLoader(trainset,
#                                           batch_size=50,
#                                           shuffle=True)
# for epoch in range(n_epochs_mlp):
#     with torch.no_grad():
#         test_outputs = model(X_test)
#         test_loss = criterion(test_outputs, y_test)
#         test_losses_mlp[epoch] = test_loss.item()
#         accuracies_mlp[epoch] = (torch.sum(test_outputs.argmax(dim=1) == y_test) / y_test.shape[0]).item()

#         train_outputs = model(X_train)
#         train_loss = criterion(train_outputs, y_train)
#         train_losses_mlp[epoch] = train_loss.item()

#     for i, (inputs, labels) in enumerate(trainloader):
#         optimizer.zero_grad()
#         outputs = model(inputs)
#         loss = criterion(outputs, labels)
#         loss.backward()
#         optimizer.step()

# plt.figure()
# plt.plot(range(n_epochs_mlp), train_losses_mlp, label='train cross-entropy (mlp)')
# plt.plot(range(n_epochs_mlp), test_losses_mlp, label='test cross-entropy (mlp)')
# plt.plot(range(n_epochs), train_losses, label='train cross-entropy (logistic)')
# plt.plot(range(n_epochs), test_losses, label='test cross-entropy (logistic)')
# plt.xlabel('epoch')
# plt.legend()
# plt.show()

# plt.figure()
# plt.plot(range(n_epochs_mlp), accuracies_mlp, label='accuracy (mlp)')
# plt.plot(range(n_epochs), accuracies, label='accuracy (logistic)')
# plt.xlabel('epoch')
# plt.legend()
# plt.show()

# print(f"final MLP accuracy: {accuracies_mlp[-1]}")
# print(f"final MLP training loss: {train_losses_mlp[-1]}")