# Mini-Batch SGD Assignment

## Instructions:

1. Log into [Pomona's Jupyter Hub](https://jupyter.pomona.edu/).
2. Clone this repository (or just pull changes if you already have it).
3. Start Jupyter (don't forget to use the CS 152 environment).
4. Duplicate this file so that you can still pull changes without merging.
5. Complete the "Questions to Answer."
6. Complete the "Things to Try."

## Questions to Answer

You will answer these questions on Gradescope. Try to answer them with your partner before running or altering any code.

1. How could you make this code run "stochastic gradient descent (SGD)"?

    Set `batch_size = 1` so that the parameters are updated after every single training example.

1. How could you make this code run "batch gradient descent (BGD)"?

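    Set `batch_size = 0`, which `get_mnist_data_loaders` interprets as "use all N examples," so that each epoch performs a single update computed from the entire training set.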

1. What is the shape of `train_X`?

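    256 x 1 x 28 x 28 (`batch_size` x channels x height x width). The final batch of each epoch holds only the 96 remaining examples. The full training set is 60,000 x 28 x 28.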

1. What is the shape of `train_output`?

    256 x 10: one row per example in the batch and one column per class.

1. What values would you expect to see in the `train_output` tensor?

   Raw, unnormalized scores (logits), one per digit class 0-9. They can be any real number; the largest score in each row marks the predicted digit.

1. What is the shape of `train_Y`?

   256: a one-dimensional tensor holding the integer label (0-9) of each example in the batch.

1. What is the shape of the first linear layer's weight matrix?

   13 x 784. `nn.Linear(784, 13)` stores its weight as `out_features` x `in_features`.

1. How many parameters are in the neural network?

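   10,623, counting weights and biases: (784 x 13 + 13) + (13 x 17 + 17) + (17 x 10 + 10) = 10,205 + 238 + 180.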

1. What is the purpose of the `with torch.no_grad()` ([documentation](https://pytorch.org/docs/stable/generated/torch.no_grad.html#torch.no_grad)) context manager?

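    It disables gradient tracking inside the block. In the parameter update, this permits the in-place `param -= ...` step, which autograd would otherwise reject for a leaf tensor that requires gradients. During validation, it saves the memory and time that tracking would cost.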

1. How do we compute accuracy? Describe what the code is doing.

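   For each validation batch, `argmax(1)` picks the highest-scoring class for every example, the comparison with `valid_Y` marks which predictions are correct, and the sum counts them. After the loop, the count is divided by `valid_N` and multiplied by 100 to give a percentage.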

    ~~~python
    # Convert network output into predictions (logits -> class index)
    predictions = valid_output.argmax(1)

    # Sum up total number that were correct
    valid_correct += (predictions == valid_Y).type(torch.float).sum().item()
    ~~~

1. What happens when you rerun the training cell for additional epochs without rerunning any other cells?

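    Training resumes from the current parameters rather than starting over, because the model is created in an earlier cell. The loss typically continues from where the previous run ended, while the epoch counter and the `train_losses` and `valid_losses` lists (and therefore the plot) start over because they are initialized inside the training cell.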

1. What happens if you set the device to "cpu"?

   We will use the CPU rather than GPU for calculations, which may slow down computation.

    ~~~python
    # device = "cuda" if torch.cuda.is_available() else "cpu"
    device = "cpu"
    ~~~
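
The shape questions above can be checked without downloading MNIST. The following is a minimal sketch, assuming the notebook's hyperparameters (`batch_size = 256`, `neurons_per_layer = [13, 17]`) and substituting random tensors for a real mini-batch. The names `fake_X`, `fake_Y`, and `sketch_model` are hypothetical and do not appear in the notebook.

~~~python
import torch
import torch.nn as nn

batch_size = 256  # same value as in the notebook's hyperparameter cell

# Random stand-ins for one mini-batch from the MNIST DataLoader (assumption:
# ToTensor() yields 1x28x28 float images and labels are integer class indexes)
fake_X = torch.randn(batch_size, 1, 28, 28)
fake_Y = torch.randint(0, 10, (batch_size,))

# Same layer sizes the notebook builds: 784 -> 13 -> 17 -> 10
sketch_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 13), nn.ReLU(),
    nn.Linear(13, 17), nn.ReLU(),
    nn.Linear(17, 10),
)

with torch.no_grad():
    fake_output = sketch_model(fake_X)

# Accuracy on this batch: pick the highest-scoring class, compare with labels
predictions = fake_output.argmax(1)
num_correct = (predictions == fake_Y).sum().item()

print("input shape       :", tuple(fake_X.shape))
print("output shape      :", tuple(fake_output.shape))
print("label shape       :", tuple(fake_Y.shape))
print("first weight shape:", tuple(sketch_model[1].weight.shape))
print(f"correct in batch  : {num_correct} of {batch_size}")
~~~

Because the weights are untrained and the labels are random, the number of correct predictions should sit near chance (about 10%).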

## Things to Try

1. Change the hidden layer activation functions to sigmoid. What were the results?

1. Change the hidden layer activation functions to [something else](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity). What were the results?

1. Change the hidden layer activation functions to `nn.Identity`. What were the results?

1. (Optional) Try adding a [dropout layer](https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html#torch.nn.Dropout) after each activation function. What were the results?

1. (Optional) Try changing the dataset to either [KMNIST](https://pytorch.org/vision/0.11/datasets.html#kmnist) or [Fashion-MNIST](https://pytorch.org/vision/0.11/datasets.html#fashion-mnist). What were the results?

1. (Optional) Try out the **inference** process.

    1. Save the model. 
    
    ~~~python
    # All training code above
    model_filename = "A05Model.pth"
    torch.save(model.state_dict(), model_filename)
    ~~~
    
    1. Create a new notebook.
    
    1. Load the saved model.
    
    ~~~python
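    # Need to bring over some code from the training file to make this work
    # (the NeuralNetwork class, layer_sizes, activation_function, and valid_loader)
    model = NeuralNetwork(layer_sizes, activation_function)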
    model.load_state_dict(torch.load(model_filename))
    model.eval()
    
    # Index of a validation example
    i = 0

    # Example input and output
    x, y = valid_loader.dataset[i][0], valid_loader.dataset[i][1]

    with torch.no_grad():
        output = model(x)
        prediction = output[0].argmax(0)
        print(f"Prediction : {prediction}")
        print(f"Target     : {y}")
    ~~~
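
A saved `state_dict` holds only tensors, so the loading notebook has to rebuild the same architecture before calling `load_state_dict`. The sketch below illustrates the round trip on a small stand-in network. The `make_model` helper and the temporary file path are hypothetical, and `map_location="cpu"` is an optional addition (not in the snippet above) that lets a checkpoint saved on a GPU load on a machine without one.

~~~python
import os
import tempfile

import torch
import torch.nn as nn


def make_model():
    # Hypothetical stand-in with the notebook's layer sizes (784 -> 13 -> 17 -> 10)
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(784, 13), nn.ReLU(),
        nn.Linear(13, 17), nn.ReLU(),
        nn.Linear(17, 10),
    )


trained = make_model()  # pretend this model has been trained

with tempfile.TemporaryDirectory() as tmp_dir:
    model_filename = os.path.join(tmp_dir, "sketch_model.pth")

    # Save only the parameters, not the Python class
    torch.save(trained.state_dict(), model_filename)

    # Rebuild the architecture, then load the parameters into it
    restored = make_model()
    restored.load_state_dict(torch.load(model_filename, map_location="cpu"))
    restored.eval()

# One fake 1x28x28 image; Flatten treats the leading dimension as the batch
x = torch.randn(1, 28, 28)

with torch.no_grad():
    same_output = torch.equal(trained(x), restored(x))

print("Restored model reproduces the original output:", same_output)
~~~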

# Imports

In [1]:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

from torchsummary import summary

from torchvision.datasets import MNIST
from torchvision.transforms import Compose, Normalize, ToTensor

from tqdm.notebook import tqdm_notebook

import pandas as pd

import matplotlib.pyplot as plt

from IPython.display import display
from jupyterthemes import jtplot

jtplot.style(context="paper")

## Set Hyperparameters

In [2]:
# Let's store the MNIST dataset in the root of your user directory
# You can delete it when you are done with this notebook
data_path = "~/data"

# Use the GPUs if they are available
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using '{device}' device.")

# Model hyperparameters
neurons_per_layer = [13, 17]

# Mini-Batch SGD hyperparameters
batch_size = 256
num_epochs = 10
learning_rate = 0.01

criterion = nn.CrossEntropyLoss()
activation_function = nn.ReLU

Using 'cuda' device.


## Prepare the MNIST Dataset

In [3]:
def get_mnist_data_loaders(path, batch_size, valid_batch_size=0):

    # MNIST specific transforms
    mnist_mean = (0.1307,)
    mnist_std = (0.3081,)
    mnist_xforms = Compose([ToTensor(), Normalize(mnist_mean, mnist_std)])

    # Training data loader
    train_dataset = MNIST(root=path, train=True, download=True, transform=mnist_xforms)

    # Set the batch size to N if batch_size is 0
    tbs = len(train_dataset) if batch_size == 0 else batch_size
    train_loader = DataLoader(train_dataset, batch_size=tbs, shuffle=True)

    # Validation data loader
    valid_dataset = MNIST(root=path, train=False, download=True, transform=mnist_xforms)

    # Set the batch size to N if batch_size is 0
    vbs = len(valid_dataset) if valid_batch_size == 0 else valid_batch_size
    valid_loader = DataLoader(valid_dataset, batch_size=vbs, shuffle=True)

    return train_loader, valid_loader
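
The `batch_size` argument is what selects the flavor of gradient descent: 1 gives stochastic gradient descent, `N` (requested here by passing 0) gives batch gradient descent, and anything in between gives mini-batch SGD. As a rough illustration, assuming MNIST's 60,000 training images and a hypothetical helper named `batches_per_epoch`, the sketch below counts how many parameter updates one epoch performs and how large the final batch is.

~~~python
import math

train_N = 60_000  # number of MNIST training images


def batches_per_epoch(batch_size, n=train_N):
    """Return (number of batches, size of the last batch) for one epoch."""
    # Mirror the notebook's convention: 0 means "use the whole dataset"
    effective = n if batch_size == 0 else batch_size
    num_batches = math.ceil(n / effective)
    last_batch = n - (num_batches - 1) * effective
    return num_batches, last_batch


for name, bs in [("SGD", 1), ("Mini-batch SGD", 256), ("BGD", 0)]:
    num_batches, last_batch = batches_per_epoch(bs)
    print(f"{name:<15} updates per epoch = {num_batches:>6}, last batch = {last_batch}")
~~~

Because the last mini-batch can be smaller than the rest, the training loop weights each batch loss by `num_in_batch / train_N` when it accumulates the epoch mean.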

In [4]:
train_loader, valid_loader = get_mnist_data_loaders(data_path, batch_size)

print("Training dataset shape   :", train_loader.dataset.data.shape)
print("Validation dataset shape :", valid_loader.dataset.data.shape)

# Notice that each example is 28x28. These are images

Training dataset shape   : torch.Size([60000, 28, 28])
Validation dataset shape : torch.Size([10000, 28, 28])
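
The shapes printed above describe the raw `uint8` pixel data; the loaders apply `ToTensor()` and `Normalize` on the fly when batches are drawn. A small sketch of that arithmetic, using the mean and standard deviation from `get_mnist_data_loaders` and a hypothetical helper named `normalize_pixel`:

~~~python
mnist_mean = 0.1307
mnist_std = 0.3081


def normalize_pixel(raw_value):
    """Map a raw 0-255 pixel to the value the network actually sees."""
    scaled = raw_value / 255.0                # ToTensor(): [0, 255] -> [0.0, 1.0]
    return (scaled - mnist_mean) / mnist_std  # Normalize(): subtract mean, divide by std


for raw in (0, 128, 255):
    print(f"raw pixel {raw:>3} -> {normalize_pixel(raw):+.4f}")
~~~

Background pixels (raw value 0) therefore reach the network as roughly -0.42 rather than 0.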





In [None]:
# Let's plot a few images as an example
num_to_show = 8
images = train_loader.dataset.data[:num_to_show]
labels = train_loader.dataset.targets[:num_to_show]

fig, axes = plt.subplots(1, num_to_show)

for axis, image, label in zip(axes, images, labels):
    axis.imshow(image.squeeze(), cmap="Greys")
    axis.tick_params(left=False, bottom=False, labelleft=False, labelbottom=False)
    axis.set_xticks([])
    axis.set_yticks([])
    axis.set_title(f"Label: {label}")

In [None]:
# Let's look at the underlying data for a single image
train_loader.dataset.data[0]

In [None]:
# You can almost make out the "5" in the output above
# Let's make it a bit more clear
image = train_loader.dataset.data[0]
image_df = pd.DataFrame(image.squeeze().numpy())
image_df.style.set_properties(**{'font-size':'6pt'}).background_gradient('Greys')

## Create a Neural Network

In [None]:
class NeuralNetwork(nn.Module):
    def __init__(self, layer_sizes, act_func):
        super(NeuralNetwork, self).__init__()

        # The first "layer" just rearranges the Nx1x28x28 input into Nx784
        first_layer = nn.Flatten()

        # The hidden layers include:
        # 1. a linear component (computing Z) and
        # 2. a non-linear component (computing A)
        hidden_layers = [
            nn.Sequential(nn.Linear(nlminus1, nl), act_func())
            for nl, nlminus1 in zip(layer_sizes[1:-1], layer_sizes)
        ]

        # The output layer must be Linear WITHOUT an activation. See:
        #   https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html
        output_layer = nn.Linear(layer_sizes[-2], layer_sizes[-1])

        # Group all layers into the sequential container
        all_layers = [first_layer] + hidden_layers + [output_layer]
        self.layers = nn.Sequential(*all_layers)

    def forward(self, X):
        # Since we've wrapped all layers in nn.Sequential, we just have to
        # call one method and not manually pass the input forward
        return self.layers(X)
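
The list comprehension in `__init__` pairs each hidden layer's size with the size of the layer feeding it. The following plain-Python sketch, assuming `layer_sizes = [784, 13, 17, 10]` (the value the next cell builds from `neurons_per_layer = [13, 17]`), shows the pairs that `zip` produces and the parameter count they imply. The names `hidden_pairs`, `output_pair`, and `count_params` are illustrative only.

~~~python
layer_sizes = [784, 13, 17, 10]  # input, two hidden layers, output

# zip stops at the shorter sequence, so each hidden size nl is matched
# with the size of the layer before it (nlminus1)
hidden_pairs = [
    (nlminus1, nl) for nl, nlminus1 in zip(layer_sizes[1:-1], layer_sizes)
]
output_pair = (layer_sizes[-2], layer_sizes[-1])


def count_params(n_in, n_out):
    # A Linear layer has an n_out x n_in weight matrix plus n_out biases
    return n_in * n_out + n_out


all_pairs = hidden_pairs + [output_pair]
total_params = sum(count_params(n_in, n_out) for n_in, n_out in all_pairs)

for n_in, n_out in all_pairs:
    print(f"Linear({n_in}, {n_out}): {count_params(n_in, n_out)} parameters")
print("Total:", total_params)
~~~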

In [None]:
# The input layer size depends on the dataset
n0 = train_loader.dataset.data.shape[1:].numel()

# The output layer size depends on the dataset
nL = len(train_loader.dataset.classes)

# Prepend the input and append the output layer sizes
layer_sizes = [n0] + neurons_per_layer + [nL]
model = NeuralNetwork(layer_sizes, activation_function).to(device)

summary(model);
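
The output layer has no activation because `nn.CrossEntropyLoss` applies log-softmax internally. As a quick check of that claim, the sketch below compares the criterion against an explicit log-softmax followed by negative log-likelihood on random logits. The tensors `logits` and `targets` are made up for illustration.

~~~python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

logits = torch.randn(4, 10)           # raw scores for 4 examples, 10 classes
targets = torch.tensor([3, 0, 9, 1])  # integer class indexes, not one-hot

criterion = nn.CrossEntropyLoss()
loss_from_criterion = criterion(logits, targets)

# The same quantity computed in two explicit steps
log_probs = F.log_softmax(logits, dim=1)
loss_by_hand = F.nll_loss(log_probs, targets)

print(f"CrossEntropyLoss  : {loss_from_criterion.item():.6f}")
print(f"log_softmax + nll : {loss_by_hand.item():.6f}")
~~~

Adding a softmax layer to the model would therefore normalize the scores twice.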

## Train Classifier

In [None]:
# Information for plots
fig, ax = plt.subplots()
dh = display(fig, display_id=True)

train_losses = []
valid_losses = []

for epoch in tqdm_notebook(range(num_epochs), desc="Training epochs"):

    #
    # Training
    #
    
    model.train()

    train_N = len(train_loader.dataset)
    num_train_batches = len(train_loader)
    train_dataiterator = iter(train_loader)

    train_loss_mean = 0

    # for batch in progress_bar(range(num_train_batches), parent=mb):
    for batch in tqdm_notebook(range(num_train_batches), desc="Training batches", leave=False):

        # Grab the batch of data and send it to the correct device
        train_X, train_Y = next(train_dataiterator)
        train_X, train_Y = train_X.to(device), train_Y.to(device)

        # Compute the output
        train_output = model(train_X)

        # Compute loss
        train_loss = criterion(train_output, train_Y)

        num_in_batch = len(train_X)
        tloss = train_loss.item() * num_in_batch / train_N
        train_loss_mean += tloss
        train_losses.append(train_loss.item())

        # Compute partial derivatives
        model.zero_grad()
        train_loss.backward()

        # Update parameters
        with torch.no_grad():
            for param in model.parameters():
                param -= learning_rate * param.grad

    #
    # Validation
    #
    
    model.eval()

    valid_N = len(valid_loader.dataset)
    num_valid_batches = len(valid_loader)

    valid_loss_mean = 0
    valid_correct = 0

    with torch.no_grad():

        # valid_loader is probably just one large batch, so not using progress bar
        for valid_X, valid_Y in valid_loader:

            valid_X, valid_Y = valid_X.to(device), valid_Y.to(device)

            valid_output = model(valid_X)

            valid_loss = criterion(valid_output, valid_Y)

            num_in_batch = len(valid_X)
            vloss = valid_loss.item() * num_in_batch / valid_N
            valid_loss_mean += vloss
            valid_losses.append(valid_loss.item())

            # Convert network output into predictions (logits -> class index)
            predictions = valid_output.argmax(1)

            # Sum up total number that were correct
            valid_correct += (predictions == valid_Y).type(torch.float).sum().item()

    valid_accuracy = 100 * (valid_correct / valid_N)

    # 
    # Report information
    # 
    
    tloss = f"Train Loss = {train_loss_mean:.4f}"
    vloss = f"Valid Loss = {valid_loss_mean:.4f}"
    vaccu = f"Valid Accuracy = {(valid_accuracy):>0.1f}%"
    print(f"[{epoch+1:>2}/{num_epochs}] {tloss}; {vloss}; {vaccu}")

    # 
    # Update plot
    # 
    
    max_loss = max(max(train_losses), max(valid_losses))
    min_loss = min(min(train_losses), min(valid_losses))
    
    x_margin = 0.2
    x_bounds = [0 - x_margin, num_epochs + x_margin]

    y_margin = 0.1
    y_bounds = [min_loss - y_margin, max_loss + y_margin]

    train_xaxis = torch.linspace(0, epoch + 1, len(train_losses))
    valid_xaxis = torch.linspace(1, epoch + 1, len(valid_losses))
    graph_data = [[train_xaxis, train_losses], [valid_xaxis, valid_losses]]

    ax.clear()
    
    ax.set_xlabel("Epoch")
    ax.set_ylabel("Loss")

    ax.set_xlim(x_bounds)
    ax.set_ylim(y_bounds)

    ax.plot(train_xaxis, train_losses, label="Train")
    ax.plot(valid_xaxis, valid_losses, label="Valid")
    ax.legend(loc="upper right")

    dh.update(fig)
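
The parameter update inside the training loop is plain gradient descent written by hand. For comparison, here is a hedged sketch showing that `torch.optim.SGD` (with no momentum or weight decay) produces the same step. The tiny `nn.Linear` models and random data are placeholders, not part of the assignment.

~~~python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)

learning_rate = 0.01
criterion = nn.CrossEntropyLoss()

# Two identical copies of a small placeholder model
manual_model = nn.Linear(5, 3)
optim_model = copy.deepcopy(manual_model)
optimizer = torch.optim.SGD(optim_model.parameters(), lr=learning_rate)

X = torch.randn(8, 5)
Y = torch.randint(0, 3, (8,))

# Manual update, as in the notebook's training loop
manual_model.zero_grad()
criterion(manual_model(X), Y).backward()
with torch.no_grad():
    for param in manual_model.parameters():
        param -= learning_rate * param.grad

# Equivalent update using the optimizer
optimizer.zero_grad()
criterion(optim_model(X), Y).backward()
optimizer.step()

params_match = all(
    torch.allclose(p_manual, p_optim)
    for p_manual, p_optim in zip(manual_model.parameters(), optim_model.parameters())
)
print("Manual and optimizer updates match:", params_match)
~~~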