# MNIST Digit Classification
Now that you have some basic familiarity with PyTorch and torch tensors, we can start using PyTorch to build an image classifier with a neural network.

## Getting Started
We first need to import the PyTorch modules we'll use to build our image classification network, along with Matplotlib and NumPy for visualization.

In [4]:
import torch
import torch.nn as nn
import torchvision.datasets as datasets
import torchvision.transforms as transforms

import matplotlib.pyplot as plt
import numpy as np

## Defining Hyperparameters
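Hyperparameters are settings that you, as the engineer, choose before training. They control the size of the model and how training proceeds, and they can strongly affect the model's final performance.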

In [3]:
input_size = 784  # 28 x 28 pixels, flattened into one vector
num_classes = 10 
num_epochs = 2
batch_size = 100 
lr = 1e-3

An explanation of each hyperparameter:


*   ```input_size```: The number of input features per image; each 28 × 28 MNIST image is flattened into a vector of 784 pixel values
*   ```num_classes```: The number of output classes (the digits 0 to 9)
*   ```num_epochs```: The number of full passes over the training data
*   ```batch_size```: The number of images processed at each training step
*   ```lr```: The learning rate, i.e. the step size the optimizer uses when updating the weights
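To make these numbers concrete, here is a small sketch of the arithmetic they imply, assuming the standard MNIST split of 60,000 training and 10,000 test images of 28 × 28 pixels. With `batch_size = 100`, one epoch works out to 600 training steps, which matches the step counts printed by the training loop later on.

```python
import math

# Standard MNIST facts: 28x28 grayscale images, 60,000 train / 10,000 test
image_height, image_width = 28, 28
num_train_images, num_test_images = 60_000, 10_000
batch_size = 100

# input_size: each image is flattened into one vector of pixels
input_size = image_height * image_width

# One epoch = one full pass over the training set, processed in batches
steps_per_epoch = math.ceil(num_train_images / batch_size)
test_batches = math.ceil(num_test_images / batch_size)

print(input_size, steps_per_epoch, test_batches)  # 784 600 100
```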

## Downloading MNIST
We will now download and load the MNIST dataset: a training set to fit the model on and a separate test set to track the model's performance. We use `torchvision.datasets` to download and load MNIST, and `transforms.ToTensor()` to convert each image into a float tensor of shape `(1, 28, 28)` with pixel values scaled to [0, 1].

In [5]:
train_data = datasets.MNIST(root = './data', train = True,
                        transform = transforms.ToTensor(), download = True)

test_data = datasets.MNIST(root = './data', train = False,
                       transform = transforms.ToTensor())


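`transforms.ToTensor()` does a little more than change the type. The sketch below mimics, in plain NumPy, what it does to a grayscale `uint8` image such as an MNIST digit: scale pixel values from [0, 255] to [0.0, 1.0] and add a leading channel dimension. The helper `to_tensor_like` is hypothetical and only illustrates the behavior; it is not torchvision's implementation.

```python
import numpy as np

def to_tensor_like(image):
    """Hypothetical NumPy mimic of ToTensor for a uint8 image (illustration only)."""
    # Scale uint8 pixel values from [0, 255] to floats in [0.0, 1.0]
    arr = np.asarray(image, dtype=np.float32) / 255.0
    # Reorder to channels-first: (H, W) -> (1, H, W), or (H, W, C) -> (C, H, W)
    if arr.ndim == 2:
        arr = arr[None, :, :]
    else:
        arr = arr.transpose(2, 0, 1)
    return arr

# A made-up 28x28 "digit": a crude vertical stroke on a black background
fake_digit = np.zeros((28, 28), dtype=np.uint8)
fake_digit[10:18, 13:15] = 255

x = to_tensor_like(fake_digit)
print(x.shape, x.min(), x.max())  # (1, 28, 28) 0.0 1.0
```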
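Next, we wrap each dataset in a PyTorch `DataLoader`, which groups the samples into batches of `batch_size` and feeds them to the model one batch at a time. For the training set we set `shuffle=True` so that each epoch visits the images in a different order; the test set does not need shuffling.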

In [6]:
train_loader = torch.utils.data.DataLoader(dataset = train_data,
                                             batch_size = batch_size,
                                             shuffle = True)

test_loader = torch.utils.data.DataLoader(dataset = test_data,
                                      batch_size = batch_size, 
                                      shuffle = False)
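To see what the loaders hand to the model, here is a minimal sketch that builds the same kind of `DataLoader` over a small synthetic dataset (random tensors standing in for MNIST, an assumption made so the example runs without downloading anything). When the dataset size is not a multiple of `batch_size`, the last batch is smaller, since `drop_last` defaults to `False`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST (assumption): 250 random 1x28x28 "images"
fake_images = torch.rand(250, 1, 28, 28)
fake_labels = torch.randint(0, 10, (250,))
loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=100, shuffle=True)

# Number of batches per epoch and the size of each batch
batch_sizes = [imgs.shape[0] for imgs, _ in loader]
print(len(loader), batch_sizes)  # 3 [100, 100, 50]

# Each batch is an (images, labels) pair
images, labels = next(iter(loader))
print(images.shape, labels.shape)  # torch.Size([100, 1, 28, 28]) torch.Size([100])
```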

### Visualize Data

In [None]:
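import torchvision.utils as utils

# MNIST labels are the digits 0-9
classes = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

def imshow(img):
    # Convert the (C, H, W) tensor to a NumPy array in (H, W, C) order for matplotlib
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# get one batch of random training images
dataiter = iter(train_loader)
images, labels = next(dataiter)

# show the images in a grid, then print their labels
imshow(utils.make_grid(images))
print(' '.join('%d' % classes[labels[j]] for j in range(batch_size)))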
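The `np.transpose(npimg, (1, 2, 0))` call in `imshow` is needed because `make_grid` returns a `(C, H, W)` tensor while `plt.imshow` expects `(H, W, C)`. The sketch below works out the grid's shape for one batch of 100 images, assuming `make_grid`'s defaults of `nrow=8` and `padding=2` and that single-channel MNIST images are expanded to 3 channels; `grid_shape` is a hypothetical helper, not part of torchvision.

```python
import math
import numpy as np

def grid_shape(num_images, height=28, width=28, nrow=8, padding=2, channels=3):
    """Hypothetical helper: expected (C, H, W) of a make_grid output under default settings."""
    cols = min(nrow, num_images)            # images per grid row
    rows = math.ceil(num_images / cols)     # number of grid rows
    # Each cell is the image plus padding; one extra padding strip closes each edge
    return (channels, rows * (height + padding) + padding, cols * (width + padding) + padding)

shape = grid_shape(100)
print(shape)  # (3, 392, 242)

# imshow() transposes (C, H, W) -> (H, W, C), the layout matplotlib expects
grid = np.zeros(shape, dtype=np.float32)
hwc = np.transpose(grid, (1, 2, 0))
print(hwc.shape)  # (392, 242, 3)
```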

## Define Network Architecture
After loading the data, we define our network architecture with the `torch.nn` module by subclassing `nn.Module`. Our initial model is very simple: a single hidden layer of 500 neurons sits between the 784-dimensional input and the 10 output classes, and we use ReLU as the activation function.

In [7]:
# Create a neural network with one hidden layer, called Net
class Net(nn.Module):
  def __init__(self, input_size, num_classes):
    super(Net,self).__init__()
    self.fc1 = nn.Linear(input_size, 500)
    self.relu = nn.ReLU()
    self.fc2 = nn.Linear(500, num_classes)
  
  def forward(self,x):
    out = self.fc1(x)
    out = self.relu(out)
    out = self.fc2(out)
    return out

For each network you define, you create its layers in the constructor (`__init__`) and specify the order in which they are applied in the `forward` method. You never write the backward pass yourself: during the forward pass, PyTorch's autograd records the operations applied to the input, and calling `loss.backward()` traverses that record in reverse to compute the gradient of the loss with respect to every parameter.
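As a minimal sketch of this division of labor, the hypothetical `TinyNet` below mirrors `Net` with a smaller hidden layer. Only `forward` is written by hand, yet `loss.backward()` fills in the `.grad` of each layer's parameters. The random batch is made up for illustration.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical smaller version of Net: layers are created in __init__ ..."""
    def __init__(self, input_size, hidden, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden, num_classes)

    # ... and the order they are applied in is defined by forward()
    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))

torch.manual_seed(0)
model = TinyNet(input_size=784, hidden=8, num_classes=10)
images = torch.rand(4, 1, 28, 28)          # a made-up batch of 4 images
labels = torch.tensor([0, 1, 2, 3])

logits = model(images.view(-1, 784))       # flatten to (4, 784) before fc1
loss = nn.CrossEntropyLoss()(logits, labels)

print(model.fc1.weight.grad is None)       # True: no gradients computed yet
loss.backward()                            # autograd walks the recorded graph in reverse
print(model.fc1.weight.grad.shape)         # torch.Size([8, 784])
```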



## Instantiate the Network and Choose a Device

In [8]:
net = Net(input_size, num_classes)
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
net.to(device)


Net(
  (fc1): Linear(in_features=784, out_features=500, bias=True)
  (relu): ReLU()
  (fc2): Linear(in_features=500, out_features=10, bias=True)
)
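The printed summary gives the size of each layer, which is enough to count the trainable parameters: `fc1` has 784 × 500 weights plus 500 biases, and `fc2` has 500 × 10 weights plus 10 biases. The sketch below checks this with an `nn.Sequential` that has the same layer sizes as `Net` (a stand-in so the snippet runs on its own).

```python
import torch.nn as nn

# Same layer sizes as Net above: 784 -> 500 -> 10
model = nn.Sequential(nn.Linear(784, 500), nn.ReLU(), nn.Linear(500, 10))

# numel() counts the entries in each weight and bias tensor
num_params = sum(p.numel() for p in model.parameters())
print(num_params)  # 397510  (392000 + 500 + 5000 + 10)
```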


## Define a Loss Function and Instantiate an Optimizer

In [9]:
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=lr)

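We define the loss function and then instantiate an optimizer over our network's parameters, using the learning rate from our hyperparameters. `nn.CrossEntropyLoss` expects the raw, unnormalized outputs (logits) of the network because it applies log-softmax internally, which is why `Net` has no softmax layer at the end. Here we use the Adam optimizer instead of the plain gradient descent algorithm we studied in class. Both update the network's weights using the gradient of the loss, but Adam also keeps running averages of past gradients and squared gradients to adapt the step size for each weight individually. If you'd like to know the details of Adam and optimization in general, I encourage you to read through [this textbook chapter](https://d2l.ai/chapter_optimization/).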
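Both points can be checked numerically. The sketch below, using random made-up inputs, compares `nn.CrossEntropyLoss` against log-softmax computed by hand, and then reproduces one step of `torch.optim.Adam` with its default settings (`betas=(0.9, 0.999)`, `eps=1e-8`) from the update rule. It illustrates the first step only and is not a full reimplementation of Adam.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# --- Cross-entropy: CrossEntropyLoss takes raw logits, not probabilities ---
logits = torch.randn(5, 10)                  # made-up outputs for 5 images, 10 classes
labels = torch.tensor([3, 0, 9, 1, 7])
builtin = nn.CrossEntropyLoss()(logits, labels)
# Same value by hand: mean over the batch of -log(softmax(logits)[label])
manual = -F.log_softmax(logits, dim=1)[torch.arange(5), labels].mean()
print(torch.allclose(builtin, manual))       # True

# --- One Adam step by hand vs torch.optim.Adam (default betas and eps) ---
adam_lr, beta1, beta2, eps = 1e-3, 0.9, 0.999, 1e-8
w = torch.randn(3, requires_grad=True)
w_manual = w.detach().clone()

adam = torch.optim.Adam([w], lr=adam_lr)
(w ** 2).sum().backward()
grad = w.grad.detach().clone()
adam.step()

# Adam keeps running averages of the gradient (m) and squared gradient (v) ...
m = (1 - beta1) * grad
v = (1 - beta2) * grad ** 2
# ... corrects their bias toward zero (step t = 1) ...
m_hat = m / (1 - beta1 ** 1)
v_hat = v / (1 - beta2 ** 1)
# ... and scales each parameter's step individually.
# Plain gradient descent would instead use: w - adam_lr * grad
w_manual -= adam_lr * m_hat / (v_hat.sqrt() + eps)

print(torch.allclose(w.detach(), w_manual))  # True
```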

## Start the Training Loop
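After loading our data, specifying the model architecture, defining a loss function, and instantiating an optimizer, we can train the model. Each step of the training loop below performs the same sequence of operations on one batch: zero the gradients, run the forward pass, compute the loss, backpropagate, and update the weights.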

In [10]:
# Iterate through all epochs
for epoch in range(num_epochs):
  # Iterate through the training dataset one batch at a time
  for i, (images, labels) in enumerate(train_loader):
    # Move images/labels to the selected device, then flatten each image into a vector
    images, labels = images.to(device), labels.to(device)
    images = images.view(-1, input_size)
    # Zero the gradients accumulated during the previous step
    optimizer.zero_grad()
    # Forward propagate
    outputs = net(images)
    # Calculate the loss
    loss = loss_function(outputs, labels)
    # Backpropagate
    loss.backward()
    # Update the weights
    optimizer.step()

    # Print statistics every 100 iterations
    if (i+1) % 100 == 0:
      print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f'
            % (epoch+1, num_epochs, i+1, len(train_loader), loss.item()))

Epoch [1/2], Step [100/600], Loss: 0.2462
Epoch [1/2], Step [200/600], Loss: 0.2000
Epoch [1/2], Step [300/600], Loss: 0.3208
Epoch [1/2], Step [400/600], Loss: 0.2939
Epoch [1/2], Step [500/600], Loss: 0.1455
Epoch [1/2], Step [600/600], Loss: 0.0836
Epoch [2/2], Step [100/600], Loss: 0.0577
Epoch [2/2], Step [200/600], Loss: 0.0342
Epoch [2/2], Step [300/600], Loss: 0.0702
Epoch [2/2], Step [400/600], Loss: 0.0867
Epoch [2/2], Step [500/600], Loss: 0.0984
Epoch [2/2], Step [600/600], Loss: 0.1198
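The `optimizer.zero_grad()` call at the start of each step matters because PyTorch *accumulates* gradients: each `backward()` adds to whatever is already stored in `.grad`. A minimal sketch with a single made-up tensor and a throwaway SGD optimizer:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
sgd = torch.optim.SGD([w], lr=0.1)

def loss_fn():
    return (3 * w).sum()          # d(loss)/dw = [3, 3]

loss_fn().backward()
first = w.grad.clone()
print(first)                      # tensor([3., 3.])

loss_fn().backward()              # without zeroing, the new gradient is ADDED
accumulated = w.grad.clone()
print(accumulated)                # tensor([6., 6.])

# Resets the stored gradient (to None in recent PyTorch, to zeros in older versions)
sgd.zero_grad()
loss_fn().backward()
print(w.grad)                     # tensor([3., 3.])
```

Without the reset, each update would use the sum of the gradients from all previous steps rather than the gradient of the current batch.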


## Evaluate the Performance of your Model
The following code snippet evaluates the model's current weights on the test data we loaded earlier. It reports the percentage of test images whose predicted class (the output with the highest score) matches the true label.

In [11]:
correct = 0
total = 0
# Disable gradient tracking: evaluation only needs forward passes
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        images = images.view(-1, input_size)
        outputs = net(images)
        # The predicted class is the index of the largest output in each row
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the %d test images: %d %%' % (
    total, 100 * correct / total))

Accuracy of the network on the 10000 test images: 97 %
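The accuracy computation reduces to two operations: take the index of the largest output in each row as the prediction, then count matches. Here it is on a few made-up logits, along with a note on how the `%d` format in the print statement rounds the result.

```python
import torch

# Made-up logits for 4 images and 3 classes (illustrative values only)
outputs = torch.tensor([[ 2.0, 0.1, -1.0],
                        [ 0.0, 3.0,  0.5],
                        [ 1.0, 0.2,  0.9],
                        [-0.5, 0.0,  4.0]])
labels = torch.tensor([0, 1, 2, 2])

# torch.max along dim 1 returns (max values, indices); the indices are the predictions
_, predicted = torch.max(outputs, 1)
correct = (predicted == labels).sum().item()
accuracy = 100 * correct / labels.size(0)
print(predicted.tolist(), correct, accuracy)  # [0, 1, 0, 2] 3 75.0

# '%d' truncates, so a reported "97 %" means an accuracy somewhere in [97, 98)
print('%d %%' % 97.9)  # 97 %
```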


## Utility Functions
Below are two utility functions that wrap the training loop and the test-accuracy computation into single functions. They should make it easier to try different hyperparameter configurations for the assignment. The second cell shows how to use both functions with our initial model.

In [12]:
def fit(model, loss_fn, optimizer, train_loader, batch_size, num_epochs, input_size, stat_count=100, device=None):
    # Use the given device, or default to the first GPU if available, else the CPU
    if device is None:
        device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    model.to(device)
    # Note: batch_size is unused; train_loader already determines the batch size
    # Iterate through all epochs
    for epoch in range(num_epochs):
        # Iterate through the training dataset one batch at a time
        for i, (images, labels) in enumerate(train_loader):
            # Move images/labels to the device, then flatten each image into a vector
            images, labels = images.to(device), labels.to(device)
            images = images.view(-1, input_size)
            # Zero the gradients accumulated during the previous step
            optimizer.zero_grad()
            # Forward propagate
            outputs = model(images)
            # Calculate the loss
            loss = loss_fn(outputs, labels)
            # Backpropagate
            loss.backward()
            # Update the weights
            optimizer.step()

            # Print statistics every stat_count iterations
            if (i+1) % stat_count == 0:
                print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f'
                      % (epoch+1, num_epochs, i+1, len(train_loader), loss.item()))

def test_accuracy(model, test_loader, input_size, device=None):
    # Use the given device, or default to the first GPU if available, else the CPU
    if device is None:
        device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    model.to(device)
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            images = images.view(-1, input_size)
            # Evaluate the model that was passed in (not the global net)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print('Accuracy of the network on the %d test images: %d %%' % (
        total, 100 * correct / total))



In [13]:
# Define Hyperparams
input_size = 784 # img_size = (28,28)
num_classes = 10 
num_epochs = 2
batch_size = 100 # the DataLoaders must be recreated if this value changes
lr = 1e-3

# Define Model
class Net(nn.Module):
  def __init__(self, input_size, num_classes):
    super(Net,self).__init__()
    self.fc1 = nn.Linear(input_size, 500)
    self.relu = nn.ReLU()
    self.fc2 = nn.Linear(500, num_classes)
  
  def forward(self,x):
    out = self.fc1(x)
    out = self.relu(out)
    out = self.fc2(out)
    return out

# Instantiate Model (fit() moves it to the GPU if one is available)
net = Net(input_size, num_classes)

# Define Loss Function/Optimizer
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=lr)

# Train, then evaluate on the test set
fit(model=net, loss_fn=loss_function, optimizer=optimizer, train_loader=train_loader,
    batch_size=batch_size, num_epochs=num_epochs, input_size=input_size)
test_accuracy(model=net, test_loader=test_loader, input_size=input_size)

Epoch [1/2], Step [100/600], Loss: 0.3373
Epoch [1/2], Step [200/600], Loss: 0.3298
Epoch [1/2], Step [300/600], Loss: 0.1599
Epoch [1/2], Step [400/600], Loss: 0.1604
Epoch [1/2], Step [500/600], Loss: 0.1186
Epoch [1/2], Step [600/600], Loss: 0.1885
Epoch [2/2], Step [100/600], Loss: 0.1134
Epoch [2/2], Step [200/600], Loss: 0.1032
Epoch [2/2], Step [300/600], Loss: 0.1905
Epoch [2/2], Step [400/600], Loss: 0.1657
Epoch [2/2], Step [500/600], Loss: 0.1897
Epoch [2/2], Step [600/600], Loss: 0.1556
Accuracy of the network on the 10000 test images: 96 %
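Because `fit` and `test_accuracy` print rather than return results, comparing configurations means reading the logs. One option, sketched below under stated assumptions, is a variant that returns the accuracy so a sweep can collect results in a dictionary. `train`, `evaluate`, and `make_toy_loader` are hypothetical helpers, and the tiny synthetic dataset (4 × 4 random images labeled by the brightest of their first four pixels) stands in for MNIST so the example runs quickly. Names are prefixed with `toy_` so they don't overwrite the notebook's own `train_loader`, `input_size`, or `lr`.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train(model, loader, lr, num_epochs, input_size, device):
    """Hypothetical condensed fit(): same steps per batch, no logging."""
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.to(device)
    for _ in range(num_epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(images.view(-1, input_size)), labels)
            loss.backward()
            optimizer.step()

def evaluate(model, loader, input_size, device):
    """Hypothetical test_accuracy() variant that returns the accuracy in percent."""
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            predicted = model(images.view(-1, input_size)).argmax(dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    return 100 * correct / total

def make_toy_loader(n, shuffle):
    # Synthetic stand-in for MNIST (assumption): label = brightest of the first 4 pixels
    images = torch.rand(n, 1, 4, 4)
    labels = images.view(n, -1)[:, :4].argmax(dim=1)
    return DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=shuffle)

torch.manual_seed(0)
toy_device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
toy_input_size, toy_num_classes = 16, 4
toy_train_loader = make_toy_loader(512, shuffle=True)
toy_test_loader = make_toy_loader(256, shuffle=False)

# Sweep one hyperparameter (the learning rate); everything else stays fixed
results = {}
for rate in [1e-3, 1e-2]:
    torch.manual_seed(0)  # same initial weights for every configuration
    model = nn.Sequential(nn.Linear(toy_input_size, 32), nn.ReLU(),
                          nn.Linear(32, toy_num_classes))
    train(model, toy_train_loader, lr=rate, num_epochs=5,
          input_size=toy_input_size, device=toy_device)
    results[rate] = evaluate(model, toy_test_loader, toy_input_size, toy_device)

for rate, acc in results.items():
    print('lr=%g: test accuracy %.1f %%' % (rate, acc))
```

Re-seeding before building each model gives every learning rate the same initial weights, so differences in accuracy come from the hyperparameter rather than the initialization. With a single seed and a small dataset, though, small differences are likely noise.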
