# VDL Exercise 1

**Group Name:** ...


**Participants:**

- Name 1 (Matriculation No. 1)
- Name 2 (Matriculation No. 2)
- ...


## Handwritten Digit Recognition in PyTorch

In this exercise, we will use PyTorch to classify images of handwritten digits. We will use the [MNIST dataset](http://yann.lecun.com/exdb/mnist/), which contains 28x28 images of handwritten digits from 0 to 9.

We will download the dataset and create train, validation and test splits. The train set will be used to actually train our network, whereas the validation set will only be used to monitor the network's performance **during** training. Finally, the test set will be used to evaluate the accuracy of our network **after** the training has finished.

Look for `TODO` tags in the notebook, and complete the missing code. There are 6 such tags in total, and each of them can earn you up to 0.5 bonus marks.

### Necessary Imports

In [None]:
# Import necessary packages
import numpy as np
import torch
import torchvision
import matplotlib.pyplot as plt
from time import time

In [None]:
import os
from google.colab import drive

### Download the Dataset

Torchvision provides many [built-in datasets](https://pytorch.org/vision/stable/datasets.html) that can be downloaded and imported directly from the `torchvision.datasets` package.

The first few lines of the code below download the train and test sets of the MNIST dataset. Complete the tasks described below by filling in the parts marked with `TODO:`.

**Tasks:**
1. Split the train data into two parts: a train set and a validation set with an 80:20 ratio. See [`torch.utils.data.random_split`](https://pytorch.org/docs/stable/data.html#torch.utils.data.random_split). *Hint: Print the length of the original `trainset` to decide on the size of the splits.*
2. Create three [DataLoader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) objects for loading each of the datasets with a batch size of 64. Shuffle the train and validation data, but not the test data.
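
The two tasks above can be tried out on a small synthetic dataset before touching MNIST. The following is only a minimal sketch: `toy_data`, `toy_train`, `toy_val` and the two loader names are hypothetical and not part of the notebook, and the 100 random tensors merely stand in for real images.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Hypothetical stand-in for MNIST: 100 random 1x28x28 "images" with labels 0-9
toy_data = TensorDataset(torch.randn(100, 1, 28, 28), torch.randint(0, 10, (100,)))

# 80:20 split; the seeded generator makes the split reproducible
n_train = int(0.8 * len(toy_data))
n_val = len(toy_data) - n_train
toy_train, toy_val = random_split(
    toy_data, [n_train, n_val], generator=torch.Generator().manual_seed(42)
)

# One loader per split, with a batch size of 64 and shuffling enabled
toy_trainloader = DataLoader(toy_train, batch_size=64, shuffle=True)
toy_valloader = DataLoader(toy_val, batch_size=64, shuffle=True)

print(len(toy_train), len(toy_val))
```

Computing the second split size as a remainder keeps the two lengths summing to the dataset length, which `random_split` requires when integer lengths are passed.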

In [None]:
from torch.utils.data import random_split
from torchvision import datasets, transforms


# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                              transforms.Normalize((0.5,), (0.5,)),
                              ])


# Download the dataset
trainset = datasets.MNIST('data/MNIST/', download=True, train=True, transform=transform)
testset = datasets.MNIST('data/MNIST/', download=True, train=False, transform=transform)


# TODO: Split the 'trainset' into train and validation sets (80:20),
# passing the seeded generator below to make the split reproducible
generator = torch.Generator().manual_seed(42)


# TODO: Create the three data loaders; later cells expect the names
# `trainloader`, `valloader` and `testloader`
batch_size = 64


# Print the lengths of all three datasets
print('Train:', len(trainset), 'Val:', len(valset), 'Test:', len(testset))

### Exploring The Data

**Tasks:**

3. Visualize samples from the dataset.

In [None]:
dataiter = iter(trainloader)
images, labels = next(dataiter)  # the iterator's .next() method is gone in recent PyTorch releases
print(type(images))
print(images.shape)
print(labels.shape)

In [None]:
plt.imshow(images[0].numpy().squeeze(), cmap='gray_r');

In [None]:
# TODO: Use `plt` to visualize all 64 images in the first batch
# of the training set as an 8x8 grid.
figure = plt.figure()


### Defining The Neural Network

**Tasks:**
4. Create the network layers as shown below in the `CustomModel` class.
5. Implement the forward method.

![](https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/image/mlp_mnist.png)
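
As a rough illustration of how such a multi-layer perceptron can be assembled, here is a minimal sketch. The class name `SketchMLP` is hypothetical, and the layer sizes (784, 128, 64, 10) and ReLU activations are assumptions about the diagram rather than the confirmed solution. The final `LogSoftmax` is chosen because the notebook later trains with `nn.NLLLoss`, which expects log-probabilities.

```python
import torch
from torch import nn


class SketchMLP(nn.Module):
    """Hypothetical MLP; the layer sizes are an assumption, not the required answer."""

    def __init__(self, input_size=784, hidden_sizes=(128, 64), output_size=10):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_size, hidden_sizes[0]),
            nn.ReLU(),
            nn.Linear(hidden_sizes[0], hidden_sizes[1]),
            nn.ReLU(),
            nn.Linear(hidden_sizes[1], output_size),
            nn.LogSoftmax(dim=1),  # log-probabilities, as expected by nn.NLLLoss
        )

    def forward(self, x):
        return self.layers(x)


sketch = SketchMLP()
out = sketch(torch.randn(4, 784))  # a batch of 4 flattened 28x28 images
print(out.shape)
```

Exponentiating each output row should give probabilities that sum to one, which is a quick sanity check for the last layer.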

In [None]:
from torch import nn
from torch.nn import Linear, LogSoftmax

class CustomModel(nn.Module):
  def __init__(self):
    super().__init__()
    # TODO: Build the network layers shown in the diagram above
    input_size =  
    hidden_sizes = []
    output_size = 

  def forward(self, x):
    # TODO: Implement the forward pass
    return x


# Build a feed-forward network
model = CustomModel()
print(model)

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
model.to(device)

### Train the Network

**Tasks:**
6. Write the validation step.
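
A validation step generally mirrors a training step without the gradient reset, backward pass, and weight update. The sketch below shows that pattern on a toy model; `sketch_val_step`, `toy_model` and the other `toy_*` names are hypothetical, and its signature deliberately differs from the notebook's `val_step` (no optimizer, explicit device), so treat it as an illustration rather than the expected solution.

```python
import torch
from torch import nn


def sketch_val_step(x, y, model, criterion, device):
    """Hypothetical validation step: forward pass and loss only."""
    x = x.view(x.shape[0], -1)      # flatten each image to a vector
    with torch.no_grad():           # no gradients are tracked
        output = model(x.to(device))
        loss = criterion(output, y.to(device))
    return loss


toy_device = torch.device("cpu")
toy_model = nn.Sequential(nn.Linear(784, 10), nn.LogSoftmax(dim=1)).to(toy_device)
toy_model.eval()                    # evaluation mode, as in the validation loop

toy_x = torch.randn(8, 1, 28, 28)   # random stand-in for a batch of images
toy_y = torch.randint(0, 10, (8,))
toy_loss = sketch_val_step(toy_x, toy_y, toy_model, nn.NLLLoss(), toy_device)
print(toy_loss.item())
```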

In [None]:
from torch import optim

criterion = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.003, momentum=0.9)

In [None]:
def train_step(x, y, model, optimizer, criterion):
  """ One training step.

  Args:
    x: a batch of input tensors from the training set.
    y: a batch of labels corresponding to the inputs.
    model: the network to train
    optimizer: the optimizer to use
    criterion: the loss criterion

  Returns:
    the loss computed for this training step
  """
  x = x.view(x.shape[0], -1) # Reshape to vector of size 784 (28x28=784)
  optimizer.zero_grad()      # Clear existing gradients
  output = model(x.to(device))  # Forward pass on the selected device
  loss = criterion(output, y.to(device))  # Use the batch labels `y`, not the global `labels`
  loss.backward()            # Compute new gradients
  optimizer.step()           # Update weights
  return loss

In [None]:
def val_step(x, y, model, optimizer, criterion):
  """ One validation step.

  Note that there are no gradient computations or weight updates 
  during validation.

  Args:
    x: a batch of input tensors from the validation set.
    y: a batch of labels corresponding to the inputs.
    model: the network to validate
    optimizer: the optimizer (not needed for validation; kept so the signature matches `train_step`)
    criterion: the loss criterion

  Returns:
    the loss computed for this validation step
  """
  # TODO: Implement the validation step
  return loss

In [None]:
time0 = time()
epochs = 15
for e in range(epochs):
    # Training loop
    model.train()  # set to train mode
    running_loss = 0
    for images, labels in trainloader:
        loss = train_step(images, labels, model, optimizer, criterion)
        running_loss += loss.item()

    print("ep {}:\ttrain={:.4f}".format(e, running_loss/len(trainloader)), end='')

    # Validation loop
    model.eval()  # set to evaluation mode
    running_loss = 0
    for images, labels in valloader:
        with torch.no_grad():
            loss = val_step(images, labels, model, optimizer, criterion)
            running_loss += loss.item()

    print("\tval={:.4f}".format(running_loss/len(valloader)))
print("\nTraining Time (in minutes) =",(time()-time0)/60)


In [None]:
def view_classify(img, ps):
    ''' Function for viewing an image and its predicted classes.
    '''
    ps = ps.cpu().data.numpy().squeeze()

    fig, (ax1, ax2) = plt.subplots(figsize=(6,9), ncols=2)
    ax1.imshow(img.resize_(1, 28, 28).numpy().squeeze())
    ax1.axis('off')
    ax2.barh(np.arange(10), ps)
    ax2.set_aspect(0.1)
    ax2.set_yticks(np.arange(10))
    ax2.set_yticklabels(np.arange(10))
    ax2.set_title('Class Probability')
    ax2.set_xlim(0, 1.1)
    plt.tight_layout()

In [None]:
images, labels = next(iter(testloader))

img = images[0].view(1, 784)
# Turn off gradients to speed up this part
with torch.no_grad():
    logps = model(img.to(device))

# The network outputs log-probabilities; exponentiate to obtain probabilities
ps = torch.exp(logps)
probab = list(ps.cpu().numpy()[0])
print("Predicted Digit =", probab.index(max(probab)))
view_classify(img.view(1, 28, 28), ps)

### Model Evaluation

**Bonus Task:** Improve the test accuracy to above 95% by modifying the network architecture or the training parameters (e.g. optimizer, loss function, learning rate, etc.).
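
When comparing several architectures or hyperparameter settings, it can help to measure accuracy one batch at a time instead of one image at a time. The helper below is a minimal sketch of that idea; `sketch_accuracy`, `toy_loader` and `toy_model` are hypothetical names, and the random data is only there to make the example self-contained.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def sketch_accuracy(model, loader, device):
    """Hypothetical helper: fraction of correct predictions over a loader."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            logps = model(images.view(images.shape[0], -1).to(device))
            preds = logps.argmax(dim=1).cpu()  # most likely class per image
            correct += (preds == labels).sum().item()
            total += labels.shape[0]
    return correct / total


# Random stand-in data: 50 fake images with labels 0-9
toy_loader = DataLoader(
    TensorDataset(torch.randn(50, 1, 28, 28), torch.randint(0, 10, (50,))),
    batch_size=16,
)
toy_model = nn.Sequential(nn.Linear(784, 10), nn.LogSoftmax(dim=1))
acc = sketch_accuracy(toy_model, toy_loader, torch.device("cpu"))
print(f"accuracy = {acc:.2%}")
```

Taking `argmax` directly on the log-probabilities gives the same prediction as exponentiating first, because the exponential is monotonic.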

In [None]:
correct_count, all_count = 0, 0
for images,labels in testloader:
  for i in range(len(labels)):
    img = images[i].view(1, 784)
    # Turn off gradients to speed up this part
    with torch.no_grad():
        logps = model(img.to(device))

    # The network outputs log-probabilities; exponentiate to obtain probabilities
    ps = torch.exp(logps)
    probab = list(ps.cpu().numpy()[0])
    pred_label = probab.index(max(probab))
    true_label = labels.numpy()[i]
    if(true_label == pred_label):
      correct_count += 1
    all_count += 1

print("Number Of Images Tested =", all_count)
print(f"\nModel Accuracy = {(correct_count/all_count*100)}%")