# Your First AI Project: Recognising Handwritten Numbers with PyTorch
_Written by Yiding Song for Team Enigma. Licensed under the MIT License._

<a target="_blank" href="https://colab.research.google.com/github/harrow-css/2023-multithreading/blob/main/ai/week-4/img-recognition.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

[PyTorch](https://pytorch.org/) is an open-source [Python](https://www.python.org/) library for designing and training neural networks. Other libraries like [TensorFlow](https://www.tensorflow.org/) and [Flax](https://flax.readthedocs.io/) exist as well, but we're going to stick with PyTorch because it's really easy to use!

## Setting Up

In [None]:
# Importing PyTorch and helpful modules
import torch
from torch import nn    # nn is PyTorch's neural network module
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

# Other helpful libraries for visualisation
from tqdm import tqdm
import matplotlib.pyplot as plt

In [None]:
# Sometimes you'd want to run your model on a GPU to boost performance.
# The following code finds the right device to execute your code on
# depending on hardware

device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Device: {device}")
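
To see what the chosen device is used for, here is a minimal sketch that repeats the selection logic and moves a small tensor onto it. The tensor `x` is a made-up example and not part of the project code.

```python
import torch

# Same device-selection logic as the cell above
device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)

x = torch.zeros(2, 3)       # Tensors are created on the CPU by default
x_dev = x.to(device)        # .to() gives back a tensor living on the chosen device
print(x.device, "->", x_dev.device)
```

The model and every batch of data need to be on the same device, which is why `.to(device)` shows up again later in this notebook.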

In [None]:
# Hyperparameters: batch size
BATCH_SIZE = 64

**The machine learning workflow**

<img src="https://www.mermaidchart.com/raw/c07f9f84-515d-4807-9d54-0011a6bc6e67?theme=light&version=v0.1&format=svg" style="height: 500px;"></img>

## The Data: MNIST

We are going to train our model on handwritten digits from the [MNIST dataset](https://en.wikipedia.org/wiki/MNIST_database). It's one of the earliest machine learning benchmarks, and a model trained on it was also the first AI system I built!

In [None]:
# Get data using torchvision datasets

train_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

test_data = datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)

In [None]:
# Helper function to visualise dataset; don't worry too much

def visualise_samples(data, n_rows=5, n_cols=5, figsize=(8, 8)):
    fig = plt.figure(figsize=figsize)
    for i in range(1, n_rows*n_cols+1):
        img, lab = data[i-1]    # Subplots count from 1, but samples count from 0
        fig.add_subplot(n_rows, n_cols, i)
        plt.axis("off")
        plt.title(f'Label: {lab}')
        plt.imshow(img.squeeze(), cmap="gray")
    plt.show()

In [None]:
visualise_samples(train_data)

In [None]:
# Group data into batches, so that the model can learn from multiple
# data points at once!

train_dataloader = DataLoader(train_data, batch_size=BATCH_SIZE)
test_dataloader = DataLoader(test_data, batch_size=BATCH_SIZE)
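
To get a feel for what a `DataLoader` hands to the model, here is a small sketch using a hypothetical random dataset (`fake_data`) with the same image shape as MNIST. The numbers (200 samples, batch size 64) are chosen only for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in dataset: 200 random "images" with random digit labels
images = torch.rand(200, 1, 28, 28)
labels = torch.randint(0, 10, (200,))
fake_data = TensorDataset(images, labels)

loader = DataLoader(fake_data, batch_size=64)

x, y = next(iter(loader))
print(x.shape)      # One batch of images: [batch, channels, height, width]
print(y.shape)      # One label per image in the batch
print(len(loader))  # Number of batches; the last one may be smaller
```

Since 200 is not a multiple of 64, the final batch holds only the leftover samples.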

## Building the Model

Here we're going to define our machine learning model.

In [None]:
class MLP(nn.Module):
    def __init__(self, hidden=512, activation_fn=nn.ReLU):
        super().__init__()

        self.flatten = nn.Flatten()
        self.linear1 = nn.Linear(28*28, hidden)
        self.linear2 = nn.Linear(hidden, hidden)
        self.linear3 = nn.Linear(hidden, 10)
        
        self.activation = activation_fn()

    def forward(self, x):
        x = self.flatten(x)
        x = self.activation(self.linear1(x))
        x = self.activation(self.linear2(x))
        logits = self.linear3(x)    # No activation here: CrossEntropyLoss expects raw logits
        return logits

In [None]:
model = MLP().to(device)

In [None]:
model(train_data[0][0].to(device))
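
It can help to check the shapes and size of the network before training. The sketch below builds a stand-in (`mlp`) with the same layer sizes as the `MLP` class using `nn.Sequential`, then feeds it a fake batch; it is an illustration, not a replacement for the class above.

```python
import torch
from torch import nn

# Stand-in with the same layer sizes as the MLP above (hidden=512)
mlp = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

batch = torch.rand(64, 1, 28, 28)   # A fake batch of 64 images
logits = mlp(batch)
n_params = sum(p.numel() for p in mlp.parameters())

print(logits.shape)     # One row per image, one score per digit
print(n_params)         # Total number of trainable weights and biases
```

Each `nn.Linear(a, b)` layer contributes `a * b` weights plus `b` biases, so most of the parameters sit in the first two layers.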

## Model Optimisation

This is where we're going to take care of gradient descent!

In [None]:
# Hyperparameters
LEARNING_RATE = 1e-3
EPOCHS = 10

### Loss Function

In [None]:
loss_fn = nn.CrossEntropyLoss()
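
For a single example, cross-entropy loss is the negative log of the softmax probability assigned to the correct class. Here is a plain-Python sketch of that formula; the function `cross_entropy` and the example scores are made up for illustration, and `nn.CrossEntropyLoss` additionally averages over the batch by default.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy loss for one example: -log(softmax(logits)[target])."""
    # Subtract the max logit first for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

logits = [2.0, 1.0, 0.1]    # Made-up scores for a 3-class problem

loss_right = cross_entropy(logits, target=0)   # Highest score is the target
loss_wrong = cross_entropy(logits, target=2)   # Lowest score is the target
print(loss_right, loss_wrong)
```

The loss is small when the model gives the correct class the highest score, and grows when it favours a wrong class.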

### Optimiser

In [None]:
optim = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE)
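
At its core, the SGD update nudges each weight a small step against its gradient. A toy sketch in plain Python, assuming a made-up one-weight loss `f(w) = (w - 3)^2` whose minimum is at `w = 3`:

```python
def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2."""
    return 2 * (w - 3)

w = 0.0     # Initial weight
lr = 0.1    # Learning rate (a made-up value for this toy problem)

for step in range(100):
    w = w - lr * grad(w)    # The core gradient descent update rule

print(w)
```

In the real training loop the gradient comes from one batch of data at a time rather than an exact formula, which is where the "stochastic" in SGD comes from.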

### Training Loop

In [None]:
def train_step(x, y, model, loss_fn, optim):

    pred = model(x)         # Compute model output
    loss = loss_fn(pred, y) # Compute loss with respect to target output
    
    loss.backward()         # Backpropagate gradients through model
    optim.step()            # Use gradient to update model weights
    optim.zero_grad()       # Reset gradients (PyTorch accumulates them otherwise)

    return loss
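
The reason gradients need resetting is that PyTorch adds each new gradient on top of whatever is already stored. A minimal sketch with a single made-up parameter `w`:

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(2 * w).backward()
first = w.grad.item()           # Gradient of 2w with respect to w

(2 * w).backward()              # Second backward pass without resetting
accumulated = w.grad.item()     # The new gradient was added to the old one

w.grad.zero_()                  # Manual reset of this one parameter's gradient
reset = w.grad.item()

print(first, accumulated, reset)
```

Calling `optim.zero_grad()` clears the gradients of every parameter the optimiser manages, so each training step starts from a clean slate.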

In [None]:
def train_loop(dataloader, model, loss_fn, optim):
    # Training mode for batchnorm, dropout, etc.
    model.train()

    with tqdm(dataloader) as pbar:
        for x, y in pbar:
            x, y = x.to(device), y.to(device)
            loss = train_step(x, y, model, loss_fn, optim)
            pbar.set_postfix({'loss': f'{loss:>7f}'})

In [None]:
def test_loop(dataloader, model, loss_fn):
    '''
    Code taken from https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html
    '''
    
    # Set the model to evaluation mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.eval()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    # Evaluating the model with torch.no_grad() ensures that no gradients are computed during test mode
    # also serves to reduce unnecessary gradient computations and memory usage for tensors with requires_grad=True
    with torch.no_grad():
        for x, y in dataloader:
            x, y = x.to(device), y.to(device)
            pred = model(x)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
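
The accuracy calculation in `test_loop` picks the class with the largest score for each example and compares it with the true label. A tiny sketch with made-up scores for three examples and two classes:

```python
import torch

# Made-up logits for 3 examples and 2 classes
pred = torch.tensor([[0.1, 0.9],
                     [0.8, 0.2],
                     [0.3, 0.7]])
y = torch.tensor([1, 0, 0])     # True labels

predicted_class = pred.argmax(1)            # Index of the largest score per row
correct = (predicted_class == y).sum().item()
accuracy = correct / len(y)

print(predicted_class.tolist(), correct, accuracy)
```

Here the third example is misclassified, so two out of three predictions are correct.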

## Training!!!

Optimise model using gradient descent.

In [None]:
for ep in range(EPOCHS):
    print(f"\nEpoch {ep+1}:")
    train_loop(train_dataloader, model, loss_fn, optim)
    test_loop(test_dataloader, model, loss_fn)

## Next Steps

Now that you have something that works, can you improve model accuracy on the test set?

**The person with the best test set accuracy by Nov 15 2023 (Wed) will receive a Send-Up!** Submit your solution by downloading your code (either `.py` or `.ipynb` is fine) and sending it to me. If you're on Colab, you can download your code by clicking `File` --> `Download` --> `Download .ipynb`.

A few things you might want to play around with:

* Tune hyperparameters: e.g. `BATCH_SIZE`, `LEARNING_RATE`, `EPOCHS`.
* Switch the optimiser algorithm: for more information about optimisers, check out [this](https://dev.to/amananandrai/10-famous-machine-learning-optimizers-1e22) and [this](https://pytorch.org/docs/stable/optim.html).
* Change the model architecture: change the number of layers, number of hidden units, type of layer used, etc. Check out [this](https://www.bmc.com/blogs/machine-learning-architecture/) and [this](https://pytorch.org/docs/stable/nn.html) for alternative model architectures.
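
As one possible starting point for the last two ideas, here is a sketch of a small convolutional network paired with the Adam optimiser. The class name `SmallCNN`, its layer sizes, and the learning rate are all hypothetical choices for illustration; nothing here is tuned, and it is not guaranteed to beat the MLP.

```python
import torch
from torch import nn

class SmallCNN(nn.Module):
    """Hypothetical alternative to the MLP; layer sizes are illustrative."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                    # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                    # 14x14 -> 7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, 10),          # Raw logits, one per digit
        )

    def forward(self, x):
        return self.classifier(self.features(x))

cnn = SmallCNN()
adam = torch.optim.Adam(cnn.parameters(), lr=1e-3)  # Adam instead of SGD

out = cnn(torch.rand(8, 1, 28, 28))     # Fake batch of 8 images
print(out.shape)
```

To try it in this notebook, you would create the model with `.to(device)` and pass the new model and optimiser to `train_loop` and `test_loop` in place of the originals.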