<a href="https://www.kaggle.com/code/siddp6/mlp-mnist-1?scriptVersionId=138654024" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Introduction

In this project, you will build a neural network of your own design to evaluate the MNIST dataset.

Benchmark results on MNIST can be found [on Yann LeCun's page](http://yann.lecun.com/exdb/mnist/) and include:

- 88% [LeCun et al., 1998](http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf)
- 95.3% [LeCun et al., 1998](http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf)
- 99.65% [Ciresan et al., 2011](http://people.idsia.ch/~juergen/ijcai2011.pdf)

MNIST is a great dataset for sanity checking your models, since the accuracy levels achieved by large convolutional neural networks and small linear models are both quite high. This makes it important to be familiar with the data.

## Imports

In [None]:
## This cell contains the essential imports you will need – DO NOT CHANGE THE CONTENTS! ##
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
from torch.utils.data import DataLoader
from time import time
from torchvision import datasets

## Load the Dataset

If you intend to use transforms, specify them as a list.
The transforms module is already loaded as `transforms`.

MNIST is fortunately included in the torchvision module.
Then, you can create your dataset using the `MNIST` object from `torchvision.datasets` ([the documentation is available here](https://pytorch.org/vision/stable/datasets.html#mnist)).
Make sure to specify `download=True`! 

Once your dataset is created, you'll also need to define a `DataLoader` from the `torch.utils.data` module for both the train and the test set.
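
To make the `DataLoader` step concrete without downloading anything, here is a minimal sketch on synthetic data. The `toy_set` and `toy_loader` names and the 1,000-sample size are assumptions for illustration only; the real MNIST sets are created in the cell that follows.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST: 1000 images of shape (1, 28, 28) with labels 0-9
images = torch.randn(1000, 1, 28, 28)
labels = torch.randint(0, 10, (1000,))
toy_set = TensorDataset(images, labels)

toy_loader = DataLoader(toy_set, batch_size=64, shuffle=True)

# With drop_last=False (the default), the number of batches is
# ceil(num_samples / batch_size); the last batch may be smaller.
print(len(toy_loader), math.ceil(len(toy_set) / 64))

# Each iteration yields one (images, labels) batch
batch_images, batch_labels = next(iter(toy_loader))
print(batch_images.shape, batch_labels.shape)
```

By the same formula, the 60,000 MNIST training images with `batch_size=64` give 938 batches per epoch.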

In [None]:
# ToTensor scales pixels to [0, 1]; Normalize then maps them to [-1, 1]
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])


root='data'
train_set = datasets.MNIST(root=root, download=True, train=True, transform=transform)
test_set = datasets.MNIST(root=root, download=True, train=False, transform=transform)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=64, shuffle=False)

## Justify your preprocessing

In your own words, why did you choose the transforms you chose? If you didn't use any preprocessing steps, why not?

I chose the following transforms for preprocessing the MNIST dataset:

**transforms.ToTensor()**: This transform converts each MNIST image from a PIL image into a PyTorch tensor of shape (1, 28, 28) and rescales the pixel values from [0, 255] to [0, 1]. PyTorch needs tensors for efficient computation and automatic differentiation during training.

**transforms.Normalize((0.5,), (0.5,))**: This transform subtracts a mean of 0.5 and divides by a standard deviation of 0.5, which maps the [0, 1] values produced by `ToTensor()` to the range [-1, 1]. Zero-centered inputs on a small, consistent scale generally help gradient-based training converge faster and more stably.
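
As a rough check of this arithmetic, the sketch below reproduces both steps by hand on a synthetic 8-bit image. It does not call torchvision: the division by 255 stands in for the value scaling done by `ToTensor()`, and `(x - 0.5) / 0.5` stands in for `Normalize((0.5,), (0.5,))`. The `fake_image` tensor is made up for illustration.

```python
import torch

# Hypothetical 28x28 grayscale image with 8-bit pixel values (not real MNIST data)
fake_image = torch.randint(0, 256, (28, 28), dtype=torch.uint8)
fake_image[0, 0] = 0      # force both extremes so the range is easy to check
fake_image[0, 1] = 255

# Step 1: the value scaling ToTensor() performs: uint8 [0, 255] -> float [0, 1]
scaled = fake_image.float() / 255.0

# Step 2: what Normalize((0.5,), (0.5,)) computes: (x - mean) / std
normalized = (scaled - 0.5) / 0.5

print(scaled.min().item(), scaled.max().item())
print(normalized.min().item(), normalized.max().item())
```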

## Explore the Dataset
Using matplotlib, numpy, and torch, explore the dimensions of your data.

You can view images using the `show5` function defined below – it takes a data loader as an argument.
Remember that normalized images will look really weird to you! You may want to try changing your transforms to view images.
Typically using no transforms other than `ToTensor()` works well for viewing – but not as well for training your network.
If `show5` doesn't work, go back and check your code for creating your data loaders and your training/test sets.
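
An alternative to changing the transforms is to undo the normalization just before plotting. The helper below is a hypothetical sketch (the name `denormalize` is not part of torchvision) and assumes the images were normalized with a mean and standard deviation of 0.5.

```python
import torch

def denormalize(img, mean=0.5, std=0.5):
    """Invert Normalize((mean,), (std,)): map values from [-1, 1] back to [0, 1]."""
    return img * std + mean

# Hypothetical normalized batch shaped like MNIST: (batch, channel, height, width)
batch = torch.rand(5, 1, 28, 28) * 2 - 1   # values in [-1, 1)
restored = denormalize(batch)

# The restored values are back in the displayable [0, 1] range
print(restored.min().item() >= 0.0, restored.max().item() <= 1.0)
```

A restored image could then be passed to `plt.imshow` in place of the normalized one.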

In [None]:
# Explore dataset properties
def dataset_properties(data, loader, name):
    print(f"{name} set:")
    print("Number of samples:", len(data))
    # len(loader) is the number of batches, len(data) the number of samples
    print(f"Number of batches ({name}_loader): {len(loader)}")
    print("Data shape (single sample):", data[0][0].shape)
    print("Label of the first sample:", data[0][1])
    print()

In [None]:
dataset_properties(train_set, train_loader, "train")
dataset_properties(test_set, test_loader, "test")

In [None]:
## This cell contains a function for showing 5 images from a dataloader – DO NOT CHANGE THE CONTENTS! ##
def show5(img_loader):
    dataiter = iter(img_loader)
    
    batch = next(dataiter)
    labels = batch[1][0:5]
    images = batch[0][0:5]
    for i in range(5):
        print(int(labels[i].detach()))
    
        image = images[i].numpy()
        plt.imshow(image.T.squeeze().T)
        plt.show()

In [None]:
# Explore data
show5(train_loader)

## Build your Neural Network
Using the layers in `torch.nn` (which has been imported as `nn`) and the `torch.nn.functional` module (imported as `F`), construct a neural network based on the parameters of the dataset.
Use any architecture you like. 

*Note*: If you did not flatten your tensors in your transforms or as part of your preprocessing and you are using only `Linear` layers, make sure to use the `Flatten` layer in your network!
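
To illustrate the note above, this small sketch shows what `nn.Flatten` does to a batch shaped like MNIST. The random tensor `x` is a stand-in for real data.

```python
import torch
import torch.nn as nn

# Hypothetical batch shaped like MNIST after ToTensor(): (batch, channels, height, width)
x = torch.randn(64, 1, 28, 28)

flatten = nn.Flatten()          # keeps dim 0 (the batch) and flattens the rest
flat = flatten(x)
print(flat.shape)

# A Linear layer expecting 28 * 28 = 784 features accepts the flattened batch
fc = nn.Linear(28 * 28, 128)
print(fc(flat).shape)
```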

In [None]:
class MNIST_NN(nn.Module):
    def __init__(self):
        super(MNIST_NN, self).__init__()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)
    
    def forward(self, x):
        x = self.flatten(x)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
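
As a quick sanity check on this architecture, the sketch below rebuilds the same three `Linear` layers with `nn.Sequential` (a stand-in, not the `MNIST_NN` class itself), counts the trainable parameters, and confirms the output shape on a dummy batch.

```python
import torch
import torch.nn as nn

# Stand-in with the same layer sizes as MNIST_NN: 784 -> 128 -> 64 -> 10
net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 10),
)

# Each Linear(in, out) has in * out weights plus out biases
expected = (784 * 128 + 128) + (128 * 64 + 64) + (64 * 10 + 10)
n_params = sum(p.numel() for p in net.parameters() if p.requires_grad)
print(n_params, expected)

# One row of 10 class scores (logits) per input image
logits = net(torch.randn(2, 1, 28, 28))
print(logits.shape)
```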

Specify a loss function and an optimizer, and instantiate the model.

If you use a less common loss function, please note why you chose that loss function in a comment.

In [None]:
model = MNIST_NN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

MNIST is a multi-class classification problem, so we use `CrossEntropyLoss` as the loss function. `CrossEntropyLoss` combines a log-softmax with the negative log-likelihood loss, which is why the model's `forward` method returns raw logits rather than applying a softmax itself.
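
The relationship between the two losses can be checked numerically. In this sketch the logits and targets are made-up values; only `F.cross_entropy`, `F.log_softmax`, and `F.nll_loss` are real PyTorch functions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # hypothetical raw model outputs for 4 samples
targets = torch.tensor([3, 7, 0, 9])   # hypothetical class labels

# Cross-entropy applied directly to the logits
ce = F.cross_entropy(logits, targets)

# The same quantity computed in two explicit steps
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(ce.item(), manual.item())
```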

For the optimizer, we'll use the stochastic gradient descent (SGD) optimizer, which is a simple and widely used optimization algorithm for training neural networks. It updates the model parameters based on the gradients of the loss with respect to the parameters, scaled by a learning rate.
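
The update rule described here can be verified on a single parameter. The sketch below uses a toy loss chosen only for illustration and compares one `optim.SGD` step against the hand-computed update `w - lr * grad`.

```python
import torch
import torch.optim as optim

# One small parameter tensor and a toy loss, purely for illustration
w = torch.tensor([2.0], requires_grad=True)
opt = optim.SGD([w], lr=0.01)

loss = (w ** 2).sum()      # d(loss)/dw = 2w = 4.0
loss.backward()

# Plain SGD update computed by hand: w_new = w - lr * grad
expected = w.detach().clone() - 0.01 * w.grad

opt.step()                 # the optimizer applies the same update in place
print(w.detach(), expected)
```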

## Running your Neural Network
Use whatever method you like to train your neural network, and ensure you record the average loss at each epoch. 
Don't forget to use `torch.device()` and the `.to()` method for both your model and your data if you are using GPU!

If you want to print your loss **during** each epoch, you can use the `enumerate` function and print the loss after a set number of batches. 250 batches works well for most people!

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

In [None]:
def train_model(model, train_loader, criterion, optimizer, num_epochs=10, print_every=250):
    model.train()
    model.to(device)
    
    loss_values = []
    
    for epoch in range(num_epochs):
        running_loss = 0.0  # loss since the last progress print
        epoch_loss = 0.0    # loss over the whole epoch
        
        for batch_idx, (inputs, targets) in enumerate(train_loader):
            inputs, targets = inputs.to(device), targets.to(device)
            
            # Zero the parameter gradients
            optimizer.zero_grad()
            
            # Forward pass
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            
            # Backward pass and optimization
            loss.backward()
            optimizer.step()
            
            # Record the loss
            running_loss += loss.item()
            epoch_loss += loss.item()
            
            # Print the average loss over the last print_every batches
            if batch_idx % print_every == print_every - 1:
                avg_loss = running_loss / print_every
                print(f"Epoch [{epoch+1}/{num_epochs}], Batch [{batch_idx+1}/{len(train_loader)}], Loss: {avg_loss:.4f}")
                running_loss = 0.0  # reset only the progress counter
        
        # Average over all batches in the epoch (unaffected by the resets above)
        avg_epoch_loss = epoch_loss / len(train_loader)
        loss_values.append(avg_epoch_loss)
    
    print("Training finished.")
    return loss_values

In [None]:
loss_values = train_model(model, train_loader, criterion, optimizer, num_epochs=5)

Plot the training loss (and validation loss/accuracy, if recorded).

In [None]:
def plot_loss(loss_values):
    plt.figure(figsize=(8, 5))
    plt.plot(range(1, len(loss_values)+1), loss_values, label='Training Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title('Training Loss')
    plt.legend()
    plt.grid(True)
    plt.show()

In [None]:
plot_loss(loss_values)

## Testing your model
Using the previously created `DataLoader` for the test set, compute the percentage of correct predictions using the highest probability prediction. 

If your accuracy is over 90%, great work, but see if you can push a bit further! 
If your accuracy is under 90%, you'll need to make improvements.
Go back and check your model architecture, loss function, and optimizer to make sure they're appropriate for an image classification task.
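
As a small worked example of "highest probability prediction", the sketch below uses made-up logits for four samples and three classes. Because softmax preserves the ordering of the scores, taking the argmax of the raw logits gives the same predictions as taking the argmax of the probabilities.

```python
import torch

# Hypothetical logits for 4 samples and 3 classes (made-up numbers)
logits = torch.tensor([[2.0, 0.1, -1.0],
                       [0.3, 0.2, 1.5],
                       [0.0, 3.0, 0.5],
                       [1.0, 0.9, 0.8]])
labels = torch.tensor([0, 2, 1, 2])

# Softmax is monotonic, so both routes pick the same class per row
pred_from_logits = logits.argmax(dim=1)
pred_from_probs = torch.softmax(logits, dim=1).argmax(dim=1)

# Percentage of predictions that match the labels
accuracy = 100 * (pred_from_logits == labels).sum().item() / labels.size(0)
print(pred_from_logits.tolist(), accuracy)  # [0, 2, 1, 0] 75.0
```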

In [None]:
def evaluate_model(model, test_loader):
    # Evaluate the model on the test set
    model.eval()  # Set the model to evaluation mode
    correct = 0
    total = 0

    with torch.no_grad():  # Disable gradient tracking for evaluation
        for data in test_loader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            predicted = outputs.argmax(dim=1)  # index of the largest logit is the predicted class
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    test_accuracy = 100 * correct / total
    
    return test_accuracy

In [None]:
print(f"Test Accuracy: {evaluate_model(model, test_loader):.2f}%")

## Improving your model

Once your model is done training, try tweaking your hyperparameters and training again below to improve your accuracy on the test set!

In [None]:
# Continue training the same model, now with Adam, for 10 more epochs
num_epochs = 10
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_values = train_model(model, train_loader, criterion, optimizer, num_epochs=num_epochs)
plot_loss(loss_values)
print(f"Test Accuracy: {evaluate_model(model, test_loader):.2f}%")

**Optimizer**: Adam is an adaptive learning-rate optimization algorithm that combines the advantages of AdaGrad and RMSprop. It adapts the learning rate of each parameter during training based on running averages of past gradients and their squares. Adam often converges faster than plain SGD and handles sparse gradients well, making it a popular choice for many deep learning tasks.
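
To make the per-parameter adaptation concrete, the sketch below performs the first Adam step by hand on a toy two-element parameter and compares it with `optim.Adam`. The toy loss and parameter values are assumptions for illustration; the hyperparameters are PyTorch's defaults for Adam.

```python
import torch
import torch.optim as optim

lr, beta1, beta2, eps = 0.001, 0.9, 0.999, 1e-8   # PyTorch's default Adam settings

w = torch.tensor([2.0, -3.0], requires_grad=True)
opt = optim.Adam([w], lr=lr, betas=(beta1, beta2), eps=eps)

loss = (w ** 2).sum()      # gradient is 2w
loss.backward()
g = w.grad.clone()

# First Adam step by hand (t = 1), starting from m = v = 0
m = (1 - beta1) * g                  # running mean of gradients
v = (1 - beta2) * g ** 2             # running mean of squared gradients
m_hat = m / (1 - beta1 ** 1)         # bias correction for step 1
v_hat = v / (1 - beta2 ** 1)
expected = w.detach().clone() - lr * m_hat / (v_hat.sqrt() + eps)

opt.step()
print(w.detach(), expected)
```

On this first step each element moves by roughly `lr`, even though the two gradients differ in magnitude, which is the per-parameter scaling at work.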

**Epochs**: We increase the number of epochs from 5 to 10, since the loss plot shows no sign of overfitting.
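
The notebook records only the training loss, so checking for overfitting more directly would need a held-out validation set. The following is a minimal sketch on synthetic data; the `validation_loss` helper, the 80/20 split, and the tiny `toy_model` are all assumptions for illustration, not part of the notebook above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split

# Synthetic stand-in for the training set (not real MNIST data)
full_set = TensorDataset(torch.randn(1000, 1, 28, 28), torch.randint(0, 10, (1000,)))

# Hold out 20% for validation; the split ratio and seed are arbitrary choices
generator = torch.Generator().manual_seed(0)
train_part, val_part = random_split(full_set, [800, 200], generator=generator)
val_loader = DataLoader(val_part, batch_size=64, shuffle=False)

def validation_loss(model, loader, criterion):
    """Average per-batch loss over a loader, without updating the model."""
    model.eval()
    total = 0.0
    with torch.no_grad():
        for inputs, targets in loader:
            total += criterion(model(inputs), targets).item()
    return total / len(loader)

toy_model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))  # hypothetical tiny model
val_loss = validation_loss(toy_model, val_loader, nn.CrossEntropyLoss())
print(len(train_part), len(val_part), val_loss)
```

If a helper like this were called at the end of each epoch inside `train_model`, the model would need to be switched back with `model.train()` afterwards, since `train_model` sets training mode only once at the start. A validation loss that rises while the training loss keeps falling is the usual sign of overfitting.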

## Saving your model
Using `torch.save`, save your model for future loading.

In [None]:
# Save the improved model to a file
torch.save(model.state_dict(), 'model_state.pt')
print("Improved model saved successfully.")

In [None]:
scripted = torch.jit.script(model)
torch.jit.save(scripted, 'model_script.pt')