# MNIST Digit Classification

Welcome to the `05_mnist_classification` notebook. Here we explore the techniques and principles needed to classify handwritten digits from MNIST, a benchmark dataset widely used in machine learning.

This notebook walks through loading and preprocessing the MNIST dataset, constructing a neural network with a softmax output layer, and training the model to classify the digits. It also covers the evaluation metrics used to measure the model's performance, such as accuracy and confusion matrices, so that the classification results are easy to interpret.

Additionally, this notebook offers an in-depth analysis of various hyperparameters and their effects on model training and accuracy. Here we experiment with different learning rates, batch sizes, and network architectures to demonstrate how these factors influence the convergence and generalization of the model.

## Understanding the MNIST dataset

The MNIST (Modified National Institute of Standards and Technology) dataset is a large collection of handwritten digits, commonly used for training and testing in the field of machine learning. It serves as a benchmark dataset for evaluating algorithms and models, particularly in the area of image classification. The dataset consists of 70,000 grayscale images of digits, split into 60,000 training images and 10,000 testing images, each of which is 28x28 pixels in size. The pixels are represented as integers in the range of 0 to 255, where 0 corresponds to a white pixel (background) and 255 corresponds to a black pixel (foreground).

Some of its key features include:

1. **Diversity and simplicity:** The images in the MNIST dataset cover a wide variety of handwriting styles, providing a comprehensive set of examples for each digit (0-9). Despite its simplicity, the dataset contains enough variability in the handwriting to pose a challenging problem for classification models. This variability makes it an excellent testbed for machine learning algorithms, allowing researchers to assess how well their models generalize across different handwriting styles.

2. **Standardized format:** Each image in the dataset is size-normalized and centered in a fixed 28x28 pixel grid. This standardization lets models trained on the dataset focus on learning the underlying patterns rather than adjusting for size and position variations. The images are also grayscale, which reduces the computational cost compared to color images while retaining enough information for accurate classification.

3. **Labels and class distribution:** The dataset is accompanied by labels for each image, indicating the correct digit (0-9) represented. This labeled aspect makes the MNIST dataset a supervised learning dataset, where models can be trained using the input images and their corresponding labels. The distribution of digits is approximately uniform, ensuring that each digit is well-represented in both the training and testing sets. This uniform distribution helps in training balanced models without bias toward any particular class.

4. **Preprocessing and augmentation:** While the MNIST dataset comes preprocessed, researchers often apply additional steps: normalization, which rescales pixel values (for example, to the range 0 to 1), and data augmentation, which artificially increases the size and variability of the training set. Common augmentation techniques include random rotations, shifts, and scaling, which help models become more robust to variations in the input data.

5. **Accessibility and historical context:** The MNIST dataset is widely accessible and has been used extensively since its introduction in 1998 by Yann LeCun and colleagues. It has become a standard benchmark in the field, allowing new algorithms and models to be compared against established results. Historically, MNIST played a role in the development and evaluation of early neural networks, and it remains a relevant dataset for testing modern deep learning architectures.
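
The rescaling mentioned in item 4 comes down to a few lines of arithmetic. The sketch below uses a hand-made tensor of raw pixel values (an assumption for illustration, not real MNIST data) and shows how dividing by 255 maps values to [0, 1], and how subtracting 0.5 and dividing by 0.5 then maps them to [-1, 1].

```python
import torch

# Hypothetical raw pixel values in the MNIST range 0..255 (not real data)
raw = torch.tensor([0, 64, 128, 255], dtype=torch.uint8)

# Step 1: scale to [0, 1]
scaled = raw.float() / 255.0

# Step 2: shift and scale to [-1, 1] using mean 0.5 and standard deviation 0.5
normalized = (scaled - 0.5) / 0.5

print(scaled)
print(normalized)
```

The two steps correspond to what `transforms.ToTensor()` and `transforms.Normalize((0.5,), (0.5,))` do for 8-bit grayscale images.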

## Setting up the environment


##### **Q1: How do you install the necessary libraries for working with the MNIST dataset in PyTorch?**

In [None]:
# !pip install torch torchvision torchaudio
# !pip install numpy matplotlib  # optional: array handling and plotting

##### **Q2: How do you import the required modules for MNIST digit classification?**

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

## Loading and preprocessing the data


##### **Q3: How do you download the MNIST dataset using PyTorch?**

In [6]:
# Define transformations: convert to tensor and normalize
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert image to tensor
    transforms.Normalize((0.5,), (0.5,))  # Normalize to [-1, 1]
])

# Download and load the training data
train_dataset = datasets.MNIST(root='../00-src', train=True, download=True, transform=transform)
print("Training dataset successfully downloaded")

# Download and load the test data
test_dataset = datasets.MNIST(root='../00-src', train=False, download=True, transform=transform)
print("Testing dataset successfully downloaded")

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to ../00-src\MNIST\raw\train-images-idx3-ubyte.gz


100%|██████████| 9912422/9912422 [00:01<00:00, 6674226.31it/s] 


Extracting ../00-src\MNIST\raw\train-images-idx3-ubyte.gz to ../00-src\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to ../00-src\MNIST\raw\train-labels-idx1-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 246585.15it/s]


Extracting ../00-src\MNIST\raw\train-labels-idx1-ubyte.gz to ../00-src\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to ../00-src\MNIST\raw\t10k-images-idx3-ubyte.gz


100%|██████████| 1648877/1648877 [00:00<00:00, 2728848.00it/s]


Extracting ../00-src\MNIST\raw\t10k-images-idx3-ubyte.gz to ../00-src\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to ../00-src\MNIST\raw\t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<?, ?it/s]

Extracting ../00-src\MNIST\raw\t10k-labels-idx1-ubyte.gz to ../00-src\MNIST\raw

Training dataset successfully downloaded
Testing dataset successfully downloaded





##### **Q4: How do you normalize the MNIST data for neural network training?**

In [7]:
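# Already done through the transform defined in Q3: ToTensor() scales pixels to [0, 1],
# then Normalize((0.5,), (0.5,)) maps them to [-1, 1], i.e.,
# transform = transforms.Compose([
#     transforms.ToTensor(),  # Convert image to tensor
#     transforms.Normalize((0.5,), (0.5,))  # Normalize to [-1, 1]
# ])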

##### **Q5: How do you split the MNIST data into training and testing sets?**

In [8]:
# Already done in the download step: the train=True / train=False flags select
# the predefined training and test splits, i.e.,
# train_dataset = datasets.MNIST(root='../00-src', train=True, download=True, transform=transform)
# print("Training dataset successfully downloaded")

# test_dataset = datasets.MNIST(root='../00-src', train=False, download=True, transform=transform)
# print("Testing dataset successfully downloaded")

##### **Q6: How do you create data loaders for the MNIST dataset in PyTorch?**

In [9]:
# For the training set
train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)

# For the test set
test_loader = DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)
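
To see what a loader yields, it can help to pull a single batch and inspect its shape. The sketch below is a minimal illustration that uses a synthetic stand-in dataset (random tensors in a `TensorDataset`, an assumption made so the snippet runs without downloading MNIST); `fake_dataset` and `loader` are hypothetical names.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST: 256 random "images" of shape 1x28x28 with labels 0-9
images = torch.randn(256, 1, 28, 28)
labels = torch.randint(0, 10, (256,))
fake_dataset = TensorDataset(images, labels)

loader = DataLoader(fake_dataset, batch_size=64, shuffle=True)

# Pull one batch and inspect its shape
batch_images, batch_labels = next(iter(loader))
print(batch_images.shape)  # torch.Size([64, 1, 28, 28])
print(batch_labels.shape)  # torch.Size([64])
print(len(loader))         # 4 batches, since 256 / 64 = 4
```

With the real `train_loader`, the same `next(iter(...))` call should give batches of the same per-sample shape.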

## Building the neural network model


##### **Q7: How do you define the architecture of a neural network for MNIST digit classification using `nn.Module` in PyTorch?**

In [None]:
# Define the neural network class
class MNISTClassifier(nn.Module):
    def __init__(self):
        super(MNISTClassifier, self).__init__()
        # Define the layers
        self.fc1 = nn.Linear(28 * 28, 128)  # Input layer (784) to hidden layer (128)
        self.fc2 = nn.Linear(128, 64)       # Hidden layer (128) to another hidden layer (64)
        self.fc3 = nn.Linear(64, 10)        # Hidden layer (64) to output layer (10 for 10 classes)

    def forward(self, x):
        # Flatten the image to a vector of size 28*28
        x = x.view(-1, 28 * 28)
        # Apply first fully connected layer with ReLU activation
        x = F.relu(self.fc1(x))
        # Apply second fully connected layer with ReLU activation
        x = F.relu(self.fc2(x))
        # Output layer with logits (raw scores)
        x = self.fc3(x)
        return x

# Instantiate the network
model = MNISTClassifier()
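
A quick sanity check on an architecture like this is to pass a dummy batch through it and count the parameters. The sketch below rebuilds the same layer sizes with `nn.Sequential` purely to stay self-contained (an assumption, not the notebook's class); `sketch_model` and `dummy_batch` are hypothetical names.

```python
import torch
import torch.nn as nn

# Same layer sizes as MNISTClassifier, written with nn.Sequential for brevity
sketch_model = nn.Sequential(
    nn.Flatten(),             # (N, 1, 28, 28) -> (N, 784)
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

dummy_batch = torch.randn(8, 1, 28, 28)   # hypothetical batch of 8 images
logits = sketch_model(dummy_batch)
num_params = sum(p.numel() for p in sketch_model.parameters())

print(logits.shape)   # torch.Size([8, 10])
print(num_params)     # 109386 = (784*128 + 128) + (128*64 + 64) + (64*10 + 10)
```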

##### **Q8: How do you initialize the weights and biases of the neural network?**

In [11]:
# Doing so manually (i.e., directly access the weights and biases of the layers and set them manually)
import torch.nn.init as init

class MNISTClassifier(nn.Module):
    def __init__(self):
        super(MNISTClassifier, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)
        
        # Initialize weights and biases
        self._initialize_weights()

    def _initialize_weights(self):
        # Initialize weights with Xavier uniform: samples from a uniform distribution whose
        # range is chosen to keep the variance of inputs and outputs similar across layers
        init.xavier_uniform_(self.fc1.weight)
        init.xavier_uniform_(self.fc2.weight)
        init.xavier_uniform_(self.fc3.weight)

        # Initialize biases to zero, as they do not have the same variance issues as weights
        init.constant_(self.fc1.bias, 0)
        init.constant_(self.fc2.bias, 0)
        init.constant_(self.fc3.bias, 0)

    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Instantiate the network
model = MNISTClassifier()
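
The "specific range" used by Xavier uniform initialization can be checked directly. The sketch below, using a standalone layer with the same shape as `fc1` (the variable names are assumptions for illustration), computes the bound a = sqrt(6 / (fan_in + fan_out)) and confirms that the sampled weights fall inside [-a, a].

```python
import math
import torch.nn as nn
import torch.nn.init as init

layer = nn.Linear(28 * 28, 128)    # same shape as fc1
init.xavier_uniform_(layer.weight)
init.zeros_(layer.bias)

# Xavier uniform samples from U(-a, a) with a = sqrt(6 / (fan_in + fan_out))
fan_in, fan_out = 28 * 28, 128
bound = math.sqrt(6.0 / (fan_in + fan_out))

# Small tolerance for float32 rounding of the bound
within_bound = layer.weight.abs().max().item() <= bound + 1e-6

print(round(bound, 4))   # 0.0811
print(within_bound)      # True
```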

In [None]:
# Another predefined initialization option (aside from Xavier): Kaiming (He) initialization
layer = nn.Linear(28 * 28, 128)  # example layer

# Suited to ReLU activations: keeps the variance of the outputs roughly constant across layers,
# which helps prevent exploding/vanishing gradients. The call modifies layer.weight in place.
init.kaiming_uniform_(layer.weight, nonlinearity='relu')

##### **Q9: How do you choose activation functions for the layers in your neural network?**

In [None]:
# ReLU: Commonly used for hidden layers to introduce non-linearity
relu_layer = nn.ReLU()

# Sigmoid: Often used in the output layer for binary classification to get probabilities
sigmoid_layer = nn.Sigmoid()

# Tanh: Sometimes used in hidden layers to center the data around zero
tanh_layer = nn.Tanh()

# Leaky ReLU: A variant of ReLU that allows a small, non-zero gradient when the unit is not active
leaky_relu_layer = nn.LeakyReLU(negative_slope=0.01)

# Softmax: Used in the output layer for multi-class classification problems to get a probability distribution over classes
softmax_layer = nn.Softmax(dim=1)
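
To make the differences concrete, the sketch below applies a few of these activations to small hand-made tensors (hypothetical values chosen for illustration). It shows that ReLU zeroes out negatives, Leaky ReLU scales them by the negative slope, and softmax turns each row of scores into a probability distribution without changing which class scores highest.

```python
import torch
import torch.nn as nn

values = torch.tensor([-1.0, 0.0, 2.0])                      # hypothetical pre-activations
logits = torch.tensor([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]])    # hypothetical raw scores

relu_out = nn.ReLU()(values)                           # negatives become 0
leaky_out = nn.LeakyReLU(negative_slope=0.01)(values)  # negatives are multiplied by 0.01
probs = nn.Softmax(dim=1)(logits)                      # each row sums to 1

print(relu_out)
print(leaky_out)
print(probs.sum(dim=1))
print(probs.argmax(dim=1), logits.argmax(dim=1))
```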

## Defining the loss function and optimizer


##### **Q10: How do you select the appropriate loss function for MNIST digit classification?**
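
One common choice for a 10-class problem, sketched here under the assumption that the model returns raw logits as in Q7, is `nn.CrossEntropyLoss`. It applies log-softmax internally, so the model should not apply softmax before the loss. The batch below is hypothetical: all-zero logits represent a model with no preference between classes, for which the loss equals ln(10).

```python
import math
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Hypothetical batch: 4 samples, 10 classes, all-zero logits
logits = torch.zeros(4, 10)
targets = torch.tensor([3, 1, 4, 1])   # class indices, not one-hot vectors

loss = criterion(logits, targets)
print(loss.item())     # about 2.3026
print(math.log(10))    # the same value: -log(1/10)
```

A freshly initialized classifier typically starts near this value, which makes it a useful reference point when reading early training losses.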

##### **Q11: How do you configure an optimizer for training the neural network?**
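
A minimal sketch of one possible setup follows, assuming stochastic gradient descent with momentum; the learning rate and momentum values are illustrative assumptions, not tuned settings, and `optim.Adam` would be configured the same way. The placeholder model and random batch exist only so the snippet runs on its own.

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(28 * 28, 10)   # placeholder model for illustration
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)   # assumed values

before = model.weight.detach().clone()

# One optimization step on a hypothetical batch
inputs = torch.randn(16, 28 * 28)
targets = torch.randint(0, 10, (16,))
loss = nn.CrossEntropyLoss()(model(inputs), targets)

optimizer.zero_grad()   # clear gradients from any previous step
loss.backward()         # compute gradients of the loss
optimizer.step()        # update the parameters

weights_changed = not torch.equal(before, model.weight.detach())
print(weights_changed)  # True
```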

## Training the neural network model


##### **Q12: How do you set up the training loop for the MNIST neural network in PyTorch?**

##### **Q13: How do you train the neural network on the MNIST dataset?**

##### **Q14: How do you monitor training progress during the training process?**
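
The sketch below shows one way to set up a training loop, run it, and monitor progress by printing the mean loss per epoch. It is an illustration under stated assumptions: the data is a synthetic stand-in for `train_loader` (random tensors, so the model can only memorize it), the model is a smaller placeholder, and the optimizer settings are assumed values. `epoch_losses` is a hypothetical name.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for train_loader (random images and labels, not MNIST)
data = TensorDataset(torch.randn(256, 1, 28, 28), torch.randint(0, 10, (256,)))
loader = DataLoader(data, batch_size=64, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)   # assumed hyperparameters

epoch_losses = []
for epoch in range(5):
    model.train()
    running_loss = 0.0
    for images, labels in loader:
        optimizer.zero_grad()                    # reset gradients
        loss = criterion(model(images), labels)  # forward pass and loss
        loss.backward()                          # backward pass
        optimizer.step()                         # parameter update
        running_loss += loss.item() * images.size(0)
    epoch_losses.append(running_loss / len(data))
    print(f"epoch {epoch + 1}: mean loss {epoch_losses[-1]:.4f}")
```

On real data, the same per-epoch loss is usually tracked alongside a validation metric so that overfitting becomes visible.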

## Evaluating the model


##### **Q15: How do you make predictions using the trained MNIST neural network?**

##### **Q16: How do you calculate the accuracy of the MNIST neural network model?**
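
Predictions are usually taken as the index of the largest logit, and accuracy is the fraction of predictions that match the labels. The sketch below uses hand-made logits and labels (hypothetical values standing in for model outputs on a test batch) so the arithmetic is easy to follow.

```python
import torch

# Hypothetical logits for 4 samples over 10 classes and their true labels
logits = torch.zeros(4, 10)
logits[0, 3] = 5.0
logits[1, 1] = 5.0
logits[2, 7] = 5.0   # wrong on purpose: the true label is 4
logits[3, 9] = 5.0
labels = torch.tensor([3, 1, 4, 9])

with torch.no_grad():                         # no gradients needed for evaluation
    predictions = logits.argmax(dim=1)        # predicted class per sample
    correct = (predictions == labels).sum().item()
    accuracy = correct / labels.size(0)

print(predictions)   # tensor([3, 1, 7, 9])
print(accuracy)      # 0.75
```

For a full test set, the `correct` count and the number of samples would be accumulated over all batches of `test_loader` before dividing.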

##### **Q17: How do you visualize the performance of the MNIST neural network model?**

##### **Q18: How do you create a confusion matrix to evaluate the performance of the MNIST digit classification model?**
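
A confusion matrix counts, for every true class, how often each class was predicted. The sketch below builds one by hand with plain PyTorch, using three classes and made-up labels and predictions to keep the output readable; MNIST would use `num_classes = 10`.

```python
import torch

num_classes = 3   # small for readability; MNIST would use 10
labels = torch.tensor([0, 0, 1, 1, 2, 2])        # hypothetical true classes
predictions = torch.tensor([0, 1, 1, 1, 2, 0])   # hypothetical predicted classes

# Rows are true classes, columns are predicted classes
confusion = torch.zeros(num_classes, num_classes, dtype=torch.long)
for t, p in zip(labels.tolist(), predictions.tolist()):
    confusion[t, p] += 1

print(confusion)
# tensor([[1, 1, 0],
#         [0, 2, 0],
#         [1, 0, 1]])
```

The diagonal holds the correct predictions, so the sum of the diagonal divided by the total gives the accuracy (4 / 6 here).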

## Saving and loading the model


##### **Q19: How do you save the trained MNIST neural network model in PyTorch?**

##### **Q20: How do you load a saved MNIST neural network model in PyTorch?**
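
A commonly used pattern is to save only the model's `state_dict` and, when loading, rebuild the architecture first and then restore the parameters into it. The sketch below illustrates this round trip with a placeholder model and a temporary file; `make_model` and the file name are hypothetical.

```python
import os
import tempfile
import torch
import torch.nn as nn

def make_model():
    # Placeholder architecture; in the notebook this would be MNISTClassifier()
    return nn.Sequential(nn.Flatten(), nn.Linear(784, 10))

model = make_model()
path = os.path.join(tempfile.mkdtemp(), "mnist_sketch.pt")   # hypothetical file name

torch.save(model.state_dict(), path)          # save the parameters only

restored = make_model()                       # rebuild the same architecture first
restored.load_state_dict(torch.load(path))    # then load the saved parameters
restored.eval()                               # switch to inference mode

x = torch.randn(2, 1, 28, 28)
outputs_match = torch.allclose(model(x), restored(x))
print(outputs_match)   # True
```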

## Hyperparameter tuning and optimization


##### **Q21: How do you perform hyperparameter tuning to improve the performance of the MNIST neural network?**

##### **Q22: What regularization techniques can you implement to prevent overfitting in the MNIST neural network?**

##### **Q23: How do you use learning rate scheduling to adjust the learning rate during training?**

## Handling model improvements


##### **Q24: How do you apply data augmentation techniques to the MNIST dataset?**

##### **Q25: How do you fine-tune the MNIST neural network model for better performance?**

##### **Q26: How do you evaluate the improvements made to the MNIST neural network model?**

## Conclusion


## Further exercises


##### **Q27: How do you experiment with different neural network architectures for MNIST digit classification?**

##### **Q28: How do you apply data augmentation techniques to improve model robustness?**

##### **Q29: How do you test the MNIST neural network model on different digit datasets?**

##### **Q30: How do you integrate more advanced regularization methods into the MNIST neural network model?**

##### **Q31: How do you deploy the MNIST neural network model for real-time digit recognition?**