# CNN IMPLEMENTATION TO CLASSIFY HANDWRITTEN DIGITS 

IMPLEMENTATION CONCEPT:

I will use a CNN model to classify handwritten digits from the MNIST dataset. I will tune different model parameters to find the best model for this use case, then save that model and load it into my Flask application for deployment.

## What is the MNIST dataset?

The MNIST (Modified National Institute of Standards and Technology) dataset is a large collection of handwritten digits. It consists of 70,000 grayscale images of digits (0-9), each sized 28x28 pixels. The dataset is split into a training set of 60,000 images and a test set of 10,000 images. Each image is labeled with the corresponding digit it represents.

## What is a CNN?

A Convolutional Neural Network (CNN) is a deep learning model designed for image processing tasks. It consists of convolutional layers that use filters to scan the image and extract features, activation functions like ReLU to introduce non-linearity, and pooling layers to reduce spatial dimensions and computational load. Following these, fully connected layers combine the extracted features to make final predictions. The output layer, typically using a softmax function for classification, provides the probability distribution over class labels.
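
The softmax step described above can be sketched in a few lines of plain Python. This is only an illustration: the function name `softmax` and the example logits are made up here. Note that the models later in this notebook return raw logits, because `nn.CrossEntropyLoss` applies log-softmax internally.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    # Subtract the max for numerical stability; this does not change the result
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three classes (illustration only)
probs = softmax([2.0, 1.0, 0.1])

# The predicted class is the index with the highest probability
predicted_class = probs.index(max(probs))
```

Because softmax preserves the ordering of the logits, taking the arg-max of the logits gives the same predicted class as taking the arg-max of the probabilities.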

## How am I going to use a CNN?

I will use a CNN model to learn the patterns in the images of the dataset. The models use multiple convolutional layers, dropout layers, and activation functions to learn these patterns. To help optimization, I normalize the data, use batch normalization, and try different numbers of epochs.

In [None]:
#  -- CODING STARTS HERE --

In [None]:
# THESE ARE THE LIBRARIES NEEDED TO RUN THE CNN MODEL IN PYTORCH.
# TORCHVISION IS USED TO LOAD THE DATASET.
# MATPLOTLIB IS USED TO PLOT THE IMAGES.
# (THE UNUSED TENSORFLOW AND SKLEARN fetch_openml IMPORTS WERE REMOVED.)

import torch
import torch.nn as nn
import matplotlib.pyplot as plt
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader, random_split
import torch.optim as optim
from torchvision import datasets, transforms


DATASET MANIPULATION

Loading the dataset using PyTorch only

In [4]:
# Define a transform that converts each image to a tensor
transform = transforms.Compose([
    transforms.ToTensor(),  # Converts PIL Image or numpy.ndarray to tensor
])

# Load the full MNIST dataset
full_train_data = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
full_test_data = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# Extract x_train, y_train, x_test, y_test.
# Accessing .data bypasses the transform above, so the raw uint8 pixels are scaled to [0, 1] here.
x_train = full_train_data.data.float() / 255.0
y_train = full_train_data.targets
x_test = full_test_data.data.float() / 255.0
y_test = full_test_data.targets

# The training/validation split is done later with random_split.

# Print shapes for verification
print("x_train shape:", x_train.shape)
print("y_train shape:", y_train.shape)
print("x_test shape:", x_test.shape)
print("y_test shape:", y_test.shape)


In [None]:
# THE IMAGES ARE NORMALIZED TO VALUES BETWEEN 0 AND 1.
# THIS IS DONE TO IMPROVE THE TRAINING PROCESS.
# WHAT HAPPENS IF NOT NORMALIZED?

# ----------------------------------------------------------------
# If we do not normalize the data, the learning process may become slower and less efficient:
# large input values produce large activations and gradients, which can slow convergence
# or make training with gradient descent unstable.
# This can result in poor model performance.
# Additionally, the lack of normalization complicates hyperparameter tuning (for example, choosing
# the learning rate) and makes it difficult to compare the importance of different features.
# ----------------------------------------------------------------

# The pixels were already divided by 255 when the dataset was loaded above.
# Dividing again would squash every value into [0, 1/255], so here we only verify the range.
assert x_train.min() >= 0.0 and x_train.max() <= 1.0
assert x_test.min() >= 0.0 and x_test.max() <= 1.0
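
The scaling step can be illustrated on a handful of values in plain Python. This is a minimal sketch: the helper names `scale_pixels` and `is_normalized` and the sample row are hypothetical, and the range check is only a heuristic (an all-black raw image would also pass it).

```python
def scale_pixels(pixels, max_value=255.0):
    """Scale raw pixel intensities from [0, max_value] to [0, 1]."""
    return [p / max_value for p in pixels]

def is_normalized(pixels):
    """Return True if every value already lies in [0, 1]."""
    return all(0.0 <= p <= 1.0 for p in pixels)

raw_row = [0, 64, 128, 255]          # hypothetical raw pixel values
scaled_row = scale_pixels(raw_row)   # values now lie in [0, 1]

# Guard against scaling twice: only divide if the data is not yet in [0, 1]
safe_row = scaled_row if is_normalized(scaled_row) else scale_pixels(scaled_row)
```

A guard like this helps in notebooks, where re-running a cell that divides by 255 silently shrinks the data a second time.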

In [None]:
# WE USE PRINT STATEMENTS TO CHECK THE SHAPE OF THE TRAINING AND TESTING DATA.
# THE SHAPE OF THE TRAINING DATA IS (60000, 28, 28) AND THE TESTING DATA IS (10000, 28, 28).
# THE SHAPE OF THE TRAINING LABELS IS (60000,) AND THE TESTING LABELS IS (10000,).
# THIS HELPS US IDENTIFY WHAT INPUT AND OUTPUT SIZES TO USE FOR THE CNN MODEL


print(f"x_train shape: {x_train.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"x_test shape: {x_test.shape}")
print(f"y_test shape: {y_test.shape}")

In [None]:
# MATPLOTLIB IS USED TO PLOT THE IMAGES/ GRAPHS TO VISUALIZE THE DATA.
# THE FUNCTION BELOW PLOTS THE FIRST 5 SAMPLES FROM THE TRAINING SET.

def plot(x, y, num_samples=5):
    plt.figure(figsize=(10, 2)) # Set the size of the plot
    for i in range(num_samples): 
        plt.subplot(1, num_samples, i + 1) # Create a subplot
        plt.imshow(x[i], cmap='gray') # Display an image
        plt.title(f"Label: {int(y[i])}")     # int() shows "5" instead of "tensor(5)"
        plt.axis('off')
    plt.show()

# Plot the first 5 samples from the training set
plot(x_train, y_train)

In [None]:
# ENSURING THE DATA ARE PYTORCH TENSORS WITH THE RIGHT DTYPES
# THE DATA LOADED VIA TORCHVISION ARE ALREADY TENSORS, SO torch.as_tensor ONLY SETS THE DTYPE
# (CALLING torch.tensor ON AN EXISTING TENSOR WOULD RAISE A COPY WARNING).
# PYTORCH IS OPTIMIZED FOR CALCULATIONS ON TENSORS AND ALLOWS FOR PARALLEL COMPUTATIONS.
# THE TENSORS ARE USED TO CREATE THE DATASET AND DATALOADER IN THE NEXT CODE

x_train_tensor = torch.as_tensor(x_train, dtype=torch.float32)
y_train_tensor = torch.as_tensor(y_train, dtype=torch.long)
x_test_tensor = torch.as_tensor(x_test, dtype=torch.float32)
y_test_tensor = torch.as_tensor(y_test, dtype=torch.long)

In [None]:
# ADDING A CHANNEL DIMENSION TO TENSORS
# THIS IS BECAUSE THE MNIST IMAGES ARE GRAYSCALE WITH SHAPE (28, 28).
# PYTORCH CONVOLUTION LAYERS EXPECT INPUT OF SHAPE (BATCH, CHANNELS, HEIGHT, WIDTH), I.E. (N, 1, 28, 28)

x_train_tensor = x_train_tensor.unsqueeze(1)
x_test_tensor = x_test_tensor.unsqueeze(1)
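
The effect of `unsqueeze(1)` is easiest to see on a small tensor. The sketch below uses a hypothetical batch of four random images rather than the MNIST tensors.

```python
import torch

# Hypothetical batch of 4 grayscale images, each 28x28 (random values, illustration only)
batch = torch.rand(4, 28, 28)

# unsqueeze(1) inserts a new axis at position 1: (N, H, W) -> (N, C=1, H, W)
batch_with_channel = batch.unsqueeze(1)

# squeeze(1) removes that axis again, e.g. before plotting with matplotlib
restored = batch_with_channel.squeeze(1)

print(tuple(batch.shape), tuple(batch_with_channel.shape), tuple(restored.shape))
```

No data is copied or changed by either call; only the shape metadata differs.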

In [None]:
# HERE WE CHECK THE DTYPE AND SHAPE OF THE TENSORS
# WE CAN SEE THAT THE TENSORS HAVE THE FLOAT32 AND LONG (INT64) DATA TYPES.
# THE SHAPE OF THE TRAINING DATA IS (60000, 1, 28, 28) AND THE TESTING DATA IS (10000, 1, 28, 28).
# THIS IS THE FORMAT THAT PYTORCH REQUIRES

print(x_train_tensor.dtype, x_train_tensor.shape)
print(y_train_tensor.dtype, y_train_tensor.shape)
print(x_test_tensor.dtype, x_test_tensor.shape)
print(y_test_tensor.dtype, y_test_tensor.shape)

In [None]:
# CREATE A FULL TRAINING DATASET USING THE TENSORS
# HERE WE COMBINE THE DATA AND ITS LABEL. 
# THIS MAKES IT EASIER FOR ME TO MANAGE THE DATA WHILE SENDING IT TO THE MODEL.
full_train_dataset = TensorDataset(x_train_tensor, y_train_tensor)


# AS REQUIRED BY THE ASSIGNMENT, WE SPLIT THE FULL TRAINING DATASET INTO TRAINING AND VALIDATION SETS.
# THE TRAINING DATASET IS 80% OF THE FULL TRAINING DATASET AND THE VALIDATION DATASET IS THE REMAINING 20%.
# THIS IS DONE TO EVALUATE THE MODEL PERFORMANCE ON DATA IT WAS NOT TRAINED ON.
train_size = int(0.8 * len(full_train_dataset))
val_size = len(full_train_dataset) - train_size
train_dataset, val_dataset = random_split(full_train_dataset, [train_size, val_size])
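
`random_split` draws a different split on every run unless it is seeded. The sketch below shows one way to make the 80/20 split repeatable by passing a seeded `torch.Generator`. The toy dataset and the seed value 42 are assumptions for illustration, not part of the original notebook.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Small stand-in dataset (10 random "images"), not the real MNIST data
toy_dataset = TensorDataset(torch.rand(10, 1, 28, 28), torch.arange(10))

toy_train_size = int(0.8 * len(toy_dataset))
toy_val_size = len(toy_dataset) - toy_train_size

# A seeded generator makes the random split repeatable between runs
generator = torch.Generator().manual_seed(42)
train_part, val_part = random_split(
    toy_dataset, [toy_train_size, toy_val_size], generator=generator
)

# Repeating the split with the same seed selects the same indices
generator_again = torch.Generator().manual_seed(42)
train_again, _ = random_split(
    toy_dataset, [toy_train_size, toy_val_size], generator=generator_again
)
same_split = train_part.indices == train_again.indices
```

A fixed split makes the validation numbers of different models comparable, since every model is then validated on the same images.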


In [None]:
# PYTORCH MAKES BATCHING EASY WITH DATALOADER
# DATALOADER IS USED TO LOAD THE DATA IN BATCHES.
# TRAINING BATCHES ARE SHUFFLED EVERY EPOCH; VALIDATION AND TEST BATCHES ARE NOT.
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=1000, shuffle=False)
test_loader = DataLoader(TensorDataset(x_test_tensor, y_test_tensor), batch_size=1000, shuffle=False)

CHECKING CUDA GPU AVAILABILITY

In [None]:
# THIS LINE OF CODE CHECKS FOR THE DEVICE AVAILABLE.
# IF A GPU IS AVAILABLE, THE DEVICE IS SET TO CUDA. OTHERWISE, THE DEVICE IS SET TO CPU.
# THIS CHECK IS IMPORTANT BECAUSE PYTORCH CAN RUN ON BOTH CPU AND GPU.
# RUNNING ON GPU IS FASTER THAN RUNNING ON CPU.

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [None]:
# THESE ARRAYS TRACK THE EVALUATION METRICS OF THE MODEL.
# THIS WILL HELP US IDENTIFY THE BEST MODEL.

arruracy = []
precision = []
recall = []
F1Score = []
allmodels = []

## MODEL ARCHITECTURE AND DESIGN

MODEL 1 DEFINITION

In [None]:
# THE CNN MODEL IS DEFINED BELOW
# THE CNN MODEL CONSISTS OF TWO CONVOLUTIONAL LAYERS AND TWO FULLY CONNECTED LAYERS
# A MAX POOLING LAYER IS USED TO REDUCE THE DIMENSIONALITY OF THE DATA
# THIS MAX POOL LAYER IS APPLIED AFTER EACH CONVOLUTIONAL LAYER, HALVING THE SPATIAL SIZE (28 -> 14 -> 7)

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)
    
    # THIS FUNCTION DESCRIBES THE FORWARD PASS OF THE CNN MODEL 
    # THE FORWARD FUNCTION DESCRIBES HOW THE DATA FLOWS THROUGH THE NETWORK
    # THE RELU ACTIVATION FUNCTION IS USED AFTER EACH CONVOLUTIONAL LAYER AND THE FIRST FULLY CONNECTED LAYER
    # THE RELU FUNCTION IS USED TO INTRODUCE NON-LINEARITY INTO THE MODEL
    # THE LAST LAYER RETURNS RAW LOGITS (CROSS ENTROPY LOSS APPLIES SOFTMAX INTERNALLY)
    
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 64 * 7 * 7)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
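
The `64 * 7 * 7` input size of `fc1` follows from the standard output-size formula for convolution and pooling layers, `out = floor((in + 2 * padding - kernel_size) / stride) + 1`. The sketch below traces the 28x28 input through the layers of this model; the helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28                                                # MNIST images are 28x28
size = conv_output_size(size, kernel_size=3, padding=1)  # conv1 keeps 28
size = conv_output_size(size, kernel_size=2, stride=2)   # pool -> 14
size = conv_output_size(size, kernel_size=3, padding=1)  # conv2 keeps 14
size = conv_output_size(size, kernel_size=2, stride=2)   # pool -> 7

# conv2 produces 64 channels, so the flattened vector has 64 * 7 * 7 entries
flattened_features = 64 * size * size
```

Applying the same formula to a third pooling step gives `(7 - 2) // 2 + 1 = 3`, which is where a `3 * 3` spatial size would come from in a three-block network.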


MODEL 1 TRAINING

In [None]:
# HERE IS THE CODE THAT ITERATES THROUGH THE MODEL AND TRAINS IT
# THE MODEL IS INITIALIZED WITH THE CNN CLASS
# THEN THE FOLLOWING ARE DEFINED:
# THE LOSS FUNCTION (CROSS ENTROPY LOSS)
# THE OPTIMIZER (ADAM OPTIMIZER)
# THE LEARNING RATE (0.001)
# THE NUMBER OF EPOCHS (8)


# THEN EVALUATION IS PERFORMED: 
# THE TRAINING LOOP ITERATES THROUGH THE TRAINING DATA AND UPDATES THE WEIGHTS OF THE MODEL
# THE MODEL IS THEN EVALUATED ON THE VALIDATION DATA TO CHECK FOR OVERFITTING

# Move the model to the selected device so it matches the batches below
model0 = CNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model0.parameters(), lr=0.001)

# Training loop
num_epochs = 8
for epoch in range(num_epochs):
    model0.train()
    running_loss = 0.0
    for images, labels in train_loader:
        # Move each batch to the same device as the model
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model0(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    
    print(f"Epoch [{epoch + 1}/{num_epochs}], Loss: {running_loss / len(train_loader):.4f}")

    # Validation step
    model0.eval()
    val_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model0(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    
    print(f"Validation Loss: {val_loss / len(val_loader):.4f}, Accuracy: {100 * correct / total:.2f}%")
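
The same training and validation loop is repeated for every model in this notebook. One possible way to avoid the repetition is to wrap it in two helper functions, sketched below. The names `train_one_epoch` and `validate`, the tiny linear stand-in model, and the random data are all hypothetical and exist only so that the sketch runs on its own.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

def train_one_epoch(model, loader, criterion, optimizer, device):
    """Run one pass over `loader`, update the weights, and return the mean batch loss."""
    model.train()
    running_loss = 0.0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    return running_loss / len(loader)

def validate(model, loader, criterion, device):
    """Return (mean batch loss, accuracy in percent) without updating the weights."""
    model.eval()
    total_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            total_loss += criterion(outputs, labels).item()
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    return total_loss / len(loader), 100 * correct / total

# Tiny stand-in model and random data (not MNIST) so the sketch is self-contained
toy_device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(toy_device)
toy_data = TensorDataset(torch.rand(32, 1, 28, 28), torch.randint(0, 10, (32,)))
toy_loader = DataLoader(toy_data, batch_size=8)
toy_criterion = nn.CrossEntropyLoss()
toy_optimizer = optim.Adam(toy_model.parameters(), lr=0.001)

toy_train_loss = train_one_epoch(toy_model, toy_loader, toy_criterion, toy_optimizer, toy_device)
toy_val_loss, toy_val_accuracy = validate(toy_model, toy_loader, toy_criterion, toy_device)
```

With helpers like these, each experiment reduces to building a model and calling the two functions inside a `for epoch in range(num_epochs)` loop.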


In [None]:
# THESE LIBRARIES ARE NEEDED TO EVALUATE THE MODEL
# THE ACCURACY, PRECISION, RECALL, F1 SCORE AND CONFUSION MATRIX ARE USED TO EVALUATE THE MODEL

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# THIS FUNCTION EVALUATES THE MODEL ON THE TEST DATA
# THEN IT PRINTS ITS ACCURACY, PRECISION, RECALL AND F1 SCORE
# THE CONFUSION MATRIX IS PLOTTED TO VISUALIZE THE PERFORMANCE OF THE MODEL


def evaluate(model, test_loader):
    model.eval()
    correct = 0
    total = 0
    all_labels = []
    all_preds = []
    # Run on whichever device the model's parameters live on
    model_device = next(model.parameters()).device
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(model_device), labels.to(model_device)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            all_labels.extend(labels.cpu().numpy())
            all_preds.extend(predicted.cpu().numpy())
    
    accuracy = 100 * correct / total
    print(f"Accuracy: {accuracy:.2f}%")

    # Calculate precision, recall, and F1-score.
    # The local names differ from the global tracking lists (precision, recall);
    # reusing those names here would shadow the lists and make .append() fail on a float.
    macro_precision = precision_score(all_labels, all_preds, average='macro')
    macro_recall = recall_score(all_labels, all_preds, average='macro')
    f1 = f1_score(all_labels, all_preds, average='macro')
    print(f"Precision: {macro_precision:.4f}, Recall: {macro_recall:.4f}, F1-score: {f1:.4f}")

    # save them into the arrays for tracking
    arruracy.append(accuracy)
    precision.append(macro_precision)
    recall.append(macro_recall)
    F1Score.append(f1)
    
    

    # Confusion matrix
    cm = confusion_matrix(all_labels, all_preds)
    plt.figure(figsize=(8,6))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.title('Confusion Matrix')
    plt.show()
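
The `average='macro'` option used above computes the metric for each class separately and then takes the unweighted mean, so every digit counts equally regardless of how often it appears. The plain-Python sketch below illustrates the idea; the function name and the toy labels are hypothetical, and scikit-learn's own implementation handles more edge cases.

```python
def macro_precision_recall(labels, preds):
    """Per-class precision and recall, averaged with equal weight per class."""
    classes = sorted(set(labels) | set(preds))
    precisions, recalls = [], []
    for c in classes:
        true_positives = sum(1 for y, p in zip(labels, preds) if y == c and p == c)
        predicted_c = sum(1 for p in preds if p == c)
        actual_c = sum(1 for y in labels if y == c)
        # Report 0 when a class is never predicted or never occurs
        precisions.append(true_positives / predicted_c if predicted_c else 0.0)
        recalls.append(true_positives / actual_c if actual_c else 0.0)
    return sum(precisions) / len(classes), sum(recalls) / len(classes)

# Hypothetical labels and predictions for three classes (illustration only)
toy_labels = [0, 0, 1, 1, 2, 2]
toy_preds = [0, 1, 1, 1, 2, 0]
toy_precision, toy_recall = macro_precision_recall(toy_labels, toy_preds)
```

For this toy input the per-class precisions are 1/2, 2/3 and 1, and the per-class recalls are 1/2, 1 and 1/2, so the macro averages are 13/18 and 2/3.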


MODEL TESTING

In [None]:
evaluate(model0, test_loader)

# HOW TO IMPROVE THE MODEL?

I have created a basic CNN model, chosen some hyperparameters for it, and trained it on the dataset. Then I evaluated the model on unseen test data. This was to get a basic idea of how well a simple CNN architecture handles this data and what I should do next.

NOTE: Although I believe the complexity of the model is sufficient, the requirement is to make the model deeper. Hence I will add more convolutional layers, batch normalization layers, and pooling layers to the model.

MODEL 2

In [None]:
# THE CNN MODEL IS DEFINED BELOW
# THE CNN MODEL CONSISTS OF THREE CONVOLUTIONAL LAYERS (EACH FOLLOWED BY BATCH NORMALIZATION) AND THREE FULLY CONNECTED LAYERS
# A MAX POOLING LAYER IS USED TO REDUCE THE DIMENSIONALITY OF THE DATA
# THIS MAX POOL LAYER IS APPLIED AFTER EACH CONVOLUTIONAL LAYER (28 -> 14 -> 7 -> 3)
# A DROPOUT LAYER IS USED BETWEEN THE FULLY CONNECTED LAYERS TO PREVENT OVERFITTING

# THE FIRST CONV LAYER HAS 32 FILTERS, THE SECOND CONV LAYER HAS 64 FILTERS AND THE THIRD CONV LAYER HAS 128 FILTERS
# THE STRIDE OF 1 MEANS THAT THE FILTER MOVES ONE PIXEL AT A TIME
# THE PADDING OF 1 MEANS THAT THE INPUT IMAGE IS PADDED WITH ZEROS SO THE CONVOLUTION KEEPS THE SAME SPATIAL SIZE
# THE KERNEL SIZE OF 3 MEANS THAT THE FILTER SIZE IS 3X3

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1)
        self.bn3 = nn.BatchNorm2d(128)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        self.dropout = nn.Dropout(p=0.5)
        self.fc1 = nn.Linear(128 * 3 * 3, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 10)
    
    # THIS FUNCTION DESCRIBES THE FORWARD PASS OF THE CNN MODEL
    # THE FORWARD FUNCTION DESCRIBES HOW THE DATA FLOWS THROUGH THE NETWORK
    # THE RELU ACTIVATION FUNCTION IS USED AFTER EACH BATCH-NORMALIZED CONVOLUTIONAL LAYER AND THE FIRST TWO FULLY CONNECTED LAYERS
    # THE RELU FUNCTION IS USED TO INTRODUCE NON-LINEARITY INTO THE MODEL
    # THE LAST LAYER RETURNS RAW LOGITS

    def forward(self, x):
        x = self.pool(F.relu(self.bn1(self.conv1(x))))
        x = self.pool(F.relu(self.bn2(self.conv2(x))))
        x = self.pool(F.relu(self.bn3(self.conv3(x))))
        x = x.view(-1, 128 * 3 * 3)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = F.relu(self.fc2(x))
        x = self.dropout(x)
        x = self.fc3(x)
        return x
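
To see how much larger this deeper network is, the number of learnable parameters can be worked out by hand from the layer definitions. The sketch below does this in plain Python; the helper names are hypothetical, and it assumes the PyTorch defaults used above (a bias in every `Conv2d` and `Linear` layer, and a learnable scale and shift per channel in `BatchNorm2d`).

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Kernel weights plus one bias per output channel."""
    return (kernel_size * kernel_size * in_channels + 1) * out_channels

def batchnorm_params(channels):
    """One learnable scale and one learnable shift per channel."""
    return 2 * channels

def linear_params(in_features, out_features):
    """Weights plus one bias per output feature."""
    return (in_features + 1) * out_features

layer_params = {
    'conv1': conv2d_params(1, 32, 3),
    'bn1': batchnorm_params(32),
    'conv2': conv2d_params(32, 64, 3),
    'bn2': batchnorm_params(64),
    'conv3': conv2d_params(64, 128, 3),
    'bn3': batchnorm_params(128),
    'fc1': linear_params(128 * 3 * 3, 256),
    'fc2': linear_params(256, 128),
    'fc3': linear_params(128, 10),
}
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(f"{name}: {count}")
print(f"total: {total_params}")
```

Most of the parameters sit in `fc1`, the layer that consumes the flattened feature map. The same total can be cross-checked in PyTorch with `sum(p.numel() for p in model.parameters())`.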

MODEL 2 TRAINING

In [None]:
# HERE IS THE CODE THAT ITERATES THROUGH THE MODEL AND TRAINS IT
# THE MODEL IS INITIALIZED WITH THE CNN CLASS
# THEN THE FOLLOWING ARE DEFINED:
# THE LOSS FUNCTION (CROSS ENTROPY LOSS)
# THE OPTIMIZER (ADAM OPTIMIZER)
# THE LEARNING RATE (0.001)
# THE NUMBER OF EPOCHS (10)


# THEN EVALUATION IS PERFORMED: 
# THE TRAINING LOOP ITERATES THROUGH THE TRAINING DATA AND UPDATES THE WEIGHTS OF THE MODEL
# THE MODEL IS THEN EVALUATED ON THE VALIDATION DATA TO CHECK FOR OVERFITTING

model1 = CNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model1.parameters(), lr=0.001)

# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    model1.train()
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device) 
        optimizer.zero_grad()
        outputs = model1(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    
    print(f"Epoch [{epoch + 1}/{num_epochs}], Loss: {running_loss / len(train_loader):.4f}")

    # Validation step
    model1.eval()
    val_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device) 
            outputs = model1(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    
    print(f"Validation Loss: {val_loss / len(val_loader):.4f}, Accuracy: {100 * correct / total:.2f}%")

In [None]:
evaluate(model1, test_loader)

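WHAT HAPPENED?

Here we see that the accuracy of the model decreased as we increased its complexity, i.e. the number of layers. This may be a sign that the model is learning general patterns in the images provided rather than just overfitting, but test accuracy alone cannot confirm that; the training and validation losses printed during training need to be compared as well. The added dropout (p=0.5) also regularizes the model, which can lower accuracy after the same number of epochs.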
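
One simple way to check for overfitting is to look at the per-epoch training and validation losses together: training loss that keeps falling while validation loss rises is the typical warning sign. The sketch below is a minimal illustration; the function name `summarize_history` is hypothetical and the loss values are made-up numbers, not results from this notebook.

```python
def summarize_history(train_losses, val_losses):
    """Return the best epoch (1-based) by validation loss and the final val/train gap."""
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    final_gap = val_losses[-1] - train_losses[-1]
    return best_index + 1, final_gap

# Hypothetical loss curves (made-up numbers, illustration only)
toy_train_losses = [0.40, 0.20, 0.10, 0.05]
toy_val_losses = [0.35, 0.22, 0.24, 0.30]

best_epoch, final_gap = summarize_history(toy_train_losses, toy_val_losses)
print(f"Best epoch by validation loss: {best_epoch}, final gap: {final_gap:.2f}")
```

In this made-up example the validation loss is lowest at epoch 2 and then climbs while the training loss keeps shrinking, which would suggest overfitting after epoch 2.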

# TWEAKING MODEL AND USING DIFFERENT ACTIVATION FUNCTIONS

1. Activation function Sigmoid

In [None]:
# THE CNN MODEL IS DEFINED BELOW
# THE CNN MODEL CONSISTS OF THREE CONVOLUTIONAL LAYERS (EACH FOLLOWED BY BATCH NORMALIZATION) AND THREE FULLY CONNECTED LAYERS
# A MAX POOLING LAYER IS USED TO REDUCE THE DIMENSIONALITY OF THE DATA
# THIS MAX POOL LAYER IS APPLIED AFTER EACH CONVOLUTIONAL LAYER (28 -> 14 -> 7 -> 3)
# A DROPOUT LAYER IS USED BETWEEN THE FULLY CONNECTED LAYERS TO PREVENT OVERFITTING

# THE FIRST CONV LAYER HAS 32 FILTERS, THE SECOND CONV LAYER HAS 64 FILTERS AND THE THIRD CONV LAYER HAS 128 FILTERS
# THE STRIDE OF 1 MEANS THAT THE FILTER MOVES ONE PIXEL AT A TIME
# THE PADDING OF 1 MEANS THAT THE INPUT IMAGE IS PADDED WITH ZEROS SO THE CONVOLUTION KEEPS THE SAME SPATIAL SIZE
# THE KERNEL SIZE OF 3 MEANS THAT THE FILTER SIZE IS 3X3

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1)
        self.bn3 = nn.BatchNorm2d(128)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        self.dropout = nn.Dropout(p=0.5)
        self.fc1 = nn.Linear(128 * 3 * 3, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 10)

    # THIS FUNCTION DESCRIBES THE FORWARD PASS OF THE CNN MODEL
    # THE FORWARD FUNCTION DESCRIBES HOW THE DATA FLOWS THROUGH THE NETWORK
    # THE SIGMOID ACTIVATION FUNCTION IS USED AFTER EACH BATCH-NORMALIZED CONVOLUTIONAL LAYER AND THE FIRST TWO FULLY CONNECTED LAYERS
    # THE SIGMOID FUNCTION REPLACES RELU HERE AS THE SOURCE OF NON-LINEARITY IN THE MODEL
    
    def forward(self, x):
        x = self.pool(torch.sigmoid(self.bn1(self.conv1(x))))
        x = self.pool(torch.sigmoid(self.bn2(self.conv2(x))))
        x = self.pool(torch.sigmoid(self.bn3(self.conv3(x))))
        x = x.view(-1, 128 * 3 * 3)
        x = torch.sigmoid(self.fc1(x))
        x = self.dropout(x)
        x = torch.sigmoid(self.fc2(x))
        x = self.dropout(x)
        x = self.fc3(x)
        return x

In [None]:

# HERE IS THE CODE THAT ITERATES THROUGH THE MODEL AND TRAINS IT
# THE MODEL IS INITIALIZED WITH THE CNN CLASS
# THEN THE FOLLOWING ARE DEFINED:
# THE LOSS FUNCTION (CROSS ENTROPY LOSS)
# THE OPTIMIZER (ADAM OPTIMIZER)
# THE LEARNING RATE (0.001)
# THE NUMBER OF EPOCHS (10)


# THEN EVALUATION IS PERFORMED: 
# THE TRAINING LOOP ITERATES THROUGH THE TRAINING DATA AND UPDATES THE WEIGHTS OF THE MODEL
# THE MODEL IS THEN EVALUATED ON THE VALIDATION DATA TO CHECK FOR OVERFITTING

model2 = CNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model2.parameters(), lr=0.001)

# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    model2.train()
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model2(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    
    print(f"Epoch [{epoch + 1}/{num_epochs}], Loss: {running_loss / len(train_loader):.4f}")

    # Validation step
    model2.eval()
    val_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model2(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    
    print(f"Validation Loss: {val_loss / len(val_loader):.4f}, Accuracy: {100 * correct / total:.2f}%")

In [None]:
evaluate(model2, test_loader)

2. Activation function Tanh

In [None]:
# THE CNN MODEL IS DEFINED BELOW
# THE CNN MODEL CONSISTS OF THREE CONVOLUTIONAL LAYERS (EACH FOLLOWED BY BATCH NORMALIZATION) AND THREE FULLY CONNECTED LAYERS
# A MAX POOLING LAYER IS USED TO REDUCE THE DIMENSIONALITY OF THE DATA
# THIS MAX POOL LAYER IS APPLIED AFTER EACH CONVOLUTIONAL LAYER (28 -> 14 -> 7 -> 3)
# A DROPOUT LAYER IS USED BETWEEN THE FULLY CONNECTED LAYERS TO PREVENT OVERFITTING

# THE FIRST CONV LAYER HAS 32 FILTERS, THE SECOND CONV LAYER HAS 64 FILTERS AND THE THIRD CONV LAYER HAS 128 FILTERS
# THE STRIDE OF 1 MEANS THAT THE FILTER MOVES ONE PIXEL AT A TIME
# THE PADDING OF 1 MEANS THAT THE INPUT IMAGE IS PADDED WITH ZEROS SO THE CONVOLUTION KEEPS THE SAME SPATIAL SIZE
# THE KERNEL SIZE OF 3 MEANS THAT THE FILTER SIZE IS 3X3

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1)
        self.bn3 = nn.BatchNorm2d(128)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        self.dropout = nn.Dropout(p=0.5)
        self.fc1 = nn.Linear(128 * 3 * 3, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 10)
    
    def forward(self, x):
        x = self.pool(torch.tanh(self.bn1(self.conv1(x))))
        x = self.pool(torch.tanh(self.bn2(self.conv2(x))))
        x = self.pool(torch.tanh(self.bn3(self.conv3(x))))
        x = x.view(-1, 128 * 3 * 3)
        x = torch.tanh(self.fc1(x))
        x = self.dropout(x)
        x = torch.tanh(self.fc2(x))
        x = self.dropout(x)
        x = self.fc3(x)
        return x

In [None]:
# HERE IS THE CODE THAT ITERATES THROUGH THE MODEL AND TRAINS IT
# THE MODEL IS INITIALIZED WITH THE CNN CLASS
# THEN THE FOLLOWING ARE DEFINED:
# THE LOSS FUNCTION (CROSS ENTROPY LOSS)
# THE OPTIMIZER (ADAM OPTIMIZER)
# THE LEARNING RATE (0.001)
# THE NUMBER OF EPOCHS (10)
# ACTIVATION FUNCTION (TANH)


# THEN EVALUATION IS PERFORMED: 
# THE TRAINING LOOP ITERATES THROUGH THE TRAINING DATA AND UPDATES THE WEIGHTS OF THE MODEL
# THE MODEL IS THEN EVALUATED ON THE VALIDATION DATA TO CHECK FOR OVERFITTING

model3 = CNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model3.parameters(), lr=0.001)

# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    model3.train()
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model3(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    
    print(f"Epoch [{epoch + 1}/{num_epochs}], Loss: {running_loss / len(train_loader):.4f}")

    # Validation step
    model3.eval()
    val_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model3(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    
    print(f"Validation Loss: {val_loss / len(val_loader):.4f}, Accuracy: {100 * correct / total:.2f}%")

In [None]:
evaluate(model3, test_loader)

3. Activation function Leaky ReLU

In [None]:
# THE CNN MODEL IS DEFINED BELOW
# THE CNN MODEL CONSISTS OF THREE CONVOLUTIONAL LAYERS (EACH FOLLOWED BY BATCH NORMALIZATION) AND THREE FULLY CONNECTED LAYERS
# A MAX POOLING LAYER IS USED TO REDUCE THE DIMENSIONALITY OF THE DATA
# THIS MAX POOL LAYER IS APPLIED AFTER EACH CONVOLUTIONAL LAYER (28 -> 14 -> 7 -> 3)
# A DROPOUT LAYER IS USED BETWEEN THE FULLY CONNECTED LAYERS TO PREVENT OVERFITTING

# THE FIRST CONV LAYER HAS 32 FILTERS, THE SECOND CONV LAYER HAS 64 FILTERS AND THE THIRD CONV LAYER HAS 128 FILTERS
# THE STRIDE OF 1 MEANS THAT THE FILTER MOVES ONE PIXEL AT A TIME
# THE PADDING OF 1 MEANS THAT THE INPUT IMAGE IS PADDED WITH ZEROS SO THE CONVOLUTION KEEPS THE SAME SPATIAL SIZE
# THE KERNEL SIZE OF 3 MEANS THAT THE FILTER SIZE IS 3X3

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1)
        self.bn3 = nn.BatchNorm2d(128)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        self.dropout = nn.Dropout(p=0.5)
        self.fc1 = nn.Linear(128 * 3 * 3, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 10)
    
    def forward(self, x):
        x = self.pool(F.leaky_relu(self.bn1(self.conv1(x))))
        x = self.pool(F.leaky_relu(self.bn2(self.conv2(x))))
        x = self.pool(F.leaky_relu(self.bn3(self.conv3(x))))
        x = x.view(-1, 128 * 3 * 3)
        x = F.leaky_relu(self.fc1(x))
        x = self.dropout(x)
        x = F.leaky_relu(self.fc2(x))
        x = self.dropout(x)
        x = self.fc3(x)
        return x

In [None]:
# HERE IS THE CODE THAT ITERATES THROUGH THE MODEL AND TRAINS IT
# THE MODEL IS INITIALIZED WITH THE CNN CLASS
# THEN THE FOLLOWING ARE DEFINED:
# THE LOSS FUNCTION (CROSS ENTROPY LOSS)
# THE OPTIMIZER (ADAM OPTIMIZER)
# THE LEARNING RATE (0.001)
# THE NUMBER OF EPOCHS (10)
# ACTIVATION FUNCTION (Leaky RELU)


# THEN EVALUATION IS PERFORMED: 
# THE TRAINING LOOP ITERATES THROUGH THE TRAINING DATA AND UPDATES THE WEIGHTS OF THE MODEL
# THE MODEL IS THEN EVALUATED ON THE VALIDATION DATA TO CHECK FOR OVERFITTING

# Move the model to the selected device so it matches the batches below
model4 = CNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model4.parameters(), lr=0.001)

# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    model4.train()
    running_loss = 0.0
    for images, labels in train_loader:
        # Move each batch to the same device as the model
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model4(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    
    print(f"Epoch [{epoch + 1}/{num_epochs}], Loss: {running_loss / len(train_loader):.4f}")

    # Validation step
    model4.eval()
    val_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model4(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    
    print(f"Validation Loss: {val_loss / len(val_loader):.4f}, Accuracy: {100 * correct / total:.2f}%")

In [None]:
evaluate(model4, test_loader)
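
The three activation functions tried in this section differ mainly in how their gradients behave, which affects how easily the earlier layers receive a training signal. The plain-Python sketch below evaluates the derivatives at a few points. The helper names are hypothetical, and the `negative_slope` of 0.01 is assumed to match the default of `F.leaky_relu`.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)             # peaks at 0.25 when x = 0

def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2   # peaks at 1.0 when x = 0

def leaky_relu(x, negative_slope=0.01):
    return x if x > 0 else negative_slope * x

def leaky_relu_grad(x, negative_slope=0.01):
    return 1.0 if x > 0 else negative_slope

# Gradients at a negative, a near-zero, and a positive pre-activation value
for x in (-4.0, 0.5, 4.0):
    print(
        f"x={x:+.1f}  sigmoid'={sigmoid_grad(x):.4f}  "
        f"tanh'={tanh_grad(x):.4f}  leaky_relu'={leaky_relu_grad(x):.2f}"
    )
```

Sigmoid and tanh gradients shrink towards zero for large positive or negative inputs (saturation), and the sigmoid gradient never exceeds 0.25. Leaky ReLU keeps a gradient of 1 for positive inputs and a small constant gradient for negative inputs.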

# FINAL FINDINGS

We see that model3 (the Tanh variant) performs the best on training data among all of the models. We will use this model in our Flask application.

In [None]:
torch.save(model3, r"C:\Users\inspi\OneDrive - FAST National University\Melior\Github\Melior_Internship\CNN_Implementation\my_flask_app\model\mnist_cnn.pth")
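
`torch.save(model3, ...)` pickles the whole model object, so the loading side needs the matching `CNN` class definition available. A common alternative is to save only the `state_dict` and rebuild the architecture before loading. The sketch below shows that pattern with a hypothetical stand-in class `TinyNet` and a temporary file path; neither is part of the original project.

```python
import os
import tempfile

import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical stand-in for the CNN class, kept small so the sketch runs quickly."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(28 * 28, 10)

    def forward(self, x):
        return self.fc(x.flatten(start_dim=1))

trained = TinyNet()

# Save only the learned parameters (the state_dict), not the pickled Python object
checkpoint_path = os.path.join(tempfile.mkdtemp(), 'tiny_net_state.pth')  # hypothetical path
torch.save(trained.state_dict(), checkpoint_path)

# Loading side (for example inside the Flask app): rebuild the architecture, then load the weights
restored = TinyNet()
restored.load_state_dict(torch.load(checkpoint_path, map_location='cpu'))
restored.eval()  # switch dropout and batch norm layers to inference behaviour

# Both models should now give identical outputs for the same input
sample = torch.rand(1, 1, 28, 28)
with torch.no_grad():
    same_output = torch.equal(trained(sample), restored(sample))
```

Passing `map_location='cpu'` lets a checkpoint trained on a GPU be loaded on a server without one, and calling `eval()` before prediction matters for models that use dropout or batch normalization.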
