<a href="https://colab.research.google.com/github/purpleiron/MySchoolProjects/blob/main/ICT303_Assignment_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#  **ICT303 - Assignment 1**

**Your name: <enter here your full name>**

**Student ID: <enter here your student ID>**

**Email: <enter here your email address>**


## **1. Description**

We would like to develop, using a Multilayer Perceptron (MLP), a computer program that takes images of handwritten text, finds the written characters in the image, and displays them.

To achieve this, we will proceed in steps:

1. Develop and train an MLP for the recognition of handwritten characters from images. In the first instance, each image is assumed to contain only one handwritten character.
2. Train and test the MLP, and evaluate its performance by using loss curves and proper accuracy/performance measures.
3. Improve the performance of the MLP by tuning its hyperparameters.
4. Extend the program you developed to localize (detect) and recognize handwritten characters in an image that contains multiple handwritten characters.

For this purpose, we will use the following dataset for training, validation and testing: https://www.kaggle.com/datasets/dhruvildave/english-handwritten-characters-dataset.

You are required to justify every design choice. Justifications should be theoretical and validated with experiments.

It is important that you start as early as possible. Coding is usually easy. However, training neural networks and tuning their hyperparameters takes time.

## **2. Marking Guide**

- The overall structure of the program - it should follow the structure we used so far in the labs **[30 Marks]**. This includes:
  - A class that defines the network architecture that extends the class `nn.Module`. It should have a constructor method (`__init__()`) and a forward function (`forward()`)
  - The Trainer class
  - A main function

- Training working and running on GPU **[10 marks]**

- Curves for training loss and validation loss plotted and training stopped when the network starts to overfit (i.e., when the validation loss starts to increase). You must use TensorBoard to visualize curves and monitor performance **[10 marks]**

- Testing code properly working. **[10 marks]**

- Hyperparameters fine-tuned and the best ones selected. **[10 marks]**

- Quality of the discussions **[20 marks]**: did the student discuss the various design choices, including the hyperparameters or any choices they made to improve the performance? Any design choice should be properly justified.

- Extension to the localization of the characters **[10 marks]**

## **3. What to submit**

You need to upload to LMS the notebook as well as a folder that contains the .py files you created. All classes should be implemented in .py files. The notebook will serve as documentation of your work and contain the code that demonstrates the training, validation and testing of the MLP models you created.




# Import necessary dependencies

In [102]:
# Import necessary libraries
import torch
from torch import nn
from torchvision import datasets, transforms
from torch.utils.data import random_split
from torch.utils.tensorboard import SummaryWriter
from tqdm import tqdm
import matplotlib.pyplot as plt
import zipfile


import os
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset, DataLoader

from collections import Counter

print('CUDA available:', torch.cuda.is_available())
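# Query the device name only when a GPU is present; get_device_name(0) raises otherwise.
if torch.cuda.is_available():
    print('GPU:', torch.cuda.get_device_name(0))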

from google.colab import drive
drive.mount('/content/drive')
with zipfile.ZipFile("/content/drive/My Drive/archive.zip", 'r') as zip_ref:
    zip_ref.extractall("/content/dataset_folder")


CUDA available: True
GPU: Tesla T4
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# Data Transformation/preprocessing

I looked at the images in the dataset and they are black and white, so converting them to grayscale should shrink the input and save training time. I'm not sure how well this works if the test images are not grayscale, but the same grayscale transform can be applied to them as well. Note that the `transforms.Grayscale` line in `main()` is currently commented out, so the model below is trained on 3-channel images. To use a different resolution, change the arguments of `transforms.Resize()` and update `input_size` to match.
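
To make the training-time argument concrete, the sketch below counts the weights and biases of the first fully connected layer for 3-channel and 1-channel inputs at 224x224. It is plain Python arithmetic rather than PyTorch code, and `first_layer_params` is a hypothetical helper that is not part of the notebook. The hidden width of 512 is taken from the `MLP` class in this notebook.

```python
def first_layer_params(height, width, channels, hidden_units=512):
    """Number of weights plus biases in Linear(height*width*channels, hidden_units)."""
    input_size = height * width * channels
    return input_size * hidden_units + hidden_units

# 3-channel input, as used by the current transform
rgb = first_layer_params(224, 224, 3)
# 1-channel input, as it would be with transforms.Grayscale enabled
gray = first_layer_params(224, 224, 1)

print(f"RGB input:       {rgb:,} parameters")
print(f"Grayscale input: {gray:,} parameters")
```

If grayscale were enabled, the `input_size` passed to `MLP` would have to change to `224*224*1` so that it matches the flattened image.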

# Load Dataset

# Defining the model



In [103]:
class MLP(nn.Module):
    def __init__(self, input_size=224*224*3, output_size=62, lr=0.001):
        super(MLP, self).__init__()
        # Justification: The default input size is 224*224*3 because each image is resized to 224x224 pixels
        # and keeps its 3 channels. The output size is 62, the number of classes (10 digits + 52 letters).

        self.layers = nn.Sequential(

            nn.Flatten(),

            nn.Linear(input_size, 512),
            # Justification: The first hidden layer has 512 neurons. This number is chosen to provide a
            # good balance between model complexity and computational efficiency.

            nn.ReLU(),
            # Justification: ReLU is used as the activation function because it helps with faster convergence
            # and alleviates the vanishing gradient problem compared to sigmoid or tanh.

            nn.Linear(512, 256),
            # Justification: A second hidden layer with 256 neurons is used to increase the model's ability to
            # capture non-linear relationships in the data.

            nn.ReLU(),
            # Justification: Another ReLU for non-linear activation.

            nn.Linear(256, output_size),
            # Justification: The output layer size corresponds to the number of classes.

            nn.LogSoftmax(dim=1)
            # Justification: LogSoftmax is used in the output layer to obtain log-probabilities which are more
            # numerically stable for the subsequent calculation of the negative log likelihood loss during training.


        )

        self.lr = lr

    def forward(self, x):
        return self.layers(x)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), self.lr)
        # Justification: The Adam optimizer is used as it combines the best properties of the AdaGrad and RMSProp
        # algorithms to provide an optimization algorithm that can handle sparse gradients on noisy problems.

    def loss(self, y_hat, y):
        fn = nn.NLLLoss()
        return fn(y_hat, y)
        # Justification: NLLLoss (Negative Log Likelihood Loss) is used as the loss function for multi-class
        # classification problems when combined with LogSoftmax in the output layer. It is efficient and
        # calculates the loss between the predicted log-probabilities and the ground truth labels.
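
As a rough illustration of the LogSoftmax and NLLLoss pairing justified in the comments above, the following plain-Python sketch computes both for a single sample. It does not use PyTorch; `log_softmax` and `nll` are hypothetical helper names and the logits are made-up values. Subtracting the maximum logit before exponentiating is the usual way to keep the computation stable.

```python
import math

def log_softmax(logits):
    """Log-probabilities for one sample, computed with the max-subtraction trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]

def nll(log_probs, target):
    """Negative log likelihood of the target class for one sample."""
    return -log_probs[target]

# Large logits like these would overflow a naive math.exp(z)
logits = [1000.0, 1001.0, 999.0]
log_probs = log_softmax(logits)
loss = nll(log_probs, target=1)
print(log_probs, loss)
```

In PyTorch, `nn.CrossEntropyLoss` applied to raw logits combines these two steps, so it is an alternative to ending the network with `nn.LogSoftmax` and using `nn.NLLLoss`.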




# Defining the Trainer class

In [104]:
class Trainer:
    def __init__(self, model, optimizer, criterion, train_loader, val_loader=None, num_epochs=25, patience=5, device='cuda'):
        """
        Initialize the Trainer with model, optimizer, criterion, data loaders, and training configurations.
        """
        # Justification for changes:
        # - Added model, optimizer, criterion as parameters for flexibility and explicitness.
        # - Included train_loader and val_loader for separate training and validation data handling.
        # - Added device parameter for flexibility between CPU and GPU training.
        self.model = model.to(device)
        self.optimizer = optimizer
        self.criterion = criterion
        self.train_loader = train_loader
        self.val_loader = val_loader
        self.num_epochs = num_epochs
        self.patience = patience
        self.device = device
        self.best_val_loss = float('inf')
        self.patience_counter = 0
        self.writer = SummaryWriter()  # Default log_dir is fine; no need to customize without specific requirement.

    def train_epoch(self, epoch):
        """
        Train the model for one epoch and log the average training loss.
        """
        self.model.train()
        running_loss = 0.0
        for inputs, labels in tqdm(self.train_loader, desc="Training"):
            inputs, labels = inputs.to(self.device), labels.to(self.device)
            self.optimizer.zero_grad()
            outputs = self.model(inputs)
            loss = self.criterion(outputs, labels)
            loss.backward()
            self.optimizer.step()
            running_loss += loss.item()
        average_loss = running_loss / len(self.train_loader)
        # Pass the epoch as global_step so each epoch gets its own point on the TensorBoard curve.
        self.writer.add_scalar('train_loss', average_loss, epoch)
        return average_loss

    def validate(self, epoch):
        """
        Evaluate the model on the validation dataset and log the average validation loss.
        """
        self.model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for inputs, labels in tqdm(self.val_loader, desc="Validation"):
                inputs, labels = inputs.to(self.device), labels.to(self.device)
                outputs = self.model(inputs)
                loss = self.criterion(outputs, labels)
                val_loss += loss.item()
        average_val_loss = val_loss / len(self.val_loader)
        self.writer.add_scalar('val_loss', average_val_loss, epoch)
        return average_val_loss

    def fit(self):
        """
        Fit the model to the data, stopping early when the validation loss stops improving.
        """
        for epoch in range(self.num_epochs):
            train_loss = self.train_epoch(epoch)
            print(f'Epoch {epoch+1}/{self.num_epochs}, Train Loss: {train_loss:.4f}')

            if self.val_loader:
                val_loss = self.validate(epoch)
                print(f'Epoch {epoch+1}/{self.num_epochs}, Validation Loss: {val_loss:.4f}')

                # Early stopping: reset the counter on improvement, otherwise count
                # epochs without improvement and stop once patience is exhausted.
                if val_loss < self.best_val_loss:
                    self.best_val_loss = val_loss
                    self.patience_counter = 0
                else:
                    self.patience_counter += 1
                    if self.patience_counter >= self.patience:
                        print('Early stopping triggered')
                        break
        self.writer.close()
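
The patience logic in `fit()` can be checked in isolation. The sketch below replays it over a list of validation losses without any training. `early_stop_epoch` is a hypothetical helper and the loss values are invented for illustration.

```python
def early_stop_epoch(val_losses, patience=5):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = float('inf')
    counter = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # Improvement: remember the new best loss and reset the counter
            best = loss
            counter = 0
        else:
            # No improvement: count it and stop once patience is exhausted
            counter += 1
            if counter >= patience:
                return epoch
    return None

losses = [2.0, 1.5, 1.2, 1.3, 1.25, 1.4, 1.1, 1.2, 1.3, 1.4]
print(early_stop_epoch(losses, patience=3))  # stops at epoch 6
print(early_stop_epoch(losses, patience=5))  # never stops within these 10 epochs
```

The counter resets whenever the validation loss improves on the best value seen so far, so a single noisy epoch does not end training. A larger `patience` tolerates longer plateaus at the cost of extra epochs.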



# Main function

In [105]:
def main():
    # Data loading and transformation
    dataset_path = '/content/dataset_folder/'
    transform = transforms.Compose([
        #transforms.Grayscale(num_output_channels=1),
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])
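    # ImageFolder assigns one class per subdirectory of dataset_path, so the folder
    # is expected to contain one subdirectory per character class.
    dataset = datasets.ImageFolder(root=dataset_path, transform=transform)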




    # Splitting dataset and creating DataLoaders
    train_size = int(0.8 * len(dataset))
    val_size = len(dataset) - train_size
    train_dataset, val_dataset = random_split(dataset, [train_size, val_size])
    train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=64)

    # Model initialization
    model = MLP(input_size=224*224*3, output_size=62, lr=0.001)
    criterion = nn.NLLLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

    def print_label_distribution(loader):
        # Count how many samples of each label the loader yields
        labels_list = []
        for _, labels in loader:
            labels_list.extend(labels.tolist())
        label_counts = Counter(labels_list)
        for label, count in label_counts.items():
            print(f"Label: {label}, Count: {count}")

    print(f"Total dataset size: {len(dataset)}")
    print(f"Training dataset size: {len(train_dataset)}")
    print(f"Validation dataset size: {len(val_dataset)}")


    #print("Training set label distribution:")
    #print_label_distribution(train_loader)

    #print("\nValidation set label distribution:")
    #print_label_distribution(val_loader)

    # Training
    trainer = Trainer(
        model=model,
        optimizer=optimizer,
        criterion=criterion,
        train_loader=train_loader,
        val_loader=val_loader,
        num_epochs=25,
        patience=5,
        device='cuda'
    )
    trainer.fit()
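
The trainer above reports only losses. Since the assignment also asks for accuracy measures, here is a minimal plain-Python sketch of top-1 accuracy computed from per-class scores such as the log-probabilities the `MLP` outputs. The helper name `top1_accuracy` and the sample values are hypothetical. With tensors, the same idea would typically use an argmax over the class dimension.

```python
def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the true label."""
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score in this row is the predicted class
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)

# Made-up log-probabilities for 4 samples and 3 classes
scores = [
    [-0.1, -2.5, -3.0],
    [-1.2, -0.4, -2.2],
    [-0.3, -1.9, -2.0],
    [-2.0, -1.5, -0.4],
]
labels = [0, 1, 2, 2]
print(f"accuracy: {top1_accuracy(scores, labels):.2f}")
```

Here three of the four predictions match their labels. On the real data this measure would be computed on a held-out split that was not used for training or early stopping.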


# Running

In [106]:
if __name__ == "__main__":
    main()

Total dataset size: 3410
Training dataset size: 2728
Validation dataset size: 682


Training:   5%|▍         | 2/43 [00:03<01:04,  1.56s/it]


KeyboardInterrupt: 