# Image Classification HackPack

In this notebook, we will construct a neural network that classifies images and train it on a dataset of labelled images. The running example classifies rock, paper, scissors images.

## Prerequisites

We will need to install several Python packages before we begin: PyTorch, torchvision, pandas, NumPy, Matplotlib, Pillow and scikit-learn.
- Install these packages by opening a command line and running: pip install torch torchvision pandas numpy matplotlib pillow scikit-learn

## Import all required packages

Once you have created a new Python file or notebook, import the packages we have just installed so that we can use them in this project. Alongside whole packages, we also import specific classes and functions from them.

In [None]:
import torch
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import os
from sklearn.model_selection import train_test_split
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from PIL import Image
from sklearn.metrics import precision_score, recall_score, f1_score


## Creating/finding a dataset

For this project, you can either download an existing dataset from online or go one step further and build a custom dataset! 🔥🔥

If you choose to work with an existing dataset, some excellent datasets are listed below:

- MNIST - contains 60,000 training images of handwritten digits 0-9
- CIFAR-10 - contains 5,000 training images per class (10 classes). The classes consist of animals and vehicles (airplane, cat, horse, truck, etc.)
- CIFAR-100 - similar structure to CIFAR-10, but contains training images for 100 classes, grouped into 20 superclasses

If you choose to create a custom dataset, you will need to source both training and testing photos for each class (e.g. rock, paper, scissors). To reduce the number of photos needed, I advise choosing a few simple classes, for example:

- Pen, Pencil
- Different keyboard keys
- Facial Expressions
- Cups, bottles
- Images taken inside, images taken outside
- Hand gestures
- Different fabrics

The more complex the classes you choose, the larger your dataset will need to be!
To make the dataset effective despite its limited size, consider the tips listed below.

- Keep the background simple, if applicable
- Use good lighting
- Ensure the classes look visually different and are easily distinguishable
- Keep the number of photos per class roughly the same
- Capture different angles of each class
- Use the same pixel dimensions for every image

While collecting these images, all images of each classification should be stored in a separate folder.
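Before moving on, it can help to check that the folder layout is what the later cells expect. The helper below is a minimal sketch, not part of the original notebook: the function name `count_images_per_class`, the folder name `dataset` and the list of file extensions are all assumptions made for illustration.

```python
import os

def count_images_per_class(root, class_names, extensions=(".jpg", ".jpeg", ".png")):
    """Return {class_name: number_of_image_files} for each class sub-folder."""
    counts = {}
    for name in class_names:
        folder = os.path.join(root, name)
        if not os.path.isdir(folder):
            counts[name] = 0  # A missing folder is reported as zero images
            continue
        counts[name] = sum(
            1 for f in os.listdir(folder) if f.lower().endswith(extensions)
        )
    return counts

# "dataset" is a hypothetical folder name; replace it with your own path
print(count_images_per_class("dataset", ["Rock", "Paper", "Scissors"]))
```

If one class has far fewer images than the others, that is a hint to collect more photos for it before training.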



## Transform images to normalised RGB tensors
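The transform in this section converts each image to a tensor and then normalises each RGB channel with a mean of 0.5 and a standard deviation of 0.5. The plain-Python sketch below mimics that arithmetic for a single channel value, purely to show where the numbers end up; the function name `normalise` is made up for this illustration and is not a torchvision function.

```python
def normalise(pixel_value, mean=0.5, std=0.5):
    """Mimic ToTensor followed by Normalize for a single 0-255 channel value."""
    scaled = pixel_value / 255.0   # ToTensor: 0-255 becomes 0.0-1.0
    return (scaled - mean) / std   # Normalize: subtract the mean, divide by the std

for value in (0, 255):
    print(value, "->", normalise(value))  # 0 -> -1.0 and 255 -> 1.0
```

So with these settings, the darkest possible value maps to -1 and the brightest to 1.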

In [None]:
transform = transforms.Compose([
    transforms.ToTensor(),  # PIL image (0-255) -> tensor with values in [0, 1]
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # Shift each RGB channel to [-1, 1]
])

## Load images for training

Here, we need to define the path to the folder containing our training images. This folder should contain an individual sub-folder for each class of images.

We then define a list for our images, as well as a list of labels. The two lists are kept in step: the first label is the class of the first image, the second label is the class of the second image, and so on.
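As a small illustration of how the two lists line up, the sketch below pairs a few made-up file paths with their labels and converts each label name into its position in the class list (which is the integer a neural network ultimately trains on). The file names are hypothetical.

```python
class_names = ["Rock", "Paper", "Scissors"]

# Hypothetical file paths, one per class, with matching labels at the same positions
image_paths = ["Rock/r1.png", "Paper/p1.png", "Scissors/s1.png"]
labels = ["Rock", "Paper", "Scissors"]

# Pair each image with its label
pairs = list(zip(image_paths, labels))

# Convert each label name to its index in class_names
label_indices = [class_names.index(label) for label in labels]

print(pairs[0])        # ('Rock/r1.png', 'Rock')
print(label_indices)   # [0, 1, 2]
```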


In [None]:
train_data_path = ".." # Path to your dataset

image_list = []
label_list = []

class_list_ = ["Rock","Paper","Scissors"] # Our list of classifications

for category in class_list_:
    category_dir = os.path.join(train_data_path, category)  # Sub-folder for this class
    for image in os.listdir(category_dir):
        image_list.append(os.path.join(category_dir, image))  # Full path to the image
        label_list.append(category)

## Define the data frame

We now need to define the data frame that our neural network will use to train. We use the image_list and label_list from before to do this.

In [None]:
df = pd.DataFrame()

df["image"] = image_list
df["label"] = label_list

## Split the dataset 

Our dataset needs to be split into two sections: images to be used for training and images to be used for testing. We can define the ratio that this split will follow.
- TIP: You can try adjusting this ratio and see its effect on accuracy!
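The split in this section is stratified, meaning each class keeps roughly the same share of images in both parts. The sketch below shows the idea in plain Python. It is an illustration only and not scikit-learn's actual algorithm; the function name `stratified_split` is an assumption.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_ratio=0.2, seed=42):
    """Return (train_indices, test_indices), splitting each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_ratio)  # Test images taken from this class
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

labels = ["Rock"] * 10 + ["Paper"] * 10 + ["Scissors"] * 10
train_idx, test_idx = stratified_split(labels)
print(Counter(labels[i] for i in test_idx))  # 2 test images from each class
```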

In [None]:
ratio = 0.20 # 20% of data will be for testing

train_df, test_df = train_test_split(df, test_size=ratio, stratify=df["label"], random_state=42)

## Augmenting data

This is where we make our relatively small dataset behave like a much larger one! Each time a training image is loaded, it is randomly altered (here, rotated by up to 10 degrees), so the model rarely sees exactly the same picture twice. This gives the model more variety to learn from, which can lead to higher accuracy.

Firstly, we define the image size that every image will be resized to. A smaller size discards some detail, but allows us to train our model a lot quicker.

We then define one set of transformations for the training set and one for the testing set. Both resize and normalise the images, but only the training images are randomly rotated, so the test images stay unaltered for a fair evaluation.
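To see why a random rotation acts like extra data, note that the angle is drawn again every time an image is loaded. The plain-Python sketch below imitates that draw for one image over several epochs; it does not use torchvision, and `sample_rotation_angles` is a name invented for this example.

```python
import random

def sample_rotation_angles(num_epochs, max_degrees=10, seed=0):
    """Draw one rotation angle per epoch, uniformly between -max_degrees and +max_degrees."""
    rng = random.Random(seed)
    return [rng.uniform(-max_degrees, max_degrees) for _ in range(num_epochs)]

# The same photo would be shown at a slightly different angle in each epoch
angles = sample_rotation_angles(num_epochs=5)
print([round(a, 1) for a in angles])
```

No new files are created; the variation happens on the fly while training.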



In [None]:
image_size = 28

training_transform = transforms.Compose([
    transforms.Resize((image_size, image_size)),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

test_transform = transforms.Compose([
    transforms.Resize((image_size, image_size)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

## Custom Training Data Class

This is where the magic begins to happen. We need to define the dataset class that loads our images and labels. We can set BATCH_SIZE to determine the number of images processed in each batch during training (feel free to adjust this number and see its effect on accuracy).

Once we have defined our class, we use it (together with the batch size) to create our train/test dataset objects and data loaders.
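The batch size also decides how many weight updates happen per epoch. As a rough sketch, assuming the data loader keeps the final, smaller batch (the default behaviour), the number of batches is the dataset size divided by the batch size, rounded up. The image counts below are made-up numbers.

```python
import math

def batches_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once, keeping a smaller last batch."""
    return math.ceil(num_images / batch_size)

print(batches_per_epoch(240, 10))  # 24 full batches
print(batches_per_epoch(245, 10))  # 25 batches, the last one holding only 5 images
```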

In [None]:
BATCH_SIZE = 10

class CustomTrainingData(Dataset):
    def __init__(self, csv_df, class_list, transform=None):
        self.df = csv_df
        self.transform = transform
        self.class_list = class_list

    def __len__(self):
        return self.df.shape[0]

    def __getitem__(self, index):
        try:
            image_path = self.df.iloc[index]["image"]
            image = Image.open(image_path).convert('RGB')
        except Exception as e:
            return self.__getitem__((index + 1) % len(self.df)) 

        label = self.class_list.index(self.df.iloc[index]["label"])

        if self.transform:
            image = self.transform(image)

        return image, label

    
train_data_object = CustomTrainingData(train_df, class_list_, training_transform)
test_data_object = CustomTrainingData(test_df, class_list_, test_transform)

train_loader = DataLoader(train_data_object, batch_size=BATCH_SIZE, shuffle=True, num_workers=0)
test_loader = DataLoader(test_data_object, batch_size=BATCH_SIZE, shuffle=False, num_workers=0)

## Early Stopping

While our model is training on our images, it is possible that it begins to simply memorise the training data, rather than learning patterns. This means that the model becomes extremely accurate on training data, but incredibly inaccurate on any new, unseen data. This problem is called "overfitting" and leads to significant drops in accuracy. We need to prevent this if we want an accurate model!

To prevent this, we define a class below to implement "early stopping". Early stopping keeps track of the loss after each epoch (round) of training. If the loss doesn't improve for 3 epochs in a row (defined by patience=3), we stop the training and restore the best model seen so far. This halts training before the model drifts further into memorising data instead of learning patterns.
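The patience rule can be tried out on a plain list of numbers before wiring it into training. The sketch below is a simplified stand-in for the class in this section (no model, no delta), and the loss values are invented for the example.

```python
def epoch_of_early_stop(losses, patience=3):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss <= best:
            best = loss                      # New best loss: reset the counter
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1  # No improvement this epoch
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Made-up losses: the best value (0.60) is reached at epoch 3
losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.50]
print(epoch_of_early_stop(losses))  # 6
```

Training stops at epoch 6, three epochs after the best loss, so the later value of 0.50 is never reached. A larger patience would trade longer training for a chance to find it.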

In [None]:

import copy

class EarlyStopping:
    def __init__(self, patience=3, delta=0):
        self.counter = 0               # Epochs since the last improvement
        self.best_model_state = None   # Copy of the best weights seen so far
        self.patience = patience
        self.delta = delta
        self.best_score = None
        self.early_stop = False

    def __call__(self, val_loss, model):
        score = -val_loss  # Lower loss means a higher score
        if self.best_score is None:
            self.best_score = score
            # Deep copy: state_dict() returns references that later training would overwrite
            self.best_model_state = copy.deepcopy(model.state_dict())
        elif score < self.best_score + self.delta:
            self.counter += 1
            if self.counter >= self.patience:
                self.early_stop = True
        else:
            self.best_score = score
            self.best_model_state = copy.deepcopy(model.state_dict())
            self.counter = 0

    def load_best_model(self, model):
        model.load_state_dict(self.best_model_state)

## Neural Network Structure 🔥🔥

This is the core of our project; this is where we define HOW data travels through our neural network. We do this by defining the types of layers we are using.

There are many types of layers, each serving its own purpose. These layers work together to pass data through the neural network in each round of training, hopefully increasing the accuracy each time.

https://www.geeksforgeeks.org/deep-learning/layers-in-artificial-neural-networks-ann/ provides an amazing explanation of the purpose of different types of layers.
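One detail worth understanding is how large the data is when it reaches the first fully connected layer. The network in this section has four convolution blocks, each followed by a 2x2 max pool; the convolutions use padding, so only the pooling shrinks the image, halving its width and height each time (rounding down). The sketch below repeats that arithmetic in plain Python; `flattened_size` is a helper invented for this illustration.

```python
def flattened_size(image_size, out_channels=256, num_pools=4):
    """Features left after num_pools rounds of 2x2 max pooling on a square image."""
    side = image_size
    for _ in range(num_pools):
        side = side // 2  # Each 2x2 pool halves the side length, rounding down
    return out_channels * side * side

print(flattened_size(28))   # 28 -> 14 -> 7 -> 3 -> 1, so 256 * 1 * 1 = 256
print(flattened_size(128))  # 128 -> 64 -> 32 -> 16 -> 8, so 256 * 8 * 8 = 16384
```

Because this number depends on the image size, a model must be rebuilt with the same image size it was trained with before its saved weights can be loaded.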


In [None]:

class ImageClassifier(nn.Module):
    def __init__(self, num_classes, input_size=(image_size, image_size), channels=3): 
        super(ImageClassifier, self).__init__()

        self.input_size = input_size
        self.channels = channels

        # Convolutional layers
        self.conv1 = nn.Conv2d(channels, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(128, 256, kernel_size=3, padding=1)

        # Batch normalization layers
        self.bn1 = nn.BatchNorm2d(32)
        self.bn2 = nn.BatchNorm2d(64)
        self.bn3 = nn.BatchNorm2d(128)
        self.bn4 = nn.BatchNorm2d(256)

        # Max pooling layer
        self.pool = nn.MaxPool2d(2, 2)

        # Dropout layer
        self.dropout = nn.Dropout(0.5)

        # Calculate the size of the flattened features
        self._to_linear = None
        self._calculate_to_linear(input_size)

        # Fully connected layers
        self.fc1 = nn.Linear(self._to_linear, 512)
        self.fc2 = nn.Linear(512, num_classes)
        
    def _calculate_to_linear(self, input_size):
        # Pass one dummy image through the conv layers to find the flattened feature size
        with torch.no_grad():  # No gradients are needed for this shape check
            x = torch.randn(1, self.channels, *input_size)
            self.conv_forward(x)

    def conv_forward(self, x):
        x = self.pool(F.relu(self.bn1(self.conv1(x))))
        x = self.pool(F.relu(self.bn2(self.conv2(x))))
        x = self.pool(F.relu(self.bn3(self.conv3(x))))
        x = self.pool(F.relu(self.bn4(self.conv4(x))))

        if self._to_linear is None:
            self._to_linear = x[0].shape[0] * x[0].shape[1] * x[0].shape[2]
        return x

    def forward(self, x):
        x = self.conv_forward(x)

        # Flatten the output for the fully connected layer
        x = x.view(-1, self._to_linear)

        # Fully connected layers with ReLU and dropout
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.fc2(x)

        return x
    

## Hyperparameters and initialisation

Now that we have defined the layers and structure of our neural network, a few more steps remain before we define our training loop. Below, we define some hyperparameters that are used in our training loop.
- Epochs - The number of "rounds" of training
- Learning rate - How large a step the optimiser takes each time it updates the model's weights
- Num Classes - The number of classifications we have (e.g. rock, paper, scissors)
- Input Size - The pixel dimensions of the images we are inputting for training
- Channels - The number of values stored for each pixel (3 for RGB)

We then check if the program has access to a faster processor (e.g. a GPU), and use this if possible.

Finally, we initialise the model using our customised ImageClassifier class!
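To get a feel for the learning rate, the sketch below runs plain gradient descent on the toy function f(w) = w², whose gradient is 2w. This is an illustration only: it is not the optimiser used in this notebook, and the function and starting value are chosen purely for demonstration.

```python
def gradient_descent(start, learning_rate, steps):
    """Minimise f(w) = w**2 (gradient 2*w) with plain gradient descent."""
    w = start
    for _ in range(steps):
        gradient = 2 * w
        w = w - learning_rate * gradient  # Step against the gradient
    return w

print(gradient_descent(1.0, 0.001, 10))  # Tiny steps: still close to 1.0
print(gradient_descent(1.0, 0.1, 10))    # Moderate steps: much closer to the minimum at 0
print(gradient_descent(1.0, 1.1, 10))    # Steps too large: overshoots and moves away from 0
```

A learning rate that is too small makes training slow, while one that is too large can stop the model from settling at all.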

In [None]:
EPOCHS = 100
LEARNING_RATE = 0.001
NUM_CLASSES = len(class_list_)
INPUT_SIZE = (image_size, image_size)
CHANNELS = 3

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# INIT MODEL
model = ImageClassifier(NUM_CLASSES, INPUT_SIZE, CHANNELS).to(device)

## Loss and early stopping

The final step before defining our training loop is to define our loss function and optimiser, and to create an instance of our early stopping class from earlier. The loss function measures how wrong the model's predictions are; it is also the metric that early stopping will use to decide if our model is making progress or should be halted.
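For intuition about what a cross-entropy loss measures, the sketch below computes it by hand for a single image: the raw scores are turned into probabilities with softmax, and the loss is the negative log of the probability given to the correct class. This is a simplified plain-Python version for one example, and the scores are made up.

```python
import math

def cross_entropy(scores, target_index):
    """Cross-entropy loss for one example, given raw class scores."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # Subtract the max for numerical stability
    total = sum(exps)
    probabilities = [e / total for e in exps]       # Softmax
    return -math.log(probabilities[target_index])

scores = [5.0, 0.0, 0.0]  # The model strongly favours class 0
print(round(cross_entropy(scores, 0), 4))  # Small loss: the confident guess was right
print(round(cross_entropy(scores, 1), 4))  # Large loss: the confident guess was wrong
```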

In [None]:
# LOSS FUNCTION
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)

early_stopping = EarlyStopping()

## THE TRAINING LOOP ❗❗❗

We have finally reached the point where we can define the loop that will train our model, tying together all of our hard work so far! A lot is happening here, but below is a summary of each step that this code takes in each epoch of training.

- 1. model.train() puts the model into training mode; each batch of images is then passed through the model, and the weights are updated to reduce the loss
- 2. The average loss for that epoch of training is calculated and printed to the console
- 3. model.eval() puts the model into evaluation mode, and the model makes predictions on the test images
- 4. EarlyStopping decides whether the model is making progress or should be halted.
- 5. The metrics that describe the current accuracy and performance of the model are calculated and printed to the console.

Once the training loop ends - meaning either 100 epochs have run or early stopping has halted the training - the program lets us know by printing "Training finished!" to the console.
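The metrics printed during training can be worked out by hand on a tiny example. The sketch below computes accuracy for all classes, plus precision and recall for a single class, in plain Python. The labels and predictions are invented, and the weighted averaging across classes that scikit-learn performs is left out to keep it short.

```python
def accuracy(true_labels, predictions):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predictions))
    return correct / len(true_labels)

def precision_recall(true_labels, predictions, positive):
    """Precision and recall for one class, treated as the positive class."""
    pairs = list(zip(true_labels, predictions))
    tp = sum(t == positive and p == positive for t, p in pairs)  # Correctly predicted
    fp = sum(t != positive and p == positive for t, p in pairs)  # Predicted, but wrong
    fn = sum(t == positive and p != positive for t, p in pairs)  # Missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

true_labels = [0, 0, 1, 1, 2, 2]
predictions = [0, 1, 1, 1, 2, 0]
print(accuracy(true_labels, predictions))             # 4 of 6 correct
print(precision_recall(true_labels, predictions, 1))  # Precision 2/3, recall 1.0
```

Precision asks "of the images predicted as this class, how many really were?", while recall asks "of the images that really were this class, how many did we find?".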

In [None]:

# Training loop
for epoch in range(EPOCHS):
    model.train()  # Training mode: dropout is active and batch norm updates its statistics
    running_loss = 0.0
    for images, labels in train_loader:
        images = images.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    print(f'Epoch [{epoch+1}/{EPOCHS}], Loss: {running_loss/len(train_loader):.4f}')

    # Validation
    model.eval()  # Evaluation mode: dropout is off and batch norm uses stored statistics
    val_loss = 0.0
    all_predictions = []
    all_labels = []
    with torch.no_grad():
        for images, labels in test_loader:
            images = images.to(device)
            labels = labels.to(device)
            outputs = model(images)
            val_loss += criterion(outputs, labels).item()
            _, predicted = torch.max(outputs, 1)
            all_predictions.extend(predicted.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())
    val_loss /= len(test_loader)

    # Early stopping watches the loss on the held-out images, not the training loss
    early_stopping(val_loss, model)
    if early_stopping.early_stop:
        print("Early stopping")
        break

    # Calculate metrics
    accuracy = 100 * sum(np.array(all_predictions) == np.array(all_labels)) / len(all_labels)
    precision = precision_score(all_labels, all_predictions, average='weighted')
    recall = recall_score(all_labels, all_predictions, average='weighted')
    f1 = f1_score(all_labels, all_predictions, average='weighted')

    # Print metrics
    print(f'Epoch [{epoch+1}/{EPOCHS}]')
    print(f'Accuracy on test set: {accuracy:.2f}%')
    print(f'Precision: {precision:.4f}')
    print(f'Recall: {recall:.4f}')
    print(f'F1 Score: {f1:.4f}')

# Restore the best weights once training has ended
early_stopping.load_best_model(model)

print('Training finished!')

## Saving the model

Congrats!! You have successfully built a neural network in Python that classifies images! Now we simply save the model so that it can be used to predict the classification of new images.

In [None]:
# Save the model
torch.save(model.state_dict(), 'classifier.pth')

## Making predictions

We can now use our model to make predictions on new images!
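The model outputs one score per class, and the prediction is simply the class with the highest score. The sketch below shows that final lookup in plain Python with made-up scores; `predict_class` is a helper name invented for this example.

```python
class_names = ["Rock", "Paper", "Scissors"]

def predict_class(scores, names):
    """Return the name of the class with the highest score."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return names[best_index]

# Made-up scores for one image: the second class has the highest score
print(predict_class([0.2, 2.5, -1.0], class_names))  # Paper
```

For this to work, the order of the class names must be the same as the order used during training.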

In [None]:
# LOAD THE MODEL

# The architecture must match the trained model (same number of classes and image size)
model = ImageClassifier(num_classes=NUM_CLASSES, input_size=INPUT_SIZE)
model.load_state_dict(torch.load('classifier.pth', map_location='cpu'))  # File saved in the previous step
model.eval()  # Evaluation Mode


# PREPARE THE IMAGE

img = Image.open('test.jpg').convert('RGB')  # Load a new, unseen image from your file system

# Apply the same resizing and normalisation that was used for the test set
image_tensor = test_transform(img).unsqueeze(0)  # Add a batch dimension

# MAKE PREDICTION

with torch.no_grad():
    outputs = model(image_tensor)
    _, predicted = torch.max(outputs, 1)
    prediction = class_list_[predicted.item()]  # Map the class index back to its name

print(f"Predicted class: {prediction}") # Outputs the predicted classification of the unseen image