# Applying Machine Learning to Image Classification - Demo


Running this notebook yourself:
* Dependencies (install these first)
    * scikit-learn - https://scikit-learn.org/stable/install.html
    * JupyterLab - https://jupyter.org/install
    * NumPy      - https://numpy.org/install/
    * Matplotlib - https://matplotlib.org/
    * PyTorch & Torchvision    - https://pytorch.org/get-started/locally/
* Testing
    * On Linux, run `jupyter-lab` in a terminal from the same directory as this notebook
        * `jupyter-notebook` should also work.
    * On Windows, running `jupyter-lab` from the command prompt should work, but I have not tested this.
        * See https://jupyter.org/install for more details

In [None]:
import numpy as np 
import matplotlib.pyplot as plt
import time
import random
from sklearn import metrics
from sklearn.metrics import ConfusionMatrixDisplay
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

## 1. Load and visualise data

The data used here is the Fashion-MNIST dataset. More information about the dataset is available in this paper:

[1] Han Xiao, Kashif Rasul, Roland Vollgraf. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. [arXiv:1708.07747](https://arxiv.org/abs/1708.07747)

The dataset is prepared in a similar way to MNIST (a handwritten digit dataset), but here the images are of items of clothing. The images are grayscale, 28x28 pixels in size, and are divided into 10 classes:

0. T-shirt
1. Trousers
2. Pullover
3. Dress
4. Coat
5. Sandal
6. Shirt
7. Sneaker
8. Bag
9. Ankle boot

In [None]:
# Load training data
train_set = torchvision.datasets.FashionMNIST(root='.', download=True, train=True, transform=transforms.ToTensor())

train_image = np.array(train_set.data)
train_label = np.array(train_set.targets)
class_name = train_set.classes

# Load testing data
test_set = torchvision.datasets.FashionMNIST(root='.', download=True, train=False, transform=transforms.ToTensor())

test_image = np.array(test_set.data)
test_label = np.array(test_set.targets)

### What's the balance between training and testing data?

In [None]:
print("training input dimensions: ", train_image.shape)
print("training label dimensions: ", train_label.shape)
print("test input dimensions: ", test_image.shape)
print("test label dimensions: ", test_label.shape)

### What does our data actually look like?

Each row shows 10 samples for one class (row 1 shows 10 `T-shirt` images, row 2 shows 10 `Trousers` images, etc).

In [None]:
grouped_images = [[] for _ in range(10)]

for i in range(len(train_label)):
    grouped_images[train_label[i]].append(train_image[i])

plt.figure(figsize=(10,10))
for g in range(10):
    for i in range(10):
        plt.subplot(10, 10, g*10 + i + 1)
        plt.imshow(grouped_images[g][i], cmap='gray', vmin=0, vmax=255)
        plt.axis("off")


### Is our dataset balanced?

In [None]:
for i in range(10):
    print(F"Size of class {i}: {len(grouped_images[i])}")
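
The same per-class count can be obtained without grouping the images first, for example with `collections.Counter`. The sketch below uses a short made-up label list (`toy_labels`) in place of `train_label`, so the numbers are illustrative only.

```python
from collections import Counter

# Made-up labels standing in for train_label (three classes, nine samples)
toy_labels = [0, 1, 1, 2, 0, 2, 1, 0, 2]

# Count how many samples fall into each class
counts = Counter(toy_labels)
for cls in sorted(counts):
    print(f"Size of class {cls}: {counts[cls]}")

# The dataset is balanced if every class has the same number of samples
is_balanced = len(set(counts.values())) == 1
print("Balanced:", is_balanced)
```

With the real data, `Counter(train_label.tolist())` should give the same numbers as the loop above.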

## 2. Apply machine learning to the problem

### Build the CNN

Our network architecture is based on LeNet (shown below), which has a number of convolutional layers followed by a few fully connected layers.

![](lenet.png)
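
The layer sizes quoted in the code comments of the next cell follow from the standard output-size formula for convolution and pooling layers, `out = floor((in + 2*padding - kernel) / stride) + 1`. The sketch below traces a 28x28 input through the five convolution and pooling layers. The helper names `out_size` and `trace_sizes` are made up for illustration and are not part of PyTorch.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (illustrative helper)."""
    return (size + 2 * padding - kernel) // stride + 1

# Kernel, stride and padding of each layer, mirroring the FashionCNN definition
layers = [
    ("conv1", dict(kernel=5, padding=2)),
    ("pool2", dict(kernel=2, stride=2)),
    ("conv3", dict(kernel=5)),
    ("pool4", dict(kernel=2, stride=2)),
    ("conv5", dict(kernel=5)),
]

def trace_sizes(size, layers):
    """Return the spatial size after each layer, starting from a square input."""
    sizes = {}
    for name, kwargs in layers:
        size = out_size(size, **kwargs)
        sizes[name] = size
    return sizes

sizes = trace_sizes(28, layers)
print(sizes)  # {'conv1': 28, 'pool2': 14, 'conv3': 10, 'pool4': 5, 'conv5': 1}
```

The final 1x1 size is why the output of `conv5` can be flattened directly into a vector with one value per feature map.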

In [None]:
class FashionCNN(nn.Module):
    
    def __init__(self, l1=6, l2=16, l3=120, l4=84):
        super().__init__()
        
        # Take single grayscale image, make 6 feature maps with convolutions
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=l1, kernel_size=5, padding=2)
        # Pool 28x28 outputs to be 14x14
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        # Take 6 14x14 pools, make 16 10x10 feature maps
        self.conv3 = nn.Conv2d(in_channels=l1, out_channels=l2, kernel_size=5)
        # Pool 10x10 layers to be 5x5
        self.pool4 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        # Use 5x5 convolution to make 16 5x5 featuremaps into neurons
        self.conv5 = nn.Conv2d(in_channels=l2, out_channels=l3, kernel_size=5)
        
        self.fc6 = nn.Linear(l3, l4)
        
        self.fc7 = nn.Linear(l4, 10)
        
        self.arch = F"leaky_relu_var_params_({l1}, {l2}, {l3}, {l4})"
        self.version = 0
        
    def forward(self, x):
        
        # (1, 28, 28) to (6, 28, 28)
        x = F.leaky_relu(self.conv1(x))
        #(6, 28, 28) to (6, 14, 14)
        x = self.pool2(x)
        
        # (6, 14, 14) to (16, 10, 10)
        x = F.leaky_relu(self.conv3(x))
        # (16, 10, 10) to (16, 5, 5)
        x = self.pool4(x)
        
        # (16, 5, 5) to (120, 1, 1)
        x = self.conv5(x)
        
        
        # Flatten each sample so it fits the fully connected layer
        # (120, 1, 1) to (1, 120); keeping the batch dimension also works when l3 is not 120
        x = x.view(x.size(0), -1)
        
        # (1, 120) to (1, 84)
        x = F.leaky_relu(self.fc6(x))
        
        # (1, 84) to (1, 10)
        x = self.fc7(x)
        return x

### Define hyperparameters to be used in training

In [None]:

loss_fn = nn.CrossEntropyLoss()

# Upper bound on the number of epochs; training may stop earlier,
# based on performance on the validation set (early stopping)
MAX_EPOCHS = 20

# The optimiser is declared inside train_with_early_stop.
# As I'm using the Adam optimiser, which adapts its step sizes during training,
# I leave the learning rate at its default value.

BATCH_SIZE = 10
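
`nn.CrossEntropyLoss` takes the raw network outputs (logits) together with an integer class label. It applies a log-softmax to the logits and returns the negative log-probability of the correct class. A plain-Python sketch of that calculation for a single sample follows; the function name `cross_entropy` and the example logits are made up for illustration.

```python
import math

def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class, for one sample."""
    # Subtract the maximum for numerical stability; this does not change the result
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# Ten equal scores: every class gets probability 1/10, so the loss is ln(10)
uniform_loss = cross_entropy([0.0] * 10, target=3)

# A confident, correct prediction gives a loss close to zero
confident_loss = cross_entropy([0.0] * 3 + [20.0] + [0.0] * 6, target=3)

print(uniform_loss, confident_loss)
```

With ten classes, a loss near ln(10) ≈ 2.30 means the network is doing no better than assigning equal probability to every class.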

### Train the model

First define helper functions for testing and evaluating the network. Then define a function to train the model.

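At each iteration, the DataLoader supplies a random batch of images and labels from the training subset. The batch is fed into the network, the loss is backpropagated, and the parameters are updated by gradient descent (done by `optimiser.step`). Every 1000 batches the accuracy on the validation set is checked, and training stops early once it has failed to improve for `patience` consecutive checks.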
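
The early-stopping rule used by `train_with_early_stop` can be sketched in isolation: keep the best validation accuracy seen so far and stop once `patience` consecutive checks fail to beat it. The function name `early_stop_index` and the accuracy values below are invented for illustration.

```python
def early_stop_index(val_accuracies, patience=6):
    """Return (index of the check at which training stops, best accuracy seen)."""
    best_acc = 0.0
    checks_without_increase = 0
    for i, acc in enumerate(val_accuracies):
        if acc > best_acc:
            # New best: remember it and reset the counter
            best_acc = acc
            checks_without_increase = 0
        else:
            checks_without_increase += 1
            if checks_without_increase >= patience:
                return i, best_acc
    # Patience never ran out: training runs to the last check
    return len(val_accuracies) - 1, best_acc

# Made-up validation accuracies, one per check
history = [80.1, 84.5, 86.0, 85.2, 85.9, 86.0, 85.5]
stop_at, best = early_stop_index(history, patience=3)
print(stop_at, best)  # 5 86.0
```

Note that equalling the best accuracy does not reset the counter; only a strict improvement does, which matches the `val_acc > best_val_acc` test in the training function.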

In [None]:
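def test_cnn(cnn, loader):
    """Return the predicted class index for every sample in loader, as a NumPy array."""
    predictions = []
    with torch.no_grad():
        for inputs, _ in loader:
            outputs = cnn(inputs)
            # The predicted class is the index of the largest output score
            predictions.append(torch.argmax(outputs, dim=1))
    return torch.cat(predictions).numpy()

def eval_acc_cnn(cnn, loader):
    """Return the accuracy (in percent) of cnn on loader.

    Assumes the loader does not shuffle, so that labels and predictions line up.
    """
    preds = test_cnn(cnn, loader)
    # Collect the true labels in the same order as the predictions
    r_labels = torch.cat([ls for _, ls in loader]).numpy()
    correct = (preds == r_labels).sum().item()
    return 100 * correct / len(r_labels)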

def train_with_early_stop(tr_set, val_set, new_net, patience=6):
    """Train new_net, checking validation accuracy every 1000 batches.

    Stops once the accuracy has not improved for `patience` consecutive checks
    and returns the network restored to its best weights.
    """
    # Optimise the parameters of the network passed in, not the global `net`
    optimiser = optim.Adam(new_net.parameters())
    
    t_l = torch.utils.data.DataLoader(tr_set, batch_size=BATCH_SIZE, shuffle=True, num_workers=2)
    v_l = torch.utils.data.DataLoader(val_set, batch_size=BATCH_SIZE, shuffle=False)
    
    iterations_without_increase = 0
    # state_dict() returns references to the live tensors, so clone them to keep a snapshot
    best_dict = {k: v.clone() for k, v in new_net.state_dict().items()}
    best_val_acc = 0.0
    for epoch in range(MAX_EPOCHS):
        end_training = False
        running_loss = 0.0
        for i, data in enumerate(t_l, 0):
            inputs, labels = data

            optimiser.zero_grad()

            # Forward propagation
            outputs = new_net(inputs)

            # Calculate loss
            loss_val = loss_fn(outputs, labels)

            # Backward propagation
            loss_val.backward()

            # Update parameters
            optimiser.step()

            # Print statistics
            running_loss += loss_val.item()
            if i % 1000 == 999:
                print('[%d, %5d] loss: %.3f' %
                      (epoch + 1, i + 1, running_loss / 1000))
                running_loss = 0.0

                # Check for accuracy change
                val_acc = eval_acc_cnn(new_net, v_l)
                print(F'Val acc: {val_acc}')
                if val_acc > best_val_acc:
                    best_val_acc = val_acc
                    best_dict = {k: v.clone() for k, v in new_net.state_dict().items()}
                    iterations_without_increase = 0
                else:
                    iterations_without_increase += 1
                    if iterations_without_increase >= patience:
                        print('Training ended')
                        end_training = True
                        break
                print()
        if end_training:
            break
    # Restore the weights that gave the best validation accuracy
    new_net.load_state_dict(best_dict)
    return new_net, best_val_acc


net = FashionCNN()

train_sub_set, val_sub_set = torch.utils.data.random_split(train_set, [54000, 6000])

start_train = time.time()
print("Training starting")
net, val_acc = train_with_early_stop(train_sub_set, val_sub_set, net)
end_train = time.time()

print(F"Training took {end_train - start_train} seconds")

### Try the network on the test set


In [None]:
test_loader = torch.utils.data.DataLoader(test_set, batch_size=BATCH_SIZE,
                                         shuffle=False)


start_test = time.time()
# test_cnn is defined in the training cell above
all_preds = test_cnn(net, test_loader)

end_test = time.time()
print(F'Testing took {end_test - start_test} seconds')

### Calculate the accuracy of the predictions

In [None]:

labels = np.array(test_loader.dataset.targets)

correct = (all_preds == labels).sum().item()

print('Accuracy on test set %.2f %%' % (100 * correct / len(labels)))

### Visualise the performance by looking at the confusion matrix

In [None]:
conf_matrix = metrics.confusion_matrix(labels, all_preds)
print(conf_matrix)

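# display_labels must be passed by keyword in recent scikit-learn versions
ConfusionMatrixDisplay(confusion_matrix=conf_matrix, display_labels=class_name).plot()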
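
In the matrix produced by `metrics.confusion_matrix`, row i counts the test images whose true class is i and column j counts those predicted as class j, so correct predictions lie on the diagonal. A minimal pure-Python sketch of that bookkeeping follows, using made-up labels for three classes; `toy_confusion_matrix` is an illustrative name, not a scikit-learn function.

```python
def toy_confusion_matrix(true_labels, predicted_labels, n_classes):
    """Build an n_classes x n_classes table: rows are true classes, columns are predictions."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix

# Made-up labels and predictions for seven samples
true_labels = [0, 0, 1, 1, 2, 2, 2]
predicted = [0, 1, 1, 1, 2, 0, 2]

cm = toy_confusion_matrix(true_labels, predicted, n_classes=3)
for row in cm:
    print(row)

# Accuracy is the sum of the diagonal divided by the number of samples
correct = sum(cm[i][i] for i in range(3))
accuracy = 100 * correct / len(true_labels)
print(f"Accuracy: {accuracy:.2f} %")
```

Large off-diagonal entries point to pairs of classes that the network confuses with each other.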

## 3. Try the network out on real data

I've taken 3 pictures of my own clothes, which fit into the classes the network was trained on. The pictures are formatted in roughly the same way as the training data.

### Display the images

In [None]:
my_t_shirt = plt.imread("real_images/t_shirt_scaled.JPG")
my_trainer = plt.imread("real_images/trainer_scaled.JPG")
my_trousers = plt.imread("real_images/trousers_scaled.JPG")

fig, axis = plt.subplots(1, 3)

axis[0].imshow(my_t_shirt, cmap='gray', vmin=0, vmax=255)
axis[0].set_axis_off()
axis[0].set_title("My T-Shirt")

axis[1].imshow(my_trainer, cmap='gray', vmin=0, vmax=255)
axis[1].set_axis_off()
axis[1].set_title("My Trainer")

axis[2].imshow(my_trousers, cmap='gray', vmin=0, vmax=255)
axis[2].set_axis_off()
axis[2].set_title("My Trousers")

### Try out the network on the real images

In [None]:


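# Stack the three images into one batch of shape (3, 1, 28, 28).
# This assumes each image is a single-channel 28x28 array with pixel values 0-255.
real_images = np.stack([my_t_shirt, my_trainer, my_trousers])
# Scale to [0, 1] to match the ToTensor() transform used on the training data
real_images_tensor = torch.from_numpy(real_images).float().unsqueeze(1) / 255.0

with torch.no_grad():
    real_images_outputs = net(real_images_tensor)

# Convert to plain Python ints so they print cleanly and index class_name directly
real_images_predictions = torch.argmax(real_images_outputs, 1).tolist()

print(F"T-Shirt prediction: {class_name[real_images_predictions[0]]} (class {real_images_predictions[0]})")
print(F"Trainer prediction: {class_name[real_images_predictions[1]]} (class {real_images_predictions[1]})")
print(F"Trousers prediction: {class_name[real_images_predictions[2]]} (class {real_images_predictions[2]})")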


### Did it work?