##  Classification - Lost in the Closet


### 1 Introduction

Dataset - Fashion_MNIST

In [192]:
!pip install torchvision
!pip install matplotlib



-------------------------------------------------------------------------------------------------------------------------------

### 3 Lost in the closet (Classification)
**You are an artist who secluded yourself for years to come up with the perfect design for a new brand
of clothes. However, your time off from civilisation was not so beneficial since you cannot distinguish
a T-shirt from a dress or a sneaker from a sandal any more. In order to address that issue, you choose
to train a Convolutional Neural Network (using PyTorch) that will help you identify each cloth to
match the perfect design you created. In order to train it, you decide to rely on the dataset fashion
MNIST (https://github.com/zalandoresearch/fashion-mnist).**
**You can access the data using the following lines (we strongly advise you to copy this code from the
fashion mnist.py file attached to this coursework)**

#### Installing Required Packages

In [193]:
import torchvision
import torchvision.transforms as transforms
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
import os

**1. Given the problem, what is the most appropriate loss function to use? Provide the name of the loss, its formula and the formula interpretation in your report.**

**2. First convolutional layer: Kernel size (5 × 5), Stride size (1 × 1) and 32 output channels. Activation function)**

**3. Max pooling layer: Kernel size (2 × 2) and Stride size (2 × 2).**

**4. Second convolutional layer: Kernel size (5×5), Stride size (1 × 1) and 64 output channels.**

**5. Max pooling layer: Kernel size (2 × 2) and Stride size (2 × 2).**

**6. First fully-connected layer with input size being the output size of max pooling layer in 5. (flattened, i.e. 1024) and output size 1024.**

**7. Second fully-connected layer with input size being the output size of fully connected layer in 6. (i.e. 1024) and output size 256.**

**8. Output layer with input size being the output size of fully-connected layer in 7. (i.e. 256) and output size 10.**

**For training, initialise your weights using the Xavier Uniform initialisation, use ReLU as the
activation function, a learning rate of 0.1 with the SGD optimiser. You will train your neural
network for 30 epochs. In your report, provide the following: (a) final (train and test) accuracy
obtained; (b) plot of the accuracy on the training and test sets per each epoch, comment on the
speed of performance changes across epochs; (c) plot of the train loss per epoch (total sum of
per batch losses for each epoch) and comment on the speed of decr**ease.

In [194]:
train_set = torchvision.datasets.FashionMNIST(root=".", train=True, download=True, transform=transforms.ToTensor())
test_set = torchvision.datasets.FashionMNIST(root=".", train=False, download=True, transform=transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=32, shuffle=False)

# Fix the seed to be able to get the same randomness across runs and hence reproducible outcomes
torch.manual_seed(0)
# If you are using CuDNN, otherwise you can just ignore
torch.cuda.manual_seed(0)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

In [195]:
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")#using T4 GPU to make it faster

# Store results in "outputs" folder
if not os.path.exists("outputs"):
    os.makedirs("outputs")
    print("Outputs folder created")
else:
    print("Outputs folder already exists")

Outputs folder already exists


In [196]:
class Fashion_MNIST_CNN(nn.Module):
    def __init__(self, activation="relu", dropout_rate=0.0):
        super(Fashion_MNIST_CNN, self).__init__()

        self.num_classes = 10
        if activation == "relu":
            self.activation = nn.ReLU()
        elif activation == "elu":
            self.activation = nn.ELU()
        elif activation == "tanh":
            self.activation = nn.Tanh()
        elif activation == "sigmoid":
            self.activation = nn.Sigmoid()

        #3.1 Input = 28 * 28 * 1
        self.cnn_layer = nn.Sequential(
            #3.2 First convolution layer:Kernel size(5 x 5), Stride size(1 x 1) and 32 output channels.
            #Activation function => ReLU
            nn.Conv2d(1, 32, kernel_size=5, stride=1),
            self.activation,

            #3.3 Max pooling layer: Kernel size(2 x 2) and Stride size(2 x 2).
            nn.MaxPool2d(kernel_size=2, stride=2),

            #3.4 Second convolutional layer: Kernel size (5×5), Stride size (1 × 1) and 64 output channels.
            #Activation function => ReLU
            nn.Conv2d(32, 64, kernel_size=5, stride=1),
            self.activation,

            #3.5 Max pooling layer: Kernel size (2 × 2) and Stride size (2 × 2).
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

        self.fcc = nn.Sequential(
            #3.6 First fully-connected layer with input size being the output size of max pooling layer in 5. (flattened, i.e. 1024) and output size 1024.
            #input = flattened output of second convolutional layer = 64 * 4 * 4
            #64 = output channels of second convolutional layer
            #4 * 4 = output size of second convolutional layer
            #output = 1024
            #Activation function = ReLU
            nn.Linear(64 * 4 * 4, 1024), #
            self.activation,

            #3.7 Second fully-connected layer with input size being the output size of fully connected layer in 6. (i.e. 1024) and output size 256.
            #input = 1024
            #output = 256
            #activation function = ReLU
            nn.Linear(1024, 256),
            self.activation,

            #Add dropout layer
            nn.Dropout(p=dropout_rate),

            #3.8 Output layer with input size being the output size of fully-connected layer in 7 and output size 10.
            #output layer
            #input = 256
            #output = self.num_classes
            #Activation function => softmax
            nn.Linear(256, self.num_classes)
        )
    
    def forward(self, x):
        # first convolutional layer
        x = self.cnn_layer(x)

        # flatten the output of second convolutional layer
        x = x.view(x.size(0), -1)

        # fully connected layers
        x = self.fcc(x)

        return x

In [197]:
def evaluation(model, dataloader):
    total, correct = 0,0
    model.eval()
    # TO DO
    for data in dataloader:
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        _, pred = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (pred == labels).sum().item()

    accuracy = 100 * correct / total
    return accuracy

In [198]:
def weights_init(layer):
    # xaiver uniform initialisation
    if isinstance(layer, nn.Linear) or isinstance(layer, nn.Conv2d):
        torch.nn.init.xavier_normal_(layer.weight)

In [199]:
def train(model, train_loader, test_loader, lr, epochs):
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    train_losses = []
    train_acc = []
    test_acc = []

    for epoch in range(epochs):
        running_loss = 0.0

        model.train()
        for i, data in enumerate(train_loader, 0):
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()

        train_accuracy = evaluation(model, train_loader)
        train_acc.append(train_accuracy)
        train_losses.append(running_loss)

        test_accuracy = evaluation(model, test_loader)
        test_acc.append(test_accuracy)

        print(f"Epoch [{epoch + 1}/{epochs}] - Train Loss: {running_loss:.4f} - Train Acc: {train_accuracy:.2f}% - Test Acc: {test_accuracy:.2f}%")

    return model, train_losses, train_acc, test_acc

In [200]:
#3.1
def fashion_mnist_model_training(activation="relu", lr=0.1, epochs=30, dropout_rate=0.0):
    model = Fashion_MNIST_CNN(activation=activation, dropout_rate=dropout_rate).to(device)
    model.apply(weights_init)
    # start training
    model, train_losses, train_acc, test_acc = train(model, train_loader, test_loader, lr=lr, epochs=epochs)
    torch.save(model.state_dict(), "outputs/model.pth")

    #3.2 (a) Final (train and test) accuracy obtained
    final_train_accuracy = train_acc[-1]
    final_valid_accuracy = test_acc[-1]
    print(f"Final loss: {train_losses[-1]}")
    print(f"Final train accuracy: {final_train_accuracy}")
    print(f"Final valid accuracy: {final_valid_accuracy}")


    #(b) Plot accuracy on the training and test sets per each epochtorch.nn.CrossEntropyLoss

    plt.figure()
    plt.plot(train_acc, label="Train Accuracy")
    plt.plot(test_acc, label="Validation Accuracy")
    plt.title("Accuracy vs Epoch")
    plt.xlabel("Epoch")
    plt.ylabel("Accuracy")
    plt.legend()
    acc_filename = f"outputs/accuracy_{activation}{lr}{dropout_rate}.jpg"
    plt.savefig(acc_filename)
    plt.close()  # Close the figure to prevent it from displaying in a notebook environment
    
    
    #(c) Plot of the train loss per epoch
    plt.figure()
    plt.plot(train_losses, label="Training Loss")
    plt.title("Loss vs Epoch")
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.legend()
    loss_plot_filename = f"outputs/loss_{activation}{lr}{dropout_rate}.jpg"
    plt.savefig(loss_plot_filename)
    plt.close()
    

**3.3 Run three experiments each time changing all the current activation functions to one of the
following: Tanh, Sigmoid and ELU. In your report, provide only the final classification accuracy
values (train and test) per activation function and comment on the result**

In [201]:
# Question 3.3
def experimenting_with_activations():
    # Try different activations and see how they affect the training and test accuracy
    activations = ["relu", "elu", "tanh", "sigmoid"]
    #print("---------------------------------------------------")
    print("Experimenting with different activations")
    for activation in activations:
        print(f"Using Activation: {activation}")
        fashion_mnist_model_training(activation=activation)
    #print("----------------------------------------------------")

**3.4 Keeping ReLU, use 5 different learning rates: 0.001, 0.1, 0.5, 1, 10. In your report, provide the
final train loss, as well as the final accuracy values for both train and test for each learning rate
and comment on the trade-offs between speed and stability of convergence. Comment on why
you get the Nan loss if any.**

In [202]:
# Question 3.4
def experimenting_with_learning_rates():
    # Try different learning rates and see how they affect the training and test accuracy
    learning_rates = [0.001, 0.1, 0.5, 1, 10]
    activation = "relu"
    print("------------------------------------------------------")
    print("Experimenting with different learning rates")
    for lr in learning_rates:
        print(f"Using Learning rate: {lr} and Activation: {activation}")
        fashion_mnist_model_training(activation=activation, lr=lr)
    print("-------------------------------------------------------")


**3.5 Add a dropout of 0.3 rate on the second fully connected layer (keeping ReLU and learning rate
0.1). In your report, provide the final train and test accuracy values and explain how the dropout
affects the performance**

In [203]:
# Question 3.5
def experimenting_dropout():
    # Check how dropout affects the performance
    dropout_rate = 0.3
    lr = 0.1
    activation = "relu"
    print("-------------------------------------------------------")
    print("Experimenting with dropout")
    print(f"Using dropout rate: {dropout_rate}")
    fashion_mnist_model_training(activation=activation, lr=lr, epochs=30, dropout_rate=dropout_rate)

In [204]:
# Question 3.3
experimenting_with_activations()

Experimenting with different activations
Using Activation: relu
Epoch [1/30] - Train Loss: 979.4951 - Train Acc: 87.01% - Test Acc: 85.76%
Epoch [2/30] - Train Loss: 614.6555 - Train Acc: 89.08% - Test Acc: 87.53%
Epoch [3/30] - Train Loss: 520.5550 - Train Acc: 89.98% - Test Acc: 88.19%
Epoch [4/30] - Train Loss: 465.1674 - Train Acc: 91.62% - Test Acc: 89.37%
Epoch [5/30] - Train Loss: 418.1237 - Train Acc: 92.61% - Test Acc: 89.68%
Epoch [6/30] - Train Loss: 374.2790 - Train Acc: 93.77% - Test Acc: 90.35%
Epoch [7/30] - Train Loss: 342.5053 - Train Acc: 94.40% - Test Acc: 90.50%
Epoch [8/30] - Train Loss: 306.7180 - Train Acc: 94.41% - Test Acc: 90.68%
Epoch [9/30] - Train Loss: 279.6757 - Train Acc: 95.62% - Test Acc: 90.67%
Epoch [10/30] - Train Loss: 248.6360 - Train Acc: 95.96% - Test Acc: 91.06%
Epoch [11/30] - Train Loss: 224.8042 - Train Acc: 96.72% - Test Acc: 91.32%
Epoch [12/30] - Train Loss: 203.5414 - Train Acc: 96.88% - Test Acc: 90.89%
Epoch [13/30] - Train Loss: 183.0

In [205]:
#Question 3.4
experimenting_with_learning_rates()

------------------------------------------------------
Experimenting with different learning rates
Using Learning rate: 0.001 and Activation: relu
Epoch [1/30] - Train Loss: 3143.8315 - Train Acc: 66.77% - Test Acc: 66.64%
Epoch [2/30] - Train Loss: 1550.0608 - Train Acc: 69.69% - Test Acc: 68.83%
Epoch [3/30] - Train Loss: 1362.8149 - Train Acc: 74.72% - Test Acc: 73.99%
Epoch [4/30] - Train Loss: 1253.4206 - Train Acc: 75.74% - Test Acc: 75.27%
Epoch [5/30] - Train Loss: 1173.3805 - Train Acc: 77.32% - Test Acc: 76.54%
Epoch [6/30] - Train Loss: 1115.1412 - Train Acc: 79.83% - Test Acc: 78.49%
Epoch [7/30] - Train Loss: 1064.5458 - Train Acc: 78.56% - Test Acc: 77.87%
Epoch [8/30] - Train Loss: 1023.2054 - Train Acc: 79.89% - Test Acc: 78.60%
Epoch [9/30] - Train Loss: 984.2584 - Train Acc: 82.12% - Test Acc: 81.10%
Epoch [10/30] - Train Loss: 952.5296 - Train Acc: 80.89% - Test Acc: 79.63%
Epoch [11/30] - Train Loss: 925.4501 - Train Acc: 82.97% - Test Acc: 82.12%
Epoch [12/30] - Tr

In [206]:
#Question 3.5
experimenting_dropout()

-------------------------------------------------------
Experimenting with dropout
Using dropout rate: 0.3
Epoch [1/30] - Train Loss: 1062.9818 - Train Acc: 84.95% - Test Acc: 84.14%
Epoch [2/30] - Train Loss: 649.4358 - Train Acc: 89.45% - Test Acc: 87.96%
Epoch [3/30] - Train Loss: 560.0340 - Train Acc: 89.14% - Test Acc: 87.39%
Epoch [4/30] - Train Loss: 501.0179 - Train Acc: 90.74% - Test Acc: 88.72%
Epoch [5/30] - Train Loss: 455.7441 - Train Acc: 92.33% - Test Acc: 90.26%
Epoch [6/30] - Train Loss: 414.8983 - Train Acc: 92.97% - Test Acc: 90.14%
Epoch [7/30] - Train Loss: 383.0813 - Train Acc: 92.96% - Test Acc: 89.98%
Epoch [8/30] - Train Loss: 354.0021 - Train Acc: 93.70% - Test Acc: 90.25%
Epoch [9/30] - Train Loss: 319.4971 - Train Acc: 94.65% - Test Acc: 90.97%
Epoch [10/30] - Train Loss: 302.9978 - Train Acc: 95.27% - Test Acc: 90.62%
Epoch [11/30] - Train Loss: 269.1865 - Train Acc: 95.31% - Test Acc: 90.86%
Epoch [12/30] - Train Loss: 255.3331 - Train Acc: 95.80% - Test A