# **Practical**
This notebook implements a Convolutional Neural Network (CNN) in PyTorch for image classification using the FashionMNIST dataset.

---
### Overview:
- Importing necessary libraries.
- Loading and preprocessing the dataset.
- Defining CNN architectures with variations.
- Training and evaluating the models.
- Visualizing the training and testing accuracy.

## **1. Importing Libraries**
The necessary libraries for building and training the CNN are imported. We use:
- `torch` for creating and training neural networks.
- `numpy` for numerical computations.
- `matplotlib` for visualization.
- `torchvision` for accessing datasets and transformations.

In [None]:
import torch
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

from torchvision import datasets
import torchvision.transforms as transforms
from torch.utils.data.sampler import SubsetRandomSampler
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

## **2. Check CUDA Availability**
We check if a GPU is available for faster training.

In [None]:
train_on_gpu = torch.cuda.is_available()

if not train_on_gpu:
    print('CUDA is not available.  Training on CPU ...')
else:
    print('CUDA is available!  Training on GPU ...')

## **3. Data Loading and Preprocessing**
We define:
- `num_workers`: Number of subprocesses for data loading.
- `batch_size`: Number of samples per batch.
- Transformations: Convert images to tensors and apply normalization.
- Load the FashionMNIST dataset and shuffle training indices.
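For reference, `transforms.ToTensor()` scales 8-bit pixel values from [0, 255] to [0.0, 1.0], and `transforms.Normalize(mean, std)` then computes `(x - mean) / std` per channel. Assuming `mean = 0.5` and `std = 0.5`, pixels land in [-1, 1], and the unnormalize step `img / 2 + 0.5` in `imshow` (Section 4) inverts that mapping. The NumPy sketch below only mimics the arithmetic; the helper names are hypothetical and this is not torchvision's implementation:

```python
import numpy as np

def to_tensor_scale(pixels):
    """Mimic ToTensor's value scaling: uint8 [0, 255] -> float [0, 1]."""
    return pixels.astype(np.float32) / 255.0

def normalize(x, mean=0.5, std=0.5):
    """Mimic Normalize for a single channel: (x - mean) / std."""
    return (x - mean) / std

def unnormalize(x):
    """Inverse of normalize(mean=0.5, std=0.5), as used for display."""
    return x / 2 + 0.5

pixels = np.array([0, 128, 255], dtype=np.uint8)
scaled = to_tensor_scale(pixels)   # about [0.0, 0.502, 1.0]
normed = normalize(scaled)         # about [-1.0, 0.004, 1.0]
restored = unnormalize(normed)     # back to the scaled values
print(normed)
```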

In [None]:
num_workers = 0   # number of subprocesses for data loading (0 = load in the main process)
batch_size = 60   # samples per batch

# Convert images to tensors in [0, 1], then normalize them to [-1, 1]
train_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])
test_transform = train_transform

# The first argument is the root directory where torchvision downloads and stores the dataset
train_data = datasets.FashionMNIST('data', train=True, download=True, transform=train_transform)
test_data = datasets.FashionMNIST('data', train=False, download=True, transform=test_transform)

# obtain shuffled training indices
num_train = len(train_data)
train_idx = list(range(num_train))
np.random.shuffle(train_idx)

# define a sampler that draws the training samples in random order
train_sampler = SubsetRandomSampler(train_idx)

# prepare data loaders (combine dataset and sampler)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, sampler=train_sampler, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, num_workers=num_workers)

classes = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
           "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

## **4. Data Visualization**
We display a batch of training images to visually confirm the data is loaded correctly.

In [None]:
def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    plt.imshow(np.squeeze(img), cmap='gray')  # (1, 28, 28) -> (28, 28) grayscale image

dataiter = iter(train_loader)
images, labels = next(dataiter)
images = images.numpy()

# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(20):
    ax = fig.add_subplot(2, 10, idx + 1, xticks=[], yticks=[])
    imshow(images[idx])
    ax.set_title(classes[labels[idx]])

## **5. Defining CNN Architectures**
We define three CNN variants: a baseline without modifications, one with dropout, and one with batch normalization. These architectures let us compare the effect of different regularization techniques.


### 1. Base Class (`BaseNet`)

The `BaseNet` class contains the core components that are shared across the different models. It defines:
- Two convolutional layers (`conv1` and `conv2`).
- An average pooling layer (`pool`).
- Three fully connected layers (`fc1`, `fc2`, `fc3`).

Additionally, it implements a `common_forward` method that processes the input through these layers in the forward pass. This method can be inherited by other models to reuse the common structure.
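The input size of `fc1` (16 * 5 * 5 = 400) follows from the usual output-size formula for convolution and pooling layers, out = floor((in + 2·padding - kernel) / stride) + 1. The sketch below traces a 28×28 FashionMNIST image through the layers defined in `BaseNet`; the helper `out_size` is hypothetical and assumes dilation 1:

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28
size = out_size(size, kernel=5, padding=2)  # conv1: 28 -> 28
size = out_size(size, kernel=2, stride=2)   # pool:  28 -> 14
size = out_size(size, kernel=5)             # conv2: 14 -> 10
size = out_size(size, kernel=2, stride=2)   # pool:  10 -> 5

flat_features = 16 * size * size            # conv2 produces 16 channels
print(size, flat_features)                  # 5 400
```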


In [None]:
class BaseNet(nn.Module):
    def __init__(self):
        super(BaseNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5, padding=2)
        self.conv2 = nn.Conv2d(6, 16, 5, padding=0)
        self.pool = nn.AvgPool2d(2, 2)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
    
    def common_forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        # return raw class scores (logits); nn.CrossEntropyLoss applies log-softmax internally
        return self.fc3(x)

### 2. Derived Models

- **`Net_dropout`**: This model inherits from `BaseNet` and adds dropout (`p = 0.4`) before each fully connected layer to reduce overfitting. During training, dropout randomly sets a fraction `p` of the activations to zero; in evaluation mode (`model.eval()`) it is disabled.
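PyTorch's `nn.Dropout(p)` uses inverted dropout: in training mode each element is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`, so the expected activation stays the same, while in evaluation mode it is the identity. A NumPy sketch of that behavior with the notebook's `p = 0.4` (the function `dropout` is illustrative, not PyTorch's code):

```python
import numpy as np

def dropout(x, p, training, rng):
    """Inverted dropout: zero with probability p and rescale survivors while training."""
    if not training or p == 0.0:
        return x
    keep_mask = rng.random(x.shape) >= p     # True with probability 1 - p
    return x * keep_mask / (1.0 - p)

rng = np.random.default_rng(0)
x = np.ones(100_000)

y_train = dropout(x, p=0.4, training=True, rng=rng)
y_eval = dropout(x, p=0.4, training=False, rng=rng)

print(round((y_train == 0).mean(), 2))  # fraction dropped, close to 0.4
print(round(y_train.mean(), 2))         # mean preserved, close to 1.0
```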


In [None]:
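class Net_dropout(BaseNet):
    def __init__(self):
        super(Net_dropout, self).__init__()
        self.dropout = nn.Dropout(0.4)

    def forward(self, x):
        # Same convolutional feature extractor as BaseNet
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        # Apply dropout to the input of each fully connected layer
        x = F.relu(self.fc1(self.dropout(x)))
        x = F.relu(self.fc2(self.dropout(x)))
        return self.fc3(self.dropout(x))  # raw class scores (logits)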

- **`Net`**: A simpler model that directly uses the `common_forward` method from `BaseNet` without any additional modifications.

In [None]:
class Net(BaseNet):
    def __init__(self):
        super(Net, self).__init__()
    
    def forward(self, x):
        return self.common_forward(x)

- **`Net_BatchNorm`**: This model also inherits from `BaseNet`, but it adds batch normalization after each convolutional layer (before the ReLU). Batch normalization normalizes each channel's activations using the mean and variance of the current mini-batch, which typically stabilizes training and often allows faster convergence.
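In training mode, `nn.BatchNorm2d(C)` normalizes each of the C channels with the mean and (biased) variance computed over the batch and both spatial dimensions, then applies a learnable per-channel scale `gamma` and shift `beta`; in evaluation mode it uses running averages of those statistics instead. A NumPy sketch of the training-mode computation, assuming PyTorch's default `eps=1e-5` and the initial values `gamma = 1`, `beta = 0`:

```python
import numpy as np

def batch_norm_2d(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for x of shape (N, C, H, W)."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)   # per-channel mean
    var = x.var(axis=(0, 2, 3), keepdims=True)     # per-channel biased variance
    x_hat = (x - mean) / np.sqrt(var + eps)
    # gamma and beta have shape (C,); reshape them to broadcast over N, H, W
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

rng = np.random.default_rng(0)
# A fake batch shaped like conv1's output: 60 images, 6 channels, 28x28
x = rng.normal(loc=3.0, scale=2.0, size=(60, 6, 28, 28))
y = batch_norm_2d(x, gamma=np.ones(6), beta=np.zeros(6))

print(y.mean(axis=(0, 2, 3)).round(3))  # about 0 for every channel
print(y.std(axis=(0, 2, 3)).round(3))   # about 1 for every channel
```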


In [None]:
class Net_BatchNorm(BaseNet):
    def __init__(self):
        super(Net_BatchNorm, self).__init__()
        self.bn1 = nn.BatchNorm2d(6)
        self.bn2 = nn.BatchNorm2d(16)
    
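    def forward(self, x):
        # Batch normalization is applied to each conv output before the ReLU
        x = self.pool(F.relu(self.bn1(self.conv1(x))))
        x = self.pool(F.relu(self.bn2(self.conv2(x))))
        x = x.view(-1, 16 * 5 * 5)
        # Fully connected classifier head shared with BaseNet
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)  # raw class scores (logits)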

## **6. Training Various CNN Models**
After defining the models, we proceed with training them using different configurations. Each model is trained for 15 epochs with the following setup:

### 1. Model Definition and Optimizer Setup
We define the models in the `models` list, which includes:
- `model_origin`: The basic model without dropout or batch normalization.
- `model_dropout`: The model with dropout to prevent overfitting.
- `model_batchnorm`: The model with batch normalization for improved training.
- `model_weightdecay`: The basic model with added weight decay for regularization.

For each model, we use `CrossEntropyLoss` as the loss function and `Adam` (learning rate 0.002) as the optimizer. For `model_weightdecay`, the optimizer additionally applies a weight decay of 0.0002, an L2 penalty that discourages large weights.
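In `torch.optim.Adam`, the `weight_decay` argument adds an L2 term to the gradient (`grad + weight_decay * theta`) before the moment estimates are updated; this is the classic coupled L2 penalty, not the decoupled decay used by `torch.optim.AdamW`. A NumPy sketch of one such step, assuming Adam's default `betas=(0.9, 0.999)` and `eps=1e-8` together with the notebook's `lr=0.002` and `weight_decay=0.0002` (the function `adam_step` is illustrative, not PyTorch's source):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.002, betas=(0.9, 0.999),
              eps=1e-8, weight_decay=0.0):
    """One Adam update with coupled L2 weight decay; t is the 1-based step count."""
    beta1, beta2 = betas
    grad = grad + weight_decay * theta          # L2 penalty folded into the gradient
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias corrections
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([1.0, -2.0])
zeros = np.zeros_like(theta)

# With a zero loss gradient, only the weight-decay term moves the weights,
# pushing them toward zero by roughly lr on this first step.
new_theta, m, v = adam_step(theta, np.zeros(2), zeros, zeros, t=1,
                            weight_decay=0.0002)
print(new_theta)  # approximately [0.998, -1.998]
```

Because Adam divides by the running gradient magnitude, the effect of the coupled penalty is rescaled per parameter, which is the motivation usually given for AdamW's decoupled variant.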


In [None]:
# create the CNN variants to compare
model_origin = Net()
model_dropout = Net_dropout()
model_batchnorm = Net_BatchNorm()
model_weightdecay = Net()

models = [(model_origin,"model_origin"), (model_dropout,"model_dropout"),
          (model_batchnorm,"model_batchnorm"),(model_weightdecay,"model_weightdecay")]

print(model_origin)
print(model_dropout)
print(model_batchnorm)
print(model_weightdecay)

### 2. Training Loop
We train each model for 15 epochs. During each epoch, we:
- Set the model to training mode.
- Load the data and perform a forward pass to compute the output.
- Calculate the loss and perform a backward pass to update the model parameters.
- Track the training loss and save the model weights after each epoch.
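`nn.CrossEntropyLoss` takes unnormalized class scores (logits) and integer labels, applies log-softmax internally, and by default returns the mean negative log-likelihood over the batch. Because that value is a per-batch mean, the loop multiplies `loss.item()` by `data.size(0)` and later divides by the number of training samples to get a per-sample average for the epoch. A NumPy sketch of the loss itself (the function `cross_entropy` is illustrative):

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean negative log-likelihood of integer targets under softmax(logits)."""
    # Subtract the row maximum before exponentiating for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# Uniform scores over 10 classes give log(10), about 2.3026,
# roughly where an untrained 10-class classifier starts.
uniform = np.zeros((4, 10))
print(cross_entropy(uniform, np.array([0, 3, 5, 9])))

# A confident, correct prediction gives a loss close to zero.
confident = np.full((1, 10), -10.0)
confident[0, 2] = 10.0
print(cross_entropy(confident, np.array([2])))
```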


In [None]:
# number of epochs to train each model
n_epochs = 15

# specify loss function (categorical cross-entropy)
criterion = nn.CrossEntropyLoss()

# train the different models
for model, name in models:
    print(name)

    # move the model to the GPU (if available) before creating its optimizer
    if train_on_gpu:
        model.cuda()

    # specify optimizer (only model_weightdecay uses weight decay)
    if name == "model_weightdecay":
        optimizer = optim.Adam(model.parameters(), lr=0.002, weight_decay=0.0002)
    else:
        optimizer = optim.Adam(model.parameters(), lr=0.002)

    for epoch in range(1, n_epochs + 1):
        # keep track of the training loss
        train_loss = 0.0

        ###################
        # train the model #
        ###################
        model.train()
        for data, target in train_loader:
            # move tensors to GPU if CUDA is available
            if train_on_gpu:
                data, target = data.cuda(), target.cuda()
            # clear the gradients of all optimized variables
            optimizer.zero_grad()
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # calculate the batch loss
            loss = criterion(output, target)
            # backward pass: compute gradient of the loss with respect to model parameters
            loss.backward()
            # perform a single optimization step (parameter update)
            optimizer.step()
            # accumulate the summed (not mean) loss over the batch
            train_loss += loss.item() * data.size(0)

        # average loss per training sample
        train_loss = train_loss / len(train_loader.sampler)

        # print training statistics
        print('Epoch: {} \tTraining Loss: {:.6f}'.format(epoch, train_loss))

        # save the model weights after this epoch
        torch.save(model.state_dict(), f'{name}_{epoch}.pt')

## **7. Accuracy Evaluation of Each Model**

After training the models, we evaluate their performance on both training and testing datasets. For each model:
- We load the saved model weights from each epoch.
- We calculate the accuracy on both the training and testing datasets.
- We print the accuracy for each class and the overall accuracy.
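The per-class bookkeeping in the evaluation loop can also be expressed with `np.bincount`: counting the labels gives the per-class totals, and counting the labels weighted by a correct/incorrect mask gives the per-class hits. A minimal sketch on made-up predictions (the helper `per_class_accuracy` is hypothetical):

```python
import numpy as np

def per_class_accuracy(preds, labels, num_classes=10):
    """Return (correct, total) count arrays with one entry per class."""
    hits = (preds == labels).astype(float)
    correct = np.bincount(labels, weights=hits, minlength=num_classes)
    total = np.bincount(labels, minlength=num_classes)
    return correct.astype(int), total

labels = np.array([0, 0, 1, 1, 1, 9])
preds = np.array([0, 1, 1, 1, 0, 9])
correct, total = per_class_accuracy(preds, labels)

print(correct[:2], total[:2])               # [1 2] [2 3]
print(100.0 * correct.sum() / total.sum())  # overall accuracy, about 66.67
```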

In [None]:
# Accuracy of each model: accuracy[model][epoch] = [train_accuracy, test_accuracy]
accuracy = []

# Specify loss function (categorical cross-entropy)
criterion = nn.CrossEntropyLoss()
word_list = ["Train", "Test"]

# Testing various models
for model, name in models:
    print(f"\n{name}\n")

    # List of accuracy per epoch with items: [train_accuracy, test_accuracy]
    accuracy_model = []

    for epoch in range(1, n_epochs + 1):
        print(f"\n*** Epoch number: {epoch} ***\n")
        # Load the weights saved at the end of this epoch
        model.load_state_dict(torch.load(f'{name}_{epoch}.pt'))
        # Evaluation mode: dropout is disabled and BatchNorm uses running statistics
        model.eval()

        # List of current epoch's accuracies
        epoch_accuracy = []

        # Evaluate on the training set, then on the test set
        for split, loader in zip(word_list, [train_loader, test_loader]):
            total_loss = 0.0
            class_correct = [0] * 10
            class_total = [0] * 10

            with torch.no_grad():  # gradients are not needed for evaluation
                for data, target in loader:
                    # Move tensors to GPU if CUDA is available
                    if train_on_gpu:
                        data, target = data.cuda(), target.cuda()
                    # Forward pass and summed batch loss
                    output = model(data)
                    total_loss += criterion(output, target).item() * data.size(0)
                    # Predicted class = index of the highest score
                    pred = output.argmax(dim=1)
                    correct = pred.eq(target)
                    # Accumulate per-class counts
                    for label, is_correct in zip(target.tolist(), correct.tolist()):
                        class_correct[label] += int(is_correct)
                        class_total[label] += 1

            # Average loss over the split that was actually evaluated
            total_loss /= len(loader.dataset)
            print(f'--- {split} Loss: {total_loss:.6f} ---\n')

            for i in range(10):
                if class_total[i] > 0:
                    print(f'{split} Accuracy of {classes[i]}: '
                          f'{100 * class_correct[i] / class_total[i]:.2f}% '
                          f'({class_correct[i]}/{class_total[i]})')
                else:
                    print(f'{split} Accuracy of {classes[i]}: N/A (no examples)')

            # Overall accuracy for this split, added to "epoch_accuracy"
            overall_accuracy = 100. * sum(class_correct) / sum(class_total)
            epoch_accuracy.append(overall_accuracy)

            print(f'\n ### {split} Accuracy (Overall): {overall_accuracy:.2f}% '
                  f'({sum(class_correct)}/{sum(class_total)}) ###\n')

        # Adding epoch accuracies to "accuracy_model"
        accuracy_model.append(epoch_accuracy)

    # Adding accuracy_model to "accuracy"
    accuracy.append(accuracy_model)


## **8. Plotting Model Accuracy**

After evaluating the models, we plot the training and testing accuracies over all epochs for each model. This allows us to visualize the performance of each model during the training process.

In [None]:
epochs = list(range(1, n_epochs + 1))

# Iterating through models' accuracies
for (model, name), model_acc in zip(models, accuracy):
    print(name)

    # Extracting train and test accuracies
    epoch_train_acc = [acc[0] for acc in model_acc]
    epoch_test_acc = [acc[1] for acc in model_acc]

    # Plotting accuracy over epochs
    plt.plot(epochs, epoch_train_acc, label='train')
    plt.plot(epochs, epoch_test_acc, label='test')
    plt.legend()
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy [%]')
    plt.title(f'Accuracy - {name}')
    plt.show()

    # Displaying final and maximum accuracies (epochs are numbered from 1)
    best_train = max(epoch_train_acc)
    best_test = max(epoch_test_acc)
    print(f"Final train accuracy: {epoch_train_acc[-1]:.2f}%")
    print(f"Final test accuracy: {epoch_test_acc[-1]:.2f}%\n")
    print(f"Max train accuracy: {best_train:.2f}% at epoch number {epoch_train_acc.index(best_train) + 1}")
    print(f"Max test accuracy: {best_test:.2f}% at epoch number {epoch_test_acc.index(best_test) + 1}\n")


## **9. Visualizing Model Predictions**

Finally, we visualize the predictions of the models on a batch of test images. We show the true labels and predicted labels for each image, and color the labels green if the prediction is correct or red if it is incorrect.

In [None]:
# Obtain one batch of test images
dataiter = iter(test_loader)
images, labels = next(dataiter)

# Move model inputs to CUDA if GPU is available
if train_on_gpu:
    images = images.cuda()

# Testing all trained models
for model, name in models:
    print(name)

    # Evaluation mode (dropout off, BatchNorm uses running statistics); no gradients needed
    model.eval()
    with torch.no_grad():
        output = model(images)
    # Convert output scores to predicted class indices
    preds = output.argmax(dim=1).cpu().numpy()

    # Display test images with true and predicted labels
    plt.figure(figsize=(8, 6))
    for idx in range(6):
        plt.subplot(2, 3, idx + 1)
        plt.imshow(images[idx].cpu().numpy().squeeze(), cmap='gray')
        true_label = labels[idx].item()
        plt.title(f"True: {classes[true_label]}\nPred: {classes[preds[idx]]}",
                  color="green" if preds[idx] == true_label else "red")
        plt.xticks([])
        plt.yticks([])

    plt.show()
