<a href="https://colab.research.google.com/github/SzymonNowakowski/Machine-Learning-2024/blob/master/Lab11-autoencoders.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 11 - Autoencoders
### Author: Szymon Nowakowski


# Introduction
--------------

Autoencoders can be thought of as nonlinear extensions of PCA. In this class, we’ll train an autoencoder on the MNIST dataset and compare its encoded representation to the PCA space we constructed earlier (remember our very first class?). This comparison will help us see whether the autoencoder captures the structure of the data more effectively.
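
One way to make the PCA analogy concrete is to compare the two reconstruction objectives. The notation below is introduced only for this sketch and is not used elsewhere in the notebook: $x_i$ are centered data points, $W$ is a matrix with $k$ orthonormal columns, and $f_\theta$, $g_\phi$ are the encoder and decoder networks. A squared-error reconstruction loss is assumed for the autoencoder.

$$
\text{PCA:}\quad \min_{W^\top W = I_k} \sum_{i=1}^{n} \left\lVert x_i - W W^\top x_i \right\rVert^2
\qquad\qquad
\text{Autoencoder:}\quad \min_{\theta,\,\phi} \sum_{i=1}^{n} \left\lVert x_i - g_\phi\!\left(f_\theta(x_i)\right) \right\rVert^2
$$

PCA restricts both maps to be linear ($f(x) = W^\top x$, $g(z) = W z$), while an autoencoder allows them to be nonlinear networks.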

Next, we’ll put the trained autoencoder to practical use: image denoising.

You’ll also notice that throughout this session we treat the images in a class-agnostic, unsupervised manner, focusing on the structure of the data itself rather than on labels.

# Reading MNIST Dataset
----------------------------------

In [None]:
import torch
import torchvision
from matplotlib import pyplot

transform = torchvision.transforms.Compose(
    [ torchvision.transforms.ToTensor(), # Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]
      torchvision.transforms.Normalize((0.1307,), (0.3081,))])  # MNIST mean and std, given as one-element tuples (one value per channel)

trainset = torchvision.datasets.MNIST(root='./data',
                                      train=True,
                                      download=True,
                                      transform=transform)

trainloader = torch.utils.data.DataLoader(trainset,
                                          batch_size=2048,
                                          shuffle=True)   # reshuffle the training set every epoch, so the batches differ between epochs

testset = torchvision.datasets.MNIST(root='./data',
                                     train=False,
                                     download=True,
                                     transform=transform)

testloader = torch.utils.data.DataLoader(testset,
                                         batch_size=1,
                                         shuffle=False)
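
The effect of `Normalize` can be checked by hand. The sketch below assumes only the per-channel formula `(x - mean) / std` and uses plain Python rather than torchvision. The helper names `normalize` and `denormalize` are hypothetical and are not part of the notebook.

```python
MEAN, STD = 0.1307, 0.3081   # the constants passed to Normalize in the cell above

def normalize(pixel):
    """Map a pixel from the ToTensor() range [0, 1] to the normalized scale."""
    return (pixel - MEAN) / STD

def denormalize(value):
    """Invert normalize(), e.g. before displaying an image."""
    return value * STD + MEAN

darkest, brightest = normalize(0.0), normalize(1.0)
print(f"normalized range: [{darkest:.3f}, {brightest:.3f}]")
```

After normalization, pixel values lie roughly between -0.42 and 2.82. This matters later: any reconstructed image has to be mapped back with the inverse transform before it is displayed.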

# Tensor Sizes
-------------------

Recall:
- Batched labels are tensors of order 1. Their first (and only) index is the sample index within the batch.
- Image batches are tensors of order 4. The first index is the sample index within the batch; the second index has size 1, so it is always 0. The last two indices are the pixel row and column.
  - The second index is the channel number, inserted by the `ToTensor()` transformation.
  - It should be retained because we want to use convolutional layers, which explicitly require this layout. RGB images have 3 channels, while grayscale images such as MNIST have only one.
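
The shapes described above can be checked on stand-in tensors. This is a minimal sketch that uses tensors of zeros with the same layout as an MNIST batch, so it does not need the dataset. The batch size of 8 is an arbitrary choice for illustration.

```python
import torch

images = torch.zeros(8, 1, 28, 28)            # stand-in image batch: (batch, channel, height, width)
labels = torch.zeros(8, dtype=torch.long)     # stand-in label batch: (batch,)

print(images.ndim, tuple(images.shape))       # order 4
print(labels.ndim, tuple(labels.shape))       # order 1

first_image = images[0, 0]                    # sample 0, channel 0 (the only channel)
print(tuple(first_image.shape))               # a plain 28x28 image
```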


# Encoder and Decoder Networks
-----------------

An autoencoder is an encoder followed by a decoder.

Both components are typically convolutional neural networks (CNNs).

We will loosely base their structure on the LeNet5 network. You can find the definition of LeNet5 [here](https://en.wikipedia.org/wiki/LeNet#/media/File:Comparison_image_neural_networks.svg).
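
To illustrate the encoder-then-decoder composition, here is a minimal sketch. It is not the network used in this lab: for brevity it uses fully connected layers rather than convolutions, and the class name `TinyAutoencoder`, the hidden width of 128, and `latent_dim=10` are arbitrary choices made for illustration.

```python
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    """Hypothetical minimal autoencoder: an encoder followed by a decoder."""

    def __init__(self, latent_dim=10):
        super().__init__()
        # Encoder: flattens a 1x28x28 image and compresses it to latent_dim numbers
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Decoder: expands the latent code back to a 1x28x28 image
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 28 * 28),
            nn.Unflatten(1, (1, 28, 28)),
        )

    def forward(self, x):
        code = self.encoder(x)       # compressed representation
        return self.decoder(code)    # reconstruction with the same shape as x

x = torch.randn(4, 1, 28, 28)        # random stand-in for a batch of 4 normalized images
model = TinyAutoencoder()
code = model.encoder(x)
reconstruction = model(x)
loss = nn.functional.mse_loss(reconstruction, x)   # reconstruction error; no labels are involved
print(tuple(code.shape), tuple(reconstruction.shape))
```

The target of the loss is the input itself, which is what makes the training unsupervised.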


In [None]:
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()

        # Convolutional layers
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)      # 1x28x28 -> 6x24x24
        self.pool1 = nn.AvgPool2d(kernel_size=2, stride=2)                        # 6x24x24 -> 6x12x12

        self.conv2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5)     # 6x12x12 -> 16x8x8
        self.pool2 = nn.AvgPool2d(kernel_size=2, stride=2)                        # 16x8x8  -> 16x4x4

        self.conv3 = nn.Conv2d(in_channels=16, out_channels=120, kernel_size=4)   # 16x4x4 ->  120x1x1

        # Fully connected layers
        self.fc1 = nn.Linear(in_features=120, out_features=84)
        self.fc2 = nn.Linear(in_features=84, out_features=num_classes)

        # Optional dropout
        self.dropout = nn.Dropout(0.05)

    def forward(self, x):
        # Convolutional feature extraction
        x = F.relu(self.conv1(x))
        x = self.pool1(x)

        x = F.relu(self.conv2(x))
        x = self.pool2(x)

        x = F.relu(self.conv3(x))  # Output: (batch_size, 120, 1, 1)

        x = x.view(x.size(0), -1)  # Flatten to (batch_size, 120)

        # Fully connected layers
        x = F.relu(self.fc1(x))
        x = self.fc2(x)

        #x = self.dropout(x)

        return x
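
The shape comments next to the layers above can be verified with the standard output-size formula for convolution and pooling layers without dilation, `(size + 2 * padding - kernel_size) // stride + 1`. The sketch below is plain Python, and the helper name `conv_out` is hypothetical.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1

sizes = [28]                                                # MNIST images are 28x28
sizes.append(conv_out(sizes[-1], kernel_size=5))            # conv1
sizes.append(conv_out(sizes[-1], kernel_size=2, stride=2))  # pool1
sizes.append(conv_out(sizes[-1], kernel_size=5))            # conv2
sizes.append(conv_out(sizes[-1], kernel_size=2, stride=2))  # pool2
sizes.append(conv_out(sizes[-1], kernel_size=4))            # conv3
print(sizes)
```

The side length shrinks from 28 to 24, 12, 8, 4 and finally 1, which is why the output of `conv3` can be flattened to a vector of 120 values.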


# Training Loop
----------------------

In [None]:
# Check if GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Working on {device}")

net = Encoder().to(device)   # the LeNet5-style network defined above (its class is named Encoder)
optimizer = torch.optim.Adam(net.parameters(), 0.001)   #initial and fixed learning rate of 0.001.

net.train()    #it notifies the network layers (especially batchnorm or dropout layers, which we don't use in this example) that we are doing training
for epoch in range(16):  #  an epoch is a training run through the whole data set

    loss = 0.0
    for batch, data in enumerate(trainloader):
        batch_inputs, batch_labels = data

        batch_inputs = batch_inputs.to(device)  #explicitly moving the data to the target device
        batch_labels = batch_labels.to(device)

        optimizer.zero_grad()

        batch_outputs = net(batch_inputs)   #this line calls the forward(self, x) method of the network. Please note,
                                            # the nonlinear activation after the last layer is NOT applied
        loss = torch.nn.functional.cross_entropy(batch_outputs, batch_labels, reduction = "mean") #instead, nonlinear softmax is applied internally in THIS loss function
        print("epoch:", epoch, "batch:", batch, "current batch loss:", loss.item())
        loss.backward()       #this computes gradients as we have seen in previous workshops
        optimizer.step()     #but this line in fact updates our neural network.
                                ####You can experiment - comment out this line and check that the loss DOES NOT decrease

Working on cuda
epoch: 0 batch: 0 current batch loss: 2.307467222213745
epoch: 0 batch: 1 current batch loss: 2.2976996898651123
epoch: 0 batch: 2 current batch loss: 2.2881007194519043
epoch: 0 batch: 3 current batch loss: 2.28049373626709
epoch: 0 batch: 4 current batch loss: 2.2679836750030518
epoch: 0 batch: 5 current batch loss: 2.256859302520752
epoch: 0 batch: 6 current batch loss: 2.2422878742218018
epoch: 0 batch: 7 current batch loss: 2.221681833267212
epoch: 0 batch: 8 current batch loss: 2.199702739715576
epoch: 0 batch: 9 current batch loss: 2.1724071502685547
epoch: 0 batch: 10 current batch loss: 2.1372950077056885
epoch: 0 batch: 11 current batch loss: 2.0963642597198486
epoch: 0 batch: 12 current batch loss: 2.0645651817321777
epoch: 0 batch: 13 current batch loss: 2.0240578651428223
epoch: 0 batch: 14 current batch loss: 1.9557939767837524
epoch: 0 batch: 15 current batch loss: 1.8908413648605347
epoch: 0 batch: 16 current batch loss: 1.821674108505249
...
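
The remark that softmax is applied inside the loss function can be verified on a tiny example. The sketch below uses made-up logits and labels, not MNIST data. It also shows why the first logged loss is close to 2.30: a network that gives equal scores to all 10 classes has a cross-entropy of ln(10), about 2.3026.

```python
import math
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])   # made-up raw outputs: 2 samples, 3 classes
labels = torch.tensor([0, 2])

# cross_entropy takes raw logits and applies log-softmax internally ...
loss = F.cross_entropy(logits, labels, reduction="mean")
# ... so it matches log-softmax followed by the negative log-likelihood loss
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels, reduction="mean")

# equal scores for 10 classes give a loss of ln(10)
uniform_loss = F.cross_entropy(torch.zeros(1, 10), torch.tensor([3]))

print(loss.item(), manual.item())
print(uniform_loss.item(), math.log(10))
```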

# Testing
----------------------

In [None]:
good = 0
wrong = 0

net.eval()              #it notifies the network layers (especially batchnorm or dropout layers, which we don't use in this example) that we are doing evaluation
with torch.no_grad():   #it disables gradient computation, which is not needed during evaluation; this makes it faster, too
    for batch, data in enumerate(testloader): #batches in test are of size 1
        datapoint, label = data

        prediction = net(datapoint.to(device))                  #prediction holds one unnormalized score (logit) per class
        classification = torch.argmax(prediction)    #the predicted class is the index of the largest score

        if classification.item() == label.item():
            good += 1
        else:
            wrong += 1

print("accuracy = ", good/(good+wrong))

accuracy =  0.9778
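
Evaluating one image at a time is simple but slow. With larger test batches, the same accuracy can be computed by taking the argmax along the class dimension. The sketch below uses made-up logits and labels, and the helper name `count_correct` is hypothetical.

```python
import torch

def count_correct(logits, labels):
    """Number of samples whose highest-scoring class equals the label."""
    predicted = logits.argmax(dim=1)          # one predicted class per row
    return (predicted == labels).sum().item()

# made-up scores for 4 samples and 3 classes
logits = torch.tensor([[0.1, 2.0, -1.0],
                       [1.5, 0.2, 0.3],
                       [0.0, 0.1, 3.0],
                       [2.0, 1.0, 0.0]])
labels = torch.tensor([1, 0, 2, 1])

correct = count_correct(logits, labels)
print("accuracy =", correct / len(labels))    # 3 of 4 correct -> 0.75
```

In a real evaluation loop, `correct` would be accumulated over all test batches and divided by the size of the test set.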
