Assignment 9: Convolutional Auto-Encoder
========================================


Microsoft Forms Document: https://forms.office.com/r/ugv3L3jv8i

The task of this assignment is to compute a useful deep feature representation for the handwritten digits of the MNIST dataset without making use of their labels.
For this purpose, we implement a convolutional auto-encoder that learns a $K=10$-dimensional deep feature representation of each digit.
This representation can then be used to reconstruct images using the decoder part.

Task 1: Datasets
----------------

We will make use of the default `torchvision` implementation of the MNIST dataset.
As usual, we will need the training and validation set splits of MNIST, including data loaders.
Training set batches should be of size $B=32$; validation set batches should contain 100 samples.


In [None]:
import torch
import torchvision

# training set and data loader
train_set = ...
train_loader = ...

# validation set and data loader
validation_set = ...
validation_loader = ...

Task 2: Encoder Network
-----------------------

For the encoder network, we will rely on an implementation similar to the one from the last exercise: a convolutional network with two convolutional layers and one fully-connected layer.
The output of the encoder network determines the deep feature representation, which we will define to be $K=10$-dimensional.

There is one main difference from the network in Assignment 8: the way we perform downsampling.
Instead of choosing a maximum pooling layer, we use a stride of 2 in our convolutions.
The ReLU activation function should be applied after each convolution.
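
To see how a stride of 2 takes over the role of pooling, the spatial output size of each convolution can be computed by hand. The following is a minimal sketch that assumes a kernel size of 5 and a padding of 2; these values are illustrative and not fixed by this assignment, and `conv_output_size` is a hypothetical helper, not part of PyTorch.

```python
def conv_output_size(size, kernel_size=5, stride=2, padding=2):
    # output size formula of torch.nn.Conv2d for dilation 1
    # (kernel_size and padding are assumed values, see the text above)
    return (size + 2 * padding - kernel_size) // stride + 1

# MNIST images are 28x28 pixels
after_conv1 = conv_output_size(28)
after_conv2 = conv_output_size(after_conv1)
print(after_conv1, after_conv2)
```

Under these assumptions each convolution halves the resolution (28 to 14 to 7), so the fully-connected layer would receive $Q_2\cdot7\cdot7$ inputs. Other kernel sizes or paddings lead to different numbers.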

In [None]:
class Encoder (torch.nn.Module):
  def __init__(self, Q1, Q2, K):
    # call base class constructor
    super(Encoder,self).__init__()
    # define convolutional layers
    self.conv1 = ...
    self.conv2 = ...
    # activation functions will be re-used for the different stages
    self.act = ...
    # define fully-connected layers
    self.flatten = ...
    self.fc = ...

  def forward(self, x):
    # get the deep feature representation
    deep_feature = ...
    return deep_feature

Task 3: Decoder Network
-----------------------

The decoder network takes as input a deep feature representation as produced by the encoder network.
It will learn to undo all the steps of the encoder in order to produce an image of the same size as the original images.
For this purpose, we require our decoder network to have:

* a fully-connected layer that produces as many outputs as the fully-connected layer of the encoder has inputs
* a ReLU activation function
* a reshaping step that brings the batch to the same dimensionality as the output of the `conv2` layer of the encoder
* a fractionally-strided convolution (`torch.nn.ConvTranspose2d`) that uses the same parameters as the `conv2` layer of the encoder; you might need to adapt the `output_padding`
* a ReLU activation function
* a fractionally-strided convolution that uses the same parameters as the `conv1` layer of the encoder; again, `output_padding` might be required
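
Why `output_padding` can be needed becomes visible when the output size of a fractionally-strided convolution is computed by hand. The sketch below assumes a kernel size of 5, a stride of 2 and a padding of 2 in the encoder convolutions, so that the feature maps are 14 and 7 pixels wide; these values are illustrative only, and `deconv_output_size` is a hypothetical helper.

```python
def deconv_output_size(size, kernel_size=5, stride=2, padding=2, output_padding=0):
    # output size formula of torch.nn.ConvTranspose2d for dilation 1
    # (kernel_size, stride and padding are assumed values, see the text above)
    return (size - 1) * stride - 2 * padding + (kernel_size - 1) + output_padding + 1

# without output_padding the result is one pixel too small
print(deconv_output_size(7), deconv_output_size(7, output_padding=1))
print(deconv_output_size(14), deconv_output_size(14, output_padding=1))
```

The ambiguity arises because a strided convolution maps several input sizes to the same output size; `output_padding` selects which of them the transposed convolution should restore.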

Finally, the output values must be restricted to the range between 0 and 1.
Think of possible ways of doing that, and apply the way that seems most reasonable.
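
As a starting point for this comparison, the following sketch contrasts two candidates, a logistic squashing function and hard clipping, on a few example values. It is not meant to settle the choice; the input values are made up for illustration.

```python
import torch

# example pre-activation values (made up): below, inside and above [0, 1]
x = torch.tensor([-3.0, 0.5, 3.0], requires_grad=True)

# candidate 1: smooth squashing into (0, 1)
squashed = torch.sigmoid(x)
squashed.sum().backward()
grad_sigmoid = x.grad.clone()

# candidate 2: hard clipping to [0, 1]
x.grad.zero_()
clipped = torch.clamp(x, 0.0, 1.0)
clipped.sum().backward()
grad_clamp = x.grad.clone()

print(squashed.detach(), grad_sigmoid)
print(clipped.detach(), grad_clamp)
```

Both candidates keep the values in range, but clipping passes no gradient for values outside the interval, whereas the logistic function always passes a (possibly small) gradient.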

Implement a network class that provides the required functionality.
Implement both a constructor `__init__` and a `forward` function.


In [None]:
class Decoder (torch.nn.Module):
  def __init__(self, Q1, Q2, K):
    # call base class constructor
    super(Decoder,self).__init__()
    # fully-connected layer
    self.fc = ...
    # convolutional layers
    self.deconv1 = ...
    self.deconv2 = ...
    # activation function
    self.act = ...

  def forward(self, x):
    # reconstruct the output image
    output = ...
    return output

Task 4: Joint Auto-Encoder Network
-----------------------------------

Implement an auto-encoder network that includes both the encoder and the decoder.
Implement both a constructor `__init__` and a `forward` function.


In [None]:
class AutoEncoder(torch.nn.Module):
  def __init__(self, Q1, Q2, K):
    super(AutoEncoder,self).__init__()
    self.encoder = ...
    self.decoder = ...

  def forward(self,x):
    # encode input
    deep_feature = ...
    # decode to output
    reconstructed = ...
    return reconstructed

Test 1: Output Sizes
--------------------

Instantiate the auto-encoder network with $Q_1 = 32$, $Q_2 = 32$ and $K=10$.
Create an input $\mathbf X$ of the size that the `AutoEncoder` network requires.
Provide that input to the (untrained) encoder part of the auto-encoder network to extract the deep feature representation.
Check that the deep feature has the desired size ($K=10$).
Provide the deep feature to the (untrained) decoder part of the auto-encoder network.
Check that the output is of dimension $28\times28$, and its values are between 0 and 1.
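
The checks described above can be sketched as a small helper. The helper name `check_autoencoder` and the two stand-in networks are hypothetical and only serve to make the example runnable; they are simple linear layers and not the convolutional networks asked for in Tasks 2 and 3.

```python
import torch

def check_autoencoder(encoder, decoder, K=10, batch_size=4):
    # random images in the format B x 1 x 28 x 28
    x = torch.rand(batch_size, 1, 28, 28)
    with torch.no_grad():
        deep_feature = encoder(x)
        output = decoder(deep_feature)
    # deep features must be K-dimensional
    assert deep_feature.shape == (batch_size, K)
    # reconstructions must have the size of the input images
    assert output.shape == (batch_size, 1, 28, 28)
    # output values must lie between 0 and 1
    assert output.min().item() >= 0.0 and output.max().item() <= 1.0
    return deep_feature.shape, output.shape

# stand-in networks (NOT the solution of Tasks 2 and 3)
toy_encoder = torch.nn.Sequential(
    torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
toy_decoder = torch.nn.Sequential(
    torch.nn.Linear(10, 28 * 28), torch.nn.Sigmoid(),
    torch.nn.Unflatten(1, (1, 28, 28)))

shapes = check_autoencoder(toy_encoder, toy_decoder)
print(shapes)
```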

In [None]:
# run on cuda device if available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# create network
network = ...

# create or select a sample
...

# use encoder to encode image and check its size
...

# use decoder to generate an image and check its size and value range
...

Task 5: Training Loop
---------------------

To train the auto-encoder network, we will use the squared $L_2$ distance between the output and the input of the network, averaged over all pixels, as a loss function.
This loss function (the mean squared error) is implemented in `torch.nn.MSELoss`.

Since training an auto-encoder is tricky, we will make use of the Adam optimizer.
Choose a learning rate of $\eta=0.001$.

Implement the training loop for 10 epochs.
Compute the average training loss and validation loss and print them at the end of each epoch.

Note: If the training and validation losses do not decrease during training, try to reduce the learning rate (to $\eta=0.0005$ or even lower) and restart the training.
You will need to re-initialize the network, too, for example by re-running the previous cell.
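
One detail of the averaging deserves attention: `torch.nn.MSELoss` with its default `reduction="mean"` already averages over the batch, while the final printout divides by the number of samples. One possible way to keep both consistent is to weight each batch loss by its batch size, as sketched below on random stand-in tensors (no network and no MNIST data are involved here).

```python
import torch

torch.manual_seed(0)
loss = torch.nn.MSELoss()

# random stand-ins for 10 input images and their reconstructions
data = torch.rand(10, 1, 28, 28)
reconstruction = torch.rand(10, 1, 28, 28)

total = 0.
# batches of sizes 4, 4 and 2
for start in range(0, len(data), 4):
    x = data[start:start + 4]
    y = reconstruction[start:start + 4]
    J = loss(y, x)
    # weight the batch average by the number of samples in the batch
    total += J.item() * len(x)

average = total / len(data)
# reference: loss computed over all samples at once
reference = loss(reconstruction, data).item()
print(average, reference)
```

Without the weighting, the last (smaller) batch would count as much as a full batch, and dividing by the number of samples would scale the result down by roughly the batch size.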

In [None]:
# Adam optimizer with appropriate learning rate
optimizer = ...
loss = ...

for epoch in range(10):
  # evaluate average loss for training and validation set
  train_loss = validation_loss = 0.

  for x,_ in train_loader:
    # compute network output
    y = ...
    # compute loss between output and input
    J = ...
    # perform update
    ...
    # accumulate loss
    train_loss += ...


  # compute validation loss
  with torch.no_grad():
    for x,t in validation_loader:
      # compute network output
      y = ...
      # compute loss
      J = ...
      # accumulate loss
      validation_loss += ...


  # print average loss for training and validation
  print(f"\rEpoch {epoch}; train: {train_loss/len(train_set):1.5f}, val: {validation_loss/len(validation_set):1.5f}")

Task 6: Reconstruction Result
-----------------------------

Now we want to see if we can reconstruct images from their originals.
For this purpose, we select the first batch of our validation set images that contains 100 samples.
We forward this batch through our auto-encoder network and plot the reconstructed samples next to the original samples.

We will plot all the samples into a single plot with 10 rows, each containing 10 pairs of original and reconstructed samples.

In [None]:
# get first validation set batch
input, _ = next(iter(validation_loader))

# compute outputs for all samples
output = ...

# plot images
from matplotlib import pyplot
pyplot.rcParams['image.cmap'] = 'gray'

pyplot.figure(figsize = (20,10))
for i in range(10):
  for j in range(10):
    pyplot.subplot(10, 20, i*20+2*j+1)
    pyplot.imshow(...)
    pyplot.axis("off")
    pyplot.subplot(10, 20, i*20+2*j+2)
    pyplot.imshow(...)
    pyplot.axis("off")

Task 7: Mean Vector per Class
-----------------------------

To see if the network has learned a reasonable representation for our 10 digits, we extract the mean deep feature vectors for each of the 10 classes.
We forward all samples of our validation set through the encoder part of our trained auto-encoder network, and compute a class-wise average of the deep features.
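
One possible way to accumulate class-wise sums without an explicit loop over the classes is `index_add_`, shown here as a minimal sketch on a tiny made-up example with two classes and two-dimensional features. In the actual task there are 10 classes and $K=10$ dimensions, and the accumulation runs over all validation batches.

```python
import torch

# made-up deep features and labels of one batch
deep_features = torch.tensor([[1., 2.], [3., 4.], [5., 6.], [7., 8.]])
targets = torch.tensor([0, 1, 0, 1])

num_classes, K = 2, deep_features.shape[1]
sums = torch.zeros(num_classes, K)
counts = torch.zeros(num_classes)

# add each feature vector to the row of its class
sums.index_add_(0, targets, deep_features)
# count the number of samples per class
counts.index_add_(0, targets, torch.ones(len(targets)))

# divide the sums by the counts to obtain the class means
toy_means = sums / counts.unsqueeze(1)
print(toy_means)
```

An equivalent alternative is to loop over the classes and select samples with a boolean mask such as `deep_features[targets == c]`.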

In [None]:
means = ...

with torch.no_grad():
  # compute means
  for x, t in validation_loader:
    # extract deep features from encoder
    deep_features = ...
    # accumulate deep features for each class
    ...

# compute means
...

assert means.shape == (10, 10)

Task 8: Decode Mixtures of Classes
----------------------------------

For each pair of class indexes, we compute the average of the deep feature representations of these two classes.
This results in a total of $10\times10=100$ deep feature representations.

We use the decoder part of our trained auto-encoder network to reconstruct images from the deep feature representations.
We plot them in a grid of size $10\times10$.
Note that the diagonal represents the reconstruction of the mean deep features for all classes, while non-diagonal elements show mixtures of two classes.
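
All pairwise averages can be computed in one step with broadcasting. The sketch below uses a random stand-in for the class means (named `toy_means` to avoid confusion with the real `means`) and verifies the diagonal property mentioned above.

```python
import torch

torch.manual_seed(0)
# random stand-in for the 10 class means with K=10 dimensions
toy_means = torch.rand(10, 10)

# broadcasting: (10, 1, K) + (1, 10, K) -> (10, 10, K)
toy_mixtures = (toy_means[:, None, :] + toy_means[None, :, :]) / 2
# flatten the pairs into a batch of 100 deep features for the decoder
flat = toy_mixtures.reshape(100, 10)

# the diagonal contains the unmixed class means
print(torch.allclose(toy_mixtures[3, 3], toy_means[3]))
```

After flattening, the mixture of classes `o1` and `o2` sits at index `o1*10+o2`, which matches the subplot order used in the plotting loop.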

In [None]:
# compute mixtures of each two classes  
mixtures = ...
# use network decoder to generate images
images = ...

pyplot.figure(figsize=(10,10))
# plot the generated images
for o1 in range(10):
  for o2 in range(10):
    pyplot.subplot(10,10,o1*10+o2+1)
    pyplot.imshow(...)
    pyplot.axis("off")