# Convolutional autoencoders

In this notebook, we write a CNN autoencoder and apply it to the task of image denoising.

In [None]:
import torch
import torch.nn as nn
from torchvision import transforms, datasets
from torch.utils.data import DataLoader, Subset
import numpy as np
import matplotlib.pyplot as plt
from skimage import io
import copy

# Define the data repository
data_dir = 'data/'

In [None]:
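# Initialization function: Xavier-uniform weights and small constant biases.
# ConvTranspose2d is not a subclass of Conv2d, so it is listed explicitly
# (otherwise the decoder layers would keep PyTorch's default initialization).
def init_weights(m):
    if isinstance(m, (nn.Linear, nn.Conv2d, nn.ConvTranspose2d)):
        torch.nn.init.xavier_uniform_(m.weight.data)
        if m.bias is not None:
            m.bias.data.fill_(0.01)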

## Transposed convolution

As seen in the previous lab, convolution usually reduces the size of the image. However, in some applications (e.g., image synthesis from low-dimensional features) it is useful to increase it. This is notably needed for autoencoders: after projecting the data into a compact latent representation, we need to expand this representation back into the image space.

This is exactly what [transposed convolution](https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html#torch.nn.ConvTranspose2d) does. Simply put, a transposed convolution can be seen as inserting zeros between the pixels of the input image (when the stride is larger than 1) and around its border, and then applying an ordinary convolution: this artificially increases the size of the output.
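To make the zero-insertion picture concrete, here is a minimal single-channel NumPy sketch. It assumes no bias, square inputs and kernels, dilation 1, no `output_padding`, and `padding <= kernel_size - 1`; the helper names are hypothetical. It builds the transposed convolution by inserting `stride - 1` zeros between pixels, padding the border with `kernel_size - 1 - padding` zeros, and running an ordinary stride-1 convolution with the flipped kernel. It then checks the result against the direct definition, in which each input pixel adds a scaled copy of the kernel to the output (which should correspond to what `nn.ConvTranspose2d` computes for one channel without bias).

```python
import numpy as np


def correlate2d_valid(x, w):
    """Stride-1 'valid' cross-correlation, i.e. what nn.Conv2d computes (one channel, no bias)."""
    kh, kw = w.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out


def conv_transpose2d_zero_insertion(x, w, stride, padding=0):
    """Transposed convolution as: insert zeros, pad the border, then convolve."""
    n, k = x.shape[0], w.shape[0]
    # 1) insert (stride - 1) zeros between neighbouring input pixels
    z = np.zeros(((n - 1) * stride + 1, (n - 1) * stride + 1))
    z[::stride, ::stride] = x
    # 2) pad the border with (k - 1 - padding) zeros on every side
    z = np.pad(z, k - 1 - padding)
    # 3) ordinary stride-1 cross-correlation with the kernel flipped in both directions
    return correlate2d_valid(z, w[::-1, ::-1])


def conv_transpose2d_scatter(x, w, stride, padding=0):
    """Direct definition: each input pixel adds a scaled copy of the kernel to the output."""
    n, k = x.shape[0], w.shape[0]
    size = (n - 1) * stride - 2 * padding + k
    full = np.zeros((size + 2 * padding, size + 2 * padding))
    for i in range(n):
        for j in range(n):
            full[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * w
    # remove `padding` pixels on every side
    return full[padding:padding + size, padding:padding + size]


# Usage: a 3x3 input with kernel size 3, stride 2 and padding 1 gives a 5x5 output
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 3))
w = rng.standard_normal((3, 3))
y_zero = conv_transpose2d_zero_insertion(x, w, stride=2, padding=1)
y_scatter = conv_transpose2d_scatter(x, w, stride=2, padding=1)
print(y_zero.shape)                    # (5, 5)
print(np.allclose(y_zero, y_scatter))  # True
```

The zero-insertion view is convenient for intuition; the scatter view makes it clearer that overlapping kernel copies are summed, which is why transposed convolutions can produce checkerboard-like patterns when the kernel size is not a multiple of the stride.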

Convolution             |  Transposed convolution
:-------------------------:|:-------------------------:
![](https://miro.medium.com/max/294/1*BMngs93_rm2_BpJFH2mS0Q.gif)  |  ![](https://miro.medium.com/max/395/1*Lpn4nag_KRMfGkx1k6bV-g.gif)

<center><a href="https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d">Source</a></center>

On the left, we use a convolution with a kernel size of 3 and a stride of 2. On the right, we then use a transposed convolution with the same parameters, which produces an image with the same size as the original.
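The sizes in such examples follow the output-size formulas given in the PyTorch documentation of `nn.Conv2d` and `nn.ConvTranspose2d`, written here for dilation 1 and applied to each spatial dimension separately. The plain-Python sketch below uses hypothetical helper names, and the 5-pixel input with padding 1 is an assumption chosen to reproduce a 5 → 3 → 5 round trip.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Output size of nn.Conv2d along one dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out_size(size, kernel, stride=1, padding=0, output_padding=0):
    """Output size of nn.ConvTranspose2d along one dimension (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# Round trip with kernel size 3, stride 2, padding 1 on a 5-pixel-wide input
down = conv_out_size(5, kernel=3, stride=2, padding=1)
up = conv_transpose_out_size(down, kernel=3, stride=2, padding=1)
print(down, up)  # 3 5

# Because of the floor division, several input sizes share the same convolution output size:
# with kernel 3 and stride 4, inputs of 99 to 102 pixels all give 25 pixels,
# and the transposed convolution maps 25 back to 99 unless output_padding is set.
print([conv_out_size(n, kernel=3, stride=4) for n in range(99, 103)])  # [25, 25, 25, 25]
print(conv_transpose_out_size(25, kernel=3, stride=4))                 # 99
```

The second part shows why a transposed convolution cannot always tell which size to return: the extra `output_padding` (between 0 and `stride - 1`) selects one of the candidate sizes.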

In [None]:
# Load the provided example image in grayscale and normalize it so that its values range in [0, 1]
image_np = io.imread('tdp.jpeg', as_gray=True)
print(image_np.shape)
image_np = image_np / np.max(image_np)

# Display the image
plt.imshow(image_np, cmap='gray')
plt.show()

# Convert the image into a PyTorch float tensor and add batch and channel dimensions: shape (1, 1, H, W)
image_t = torch.tensor(image_np).float().unsqueeze(0).unsqueeze(0)


In [None]:
num_channels_in = 1
num_channels_out = 1

# First, let us apply a convolution with a kernel size of (3, 6) and a stride of 4.
my_conv = nn.Conv2d(num_channels_in, num_channels_out, kernel_size=(3, 6), stride=4)
output = my_conv(image_t)
print('Original image: ', image_t.shape)
print('Output of the convolution : ', output.shape)

# Display the output after convolution
plt.imshow(output.detach().squeeze(), cmap='gray')
plt.show()

In [None]:
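# Now we go back to an image with the same size as the original input.
# Passing output_size resolves a size ambiguity: when (input size - kernel size) is not
# divisible by the stride, several input sizes give the same convolution output size.
my_convt = nn.ConvTranspose2d(num_channels_in, num_channels_out, kernel_size=(3, 6), stride=4)
image_convt = my_convt(output, output_size=image_t.shape[-2:])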
print('After applying transposed convolution : ', image_convt.shape)

# Display the output after transposed convolution
plt.imshow(image_convt.detach().squeeze(), cmap='gray')
plt.show()

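As you can see, transposed convolution does not invert convolution (it is not a "deconvolution"): at best, it restores the size of the image before convolution, but not its pixel values. Even the size is not always recovered automatically: when (input size - kernel size) is not divisible by the stride, several input sizes produce the same convolution output, and the intended size must be specified through the `output_size` argument of the forward call (or the `output_padding` parameter of the layer).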
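One way to see why transposed convolution is not an inverse: a single-channel, bias-free convolution is a linear map, i.e., a matrix C, and the transposed convolution multiplies by the transpose of C, hence its name. The NumPy sketch below (1D, no padding, with a toy kernel and input of our choosing; `conv1d_matrix` is a hypothetical helper) builds C for kernel size 3 and stride 2 and shows that applying C and then its transpose gives back a vector of the original length, but with different values.

```python
import numpy as np


def conv1d_matrix(kernel, in_size, stride):
    """Matrix C such that C @ x is the no-padding cross-correlation of x with the given stride."""
    k = len(kernel)
    out_size = (in_size - k) // stride + 1
    C = np.zeros((out_size, in_size))
    for i in range(out_size):
        C[i, i * stride:i * stride + k] = kernel
    return C


kernel = np.array([1.0, 2.0, 3.0])
C = conv1d_matrix(kernel, in_size=5, stride=2)  # shape (2, 5): 5 samples -> 2 samples
x = np.ones(5)

y = C @ x       # convolution: [6.0, 6.0]
x_up = C.T @ y  # transposed convolution: back to 5 samples
print(C.shape)          # (2, 5)
print(x_up.tolist())    # [6.0, 12.0, 24.0, 12.0, 18.0], same length as x, different values
```

Since C maps 5 values to 2, information is lost and no linear map can undo it in general; the transpose only "spreads" the 2 values back over 5 positions.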

## Dataset

In [None]:
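# Load the MNIST dataset.
# ToTensor() scales the pixel values to [0, 1], which matches the output range of the decoder's
# final Sigmoid and the clamping in add_noise, so the images are not standardized here.
data_transforms = transforms.Compose([transforms.ToTensor()])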
train_data = datasets.MNIST(data_dir, train=True, download=True, transform=data_transforms)
test_data = datasets.MNIST(data_dir, train=False, download=True, transform=data_transforms)
num_classes = len(train_data.classes)

# Take a subset of the train/test data
train_data = Subset(train_data, torch.arange(400))
test_data = Subset(test_data, torch.arange(50))

# Create dataloaders
batch_size = 8
train_dataloader = DataLoader(train_data, shuffle=True, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

## CNN autoencoder

As we saw in lab 4, autoencoders are networks that project the input data into a low-dimensional space, and then expand this compact representation back into the input space. In CNN autoencoders, the encoder uses convolutions (and pooling) to reduce the image size, while the decoder uses transposed convolutions to expand it. We propose the following architecture:

- The encoder consists of two layers, each made of a convolution, a ReLU and a max pooling. The convolutions have a kernel size of 3 and a padding of 1, with 16 and 4 output channels, respectively. The max pooling operations use a kernel size of 2.

- The decoder consists of two layers, each made of a transposed convolution followed by an activation function (ReLU for the first layer, Sigmoid for the second one). The transposed convolutions have a kernel size of 2 and a stride of 2, with 16 and 1 output channels, respectively.
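Before writing the module, it can help to check the shapes and the number of parameters implied by this specification. The plain-Python sketch below assumes 28x28 MNIST inputs, max pooling with a stride equal to its kernel size (the PyTorch default), and a bias in every convolution (also the PyTorch default); the helper names are hypothetical. It traces (channels, height, width) through the four layers and counts weights plus biases.

```python
def conv_out(size, kernel, stride=1, padding=0):
    # nn.Conv2d / nn.MaxPool2d output size along one dimension (dilation 1)
    return (size + 2 * padding - kernel) // stride + 1


def conv_t_out(size, kernel, stride=1, padding=0):
    # nn.ConvTranspose2d output size along one dimension (dilation 1, no output_padding)
    return (size - 1) * stride - 2 * padding + kernel


def conv_params(c_in, c_out, kernel):
    # weights (c_in * c_out * k * k) plus one bias per output channel;
    # the same count holds for Conv2d and ConvTranspose2d
    return c_in * c_out * kernel * kernel + c_out


size, shapes, n_params = 28, [(1, 28, 28)], 0

# Encoder: (conv k=3, padding=1) -> ReLU -> (max pool k=2), twice
for c_in, c_out in [(1, 16), (16, 4)]:
    size = conv_out(size, kernel=3, padding=1)  # padded convolution: size unchanged
    size = conv_out(size, kernel=2, stride=2)   # max pooling halves the size
    n_params += conv_params(c_in, c_out, kernel=3)
    shapes.append((c_out, size, size))

# Decoder: (transposed conv k=2, stride=2) -> ReLU / Sigmoid, twice
for c_in, c_out in [(4, 16), (16, 1)]:
    size = conv_t_out(size, kernel=2, stride=2)  # doubles the size
    n_params += conv_params(c_in, c_out, kernel=2)
    shapes.append((c_out, size, size))

print(shapes)    # [(1, 28, 28), (16, 14, 14), (4, 7, 7), (16, 14, 14), (1, 28, 28)]
print(n_params)  # 1077
```

Under these assumptions, the latent representation has 4 x 7 x 7 = 196 values for 784 input pixels, and the output shape matches the input shape, which is what the shape check below verifies on a real batch.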

In [None]:
# TO DO: write the CNN autoencoder module ('__init__' and 'forward' methods)

class CNNAutoencoder(nn.Module):
    def __init__(self):
        super(CNNAutoencoder, self).__init__()

    
    def forward(self, x):

        
        return y


In [None]:
# Instantiate the CNN autoencoder, initialize it and print the number of parameters
model_cnn_ae = CNNAutoencoder()
torch.manual_seed(0)
model_cnn_ae.apply(init_weights)
print('Total number of parameters:', sum(p.numel() for p in model_cnn_ae.parameters()))

In [None]:
# Before training, make sure that the output of the autoencoder has the same shape as its input
image_batch_example = next(iter(train_dataloader))[0]
out = model_cnn_ae(image_batch_example)
print(image_batch_example.shape)
print(out.shape)

In [None]:
# TO DO: write the training function.
# It's similar to the one for the MLP autoencoder (lab 4, which uses the Adam optimizer), but the images don't need to be vectorized


In [None]:
# TO DO:
# - define the training/optimizer parameters: 50 epochs, MSE loss function, learning_rate=0.001
# - train the model


In [None]:
# Visualization: apply the autoencoder to a batch of images

# Load it
image_batch_example = next(iter(train_dataloader))[0]

# Pass it to the autoencoder 
predicted_batch_example = model_cnn_ae_tr(image_batch_example).detach()

# Plot the original and predicted images
for ib in range(batch_size):
    plt.figure()
    plt.subplot(1, 2, 1)
    plt.imshow(image_batch_example[ib, :].squeeze(), cmap='gray_r')
    plt.xticks([]), plt.yticks([])
    plt.title('Original image')
    plt.subplot(1, 2, 2)
    plt.imshow(predicted_batch_example[ib, :].squeeze(), cmap='gray_r')
    plt.xticks([]), plt.yticks([])
    plt.title('Predicted image')
    plt.show()

<span style="color:red">**Q3**</span> Put one of the plots above in your report.

## Denoising autoencoder

We now consider an application of autoencoders: image denoising. For this task, we can use the same CNN autoencoder model as before; the only thing that changes is the data on which it is trained.

<center><a href="https://en.wikipedia.org/wiki/Total_variation_denoising">
    <img src="https://upload.wikimedia.org/wikipedia/en/e/e8/ROF_Denoising_Example.png"></a></center>


In [None]:
# First, here is a function that adds noise to a batch of input images:
# zero-mean Gaussian noise with standard deviation noise_factor, then clamped back to [0, 1]
def add_noise(inputs, noise_factor=0.8):
    noisy = inputs + torch.randn_like(inputs) * noise_factor
    noisy = torch.clamp(noisy, 0., 1.)
    return noisy

# Add noise to the image_batch_example
noisy_images_batch = add_noise(image_batch_example)

# Plot the original and noisy images
for ib in range(batch_size):
    plt.figure()
    plt.subplot(1, 2, 1)
    plt.imshow(image_batch_example[ib, :].squeeze(), cmap='gray_r')
    plt.xticks([]), plt.yticks([])
    plt.title('Original image')
    plt.subplot(1, 2, 2)
    plt.imshow(noisy_images_batch[ib, :].squeeze(), cmap='gray_r')
    plt.xticks([]), plt.yticks([])
    plt.title('Noisy image')
    plt.show()

Since we don't want to create a new dataset, we will simply add noise to the images during training. Therefore, the training function should be the same as before, except that we add noise to the images before passing them to the model. Then, we compute the loss between the output of the model and the (true) clean images.
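Why the loss must use the clean images can be seen with a tiny NumPy example, using random toy data and a hypothetical "model" that simply copies its input. If the target were the noisy input, copying would already be a perfect solution, so the network would have no incentive to remove noise; measured against the clean images, the copy is penalized for the noise it leaves in place. The noise below mirrors `add_noise` (Gaussian, clamped to [0, 1]), rewritten with NumPy for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)


def add_noise(inputs, noise_factor=0.8):
    # NumPy counterpart of add_noise: Gaussian noise, then clamping to [0, 1]
    return np.clip(inputs + rng.standard_normal(inputs.shape) * noise_factor, 0.0, 1.0)


def mse(a, b):
    return float(np.mean((a - b) ** 2))


def identity_model(images):
    # hypothetical "model" that simply copies its input
    return images


clean = rng.uniform(0.0, 1.0, size=(8, 1, 28, 28))  # toy batch with values in [0, 1]
noisy = add_noise(clean)
prediction = identity_model(noisy)

loss_wrong_target = mse(prediction, noisy)  # 0.0: copying the input looks "perfect"
loss_denoising = mse(prediction, clean)     # > 0: the remaining noise is penalized
print(loss_wrong_target, loss_denoising > 0)  # 0.0 True
```

In the PyTorch training loop, this amounts to feeding `add_noise(images)` to the model while keeping `images` as the target of the MSE loss.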

In [None]:
# TO DO: write the training function for the denoising autoencoder


In [None]:
# TO DO:
# - instantiate CNN autoencoder and initialize its parameters
# - define the training parameters: 50 epochs, MSE loss function, learning_rate=0.001
# - train the model


In [None]:
# Visualization: apply the denoising autoencoder to a batch of noisy images

# Pass the noisy batch to the denoising autoencoder
predicted_batch_example = model_cnn_den_ae_tr(noisy_images_batch).detach()

# Plot the original, noisy and predicted images
for ib in range(batch_size):
    plt.figure()
    plt.subplot(1, 3, 1)
    plt.imshow(image_batch_example[ib, :].squeeze(), cmap='gray_r')
    plt.xticks([]), plt.yticks([])
    plt.title('Original image')
    plt.subplot(1, 3, 2)
    plt.imshow(noisy_images_batch[ib, :].squeeze(), cmap='gray_r')
    plt.xticks([]), plt.yticks([])
    plt.title('Noisy image')
    plt.subplot(1, 3, 3)
    plt.imshow(predicted_batch_example[ib, :].squeeze(), cmap='gray_r')
    plt.xticks([]), plt.yticks([])
    plt.title('Predicted image')
    plt.show()

<span style="color:red">**Q4**</span> Put one of the plots above in your report.