# Exercise: Undercomplete Autoencoder with Convolutional Layers

In this exercise, we will implement an undercomplete autoencoder using convolutional layers. We will use the MNIST dataset, which consists of 28x28 grayscale images of handwritten digits.

In [1]:
from torchvision.datasets.mnist import MNIST
from torchvision.transforms import ToTensor, Compose, Normalize
from torch.utils.data import DataLoader
import torch.nn.functional as F
import torch
import numpy as np
import matplotlib.pyplot as plt

In [2]:
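# is_available() checks for a usable device at runtime;
# is_built() only reports how PyTorch was compiled.
if torch.cuda.is_available():
    device = torch.device('cuda')
elif torch.backends.mps.is_available():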
    device = torch.device('mps')
else:
    device = torch.device('cpu')

print('Using device:', device)

Using device: cuda


The network is described in the document linked below; its layers are:

3.1 Encoder Layers:
• encoder1: Convolutional layer with 1 input channel, 16 output channels, a kernel size of 3, stride of 2, and padding of 1. ReLU activation.
• encoder2: Convolutional layer with 16 input channels, 32 output channels, a kernel size of 3, stride of 2, and padding of 1. ReLU activation.
• encoder3: Convolutional layer with 32 input channels, 64 output channels, a kernel size of 7, stride of 1, and no padding. ReLU activation.
• encoder4: Fully connected (linear) layer reducing the dimensionality to z_size. No activation.

3.2 Decoder Layers:
• decoder1: Fully connected (linear) layer increasing the dimensionality from z_size to 64. ReLU activation.
• decoder2: Transposed convolutional layer with 64 input channels, 32 output channels, a kernel size of 7, stride of 1, and no padding. ReLU activation.
• decoder3: Transposed convolutional layer with 32 input channels, 16 output channels, a kernel size of 3, stride of 2, padding of 1, and output padding of 1. ReLU activation.
• decoder4: Transposed convolutional layer with 16 input channels, 1 output channel, a kernel size of 3, stride of 2, padding of 1, and output padding of 1. ReLU activation.

Here z_size is a model parameter that sets the dimensionality of the embedding space.

https://datacloud.di.unito.it/index.php/s/PdbYrWrBPs2tnac
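Before wiring the layers together, it can help to check that the listed hyperparameters really shrink a 28x28 image to a 1x1 feature map and bring it back to 28x28. The sketch below applies the standard output-size formulas for convolutions and transposed convolutions (assuming dilation 1) to one spatial axis. The helper names `conv_out` and `conv_transpose_out` are made up for this illustration and are not part of PyTorch.

```python
def conv_out(n, kernel, stride=1, padding=0):
    """Output length of a convolution along one spatial axis (dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1


def conv_transpose_out(n, kernel, stride=1, padding=0, output_padding=0):
    """Output length of a transposed convolution along one axis (dilation 1)."""
    return (n - 1) * stride - 2 * padding + kernel + output_padding


# Encoder convolutions: (kernel, stride, padding) for encoder1..encoder3.
size = 28
encoder_sizes = [size]
for kernel, stride, padding in [(3, 2, 1), (3, 2, 1), (7, 1, 0)]:
    size = conv_out(size, kernel, stride, padding)
    encoder_sizes.append(size)

# Decoder transposed convolutions: (kernel, stride, padding, output_padding)
# for decoder2..decoder4, starting from a 1x1 feature map.
size = 1
decoder_sizes = [size]
for kernel, stride, padding, output_padding in [(7, 1, 0, 0), (3, 2, 1, 1), (3, 2, 1, 1)]:
    size = conv_transpose_out(size, kernel, stride, padding, output_padding)
    decoder_sizes.append(size)

print(encoder_sizes)  # [28, 14, 7, 1]
print(decoder_sizes)  # [1, 7, 14, 28]
```

Since encoder3 produces 64 channels of size 1x1, the flattened input to encoder4 has 64 features. This matches the 64 outputs of decoder1, which can be reshaped to 64x1x1 before decoder2.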


In [3]:
mnist_train = MNIST('./data', download=True, transform=ToTensor())
mnist_test = MNIST('./data', download=True, train=False, transform=ToTensor())


mnist_train_loader = DataLoader(mnist_train, batch_size=64, shuffle=True)
mnist_test_loader = DataLoader(mnist_test, batch_size=64, shuffle=True)

In [None]:
# Network description: https://datacloud.di.unito.it/index.php/s/PdbYrWrBPs2tnac

class Model(torch.nn.Module):
    def __init__(self, z_size=3):
        pass

    def encode(self, x):
        pass

    def decode(self, z):
        pass

    def forward(self, x):
        pass

    # Generate a new random sample by drawing a representation from a uniform
    # distribution and decoding it
    def generate(self):
        pass
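One possible way to fill in the skeleton is sketched below. It is an illustration under stated assumptions, not the reference solution. The class name `SketchModel` is hypothetical, chosen so it does not shadow `Model`. The reshape in `encode` assumes 28x28 single-channel inputs. The uniform range `[0, 1)` used in `generate` is also an assumption, since the exercise does not specify the range.

```python
import torch
import torch.nn.functional as F


class SketchModel(torch.nn.Module):
    """Hypothetical implementation of the layer list in the exercise text."""

    def __init__(self, z_size=3):
        super().__init__()
        self.z_size = z_size
        self.encoder1 = torch.nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)
        self.encoder2 = torch.nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1)
        self.encoder3 = torch.nn.Conv2d(32, 64, kernel_size=7)
        self.encoder4 = torch.nn.Linear(64, z_size)
        self.decoder1 = torch.nn.Linear(z_size, 64)
        self.decoder2 = torch.nn.ConvTranspose2d(64, 32, kernel_size=7)
        self.decoder3 = torch.nn.ConvTranspose2d(
            32, 16, kernel_size=3, stride=2, padding=1, output_padding=1)
        self.decoder4 = torch.nn.ConvTranspose2d(
            16, 1, kernel_size=3, stride=2, padding=1, output_padding=1)

    def encode(self, x):
        # Accept both (1, 28, 28) and (batch, 1, 28, 28) inputs.
        x = x.reshape(-1, 1, 28, 28)
        x = F.relu(self.encoder1(x))   # -> (batch, 16, 14, 14)
        x = F.relu(self.encoder2(x))   # -> (batch, 32, 7, 7)
        x = F.relu(self.encoder3(x))   # -> (batch, 64, 1, 1)
        return self.encoder4(x.flatten(start_dim=1))  # -> (batch, z_size)

    def decode(self, z):
        x = F.relu(self.decoder1(z)).reshape(-1, 64, 1, 1)
        x = F.relu(self.decoder2(x))   # -> (batch, 32, 7, 7)
        x = F.relu(self.decoder3(x))   # -> (batch, 16, 14, 14)
        return F.relu(self.decoder4(x))  # -> (batch, 1, 28, 28)

    def forward(self, x):
        return self.decode(self.encode(x))

    def generate(self):
        # Assumption: sample the code uniformly from [0, 1).
        z = torch.rand(1, self.z_size, device=self.decoder1.weight.device)
        return self.decode(z)


sketch = SketchModel(z_size=10)
batch = torch.zeros(4, 1, 28, 28)
print(sketch.encode(batch).shape, sketch(batch).shape, sketch.generate().shape)
```

The shape comments follow from the kernel, stride, and padding values in the layer list. Whether codes drawn from `[0, 1)` decode to plausible digits depends on where the trained encoder actually places its embeddings.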


Training is standard: we use the Adam optimizer and a mean squared error (MSE) loss between each input image and its reconstruction.
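For reference, the MSE between an image and its reconstruction is usually written as below. The symbols are notation introduced here rather than taken from the exercise: $x_i$ and $\hat{x}_i$ are the $i$-th pixel of the input and of the reconstruction, and $N = 28 \times 28 = 784$ is the number of pixels.

```latex
\mathcal{L}(\hat{x}, x) = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{x}_i - x_i \right)^2
```

When the loss is computed over a mini-batch with the default mean reduction, the average is taken over all pixels of all images in the batch.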

In [None]:
model = Model(z_size=10).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# loss function
def loss_function(x_hat, x):
    pass

# Write the training loop here
# ...
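The general shape of such a training loop is sketched below. To keep the example self-contained, it trains a small stand-in linear autoencoder on random tensors instead of the exercise's `Model` and MNIST loaders. The names `toy_model`, `toy_optimizer`, `toy_batches`, and `mse_loss` are hypothetical, and the epoch count is arbitrary.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in for the exercise's Model: a tiny linear autoencoder.
toy_model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(28 * 28, 10),
    torch.nn.Linear(10, 28 * 28),
)
toy_optimizer = torch.optim.Adam(toy_model.parameters(), lr=0.001)

# Random tensors standing in for MNIST batches of shape (batch, 1, 28, 28).
toy_batches = [torch.rand(64, 1, 28, 28) for _ in range(5)]


def mse_loss(x_hat, x):
    # Compare the flat reconstruction with the flattened input.
    return F.mse_loss(x_hat, x.flatten(start_dim=1))


epoch_losses = []
for epoch in range(3):
    running = 0.0
    for x in toy_batches:
        toy_optimizer.zero_grad()      # clear gradients from the previous step
        x_hat = toy_model(x)           # reconstruct the batch
        loss = mse_loss(x_hat, x)      # reconstruction error
        loss.backward()                # backpropagate
        toy_optimizer.step()           # update the parameters
        running += loss.item()
    epoch_losses.append(running / len(toy_batches))
    print(f"epoch {epoch}: mean loss {epoch_losses[-1]:.4f}")
```

With the real data, the inner loop would iterate over `mnist_train_loader`, discard the labels, and move each batch to `device` before the forward pass.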

# Plotting one reconstructed and one generated image

In [None]:
# Add a batch dimension so the input has shape (1, 1, 28, 28)
img = model(mnist_test[0][0].unsqueeze(0).to(device)).clip(0, 1).cpu().detach().numpy()
plt.imshow(img.reshape(28, 28), cmap="gray")

In [None]:
gen = model.generate().clip(0, 1).cpu().detach().numpy().reshape(28, 28)
plt.imshow(gen, cmap="gray")

# Plotting a few reconstructed images

In [None]:
plt.figure(figsize=(5, 5))
plt.gray()

imshape = (28, 28)
preds = []

for i in range(25):
    with torch.no_grad():
        preds.append(model(mnist_test[i][0].unsqueeze(0).to(device)).cpu().numpy())

for i in range(25):
    ax = plt.subplot(5, 5, i + 1)
    ax.imshow(preds[i].reshape(imshape), cmap="gray")
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

plt.show()