# 4. Variational AutoEncoder Demo (FC)

### About this notebook

This notebook was used in the 50.039 Deep Learning course at the Singapore University of Technology and Design.

**Author:** Matthieu DE MARI (matthieu_demari@sutd.edu.sg)

**Version:** 1.1 (29/08/2023)

**Requirements:**
- Python 3 (tested on v3.11.4)
- Matplotlib (tested on v3.7.2)
- Numpy (tested on v1.25.2)
- Torch (tested on v2.0.1+cu118)
- Torchvision (tested on v0.15.2+cu118)
- We also strongly recommend setting up CUDA on your machine! (At this point, honestly, it is almost mandatory).

### Imports and CUDA

In [1]:
# Imports
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision.utils import save_image
import matplotlib.pyplot as plt

In [2]:
# CUDA check
CUDA = True
device = "cuda" if (torch.cuda.is_available() and CUDA) else "cpu"
print(torch.cuda.is_available())
print(device)

True
cuda


### Dataset and dataloaders

As in the previous notebooks, we load the MNIST training and test sets with torchvision and wrap them in dataloaders.

In [3]:
# Data Preprocessing
# - ToTensor (converts images to tensors with pixel values in [0, 1])
# - No further normalization is applied
transform = transforms.Compose([transforms.ToTensor()])

In [4]:
# Train datasets/dataloaders
train_set = torchvision.datasets.MNIST(root = './data',
                                       train = True,
                                       download = True,
                                       transform = transform)
train_loader = torch.utils.data.DataLoader(train_set,
                                           batch_size = 32,
                                           shuffle = False)

In [5]:
# Test datasets/dataloaders
test_set = torchvision.datasets.MNIST(root = './data',
                                      train = False,
                                      download = True,
                                      transform = transform)
test_loader = torch.utils.data.DataLoader(test_set,
                                          batch_size = 4,
                                          shuffle = False)

### Model

The encoder uses Conv2d layers and the decoder uses ConvTranspose2d layers. Linear layers compute the mean and log-variance of the latent distribution, and a sampling method draws the latent vector from it.
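
The sampling method relies on the reparameterization trick: instead of sampling the latent vector directly from a Gaussian with the predicted mean and variance, we draw noise from a standard normal distribution, then scale and shift it, so that gradients can flow back to the mean and log-variance. The standalone sketch below illustrates this; the function name `reparameterize`, the batch size of 4 and the latent dimension of 16 are assumptions made for the illustration, not values taken from this notebook.

```python
import torch

def reparameterize(mu, log_var):
    # Standard deviation recovered from the log-variance
    std = torch.exp(0.5 * log_var)
    # Noise drawn from N(0, I), with the same shape as std
    eps = torch.randn_like(std)
    # The randomness lives in eps only, so z is differentiable w.r.t. mu and log_var
    return mu + eps * std

torch.manual_seed(0)
# Assumed sizes: batch of 4, latent dimension of 16
mu = torch.zeros(4, 16, requires_grad = True)
log_var = torch.zeros(4, 16, requires_grad = True)

z = reparameterize(mu, log_var)
z.sum().backward()

# Both encoder outputs receive gradients through the sampling step
grads_reach_encoder = (mu.grad is not None) and (log_var.grad is not None)
print(z.shape, grads_reach_encoder)
```

Since z depends on mu through a plain addition, the gradient of `z.sum()` with respect to mu is a tensor of ones.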

In [6]:
# Define Variational AutoEncoder Model for MNIST
class MNIST_VAE(nn.Module):
    
    def __init__(self, image_channels, init_channels, kernel_size, latent_dim):
        super().__init__()
 
        # Encoder with stacked Conv
        self.enc1 = nn.Conv2d(image_channels, init_channels, kernel_size, \
                              stride = 2, padding = 1)
        self.enc2 = nn.Conv2d(init_channels, init_channels*2, kernel_size, \
                              stride = 2, padding = 1)
        self.enc3 = nn.Conv2d(init_channels*2, init_channels*4, kernel_size, \
                              stride = 2, padding = 1)
        self.enc4 = nn.Conv2d(init_channels*4, 64, kernel_size, \
                              stride = 2, padding = 0)
        
        # FC layers for learning representations
        self.fc1 = nn.Linear(64, 128)
        self.fc_mu = nn.Linear(128, latent_dim)
        self.fc_log_var = nn.Linear(128, latent_dim)
        self.fc2 = nn.Linear(latent_dim, 64)
        
        # Decoder with stacked ConvTranspose, reversing the encoder's downsampling
        self.dec1 = nn.ConvTranspose2d(64, init_channels*8, kernel_size, \
                                       stride = 1, padding = 0)
        self.dec2 = nn.ConvTranspose2d(init_channels*8, init_channels*4, kernel_size, \
                                       stride = 2, padding = 1)
        self.dec3 = nn.ConvTranspose2d(init_channels*4, init_channels*2, kernel_size, \
                                       stride = 2, padding = 1)
        self.dec4 = nn.ConvTranspose2d(init_channels*2, image_channels, kernel_size, \
                                       stride = 2, padding = 1)
        
        
    def sample(self, mu, log_var):
        """
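        Reparameterization trick: returns mu + eps * std, with eps drawn from N(0, I).
        mu: mean of the latent distribution, predicted by the encoder
        log_var: log-variance of the latent distribution, predicted by the encoder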
        """
        
        # Standard deviation
        std = torch.exp(0.5*log_var)
        
        # randn_like is used to produce a vector with same dimensionality as std
        eps = torch.randn_like(std)
        
        # Sampling
        sample = mu + (eps * std)
        return sample
    
    
    def forward(self, x):
        
        # Encoder
        x = F.relu(self.enc1(x))
        x = F.relu(self.enc2(x))
        x = F.relu(self.enc3(x))
        x = F.relu(self.enc4(x))
        
        # Global average pooling to 1x1, then flatten to (batch, 64)
        batch = x.shape[0]
        x = F.adaptive_avg_pool2d(x, 1).reshape(batch, -1)
        
        # FC layers to get mu and log_var (mean and log-variance)
        hidden = self.fc1(x)
        mu = self.fc_mu(hidden)
        log_var = self.fc_log_var(hidden)
        
        # Get the latent vector through reparameterization,
        # then project it back to 64 channels for the decoder
        z = self.sample(mu, log_var)
        z = self.fc2(z)
        z = z.view(-1, 64, 1, 1)
 
        # Decoding
        x = F.relu(self.dec1(z))
        x = F.relu(self.dec2(x))
        x = F.relu(self.dec3(x))
        x = torch.sigmoid(self.dec4(x))
        return x, mu, log_var
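
The layer arithmetic of the model above can be checked without instantiating it. The sketch below applies the output-size formulas documented for `nn.Conv2d` and `nn.ConvTranspose2d` (with dilation 1 and no output padding) to the strides and paddings used in `MNIST_VAE`. The value `kernel_size = 4` and the 32x32 input are assumptions, since this notebook does not fix either of them, and the helper names are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    # Output size of nn.Conv2d along one spatial dimension (dilation 1)
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel, stride, padding):
    # Output size of nn.ConvTranspose2d along one dimension
    # (dilation 1, output_padding 0)
    return (size - 1) * stride - 2 * padding + kernel

# (stride, padding) pairs copied from enc1..enc4 and dec1..dec4
ENCODER = [(2, 1), (2, 1), (2, 1), (2, 0)]
DECODER = [(1, 0), (2, 1), (2, 1), (2, 1)]

def trace_sizes(input_size, kernel_size):
    # Spatial size after each encoder layer, starting from the input
    enc = [input_size]
    for stride, padding in ENCODER:
        enc.append(conv_out(enc[-1], kernel_size, stride, padding))
    # The decoder always starts from a 1x1 map (z.view(-1, 64, 1, 1))
    dec = [1]
    for stride, padding in DECODER:
        dec.append(conv_transpose_out(dec[-1], kernel_size, stride, padding))
    return enc, dec

# Assumed hyperparameter: kernel_size = 4
enc_32, dec_32 = trace_sizes(32, 4)
enc_28, dec_28 = trace_sizes(28, 4)
print(enc_32, dec_32)  # [32, 16, 8, 4, 1] [1, 4, 8, 16, 32]
print(enc_28, dec_28)  # [28, 14, 7, 3, 0] [1, 4, 8, 16, 32]
```

A size of 0 in the trace means the layer cannot be applied. Under the assumed `kernel_size = 4`, the decoder always produces 32x32 images, while a 28x28 MNIST image (the size obtained with `ToTensor` alone) leaves a 3x3 feature map in front of `enc4`, which is smaller than its 4x4 kernel. If that assumption holds, the images would need to be resized to 32x32 (for instance with `transforms.Resize`) before being fed to the model.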

### Training function?

Open question: how would we train such a model?
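
One common answer, sketched here as an assumption and not as the reference solution for this course, is to minimize the negative evidence lower bound: a reconstruction term plus the KL divergence between the predicted Gaussian and a standard normal prior. Binary cross-entropy is a natural reconstruction term here because the decoder ends with a sigmoid and the pixel values lie in [0, 1]. The function names `kl_divergence` and `vae_loss`, as well as the dummy tensor sizes, are hypothetical.

```python
import torch
import torch.nn.functional as F

def kl_divergence(mu, log_var):
    # Closed-form KL(N(mu, sigma^2) || N(0, I)), summed over batch and latent dims
    return -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())

def vae_loss(reconstruction, target, mu, log_var):
    # Reconstruction term, summed over all pixels in the batch
    bce = F.binary_cross_entropy(reconstruction, target, reduction = "sum")
    return bce + kl_divergence(mu, log_var)

torch.manual_seed(0)
# Dummy stand-ins for one batch (assumed sizes: 4 images of 32x32, latent dim 16)
target = torch.rand(4, 1, 32, 32)
reconstruction = torch.sigmoid(torch.randn(4, 1, 32, 32))
mu = torch.zeros(4, 16)
log_var = torch.zeros(4, 16)

loss = vae_loss(reconstruction, target, mu, log_var)
# When the predicted distribution already equals the prior, the KL term vanishes
kl_at_prior = kl_divergence(mu, log_var)
# Shifting every mean to 1 costs 0.5 per latent dimension: 0.5 * 4 * 16 = 32
kl_shifted = kl_divergence(torch.ones(4, 16), log_var)
print(loss.item(), kl_at_prior.item(), kl_shifted.item())
```

In a training loop, the three outputs of the model's forward pass (`x, mu, log_var`) would be passed to such a loss together with the input images, followed by the usual `loss.backward()` and `optimizer.step()` calls.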