# Variational Autoencoders


## Autoencoders

Autoencoders are a type of neural network that is trained to reconstruct its input. They are often used for dimensionality reduction, anomaly detection, and generative modeling.

The basic architecture of an autoencoder consists of two main components:

1. Encoder: The encoder is a neural network that takes the input data and maps it to a lower-dimensional latent space. It is typically a feedforward neural network with one or more hidden layers.
2. Decoder: The decoder is a neural network that takes the latent representation and maps it back to the original input space. It is also typically a feedforward neural network with one or more hidden layers.
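
The encoder/decoder pairing described above can be sketched in a few lines of PyTorch. This is only an illustrative sketch: the class name `AutoEncoder`, the layer sizes, and the use of mean squared error as the reconstruction loss are assumptions made here, not something prescribed by the text.

```python
import torch
import torch.nn as nn


class AutoEncoder(nn.Module):
    """Minimal fully connected autoencoder (layer sizes are illustrative)."""

    def __init__(self, input_dim=784, hidden_dim=128, latent_dim=16):
        super().__init__()
        # Encoder: input space -> lower-dimensional latent space
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),
        )
        # Decoder: latent space -> original input space
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, input_dim),
            nn.Sigmoid(),  # keep outputs in [0, 1], like normalized pixels
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z


torch.manual_seed(0)
ae = AutoEncoder()
x = torch.rand(8, 784)                          # stand-in batch of flattened images
x_hat, z = ae(x)
recon_error = nn.functional.mse_loss(x_hat, x)  # quantity minimized during training
```

Training would repeatedly minimize `recon_error` with an optimizer; here the model is left untrained, so the snippet only shows how data flows through the two components.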

<img src="../images/ae.png" alt="Autoencoder" width="600"/>

## Variational Autoencoders (VAEs)

Variational Autoencoders (VAEs) are deep generative models used for unsupervised learning and dimensionality reduction. Like a plain autoencoder, a VAE pairs an encoder with a decoder, but it is trained to learn a probabilistic mapping between the input data and a lower-dimensional latent space.

The main goal of a VAE is to learn a probabilistic representation of the input data, which can be used for tasks such as:

1. Dimensionality reduction: VAEs can be used to reduce the dimensionality of high-dimensional data, such as images or text, to a lower-dimensional representation that is easier to work with.
2. Anomaly detection: VAEs can be used to detect anomalies or outliers in the data, typically by flagging inputs that the model reconstructs poorly (high reconstruction error) or whose latent encoding lies far from the prior.
3. Generative modeling: VAEs can be used to generate new data samples that are similar to the training data, by sampling from the latent space and passing the samples through the decoder.
4. Data imputation: VAEs can be used to impute missing values in the data by learning a probabilistic model of the data and using it to predict the missing values.

The architecture of a VAE typically consists of two main components:

<img src="../images/vae.png" alt="Variational Autoencoder" width="600"/>


1. Encoder: The encoder is a neural network that takes the input data and maps it to the parameters of a distribution over a lower-dimensional latent space (in the code below, a mean and a log-variance per latent dimension).
2. Decoder: The decoder is a neural network that takes a latent vector sampled from that distribution and maps it back to the original input space. Both networks are trained jointly to minimize the reconstruction error plus a KL-divergence term that keeps the encoder's distribution close to the prior.

The key innovation of VAEs is the use of a probabilistic approach to learn the mapping between the input data and the latent space. Specifically, the encoder is trained to learn a probabilistic distribution over the latent space, and the decoder is trained to learn a probabilistic mapping from the latent space to the input space.
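
The objective usually associated with this probabilistic setup is the evidence lower bound (ELBO). The following is a sketch under common assumptions rather than the only possible formulation: a standard normal prior $p(z) = \mathcal{N}(0, I)$ and a diagonal Gaussian encoder $q_\phi(z \mid x) = \mathcal{N}(\mu, \mathrm{diag}(\sigma^2))$ with latent dimension $d$. The symbols $\phi$, $\theta$, $\mu$, $\sigma$, and $\epsilon$ are notation introduced here for illustration.

```latex
\begin{aligned}
\log p_\theta(x)
  &\ge \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
     - \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big) \\
\mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)
  &= \tfrac{1}{2} \sum_{j=1}^{d} \big(\sigma_j^2 + \mu_j^2 - 1 - \log \sigma_j^2\big) \\
z &= \mu + \sigma \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)
\end{aligned}
```

The first line is the bound being maximized, the second is the closed-form KL term for this choice of prior and encoder, and the third is the reparameterization that lets gradients pass through the sampling step.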

VAEs have several advantages over other dimensionality reduction techniques, such as PCA or t-SNE. For example:

1. VAEs can learn complex, non-linear relationships between the input data and the latent space.
2. VAEs can learn to capture high-level features of the data, such as shapes or textures, rather than just low-level features such as edges or lines.
3. VAEs can be used for both dimensionality reduction and generative modeling, making them a versatile tool for a wide range of applications.

VAEs are a powerful tool for unsupervised learning and dimensionality reduction, and have been successfully applied to a wide range of applications, including computer vision, natural language processing, and recommender systems.



In [2]:
!pip install torchvision



In [3]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms


In [4]:
# -------------------------------
# Hyperparameters
# -------------------------------
batch_size = 128
latent_dim = 20   # dimension of the latent space
epochs = 5
learning_rate = 1e-3
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


In [5]:

# -------------------------------
# Dataset and Dataloader
# -------------------------------
transform = transforms.ToTensor()
train_dataset = datasets.MNIST(root='.', train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)



In [6]:
# -------------------------------
# VAE Model Definition
# -------------------------------
class VAE(nn.Module):
    def __init__(self, latent_dim=20):
        super(VAE, self).__init__()
        # Encoder: takes in [batch, 1, 28, 28], produces parameters of q(z|x)
        # Flatten: 28 * 28 = 784
        self.encoder = nn.Sequential(
            nn.Linear(784, 400),
            nn.ReLU()
        )

        self.fc_mu = nn.Linear(400, latent_dim)
        self.fc_logvar = nn.Linear(400, latent_dim)

        # Decoder: takes in z and produces parameters of p(x|z)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 400),
            nn.ReLU(),
            nn.Linear(400, 784),
            nn.Sigmoid()  # output pixel values between 0 and 1
        )

    def encode(self, x):
        # x: [batch, 1, 28, 28]
        x = x.view(-1, 784)  # flatten
        h = self.encoder(x)  # [batch, 400]
        mu = self.fc_mu(h)   # [batch, latent_dim]
        logvar = self.fc_logvar(h)  # [batch, latent_dim]
        return mu, logvar

    def reparameterize(self, mu, logvar):
        # reparameterization trick: z = mu + sigma * epsilon
        std = torch.exp(0.5 * logvar)
        epsilon = torch.randn_like(std)
        z = mu + std * epsilon
        return z

    def decode(self, z):
        # z: [batch, latent_dim]
        x_recon = self.decoder(z) # [batch, 784]
        x_recon = x_recon.view(-1, 1, 28, 28)
        return x_recon

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        x_recon = self.decode(z)
        return x_recon, mu, logvar
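
The `reparameterize` step is what keeps sampling differentiable: the randomness lives in `epsilon`, so gradients can still reach `mu` and `logvar`. A small standalone check of that idea, using illustrative tensor shapes chosen here rather than the model above:

```python
import torch

torch.manual_seed(0)
# Stand-ins for encoder outputs: 4 samples, 3 latent dimensions.
mu = torch.zeros(4, 3, requires_grad=True)
logvar = torch.zeros(4, 3, requires_grad=True)

# Same computation as in reparameterize(): z = mu + sigma * epsilon
std = torch.exp(0.5 * logvar)
eps = torch.randn_like(std)
z = mu + std * eps

# Backpropagate through the sampled z.
z.sum().backward()

# dz/dmu = 1 and dz/dlogvar = 0.5 * std * eps, so both parameters get gradients.
print(mu.grad is not None, logvar.grad is not None)
```

Sampling `z` directly from a distribution object without this rewrite would not, in general, give a gradient path back to the encoder parameters.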



In [7]:
# Instantiate model and optimizer
model = VAE(latent_dim=latent_dim).to(device)
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# -------------------------------
# Loss Function: negative ELBO (reconstruction + KL)
# -------------------------------
def loss_function(recon_x, x, mu, logvar):
    # Reconstruction loss: binary cross-entropy
    # Treats the reconstructed image as a Bernoulli distribution parameter
    recon_loss = nn.functional.binary_cross_entropy(
        recon_x.view(-1, 784),
        x.view(-1, 784),
        reduction='sum'
    )

    # KL Divergence: KL(q(z|x) || p(z)) =
    # 0.5 * sum( exp(logvar) + mu^2 - 1 - logvar )
    kl_div = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

    return recon_loss + kl_div
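
As a sanity check on the KL term, the closed-form expression can be compared against `torch.distributions.kl_divergence` for the same pair of Gaussians. This is a sketch with random stand-in values for `mu` and `logvar`, not values taken from the trained model:

```python
import torch
from torch.distributions import Normal, kl_divergence

torch.manual_seed(0)
mu = torch.randn(5, 20)      # stand-in encoder means
logvar = torch.randn(5, 20)  # stand-in encoder log-variances

# Closed form used in loss_function
closed_form = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

# Reference: KL(N(mu, sigma^2) || N(0, 1)), summed over batch and latent dims
q = Normal(mu, torch.exp(0.5 * logvar))
p = Normal(torch.zeros_like(mu), torch.ones_like(mu))
reference = kl_divergence(q, p).sum()

print(torch.allclose(closed_form, reference, rtol=1e-4, atol=1e-4))
```

The two values should agree up to floating-point error, which confirms that the sign convention in the code matches the formula in the comment.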


In [8]:
# -------------------------------
# Training Loop
# -------------------------------
model.train()
for epoch in range(1, epochs + 1):
    total_loss = 0
    for batch_idx, (data, _) in enumerate(train_loader):
        data = data.to(device)

        optimizer.zero_grad()
        recon_data, mu, logvar = model(data)
        loss = loss_function(recon_data, data, mu, logvar)
        loss.backward()
        total_loss += loss.item()
        optimizer.step()

    avg_loss = total_loss / len(train_loader.dataset)
    print(f"Epoch {epoch}/{epochs}, Loss: {avg_loss:.4f}")


Epoch 1/5, Loss: 166.3181
Epoch 2/5, Loss: 121.7623
Epoch 3/5, Loss: 114.6449
Epoch 4/5, Loss: 111.7185
Epoch 5/5, Loss: 110.0456


In [9]:

# After training, we can generate new digits by sampling z ~ N(0, I) and decoding:
model.eval()
with torch.no_grad():
    z = torch.randn(batch_size, latent_dim, device=device)
    sample = model.decode(z)
    # 'sample' now holds generated images with shape [batch_size, 1, 28, 28]
