
# Chapter 2: Core Architectures and Training

This notebook covers fundamental generative architectures including:
- Autoencoders (AEs)
- Variational Autoencoders (VAEs)
- Generative Adversarial Networks (GANs)
- Transformers and Attention Mechanisms

## Learning Objectives

- Implement autoencoders and VAEs in PyTorch
- Understand the GAN architecture and its training loop
- Explore transformer structure: encoder, decoder, and attention
- Use self-attention and multi-head attention



## Autoencoders

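An autoencoder compresses its input into a lower-dimensional latent representation (the encoder) and is trained to reconstruct the original input from that representation (the decoder).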


In [None]:

import torch
from torch import nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=128):
        super().__init__()
        # Encoder: flattened input (784 = 28 x 28 by default) -> latent code.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU()
        )
        # Decoder: latent code -> reconstruction; Sigmoid keeps outputs in [0, 1].
        self.decoder = nn.Sequential(
            nn.Linear(hidden_dim, input_dim),
            nn.Sigmoid()
        )

    def forward(self, x):
        # x has shape (batch_size, input_dim).
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded
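
To make the reconstruction objective concrete, here is a minimal sketch of a few optimization steps. It assumes mean-squared error as the reconstruction loss, the Adam optimizer, and a random batch standing in for flattened 28×28 images; none of these choices come from the notebook itself. `TinyAutoencoder` is a hypothetical stand-in that mirrors the `Autoencoder` class above so the block runs on its own.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in mirroring the Autoencoder class above.
class TinyAutoencoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(hidden_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinyAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed optimizer
criterion = nn.MSELoss()                                   # assumed loss

# Random values in [0, 1) stand in for a batch of flattened images.
batch = torch.rand(64, 784)

losses = []
for step in range(20):
    optimizer.zero_grad()
    reconstruction = model(batch)
    loss = criterion(reconstruction, batch)  # the target is the input itself
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

Because the target is the input itself, no labels are needed. On real data the random batch would be replaced by batches from a data loader.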



## Variational Autoencoders (VAEs)

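A VAE encodes each input as a distribution over the latent space (a mean and a log-variance) rather than as a single point. New data can then be generated by sampling a latent vector and passing it through the decoder.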


In [None]:

class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=2):
        super().__init__()
        # Shared encoder trunk; two heads predict the mean and log-variance of q(z|x).
        self.encoder = nn.Sequential(nn.Linear(input_dim, 400), nn.ReLU())
        self.fc_mu = nn.Linear(400, latent_dim)
        self.fc_logvar = nn.Linear(400, latent_dim)
        # Decoder maps a latent sample back to values in [0, 1].
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 400),
            nn.ReLU(),
            nn.Linear(400, input_dim),
            nn.Sigmoid(),
        )

    def reparameterize(self, mu, logvar):
        # Sample z = mu + sigma * eps with eps ~ N(0, I), so gradients
        # can flow through mu and logvar despite the random draw.
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def forward(self, x):
        h1 = self.encoder(x)
        mu, logvar = self.fc_mu(h1), self.fc_logvar(h1)
        z = self.reparameterize(mu, logvar)
        return self.decoder(z), mu, logvar

def vae_loss(reconstructed, original, mu, logvar):
    # Reconstruction term: expects `original` in [0, 1], matching the Sigmoid output.
    BCE = nn.functional.binary_cross_entropy(reconstructed, original, reduction='sum')
    # Closed-form KL divergence between N(mu, sigma^2) and the standard normal prior.
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return BCE + KLD
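
The `KLD` line in `vae_loss` is the closed-form KL divergence between a diagonal Gaussian and a standard normal prior. As a sanity check, the sketch below compares that expression with the value computed by `torch.distributions`. The tensors `mu` and `logvar` are hypothetical encoder outputs drawn at random, not values produced by the model above.

```python
import torch
from torch.distributions import Normal, kl_divergence

torch.manual_seed(0)

# Hypothetical encoder outputs for a batch of 4 inputs and a 2-D latent space.
mu = torch.randn(4, 2)
logvar = torch.randn(4, 2)

# Closed-form KL term, written exactly as in vae_loss above.
kld_closed_form = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

# Same quantity from torch.distributions: KL(N(mu, sigma^2) || N(0, 1)),
# summed over latent dimensions and batch elements.
std = torch.exp(0.5 * logvar)
posterior = Normal(mu, std)
prior = Normal(torch.zeros_like(mu), torch.ones_like(std))
kld_reference = kl_divergence(posterior, prior).sum()

print(kld_closed_form.item(), kld_reference.item())
```

The two printed numbers should agree up to floating-point error, which confirms that the hand-written term penalizes the encoder for moving its distribution away from the prior.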



## Generative Adversarial Networks (GANs)

GANs consist of two neural networks:
- Generator: produces fake data
- Discriminator: classifies real vs. fake data

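The two networks are trained against each other: the discriminator learns to tell real samples from generated ones, while the generator learns to produce samples that the discriminator classifies as real.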


In [None]:

class Generator(nn.Module):
    def __init__(self, input_dim=100, output_dim=784):
        super().__init__()
        # Maps a noise vector to a flattened image. Tanh outputs lie in [-1, 1],
        # so real images should be scaled to the same range.
        self.model = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(), nn.Linear(256, output_dim), nn.Tanh()
        )

    def forward(self, x):
        return self.model(x)

class Discriminator(nn.Module):
    def __init__(self, input_dim=784):
        super().__init__()
        # Outputs the estimated probability that the input is a real sample.
        self.model = nn.Sequential(
            nn.Linear(input_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid()
        )

    def forward(self, x):
        return self.model(x)
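
The adversarial objective can be sketched as one discriminator update followed by one generator update. This is a minimal illustration under stated assumptions, not a full training loop: binary cross-entropy loss, Adam with `lr=2e-4` and `betas=(0.5, 0.999)`, and a random batch standing in for real images are all choices made here. The `generator` and `discriminator` networks are hypothetical stand-ins with the same layers as the classes above.

```python
import torch
from torch import nn

torch.manual_seed(0)

latent_dim, data_dim, batch_size = 100, 784, 32

# Hypothetical stand-ins mirroring the Generator and Discriminator above.
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, data_dim), nn.Tanh()
)
discriminator = nn.Sequential(
    nn.Linear(data_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid()
)

criterion = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))

# Random values in [-1, 1) stand in for real images scaled to the Tanh range.
real = torch.rand(batch_size, data_dim) * 2 - 1
real_labels = torch.ones(batch_size, 1)
fake_labels = torch.zeros(batch_size, 1)

# Discriminator step: push real samples toward 1 and generated samples toward 0.
noise = torch.randn(batch_size, latent_dim)
fake = generator(noise)
opt_d.zero_grad()
loss_d = (
    criterion(discriminator(real), real_labels)
    + criterion(discriminator(fake.detach()), fake_labels)  # detach: no generator gradients here
)
loss_d.backward()
opt_d.step()

# Generator step: try to make the discriminator label generated samples as real.
opt_g.zero_grad()
loss_g = criterion(discriminator(fake), real_labels)
loss_g.backward()
opt_g.step()

print(f"discriminator loss: {loss_d.item():.4f}, generator loss: {loss_g.item():.4f}")
```

Detaching `fake` in the discriminator step keeps that update from changing the generator. In a full loop, these two steps would repeat over batches from a data loader.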



## Transformers and Attention

Transformers rely on **self-attention** mechanisms and are widely used in NLP and multimodal generation.

A minimal transformer model includes:
- Positional encoding
- Multi-head attention
- Feed-forward layers
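
As a small illustration of the multi-head attention component, the sketch below applies PyTorch's built-in `nn.MultiheadAttention` layer as self-attention. The batch size, sequence length, embedding size, and number of heads are arbitrary values assumed for the example, and the input is random rather than real token embeddings.

```python
import torch
from torch import nn

torch.manual_seed(0)

batch_size, seq_len, embed_dim, num_heads = 2, 5, 16, 4  # assumed sizes

# embed_dim must be divisible by num_heads; each head works on 16 / 4 = 4 dims.
attention = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

# Random tensor standing in for a batch of token embeddings.
x = torch.randn(batch_size, seq_len, embed_dim)

# Self-attention: the same tensor is passed as query, key, and value.
output, weights = attention(x, x, x)

print(output.shape)   # torch.Size([2, 5, 16])
print(weights.shape)  # torch.Size([2, 5, 5]), averaged over heads by default
```

Each row of `weights` describes how much one position attends to every position in the sequence, so each row sums to 1.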

We'll use Hugging Face Transformers to demonstrate text generation.


In [None]:

from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_name = "gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

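input_text = "The transformer architecture is"
inputs = tokenizer(input_text, return_tensors="pt")
# max_length counts the prompt tokens plus the generated tokens.
# GPT-2 defines no padding token, so the end-of-sequence token is reused for padding.
outputs = model.generate(**inputs, max_length=40, pad_token_id=tokenizer.eos_token_id)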

print("Generated Text:")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))



## Attention Calculation (Self-Attention)

Scaled dot-product attention computes a weighted sum of the value vectors, where the weights come from the similarity between queries and keys:

\[
\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V
\]

Where:
- Q = Queries
- K = Keys
- V = Values
- \( d_k \) = dimension of key vectors

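Dividing by \( \sqrt{d_k} \) keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with very small gradients. In a transformer, each token's output is therefore a context-dependent mixture of the value vectors of the tokens in the sequence.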
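
The formula above translates almost line by line into PyTorch. The following is a minimal sketch for a single sequence without masking or multiple heads; the function name `scaled_dot_product_attention`, the sequence length, and the key dimension are assumptions made for this example, and the inputs are random.

```python
import math
import torch

torch.manual_seed(0)

def scaled_dot_product_attention(q, k, v):
    """Return the attention output and the attention weights."""
    d_k = k.size(-1)
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    # Softmax over the key axis turns each row of scores into weights that sum to 1.
    weights = torch.softmax(scores, dim=-1)
    # Each output row is a weighted sum of the value vectors.
    return weights @ v, weights

seq_len, d_k = 4, 8  # assumed sizes
q = torch.randn(seq_len, d_k)
k = torch.randn(seq_len, d_k)
v = torch.randn(seq_len, d_k)

output, weights = scaled_dot_product_attention(q, k, v)
print(output.shape)   # torch.Size([4, 8])
print(weights.shape)  # torch.Size([4, 4])
```

In self-attention, `q`, `k`, and `v` are all linear projections of the same token embeddings. Here they are independent random tensors only to keep the example short.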



## Exercises

1. Implement a training loop for the VAE on the MNIST dataset.
2. Implement a GAN training loop and generate digits.
3. Explore encoder-decoder attention using a translation model.
4. Visualize the attention weights in a transformer layer.

## References

- PyTorch VAE Example: https://github.com/pytorch/examples/tree/main/vae
- DCGAN Tutorial: https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html
- Hugging Face Transformers: https://huggingface.co/docs/transformers
