# TP3: Generative Models - Face Generation with DCGAN

**Day 3 - AI for Sciences Winter School**

**Instructor:** Raphael Cousin

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/racousin/ai_for_sciences/blob/main/day3/tp3.ipynb)

---

## Objectives

By the end of this practical, you will understand:

1. **How GANs work**: The adversarial game between Generator and Discriminator
2. **DCGAN architecture**: Deep convolutional networks for image generation
3. **Training dynamics**: Observing the Generator improve over epochs
4. **Latent space**: How random noise maps to meaningful images

---

# Part 1: Understanding GANs

**Generative Adversarial Networks (GANs)** learn to generate new data by playing a game:

```
┌─────────────────┐                    ┌─────────────────┐
│    Generator    │  generates fake →  │  Discriminator  │  → Real or Fake?
│   (the artist)  │      images        │   (the critic)  │
└─────────────────┘                    └─────────────────┘
        ↑                                      ↓
   random noise                         feedback to improve
```

- **Generator**: Takes random noise, produces fake images. Goal: fool the Discriminator.
- **Discriminator**: Sees real and fake images. Goal: tell them apart.

As training progresses, both networks improve. The Generator learns to create increasingly realistic images to fool an increasingly sophisticated Discriminator.

**DCGAN** (Deep Convolutional GAN) builds both networks from convolutional layers and follows a few architectural guidelines that make training on images more stable.

## Setup

In [None]:
!pip install -q git+https://github.com/racousin/ai_for_sciences.git

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as transforms
from torchvision.utils import make_grid
import matplotlib.pyplot as plt
import numpy as np
from tqdm import tqdm

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')

# Set seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)
if torch.cuda.is_available():
    torch.cuda.manual_seed(42)

print("Setup complete!")

---

# Part 2: Data Preparation

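We'll use the **CelebA dataset**, which contains 202,599 celebrity face images (the `train` split used below has 162,770). Each image is resized so that its shorter side is 64 pixels, center-cropped to 64x64, and normalized to [-1, 1].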
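
As a quick sanity check of the normalization step, the sketch below applies the per-channel arithmetic that `transforms.Normalize` performs, `(x - mean) / std`, to a few pixel values in plain Python. The helper names `normalize` and `denormalize` are illustrative and are not part of torchvision.

```python
def normalize(x, mean=0.5, std=0.5):
    """Same arithmetic as transforms.Normalize for one channel value."""
    return (x - mean) / std

def denormalize(y, mean=0.5, std=0.5):
    """Inverse mapping, useful before displaying an image."""
    return y * std + mean

pixels = [0.0, 0.25, 0.5, 1.0]            # ToTensor() produces values in [0, 1]
scaled = [normalize(p) for p in pixels]
restored = [denormalize(s) for s in scaled]

print(scaled)    # [-1.0, -0.5, 0.0, 1.0]
print(restored)  # [0.0, 0.25, 0.5, 1.0]
```

The inverse mapping is what `make_grid(..., normalize=True, value_range=(-1, 1))` takes care of when images are plotted later in the notebook.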

In [None]:
# Configuration
image_size = 64
batch_size = 128

# Image transformations
transform = transforms.Compose([
    transforms.Resize(image_size),
    transforms.CenterCrop(image_size),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # Normalize to [-1, 1]
])

# Download CelebA dataset (this may take a few minutes the first time).
# The files are hosted on Google Drive, so the download can fail when the quota is exceeded.
train_dataset = torchvision.datasets.CelebA(
    root='./data',
    split='train',
    download=True,
    transform=transform
)

train_loader = DataLoader(
    train_dataset,
    batch_size=batch_size,
    shuffle=True,
    num_workers=2,
    pin_memory=True
)

print(f'Training samples: {len(train_dataset):,}')
print(f'Batches per epoch: {len(train_loader):,}')
print(f'Image shape: {train_dataset[0][0].shape}')
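
The batch count printed by the cell above follows from simple arithmetic. The sketch below assumes the CelebA `train` split contains 162,770 images (the size of the standard partition; compare it with the `Training samples` line from your own run). The variable names are illustrative.

```python
import math

assumed_train_size = 162_770   # assumption: size of the standard CelebA train split
example_batch_size = 128

# DataLoader keeps the final partial batch unless drop_last=True
batches_per_epoch = math.ceil(assumed_train_size / example_batch_size)
last_batch_size = assumed_train_size - (batches_per_epoch - 1) * example_batch_size

print(batches_per_epoch)  # 1272
print(last_batch_size)    # 82
```

Because the last batch is smaller than 128, a training loop should read the batch size from the tensor itself (`real_images.size(0)`) rather than rely on the configured value.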

In [None]:
# Visualize real samples
samples = next(iter(train_loader))[0][:64]
grid = make_grid(samples, nrow=8, normalize=True, value_range=(-1, 1))

plt.figure(figsize=(12, 12))
plt.imshow(grid.permute(1, 2, 0).cpu())
plt.title('Real CelebA Face Images', fontsize=16)
plt.axis('off')
plt.tight_layout()
plt.show()

### Question 1

1. Why do we normalize images to [-1, 1] instead of [0, 1]?
2. What would happen if we used much smaller images (e.g., 16x16)? What about larger (256x256)?

---

# Part 3: Building the DCGAN

DCGAN uses specific architectural guidelines for stability:

- **Convolutional layers** instead of fully connected layers
- **Batch normalization** in both Generator and Discriminator (but not on the Generator's output layer or the Discriminator's input and output layers)
- **ReLU** in Generator, **LeakyReLU** in Discriminator
- **Tanh** output for Generator, **Sigmoid** for Discriminator
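
To make the activation choices concrete, the following sketch evaluates the four functions named above on a few inputs using only Python's `math` module. It illustrates the mathematical definitions (with the LeakyReLU slope of 0.2 used in the Discriminator code), not PyTorch's implementation.

```python
import math

def relu(x):
    return max(0.0, x)

def leaky_relu(x, slope=0.2):
    # Negative inputs are scaled down instead of being zeroed
    return x if x >= 0 else slope * x

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

for x in [-2.0, 0.0, 2.0]:
    print(f"x={x:+.1f}  relu={relu(x):.3f}  leaky_relu={leaky_relu(x):+.3f}  "
          f"tanh={math.tanh(x):+.3f}  sigmoid={sigmoid(x):.3f}")
```

Tanh maps any input to (-1, 1) and Sigmoid maps any input to (0, 1); keep these output ranges in mind for Question 2.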

## Generator Architecture

The Generator transforms a random noise vector (latent space) into an image:

```
Noise (100D) → 4x4 → 8x8 → 16x16 → 32x32 → 64x64 RGB
```
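
The spatial sizes in this diagram can be checked by hand. For a transposed convolution with `dilation=1` and `output_padding=0`, the output size is `(in - 1) * stride - 2 * padding + kernel`. The sketch below applies that formula to the `(kernel, stride, padding)` triples used by the Generator class in this section; the function name `conv_transpose_out` is illustrative.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Output size of ConvTranspose2d (dilation=1, output_padding=0)."""
    return (size - 1) * stride - 2 * padding + kernel

# (kernel, stride, padding) for the five Generator layers
layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]

size = 1  # the latent vector is shaped latent_dim x 1 x 1
sizes = [size]
for kernel, stride, padding in layers:
    size = conv_transpose_out(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)  # [1, 4, 8, 16, 32, 64]
```

The first layer expands 1x1 to 4x4, and each `(4, 2, 1)` layer then doubles the resolution.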

In [None]:
class Generator(nn.Module):
    """
    DCGAN Generator for 64x64 RGB images.
    
    Takes a latent vector and progressively upsamples it to a full image.
    Uses transposed convolutions ("deconvolutions") for upsampling.
    """
    def __init__(self, latent_dim=100, ngf=64):
        super().__init__()
        self.latent_dim = latent_dim

        self.main = nn.Sequential(
            # Input: latent_dim x 1 x 1
            nn.ConvTranspose2d(latent_dim, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # State: (ngf*8) x 4 x 4

            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # State: (ngf*4) x 8 x 8

            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # State: (ngf*2) x 16 x 16

            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # State: (ngf) x 32 x 32

            nn.ConvTranspose2d(ngf, 3, 4, 2, 1, bias=False),
            nn.Tanh()
            # Output: 3 x 64 x 64
        )

    def forward(self, noise):
        return self.main(noise)

## Discriminator Architecture

The Discriminator mirrors the Generator: it takes an image and outputs the probability that the image is real:

```
64x64 RGB → 32x32 → 16x16 → 8x8 → 4x4 → 1 (probability)
```
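
The downsampling path can be verified the same way as the Generator's. For a regular convolution with `dilation=1`, the output size is `floor((in + 2 * padding - kernel) / stride) + 1`. The sketch below uses the `(kernel, stride, padding)` triples from the Discriminator class in this section; the function name `conv_out` is illustrative.

```python
def conv_out(size, kernel, stride, padding):
    """Output size of Conv2d (dilation=1)."""
    return (size + 2 * padding - kernel) // stride + 1

# (kernel, stride, padding) for the five Discriminator layers
layers = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 0)]

size = 64  # input images are 3 x 64 x 64
sizes = [size]
for kernel, stride, padding in layers:
    size = conv_out(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)  # [64, 32, 16, 8, 4, 1]
```

Each `(4, 2, 1)` layer halves the resolution, and the final `(4, 1, 0)` layer collapses the 4x4 feature map to a single value per image.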

In [None]:
class Discriminator(nn.Module):
    """
    DCGAN Discriminator for 64x64 RGB images.
    
    Takes an image and outputs probability that it's real.
    Uses strided convolutions for downsampling.
    """
    def __init__(self, ndf=64):
        super().__init__()

        self.main = nn.Sequential(
            # Input: 3 x 64 x 64
            nn.Conv2d(3, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # State: (ndf) x 32 x 32

            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # State: (ndf*2) x 16 x 16

            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # State: (ndf*4) x 8 x 8

            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # State: (ndf*8) x 4 x 4

            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
            # Output: 1 x 1 x 1
        )

    def forward(self, image):
        return self.main(image).view(-1, 1)

In [None]:
# Initialize models
generator = Generator(latent_dim=100, ngf=64).to(device)
discriminator = Discriminator(ndf=64).to(device)

# Weight initialization (DCGAN paper recommendation)
def weights_init(m):
    # Conv / ConvTranspose weights ~ N(0, 0.02); BatchNorm scale ~ N(1, 0.02), shift = 0
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(m.weight, 0.0, 0.02)
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.normal_(m.weight, 1.0, 0.02)
        nn.init.constant_(m.bias, 0)

generator.apply(weights_init)
discriminator.apply(weights_init)

print(f'Generator parameters: {sum(p.numel() for p in generator.parameters()):,}')
print(f'Discriminator parameters: {sum(p.numel() for p in discriminator.parameters()):,}')

### Question 2

1. Why does the Generator use ReLU while the Discriminator uses LeakyReLU?
2. Why is the final activation Tanh for the Generator and Sigmoid for the Discriminator?
3. What does `ngf=64` control in the architecture?

---

# Part 4: Training Setup

The GAN training objective:

- **Discriminator**: Maximize $\log D(x) + \log(1 - D(G(z)))$
  - Correctly classify real images as real
  - Correctly classify fake images as fake

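- **Generator**: Maximize $\log D(G(z))$ (the non-saturating form, used instead of minimizing $\log(1 - D(G(z)))$ because it gives stronger gradients early in training)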
  - Make the Discriminator think fake images are real
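
These objectives map directly onto binary cross-entropy: with target 1 the loss is `-log(p)`, and with target 0 it is `-log(1 - p)`, so minimizing BCE with the right labels maximizes the expressions above. The sketch below works this out for a single prediction in plain Python. The probabilities 0.9 and 0.2 are hypothetical values chosen for illustration, and `bce` is an illustrative helper that mirrors the formula of `nn.BCELoss` for one probability strictly between 0 and 1.

```python
import math

def bce(prediction, target):
    """Binary cross-entropy for a single probability in (0, 1)."""
    return -(target * math.log(prediction) + (1 - target) * math.log(1 - prediction))

d_real = 0.9   # hypothetical D(x) on a real image
d_fake = 0.2   # hypothetical D(G(z)) on a generated image

d_loss = bce(d_real, 1.0) + bce(d_fake, 0.0)   # -log D(x) - log(1 - D(G(z)))
g_loss = bce(d_fake, 1.0)                       # -log D(G(z))
print(f"d_loss={d_loss:.4f}  g_loss={g_loss:.4f}")   # d_loss=0.3285  g_loss=1.6094

# Reference point: a Discriminator that cannot tell real from fake outputs 0.5
d_loss_eq = bce(0.5, 1.0) + bce(0.5, 0.0)
g_loss_eq = bce(0.5, 1.0)
print(f"d_loss_eq={d_loss_eq:.4f}  g_loss_eq={g_loss_eq:.4f}")  # d_loss_eq=1.3863  g_loss_eq=0.6931
```

In a real training step `nn.BCELoss` averages this quantity over the batch, but the per-sample arithmetic is the same.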

In [None]:
# Loss and optimizers
criterion = nn.BCELoss()
optimizer_g = optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
optimizer_d = optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))

# Fixed noise for consistent visualization across epochs
fixed_noise = torch.randn(64, generator.latent_dim, 1, 1).to(device)

print('Training setup complete!')
print(f'Fixed noise shape: {fixed_noise.shape}')

In [None]:
def train_epoch(generator, discriminator, loader, optimizer_g, optimizer_d, criterion, device):
    """Train for one epoch."""
    generator.train()
    discriminator.train()

    d_losses = []
    g_losses = []

    for real_images, _ in tqdm(loader, desc='Training'):
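        # Use the actual batch size: the last batch of an epoch may be smaller
        batch_size = real_images.size(0)
        real_images = real_images.to(device)

        # Labels for real and fake, created directly on the target device
        real_labels = torch.ones(batch_size, 1, device=device)
        fake_labels = torch.zeros(batch_size, 1, device=device)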

        # ============================================
        # Train Discriminator
        # ============================================
        optimizer_d.zero_grad()

        # Loss on real images
        real_output = discriminator(real_images)
        d_loss_real = criterion(real_output, real_labels)

        # Loss on fake images
        noise = torch.randn(batch_size, generator.latent_dim, 1, 1).to(device)
        fake_images = generator(noise)
        fake_output = discriminator(fake_images.detach())  # detach to not update G
        d_loss_fake = criterion(fake_output, fake_labels)

        # Total discriminator loss
        d_loss = d_loss_real + d_loss_fake
        d_loss.backward()
        optimizer_d.step()

        # ============================================
        # Train Generator
        # ============================================
        optimizer_g.zero_grad()

        # Generate new fake images
        noise = torch.randn(batch_size, generator.latent_dim, 1, 1).to(device)
        fake_images = generator(noise)
        fake_output = discriminator(fake_images)

        # Generator wants discriminator to think fakes are real
        g_loss = criterion(fake_output, real_labels)
        g_loss.backward()
        optimizer_g.step()

        d_losses.append(d_loss.item())
        g_losses.append(g_loss.item())

    return np.mean(d_losses), np.mean(g_losses)

### Question 3

1. Why do we use `.detach()` when training the Discriminator on fake images?
2. Why does the Generator use `real_labels` (ones) even though it's generating fake images?

---

# Part 5: Training

Watch the generated faces improve progressively! Early epochs show blurry shapes; later epochs show recognizable facial features.

In [None]:
epochs = 20
sample_interval = 2  # Show samples every N epochs

history = {'d_loss': [], 'g_loss': []}

for epoch in range(epochs):
    print(f'\n=== Epoch {epoch+1}/{epochs} ===')

    d_loss, g_loss = train_epoch(generator, discriminator, train_loader,
                                  optimizer_g, optimizer_d, criterion, device)

    history['d_loss'].append(d_loss)
    history['g_loss'].append(g_loss)

    print(f'D Loss: {d_loss:.4f} | G Loss: {g_loss:.4f}')

    # Generate samples at intervals
    if (epoch + 1) % sample_interval == 0 or epoch == 0:
        generator.eval()
        with torch.no_grad():
            fake_images = generator(fixed_noise)

        grid = make_grid(fake_images, nrow=8, normalize=True, value_range=(-1, 1))
        plt.figure(figsize=(10, 10))
        plt.imshow(grid.permute(1, 2, 0).cpu())
        plt.title(f'Generated Faces - Epoch {epoch+1}', fontsize=16)
        plt.axis('off')
        plt.tight_layout()
        plt.show()

print('\nTraining complete!')

In [None]:
# Plot training curves
plt.figure(figsize=(10, 5))
plt.plot(history['d_loss'], label='Discriminator Loss', linewidth=2)
plt.plot(history['g_loss'], label='Generator Loss', linewidth=2)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Loss', fontsize=12)
plt.title('Training Loss Over Time', fontsize=14)
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

### Question 4

1. What happened to the generated images from epoch 1 to epoch 20?
2. Looking at the loss curves, what would indicate that training is stable?
3. What might happen if the Discriminator becomes too good too fast?

---

# Part 6: Generating New Faces

Now we can generate unlimited unique faces by sampling new random noise vectors!

In [None]:
def generate_faces(generator, num_samples=16):
    """Generate random faces."""
    generator.eval()
    with torch.no_grad():
        noise = torch.randn(num_samples, generator.latent_dim, 1, 1).to(device)
        generated = generator(noise)

    grid = make_grid(generated, nrow=4, normalize=True, value_range=(-1, 1))
    plt.figure(figsize=(8, 8))
    plt.imshow(grid.permute(1, 2, 0).cpu())
    plt.title('Generated Faces', fontsize=16)
    plt.axis('off')
    plt.tight_layout()
    plt.show()

# Generate multiple batches
print('Generating random faces...')
for i in range(3):
    print(f'\nBatch {i+1}:')
    generate_faces(generator, num_samples=16)

### Exercise 1: Generate a Large Gallery

Generate a large grid showing the variety of faces the model can create.

In [None]:
# TODO: Generate a gallery of 64 faces in an 8x8 grid
num_faces = 64  # <-- Modify this!

generator.eval()
with torch.no_grad():
    noise = torch.randn(num_faces, generator.latent_dim, 1, 1).to(device)
    generated_faces = generator(noise)

grid = make_grid(generated_faces, nrow=8, normalize=True, value_range=(-1, 1))
plt.figure(figsize=(15, 15))
plt.imshow(grid.permute(1, 2, 0).cpu())
plt.title(f'Generated Face Gallery ({num_faces} unique faces)', fontsize=18)
plt.axis('off')
plt.tight_layout()
plt.show()

---

# Part 7: Exploring the Latent Space

The **latent space** is the 100-dimensional space from which we sample noise. One fascinating property: nearby points in latent space produce similar images!

We can **interpolate** between two points to see smooth transitions.
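
Linear interpolation computes `z = (1 - alpha) * z1 + alpha * z2` for `alpha` running from 0 to 1. The sketch below shows the arithmetic on toy 3-dimensional vectors in plain Python (the notebook uses 100-dimensional tensors); `lerp` and the vector values are illustrative.

```python
def lerp(z1, z2, alpha):
    """Linear interpolation between two vectors of equal length."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(z1, z2)]

z_start = [0.0, 2.0, -1.0]   # toy 3-D "latent vectors"
z_end = [4.0, 0.0, 1.0]

steps = 5
path = [lerp(z_start, z_end, i / (steps - 1)) for i in range(steps)]
for z in path:
    print(z)
```

The first and last points of `path` are the two endpoints, and the middle point is their elementwise average. Feeding each point to the Generator yields one frame of the transition.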

In [None]:
def interpolate_latent(generator, num_steps=10):
    """
    Interpolate between two random latent vectors.
    Shows smooth transitions between faces.
    """
    generator.eval()

    # Two random starting points
    z1 = torch.randn(1, generator.latent_dim, 1, 1).to(device)
    z2 = torch.randn(1, generator.latent_dim, 1, 1).to(device)

    interpolations = []
    with torch.no_grad():
        for alpha in torch.linspace(0, 1, num_steps):
            z = (1 - alpha) * z1 + alpha * z2
            img = generator(z)
            interpolations.append(img)

    interpolations = torch.cat(interpolations)
    grid = make_grid(interpolations, nrow=num_steps, normalize=True, value_range=(-1, 1))

    plt.figure(figsize=(15, 3))
    plt.imshow(grid.permute(1, 2, 0).cpu())
    plt.title('Latent Space Interpolation (smooth transitions)', fontsize=14)
    plt.axis('off')
    plt.tight_layout()
    plt.show()

# Show multiple interpolations
print('Watch faces morph smoothly!')
for i in range(3):
    print(f'\nInterpolation {i+1}:')
    interpolate_latent(generator, num_steps=10)

### Exercise 2: Longer Interpolations

Try creating longer interpolations with more steps to see finer transitions.

In [None]:
# TODO: Create a longer interpolation with 20 steps
num_steps = 10  # <-- Modify this!

interpolate_latent(generator, num_steps=num_steps)

### Question 5

1. What does smooth interpolation tell us about the latent space structure?
2. If two random points always produce smooth transitions, what does that suggest about the Generator?

---

# Part 8: Exercises

## Exercise 3: Modify the Architecture

The `ngf` parameter controls the number of filters. Try changing it and observe the effect on training.
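
The parameter counts can also be derived by hand from the layer shapes, which helps explain how they scale with `ngf`. The sketch below assumes the Generator defined earlier in this notebook: five transposed convolutions with 4x4 kernels and `bias=False`, each followed by BatchNorm except the last. The function `generator_param_count` is an illustrative helper, not part of the course package.

```python
def generator_param_count(latent_dim=100, ngf=64, kernel=4, out_channels=3):
    """Count Generator parameters from the layer shapes (convs have no bias)."""
    channels = [latent_dim, ngf * 8, ngf * 4, ngf * 2, ngf, out_channels]
    total = 0
    for i, (c_in, c_out) in enumerate(zip(channels[:-1], channels[1:])):
        total += c_in * c_out * kernel * kernel   # ConvTranspose2d weights
        if i < len(channels) - 2:                 # every layer but the last has BatchNorm
            total += 2 * c_out                    # BatchNorm2d weight + bias
    return total

print(generator_param_count(ngf=64))  # 3576704
print(generator_param_count(ngf=32))  # 1100224
```

Convolution weights grow with the product of input and output channels, so the inner layers shrink by about a factor of four when `ngf` is halved. Only the first layer depends on `latent_dim`: with `ngf=64`, each extra latent dimension adds `64 * 8 * 16 = 8192` parameters.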

In [None]:
# TODO: Create a smaller generator with ngf=32 instead of 64
# Compare the number of parameters

ngf_small = 32  # <-- Modify this!

generator_small = Generator(latent_dim=100, ngf=ngf_small).to(device)
generator_large = Generator(latent_dim=100, ngf=64).to(device)

params_small = sum(p.numel() for p in generator_small.parameters())
params_large = sum(p.numel() for p in generator_large.parameters())

print(f'Small Generator (ngf={ngf_small}): {params_small:,} parameters')
print(f'Large Generator (ngf=64): {params_large:,} parameters')
print(f'Reduction: {(1 - params_small/params_large)*100:.1f}%')

## Exercise 4: Effect of Latent Dimension

The latent dimension (default 100) is the number of independent noise values fed to the Generator, a rough measure of how much variation the input can encode. Try different values.

In [None]:
# TODO: Create generators with different latent dimensions
# and compare their parameter counts

latent_dims = [10, 50, 100, 200]  # <-- Modify this!

for latent_dim in latent_dims:
    gen = Generator(latent_dim=latent_dim, ngf=64)
    params = sum(p.numel() for p in gen.parameters())
    print(f'Latent dim {latent_dim:3d}: {params:,} parameters')

---

# Summary

## Key Takeaways

1. **GANs** consist of two networks in competition:
   - Generator creates fake images
   - Discriminator distinguishes real from fake

2. **DCGAN** uses convolutional layers for stable image generation:
   - Transposed convolutions for upsampling
   - Batch normalization for stability
   - Specific activations (ReLU/LeakyReLU/Tanh/Sigmoid)

3. **Training dynamics**:
   - Both networks improve together
   - Balance is crucial - neither should dominate
   - Visible quality improvement over epochs

4. **Latent space**:
   - Random noise maps to realistic images
   - Smooth interpolation shows learned structure
   - Each dimension may capture some aspect of variation

## What We Didn't Cover

- **Mode collapse**: When Generator produces limited variety
- **Wasserstein GAN**: More stable training loss
- **Conditional GAN**: Control what to generate
- **StyleGAN**: State-of-the-art face generation

---

## Reflection Questions

1. **For your research domain**: What could GANs generate? (synthetic data, augmentation, simulation?)

2. **Data augmentation**: How could generated data help train other models in your field?

3. **Ethical considerations**: What are the risks of generative models that create realistic fake data?

4. **Beyond images**: GANs can generate molecules, protein structures, time series. What would be most useful for your research?