# Deep Convolutional GAN (DCGAN)

## Core Idea

DCGAN extends the vanilla GAN with convolutional architectures, enabling generation of spatially coherent images.
The generator uses transposed convolutions (also called fractionally-strided convolutions) for learned upsampling,
while the discriminator uses strided convolutions for learned downsampling; neither network uses pooling layers.

## Mathematical Foundation

### Transposed Convolution

Standard convolution with kernel size $k$, stride $s$, padding $p$:
$$H_{out} = \lfloor \frac{H_{in} + 2p - k}{s} \rfloor + 1$$

Transposed convolution (learned upsampling), where $p_{out}$ is the optional output padding:
$$H_{out} = (H_{in} - 1) \times s - 2p + k + p_{out}$$

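As a quick sanity check, the two size formulas can be evaluated directly. The sketch below is plain Python; the helper names `conv_out` and `conv_transpose_out` are hypothetical, and the layer settings are the kernel/stride/padding values used by the generator and discriminator later in this notebook.

```python
def conv_out(h_in: int, k: int, s: int, p: int) -> int:
    """Output size of a standard convolution along one spatial axis."""
    return (h_in + 2 * p - k) // s + 1


def conv_transpose_out(h_in: int, k: int, s: int, p: int, p_out: int = 0) -> int:
    """Output size of a transposed convolution along one spatial axis."""
    return (h_in - 1) * s - 2 * p + k + p_out


# Generator path: 1x1 latent, one (k=4, s=1, p=0) layer, then three (k=4, s=2, p=1) layers.
g_sizes = [1]
g_sizes.append(conv_transpose_out(g_sizes[-1], k=4, s=1, p=0))
for _ in range(3):
    g_sizes.append(conv_transpose_out(g_sizes[-1], k=4, s=2, p=1))

# Discriminator path: three (k=4, s=2, p=1) layers, then one (k=4, s=1, p=0) layer.
d_sizes = [32]
for _ in range(3):
    d_sizes.append(conv_out(d_sizes[-1], k=4, s=2, p=1))
d_sizes.append(conv_out(d_sizes[-1], k=4, s=1, p=0))

print("generator sizes:    ", g_sizes)
print("discriminator sizes:", d_sizes)
```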
**Interpretation:** Transposed convolution is NOT the inverse of convolution. It is the operation that computes
the gradient of a convolution with respect to its input: it restores the input's spatial shape (upsampling) but not its values.

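The gradient interpretation can be checked numerically. The sketch below assumes PyTorch is installed and uses arbitrary tensor sizes chosen only for illustration: it backpropagates a random upstream gradient through `F.conv2d` and compares the result with `F.conv_transpose2d` applied to that same gradient with the same weight tensor.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

x = torch.randn(1, 3, 8, 8, requires_grad=True)  # input: 3 channels, 8x8
w = torch.randn(5, 3, 4, 4)                      # conv weight: (C_out, C_in, k, k)

# Forward convolution: 8x8 -> 4x4 with k=4, s=2, p=1.
y = F.conv2d(x, w, stride=2, padding=1)

# Backpropagate an arbitrary upstream gradient g through the convolution.
g = torch.randn_like(y)
(grad_x,) = torch.autograd.grad(y, x, grad_outputs=g)

# The same values come from a transposed convolution that reuses the weight tensor.
up = F.conv_transpose2d(g, w, stride=2, padding=1)
print("shapes:", tuple(y.shape), "->", tuple(up.shape))
print("matches autograd:", torch.allclose(grad_x, up, atol=1e-4))

# Not an inverse: applying it to y recovers x's shape, but not x's values.
recon = F.conv_transpose2d(y.detach(), w, stride=2, padding=1)
print("recovers x:", torch.allclose(recon, x.detach(), atol=1e-4))
```

The tolerance `atol=1e-4` is an assumption to absorb float32 rounding; the two computations are mathematically the same linear map.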
### Architecture Guidelines (Radford et al., 2015)

1. Replace pooling with strided convolutions (discriminator) and transposed convolutions (generator)
2. Use BatchNorm in both networks (except G output and D input layers)
3. Remove fully connected hidden layers for deeper architectures
4. Use ReLU in generator (except output: Tanh)
5. Use LeakyReLU in discriminator

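The guidelines translate into two small building blocks. This is a minimal sketch, assuming PyTorch; `g_block` and `d_block` are hypothetical helper names, and the kernel size 4, stride 2, padding 1 and LeakyReLU slope 0.2 are the values used in this notebook's networks rather than the only valid choice.

```python
import torch
import torch.nn as nn


def g_block(c_in: int, c_out: int) -> nn.Sequential:
    """Generator stage: transposed conv (2x upsampling) + BatchNorm + ReLU."""
    return nn.Sequential(
        nn.ConvTranspose2d(c_in, c_out, kernel_size=4, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )


def d_block(c_in: int, c_out: int) -> nn.Sequential:
    """Discriminator stage: strided conv (2x downsampling) + BatchNorm + LeakyReLU."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=4, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.2, inplace=True),
    )


x = torch.randn(2, 128, 8, 8)
up = g_block(128, 64)(x)     # spatial size doubles, channels halve
down = d_block(128, 256)(x)  # spatial size halves, channels double
print(tuple(up.shape), tuple(down.shape))
```

Per guideline 2, the generator's output layer and the discriminator's input layer would omit the BatchNorm step from these blocks.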
## Problem Statement

Vanilla GAN with fully-connected layers:
- Loses spatial structure (flattening destroys 2D relationships)
- Scales poorly with image resolution (a dense layer connecting two $n$-pixel maps needs $O(n^2)$ parameters)
- Produces blurry, incoherent images

DCGAN addresses these by leveraging convolutional inductive bias for images.

## Algorithm Comparison

| Aspect | FC-GAN | DCGAN |
|--------|--------|-------|
| Spatial structure | Lost | Preserved |
| Parameter scaling (per layer) | Grows with $H \times W$ | $O(k^2 \cdot C_{in} \cdot C_{out})$, independent of $H \times W$ |
| Max practical resolution | 28x28 | 64x64+ |
| Training stability | Poor | Better |

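The parameter-scaling row can be made concrete with a little arithmetic. The sketch below is plain Python; the hidden width of 1024 and the 128-to-64 channel convolution are assumed values picked for illustration, and `dense_params` / `conv_params` are hypothetical helpers.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights + biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k convolution without bias; independent of image size."""
    return k * k * c_in * c_out


hidden = 1024  # assumed hidden width feeding the dense output layer
for side in (32, 64, 128):
    fc = dense_params(hidden, 3 * side * side)
    conv = conv_params(k=4, c_in=128, c_out=64)
    print(f"{side}x{side}: dense output layer {fc:,} params, conv layer {conv:,} params")
```

Doubling the image side quadruples the dense layer's parameter count, while the convolution's count stays fixed.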
## Complexity Analysis

- **Generator:** $O(\sum_l k_l^2 \cdot C_{in}^l \cdot C_{out}^l \cdot H_l \cdot W_l)$, where $H_l \times W_l$ is the feature-map size at layer $l$
- **Discriminator:** Same form, but the spatial dimensions shrink with depth instead of growing
- **Memory:** Dominated by intermediate feature maps, not weights

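A rough tally for the 32x32 generator defined later in this notebook illustrates these bullets. This is a back-of-the-envelope sketch in plain Python: it counts only convolution weights and convolution outputs (BatchNorm and activation outputs would add more), ignores border effects from padding, and assumes a batch size of 128.

```python
# (kind, k, c_in, c_out, h_in, h_out) for each generator layer with ngf = 64
layers = [
    ("convT", 4, 100, 512, 1, 4),
    ("convT", 4, 512, 256, 4, 8),
    ("convT", 4, 256, 128, 8, 16),
    ("convT", 4, 128, 64, 16, 32),
    ("conv", 3, 64, 3, 32, 32),
]

batch_size = 128  # assumed batch size
total_weights = total_macs = total_acts = 0
for kind, k, c_in, c_out, h_in, h_out in layers:
    weights = k * k * c_in * c_out
    # A transposed conv does k*k*c_in*c_out multiply-adds per *input* position,
    # a regular conv per *output* position.
    positions = h_in * h_in if kind == "convT" else h_out * h_out
    macs = weights * positions
    acts = c_out * h_out * h_out  # output feature-map elements per sample
    total_weights += weights
    total_macs += macs
    total_acts += acts
    print(f"{kind:5s} {c_in:>3}->{c_out:<3} weights={weights:>9,} MACs={macs:>11,} acts={acts:>6,}")

print(f"conv weights: {total_weights:,}")
print(f"multiply-adds per sample: {total_macs:,}")
print(f"conv activations for a batch of {batch_size}: {batch_size * total_acts:,}")
```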
In [None]:
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Tuple

import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from torch import Tensor
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.utils import make_grid

In [None]:
@dataclass
class DCGANConfig:
    """Configuration for DCGAN.
    
    Core Idea:
        Hyperparameters following DCGAN paper recommendations.
    
    Mathematical Theory:
        - ngf/ndf: Base channel widths. Feature maps scale as ngf*2^i (ndf*2^i).
        - beta1=0.5: Reduced Adam momentum (default 0.9) dampens oscillation in adversarial training.
        - label_smoothing=0.9: Target used for real samples (one-sided label smoothing).
    """
    latent_dim: int = 100
    ngf: int = 64
    ndf: int = 64
    nc: int = 3
    image_size: int = 32
    
    lr: float = 2e-4
    beta1: float = 0.5
    beta2: float = 0.999
    
    batch_size: int = 128
    num_epochs: int = 50
    label_smoothing: float = 0.9
    
    device: str = field(default_factory=lambda: "cuda" if torch.cuda.is_available() else "cpu")
    seed: int = 42

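One way to build intuition for `beta1=0.5` is to look at how quickly an exponential moving average of gradients reacts when the gradient direction reverses, as it often does when two networks pull against each other. The toy sketch below is plain Python and deliberately ignores Adam's second-moment scaling and bias correction; `steps_until_sign_flip` is a hypothetical helper, not part of any library.

```python
def steps_until_sign_flip(beta: float) -> int:
    """Steps for an EMA of gradients to turn negative after the gradient flips from +1 to -1."""
    m = 1.0  # momentum after a long run of +1 gradients
    steps = 0
    while m >= 0.0:
        m = beta * m + (1.0 - beta) * (-1.0)
        steps += 1
    return steps


for beta1 in (0.9, 0.5):
    print(f"beta1={beta1}: momentum changes sign after {steps_until_sign_flip(beta1)} steps")
```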
In [None]:
class DCGANGenerator(nn.Module):
    """DCGAN Generator with transposed convolutions.
    
    Core Idea:
        Progressive upsampling from 1x1 latent to full resolution image.
        The first layer projects 1x1 to 4x4; each following stride-2 transposed conv
        doubles the spatial dimensions while halving the channels.
    
    Mathematical Theory:
        Transposed conv output size: $H_{out} = (H_{in}-1) \times s - 2p + k$
        With k=4, s=2, p=1: $H_{out} = 2 \times H_{in}$ (exact doubling)
    
    Architecture:
        z (100x1x1) -> 512x4x4 -> 256x8x8 -> 128x16x16 -> 64x32x32 -> 3x32x32
    
    Complexity:
        Parameters: ~3.6M for ngf=64
        FLOPs: O(ngf^2 * image_size^2)
    """
    
    def __init__(self, config: DCGANConfig) -> None:
        super().__init__()
        self.config = config
        ngf = config.ngf
        
        self.main = nn.Sequential(
            # z -> ngf*8 x 4 x 4
            nn.ConvTranspose2d(config.latent_dim, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # ngf*8 x 4 x 4 -> ngf*4 x 8 x 8
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # ngf*4 x 8 x 8 -> ngf*2 x 16 x 16
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # ngf*2 x 16 x 16 -> ngf x 32 x 32
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # ngf x 32 x 32 -> nc x 32 x 32
            nn.Conv2d(ngf, config.nc, 3, 1, 1, bias=False),
            nn.Tanh(),
        )
        self._init_weights()
    
    def _init_weights(self) -> None:
        for m in self.modules():
            if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
                nn.init.normal_(m.weight, 0.0, 0.02)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.normal_(m.weight, 1.0, 0.02)
                nn.init.zeros_(m.bias)
    
    def forward(self, z: Tensor) -> Tensor:
        return self.main(z.view(-1, self.config.latent_dim, 1, 1))

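The shape progression and parameter count of the generator above can be verified empirically. The sketch below assumes PyTorch and rebuilds only the convolutional layers with the same hyperparameters (BatchNorm and activations are left out because they do not change shapes); the variable names are illustrative.

```python
import torch
import torch.nn as nn

ngf, nc, latent_dim = 64, 3, 100
stack = nn.Sequential(
    nn.ConvTranspose2d(latent_dim, ngf * 8, 4, 1, 0, bias=False),
    nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
    nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
    nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
    nn.Conv2d(ngf, nc, 3, 1, 1, bias=False),
)

# Push a dummy latent batch through and record (channels, height, width) after each layer.
shapes = []
x = torch.randn(2, latent_dim, 1, 1)
with torch.no_grad():
    for layer in stack:
        x = layer(x)
        shapes.append(tuple(x.shape[1:]))

n_params = sum(p.numel() for p in stack.parameters())
print(shapes)
print(f"conv parameters: {n_params:,}")
```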
In [None]:
class DCGANDiscriminator(nn.Module):
    """DCGAN Discriminator with strided convolutions.
    
    Core Idea:
        Progressive downsampling from image to single probability.
        Strided convolutions replace pooling for learned downsampling.
    
    Mathematical Theory:
        Strided conv output: $H_{out} = \lfloor(H_{in} + 2p - k) / s\rfloor + 1$
        With k=4, s=2, p=1: $H_{out} = H_{in} / 2$ (exact halving for even $H_{in}$)
    
    Architecture:
        3x32x32 -> 64x16x16 -> 128x8x8 -> 256x4x4 -> 1x1x1
    
    Problem Statement:
        No BatchNorm in first layer: BN on input images can destabilize training.
        LeakyReLU keeps a small gradient for negative pre-activations, avoiding the
        dead units that ReLU can produce in the discriminator.
    """
    
    def __init__(self, config: DCGANConfig) -> None:
        super().__init__()
        self.config = config
        ndf = config.ndf
        
        self.main = nn.Sequential(
            # nc x 32 x 32 -> ndf x 16 x 16
            nn.Conv2d(config.nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, True),
            # ndf x 16 x 16 -> ndf*2 x 8 x 8
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, True),
            # ndf*2 x 8 x 8 -> ndf*4 x 4 x 4
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, True),
            # ndf*4 x 4 x 4 -> 1 x 1 x 1
            nn.Conv2d(ndf * 4, 1, 4, 1, 0, bias=False),
            nn.Sigmoid(),
        )
        self._init_weights()
    
    def _init_weights(self) -> None:
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.normal_(m.weight, 0.0, 0.02)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.normal_(m.weight, 1.0, 0.02)
                nn.init.zeros_(m.bias)
    
    def forward(self, x: Tensor) -> Tensor:
        return self.main(x).view(-1, 1)

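The difference between ReLU and LeakyReLU in the discriminator comes down to the gradient on negative inputs. A minimal sketch, assuming PyTorch and a handful of hand-picked input values:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.5, 2.0], requires_grad=True)

# Gradient of each activation with respect to its input.
(relu_grad,) = torch.autograd.grad(F.relu(x).sum(), x)
(leaky_grad,) = torch.autograd.grad(F.leaky_relu(x, negative_slope=0.2).sum(), x)

# ReLU passes no gradient for negative inputs; LeakyReLU passes the slope (0.2).
print("ReLU grad:     ", relu_grad.tolist())
print("LeakyReLU grad:", leaky_grad.tolist())
```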
In [None]:
class DCGANTrainer:
    """Training orchestrator for DCGAN.
    
    Core Idea:
        Standard GAN training with BCE loss. Key difference from MLP-GAN
        is the convolutional architecture, not the training procedure.
    """
    
    def __init__(self, config: DCGANConfig) -> None:
        self.config = config
        self.device = torch.device(config.device)
        
        torch.manual_seed(config.seed)
        
        self.generator = DCGANGenerator(config).to(self.device)
        self.discriminator = DCGANDiscriminator(config).to(self.device)
        
        self.optimizer_g = torch.optim.Adam(
            self.generator.parameters(), lr=config.lr, betas=(config.beta1, config.beta2)
        )
        self.optimizer_d = torch.optim.Adam(
            self.discriminator.parameters(), lr=config.lr, betas=(config.beta1, config.beta2)
        )
        
        self.criterion = nn.BCELoss()
        self.fixed_noise = torch.randn(64, config.latent_dim, device=self.device)
        self.history: Dict[str, List[float]] = {"loss_d": [], "loss_g": [], "d_real": [], "d_fake": []}
    
    def _train_discriminator(self, real: Tensor) -> Tuple[float, float, float]:
        batch_size = real.size(0)
        real_labels = torch.full((batch_size, 1), self.config.label_smoothing, device=self.device)
        fake_labels = torch.zeros(batch_size, 1, device=self.device)
        
        self.optimizer_d.zero_grad()
        
        output_real = self.discriminator(real)
        loss_real = self.criterion(output_real, real_labels)
        
        z = torch.randn(batch_size, self.config.latent_dim, device=self.device)
        fake = self.generator(z).detach()
        output_fake = self.discriminator(fake)
        loss_fake = self.criterion(output_fake, fake_labels)
        
        loss_d = loss_real + loss_fake
        loss_d.backward()
        self.optimizer_d.step()
        
        return loss_d.item(), output_real.mean().item(), output_fake.mean().item()
    
    def _train_generator(self, batch_size: int) -> float:
        # The generator targets hard 1.0 labels; one-sided label smoothing applies only to the discriminator.
        real_labels = torch.ones(batch_size, 1, device=self.device)
        
        self.optimizer_g.zero_grad()
        
        z = torch.randn(batch_size, self.config.latent_dim, device=self.device)
        fake = self.generator(z)
        output = self.discriminator(fake)
        
        loss_g = self.criterion(output, real_labels)
        loss_g.backward()
        self.optimizer_g.step()
        
        return loss_g.item()
    
    def train_epoch(self, dataloader: DataLoader) -> Dict[str, float]:
        self.generator.train()
        self.discriminator.train()
        
        epoch_metrics = {k: 0.0 for k in self.history.keys()}
        num_batches = len(dataloader)
        
        for real, _ in dataloader:
            real = real.to(self.device)
            batch_size = real.size(0)
            
            loss_d, d_real, d_fake = self._train_discriminator(real)
            loss_g = self._train_generator(batch_size)
            
            epoch_metrics["loss_d"] += loss_d
            epoch_metrics["loss_g"] += loss_g
            epoch_metrics["d_real"] += d_real
            epoch_metrics["d_fake"] += d_fake
        
        for k in epoch_metrics:
            epoch_metrics[k] /= num_batches
            self.history[k].append(epoch_metrics[k])
        
        return epoch_metrics
    
    @torch.no_grad()
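    def generate_samples(self, num_samples: int = 64) -> Tensor:
        """Generate images, reusing `fixed_noise` when the sample count matches it."""
        self.generator.eval()
        if num_samples == self.fixed_noise.size(0):
            # Same latent vectors on every call, so samples are comparable across epochs.
            z = self.fixed_noise
        else:
            z = torch.randn(num_samples, self.config.latent_dim, device=self.device)
        return self.generator(z).cpu()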

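The effect of `label_smoothing=0.9` on the discriminator's real-sample loss can be seen without any training. The sketch below is plain Python with a hypothetical `bce` helper; it scans candidate discriminator outputs and reports which one minimises binary cross-entropy for a hard target and for the smoothed target. With smoothing, the loss stops rewarding outputs pushed toward 1, which discourages an overconfident discriminator.

```python
import math


def bce(p: float, target: float) -> float:
    """Binary cross-entropy for a single prediction p in (0, 1)."""
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


# Scan discriminator outputs on a grid of 0.001 ... 0.999.
grid = [i / 1000 for i in range(1, 1000)]
best_hard = min(grid, key=lambda p: bce(p, 1.0))
best_smooth = min(grid, key=lambda p: bce(p, 0.9))

print(f"hard target 1.0   -> loss is minimised at D(x) = {best_hard}")
print(f"smooth target 0.9 -> loss is minimised at D(x) = {best_smooth}")
```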
In [None]:
def create_dataloader(config: DCGANConfig) -> DataLoader:
    transform = transforms.Compose([
        transforms.Resize(config.image_size),
        transforms.ToTensor(),
        transforms.Normalize([0.5] * config.nc, [0.5] * config.nc),
    ])
    dataset = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
    return DataLoader(dataset, batch_size=config.batch_size, shuffle=True, drop_last=True, num_workers=2)

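`transforms.Normalize` with mean 0.5 and std 0.5 computes `(x - mean) / std` per channel, which maps pixel values from [0, 1] to [-1, 1], the same range as the generator's Tanh output. The plain-Python sketch below shows the mapping and its inverse on a few scalar values; `normalize` and `denormalize` are illustrative helpers, not torchvision functions.

```python
def normalize(x: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Per-value form of the normalisation: (x - mean) / std."""
    return (x - mean) / std


def denormalize(y: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Inverse mapping, useful before displaying generated images."""
    return y * std + mean


for pixel in (0.0, 0.25, 0.5, 1.0):
    y = normalize(pixel)
    print(f"{pixel:.2f} -> {y:+.2f} -> {denormalize(y):.2f}")
```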
In [None]:
def visualize_samples(samples: Tensor, title: str = "Generated Samples") -> None:
    grid = make_grid(samples, nrow=8, normalize=True, value_range=(-1, 1))
    plt.figure(figsize=(10, 10))
    plt.imshow(grid.permute(1, 2, 0).numpy())
    plt.title(title)
    plt.axis("off")
    plt.tight_layout()
    plt.show()

In [None]:
def plot_training_curves(history: Dict[str, List[float]]) -> None:
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    axes[0].plot(history["loss_d"], label="Discriminator", alpha=0.8)
    axes[0].plot(history["loss_g"], label="Generator", alpha=0.8)
    axes[0].set_xlabel("Epoch")
    axes[0].set_ylabel("Loss")
    axes[0].set_title("Training Loss")
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)
    
    axes[1].plot(history["d_real"], label="D(real)", alpha=0.8)
    axes[1].plot(history["d_fake"], label="D(fake)", alpha=0.8)
    axes[1].axhline(y=0.5, color="r", linestyle="--", label="Equilibrium")
    axes[1].set_xlabel("Epoch")
    axes[1].set_ylabel("Discriminator Output")
    axes[1].set_title("Discriminator Confidence")
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

In [None]:
if __name__ == "__main__":
    config = DCGANConfig(num_epochs=50, batch_size=128)
    dataloader = create_dataloader(config)
    trainer = DCGANTrainer(config)
    
    print(f"Generator parameters: {sum(p.numel() for p in trainer.generator.parameters()):,}")
    print(f"Discriminator parameters: {sum(p.numel() for p in trainer.discriminator.parameters()):,}")
    print(f"Device: {config.device}")
    
    for epoch in range(config.num_epochs):
        metrics = trainer.train_epoch(dataloader)
        
        if (epoch + 1) % 10 == 0:
            print(f"Epoch [{epoch+1}/{config.num_epochs}] "
                  f"Loss_D: {metrics['loss_d']:.4f} Loss_G: {metrics['loss_g']:.4f} "
                  f"D(real): {metrics['d_real']:.3f} D(fake): {metrics['d_fake']:.3f}")
            samples = trainer.generate_samples(64)
            visualize_samples(samples, f"Epoch {epoch+1}")
    
    plot_training_curves(trainer.history)

## Summary

DCGAN brings convolutional architectures to GANs through five design principles:

1. **Transposed convolutions** for learned upsampling (generator)
2. **Strided convolutions** for learned downsampling (discriminator)
3. **BatchNorm** everywhere except G output and D input
4. **ReLU** in generator, **LeakyReLU** in discriminator
5. **Adam** with $\beta_1 = 0.5$ for stable adversarial training

The convolutional inductive bias enables generation of spatially coherent images
at higher resolutions than fully-connected GANs.