<a href="https://colab.research.google.com/github/goyalpramod/paper_implementations/blob/main/GANs_from_scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Code implementation of the following paper -> [Generative Adversarial Nets](https://arxiv.org/pdf/1406.2661)

This work has been heavily influenced by the following implementations:

* [bamos/dcgan-completion.tensorflow](https://github.com/bamos/dcgan-completion.tensorflow)
* [soumith/dcgan.torch](https://github.com/soumith/dcgan.torch/blob/master/main.lua) [The original Torch (Lua) implementation; you can find the TF version on Alec's GitHub]
* [openai/improved-gan](https://github.com/openai/improved-gan)
* [lilianweng/unified-gan-tensorflow](https://github.dev/lilianweng/unified-gan-tensorflow) [I picked this one to follow, for no particular reason]
* [carpedm20/DCGAN-tensorflow](https://github.com/carpedm20/DCGAN-tensorflow) [Most of the implementations build on this]

Also check out the following blogs for more clarity on the topic
* [Brandon's Blog](https://bamos.github.io/2016/08/09/deep-completion/)
* [Lil'log's Blog](https://lilianweng.github.io/posts/2017-08-20-gan/)

# Let's understand some equations from the paper

$\min_G \max_D V(D,G) = \mathbb{E}_{x\sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z\sim p_z(z)}[\log(1-D(G(z)))]$

$\min_G \max_D V(D,G)$

This notation means:

* The discriminator D tries to maximize V(D,G)
* The generator G tries to minimize V(D,G)

This creates an adversarial game between G and D

$\mathbb{E}_{x\sim p_{data}(x)}[\log D(x)]$

This represents:
* Expected value (𝔼) over samples x drawn from the real data distribution $p_{data}(x)$
* D(x) outputs a probability between 0 and 1 (how "real" D thinks x is)
* log D(x) rewards D for correctly identifying real samples
* When D(x)→1 for real data, log D(x)→0, which is the largest value this term can take (log D(x) ≤ 0 because D(x) ≤ 1)

$\mathbb{E}_{z\sim p_z(z)}[\log(1-D(G(z)))]$

This represents:

* Expected value over random noise z drawn from the noise distribution $p_z(z)$
* G(z) generates fake samples from noise
* D(G(z)) is the discriminator's output on generated samples
* log(1-D(G(z))) rewards D for correctly identifying fake samples
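* When D(G(z))→0 for fake data, log(1-D(G(z)))→0, which is again the largest value this term can take, so D maximizes the objective
* G wants the opposite: pushing D(G(z))→1 drives log(1-D(G(z))) toward -∞, which minimizes the objective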
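
As a rough illustration of the two expectation terms, the sketch below estimates V(D,G) from a batch of discriminator outputs and compares it with binary cross-entropy. The function name `value_estimate` and the example probabilities are made up for illustration, and the comparison assumes D outputs probabilities, as described above.

```python
import torch
import torch.nn.functional as F


def value_estimate(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    """Monte Carlo estimate of V(D, G) from discriminator outputs.

    d_real: D(x) on a batch of real samples, values in (0, 1)
    d_fake: D(G(z)) on a batch of generated samples, values in (0, 1)
    """
    return torch.log(d_real).mean() + torch.log(1.0 - d_fake).mean()


# Hypothetical discriminator outputs, chosen only for illustration.
d_real = torch.tensor([0.9, 0.8, 0.95])  # D is fairly sure these are real
d_fake = torch.tensor([0.1, 0.3, 0.05])  # D is fairly sure these are fake

v = value_estimate(d_real, d_fake)

# Binary cross-entropy with labels 1 (real) and 0 (fake) is the negative of
# the two terms, so maximizing V over D is the same as minimizing this loss.
bce = (F.binary_cross_entropy(d_real, torch.ones_like(d_real))
       + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))

print(f"V estimate: {v.item():.4f}, BCE loss: {bce.item():.4f}")
```

This is why GAN training code usually expresses the discriminator objective as a sum of two binary cross-entropy losses rather than writing out the logarithms by hand.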

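For a fixed generator G, the discriminator that maximizes V(D,G) is given below, where $p_g$ denotes the distribution of the generated samples G(z):

$D^*_G(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$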
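
To sanity-check this formula, the following sketch evaluates V(D,G) on a small discrete example. The two distributions and the helper `value` are assumptions made purely for illustration; the paper works with general densities.

```python
import math
import torch


def value(d: torch.Tensor, p_data: torch.Tensor, p_g: torch.Tensor) -> torch.Tensor:
    """V(D, G) for discrete distributions: sum_x p_data*log D + p_g*log(1 - D)."""
    return (p_data * torch.log(d) + p_g * torch.log(1.0 - d)).sum()


# Toy distributions over four outcomes (made up for illustration).
p_data = torch.tensor([0.1, 0.2, 0.3, 0.4])
p_g = torch.tensor([0.4, 0.3, 0.2, 0.1])

d_star = p_data / (p_data + p_g)
v_star = value(d_star, p_data, p_g)

# Any other discriminator should score no higher than D*.
torch.manual_seed(0)
for _ in range(1000):
    d_other = torch.rand(4).clamp(1e-4, 1 - 1e-4)
    assert value(d_other, p_data, p_g) <= v_star + 1e-6

# When the generator matches the data, D* is 0.5 everywhere and V = -log 4.
d_equal = p_data / (p_data + p_data)
v_equal = value(d_equal, p_data, p_data)

print("D*:", d_star)
print("V(D*, G):", v_star.item())
print("V when p_g = p_data:", v_equal.item(), "vs -log 4 =", -math.log(4))
```

The last line reflects the equilibrium of the game: once the generator matches the data distribution, the best the discriminator can do is output 0.5 for every sample.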

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

In [2]:
nn.ConvTranspose2d?

In [None]:
class Generator(nn.Module):
    """
    Input: (batch_size, latent_dim)  # typically latent_dim = 100
    Output: (batch_size, channels, height, width)  # e.g., (batch_size, 3, 64, 64)

    Architecture Hints:
    1. Start with Dense layer to reshape latent vector
    2. Use multiple ConvTranspose2d layers to upscale
    3. Each layer should follow pattern: ConvTranspose2d -> BatchNorm2d -> ReLU
    4. Final layer should use Tanh activation
    """
    def __init__(self, latent_dim):
        super(Generator, self).__init__()

        # Step 1: Dense layer to reshape
        # latent_dim -> (4*4*512) features
        # Hint: Use Linear layer here

        # Step 2: Reshape to (batch_size, 512, 4, 4)

        # Step 3: Multiple ConvTranspose2d blocks
        # Block 1: (512, 4, 4) -> (256, 8, 8)
        # kernel_size=4, stride=2, padding=1

        # Block 2: (256, 8, 8) -> (128, 16, 16)
        # Same parameters as above

        # Block 3: (128, 16, 16) -> (64, 32, 32)

        # Final Block: (64, 32, 32) -> (3, 64, 64)
        # Don't forget Tanh() at the end!

In [3]:
class Generator(nn.Module):
    def __init__(self, latent_dim):
        super().__init__()

        # Dense layer
        self.linear = nn.Linear(latent_dim, 4*4*512)

        # Blocks
        self.block1 = nn.Sequential(
            nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True)
        )

        self.block2 = nn.Sequential(
            nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True)
        )

        self.block3 = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True)
        )

        self.block4 = nn.Sequential(
            nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),
            nn.Tanh()
        )

    def forward(self, x):
        # x shape: (batch_size, latent_dim)
        x = self.linear(x)
        x = x.view(-1, 512, 4, 4)  # reshape using view
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.block4(x)
        return x
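
The shape comments for the generator blocks follow from the output-size formula of `nn.ConvTranspose2d`. The sketch below walks the spatial size through the four blocks and cross-checks the formula against a real layer. The helper `conv_transpose_out` is a hypothetical name, and the formula as written assumes `dilation=1` and `output_padding=0`, which are the defaults used above.

```python
import torch
import torch.nn as nn


def conv_transpose_out(size: int, kernel_size: int = 4, stride: int = 2, padding: int = 1) -> int:
    """Output size of ConvTranspose2d with dilation=1 and output_padding=0."""
    return (size - 1) * stride - 2 * padding + kernel_size


# Walk the spatial size through the four upsampling blocks.
sizes = [4]
for _ in range(4):
    sizes.append(conv_transpose_out(sizes[-1]))
print(sizes)

# Cross-check the formula against an actual layer with the same settings as block1.
layer = nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1)
out = layer(torch.randn(2, 512, 4, 4))
print(out.shape)
```

With `kernel_size=4, stride=2, padding=1`, every block exactly doubles the height and width, which is why four blocks take the 4x4 feature map to a 64x64 image.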

In [4]:
nn.Conv2d?

In [None]:
class Discriminator(nn.Module):
    """
    Input: (batch_size, channels, height, width)  # e.g., (batch_size, 3, 64, 64)
    Output: (batch_size, 1)  # probability between 0-1

    Architecture Hints:
    1. Use Conv2d layers to downsample
    2. Each layer: Conv2d -> BatchNorm2d (except first) -> LeakyReLU
    3. Final layer should be Dense with Sigmoid
    """
    def __init__(self):
        super(Discriminator, self).__init__()

        # Block 1: (3, 64, 64) -> (64, 32, 32)
        # Conv2d: kernel_size=4, stride=2, padding=1
        # Don't use BatchNorm in first layer!

        # Block 2: (64, 32, 32) -> (128, 16, 16)
        # Remember BatchNorm here

        # Block 3: (128, 16, 16) -> (256, 8, 8)

        # Block 4: (256, 8, 8) -> (512, 4, 4)

        # Final: Flatten and Dense to single output
        # Don't forget Sigmoid()!

In [5]:
class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()

        self.block1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True)
        )

        self.block2 = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True)
        )

        self.block3 = nn.Sequential(
            nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2, inplace=True)
        )

        self.block4 = nn.Sequential(
            nn.Conv2d(256, 512, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2, inplace=True)
        )

        self.output = nn.Sequential(
            nn.Linear(512*4*4, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.block4(x)
        x = x.view(-1, 512*4*4)  # flatten
        x = self.output(x)
        return x
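
The discriminator mirrors the generator: each `nn.Conv2d` with `kernel_size=4, stride=2, padding=1` halves the spatial size. The sketch below checks this arithmetic and the `512*4*4` flatten size. The helper `conv_out` and the bare stack of convolutions (without BatchNorm or activations, which do not change shapes) are simplifications for illustration.

```python
import torch
import torch.nn as nn


def conv_out(size: int, kernel_size: int = 4, stride: int = 2, padding: int = 1) -> int:
    """Output size of Conv2d with dilation=1."""
    return (size + 2 * padding - kernel_size) // stride + 1


# Walk the spatial size through the four downsampling blocks.
sizes = [64]
for _ in range(4):
    sizes.append(conv_out(sizes[-1]))
print(sizes)

# Cross-check with the same convolutions used in the discriminator.
channels = [3, 64, 128, 256, 512]
convs = nn.Sequential(*[
    nn.Conv2d(c_in, c_out, kernel_size=4, stride=2, padding=1)
    for c_in, c_out in zip(channels[:-1], channels[1:])
])
features = convs(torch.randn(2, 3, 64, 64))
flat = features.view(features.size(0), -1)
print(features.shape, flat.shape)
```

Note that `x.view(-1, 512*4*4)` in `forward` only works for 64x64 inputs. A different image size would change the final feature map and require a different `nn.Linear` input size.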

In [None]:
def train_step(real_images, generator, discriminator, criterion,
               optimizer_G, optimizer_D, latent_dim):
    batch_size = real_images.size(0)

    # 1. Train Discriminator
    optimizer_D.zero_grad()

    # 1a. Train on real batch
    # Hint: Create labels for real images (all ones)
    # Hint: Forward real images through D
    # Hint: Calculate loss for real images

    # 1b. Train on fake batch
    # Hint: Generate noise z
    # Hint: Generate fake images using G(z)
    # Hint: Create labels for fake images (all zeros)
    # Hint: Forward fake images through D (call .detach() on them so this step does not backpropagate into G)
    # Hint: Calculate loss for fake images

    # 1c. Add both losses and update D
    # Hint: d_loss = d_loss_real + d_loss_fake
    # Hint: backward and step

    # 2. Train Generator
    optimizer_G.zero_grad()

    # 2a. Since we updated D, perform another forward pass of fake images
    # Hint: Use the SAME noise vector z
    # Hint: But we want D to think these are real! So use real_labels

    # 2b. Calculate G's loss based on D's output
    # Hint: Calculate loss using criterion
    # Hint: backward and step

    return d_loss.item(), g_loss.item()
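
One possible way to fill in the hints is sketched below. It is a minimal version, not a definitive implementation: it assumes `criterion` is `nn.BCELoss()` and that the discriminator returns a `(batch_size, 1)` tensor of probabilities. The tiny stand-in networks, the Adam settings, and the random "real" batch in the smoke test are hypothetical and only there to make the block runnable on its own.

```python
import torch
import torch.nn as nn
import torch.optim as optim


def train_step(real_images, generator, discriminator, criterion,
               optimizer_G, optimizer_D, latent_dim):
    batch_size = real_images.size(0)
    device = real_images.device
    real_labels = torch.ones(batch_size, 1, device=device)
    fake_labels = torch.zeros(batch_size, 1, device=device)

    # 1. Train Discriminator
    optimizer_D.zero_grad()

    # 1a. Real batch: D should output 1
    d_loss_real = criterion(discriminator(real_images), real_labels)

    # 1b. Fake batch: D should output 0. detach() keeps this loss from
    # backpropagating into G and leaves G's graph intact for step 2.
    z = torch.randn(batch_size, latent_dim, device=device)
    fake_images = generator(z)
    d_loss_fake = criterion(discriminator(fake_images.detach()), fake_labels)

    # 1c. Combine both losses and update D
    d_loss = d_loss_real + d_loss_fake
    d_loss.backward()
    optimizer_D.step()

    # 2. Train Generator: same fake images, scored by the updated D,
    # with real labels so that G is rewarded when D is fooled.
    optimizer_G.zero_grad()
    g_loss = criterion(discriminator(fake_images), real_labels)
    g_loss.backward()
    optimizer_G.step()

    return d_loss.item(), g_loss.item()


# Smoke test with tiny stand-in networks (hypothetical, not the models above).
torch.manual_seed(0)
latent_dim = 4
tiny_G = nn.Sequential(nn.Linear(latent_dim, 8), nn.Tanh())
tiny_D = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())
criterion = nn.BCELoss()
opt_G = optim.Adam(tiny_G.parameters(), lr=2e-4, betas=(0.5, 0.999))  # assumed settings
opt_D = optim.Adam(tiny_D.parameters(), lr=2e-4, betas=(0.5, 0.999))  # assumed settings

real_batch = torch.rand(16, 8) * 2 - 1  # placeholder "real" data in [-1, 1]
d_loss, g_loss = train_step(real_batch, tiny_G, tiny_D, criterion, opt_G, opt_D, latent_dim)
print(f"d_loss={d_loss:.4f}, g_loss={g_loss:.4f}")
```

Training G against real labels means it maximizes log D(G(z)) rather than minimizing log(1-D(G(z))). The paper suggests this variant because it gives stronger gradients early in training, when D can easily reject the generated samples.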