# 4. Image Generation with GANs

Generate completely new images with Generative Adversarial Networks (GANs). Learn to build and train a Deep Convolutional GAN, and how to evaluate the quality and variety of its outputs.


In [1]:
import torch.nn as nn
import torch

## Introduction to GANs


### Generator

A GAN generator takes a random noise vector as input and produces a generated image. To make its architecture more reusable, you will pass both input and output shapes as parameters to the model. This way, you can use the same model with different sizes of input noise and images of varying shapes.

You can also access a custom `gen_block()` function which returns a block consisting of a linear layer, batch norm, and a ReLU activation. You will use it as a building block for the generator.

```
def gen_block(in_dim, out_dim):
    return nn.Sequential(
        nn.Linear(in_dim, out_dim),
        nn.BatchNorm1d(out_dim),
        nn.ReLU(inplace=True)
    )
```


In [2]:
def gen_block(in_dim, out_dim):
    return nn.Sequential(
        nn.Linear(in_dim, out_dim), nn.BatchNorm1d(out_dim), nn.ReLU(inplace=True)
    )

Instructions:

- Define `self.generator` as a sequential model.
- After the last `gen_block`, add a linear layer with the appropriate input size and the output size of `out_dim`.
- Add a sigmoid activation after the linear layer.
- In the `forward()` method, pass the model's input through `self.generator`.


In [3]:
class Generator(nn.Module):
    def __init__(self, in_dim, out_dim):
        super(Generator, self).__init__()
        # Define generator block
        self.generator = nn.Sequential(
            gen_block(in_dim, 256),
            gen_block(256, 512),
            gen_block(512, 1024),
            # Add linear layer
            nn.Linear(1024, out_dim),
            # Add activation
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Pass input through generator
        return self.generator(x)
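A quick shape check can confirm that the generator maps a batch of noise vectors to a batch of flattened images. This minimal sketch repeats the `gen_block()` and `Generator` definitions so it runs on its own. The noise size of `16` and the output size of `784` (a flattened 28×28 grayscale image) are illustrative assumptions, not values taken from the exercise.

```python
import torch
import torch.nn as nn

def gen_block(in_dim, out_dim):
    return nn.Sequential(
        nn.Linear(in_dim, out_dim), nn.BatchNorm1d(out_dim), nn.ReLU(inplace=True)
    )

class Generator(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.generator = nn.Sequential(
            gen_block(in_dim, 256),
            gen_block(256, 512),
            gen_block(512, 1024),
            nn.Linear(1024, out_dim),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.generator(x)

z_dim, im_dim = 16, 784  # assumed sizes: 16-dim noise, flattened 28x28 image
gen = Generator(z_dim, im_dim)

# BatchNorm1d needs more than one sample per batch in training mode
noise = torch.randn(8, z_dim)
fake = gen(noise)
print(fake.shape)  # torch.Size([8, 784])

# Sigmoid keeps every pixel in [0, 1]; reshape to view the outputs as images
images = fake.view(-1, 1, 28, 28)
```

Because `BatchNorm1d` computes statistics over the batch, a batch of a single sample raises an error in training mode; calling `gen.eval()` first makes it use its running statistics instead.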

### Discriminator

With the generator defined, the next step in building a GAN is to construct the discriminator. It takes an image as input (either a real one or one produced by the generator) and outputs a single score indicating whether it judges the input to be real or generated.

You can also access a custom `disc_block()` function which returns a block of a linear layer followed by a LeakyReLU activation. You will use it as a building block for the discriminator.

```
def disc_block(in_dim, out_dim):
    return nn.Sequential(
        nn.Linear(in_dim, out_dim),
        nn.LeakyReLU(0.2)
    )
```


In [4]:
def disc_block(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, out_dim), nn.LeakyReLU(0.2))

Instructions:

- Add the last discriminator block to the model, with the appropriate input size and an output size of `256`.
- After the last discriminator block, add a linear layer that maps the output to a size of `1`.
- Define the `forward()` method to pass the input image through the sequential block defined in `__init__()`.


In [5]:
class Discriminator(nn.Module):
    def __init__(self, im_dim):
        super(Discriminator, self).__init__()
        self.disc = nn.Sequential(
            disc_block(im_dim, 1024),
            disc_block(1024, 512),
            # Define last discriminator block
            disc_block(512, 256),
            # Add a linear layer
            nn.Linear(256, 1),
        )

    def forward(self, x):
        # Define the forward method
        return self.disc(x)
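The discriminator ends in a plain `nn.Linear` layer, so its output is an unbounded score (a logit) rather than a probability. The sketch below repeats `disc_block()` and `Discriminator` so it runs standalone, assumes flattened 784-pixel images, and uses a random stand-in batch. It shows the output shape and how `torch.sigmoid` turns the logits into the probability that each input is real. Keeping the sigmoid out of the model pairs naturally with `nn.BCEWithLogitsLoss`, which applies it internally in a more numerically stable way.

```python
import torch
import torch.nn as nn

def disc_block(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, out_dim), nn.LeakyReLU(0.2))

class Discriminator(nn.Module):
    def __init__(self, im_dim):
        super().__init__()
        self.disc = nn.Sequential(
            disc_block(im_dim, 1024),
            disc_block(1024, 512),
            disc_block(512, 256),
            nn.Linear(256, 1),
        )

    def forward(self, x):
        return self.disc(x)

im_dim = 784  # assumed: flattened 28x28 grayscale images
disc = Discriminator(im_dim)

images = torch.rand(8, im_dim)  # stand-in batch, real or generated
logits = disc(images)           # shape (8, 1), unbounded scores
probs = torch.sigmoid(logits)   # probability that each image is real
print(logits.shape)  # torch.Size([8, 1])
```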

## Deep Convolutional GAN


### Convolutional Generator

Define a convolutional generator following the DCGAN guidelines discussed in the last video.

A custom function `dc_gen_block()` is available, which returns a block of a transposed convolution, batch norm, and ReLU activation. It serves as the building block for the convolutional generator. You can get familiar with `dc_gen_block()`'s definition below.


In [6]:
def dc_gen_block(in_dim, out_dim, kernel_size, stride):
    return nn.Sequential(
        nn.ConvTranspose2d(in_dim, out_dim, kernel_size, stride=stride),
        nn.BatchNorm2d(out_dim),
        nn.ReLU(),
    )

Instructions:

- Add the last generator block, mapping the number of feature maps to `256`.
- Add a transposed convolution with `3` output channels (one per RGB color channel).
- Add the tanh activation.


In [7]:
class DCGenerator(nn.Module):
    def __init__(self, in_dim, kernel_size=4, stride=2):
        super(DCGenerator, self).__init__()
        self.in_dim = in_dim
        self.gen = nn.Sequential(
            dc_gen_block(in_dim, 1024, kernel_size, stride),
            dc_gen_block(1024, 512, kernel_size, stride),
            # Add last generator block
            dc_gen_block(512, 256, kernel_size, stride),
            # Add transposed convolution
            nn.ConvTranspose2d(256, 3, kernel_size, stride),
            # Add tanh activation
            nn.Tanh(),
        )

    def forward(self, x):
        x = x.view(len(x), self.in_dim, 1, 1)
        return self.gen(x)
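With `padding=0`, each `nn.ConvTranspose2d` grows the spatial size from $H_{in}$ to $(H_{in} - 1) \times s + k$ for stride $s$ and kernel size $k$. Starting from the 1×1 noise "image" built in `forward()`, the default `kernel_size=4` and `stride=2` give 1 → 4 → 10 → 22 → 46, so the generator produces 3×46×46 images. The sketch below repeats the definitions above so it runs on its own, assumes a noise size of 16, and checks that arithmetic against the actual model.

```python
import torch
import torch.nn as nn

def dc_gen_block(in_dim, out_dim, kernel_size, stride):
    return nn.Sequential(
        nn.ConvTranspose2d(in_dim, out_dim, kernel_size, stride=stride),
        nn.BatchNorm2d(out_dim),
        nn.ReLU(),
    )

class DCGenerator(nn.Module):
    def __init__(self, in_dim, kernel_size=4, stride=2):
        super().__init__()
        self.in_dim = in_dim
        self.gen = nn.Sequential(
            dc_gen_block(in_dim, 1024, kernel_size, stride),
            dc_gen_block(1024, 512, kernel_size, stride),
            dc_gen_block(512, 256, kernel_size, stride),
            nn.ConvTranspose2d(256, 3, kernel_size, stride),
            nn.Tanh(),
        )

    def forward(self, x):
        x = x.view(len(x), self.in_dim, 1, 1)
        return self.gen(x)

def conv_transpose_size(size, kernel_size=4, stride=2):
    # nn.ConvTranspose2d output size with padding=0, dilation=1, output_padding=0
    return (size - 1) * stride + kernel_size

sizes = [1]
for _ in range(4):  # three dc_gen_blocks plus the final ConvTranspose2d
    sizes.append(conv_transpose_size(sizes[-1]))
print(sizes)  # [1, 4, 10, 22, 46]

gen = DCGenerator(in_dim=16).eval()  # eval(): BatchNorm uses its running statistics
with torch.no_grad():
    fake = gen(torch.randn(2, 16))
print(fake.shape)  # torch.Size([2, 3, 46, 46])
```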

### Convolutional Discriminator

With the DCGAN's generator ready, the last step before you can proceed to training it is to define the convolutional discriminator.

To build the convolutional discriminator, you will use a custom `dc_disc_block()` function which returns a block of a convolution, batch norm, and a LeakyReLU activation. You can inspect `dc_disc_block()`'s definition below.

In [8]:
def dc_disc_block(in_dim, out_dim, kernel_size, stride):
    return nn.Sequential(
        nn.Conv2d(in_dim, out_dim, kernel_size, stride=stride),
        nn.BatchNorm2d(out_dim),
        nn.LeakyReLU(0.2),
    )

Instructions:

- Add the first discriminator block using the custom `dc_disc_block()` function with 3 input feature maps and 512 output feature maps.
- Add a convolutional layer with a single output channel (`1`).
- In the `forward()` method, pass the input through the sequential block you defined in `__init__()`.

In [9]:
class DCDiscriminator(nn.Module):
    def __init__(self, kernel_size=4, stride=2):
        super(DCDiscriminator, self).__init__()
        self.disc = nn.Sequential(
            # Add first discriminator block
            dc_disc_block(3, 512, kernel_size, stride),
            dc_disc_block(512, 1024, kernel_size, stride),
            # Add a convolution
            nn.Conv2d(1024, 1, kernel_size, stride=stride),
        )

    def forward(self, x):
        # Pass input through sequential block
        x = self.disc(x)
        return x.view(len(x), -1)
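With `padding=0`, each `nn.Conv2d` shrinks the spatial size from $H_{in}$ to $\lfloor (H_{in} - k)/s \rfloor + 1$. Fed the 3×46×46 images that `DCGenerator` produces, the three convolutions give 46 → 22 → 10 → 4, so the final `view()` returns 16 scores per image (a 4×4 grid, each score computed from an overlapping 22×22 patch of the input) rather than a single one. With its default `reduction="mean"`, `nn.BCEWithLogitsLoss` averages over all of them. The sketch below repeats the definitions so it runs on its own and checks the shape on a random stand-in batch.

```python
import torch
import torch.nn as nn

def dc_disc_block(in_dim, out_dim, kernel_size, stride):
    return nn.Sequential(
        nn.Conv2d(in_dim, out_dim, kernel_size, stride=stride),
        nn.BatchNorm2d(out_dim),
        nn.LeakyReLU(0.2),
    )

class DCDiscriminator(nn.Module):
    def __init__(self, kernel_size=4, stride=2):
        super().__init__()
        self.disc = nn.Sequential(
            dc_disc_block(3, 512, kernel_size, stride),
            dc_disc_block(512, 1024, kernel_size, stride),
            nn.Conv2d(1024, 1, kernel_size, stride=stride),
        )

    def forward(self, x):
        x = self.disc(x)
        return x.view(len(x), -1)

def conv_size(size, kernel_size=4, stride=2):
    # nn.Conv2d output size with padding=0 and dilation=1
    return (size - kernel_size) // stride + 1

sizes = [46]
for _ in range(3):  # two dc_disc_blocks plus the final Conv2d
    sizes.append(conv_size(sizes[-1]))
print(sizes)  # [46, 22, 10, 4]

disc = DCDiscriminator().eval()
with torch.no_grad():
    scores = disc(torch.randn(2, 3, 46, 46))
print(scores.shape)  # torch.Size([2, 16])
```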

## Training GANs

### Generator loss

Before you can train your GAN, you need to define loss functions for both the generator and the discriminator. You will start with the former.

Recall that the generator's job is to produce fake images that fool the discriminator into classifying them as real. Therefore, the generator incurs a loss when the discriminator classifies its images as fake (label `0`).

Define the `gen_loss()` function that calculates the generator loss. It takes four arguments:

- `gen`, the generator model
- `disc`, the discriminator model
- `num_images`, the number of images in the batch
- `z_dim`, the size of the input random noise

Instructions:

- Generate random noise of shape `num_images` by `z_dim` and assign it to `noise`.
- Use the generator to generate fake images from `noise` and assign them to `fake`.
- Get the discriminator's predictions for the generated fake images.
- Compute the generator loss by calling `criterion` on the discriminator's predictions and a tensor of ones of the same shape.

In [10]:
def gen_loss(gen, disc, num_images, z_dim):
    criterion = nn.BCEWithLogitsLoss()
    # Define random noise (standard normal)
    noise = torch.randn(num_images, z_dim)
    # Generate fake images
    fake = gen(noise)
    # Get discriminator's predictions (logits) on the fake images
    disc_pred = disc(fake)
    # Compute generator loss: logits first, target second; the generator wants fakes labeled real (1)
    gen_loss = criterion(disc_pred, torch.ones_like(disc_pred))
    return gen_loss
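`nn.BCEWithLogitsLoss` takes the logits as its first argument and the target labels as its second. With the target set to ones, it reduces to the mean of $-\log \sigma(\ell)$ over the discriminator's logits $\ell$, so the generator's loss falls as the discriminator grows more confident that the fakes are real. The small check below uses made-up logits, not outputs of the models above, to confirm that equivalence numerically.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.BCEWithLogitsLoss()

# Hypothetical discriminator logits for four generated images
disc_pred = torch.tensor([[-2.0], [0.0], [1.5], [4.0]])

loss = criterion(disc_pred, torch.ones_like(disc_pred))

# The same quantity written out: mean of -log(sigmoid(logit)), i.e. mean of softplus(-logit)
manual = (-torch.log(torch.sigmoid(disc_pred))).mean()
stable = F.softplus(-disc_pred).mean()

print(loss.item(), manual.item(), stable.item())
```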

### Discriminator loss

It's time to define the loss for the discriminator. Recall that the discriminator's job is to classify images as either real or fake. Therefore, the discriminator incurs a loss if it classifies the generator's outputs as real (label `1`) or the real images as fake (label `0`).

Define the `disc_loss()` function that calculates the discriminator loss. It takes five arguments:

- `gen`, the generator model
- `disc`, the discriminator model
- `real`, a sample of real images from the training data
- `num_images`, the number of images in the batch
- `z_dim`, the size of the input random noise

Instructions:

- Use the discriminator to classify the `fake` images and assign the predictions to `disc_pred_fake`.
- Compute the fake loss component by calling `criterion` on the discriminator's predictions for fake images and a tensor of zeros of the same shape.
- Use the discriminator to classify the `real` images and assign the predictions to `disc_pred_real`.
- Compute the real loss component by calling `criterion` on the discriminator's predictions for real images and a tensor of ones of the same shape.

In [11]:
def disc_loss(gen, disc, real, num_images, z_dim):
    criterion = nn.BCEWithLogitsLoss()
    noise = torch.randn(num_images, z_dim)
    fake = gen(noise)
    # Get discriminator's predictions for fake images
    # (detach so this loss does not backpropagate into the generator)
    disc_pred_fake = disc(fake.detach())
    # Calculate the fake loss component: fakes should be labeled 0
    fake_loss = criterion(disc_pred_fake, torch.zeros_like(disc_pred_fake))
    # Get discriminator's predictions for real images
    disc_pred_real = disc(real)
    # Calculate the real loss component: real images should be labeled 1
    real_loss = criterion(disc_pred_real, torch.ones_like(disc_pred_real))
    disc_loss = (real_loss + fake_loss) / 2
    return disc_loss
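A useful reference point for the discriminator loss: if the discriminator outputs a logit of 0 for every image (probability 0.5, i.e. it cannot tell real from fake), both components equal $\ln 2 \approx 0.693$, and so does their average. The sketch below compares that case with a discriminator that separates the two groups well. It uses hand-picked logits rather than real model outputs, and `disc_loss_from_logits` is a hypothetical helper that mirrors the combination used in `disc_loss()`.

```python
import math
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()

def disc_loss_from_logits(pred_real, pred_fake):
    # Same combination as disc_loss(): real -> label 1, fake -> label 0, averaged
    real_loss = criterion(pred_real, torch.ones_like(pred_real))
    fake_loss = criterion(pred_fake, torch.zeros_like(pred_fake))
    return (real_loss + fake_loss) / 2

# Hypothetical logits for a batch of 4 real and 4 fake images
undecided = disc_loss_from_logits(torch.zeros(4, 1), torch.zeros(4, 1))
confident = disc_loss_from_logits(torch.full((4, 1), 5.0), torch.full((4, 1), -5.0))

print(undecided.item(), math.log(2))  # both about 0.6931
print(confident.item())               # close to 0
```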

### Training loop

Finally, all the hard work you put into defining the model architectures and loss functions comes to fruition: it's training time! Your job is to implement and execute the GAN training loop. Note: a `break` statement is placed after the first batch of data to avoid a long runtime.

The two optimizers, `disc_opt` and `gen_opt`, have been initialized as `Adam()` optimizers. The functions to compute the losses that you defined earlier, `gen_loss()` and `disc_loss()`, are available to you. A `dataloader` is also prepared for you.

Recall that:

- `disc_loss()`'s arguments are: `gen`, `disc`, `real`, `cur_batch_size`, `z_dim`.
- `gen_loss()`'s arguments are: `gen`, `disc`, `cur_batch_size`, `z_dim`.

Instructions:

- Calculate the discriminator loss using `disc_loss()` by passing it the generator, the discriminator, the sample of real images, the current batch size, and the noise size of `16`, in this order, and assign the result to `d_loss`. (A different name is used so the `disc_loss()` function is not overwritten.)
- Calculate gradients by calling `backward()` on `d_loss`.
- Calculate the generator loss using `gen_loss()` by passing it the generator, the discriminator, the current batch size, and the noise size of `16`, in this order, and assign the result to `g_loss`.
- Calculate gradients by calling `backward()` on `g_loss`.

In [None]:
for epoch in range(1):
    for real in dataloader:
        cur_batch_size = len(real)

        disc_opt.zero_grad()
        # Calculate discriminator loss
        d_loss = disc_loss(gen, disc, real, cur_batch_size, z_dim=16)
        # Compute discriminator gradients
        d_loss.backward()
        disc_opt.step()

        gen_opt.zero_grad()
        # Calculate generator loss
        g_loss = gen_loss(gen, disc, cur_batch_size, z_dim=16)
        # Compute generator gradients
        g_loss.backward()
        gen_opt.step()

        print(f"Generator loss: {g_loss.item():.4f}")
        print(f"Discriminator loss: {d_loss.item():.4f}")
        break
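For reference, here is a self-contained sketch of the same alternating update on toy data, so that it runs outside the exercise environment. Everything specific here is an assumption: the tiny fully-connected models, the 2-D Gaussian standing in for real data, the batch size, the number of steps, and the Adam settings `lr=2e-4, betas=(0.5, 0.999)` (a common choice for DCGAN-style training, not taken from the exercise). It makes no claim about how well this toy setup converges.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
z_dim, data_dim, batch_size = 4, 2, 64

# Toy models: small MLPs standing in for the GAN's generator and discriminator
gen = nn.Sequential(nn.Linear(z_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
disc = nn.Sequential(nn.Linear(data_dim, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1))

gen_opt = torch.optim.Adam(gen.parameters(), lr=2e-4, betas=(0.5, 0.999))
disc_opt = torch.optim.Adam(disc.parameters(), lr=2e-4, betas=(0.5, 0.999))
criterion = nn.BCEWithLogitsLoss()

def sample_real(n):
    # Hypothetical "real" data: 2-D Gaussian centred at (2, 2)
    return torch.randn(n, data_dim) * 0.5 + 2.0

history = []
for step in range(200):
    real = sample_real(batch_size)
    fake = gen(torch.randn(batch_size, z_dim))

    # Discriminator step: real -> 1, fake -> 0; detach so the generator is not updated here
    disc_opt.zero_grad()
    pred_real, pred_fake = disc(real), disc(fake.detach())
    d_loss = (criterion(pred_real, torch.ones_like(pred_real))
              + criterion(pred_fake, torch.zeros_like(pred_fake))) / 2
    d_loss.backward()
    disc_opt.step()

    # Generator step: push the discriminator towards labeling the fakes as real (1)
    gen_opt.zero_grad()
    pred_fake = disc(fake)
    g_loss = criterion(pred_fake, torch.ones_like(pred_fake))
    g_loss.backward()
    gen_opt.step()

    history.append((d_loss.item(), g_loss.item()))

print(f"Final discriminator loss: {history[-1][0]:.3f}")
print(f"Final generator loss: {history[-1][1]:.3f}")
```

Storing the per-step values under `d_loss` and `g_loss` (and only their `.item()` values in `history`) keeps the loss tensors from overwriting any loss functions and avoids holding on to autograd graphs across steps.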

## Evaluating GANs

### Generating images

Now that you have designed and trained your GAN, it's time to evaluate the quality of the images it can generate. As a start, you will inspect them visually to see whether the generated images resemble Pokémon at all. To do this, you will create random noise as input for the generator, pass it to the model, and plot the outputs.

The Deep Convolutional Generator with trained weights is available to you as `gen`.

Instructions:

- Create a random noise tensor of shape `num_images_to_generate` by `16`, the input noise size you used to train the generator, and assign it to `noise`.
- Generate images by passing the noise to the generator and assign them to `fake`.
- Inside the for loop, slice `fake` to extract the i-th image and assign it to `image_tensor`.
- Permute `image_tensor`'s dimensions from (color, height, width) to (height, width, color) and assign the output to `image_tensor_permuted`.

In [None]:
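import matplotlib.pyplot as plt

num_images_to_generate = 5
# Create random noise tensor (standard normal, as in disc_loss())
noise = torch.randn(num_images_to_generate, 16)

# Generate images; eval() makes BatchNorm use its running statistics
gen.eval()
with torch.no_grad():
    fake = gen(noise)
print(f"Generated tensor shape: {fake.shape}")

for i in range(num_images_to_generate):
    # Slice fake to select i-th image
    image_tensor = fake[i, :, :, :]
    # Permute the image dimensions
    image_tensor_permuted = image_tensor.permute(1, 2, 0)
    # Tanh outputs lie in [-1, 1]; rescale to [0, 1] for imshow
    plt.imshow((image_tensor_permuted + 1) / 2)
    plt.show()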
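Because `DCGenerator` ends in `nn.Tanh()`, its raw outputs lie in [-1, 1], while `plt.imshow()` expects float images in [0, 1] and clips anything outside that range. A small helper can do the rescaling and the channel permutation in one place. `to_displayable` is a hypothetical name, not part of the exercise, and it is tested here on a random stand-in tensor with the generator's output shape.

```python
import torch

def to_displayable(image_tensor):
    """Convert one (C, H, W) image in [-1, 1] to an (H, W, C) float tensor in [0, 1]."""
    image = (image_tensor.detach().cpu() + 1) / 2  # [-1, 1] -> [0, 1]
    return image.clamp(0, 1).permute(1, 2, 0)     # channels last for imshow

# Stand-in for generator output: 5 RGB images of size 46x46 in [-1, 1]
fake = torch.rand(5, 3, 46, 46) * 2 - 1

first = to_displayable(fake[0])
print(first.shape)  # torch.Size([46, 46, 3])
```

With it, the plotting loop body reduces to `plt.imshow(to_displayable(fake[i]))`.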

### Fréchet Inception Distance

The visual inspection of generated images is a great start. But even if the images look plausible, a quantitative evaluation gives a more precise picture of the generator's performance. You will evaluate your GAN using the Fréchet Inception Distance (FID), which compares statistics of Inception-v3 features computed on real and generated images; lower scores are better.

Two tensors with fake and real images, 32 examples each, are available to you as `fake` and `real`, respectively. Use them to compute the FID!

Instructions:

- Import `FrechetInceptionDistance` from the appropriate `torchmetrics` module.
- Instantiate the FID metric using the 64-dimensional Inception feature layer (`feature=64`) and assign it to `fid`.
- Update `fid` with the real image tensor, multiplied by `255` and cast to `torch.uint8`.
- Compute the `fid` metric, assigning the output to `fid_score`.

In [None]:
# Import FrechetInceptionDistance
from torchmetrics.image.fid import FrechetInceptionDistance

# Instantiate FID
fid = FrechetInceptionDistance(feature=64)

# Update FID with fake and real images as uint8 tensors in [0, 255]
fid.update((fake * 255).to(torch.uint8), real=False)
fid.update((real * 255).to(torch.uint8), real=True)

# Compute the metric
fid_score = fid.compute()
print(fid_score)
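FID fits a Gaussian to the Inception features of each image set and measures the Fréchet distance between the two Gaussians: $\text{FID} = \lVert \mu_r - \mu_f \rVert_2^2 + \operatorname{Tr}\big(\Sigma_r + \Sigma_f - 2(\Sigma_r \Sigma_f)^{1/2}\big)$, where $\mu$ and $\Sigma$ are the feature means and covariances of the real ($r$) and fake ($f$) images. Identical statistics give 0. The sketch below implements only this distance, in plain PyTorch, on arbitrary feature matrices; it skips the Inception feature extraction that `torchmetrics` performs, and `frechet_distance` is a hypothetical helper name. It computes the trace of the matrix square root from the eigenvalues of $\Sigma_r^{1/2} \Sigma_f \Sigma_r^{1/2}$, which has the same eigenvalues as $\Sigma_r \Sigma_f$.

```python
import torch

def frechet_distance(feats_real, feats_fake):
    """Fréchet distance between Gaussians fitted to two (N, D) feature matrices."""
    feats_real, feats_fake = feats_real.double(), feats_fake.double()
    mu_r, mu_f = feats_real.mean(dim=0), feats_fake.mean(dim=0)
    # torch.cov expects variables in rows and observations in columns
    cov_r, cov_f = torch.cov(feats_real.T), torch.cov(feats_fake.T)

    # Symmetric square root of cov_r via its eigendecomposition
    evals, evecs = torch.linalg.eigh(cov_r)
    sqrt_cov_r = evecs @ torch.diag(evals.clamp(min=0).sqrt()) @ evecs.T

    # Tr((cov_r cov_f)^{1/2}) = sum of sqrt of eigenvalues of sqrt_cov_r @ cov_f @ sqrt_cov_r
    middle = sqrt_cov_r @ cov_f @ sqrt_cov_r
    tr_sqrt = torch.linalg.eigvalsh(middle).clamp(min=0).sqrt().sum()

    mean_term = (mu_r - mu_f).pow(2).sum()
    return (mean_term + torch.trace(cov_r) + torch.trace(cov_f) - 2 * tr_sqrt).item()

torch.manual_seed(0)
real_feats = torch.randn(2000, 8)
shifted_feats = real_feats + 3.0  # same covariance, mean shifted by 3 in each of 8 dims

print(frechet_distance(real_feats, real_feats))     # ~0
print(frechet_distance(real_feats, shifted_feats))  # ~72 = 8 * 3**2
```

FID estimates are known to be biased for small sample sizes, so a score computed from 32 images per set, as in the exercise, is best read as a rough indication and compared only against scores computed with the same number of images.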