# Deep Convolutional Generative Adversarial Networks (DCGAN) using PyTorch

Author: [Markus Enzweiler](https://markus-enzweiler-de), markus.enzweiler@hs-esslingen.de

This is a demo used in a Computer Vision & Machine Learning lecture. Feel free to use and contribute.

We analyze convolutional GANs on datasets such as MNIST and CelebA. We use the Python code and pretrained models from https://github.com/menzHSE/torch-gan. 

## Training

This notebook does not show how to train GANs. Please refer to https://github.com/menzHSE/torch-gan for that. Here, we use pretrained models from that repository.

GAN training can be understood as a two-player min-max game between two neural networks: the generator and the discriminator. The generator's goal is to produce data that mimics real data and thereby deceives the discriminator. It improves using the feedback (gradients) it receives through the discriminator. The discriminator evaluates both real and generated data and learns to distinguish between the two.
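
One common way to make this min-max game precise is the value function of the original GAN formulation (Goodfellow et al., 2014). Here $D(x)$ is the discriminator's estimated probability that $x$ is a real image, and $G(z)$ maps a latent vector $z \sim p_z$ to an image:

$$
\min_G \max_D \; V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
$$

The discriminator maximizes $V$ by assigning high probabilities to real images and low probabilities to generated ones; the generator minimizes it. In practice, the generator is usually trained to maximize $\log D(G(z))$ instead (the "non-saturating" loss), which gives stronger gradients early in training, when the discriminator easily rejects generated samples. This is the standard formulation; the exact loss used in torch-gan is not shown in this notebook.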

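The core of GAN training is this competitive process: the generator aims to create increasingly realistic data, while the discriminator tries to identify fakes ever more accurately, which drives both networks to improve over time. However, the two networks must stay roughly balanced. If the discriminator becomes too strong, it rejects all generated samples with high confidence and the generator receives little useful gradient. If the generator gets ahead, it can exploit weaknesses of the discriminator, for example by collapsing to a few kinds of samples that currently fool it (mode collapse). The main challenge in GAN training is keeping this balance so that neither network dominates.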
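
Although the actual training happens in torch-gan, the alternating updates described above can be illustrated with a minimal sketch. It assumes tiny fully connected networks on hypothetical 2-D toy data instead of the convolutional DCGAN networks, and uses the non-saturating generator loss; it is not the training code of torch-gan. `BCEWithLogitsLoss` combines the sigmoid and the binary cross-entropy in one numerically stable operation, and the Adam settings are the ones recommended in the DCGAN paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy networks on 2-D data standing in for the convolutional G and D
latent_dims = 8
G = nn.Sequential(nn.Linear(latent_dims, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1))  # D outputs logits

# Adam settings recommended in the DCGAN paper (lr = 0.0002, beta1 = 0.5)
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy in one stable op


def train_step(real):
    n = real.size(0)
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)

    # 1) Discriminator step: push D(real) towards 1 and D(G(z)) towards 0
    fake = G(torch.randn(n, latent_dims)).detach()  # no gradient flows into G here
    loss_D = bce(D(real), ones) + bce(D(fake), zeros)
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # 2) Generator step (non-saturating loss): push D(G(z)) towards 1,
    #    i.e. the generator is rewarded for fooling the discriminator
    loss_G = bce(D(G(torch.randn(n, latent_dims))), ones)
    opt_G.zero_grad()
    loss_G.backward()  # also fills D's gradients; they are cleared before D's next step
    opt_G.step()
    return loss_D.item(), loss_G.item()


# Hypothetical "real" data: a 2-D Gaussian blob centered at (2, 2)
for step in range(200):
    real_batch = torch.randn(64, 2) * 0.5 + 2.0
    loss_D, loss_G = train_step(real_batch)
print(f"after 200 steps: loss_D = {loss_D:.3f}, loss_G = {loss_G:.3f}")
```

Detaching the fake batch in the discriminator step keeps the two updates separate. The generator step then backpropagates through the discriminator, which only its own optimizer updates; this is how the generator learns from the discriminator's feedback.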

## Visualization of the Generator's Progress during Training

### MNIST

![MNIST Progress](https://github.com/menzHSE/cv-ml-lecture-notebooks/blob/main/assets/mnist.gif?raw=true)


### CIFAR-10

![CIFAR-10 Progress](https://github.com/menzHSE/cv-ml-lecture-notebooks/blob/main/assets/cifar-10.gif?raw=true)


### CelebA

![CelebA Progress](https://github.com/menzHSE/cv-ml-lecture-notebooks/blob/main/assets/celeb-a.gif?raw=true)


## GAN Implementation Overview

The architecture of the implemented DCGANs in the repository https://github.com/menzHSE/torch-gan follows the influential paper ["Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks"](https://arxiv.org/abs/1511.06434):


<img src="https://github.com/menzHSE/cv-ml-lecture-notebooks/blob/main/assets/generator_architecture.jpg?raw=true" width="600px" />
<br>

<sup>(Figure taken from https://arxiv.org/abs/1511.06434)</sup>
<br>
<br>
In particular, the implementation follows the best practices and parameter settings recommended in the paper:


- **Fully Convolutional Architecture:** Both the generator and the discriminator are built from convolutional layers throughout.
- **Architecture Guidelines for Stability:** The paper's guidelines for stable training of deep convolutional GANs are incorporated:
  - Replacing pooling layers with strided convolutions in the discriminator and fractionally-strided (transposed) convolutions in the generator.
  - Using batch normalization in both the generator and the discriminator to improve training dynamics (the paper omits it at the generator's output layer and the discriminator's input layer).
  - Removing fully connected hidden layers for deeper architectures.
  - Using ReLU activation in the generator for all layers except the output, which uses Tanh.
  - Using LeakyReLU activation in the discriminator for all layers, which improves gradient flow.
  - Weight initialization and Adam optimizer settings as given in the DCGAN paper (weights drawn from a zero-centered normal distribution with standard deviation 0.02; Adam with learning rate 0.0002 and $\beta_1 = 0.5$).
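
The generator in the figure doubles the spatial resolution with each of its four fractionally-strided convolutions, from 4×4 to 64×64. This can be checked with the output-size formula from the PyTorch `ConvTranspose2d` documentation. The small sketch below (the helper `conv_transpose_out` is hypothetical) assumes kernel size 4, stride 2 and padding 1, the values used in the PyTorch DCGAN tutorial; the paper's figure shows 5×5 kernels, and the exact settings in torch-gan are not stated here.

```python
def conv_transpose_out(size, kernel=4, stride=2, padding=1, output_padding=0, dilation=1):
    # Output size formula from the PyTorch ConvTranspose2d documentation
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


# Spatial size after z has been projected and reshaped to a 4x4 feature map
sizes = [4]
# Four fractionally-strided (transposed) convolutions, as in the paper's figure
for _ in range(4):
    sizes.append(conv_transpose_out(sizes[-1]))
print(sizes)  # [4, 8, 16, 32, 64]

# With 5x5 kernels, doubling needs padding 2 and output_padding 1
print(conv_transpose_out(4, kernel=5, padding=2, output_padding=1))  # 8
```

With kernel 4, stride 2 and padding 1, the formula simplifies to `2 * size`, which is why this combination is a common choice for upsampling by exactly a factor of two.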

For another guide on creating a DCGAN with PyTorch, including additional insights and practical tips, also check out the [PyTorch DCGAN tutorial](https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html).
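
To make the guidelines concrete, here is a minimal PyTorch sketch of a DCGAN-style generator for 64×64 images, written in the spirit of that tutorial. It is a hypothetical illustration, not the `model.Generator` class from torch-gan: the names `SketchGenerator`, `up_block` and `dcgan_weights_init` are made up here, the kernel/stride/padding values are assumptions, and the channel widths simply halve from `max_num_filters = 512` (the paper's figure starts from 1024 feature maps). Initializing the BatchNorm scale with mean 1 follows the PyTorch tutorial.

```python
import torch
import torch.nn as nn


def up_block(in_ch, out_ch, stride=2, padding=1):
    # Transposed convolution (no pooling/upsampling layers) + BatchNorm + ReLU
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=stride, padding=padding, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class SketchGenerator(nn.Module):
    """Hypothetical 64x64 DCGAN-style generator (not the torch-gan implementation)."""

    def __init__(self, num_latent_dims=100, num_img_channels=1, max_num_filters=512):
        super().__init__()
        f = max_num_filters
        self.net = nn.Sequential(
            up_block(num_latent_dims, f, stride=1, padding=0),  # 1x1 -> 4x4
            up_block(f, f // 2),                                # 8x8
            up_block(f // 2, f // 4),                           # 16x16
            up_block(f // 4, f // 8),                           # 32x32
            # Output layer: no BatchNorm, Tanh maps pixel values to [-1, 1]
            nn.ConvTranspose2d(f // 8, num_img_channels, 4, 2, 1, bias=False),  # 64x64
            nn.Tanh(),
        )

    def forward(self, z):
        # Treat z of shape (N, num_latent_dims) as a 1x1 feature map
        return self.net(z.reshape(z.size(0), -1, 1, 1))


def dcgan_weights_init(m):
    # Conv weights ~ N(0, 0.02) as in the DCGAN paper; BatchNorm scale ~ N(1, 0.02), bias 0
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(m.weight, 0.0, 0.02)
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.normal_(m.weight, 1.0, 0.02)
        nn.init.zeros_(m.bias)


G_sketch = SketchGenerator(num_latent_dims=100, num_img_channels=1)
G_sketch.apply(dcgan_weights_init)
G_sketch.eval()
with torch.no_grad():
    out = G_sketch(torch.randn(2, 100))
print(out.shape)  # torch.Size([2, 1, 64, 64])
```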

## Setup

Adapt `package_path` to point to the directory containing this notebook.

In [None]:
# Imports
import sys
import os

In [None]:
# Additional imports

# Repository Root
repo_root = os.path.abspath(os.path.join("..", ".."))
# Add the repository root to the system path
sys.path.append(repo_root)

# Package Imports
from nbutils import requirements as nb_reqs
from nbutils import colab as nb_clab
from nbutils import git as nb_git
from nbutils import exec as nb_exec

In [None]:
# Package Path
package_path = "./" # local
print(f"Package path: {package_path}")

# Running on Colab?
on_colab = nb_clab.check_for_colab()

In [None]:
# Clone git repository

# Path of the local clone of the repository (relative to package_path)
repo_dir = os.path.join(package_path, "torch-gan")
repo_url = "https://github.com/menzHSE/torch-gan.git"

nb_git.clone(repo_url, repo_dir)

In [None]:
# Install requirements in the current Jupyter kernel
req_file = os.path.join(repo_dir, "requirements.txt")
nb_reqs.pip_install_reqs(req_file)

# Additional requirements for this notebook
req_file = os.path.join(os.getcwd(), "requirements.txt")
nb_reqs.pip_install_reqs(req_file)    

# Latent Space Analysis using MNIST

To analyze the concept of the latent space, we use a special generator network from https://github.com/menzHSE/torch-gan that has been trained on MNIST with only 2 latent dimensions. This makes the latent space easy to visualize.

## Load the MNIST Generator

In [None]:
import numpy as np
import torch
import torchvision

# random seed
seed = 0
torch.manual_seed(seed)
np.random.seed(seed)

# Add the directory of the cloned torch-gan repository
# (containing model.py, dataset.py and device.py) to the system path
sys.path.append(os.path.join(package_path, 'torch-gan'))


# Now we can import the model, dataset and device modules
import model
import dataset
import device

# parameters
dataset_id       = "mnist"
num_latent_dims  = 2
max_num_filters  = 512
img_size         = (64, 64)
batch_size       = 32
model_id         = f"G_filters_{max_num_filters:04d}_dims_{num_latent_dims:04d}.pth"
gen_fname        = os.path.join(package_path, "torch-gan", "pretrained", dataset_id, model_id)
dev              = device.autoselectDevice()

# load dataset
# for GANs, this is normalised to [-1, 1]
mnist_train_loader, mnist_test_loader, mnist_classes_list, mnist_num_img_channels = dataset.get_loaders(
    dataset_id, img_size=img_size, batch_size=batch_size)

# load the generator
G_lat2 = model.Generator(num_latent_dims, mnist_num_img_channels, max_num_filters, dev)
G_lat2.load(gen_fname, dev)

if G_lat2:
    print(f"Model {gen_fname} loaded successfully!")
    print(f"Device used: {dev}")
    G_lat2.to(dev)
    G_lat2.eval()


## Show some Training Images

In [None]:
def normalizeForDisplay(images):
    # normalize from [-1, 1] to [0, 1]
    return (images + 1.0) / 2.0

In [None]:
import matplotlib.pyplot as plt

# get a batch of images from the training set and display them
# we use the torchvision.utils.make_grid function to create a grid of images
images, labels = next(iter(mnist_train_loader))
grid_img = torchvision.utils.make_grid(normalizeForDisplay(images), nrow=batch_size//4)
plt.figure(figsize=(10,10))
plt.imshow(grid_img.permute(1, 2, 0).numpy())  # (C, H, W) -> (H, W, C) for matplotlib
plt.axis('off')
plt.title('Batch from the Training Set')
plt.show()

## Sample the Latent Space and Show Generated Images

In [None]:
# Uniformly sample a regular grid in the 2-D latent space
# and generate an image from each latent vector

# Number of images per row and column
n = 20

# Size of each image (assuming square images)
image_size = 64  

# Limits of the latent space
xlim = [-2, 2]
ylim = [-2, 2]

# Number of ticks on each axis
num_ticks = 9

# Create a grid of latent vectors
x = np.linspace(xlim[0], xlim[1], n)
y = np.linspace(ylim[1], ylim[0], n)
xx, yy = np.meshgrid(x, y)

# Create an empty array for the large image
large_image = np.zeros((n * image_size, n * image_size))

# Loop through the grid
for i in range(n):
    for j in range(n):
        # Get the latent vector
        z = np.array([[xx[i, j], yy[i, j]]])
        
        # Decode the latent vector to an image
        with torch.no_grad():
            x_generated = G_lat2(torch.from_numpy(z).float().to(dev)).cpu().numpy()
        
        # Place the generated image in the large array
        large_image[i*image_size:(i+1)*image_size, j*image_size:(j+1)*image_size] = x_generated[0, 0]

# Create a figure
plt.figure(figsize=(10, 10))

# Display the large image
plt.imshow(normalizeForDisplay(large_image), cmap='gray')

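# Set the ticks to correspond to the latent space values.
# The first and last latent values lie at the centers of the first and last tiles,
# i.e. at pixel (image_size - 1) / 2 and n*image_size - (image_size + 1) / 2
tick_positions_x = np.linspace((image_size - 1) / 2, n*image_size - (image_size + 1) / 2, num_ticks)
tick_labels_x = np.linspace(xlim[0], xlim[1], num_ticks)
plt.xticks(ticks=tick_positions_x, labels=[f'{val:.1f}' for val in tick_labels_x])

tick_positions_y = np.linspace((image_size - 1) / 2, n*image_size - (image_size + 1) / 2, num_ticks)
tick_labels_y = np.linspace(ylim[0], ylim[1], num_ticks)
plt.yticks(ticks=tick_positions_y, labels=[f'{val:.1f}' for val in reversed(tick_labels_y)])  # Reversed y-labels: the top row is ylim[1]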

# Labels and title
plt.xlabel('Latent Dimension 1')
plt.ylabel('Latent Dimension 2')
plt.title('Grid of Images Generated from Samples in the Latent Space')

# Show the plot
plt.show()


# Generate Random MNIST-like Samples from the Generator

The generator is the part of a DCGAN that creates new images. Its input is a random latent vector, typically drawn from a standard normal distribution (the original DCGAN paper uses a uniform distribution). This vector represents a random point in the latent space, and the generator turns it into an image.

To generate an image, the generator maps this latent vector to an image that follows the distribution of the real images it was trained on. In other words, it transforms the simple distribution of the input vectors into a distribution that resembles the data distribution of real images. This data distribution is never estimated explicitly (there is no density we could evaluate); we can only draw samples from it by passing latent vectors through the generator.
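
The idea of a distribution that can only be sampled can be illustrated with a toy one-dimensional example. The sketch below uses a hypothetical fixed "generator" `toy_generator(z) = tanh(3z)` in place of a trained network: pushing standard normal samples through it yields a distribution that piles up near -1 and +1, yet its density is never written down; we only look at the samples.

```python
import math
import random

random.seed(0)


def toy_generator(z):
    # Hypothetical fixed "generator": a nonlinear map from latent space to data space
    return math.tanh(3.0 * z)


# Draw latent samples z ~ N(0, 1) and push each one through the generator
samples = [toy_generator(random.gauss(0.0, 1.0)) for _ in range(10_000)]

# The generated distribution is described only through its samples:
# most of the mass ends up near -1 and +1, although no density p(x) was estimated
frac_near_edges = sum(abs(x) > 0.8 for x in samples) / len(samples)
print(f"fraction of samples with |x| > 0.8: {frac_near_edges:.2f}")
```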

**Steps for Image Generation**

- Latent vector creation: Start with a random latent vector sampled from a normal distribution. Its dimensionality is fixed by the generator's architecture (2 or 100 in this notebook).

- Transformation by the generator: Feed the latent vector into the generator network, which uses its learned parameters to transform the vector into an image.

- Output: The result is a synthetic image that, ideally, looks similar to the images the network was trained on.
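
The three steps can be sketched as one batched function. This is a minimal sketch assuming a standard normal latent prior drawn with `torch.randn`; the notebook itself uses `utils.sample_latent_vectors` from torch-gan, whose distribution is not shown here. The function name `generate_images` and the stand-in generator `toy_G` are hypothetical, so that the example runs without the pretrained weights.

```python
import torch
import torch.nn as nn


def generate_images(G, num_samples, num_latent_dims, device="cpu"):
    """Hypothetical helper: sample latent vectors and generate a batch of images."""
    G.eval()
    with torch.no_grad():
        # 1) Latent vector creation: z ~ N(0, I), shape (num_samples, num_latent_dims)
        z = torch.randn(num_samples, num_latent_dims, device=device)
        # 2) Transformation by the generator, in a single batched forward pass
        images = G(z)
    # 3) Output: synthetic images, here in [-1, 1] because of the Tanh output
    return images


# Hypothetical stand-in generator so the sketch runs on its own
toy_G = nn.Sequential(nn.Linear(100, 64 * 64), nn.Tanh(), nn.Unflatten(1, (1, 64, 64)))
imgs = generate_images(toy_G, num_samples=32, num_latent_dims=100)
print(imgs.shape)  # torch.Size([32, 1, 64, 64])
```

With the pretrained generator loaded below, a call such as `generate_images(G_lat100, 32, 100, dev)` should produce the same kind of batch, assuming the generator accepts a `(num_samples, num_latent_dims)` tensor (as the 2-D latent grid example above suggests).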


## Load a Generator with 100 Latent Dimensions (see DCGAN Paper)

In [None]:
# parameters
dataset_id       = "mnist"
num_latent_dims  = 100
max_num_filters  = 512
img_size         = (64, 64)
batch_size       = 32
model_id         = f"G_filters_{max_num_filters:04d}_dims_{num_latent_dims:04d}.pth"
gen_fname        = os.path.join(package_path, "torch-gan", "pretrained", dataset_id, model_id)
dev              = device.autoselectDevice()


# load the generator
G_lat100 = model.Generator(num_latent_dims, mnist_num_img_channels, max_num_filters, dev)
G_lat100.load(gen_fname, dev)

if G_lat100:
    print(f"Model {gen_fname} loaded successfully!")
    print(f"Device used: {dev}")
    G_lat100.to(dev)
    G_lat100.eval()

In [None]:
import utils

def sampleAndPlot(G, num_latent_dims, num_samples=batch_size):
    with torch.no_grad():
        # generate all random latent vectors at once: (num_samples, num_latent_dims)
        z = utils.sample_latent_vectors(num_samples, num_latent_dims, dev)

        # generate a batch of images from the latent vectors
        pics = G(z)

        # Create a grid of images (num_samples // 4 images per row)
        grid_img = torchvision.utils.make_grid(pics, nrow=max(1, num_samples // 4))

        # Convert grid to numpy and transpose axes for plotting
        grid_np = grid_img.cpu().numpy()
        grid_np = np.transpose(grid_np, (1, 2, 0))

        # Plotting
        plt.figure(figsize=(10, 10))
        plt.imshow(normalizeForDisplay(grid_np))
        plt.axis('off')
        plt.title(f'Randomly Generated Images from the Generator with {num_latent_dims} Latent Dimensions')
        plt.show()

## Generate Samples

In [None]:
sampleAndPlot(G_lat100, num_latent_dims=100)