# Face Generation

In this project, you'll define and train a Generative Adversarial Network (GAN) of your own creation on a dataset of faces. Your goal is to get a generator network to generate *new* images of faces that look as realistic as possible!

The project will be broken down into a series of tasks, from **defining new architectures to training adversarial networks**. At the end of the notebook, you'll be able to visualize the results of your trained Generator to see how it performs; your generated samples should look like fairly realistic faces with small amounts of noise.

### Get the Data

You'll be using the [CelebFaces Attributes Dataset (CelebA)](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) to train your adversarial networks.

This dataset has higher-resolution images than datasets you have previously worked with (like MNIST or SVHN), so you should be prepared to define deeper networks and train them for longer to get good results. It is suggested that you use a GPU for training.

### Pre-processed Data

Since the project's main focus is on building the GANs, we've done *some* of the pre-processing for you. Each of the CelebA images has been cropped to remove parts of the image that don't include a face, then resized down to 64x64x3 NumPy images. Some sample data is shown below.

<img src='assets/processed_face_data.png' width=60% />

In [None]:
# run this once to unzip the file
!unzip processed-celeba-small.zip

In [None]:
from glob import glob
from typing import Tuple, Callable, Dict

import matplotlib.pyplot as plt
import numpy as np
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset
from torchvision.transforms import Compose, ToTensor, Resize, Normalize

import os
import tests
import random

I've tried to fix the random seeds, but doing so also makes the results largely the same from run to run. Only execute the next cell if you want the noise to be stabilized.

In [None]:
seed = 42
torch.manual_seed(seed)
np.random.seed(seed)
random.seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)

In [None]:
data_dir = 'processed_celeba_small/celeba/'

## Data pipeline

The [CelebA](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) dataset contains over 200,000 celebrity images with annotations. Since you're going to be generating faces, you won't need the annotations; you'll only need the images. Note that these are color images with [3 color channels (RGB)](https://en.wikipedia.org/wiki/Channel_(digital_image)#RGB_Images) each.

### Pre-process and Load the Data

Since the project's main focus is on building the GANs, we've done *some* of the pre-processing for you. Each of the CelebA images has been cropped to remove parts of the image that don't include a face, then resized down to 64x64x3 NumPy images. This *pre-processed* dataset is a smaller subset of the very large CelebA dataset and contains roughly 30,000 images. 

Your first task is to build the dataloader. To do so, you need to do the following:
* implement the get_transforms function
* create a custom Dataset class that reads the CelebA data

### Exercise: implement the get_transforms function

The `get_transforms` function should output a [`torchvision.transforms.Compose`](https://pytorch.org/vision/stable/generated/torchvision.transforms.Compose.html#torchvision.transforms.Compose) of different transformations. You have two constraints:
* the function takes a size tuple (height, width) as input and should **resize the images** to that size
* the output images should have values **ranging from -1 to 1**
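
The second constraint can be checked by hand. As a rough sketch, assuming `ToTensor()` scales 8-bit pixel values to [0, 1] and `Normalize` with a mean and standard deviation of 0.5 then computes `(x - 0.5) / 0.5`, the arithmetic looks like this in plain Python. The helper names `to_unit_range` and `normalize` are hypothetical and exist only for this illustration.

```python
def to_unit_range(pixel: int) -> float:
    """Mimic ToTensor() for one value: scale an 8-bit pixel to [0, 1]."""
    return pixel / 255.0


def normalize(value: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Mimic Normalize(mean, std) for a single channel value."""
    return (value - mean) / std


darkest = normalize(to_unit_range(0))      # black pixel
brightest = normalize(to_unit_range(255))  # white pixel
print(darkest, brightest)
```

With these settings the darkest pixel maps to -1 and the brightest to 1, which matches the output range of a `Tanh` activation.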

In [None]:
def get_transforms(size: Tuple[int, int]) -> Callable:
    """ Transforms to apply to the image."""
    transforms = [
        Resize(size),  # Resize the image to the specified size
        ToTensor(),    # Convert the image to a PyTorch tensor
        Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])  # Normalize to [-1, 1]
    ]
    
    return Compose(transforms)

### Exercise: implement the DatasetDirectory class


The `DatasetDirectory` class is a torch Dataset that reads from the above data directory. The `__getitem__` method should output a transformed tensor and the `__len__` method should output the number of files in our dataset. You can look at [this custom dataset](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html#creating-a-custom-dataset-for-your-files) for ideas. 

In [None]:
class DatasetDirectory(Dataset):
    """
    A custom dataset class that loads images from a folder.
    
    Args:
    - directory (str): Location of the images.
    - transforms (Callable): Transform function to apply to the images.
    - extension (str): File format to filter images by (e.g., '.jpg', '.png').
    """
    
    def __init__(self, directory: str, transforms: Callable = None, extension: str = '.jpg'):
        self.directory = directory
        self.transforms = transforms if transforms else get_transforms((64, 64)) # more flexibility for transformations
        self.extension = extension
        self.image_paths = [os.path.join(directory, f) for f in os.listdir(directory) 
                            if f.endswith(extension)]

    def __len__(self) -> int:
        """Returns the number of items in the dataset."""
        return len(self.image_paths)

    def __getitem__(self, index: int) -> torch.Tensor:
        """Loads an image, applies transformation, and returns it."""
        image_path = self.image_paths[index]
        image = Image.open(image_path).convert('RGB')  # Ensure image is in RGB mode
        
        # Apply the transformations to the image
        if self.transforms:
            image = self.transforms(image)
        
        return image

In [None]:
"""
DO NOT MODIFY ANYTHING IN THIS CELL
"""
# run this cell to verify your dataset implementation
dataset = DatasetDirectory(data_dir, get_transforms((64, 64)))
tests.check_dataset_outputs(dataset)

The functions below will help you visualize images from the dataset.

In [None]:
"""
DO NOT MODIFY ANYTHING IN THIS CELL
"""

def denormalize(images):
    """Transform images from [-1.0, 1.0] to [0, 255] and cast them to uint8."""
    return ((images + 1.) / 2. * 255).astype(np.uint8)

# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(20, 4))
plot_size=20
for idx in np.arange(plot_size):
    ax = fig.add_subplot(2, int(plot_size/2), idx+1, xticks=[], yticks=[])
    img = dataset[idx].numpy()
    img = np.transpose(img, (1, 2, 0))
    img = denormalize(img)
    ax.imshow(img)

## Model implementation

As you know, a GAN is comprised of two adversarial networks, a discriminator and a generator. Now that we have a working data pipeline, we need to implement the discriminator and the generator. 

Feel free to implement any additional class or function.

### Exercise: Create the discriminator

The discriminator's job is to score real and fake images. You have two constraints here:
* the discriminator takes as input a **batch of 64x64x3 images**
* the output should be a single value (=score)

Feel free to get inspiration from the different architectures we talked about in the course, such as DCGAN, WGAN-GP or DRAGAN.

#### Some tips
* To scale down from the input image, you can either use `Conv2d` layers with the correct hyperparameters or Pooling layers.
* If you plan on using gradient penalty, do not use Batch Normalization layers in the discriminator.
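
To make the first tip concrete, the spatial size after a `Conv2d` layer (with dilation 1) is `floor((size + 2 * padding - kernel) / stride) + 1`. The sketch below applies that formula to one possible layer stack, assuming four layers with kernel size 4, stride 2 and padding 1 followed by a final layer with kernel size 4, stride 1 and no padding. The helper `conv2d_out` is a hypothetical name used only here.

```python
def conv2d_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a Conv2d layer, assuming dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


size = 64
sizes = [size]

# Four downsampling layers, each halving the spatial size.
for _ in range(4):
    size = conv2d_out(size, kernel=4, stride=2, padding=1)
    sizes.append(size)

# Final layer collapses the 4x4 feature map into a single score.
size = conv2d_out(size, kernel=4, stride=1, padding=0)
sizes.append(size)

print(sizes)  # [64, 32, 16, 8, 4, 1]
```

Other combinations of kernel size, stride and padding work as well, as long as the last layer ends at a 1x1 output.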

In [None]:
from torch.nn import Module
import torch.nn as nn

In [None]:
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        
        # Define the discriminator network
        self.model = nn.Sequential(
            # First convolution layer
            nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),  # 64x64 -> 32x32
            nn.LeakyReLU(0.2, inplace=True),
            
            # Second convolution layer
            # InstanceNorm2d replaces BatchNorm2d here: the gradient penalty is computed
            # per sample, and batch normalization couples the samples within a batch.
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),  # 32x32 -> 16x16
            nn.InstanceNorm2d(128, affine=True),
            nn.LeakyReLU(0.2, inplace=True),
            
            # Third convolution layer
            nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1),  # 16x16 -> 8x8
            nn.InstanceNorm2d(256, affine=True),
            nn.LeakyReLU(0.2, inplace=True),
            
            # Fourth convolution layer
            nn.Conv2d(256, 512, kernel_size=4, stride=2, padding=1),  # 8x8 -> 4x4
            nn.InstanceNorm2d(512, affine=True),
            nn.LeakyReLU(0.2, inplace=True),
            
            # Final convolution layer
            nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=0),  # 4x4 -> 1x1

            # Commented out because I am going to use the Wasserstein Distance and hence a Wasserstein-GAN
            # nn.Sigmoid()  # Output a single value representing real or fake
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Forward pass for the discriminator."""
        return self.model(x)


In [None]:
"""
DO NOT MODIFY ANYTHING IN THIS CELL
"""
# run this cell to check your discriminator implementation
discriminator = Discriminator()
tests.check_discriminator(discriminator)

### Exercise: create the generator

The generator's job is to create the "fake images" and learn the dataset distribution. You have two constraints here:
* the generator takes as input a vector of dimension `[batch_size, latent_dimension, 1, 1]`
* the generator must output **64x64x3 images**

Feel free to get inspiration from the different architectures we talked about in the course, such as DCGAN, WGAN-GP or DRAGAN.

#### Some tips:
* to scale up from the latent vector input, you can use `ConvTranspose2d` layers
* as is often the case with GANs, **Batch Normalization** helps with training
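
For the first tip, the spatial size after a `ConvTranspose2d` layer (with dilation 1 and no output padding) is `(size - 1) * stride - 2 * padding + kernel`. The sketch below applies that formula to one possible stack, assuming a first layer with kernel size 4, stride 1 and no padding followed by four layers with kernel size 4, stride 2 and padding 1. The helper `conv_transpose2d_out` is a hypothetical name used only here.

```python
def conv_transpose2d_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a ConvTranspose2d layer (dilation 1, output_padding 0)."""
    return (size - 1) * stride - 2 * padding + kernel


size = 1  # the latent vector has a 1x1 spatial extent
sizes = [size]

# First layer expands the 1x1 input to a 4x4 feature map.
size = conv_transpose2d_out(size, kernel=4, stride=1, padding=0)
sizes.append(size)

# Four upsampling layers, each doubling the spatial size.
for _ in range(4):
    size = conv_transpose2d_out(size, kernel=4, stride=2, padding=1)
    sizes.append(size)

print(sizes)  # [1, 4, 8, 16, 32, 64]
```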

In [None]:
class Generator(nn.Module):
    def __init__(self, latent_dim: int):
        super(Generator, self).__init__()
        
        # Define the generator network
        self.model = nn.Sequential(
            # First layer: latent vector -> 4x4x512 feature maps
            nn.ConvTranspose2d(latent_dim, 512, kernel_size=4, stride=1, padding=0),  # 1x1 -> 4x4
            nn.BatchNorm2d(512),
            nn.ReLU(True),
            
            # Second layer: 4x4x512 -> 8x8x256 feature maps
            nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1),  # 4x4 -> 8x8
            nn.BatchNorm2d(256),
            nn.ReLU(True),
            
            # Third layer: 8x8x256 -> 16x16x128 feature maps
            nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),  # 8x8 -> 16x16
            nn.BatchNorm2d(128),
            nn.ReLU(True),
            
            # Fourth layer: 16x16x128 -> 32x32x64 feature maps
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),  # 16x16 -> 32x32
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            
            # Fifth layer: 32x32x64 -> 64x64x3 (RGB image)
            nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),  # 32x32 -> 64x64
            nn.Tanh()  # Output range in [-1, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Forward pass for the generator."""
        x = x.view(x.size(0), x.size(1), 1, 1)  # Reshape the latent vector to start with 1x1 spatial dimensions
        return self.model(x)


In [None]:
"""
DO NOT MODIFY ANYTHING IN THIS CELL
"""
# run this cell to verify your generator implementation
latent_dim = 128
generator = Generator(latent_dim)
tests.check_generator(generator, latent_dim)

## Optimizer

In the following section, we create the optimizers for the generator and discriminator. You may want to experiment with different optimizers, learning rates and other hyperparameters as they tend to impact the output quality.

### Exercise: implement the optimizers

I've added two new parameters, **lr** and **betas**, for fine-tuning the optimizers.

In [None]:
import torch.optim as optim
from torch.nn import Module

def create_optimizers(generator: Module, discriminator: Module, lr: float = 0.0002, betas=(0.5, 0.999)):
    """
    This function returns the optimizers for the generator and the discriminator.
    
    Args:
    - generator: The generator model.
    - discriminator: The discriminator model.
    - lr: Learning rate for the optimizers (default is 0.0002).
    - betas: Betas for the Adam optimizer (default is (0.5, 0.999)).
    
    Returns:
    - g_optimizer: Optimizer for the generator.
    - d_optimizer: Optimizer for the discriminator.
    """
    
    # Optimizer for the generator
    g_optimizer = optim.Adam(generator.parameters(), lr=lr, betas=betas)
    
    # Optimizer for the discriminator
    d_optimizer = optim.Adam(discriminator.parameters(), lr=lr, betas=betas)
    
    return g_optimizer, d_optimizer

## Losses implementation

In this section, we are going to implement the loss functions for the generator and the discriminator. You can and should experiment with different loss functions.

Some tips:
* You can choose the commonly used binary cross entropy loss or select other losses we have discovered in the course, such as the Wasserstein distance.
* You may want to implement a gradient penalty function as discussed in the course. It is not required and the code will work whether you implement it or not.

### Exercise: implement the generator loss

The generator's goal is to get the discriminator to think its generated images (= "fake" images) are real.

Because I am going to use the **Wasserstein** loss, the critic must satisfy a Lipschitz constraint, which I enforce with the gradient penalty function implemented further below.

In [None]:
def generator_loss(fake_logits):
    """
    Wasserstein Generator Loss.
    
    Args:
    - fake_logits (torch.Tensor): The critic scores for the generated (fake) images.
    
    Returns:
    - loss (torch.Tensor): The generator loss to be minimized.
    """
    # In WGAN, we want to maximize the critic's output on fake images
    # Equivalent to minimizing -fake_logits
    loss = -torch.mean(fake_logits)
    
    return loss

### Exercise: implement the discriminator loss

We want the discriminator to give high scores to real images and low scores to fake ones and the discriminator loss should reflect that.

In [None]:
# This is not needed, because of the Wasserstein distance (and the modified discriminator (no sigmoid) becomes a critic)
# def discriminator_loss(real_logits, fake_logits):
#    """ Discriminator loss, takes the fake and real logits as inputs. """
#    # TODO: implement the discriminator loss 
#    return loss

def critic_loss(real_logits, fake_logits):
    """
    Wasserstein Critic Loss.
    
    Args:
    - real_logits (torch.Tensor): The critic scores for real images.
    - fake_logits (torch.Tensor): The critic scores for generated (fake) images.
    
    Returns:
    - loss (torch.Tensor): The critic loss to be minimized.
    """
    # Wasserstein Loss for Critic: maximize E[critic(real)] - E[critic(fake)]
    loss = -torch.mean(real_logits) + torch.mean(fake_logits)
    
    return loss
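
As a small worked example of the two Wasserstein losses, the plain-Python sketch below uses made-up critic scores (the numbers are assumptions chosen for easy arithmetic, not outputs of a trained model). It mirrors the formulas used in `generator_loss` and `critic_loss` without depending on PyTorch.

```python
from statistics import mean

# Hypothetical critic scores for a batch of three real and three fake images.
real_scores = [2.0, 1.0, 3.0]    # mean is 2.0
fake_scores = [-1.0, 0.0, -2.0]  # mean is -1.0

# Critic: minimize -(E[critic(real)] - E[critic(fake)])
critic_loss_value = -mean(real_scores) + mean(fake_scores)

# Generator: minimize -E[critic(fake)]
generator_loss_value = -mean(fake_scores)

print(critic_loss_value, generator_loss_value)  # -3.0 1.0
```

A more negative critic loss means the critic separates real and fake scores more strongly, while the generator lowers its loss by pushing the fake scores up.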

### Exercise (Optional): Implement the gradient Penalty

In the course, we discussed the importance of gradient penalty in training certain types of GANs. Implementing this function is not required and depends on some of the design decisions you made (discriminator architecture, loss functions).

In [None]:
# def gradient_penalty(discriminator, real_samples, fake_samples):
#    """ This function enforces """
#    gp = 0
#    # TODO (Optional): implement the gradient penalty
#    return gp

import torch
import torch.autograd as autograd

def gradient_penalty(critic, real_samples, fake_samples, device):
    """
    Computes the gradient penalty for WGAN-GP.
    
    Args:
    - critic (torch.nn.Module): The critic (discriminator) model.
    - real_samples (torch.Tensor): Batch of real images.
    - fake_samples (torch.Tensor): Batch of generated images.
    - device (torch.device): The device to run computations on (e.g., 'cuda' or 'cpu').
    
    Returns:
    - gp (torch.Tensor): The gradient penalty value.
    """
    # Step 1: Interpolate between real and fake samples
    batch_size = real_samples.size(0)
    alpha = torch.rand(batch_size, 1, 1, 1, device=device)  # Random weight for interpolation
    interpolates = alpha * real_samples + (1 - alpha) * fake_samples
    interpolates = interpolates.to(device)

    # Step 2: Get critic scores for the interpolated samples
    interpolates.requires_grad_(True)  # Enable gradient calculation
    critic_interpolates = critic(interpolates)

    # Step 3: Compute gradients with respect to the interpolated samples
    gradients = autograd.grad(
        outputs=critic_interpolates,
        inputs=interpolates,
        grad_outputs=torch.ones_like(critic_interpolates, device=device),  # Same shape as critic output
        create_graph=True,
        retain_graph=True,
        only_inputs=True
    )[0]  # Gradient tensor

    # Step 4: Compute gradient norm
    gradients = gradients.view(batch_size, -1)  # Flatten the gradients
    gradient_norm = gradients.norm(2, dim=1)  # Compute L2 norm per sample

    # Step 5: Compute gradient penalty as (||gradient||_2 - 1)^2
    gp = torch.mean((gradient_norm - 1) ** 2)

    return gp
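
One way to sanity-check the penalty term is a critic whose gradient is known in closed form. The sketch below is an illustration under stated assumptions, not part of the project code: it uses a hypothetical linear critic `x @ w` with `w = [3, 4]`, so the gradient with respect to every input is `w`, its L2 norm is 5, and the penalty `(5 - 1)^2` should come out as 16.

```python
import torch

torch.manual_seed(0)

# Hypothetical linear critic: score = x . w, so d(score)/dx = w for every sample.
w = torch.tensor([3.0, 4.0])  # L2 norm of w is 5


def linear_critic(x: torch.Tensor) -> torch.Tensor:
    return x @ w


# A batch of 8 two-dimensional "samples" that track gradients.
x = torch.randn(8, 2, requires_grad=True)
scores = linear_critic(x)

# Gradient of each score with respect to its own input sample.
grads = torch.autograd.grad(
    outputs=scores,
    inputs=x,
    grad_outputs=torch.ones_like(scores),
)[0]

grad_norm = grads.norm(2, dim=1)             # per-sample L2 norm
penalty = torch.mean((grad_norm - 1) ** 2)   # (||grad||_2 - 1)^2 averaged over the batch
print(penalty.item())
```

In the training code, `create_graph=True` is additionally needed so that the penalty itself can be backpropagated through the critic's weights; it is omitted here because the sketch only evaluates the value.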


## Training


Training will involve alternating between training the discriminator and the generator. You'll use the loss functions defined above (`generator_loss` and `critic_loss`) to calculate the generator and discriminator losses.

* You should train the discriminator by alternating on real and fake images
* Then the generator, which tries to trick the discriminator and should have an opposing loss function

### Exercise: implement the generator step and the discriminator step functions

Each function should do the following:
* calculate the loss
* backpropagate the gradient
* perform one optimizer step

In [None]:
# We are building a Wasserstein GAN so a slightly modified signature for one generator step must be used.
# def generator_step(batch_size: int, latent_dim: int) -> Dict:
#    """ One training step of the generator. """
#    # TODO: implement the generator step (forward pass, loss calculation and backward pass)
#    return {'loss': g_loss}

import torch
from typing import Dict

def generator_step(generator: torch.nn.Module, critic: torch.nn.Module, g_optimizer: torch.optim.Optimizer, 
                   batch_size: int, latent_dim: int, device: torch.device) -> Dict:
    """
    One training step for the generator in WGAN.
    
    Args:
    - generator (torch.nn.Module): The generator model.
    - critic (torch.nn.Module): The critic (discriminator) model.
    - g_optimizer (torch.optim.Optimizer): Optimizer for the generator.
    - batch_size (int): The batch size.
    - latent_dim (int): The latent dimension size.
    - device (torch.device): Device to run the models (e.g. 'cuda' or 'cpu').
    
    Returns:
    - Dict: A dictionary containing the generator loss.
    """
    # Step 1: Sample random latent vectors
    z = torch.randn(batch_size, latent_dim, device=device)

    # Step 2: Generate fake images
    fake_images = generator(z)

    # Step 3: Get the critic's output for the fake images
    fake_logits = critic(fake_images)

    # Step 4: Compute the generator loss (Wasserstein loss)
    g_loss = generator_loss(fake_logits)

    # Step 5: Backpropagation and optimization step for the generator
    g_optimizer.zero_grad()  # Clear any accumulated gradients
    g_loss.backward()  # Backpropagate the loss
    g_optimizer.step()  # Update the generator's weights

    # Return the generator loss for tracking purposes
    return {'loss': g_loss.item()}


# The same goes for the discriminator step, some modifications for the signature of the function first (additional parameters for the critic etc.)
# def discriminator_step(batch_size: int, latent_dim: int, real_images: torch.Tensor) -> Dict:
#    """ One training step of the discriminator. """
#    # TODO: implement the discriminator step (forward pass, loss calculation and backward pass)
#    return {'loss': d_loss, 'gp': gp}

from typing import Dict

def discriminator_step(generator: torch.nn.Module, critic: torch.nn.Module, d_optimizer: torch.optim.Optimizer, 
                       batch_size: int, latent_dim: int, real_images: torch.Tensor, device: torch.device, 
                       lambda_gp: float = 10.0) -> Dict:
    """
    One training step for the discriminator (critic) in WGAN-GP.
    
    Args:
    - generator (torch.nn.Module): The generator model.
    - critic (torch.nn.Module): The critic (discriminator) model.
    - d_optimizer (torch.optim.Optimizer): Optimizer for the critic.
    - batch_size (int): The batch size.
    - latent_dim (int): The latent dimension size.
    - real_images (torch.Tensor): Batch of real images.
    - device (torch.device): Device to run the models (e.g. 'cuda' or 'cpu').
    - lambda_gp (float): Weight of the gradient penalty term (default is 10.0).
    
    Returns:
    - Dict: A dictionary containing the critic loss and gradient penalty.
    """
    # Step 1: Sample random latent vectors
    z = torch.randn(batch_size, latent_dim, device=device)

    # Step 2: Generate fake images
    fake_images = generator(z)

    # Step 3: Get the critic's output for real and fake images
    real_logits = critic(real_images)
    fake_logits = critic(fake_images.detach())  # Detach so that gradients are not propagated to the generator

    # Step 4: Compute the Wasserstein critic loss
    d_loss = critic_loss(real_logits, fake_logits)

    # Step 5: Compute the gradient penalty
    # Detach the fake images so the penalty does not backpropagate into the generator
    gp = gradient_penalty(critic, real_samples=real_images, fake_samples=fake_images.detach(), device=device)

    # Step 6: Combine critic loss with gradient penalty
    total_d_loss = d_loss + lambda_gp * gp

    # Step 7: Backpropagation and optimization step for the critic
    d_optimizer.zero_grad()  # Clear any accumulated gradients
    total_d_loss.backward()  # Backpropagate the loss
    d_optimizer.step()  # Update the critic's weights

    # Return the critic loss and gradient penalty for tracking purposes
    return {'loss': total_d_loss.item(), 'gp': gp.item()}

### Main training loop

You don't have to implement anything here but you can experiment with different hyperparameters.

In [None]:
from datetime import datetime

In [None]:
# you can experiment with different dimensions of latent spaces
latent_dim = 128

# update to cpu if you do not have access to a gpu
device = 'cuda'

# number of epochs to train your model
n_epochs = 15

# number of images in each batch
batch_size = 128

In [None]:
"""
DO NOT MODIFY ANYTHING IN THIS CELL
"""
print_every = 50

# Create optimizers for the discriminator D and generator G
generator = Generator(latent_dim).to(device)
discriminator = Discriminator().to(device)
g_optimizer, d_optimizer = create_optimizers(generator, discriminator)

dataloader = DataLoader(dataset, 
                        batch_size=batch_size, 
                        shuffle=True, 
                        num_workers=4, 
                        drop_last=True,
                        pin_memory=False)

In [None]:
"""
DO NOT MODIFY ANYTHING IN THIS CELL
"""

def display(fixed_latent_vector: torch.Tensor):
    """ helper function to display images during training """
    fig = plt.figure(figsize=(14, 4))
    plot_size = 16
    for idx in np.arange(plot_size):
        ax = fig.add_subplot(2, int(plot_size/2), idx+1, xticks=[], yticks=[])
        img = fixed_latent_vector[idx, ...].detach().cpu().numpy()
        img = np.transpose(img, (1, 2, 0))
        img = denormalize(img)
        ax.imshow(img)
    plt.show()

### Exercise: implement the training strategy

You should experiment with different training strategies. For example:

* train the generator more often than the discriminator. 
* add noise to the input images
* use label smoothing
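
For Wasserstein GANs, the first strategy is often applied the other way round: the critic is updated on every batch and the generator only on some of them. The sketch below counts the resulting updates for a schedule that runs a generator step whenever `batch_i % n_critic == 0`. The function name `count_steps` and the parameter name `n_critic` are assumptions made for this illustration.

```python
from typing import Tuple


def count_steps(n_batches: int, n_critic: int) -> Tuple[int, int]:
    """Count critic and generator updates over n_batches batches."""
    critic_steps = 0
    generator_steps = 0
    for batch_i in range(n_batches):
        critic_steps += 1              # the critic trains on every batch
        if batch_i % n_critic == 0:    # the generator trains on every n_critic-th batch
            generator_steps += 1
    return critic_steps, generator_steps


print(count_steps(12, 3))  # (12, 4)
```

With `n_critic = 3`, the generator is updated on batches 0, 3, 6 and 9, giving three critic updates per generator update.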

Implement your training strategy below.

In [None]:
import torch
from datetime import datetime

fixed_latent_vector = torch.randn(16, latent_dim, 1, 1).float().to(device)  # follow the configured device instead of hard-coding CUDA

losses = []
for epoch in range(n_epochs):
    for batch_i, real_images in enumerate(dataloader):
        real_images = real_images.to(device)  # Move real images to the correct device (GPU/CPU)

        ####################################
        # Training strategy implementation
        ####################################
        
        # Step 1: Train the critic (discriminator) with gradient penalty
        d_loss = discriminator_step(generator, discriminator, d_optimizer, batch_size, latent_dim, real_images, device)
        
        # Step 2: Train the generator once every 3 critic steps
        if batch_i % 3 == 0:  # ratio chosen experimentally
            g_loss = generator_step(generator, discriminator, g_optimizer, batch_size, latent_dim, device)
        
        ####################################
        
        # Print and store the losses at intervals
        if batch_i % print_every == 0:
            # Append discriminator loss and generator loss
            d = d_loss['loss']  # Critic loss
            g = g_loss['loss']  # Generator loss
            losses.append((d, g))
            
            # Print discriminator and generator loss
            time = str(datetime.now()).split('.')[0]
            print(f'{time} | Epoch [{epoch+1}/{n_epochs}] | Batch {batch_i}/{len(dataloader)} | d_loss: {d:.4f} | g_loss: {g:.4f}')
    
    # After every epoch, display generated images using the fixed latent vector
    generator.eval()  # Set generator to evaluation mode
    with torch.no_grad():  # Disable gradient calculation for faster evaluation
        generated_images = generator(fixed_latent_vector)
    
    # Display generated images (replace 'display' with your preferred visualization method)
    display(generated_images)
    
    generator.train()  # Set generator back to training mode

### Training losses

Plot the training losses for the generator and discriminator.

In [None]:
"""
DO NOT MODIFY ANYTHING IN THIS CELL
"""
fig, ax = plt.subplots()
losses = np.array(losses)
plt.plot(losses.T[0], label='Discriminator', alpha=0.5)
plt.plot(losses.T[1], label='Generator', alpha=0.5)
plt.title("Training Losses")
plt.legend()

### Question: What do you notice about your generated samples and how might you improve this model?
When you answer this question, consider the following factors:
* The dataset is biased; it is made of "celebrity" faces that are mostly white
* Model size; larger models have the opportunity to learn more features in a data feature space
* Optimization strategy; optimizers and number of epochs affect your final result
* Loss functions

**Answer:** 

The generated faces become more defined with each epoch until training reaches an apparent standstill, after which the changes become smaller and smaller. Training also often gets stuck producing noise; in that case the only remedy is to stop and start again. This reflects the fact that GANs are generally difficult to train. Several factors contribute: random elements such as the latent noise and the weight initialization, hyperparameters such as the learning rates of the Adam optimizer, and the balance between discriminator and generator, either of which can become too weak or too strong. Together, these factors make GAN training somewhat unstable.

Another interesting observation is that, due to the stochastic nature of WGAN training, the discriminator often becomes too strong, so that the generator no longer makes any significant learning progress and training stalls again.

Opportunities for **improvement**:

- Stabilize the noise vector with a fixed seed (see code above)
- Hyperparameter tuning
- Modification of lambda for the gradient penalty
- Change the ratio of discriminator to generator training steps (3 discriminator steps per generator step seems to be a good choice)

### Submitting This Project
When submitting this project, make sure to run all the cells before saving the notebook. Save the notebook file as "dlnd_face_generation.ipynb".  

Submit the notebook using the ***SUBMIT*** button in the bottom right corner of the Project Workspace.