# Building a DCGAN to generate faces and fun things with TensorFlow 2 & Keras

We are going to train a GAN to generate faces and then have some fun playing with it. Generative adversarial networks (GANs) are deep neural network architectures composed of two networks pitted against each other (hence “adversarial”). One network, called the generator, generates new faces, while the other, the discriminator, decides whether each face it reviews comes from the actual training dataset or not.

We will use aligned faces of celebrities to train our GAN and make animations to visualize results!

TF1 model source: http://bamos.github.io/2016/08/09/deep-completion/

## Prerequisites

In this section we will install some useful packages and extensions.

In [None]:
!pip install -U nb_black watermark

In [None]:
%load_ext lab_black
%load_ext watermark

%watermark -v -m -p numpy,matplotlib,tensorflow,imageio

## Common imports and variables

In [None]:
import os
import numpy as np
import tensorflow as tf
from matplotlib import pyplot as plt


%matplotlib inline


IMAGE_SIZE_NO_CROP = 256  # Size of image before cropping
IMAGE_SIZE = 64  # Height and width of the images fed to the networks (after crop and resize)
BATCH_SIZE = 64  # Batch size
DATA_PATH = "/kaggle/input/celeba-dataset/img_align_celeba"
RANDOM_SEED = 42

tf.random.set_seed(RANDOM_SEED)

Let's check the available GPUs:

In [None]:
print(tf.config.experimental.list_physical_devices("GPU"))
print(tf.test.gpu_device_name())

# Preparing the dataset

Here we will check and prepare our data.
We only need the faces. The images in the dataset are aligned so that they are centered on the eyes, which lets us crop the faces with a fixed bounding box.
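
The crop can be sketched with plain NumPy on a single 256×256 image. The fractions 0.35, 0.27 and 0.45 are the ones used in the `process` function further down; the helper names `crop_face` and `to_unit_range` are hypothetical, and the resize to IMAGE_SIZE is omitted.

```python
import numpy as np


def crop_face(image: np.ndarray) -> np.ndarray:
    """Crop the face region from an eye-centered image of shape (H, W, C)."""
    height, width = image.shape[0], image.shape[1]
    offset_height = int(height * 0.35)  # top edge of the face box
    offset_width = int(width * 0.27)  # left edge of the face box
    box_height = int(height * 0.45)
    box_width = int(width * 0.45)
    return image[
        offset_height : offset_height + box_height,
        offset_width : offset_width + box_width,
    ]


def to_unit_range(image: np.ndarray) -> np.ndarray:
    """Map pixel values from [0, 255] to [-1, 1], matching the generator's tanh output."""
    return ((image - 127.5) / 127.5).astype(np.float32)


image = np.random.randint(0, 256, size=(256, 256, 3)).astype(np.float32)
face = crop_face(image)
print(face.shape)  # (115, 115, 3)
```

For a 256×256 input the box starts at row 89 and column 69 and is 115 pixels on each side.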

I've found caching extremely useful for this task. The whole dataset can be put into memory if you have more than 12 GB of RAM.
Prefetching also helps to use resources better, since the next batch is prepared while the current one is being processed.

Sometimes image_dataset_from_directory is painfully slow. Also, Kaggle won't let us cache everything in memory and kills the kernel during training, which is frustrating.

Nevertheless, training on the whole dataset takes a while: with the GPU accelerator here, the first epoch takes ~1800 seconds with BATCH_SIZE=64 and ~1300 seconds with BATCH_SIZE=512; after caching, it's ~300 seconds per epoch.

In [None]:
num_images = len(os.listdir(os.path.join(DATA_PATH, "img_align_celeba")))
print(f"Num images: {num_images}")

In [None]:
celeb_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    DATA_PATH,
    label_mode=None,
    color_mode="rgb",
    batch_size=BATCH_SIZE,
    image_size=(IMAGE_SIZE_NO_CROP, IMAGE_SIZE_NO_CROP),
    seed=RANDOM_SEED,
)

In [None]:
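CACHE_FILE = "cache"  # On-disk cache; call .cache() without a filename to cache in RAM instead
AUTOTUNE = tf.data.experimental.AUTOTUNE


def process(image):
    # Images are centered on the eyes, so a fixed bounding box contains the face
    height, width = image.shape[1], image.shape[2]  # image is a batch: (N, H, W, C)

    offset_height = int(height * 0.35)
    offset_width = int(width * 0.27)

    # crop_to_bounding_box(image, offset_height, offset_width, target_height, target_width)
    image = tf.image.crop_to_bounding_box(
        image, offset_height, offset_width, int(height * 0.45), int(width * 0.45)
    )
    image = tf.image.resize(image, [IMAGE_SIZE, IMAGE_SIZE], preserve_aspect_ratio=True)

    # Scale pixels from [0, 255] to [-1, 1]
    image = tf.cast((image - 127.5) / 127.5, tf.float32)
    return image


celeb_dataset = (
    celeb_dataset.map(process, num_parallel_calls=AUTOTUNE)
    .cache(filename=CACHE_FILE)  # Cache the preprocessed images after the first epoch
    .prefetch(buffer_size=AUTOTUNE)  # Prefetch last so it overlaps with training
)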

Let's see what we've got:

In [None]:
fig = plt.figure(figsize=(8, 4), constrained_layout=True)


for images in celeb_dataset.take(1):
    for i in range(8):
        ax = plt.subplot(2, 4, i + 1)
        plt.imshow((images[i].numpy() * 127.5 + 127.5).astype("uint8"))
        plt.axis("off")

Nice!

# Defining the networks

We will build two models: a generator and a discriminator.
The generator produces images from noise, while the discriminator tries to distinguish those images from real photos of celebrities.

### Some network parameters

In [None]:
Z_DIM = 100  # Dimension of the latent (noise) vector z
GENERATOR_DENSE_SIZE = 512  # Number of channels of the generator's first 4x4 feature map
N_CHANNELS = 3  # Number of channels of the input image (RGB)
NUM_CONV_DISCRIMINATOR = 4  # Number of convolutional layers in the discriminator

### Generator


The generator has the following architecture:

<img src="http://bamos.github.io/data/2016-08-09/discrim-architecture.png">

The input noise vector goes through a dense layer, and the rest of the layers are transposed convolutions.

A transposed convolution reverses the spatial shape transformation of a regular convolution with the same parameters (it restores the shape, not the original values).
If you apply a regular convolution followed by a transposed convolution and both have the same settings (kernel size, padding, stride), the output has the same shape as the input, as long as the input size is divisible by the stride (true for the power-of-two sizes used here). This makes it very easy to build encoder-decoder networks with them.
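
The shape-inverting property can be checked with a little arithmetic. This pure-Python sketch assumes Keras's output-size rules for `padding="same"` and `padding="valid"` (dilation 1, no `output_padding`); the helper names are hypothetical.

```python
import math


def conv_output_size(n: int, kernel: int, stride: int, padding: str) -> int:
    """Output length of a Keras-style Conv2D along one spatial axis (dilation 1)."""
    if padding == "same":
        return math.ceil(n / stride)
    if padding == "valid":
        return (n - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding}")


def conv_transpose_output_size(n: int, kernel: int, stride: int, padding: str) -> int:
    """Output length of a Keras-style Conv2DTranspose (dilation 1, no output_padding)."""
    if padding == "same":
        return n * stride
    if padding == "valid":
        return n * stride + max(kernel - stride, 0)
    raise ValueError(f"unknown padding: {padding}")


# 5x5 kernels, stride 2, "same" padding, as in this notebook
down = conv_output_size(64, kernel=5, stride=2, padding="same")  # 32
up = conv_transpose_output_size(down, kernel=5, stride=2, padding="same")  # 64 again

# An odd size does not survive the round trip: 65 -> 33 -> 66
odd_down = conv_output_size(65, kernel=5, stride=2, padding="same")
odd_up = conv_transpose_output_size(odd_down, kernel=5, stride=2, padding="same")
print(down, up, odd_down, odd_up)  # 32 64 33 66
```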

Here are some notes on the architecture of the generator:
1. The deeper the layer, the fewer filters it uses.
1. Transposed convolution + LeakyReLU blocks upsample the tensor until it reaches the input image shape.
1. Batch normalization is applied before each nonlinearity to speed up and stabilize learning.
1. A tanh activation at the end of the network scales the outputs to [-1, 1], the same range as the normalized training images.
1. The layers are created without bias terms (`use_bias=False`); in the hidden layers, the batch normalization that follows provides its own learnable offset.
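
The resulting layer schedule can be traced without TensorFlow. The hypothetical helper `generator_plan` only replays the loop of `make_generator_model` (its defaults assume IMAGE_SIZE = 64, GENERATOR_DENSE_SIZE = 512 and N_CHANNELS = 3) and lists the spatial size and number of filters after each transposed convolution.

```python
def generator_plan(image_size=64, dense_size=512, n_channels=3):
    """Replay the upsampling loop of make_generator_model.

    Returns (spatial size, filters) after each Conv2DTranspose layer.
    """
    plan = []
    size = 4  # the dense output is reshaped to 4 x 4 x dense_size
    depth_mul = 1.0
    while size < image_size // 2:
        plan.append((size * 2, int(dense_size * depth_mul)))  # stride 2 doubles H and W
        size *= 2
        depth_mul /= 2  # half as many filters at each larger resolution
    plan.append((size * 2, n_channels))  # final tanh layer outputs the image
    return plan


print(generator_plan())  # [(8, 512), (16, 256), (32, 128), (64, 3)]
```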

In [None]:
from tensorflow.keras import layers
from tensorflow.keras import Sequential


def make_generator_model():
    model = Sequential()
    model.add(
        layers.Dense(
            4 * 4 * GENERATOR_DENSE_SIZE,
            use_bias=False,
            input_shape=(Z_DIM,),
        )
    )
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Reshape((4, 4, GENERATOR_DENSE_SIZE)))

    assert model.output_shape == (
        None,
        4,
        4,
        GENERATOR_DENSE_SIZE,
    )  # Note: None is the batch size

    depth_mul = 1  # Depth decreases as spatial component increases.
    size = 4  # Size increases as depth decreases.

    while size < IMAGE_SIZE // 2:
        filters = int(GENERATOR_DENSE_SIZE * depth_mul)
        model.add(
            layers.Conv2DTranspose(
                filters,
                (5, 5),
                strides=(2, 2),
                padding="same",
                use_bias=False,
            )
        )
        assert model.output_shape == (
            None,
            size * 2,
            size * 2,
            filters,
        )
        model.add(layers.BatchNormalization())
        model.add(layers.LeakyReLU())

        size *= 2
        depth_mul /= 2

    model.add(
        layers.Conv2DTranspose(
            N_CHANNELS,  # RGB output
            (5, 5),
            strides=(2, 2),
            padding="same",
            use_bias=False,
            activation="tanh",
        )
    )
    assert model.output_shape == (None, IMAGE_SIZE, IMAGE_SIZE, N_CHANNELS)

    return model

In [None]:
generator = make_generator_model()

### Discriminator


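The discriminator takes an image (a 3D tensor of shape IMAGE_SIZE × IMAGE_SIZE × N_CHANNELS) as input and outputs a single number: a logit, whose sigmoid is the probability that the input is a real face. Its architecture is roughly the generator in reverse: strided convolutions shrink the image while the number of filters grows.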
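
The same bookkeeping works for the discriminator. This sketch, with hypothetical helper names, assumes the settings of `make_discriminator_model` (four stride-2 convolutions with 64, 128, 256 and 512 filters and `padding="same"`). It shows the size of the flattened vector that feeds `Dense(1)`, and how that layer's logit maps to a probability.

```python
import numpy as np


def discriminator_plan(image_size=64, num_conv=4):
    """(spatial size, filters) after each stride-2 Conv2D, plus the flattened length."""
    plan = []
    size = image_size
    for i in range(num_conv):
        size = -(-size // 2)  # ceil division, as with padding="same"
        plan.append((size, 64 * 2**i))
    flat = size * size * plan[-1][1]
    return plan, flat


def logit_to_probability(logit):
    """Dense(1) has no activation, so its output is a logit; the sigmoid gives P(real)."""
    return 1.0 / (1.0 + np.exp(-logit))


plan, flat = discriminator_plan()
print(plan, flat)  # [(32, 64), (16, 128), (8, 256), (4, 512)] 8192
probs = logit_to_probability(np.array([-2.0, 0.0, 2.0]))
print(probs)
```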

In [None]:
def make_discriminator_model():
    model = Sequential()

    for i in range(NUM_CONV_DISCRIMINATOR):
        model.add(
            layers.Conv2D(
                64 * 2 ** i,
                (5, 5),
                strides=(2, 2),
                padding="same",
                input_shape=[IMAGE_SIZE, IMAGE_SIZE, N_CHANNELS],
            )
        )
        model.add(layers.LeakyReLU())
        model.add(layers.Dropout(0.2))

    model.add(layers.Flatten())
    model.add(layers.Dense(1))

    return model

In [None]:
discriminator = make_discriminator_model()

### Loss functions


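We will use the following loss functions, where $D(\cdot)$ is the discriminator's output probability (the sigmoid of its logit):
$$ D\_loss = \frac{-1}{m} \sum_{i=1}^{m}[\log{D(x_i)} + \log{(1 - D(G(z_i)))}]$$
$$ G\_loss = \frac{-1}{m} \sum_{i=1}^{m} \log{D(G(z_i))}$$
The generator loss is the "non-saturating" variant implemented by the code below (cross-entropy of the fake outputs against label 1), rather than the original minimax term $\frac{1}{m} \sum_{i=1}^{m} \log{(1 - D(G(z_i)))}$.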
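
To make the link between these formulas and the Keras losses explicit, here is a NumPy sketch of binary cross-entropy computed from logits in the numerically stable form $\max(x, 0) - x y + \log(1 + e^{-|x|})$. The `_np` helper names are hypothetical; the sketch assumes `BinaryCrossentropy(from_logits=True)` averages over the batch, which is its default reduction.

```python
import numpy as np


def bce_with_logits(labels, logits):
    """Mean binary cross-entropy computed directly from logits (stable form)."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)
    losses = np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))
    return losses.mean()


def discriminator_loss_np(real_logits, fake_logits):
    # -mean(log D(x)) - mean(log(1 - D(G(z))))
    return bce_with_logits(np.ones_like(real_logits), real_logits) + bce_with_logits(
        np.zeros_like(fake_logits), fake_logits
    )


def generator_loss_np(fake_logits):
    # Non-saturating form: -mean(log D(G(z)))
    return bce_with_logits(np.ones_like(fake_logits), fake_logits)


print(discriminator_loss_np(np.array([2.0, 1.5]), np.array([-1.0, -3.0])))
print(generator_loss_np(np.array([-1.0, -3.0])))
```

When the discriminator confidently rejects a fake (a large negative logit $x$), the derivative of $-\log D(G(z))$ with respect to $x$ is $\sigma(x) - 1 \approx -1$, while the derivative of $\log(1 - D(G(z)))$ is $-\sigma(x) \approx 0$. This is why the non-saturating form is usually preferred early in training.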

In [None]:
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)


def discriminator_loss(real_output, fake_output):
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    total_loss = real_loss + fake_loss
    return total_loss


def generator_loss(fake_output):
    return cross_entropy(tf.ones_like(fake_output), fake_output)

### Optimizers

The discriminator and the generator are updated with separate optimizers (both Adam with a learning rate of 1e-4).

In [None]:
from tensorflow.keras import optimizers


generator_optimizer = optimizers.Adam(1e-4)
discriminator_optimizer = optimizers.Adam(1e-4)

# Training

Here we define the functions that drive the training process. Both models are trained simultaneously.
At every step the generator takes BATCH_SIZE random vectors and turns them into images. The discriminator then scores both the real images and the generated ones, and we compute both losses from those scores.

The model will be trained for 30 epochs, and its weights are saved every NUM_CHECKPOINT = 10 epochs.
After every epoch we show the 8 images generated from the same fixed random vectors, so progress is easy to follow.

In [None]:
EPOCHS = 30
NUM_SAMPLES_TO_GENERATE = 8
NUM_CHECKPOINT = 10


# Reuse the same noise vectors over time, so it's easier
# to visualize progress in the animated GIF
seed = tf.random.uniform([NUM_SAMPLES_TO_GENERATE, Z_DIM])

In [None]:
checkpoint_dir = "./training_checkpoints"

checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")

checkpoint = tf.train.Checkpoint(
    generator_optimizer=generator_optimizer,
    discriminator_optimizer=discriminator_optimizer,
    generator=generator,
    discriminator=discriminator,
)

In [None]:
# Notice the use of `tf.function`:
# this decorator compiles the function into a TensorFlow graph.
@tf.function
def train_step(images):
    # Match the noise batch to the real batch (the last batch may be smaller)
    noise = tf.random.uniform([tf.shape(images)[0], Z_DIM])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)

        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)

        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    # Compute gradients outside the tape context so these ops are not recorded
    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(
        disc_loss, discriminator.trainable_variables
    )

    generator_optimizer.apply_gradients(
        zip(gradients_of_generator, generator.trainable_variables)
    )
    discriminator_optimizer.apply_gradients(
        zip(gradients_of_discriminator, discriminator.trainable_variables)
    )

In [None]:
from IPython.display import clear_output
import time


def train(dataset, epochs):
    for epoch in range(epochs):
        start = time.time()

        for image_batch in dataset:
            train_step(image_batch)

        # Produce images for the GIF as you go
        clear_output(wait=True)
        generate_and_save_images(generator, epoch + 1, seed)

        # Save the model every NUM_CHECKPOINT epochs
        if (epoch + 1) % NUM_CHECKPOINT == 0:
            checkpoint.save(file_prefix=checkpoint_prefix)

        print("Time for epoch {} is {} sec".format(epoch + 1, time.time() - start))

    # Generate after the final epoch
    clear_output(wait=True)
    generate_and_save_images(generator, epochs, seed)

In [None]:
def generate_and_save_images(model, epoch, test_input):
    # Notice `training` is set to False.
    # This is so all layers run in inference mode (batchnorm).
    predictions = model(test_input, training=False)

    fig = plt.figure(figsize=(8, 4), constrained_layout=True)

    for i in range(predictions.shape[0]):
        plt.subplot(2, 4, i + 1)
        plt.imshow((predictions[i].numpy() * 127.5 + 127.5).astype("uint8"))
        plt.axis("off")

    plt.savefig("image_at_epoch_{:04d}.png".format(epoch))
    plt.show()

Run the next cell to train the model, or the cell after it to restore the latest checkpoint instead.

In [None]:
train(celeb_dataset, EPOCHS)

In [None]:
# Restore the latest saved weights
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))

# Visualizing the results

Let's make an animation showing the progress of our model

In [None]:
import imageio
import glob
from IPython.display import Image


anim_file = "dcgan.gif"

filenames = glob.glob("image_at_epoch_*.png")
images = [imageio.imread(filename) for filename in sorted(filenames)]
imageio.mimsave(anim_file, images, fps=8)

# to show in notebook
# Image(anim_file)

<img src="https://media.giphy.com/media/AYvPzML6qfOPVGZ3I8/giphy.gif">

# Face interpolation

Our model performs quite well. Time for the fun part.

First, let's interpolate between faces: we take two latent vectors $z_1$ and $z_2$, build a batch of vectors of the form $\alpha\cdot z_1 + (1- \alpha)\cdot  z_2, \alpha \in [0,1]$, generate a face from each of them and look at the results.
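
A NumPy sketch of this interpolation, assuming 100-dimensional uniform noise like the generator's input (`Z_DIM = 100`). The helper `interpolate` is hypothetical; it makes $\alpha$ go from 1 to 0, so the path starts at $z_1$ and ends at $z_2$.

```python
import numpy as np


def interpolate(z1, z2, num_steps):
    """Return num_steps vectors alpha * z1 + (1 - alpha) * z2, alpha going from 1 to 0."""
    alphas = np.linspace(1.0, 0.0, num_steps)[:, None]  # column vector for broadcasting
    return alphas * z1[None, :] + (1.0 - alphas) * z2[None, :]


rng = np.random.default_rng(42)
z1, z2 = rng.uniform(size=100), rng.uniform(size=100)  # uniform [0, 1) noise
path = interpolate(z1, z2, num_steps=30)
print(path.shape)  # (30, 100)
```

Because each step is a convex combination of $z_1$ and $z_2$, every intermediate vector stays inside the [0, 1) range of noise the generator saw during training.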

In [None]:
vectors = tf.random.uniform([16, Z_DIM])

predictions = generator(vectors, training=False)

In [None]:
fig = plt.figure(figsize=(8, 8), constrained_layout=True)

for i in range(predictions.shape[0]):
    plt.subplot(4, 4, i + 1).set_title(i)
    plt.imshow((predictions[i].numpy() * 127.5 + 127.5).astype("uint8"))
    plt.axis("off")

Let's take 6 and 14: the first looks like a woman, the second like a man. (These indices refer to the grid above and will point to different faces if you draw different vectors.)

In [None]:
NUM_ALPHAS = 30

idx_first, idx_second = 6, 14
alphas = np.linspace(0, 1, NUM_ALPHAS)

to_interpolate = [vectors[idx_first]]
for alpha in alphas[::-1]:
    to_interpolate.append(
        alpha * vectors[idx_first] + (1 - alpha) * vectors[idx_second]
    )

to_interpolate.append(vectors[idx_second])
to_interpolate = np.array(to_interpolate)

predictions = generator(to_interpolate, training=False)

We save an image for every prediction, including the two original vectors:

In [None]:
for i in range(predictions.shape[0]):
    fig = plt.figure(figsize=(4, 4), constrained_layout=True)
    plt.imshow((predictions[i].numpy() * 127.5 + 127.5).astype("uint8"))
    plt.axis("off")
    plt.savefig(f"face_{i:02d}.png")
    plt.close()

And make an animation!

In [None]:
anim_file = "face.gif"

filenames = glob.glob("face*.png")
images = [imageio.imread(filename) for filename in sorted(filenames)]
imageio.mimsave(anim_file, images, fps=5)

# to show in notebook
# Image(anim_file)

<img src="https://media.giphy.com/media/pvqX59RXU4GAKMUqm2/giphy.gif">

Very cool, isn't it?

# Making smiling faces

Here we will find some faces that are smiling and some that are not, in order to extract a "smile vector". Later we will add that vector to the inputs to get smiling faces as outputs, and subtract it to get the opposite.

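We define the "smile vector" as the mean of the latent vectors z whose generated faces smile, minus the mean of the vectors z whose generated faces do not:
$$ z_{smile} = \frac{1}{|S|} \sum_{z \in S} z - \frac{1}{|N|} \sum_{z \in N} z $$
where $S$ is the set of smiling latent vectors and $N$ the set of non-smiling ones.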
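
A NumPy sketch of this latent arithmetic. The random arrays are placeholders for the hand-picked latent vectors (5 per group, `Z_DIM = 100`), and `attribute_vector` is a hypothetical helper name.

```python
import numpy as np


def attribute_vector(with_attr, without_attr):
    """Mean latent vector of faces showing the attribute minus mean of those that don't."""
    return np.mean(with_attr, axis=0) - np.mean(without_attr, axis=0)


rng = np.random.default_rng(0)
smiling = rng.uniform(size=(5, 100))  # placeholders for smiling z vectors
not_smiling = rng.uniform(size=(5, 100))  # placeholders for non-smiling z vectors

smile = attribute_vector(smiling, not_smiling)
more_smiling = not_smiling + smile  # broadcast over the batch
less_smiling = smiling - smile  # the "anti-smile" direction

# Adding the vector moves the mean non-smiling code exactly onto the mean smiling code
print(np.allclose(not_smiling.mean(axis=0) + smile, smiling.mean(axis=0)))  # True
```

Unlike interpolation, adding or subtracting the vector can push entries outside the [0, 1) range of the training noise.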

### Building a smile vector

In [None]:
to_test = tf.random.uniform([36, Z_DIM])


predictions = generator(to_test, training=False)

fig = plt.figure(figsize=(16, 16))

for i in range(predictions.shape[0]):
    plt.subplot(6, 6, i + 1).set_title(i)
    plt.imshow((predictions[i].numpy() * 127.5 + 127.5).astype("uint8"))
    plt.axis("off")

It's actually hard to find faces that are not smiling; I had to run the previous cell several times.

Some faces are really strange. Who's the 10th guy? A tiger man? And what the hell is the 29th?

Let's choose up to 5 images for the following groups:
1. Big smiles: 4, 8, 14, 30, 31
2. No smiles: 3 (poker face), 6 (reminds me of vampires), 15, 25, 26

In [None]:
smiliing_indices = [4, 8, 14, 30, 31]
not_smiling_indices = [3, 6, 15, 25, 26]

smiliing_array = np.array([to_test[i] for i in smiliing_indices])
not_smiling_array = np.array([to_test[i] for i in not_smiling_indices])

In [None]:
def predict_and_plot(array, title=None):
    predictions = generator(array, training=False)

    fig = plt.figure(figsize=(16, 4))
    if title:
        fig.suptitle(title)

    for i in range(predictions.shape[0]):
        plt.subplot(1, 5, i + 1)
        plt.imshow((predictions[i].numpy() * 127.5 + 127.5).astype("uint8"))
        plt.axis("off")

In [None]:
predict_and_plot(smiliing_array, "Smiling faces")

In [None]:
predict_and_plot(not_smiling_array, "Not smiling faces")

These are the faces we have chosen to build the smile vector.

In [None]:
smile_arr_vec = smiliing_array.mean(axis=0)
not_smile_arr_vec = not_smiling_array.mean(axis=0)

smile_vec = smile_arr_vec - not_smile_arr_vec

### Applying smile vector


Time to apply it to both non-smiling and smiling faces. It's also worth trying the anti-smile vector (subtracting the smile vector), which should make faces sad. Oof!

#### Not smiling faces

In [None]:
predict_and_plot(not_smiling_array, "Original")
predict_and_plot(not_smiling_array + smile_vec, "+ smile vector")
predict_and_plot(not_smiling_array - smile_vec, "- smile vector")

Gosh! The faces in the last row look so judgmental: "Look what you've done."

#### Smiling faces

In [None]:
predict_and_plot(smiliing_array, "Original")
predict_and_plot(smiliing_array + smile_vec, "+ smile vector")
predict_and_plot(smiliing_array - smile_vec, "- smile vector")