# Generative Adversarial Networks (GANs)

This notebook is adapted from [this TF tutorial](https://www.tensorflow.org/tutorials/generative/dcgan) and from [the Chollet notebook](https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/chapter12_part05_gans.ipynb), itself a port of [this Keras tutorial](https://keras.io/examples/generative/dcgan_overriding_train_step/).


In [1]:
import os
import sys
import pathlib
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

In [2]:
# reminder: Colab code to mount your drive
if 'google.colab' in sys.modules:
    from google.colab import drive
    drive.mount('/content/drive')
    os.chdir('drive/My Drive/2023-DMLAP/DMLAP')  # 'My Drive' is the default name of Google Drives
    print(os.listdir())  # inside an `if` block, the listing only shows up if printed

## Getting the CelebA dataset

### Colab

```bash
!mkdir -p datasets/dcgan_celeba
!gdown --id 1O7m1010EJjLE5QxLZiM9Fpjs7Oj6e684 -O datasets/dcgan_celeba/data.zip
!unzip -qq datasets/dcgan_celeba/data.zip -d datasets/dcgan_celeba
```

### Locally

Create a directory called `datasets`, then download the archive manually from [here](https://drive.google.com/uc?id=1O7m1010EJjLE5QxLZiM9Fpjs7Oj6e684) (1.4 GB!) and unzip it into a `dcgan_celeba` subdirectory, so that your directory looks like `DMLAP/python/datasets/dcgan_celeba/img_align_celeba`.

Or use [gdown](https://pypi.org/project/gdown/):

```python
!pip install gdown # in colab
```
```bash
conda install -c conda-forge gdown # locally
```

Then run the code below:

In [2]:
from zipfile import ZipFile

import gdown  # third-party package, see the install commands above

celeba_dir = "datasets/dcgan_celeba"
extracted_dir = os.path.join(celeba_dir, "img_align_celeba")

if not os.path.isdir(extracted_dir):
    print("Downloading Celeba dataset")
    os.makedirs(celeba_dir, exist_ok=True)

    url = "https://drive.google.com/uc?id=1O7m1010EJjLE5QxLZiM9Fpjs7Oj6e684"
    fname = os.path.join(celeba_dir, "data.zip")
    gdown.download(url, fname, quiet=False)

    print("Unzipping")
    with ZipFile(fname, "r") as zipobj:
        zipobj.extractall(celeba_dir)
else:
    print("Celeb directory exists")
print("Done")

Celeb directory exists
Done


In [3]:
basedir = pathlib.Path("datasets/dcgan_celeba")  # this will fail if you don't have a `dcgan_celeba` dir
imgdir = basedir / "img_align_celeba"          # with, inside it, another folder containing the images
outputdir = basedir / "generated"

if not os.path.isdir(outputdir):
    os.mkdir(outputdir)

batch_size = 32 # 128
img_size = 64

dataset = tf.keras.utils.image_dataset_from_directory(
    imgdir,
    label_mode=None,
    image_size=(img_size, img_size),
    batch_size=batch_size,
    smart_resize=True
)

Found 202599 files belonging to 1 classes.


In [4]:
ds_len = len(dataset)
print(f"{ds_len * batch_size} samples in {ds_len} batches")

202624 samples in 6332 batches
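
The sample count printed above is slightly larger than the number of files found (202599), because `len(dataset)` counts batches and the final batch is only partly full. A quick check of the arithmetic, using only the numbers shown in the outputs above:

```python
import math

num_files = 202599   # from "Found 202599 files belonging to 1 classes."
batch_size = 32

# The last batch holds whatever is left over, so the batch count is rounded up
num_batches = math.ceil(num_files / batch_size)
last_batch = num_files - (num_batches - 1) * batch_size

print(num_batches)               # 6332
print(num_batches * batch_size)  # 202624 (an upper bound on the sample count)
print(last_batch)                # 7
```

So `ds_len * batch_size` is best read as "at most this many samples".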


**Limit the dataset**

To run quick tests and shorten the training time, you can limit the size of the dataset by defining a variable `num_batches` and using the `take()` method (remember to then use `dataset_short` instead of `dataset` in the cells below):
```python
num_batches = 300
dataset_short = dataset.take(num_batches)
```

**Rescaling the images**

In [5]:
dataset = dataset.map(lambda x: x / 255.)

**Displaying the first image**

In [None]:
for x in dataset:
    plt.axis("off")
    plt.imshow((x.numpy() * 255).astype("int32")[0])
    break

---

## The generator

In the generator, a transposed convolution (loosely, a convolution run in reverse) is used to grow, rather than shrink, the image. This is the [`tf.keras.layers.Conv2DTranspose`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2DTranspose) layer. See [this Stack Overflow answer](https://drive.google.com/uc?id=1SnOH8oSc-Nm8BnfgBscZ9ZoreU4JFpxN) and [this article](https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d) for more, and [this repo](https://github.com/vdumoulin/conv_arithmetic) for even more (summarised in [this video](https://www.youtube.com/watch?v=ilkSwsggSNM&list=PLTKMiZHVd_2KJtIXOW0zFhFfBaJJilH51&index=137)).

In [9]:
def build_generator(latent_dim=100):
    return tf.keras.Sequential(
        [
            tf.keras.Input(shape=(latent_dim,)),
            tf.keras.layers.Dense(8 * 8 * 128),
            tf.keras.layers.Reshape((8, 8, 128)),
            tf.keras.layers.Conv2DTranspose(128, kernel_size=4, strides=2, padding="same"),
            tf.keras.layers.LeakyReLU(alpha=0.2),
            tf.keras.layers.Conv2DTranspose(256, kernel_size=4, strides=2, padding="same"),
            tf.keras.layers.LeakyReLU(alpha=0.2),
            tf.keras.layers.Conv2DTranspose(512, kernel_size=4, strides=2, padding="same"),
            tf.keras.layers.LeakyReLU(alpha=0.2),
            tf.keras.layers.Conv2D(3, kernel_size=5, padding="same", activation="sigmoid"),
        ],
        name="generator",
    )

generator = build_generator()
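
The generator above starts from an 8×8 feature map and ends at the 64×64 images the dataset provides. Here is a small sketch of that shape arithmetic, assuming `padding="same"`, for which Keras produces an output whose height and width are the input size multiplied by the stride. The helper name `upsampled_size` is made up for this illustration.

```python
def upsampled_size(size, stride):
    """Spatial size after a Conv2DTranspose with padding="same"."""
    return size * stride

size = 8            # height/width after Reshape((8, 8, 128))
sizes = [size]
for _ in range(3):  # three Conv2DTranspose layers, each with strides=2
    size = upsampled_size(size, stride=2)
    sizes.append(size)

print(sizes)  # [8, 16, 32, 64]
```

The final `Conv2D` has no stride, so it keeps the 64×64 size and only reduces the channels to 3.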

In [10]:
generator.summary()

Model: "generator"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_1 (Dense)             (None, 8192)              1056768   
                                                                 
 reshape (Reshape)           (None, 8, 8, 128)         0         
                                                                 
 conv2d_transpose (Conv2DTr  (None, 16, 16, 128)       262272    
 anspose)                                                        
                                                                 
 leaky_re_lu_3 (LeakyReLU)   (None, 16, 16, 128)       0         
                                                                 
 conv2d_transpose_1 (Conv2D  (None, 32, 32, 256)       524544    
 Transpose)                                                      
                                                                 
 leaky_re_lu_4 (LeakyReLU)   (None, 32, 32, 256)       0 
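
The parameter counts in the summary can be checked by hand. This is a sketch assuming the usual formulas: a `Dense` layer has `inputs * units + units` parameters, and a 2D (transposed) convolution has `k * k * in_channels * out_channels + out_channels`. The helper names are made up for this illustration. Note that the `Dense` count printed above matches a 128-dimensional latent input rather than the default `latent_dim=100`, so that summary was presumably produced with a different setting.

```python
def dense_params(inputs, units):
    # one weight per (input, unit) pair, plus one bias per unit
    return inputs * units + units

def conv_params(kernel, in_channels, out_channels):
    # one k x k filter per (in, out) channel pair, plus one bias per output channel
    return kernel * kernel * in_channels * out_channels + out_channels

print(dense_params(128, 8 * 8 * 128))  # 1056768, as in the summary
print(dense_params(100, 8 * 8 * 128))  # 827392, what latent_dim=100 would give
print(conv_params(4, 128, 128))        # 262272
print(conv_params(4, 128, 256))        # 524544
```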

---

## The discriminator

In [None]:
def build_discriminator():
    return tf.keras.Sequential(
        [
            tf.keras.Input(shape=(img_size, img_size, 3)),
            tf.keras.layers.Conv2D(64, kernel_size=4, strides=2, padding="same"),
            tf.keras.layers.LeakyReLU(alpha=0.2),
            tf.keras.layers.Conv2D(128, kernel_size=4, strides=2, padding="same"),
            tf.keras.layers.LeakyReLU(alpha=0.2),
            tf.keras.layers.Conv2D(128, kernel_size=4, strides=2, padding="same"),
            tf.keras.layers.LeakyReLU(alpha=0.2),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dropout(0.2),
            tf.keras.layers.Dense(1, activation="sigmoid"),
        ],
        name="discriminator",
    )
discriminator = build_discriminator()

In [None]:
discriminator.summary()

Model: "discriminator"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 32, 32, 64)        3136      
                                                                 
 leaky_re_lu (LeakyReLU)     (None, 32, 32, 64)        0         
                                                                 
 conv2d_1 (Conv2D)           (None, 16, 16, 128)       131200    
                                                                 
 leaky_re_lu_1 (LeakyReLU)   (None, 16, 16, 128)       0         
                                                                 
 conv2d_2 (Conv2D)           (None, 8, 8, 128)         262272    
                                                                 
 leaky_re_lu_2 (LeakyReLU)   (None, 8, 8, 128)         0         
                                                                 
 flatten (Flatten)           (None, 8192)            
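
The discriminator mirrors the generator: each strided convolution halves the image. A sketch of that arithmetic, assuming `padding="same"`, where the output size is the input size divided by the stride, rounded up; `downsampled_size` is a made-up helper name.

```python
import math

def downsampled_size(size, stride):
    """Spatial size after a Conv2D with padding="same"."""
    return math.ceil(size / stride)

size = 64           # img_size
sizes = [size]
for _ in range(3):  # three Conv2D layers, each with strides=2
    size = downsampled_size(size, 2)
    sizes.append(size)

flat = size * size * 128  # Flatten() over an 8 x 8 x 128 feature map

print(sizes)  # [64, 32, 16, 8]
print(flat)   # 8192
```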

## The GAN Model

In [11]:
class GAN(tf.keras.Model):                                                  # subclassing `tf.keras.Model`
    def __init__(self, discriminator, generator, latent_dim):
        super().__init__()
        self.discriminator = discriminator
        self.generator = generator
        self.latent_dim = latent_dim
        self.d_loss_metric = tf.keras.metrics.Mean(name="d_loss")           # custom metrics
        self.g_loss_metric = tf.keras.metrics.Mean(name="g_loss")

    def compile(self, d_optimizer, g_optimizer, loss_fn):                   # `compile` required for `tf.keras.Model`
        super(GAN, self).compile()
        self.d_optimizer = d_optimizer
        self.g_optimizer = g_optimizer
        self.loss_fn = loss_fn

    @property
    def metrics(self):
        return [self.d_loss_metric, self.g_loss_metric]

    def train_step(self, real_images):                                       # `train_step` required for `tf.keras.Model`
        batch_size = tf.shape(real_images)[0]

        # 1. TRAIN DISCRIMINATOR --------------------------------------------------

        random_latent_vectors = tf.random.normal(                            # feed a batch of generated
            shape=(batch_size, self.latent_dim)                             
        )
        generated_images = self.generator(random_latent_vectors)
        combined_images = tf.concat([generated_images, real_images], axis=0) # & real images
        # labels for the combined batch (fake: 1, real: 0)
        labels = tf.concat(
            [tf.ones((batch_size, 1)), tf.zeros((batch_size, 1))],
            axis=0
        )
        # add random noise to the labels (see the notes at the end)
        labels += 0.05 * tf.random.uniform(tf.shape(labels))

        with tf.GradientTape() as tape:                                      # gradient logic:
            predictions = self.discriminator(combined_images)                # Discriminator predicts
            d_loss = self.loss_fn(labels, predictions)
        grads = tape.gradient(d_loss, self.discriminator.trainable_weights)  # gradients to update
        self.d_optimizer.apply_gradients(                                    # our Discriminator
            zip(grads, self.discriminator.trainable_weights)
        )
        self.d_loss_metric.update_state(d_loss) # update loss

        # 2. TRAIN GENERATOR ------------------------------------------------------

        random_latent_vectors = tf.random.normal(
            shape=(batch_size, self.latent_dim)
        )
        misleading_labels = tf.zeros((batch_size, 1)) # 0: real

        with tf.GradientTape() as tape:                                 # gradient logic:
            predictions = self.discriminator(                           # get predictions from Discriminator
                self.generator(random_latent_vectors)                   # from generated fake images
            )
            g_loss = self.loss_fn(misleading_labels, predictions)       # loss labels vs preds
        grads = tape.gradient(g_loss, self.generator.trainable_weights) # gradients to update
        self.g_optimizer.apply_gradients(                               # our Generator
            zip(grads, self.generator.trainable_weights)
        )
        self.g_loss_metric.update_state(g_loss) # update loss

        return {
            "d_loss": self.d_loss_metric.result(),
            "g_loss": self.g_loss_metric.result()
        }
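
To see what the generator step rewards, here is a plain-Python sketch of binary cross-entropy for a single prediction, using this notebook's convention (1 = fake, 0 = real). It is only an illustration: `tf.keras.losses.BinaryCrossentropy` additionally clips predictions and averages over the batch, and the `bce` helper is made up for this example.

```python
import math

def bce(label, pred):
    """Binary cross-entropy for one label and one predicted probability."""
    return -(label * math.log(pred) + (1 - label) * math.log(1 - pred))

# Generator step: the target is "real" (0) for images that are in fact fake.
caught = bce(0.0, 0.9)  # discriminator says "90% fake": the generator is penalised
fooled = bce(0.0, 0.1)  # discriminator says "10% fake": the generator did well

print(round(caught, 3))  # 2.303
print(round(fooled, 3))  # 0.105
```

Minimising this loss pushes the generator towards images the discriminator scores close to 0, that is, "real".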

**A callback that samples generated images during training**

In [12]:
class GANMonitor(tf.keras.callbacks.Callback):
    def __init__(self, num_img=3, latent_dim=100):                          # same default as `build_generator`
        super().__init__()
        self.num_img = num_img
        self.latent_dim = latent_dim

    def on_epoch_end(self, epoch, logs=None):
        random_latent_vectors = tf.random.normal(shape=(self.num_img, self.latent_dim))
        generated_images = self.model.generator(random_latent_vectors)
        generated_images = (generated_images * 255).numpy()                 # NumPy array in [0, 255]
        for i in range(self.num_img):
            img = tf.keras.utils.array_to_img(generated_images[i])
            img.save(outputdir / f"generated_img_{epoch:03d}_{i}.png")

**Compiling and training the GAN**

In [13]:
epochs = 10 # ← Chollet has 100, you need a good GPU & patience for that!
latent_dim = 100

gan = GAN(discriminator=discriminator, generator=generator, latent_dim=latent_dim)

gan.compile(
    d_optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001), # very small learning rates!
    g_optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
    loss_fn=tf.keras.losses.BinaryCrossentropy(),
)

**Reload a trained model**

In [14]:
reload = False # True to reload
generator_path = basedir / "generator_dcgan_celeba.h5"
discriminator_path = basedir / "discriminator_dcgan_celeba.h5"
if os.path.isfile(discriminator_path) and os.path.isfile(generator_path) and reload:
    gan.generator.load_weights(generator_path)
    gan.discriminator.load_weights(discriminator_path)

**Training**

In [None]:
gan.fit(
    dataset, epochs=epochs, callbacks=[
        GANMonitor(num_img=10, latent_dim=latent_dim),
    ]
)

**Check the results**

In [18]:
def plot_gan_images(gan):
    n = 12
    random_latent_vectors = tf.random.normal(shape=(n, latent_dim))
    generated_images = gan.generator(random_latent_vectors)
    generated_images = tf.cast(generated_images * 255, tf.uint8)

    # https://stackoverflow.com/a/54681765
    _, axs = plt.subplots(3, 4, figsize=(12, 12))
    axs = axs.flatten()
    for img, ax in zip(generated_images, axs):
        ax.axis('off')
        ax.imshow(img.numpy())
    plt.show()    

In [None]:
plot_gan_images(gan)

**Save and reload a model**

In [14]:
generator_model_path = basedir / "generator_dcgan_celeba.keras"
discriminator_model_path = basedir / "discriminator_dcgan_celeba.keras"

generator.save(generator_model_path)
discriminator.save(discriminator_model_path)

In [14]:
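gan_model_path = basedir / "gan_dcgan_celeba.keras"
# call `save` on the instance `gan`, not on the class `GAN`
# (a subclassed model like this one may also need a `get_config()` method to be saved whole)
gan.save(gan_model_path)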

In [None]:
generator_reloaded = tf.keras.models.load_model(generator_model_path)
discriminator_reloaded = tf.keras.models.load_model(discriminator_model_path)

gan_reloaded = GAN(                                     # REBUILD GAN
    discriminator=discriminator_reloaded,
    generator=generator_reloaded, 
    latent_dim=generator_reloaded.input_shape[1]        # the input shape is (batch, latent_dim) 
)

In [39]:
discriminator.save_weights(discriminator_path)          # SAVE
generator.save_weights(generator_path)

In [39]:
discriminator_reloaded = build_discriminator()          # REBUILD D & G
generator_reloaded = build_generator()

discriminator_reloaded.load_weights(discriminator_path) # LOAD WEIGHTS
generator_reloaded.load_weights(generator_path)

gan_reloaded = GAN(                                     # REBUILD GAN
    discriminator=discriminator_reloaded,
    generator=generator_reloaded, 
    latent_dim=latent_dim
)

## Notes / Tricks

Here is a summary of some of the tricks Chollet mentions in his book that are used in this implementation:

- Sample from the latent space using a **normal distribution** (Gaussian), not a uniform one;
- GANs are likely to get stuck in all sorts of ways (it's an unstable, dynamic equilibrium):  
  we introduce **random noise** to the labels for the discriminator to prevent this;
- Sparse gradients can hinder GAN training. The remedy: **strided convolutions** for downsampling instead of max pooling, and **`LeakyReLU`** instead of `ReLU`;
- To avoid checkerboard artifacts caused by unequal coverage of the pixel space in the generator, use a kernel size **divisible by the stride size** with strided `Conv2DTranspose` or `Conv2D`.

<small>*Deep Learning With Python*, 2<sup>nd</sup> ed., p.404</small>
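
The last trick in the list above can be illustrated with a toy count. The sketch below is a one-dimensional simplification that ignores padding and weights: it only counts how many kernel taps of a strided transposed convolution land on each output position. The function name `coverage` is made up for this example.

```python
def coverage(n_in, kernel, stride):
    """Count how many kernel taps contribute to each output position (1D)."""
    n_out = (n_in - 1) * stride + kernel
    counts = [0] * n_out
    for i in range(n_in):        # each input position...
        for k in range(kernel):  # ...stamps the kernel onto the output
            counts[i * stride + k] += 1
    return counts

print(coverage(4, kernel=4, stride=2))  # [1, 1, 2, 2, 2, 2, 2, 2, 1, 1]
print(coverage(4, kernel=3, stride=2))  # [1, 1, 2, 1, 2, 1, 2, 1, 1]
```

With a kernel size of 4 and a stride of 2, every interior position is covered equally. With a kernel size of 3, the coverage alternates between 1 and 2, which is the uneven pattern behind checkerboard artifacts.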

Note also that, as mentioned by Chintala (see lecture above), the labels for true/fake are reversed from the original formulation (here 0 is true, 1 is fake), which seems to improve stability.

## Experiments

The work that can be done here broadly falls into three main directions:
- *Freeze* the network, work on the dataset:
  - In this direction, most of your work is to gather datasets and make them easier to use. Are you able to develop a suite of tools that would allow you to handle datasets more easily? (In this case, the images are already cropped and all the same size, which already takes some work! It would be nice to integrate tools that streamline this part of the work: put any images in a folder, and a Python script crops them, etc.) It might be worth looking into [data augmentation](https://www.tensorflow.org/tutorials/images/data_augmentation) (injecting randomness into your image dataset).
  - It would be interesting to train GANs on generative images! You might end up with really distorted versions of what you started with.
  - People have likely trained GANs on spectrograms, as we now see with diffusion, but it might still be a really fun thing to try.
  - The image used for the week on text is a [book project by Allison Parrish](https://www.aleator.press/releases/wendit-tnce-inf) that uses GANs to generate images of (unreadable) poems!
  - Also, people have created loops where they train GANs on their own outputs, which creates distortions that may be worth exploring.
- *Freeze* the dataset, work on the network:
  - Maybe there's one dataset that's really your focus, or you're happy to work with established material (even MNIST, like in the TF tutorial mentioned at the top), or the whole data processing feels boring? You might then want to look into fiddling with the model, and gather tricks. For instance: do you see an improvement if you normalise your images to be in [-1, 1] instead of [0, 1] as here? (Your generator will then need a `tanh` rather than a `sigmoid` as its last activation, see again the TF tutorial.) Then of course there are the networks themselves, where all sorts of parameters can be tweaked, from the number of layers to the strides of the convolutions. Many GAN implementations add `BatchNormalization` layers in the generator, which could be tried (beware: you then need to explicitly pass `training=False` when calling your model for predictions, which makes the BatchNorm layers run in inference mode, using their moving statistics rather than the statistics of the current batch).
  - **Note:** experimenting at a technical level with GANs (like with other things) can be a confusing rabbit hole. My recommendations are: make sure you have stable resources (e.g. you own a GPU or pay for Colab Pro), and try to make your net/dataset/experiments *as small/easy as possible*, so you can run a lot of them and get an intuition of what works and what doesn't. Perfect results really aren't the goal here, and it's never good for your momentum to have to wait hours or days before training finishes!
  - How do you document this process of experimentation? You would probably need to save the various parameters of your experimentation (for yourself and, perhaps, the viewer), and associate that with some images generated at this point. In the TF tutorial, they use [imageio to create gifs](https://www.tensorflow.org/tutorials/generative/dcgan#create_a_gif).
- *Freeze* both network and dataset, and try to use the network, or its output, in unexpected ways: one could imagine just training this network, or using a top-level StyleGAN (see below), and using the resulting images in some way, as material for something else? 
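
For the [-1, 1] experiment mentioned in the list above, the rescaling and its inverse are simple affine maps. A minimal sketch on plain pixel values; the function names are made up, and in the notebook the same arithmetic would go into `dataset.map(...)` and into the code that converts generated images back for display.

```python
def to_tanh_range(pixel):
    """Map a pixel value in [0, 255] to [-1, 1]."""
    return pixel / 127.5 - 1.0

def from_tanh_range(value):
    """Map a value in [-1, 1] back to [0, 255]."""
    return (value + 1.0) * 127.5

print(to_tanh_range(0), to_tanh_range(255))         # -1.0 1.0
print(from_tanh_range(-1.0), from_tanh_range(1.0))  # 0.0 255.0
```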

## Next Steps

### The State of the Art

The field has now largely moved away from GANs, as diffusion has gained in popularity. The best GAN results have probably been achieved by [Nvidia's StyleGAN3](https://nvlabs.github.io/stylegan3/) ([repo](https://github.com/NVlabs/stylegan3)). See the [stylegan notebook](05_stylegan.ipynb) to try it out (on Colab!).

Another interesting option could be lucidrains's [Lightweight GAN](https://github.com/lucidrains/lightweight-gan) implementation.

### Zoos: list of all GAN variants

When it comes to GANs, just like Diffusion now, the explosion has been so enormous it is rather difficult (impossible?) to keep up:

- [Avinash Hindupur, "The GAN Zoo"](https://github.com/hindupuravinash/the-gan-zoo)
- [Jihye Back, "GAN-Zoos"](https://happy-jihye.github.io/gan/)