# Training a DCGAN I
## (Deep Convolutional Generative Adversarial Networks)

This is adapted from [this PyTorch example](https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html); see also [the same as source code](https://github.com/pytorch/examples/blob/main/dcgan/main.py).

Some elements were retained from a previous version, inspired by [this TensorFlow tutorial](https://www.tensorflow.org/tutorials/generative/dcgan) (you can also look at [this Chollet notebook](https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/chapter12_part05_gans.ipynb), itself a port of [this Keras tutorial](https://keras.io/examples/generative/dcgan_overriding_train_step/)).

#### Install Imageio (to generate GIFs at the end)

```bash
conda install -c conda-forge imageio # locally (ships with Colab)
```

#### reminder: Colab code to mount your drive

```python
import os
import sys
if 'google.colab' in sys.modules:
    from google.colab import drive
    drive.mount('/content/drive/')  # 'My Drive' is the default name of Google Drives
    os.chdir('drive/My Drive/IS53055B-DMLCP/DMLCP') # change to your favourite dir
```

In [None]:
import os
import pathlib

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation

from IPython import display
from IPython.display import HTML
from IPython.display import Video

import torch
from torch import nn
import torch.nn.functional as F

import torchvision as tv
from torchvision.transforms import v2
import torchvision.transforms.functional as TF

# Get cpu, gpu or mps device for training
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

## Prepare our Dataset

### Train on [MNIST](https://pytorch.org/vision/main/generated/torchvision.datasets.MNIST.html)



In [None]:
# Model / data parameters
BUFFER_SIZE = 60000
BATCH_SIZE = 128

LATENT_DIM = 100 # The size of the latent space/input vector

N_CHANNELS = 3 # 3 for colour, 1 for greyscale (MNIST is natively greyscale)
IMAGE_SHAPE = (N_CHANNELS,64,64) # C, H, W

G_DIM = 64
D_DIM = 64

# fixed directory structure -------------
DATASETS_DIR = pathlib.Path("datasets")
DATASETS_DIR.mkdir(exist_ok=True)

MODELS_DIR = pathlib.Path("models")
MODELS_DIR.mkdir(exist_ok=True)

GENERATED_DIR = pathlib.Path("generated")
GENERATED_DIR.mkdir(exist_ok=True)
# ----------------------------------------

MODEL_NAME = "dcgan_mnist" # change accordingly

DCGAN_DIR = MODELS_DIR / MODEL_NAME
DCGAN_DIR.mkdir(exist_ok=True)

GENERATOR_NAME = f"{MODEL_NAME}_g"
DISCRIMINATOR_NAME = f"{MODEL_NAME}_d"

# generated images
DCGAN_GEN_DIR = GENERATED_DIR / f"{MODEL_NAME}_images"
DCGAN_GEN_DIR.mkdir(exist_ok=True)

In [None]:
# utils

def denorm(x):
    """Denormalize the outputs from [-1, 1] to [0,1] (generator with 'tanh' activation)"""
    return (x * 0.5) + 0.5

In [None]:
transforms = v2.Compose([
    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True), # from [0,255] to [0,1]
    v2.Resize(IMAGE_SHAPE[1]),
    v2.CenterCrop(IMAGE_SHAPE[1]),
    v2.Normalize(mean=[0.5]*N_CHANNELS, std=[0.5]*N_CHANNELS) # (x - mean)/std
])

train_data = tv.datasets.MNIST(
    root=DATASETS_DIR,
    train=True,
    download=True,
    transform=transforms,
)

for t in train_data:
    print(t[0].shape, t[0].dtype, t[0].min(), t[0].max())
    break
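
As a quick sanity check of the normalisation used above, the sketch below reproduces the arithmetic of `v2.Normalize` (with mean and std both 0.5) and of `denorm` on plain floats, without PyTorch. The helper `norm` is a hypothetical name introduced here for illustration; `denorm` mirrors the function defined earlier.

```python
def norm(x, mean=0.5, std=0.5):
    """Same arithmetic as v2.Normalize: (x - mean) / std."""
    return (x - mean) / std

def denorm(x):
    """Inverse mapping, from [-1, 1] back to [0, 1]."""
    return (x * 0.5) + 0.5

pixels = [0.0, 0.25, 0.5, 1.0]            # values after ToDtype(scale=True)
normalised = [norm(p) for p in pixels]    # now in [-1, 1], the range of tanh
restored = [denorm(n) for n in normalised]

print(normalised)
print(restored)
```

The round trip returns the original values, which is why `denorm` is applied before turning generator outputs into PIL images.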

Create `DataLoader`s:

In [None]:
# Create data loaders.
train_dataloader = torch.utils.data.DataLoader(
    train_data,
    batch_size=BATCH_SIZE,
    drop_last=True
)

for X, y in train_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape} {X.dtype}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

### Inspect our Dataset

In [None]:
for i, t in enumerate(train_dataloader):
    x, y = t
    a = denorm(x[0])
    img = TF.to_pil_image(a)
    plt.axis("off")
    plt.imshow(img, cmap='gray')
    break

[`tv.utils.make_grid`](https://pytorch.org/vision/main/generated/torchvision.utils.make_grid.html)

In [None]:
for i, t in enumerate(train_dataloader):
    x, y = t
    plt.figure(figsize=(8,8))
    plt.axis("off")
    plt.title("Training Images")
    plt.imshow(TF.to_pil_image(tv.utils.make_grid(x[:64], padding=2, normalize=True).detach().cpu()))
    plt.show()
    break

In [None]:
ds_len = len(train_dataloader)
print(f"{ds_len * BATCH_SIZE} samples in {ds_len} batches")

## Create the models


In [None]:
# custom weights initialization called on ``G`` and ``D``
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)

### The Generator

Source [here](https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html#generator). The generator uses `nn.ConvTranspose2d` (upsampling) layers to produce an image from a seed (random noise). It takes this seed as input, shaped `(LATENT_DIM, 1, 1)`, then upsamples several times until it reaches the desired image size of 64x64 (with `N_CHANNELS` channels). Notice the `nn.BatchNorm2d` and `nn.ReLU` after each layer, except the output layer which uses tanh.
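
The spatial sizes noted in the comments of the generator code (4, 8, 16, 32, 64) follow from the output-size formula of `nn.ConvTranspose2d`. Here is a minimal sketch of that arithmetic in plain Python, assuming `dilation=1` and `output_padding=0` (the defaults); the helper name `conv_transpose_out` is made up for this illustration.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Output size of nn.ConvTranspose2d (dilation=1, output_padding=0)."""
    return (size - 1) * stride - 2 * padding + kernel

# (kernel, stride, padding) of the five generator layers
layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]

size = 1  # the latent vector is shaped (LATENT_DIM, 1, 1)
sizes = [size]
for kernel, stride, padding in layers:
    size = conv_transpose_out(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)  # [1, 4, 8, 16, 32, 64]
```

With kernel 4, stride 2 and padding 1, each layer exactly doubles the height and width, so adding or removing such a layer changes the output resolution by a factor of two.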

In [None]:
class Generator(nn.Module):
    def __init__(self, latent_dim, output_dim, n_channels):
        super(Generator, self).__init__()
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            #                  input, output, kernel, stride, padding
            nn.ConvTranspose2d(latent_dim, output_dim * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(output_dim * 8),
            nn.ReLU(True),
            # state size. (output_dim*8) x 4 x 4
            nn.ConvTranspose2d(output_dim * 8, output_dim * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(output_dim * 4),
            nn.ReLU(True),
            # state size. (output_dim*4) x 8 x 8
            nn.ConvTranspose2d(output_dim * 4, output_dim * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(output_dim * 2),
            nn.ReLU(True),
            # state size. (output_dim*2) x 16 x 16
            nn.ConvTranspose2d(output_dim * 2, output_dim, 4, 2, 1, bias=False),
            nn.BatchNorm2d(output_dim),
            nn.ReLU(True),
            # state size. (output_dim) x 32 x 32
            nn.ConvTranspose2d(output_dim, n_channels, 4, 2, 1, bias=False),
            nn.Tanh()
            # state size. (n_channels) x 64 x 64
        )

    def forward(self, input):
        return self.main(input)

G = Generator(
    latent_dim=LATENT_DIM,
    output_dim=G_DIM,
    n_channels=N_CHANNELS
).to(device)

# Apply the ``weights_init`` function to randomly initialize all weights
#  to ``mean=0``, ``stdev=0.02``.
G.apply(weights_init)

# reloading
RELOAD_G_PATH = "" # change to existing path
if RELOAD_G_PATH != "":
    G.load_state_dict(torch.load(RELOAD_G_PATH, map_location=device))

print(G)
print()
print(f"Our model has {sum(p.numel() for p in G.parameters()):,} parameters.")

Let's see its output before training:



In [None]:
noise = torch.randn([1, LATENT_DIM, 1, 1]).to(device) # shape required by the convolution
print(noise.size())

with torch.no_grad():
    generator_output = G(noise)
    print(generator_output.shape, generator_output.min(), generator_output.max())
    # move to cpu, denormalize, turn into PIL
    img = TF.to_pil_image(
        denorm(generator_output[0].detach().cpu())
    )

plt.imshow(img, cmap='gray')

### The Discriminator

The discriminator is a CNN-based image classifier.

In [None]:
class Discriminator(nn.Module):
    def __init__(self, image_shape, input_dim):
        super(Discriminator, self).__init__()
        self.main = nn.Sequential(
            # input is (n_channels) x 64 x 64
            #         input, output, kernel, stride, padding
            nn.Conv2d(image_shape[0], input_dim, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (input_dim) x 32 x 32
            nn.Conv2d(input_dim, input_dim * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(input_dim * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (input_dim*2) x 16 x 16
            nn.Conv2d(input_dim * 2, input_dim * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(input_dim * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (input_dim*4) x 8 x 8
            nn.Conv2d(input_dim * 4, input_dim * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(input_dim * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (input_dim*8) x 4 x 4
            nn.Conv2d(input_dim * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )

    def forward(self, input):
        output = self.main(input)
        return output.view(-1, 1).squeeze(1)


D = Discriminator(
    image_shape=IMAGE_SHAPE,
    input_dim=D_DIM,
).to(device)

# Apply the ``weights_init`` function to randomly initialize all weights
#  to ``mean=0``, ``stdev=0.02``.
D.apply(weights_init)

# reloading
RELOAD_D_PATH = "" # change to existing path
if RELOAD_D_PATH != "":
    D.load_state_dict(torch.load(RELOAD_D_PATH, map_location=device))

print(D)
print()
print(f"Our model has {sum(p.numel() for p in D.parameters()):,} parameters.")

Use the (as yet untrained) discriminator to classify the generated images as real or fake. Because its last layer is a sigmoid, the model outputs a value between 0 and 1: it will be trained to output values close to 1 for real images, and close to 0 for fake images.
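
To relate raw scores to what the discriminator actually returns: `D` ends with `nn.Sigmoid()`, so its outputs lie between 0 and 1, and a positive score before the sigmoid ends up above 0.5 while a negative one ends up below. A minimal sketch of that mapping in plain Python (the `sigmoid` helper and the example scores are illustrative, not taken from the notebook):

```python
import math

def sigmoid(score):
    """Logistic function, the same mapping applied by nn.Sigmoid."""
    return 1.0 / (1.0 + math.exp(-score))

for score in (-3.0, -0.5, 0.5, 3.0):
    p = sigmoid(score)
    verdict = "real" if p > 0.5 else "fake"
    print(f"score {score:+.1f} -> D output {p:.3f} -> {verdict}")
```

An untrained discriminator will typically produce values near 0.5, meaning it has no strong opinion yet.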

In [None]:
with torch.no_grad():
    decision = D(generator_output).detach().cpu()
print(decision)

## Define the loss and optimizers

Define loss functions and optimizers for both models.
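
The comments in the training loop speak of maximising `log(D(x)) + log(1 - D(G(z)))`, while the code minimises `nn.BCELoss`. The two agree, as this small worked example in plain Python suggests. The values `d_real` and `d_fake` are made up for illustration, and `nn.BCELoss` additionally averages over the batch by default.

```python
import math

def bce(p, y):
    """Binary cross-entropy for one prediction p in (0, 1) and one label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

d_real = 0.9  # hypothetical D(x) on a real image
d_fake = 0.2  # hypothetical D(G(z)) on a generated image

# Discriminator: real images labelled 1, fake images labelled 0
loss_d = bce(d_real, 1.0) + bce(d_fake, 0.0)  # = -log D(x) - log(1 - D(G(z)))

# Generator: the same fake images, but labelled 1
loss_g = bce(d_fake, 1.0)                     # = -log D(G(z))

print(f"loss_d = {loss_d:.4f}, loss_g = {loss_g:.4f}")
```

Minimising `loss_d` is the same as maximising `log(D(x)) + log(1 - D(G(z)))`, and minimising `loss_g` is the same as maximising `log(D(G(z)))`.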


In [None]:
# Initialize the ``BCELoss`` function
criterion = nn.BCELoss()

# Establish convention for real and fake labels during training
real_label = 1.
fake_label = 0.

LEARNING_RATE = 0.0002

# Setup Adam optimizers for both G and D
optimizerG = torch.optim.Adam(
    G.parameters(), lr=LEARNING_RATE, betas=(0.5, 0.999)
)
optimizerD = torch.optim.Adam(
    D.parameters(), lr=LEARNING_RATE, betas=(0.5, 0.999)
)

In [None]:
N_IMAGES = 64

# Create batch of latent vectors that we will use to visualize
#  the progression of the generator
fixed_noise = torch.randn(N_IMAGES, LATENT_DIM, 1, 1, device=device)

In [None]:
# Training Loop
EPOCHS = 5
PRINT_EVERY = 50           # (Batch) print stats
SHOW_EVERY = 500           # (Batch) show grid
SAVE_GRID_EVERY = 50       # (Batch) save grid (for GIF, see below)
SAVE_MODEL_EVERY = EPOCHS  # (Epoch) save model (skips epoch 0 unless it is 1: 'every epoch')

tot = len(train_dataloader) # for print formatting

# Lists to keep track of progress
img_list = []
G_losses = []
D_losses = []

iters = 0

print("Starting Training Loop...")
# For each epoch
for epoch in range(EPOCHS):
    # For each batch in the dataloader
    for i, data in enumerate(train_dataloader):

        real, _ = data         # we don't use the labels
        real = real.to(device) # move to accelerator

        # -----------------------------------------------------------
        # (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))
        # -----------------------------------------------------------
        ## Train with all-real batch

        D.zero_grad()
        label = torch.full((BATCH_SIZE,), real_label, dtype=torch.float, device=device)
        # Forward pass real batch through D
        output = D(real).view(-1)
        # Calculate loss on all-real batch
        errD_real = criterion(output, label)
        # Calculate gradients for D in backward pass
        errD_real.backward()
        D_x = output.mean().item() # mean D output on the real batch

        ## Train with all-fake batch
        # Generate batch of latent vectors
        noise = torch.randn(BATCH_SIZE, LATENT_DIM, 1, 1, device=device)
        # Generate fake image batch with G
        fake = G(noise)
        label.fill_(fake_label)
        # Classify all fake batch with D
        output = D(fake.detach()).view(-1)
        # Calculate D's loss on the all-fake batch
        errD_fake = criterion(output, label)
        # Calculate the gradients for this batch, accumulated (summed) with previous gradients
        errD_fake.backward()
        D_G_z1 = output.mean().item() # mean D output on the fake batch (before updating D)
        # Compute error of D as sum over the fake and the real batches
        errD = errD_real + errD_fake
        # Update D
        optimizerD.step()

        # -------------------------------------------
        # (2) Update G network: maximize log(D(G(z)))
        # -------------------------------------------

        G.zero_grad()
        label.fill_(real_label)  # fake labels are real for generator cost
        # Since we just updated D, perform another forward pass of all-fake batch through D
        output = D(fake).view(-1)
        # Calculate G's loss based on this output
        errG = criterion(output, label)
        # Calculate gradients for G
        errG.backward()
        D_G_z2 = output.mean().item() # mean D output on the fake batch (after updating D)
        # Update G
        optimizerG.step()

        # Output training stats
        if i % PRINT_EVERY == 0:
            print(
                f"Epoch: {epoch+1:{len(str(EPOCHS))}}/{EPOCHS} | Batch: {i:{len(str(tot))}}/{tot} | "
                + f"Losses, D: {errD.item():.5f}, G: {errG.item():.5f} | "
                + f"Mean D outputs, D(x): {D_x:.5f}, D(G(z)): {D_G_z1:.5f} / {D_G_z2:.5f}"
            )

        # Save Losses for plotting later
        G_losses.append(errG.item())
        D_losses.append(errD.item())

        # Check how the generator is doing by saving G's output on fixed_noise
        if iters % SAVE_GRID_EVERY == 0:
            with torch.no_grad():
                fake = G(fixed_noise).detach().cpu()

            fake_img = TF.to_pil_image(tv.utils.make_grid(fake, padding=2, normalize=True))
            img_list.append(fake_img)

            # Save our images
            plt.figure(figsize=(12,12))
            plt.axis("off")
            plt.imshow(fake_img)
            plt.savefig(DCGAN_GEN_DIR / f"grid_iter_{iters:04d}.png")
            plt.close()

        # Show grid
        if iters % SHOW_EVERY == 0:
            plt.figure(figsize=(12,12))
            plt.axis("off")
            plt.imshow(fake_img)
            plt.show()

        iters += 1

    # Save our model (if every epoch, we save even at the first epoch, otherwise no)
    if (epoch > 0 or SAVE_MODEL_EVERY == 1) and (epoch + 1) % SAVE_MODEL_EVERY == 0:
        print(f"Saving Generator and Discriminator to {DCGAN_DIR}")
        # to use
        torch.jit.save(torch.jit.script(G), DCGAN_DIR / f"{GENERATOR_NAME}.iter_{iters:04d}_scripted.pt")
        torch.jit.save(torch.jit.script(D), DCGAN_DIR / f"{DISCRIMINATOR_NAME}.iter_{iters:04d}_scripted.pt")        
        # to retrain
        torch.save(G.state_dict(), DCGAN_DIR / f"{GENERATOR_NAME}.iter_{iters:04d}.pt")
        torch.save(D.state_dict(), DCGAN_DIR / f"{DISCRIMINATOR_NAME}.iter_{iters:04d}.pt")

In [None]:
plt.figure(figsize=(10,5))
plt.title("Generator and Discriminator Loss During Training")
plt.plot(G_losses,label="G")
plt.plot(D_losses,label="D")
plt.xlabel("iterations")
plt.ylabel("Loss")
plt.legend()
plt.show()

## Generate video from images (using matplotlib)

In [None]:
fig = plt.figure(figsize=(12,12))
plt.axis("off")
ims = [[plt.imshow(i, animated=True)] for i in img_list]
ani = animation.ArtistAnimation(
    fig, ims, interval=100, # determines the speed
    repeat_delay=1000, blit=True
)
plt.close(fig) # Close the figure to avoid displaying a static image (ChatGPT 4o)

# .gif also possible
ani.save(DCGAN_GEN_DIR / "grid.mp4", writer="ffmpeg", dpi=80)

Video(DCGAN_GEN_DIR / "grid.mp4", embed=True)

### Download from Colab (using Unix utilities)

```bash
!rm -rf models/.ipynb_checkpoints # remove ipynb fluff
!zip -r dcgan_mnist.zip models    # zip model folder
```
Then go to the left-hand side bar, click on the folder icon, and use the three dots on the right of the zip file to download it.

## Extra: Create a GIF from files (using imageio)


In [None]:
import PIL
import base64
import textwrap
import mimetypes
import imageio as iio

In [None]:
# Display a single image using the iteration number
def display_image(iter_no):
    return PIL.Image.open(
        DCGAN_GEN_DIR / f"grid_iter_{iter_no:04d}.png"
    ).convert('RGB')

In [None]:
display_image(900) # check the directory for the correct number (grids are saved every SAVE_GRID_EVERY iterations)
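
To know which numbers `display_image` accepts without browsing the directory, the saving schedule can be recomputed. This is a sketch assuming the default settings of the training cell and the MNIST training split (60,000 images); `BATCHES_PER_EPOCH` is a name introduced here for illustration.

```python
SAVE_GRID_EVERY = 50      # as in the training cell
EPOCHS = 5                # as in the training cell
BATCHES_PER_EPOCH = 468   # 60000 // 128, since drop_last=True

total_iters = EPOCHS * BATCHES_PER_EPOCH
# a grid is saved whenever iters % SAVE_GRID_EVERY == 0
saved_iters = list(range(0, total_iters, SAVE_GRID_EVERY))

print(f"{len(saved_iters)} grids, from {saved_iters[0]} to {saved_iters[-1]}")
print([f"grid_iter_{n:04d}.png" for n in saved_iters[:3]])
```

Any value in `saved_iters` is a valid argument; other numbers have no matching file.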

In [None]:
ANIM_PATH = DCGAN_GEN_DIR / 'grid_redux.gif'

# adapting the tutorial version to v3 + looping the gif (thanks ChatGPT)
with iio.get_writer(ANIM_PATH, mode='I', loop=0) as writer:
    # IDEA: here we go from start to finish, and we loop...
    #       we could add those images a second time but
    #       *backward*, so that the gif returns smoothly to its start
    for f in sorted(DCGAN_GEN_DIR.glob('grid*.png')):
        image = iio.v3.imread(f)
        writer.append_data(image)
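
The idea mentioned in the comment above (playing the frames forward, then backward) comes down to reordering the list of files before writing them. A minimal sketch on stand-in file names, with no image actually read:

```python
# stand-in names; in the notebook this would be sorted(DCGAN_GEN_DIR.glob('grid*.png'))
frames = [
    "grid_iter_0000.png",
    "grid_iter_0050.png",
    "grid_iter_0100.png",
    "grid_iter_0150.png",
]

# forward pass, then the inner frames in reverse order:
# the first and last frames are not duplicated, so the loop stays even
ping_pong = frames + frames[-2:0:-1]

print(ping_pong)
```

Iterating over `ping_pong` instead of the sorted file list in the writer loop should give a GIF that plays back and forth.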

In [None]:
# adapted from here: https://github.com/tensorflow/docs/blob/master/tools/tensorflow_docs/vis/embed.py

def embed_data(mime, data):
    """Embeds data as an html tag with a data-url."""
    b64 = base64.b64encode(data).decode()
    if mime.startswith('image'):
        tag = f'<img src="data:{mime};base64,{b64}"/>'
    elif mime.startswith('video'):
        tag = textwrap.dedent(f"""
            <video width="640" height="480" controls>
              <source src="data:{mime};base64,{b64}" type="video/mp4">
              Your browser does not support the video tag.
            </video>
            """)
    else:
        raise ValueError('Images and Video only.')
    return HTML(tag)

def embed_file(path):
    """Embeds a file in the notebook as an html tag with a data-url."""
    path = pathlib.Path(path)
    mime, unused_encoding = mimetypes.guess_type(str(path))
    data = path.read_bytes()
    return embed_data(mime, data)

embed_file(ANIM_PATH)

---

## Experiments


### Use your model!



Use your saved model in the [05_dcgan_visualizing_results.ipynb](https://github.com/jchwenger/DMLCP/blob/main/python/05_dcgan_visualizing_results.ipynb) notebook!

### Work with the animation



First of all, note that we create the latent noise tensor at the beginning of training: if we created one inside the loop, things would look much more chaotic!

You might want to have a different grid, or just one image. Using this function, you could generate one image at a time (modify the size in `figsize`):

```python
N_IMAGES = 3
fixed_noise = torch.randn(N_IMAGES, LATENT_DIM, 1, 1, device=device)

def save_images(noise, iters=0, save=True, show=True):
    with torch.no_grad():
        output = G(noise).cpu().detach()
    for i, o in enumerate(output):
        img = TF.to_pil_image(denorm(o))
        plt.figure(figsize=(6,6))
        plt.axis("off")
        plt.imshow(img, cmap='gray')
        if save:
            plt.savefig(DCGAN_GEN_DIR / f"single_image.iter_{iters}_{i:04d}.png")
        if show:
            plt.show()

save_images(fixed_noise)
```

### Train on a different dataset

The code above also works with other datasets! You can for instance replace `MNIST` in `tv.datasets.MNIST` by:
  - [`FashionMNIST`](https://pytorch.org/vision/main/generated/torchvision.datasets.FashionMNIST.html?highlight=fashionmnist#torchvision.datasets.FashionMNIST)
  - [`CIFAR10`](https://pytorch.org/vision/stable/generated/torchvision.datasets.CIFAR10.html#torchvision.datasets.CIFAR10)
  - [`LSUN`](https://pytorch.org/vision/stable/generated/torchvision.datasets.LSUN.html#torchvision.datasets.LSUN) (requires `pip install lmdb`)
  - [`EMNIST`](https://pytorch.org/vision/stable/generated/torchvision.datasets.EMNIST.html#torchvision.datasets.EMNIST)

  - Or your own data:

  ```python
  train_data = tv.datasets.ImageFolder(
      root=DATASETS_DIR,
      transform=transforms,
  )
  ```

  Fun fact: there are [even more MNIST-like datasets](https://www.simonwenkel.com/lists/datasets/list-of-mnist-like-datasets.html). (Generating an MNIST-like dataset could be done using for instance [p5js-ccapture](https://github.com/jchwenger/p5js-ccapture).)
  

  

#### Train on CelebA (celeb faces)

Special steps are required to download the [CelebA](https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) dataset.

##### 1. Colab



My recommendation is the following:

1. Download the dataset from the Google Drive of the authors **once**, and upload the `img_align_celeba.zip` file (1.4 GB!) to your drive.  
    1. Manually: [here](https://drive.google.com/uc?id=1O7m1010EJjLE5QxLZiM9Fpjs7Oj6e684)  
    2. Using `gdown`: `!gdown 1O7m1010EJjLE5QxLZiM9Fpjs7Oj6e684`

The code snippet here will download the dataset using Python:
```python
import gdown
from zipfile import ZipFile

CELEBA_DIR = DATASETS_DIR / "dcgan_celeba"
EXTRACTED_DIR = os.path.join(CELEBA_DIR, "img_align_celeba")

LE_ID = '1O7m1010EJjLE5QxLZiM9Fpjs7Oj6e684' # add your ID here

if not os.path.isdir(EXTRACTED_DIR):
    if LE_ID is None:
        print("Variable `LE_ID` is None: upload the Celeba dataset to your drive, retrieve its id, and add it to `LE_ID`!")
    else:
        print("Downloading Celeba dataset")
        os.makedirs(CELEBA_DIR, exist_ok=True)

        fname = "datasets/dcgan_celeba/data.zip"
        url = f"https://drive.google.com/uc?id={LE_ID}"
        gdown.download(url, fname, quiet=False)

        print("Unzipping")
        with ZipFile("datasets/dcgan_celeba/data.zip", "r") as zipobj:
            zipobj.extractall("datasets/dcgan_celeba")
else:
    print("CelebA directory exists")

# all the images will be of the class 'img_align_celeba',
# the only folder in there, but we don't care
train_data = tv.datasets.ImageFolder(
   root=CELEBA_DIR,
   transform=transforms,
)
```

Note that the bash commands below do the same as the Python code above:

```bash
!mkdir -p datasets/dcgan_celeba
!gdown 1O7m1010EJjLE5QxLZiM9Fpjs7Oj6e684 -O datasets/dcgan_celeba/img_align_celeba.zip
!unzip -qq datasets/dcgan_celeba/img_align_celeba.zip -d datasets/dcgan_celeba
```

Perform these steps first, *then* connect to your drive and switch directories (if you want to save your model and generated images in your drive, otherwise no need).

##### 2. Locally


Perform the steps to download the data once and unzip it so your directory looks like `DMLCP/python/datasets/dcgan_celeba/img_align_celeba` (using the lines above or the cell below).

To install [gdown](https://pypi.org/project/gdown/): `conda install -c conda-forge gdown`.

---

## More thoughts

The work with generative models that can be done here broadly falls into three main directions:
- *Freeze* the network, work on the dataset:
  - In this direction, most of your work is to gather datasets, and improve the ease of use. Are you able to develop a suite of tools that would allow you to handle datasets more easily? (In this case, the images are already cropped and the same size, which already takes some work! It would be nice to integrate tools that make this part of the work more streamlined: put any images in a folder, and a Python script crops them, etc.) It might be worth looking into [data augmentation](https://www.tensorflow.org/tutorials/images/data_augmentation) (injecting randomness into your image dataset; [this tutorial](https://www.tensorflow.org/tutorials/generative/pix2pix) uses that).
  - It would be interesting to train GANs on generative images! You might end up with really distorted versions of what you started with.
  - It's likely that people have trained GANs on spectrograms, as we see now with diffusion, but it might be a real fun thing to try?
  - The image used for the week on text is a [book project by Allison Parrish](https://www.aleator.press/releases/wendit-tnce-inf) that uses GANs to generate images of (unreadable) poems!
  - Also, people have created loops where they train GANs on their own outputs, which creates distortions that may be worth exploring.
- *Freeze* the dataset, work on the network:
  - Maybe there's one dataset that's really your focus, or you're happy to work with established material, or the whole data processing feels boring? You might then want to look into fiddling with the model, and gather tricks. For instance: do you see an improvement if you normalise your images to be between [0,1] instead of [-1,1] like here (your Generator will then need a `sigmoid` rather than a `tanh` as its last layer)? Then of course there are the networks themselves, where all sorts of parameters can be tweaked, from the number of layers to the strides of the convolutions...
  - **Note:** experimenting at a technical level with GANs (like with other things) can be a confusing rabbit hole. My recommendations are: make sure you have stable resources (e.g. you own a GPU or pay for Colab Pro), and try and make your net/dataset/experiments *as small/easy as possible*, so you can make a lot of them and get an intuition of what works and what doesn't. Perfect results really aren't the goal here, and it's never good for your momentum to have to wait hours or days before training finishes!
  - How do you document this process of experimentation? You would probably need to save the various parameters of your experimentation (for yourself and, perhaps, the viewer), and associate that with some images generated at this point.
- *Freeze* both network and dataset, and try to use the network, or its output, in unexpected ways: one could imagine just training this network, or using a top-level StyleGAN (see below), and using the resulting images in some way, as material for something else?
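
On the question above of documenting experiments, one lightweight option is to write the hyperparameters of each run to a JSON file next to the generated images. This is only a sketch: the function `save_run_config` and the file name `run_config.json` are hypothetical, and the temporary directory stands in for a folder such as `DCGAN_GEN_DIR`.

```python
import json
import pathlib
import tempfile

def save_run_config(run_dir, **params):
    """Write the hyperparameters of one experiment to run_dir/run_config.json."""
    run_dir = pathlib.Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / "run_config.json"
    path.write_text(json.dumps(params, indent=2, sort_keys=True))
    return path

# usage, with a temporary directory standing in for the real output folder
with tempfile.TemporaryDirectory() as tmp:
    config_path = save_run_config(
        tmp,
        model_name="dcgan_mnist",
        epochs=5,
        batch_size=128,
        latent_dim=100,
        learning_rate=0.0002,
    )
    reloaded = json.loads(config_path.read_text())

print(reloaded)
```

Keeping one such file per run makes it easier to match a grid of images to the settings that produced it.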


### The State of the Art



The field has now moved away from GANs, as Diffusion has gained in popularity. The best results have probably been achieved by [Nvidia's StyleGAN 3](https://nvlabs.github.io/stylegan3/) ([repo](https://github.com/NVlabs/stylegan3)) (both written in PyTorch). See the [StyleGAN 3 notebook](10_models_1_stylegan3.ipynb) to try it out (on Colab!).

Another interesting option to look into is lucidrains' [Lightweight GAN](https://github.com/lucidrains/lightweight-gan) implementation.



### Zoos: list of all GAN variants



When it comes to GANs, just like Diffusion now, the explosion has been so enormous it is rather difficult (impossible?) to keep up:

- [Avinash Hindupur, "The GAN Zoo"](https://github.com/hindupuravinash/the-gan-zoo)
- [Jihye Back, "GAN-Zoos"](https://happy-jihye.github.io/gan/)

### Notes / Tricks



More resources worth checking: [Soumith Chintala, "How to Train a GAN? Tips and tricks to make GANs work"](https://github.com/soumith/ganhacks) (and [video](https://www.youtube.com/watch?v=X1mUN6dD8uE), as well as [Goodfellow's workshop](https://www.youtube.com/watch?v=HGYYEUSm-0Q)). This is summarised [in this part of a long course](https://www.youtube.com/watch?v=_cUdjPdbldQ&list=PLTKMiZHVd_2KJtIXOW0zFhFfBaJJilH51&index=153). To go deeper still, there's [this paper](https://arxiv.org/abs/1606.03498), and a [GAN guide](https://github.com/garridoq/gan-guide), and the [Art using GANs](https://github.com/Kaustubh1Verma/Art-using-GANs) repo.

Here is a summary of some of the tricks Chollet mentions in his book (written for a Keras implementation; the notes in brackets say how each relates to the PyTorch code above):

- Sample from the latent space using a **normal distribution** (Gaussian), not a uniform one [done here with `torch.randn`];
- GANs are likely to get stuck in all sorts of ways (it's an unstable, dynamic equilibrium): adding **random noise** to the labels for the discriminator helps prevent this (a trick related to label smoothing) [not done here: the training loop uses hard 0/1 labels];
- Sparse gradients can hinder GAN training, remedy: **strided convolutions** for downsampling instead of max pooling, and **`LeakyReLU`** instead of `ReLU` [the discriminator here does both; the generator keeps `ReLU`, as in the PyTorch tutorial];
- To avoid checkerboard artifacts caused by unequal coverage of the pixel space in the generator, use a kernel size **divisible by the stride size** with strided `ConvTranspose2d` or `Conv2d` (`Conv2DTranspose` or `Conv2D` in Keras) [here the strided layers use a kernel size of 4 with a stride of 2].

<small>*Deep Learning With Python*, 2<sup>nd</sup> ed., p.404</small>
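
The label-noise trick from the list above is not part of the training loop in this notebook. Here is a minimal sketch of how it could be tried in PyTorch, assuming the `real_label = 1.` / `fake_label = 0.` convention used above. The value of `noise_scale` is a hypothetical starting point, and the noise is applied in one direction only so that targets stay within [0, 1], as `nn.BCELoss` expects.

```python
import torch

torch.manual_seed(0)  # only to make the sketch reproducible

batch_size = 8
real_label, fake_label = 1.0, 0.0
noise_scale = 0.05  # hypothetical value, to be tuned

real_targets = torch.full((batch_size,), real_label)
fake_targets = torch.full((batch_size,), fake_label)

# pull real targets slightly below 1 and push fake targets slightly above 0
noisy_real = real_targets - noise_scale * torch.rand(batch_size)
noisy_fake = fake_targets + noise_scale * torch.rand(batch_size)

print(noisy_real)
print(noisy_fake)
```

In the discriminator step, `noisy_real` and `noisy_fake` would replace `label` in the two calls to `criterion`; the generator step would keep its clean labels.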

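Note also that, as is mentioned by Chintala (see lecture above), the labels for true/fake can be reversed from the original formulation (0 for true, 1 for fake), which is said to improve stability. The code above keeps the usual convention (`real_label = 1.`, `fake_label = 0.`), and only uses real labels for fake images when computing the generator's loss.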