# Auto-encoders and Generative models in `keras`

In this session, you will experiment with auto-encoders and then with two families of generative models: 
Generative Adversarial Networks (GANs) and diffusion models.

## Auto-encoders

**Question 1.** Implement a shallow auto-encoder (with a single layer from the input to a 16-dimensional 
hidden representation, and a single layer from this hidden representation to the output) and 
fit it to the MNIST training set.



```python
import keras
from keras.datasets import mnist
from keras.layers import Dense, InputLayer
from keras.models import Sequential


(X_train, _), (X_test, _) = mnist.load_data()
# Represent images as long vectors of pixels in [0, 1]
X_train = X_train.reshape((X_train.shape[0], -1)) / 255.
X_train = X_train[::2]  # Keep half of the dataset
X_test = X_test.reshape((X_test.shape[0], -1)) / 255.
X_test = X_test[::2]  # Keep half of the dataset

# TODO
```
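Before fitting, it can help to check the shapes involved. The numpy-only sketch below runs one forward pass of a shallow auto-encoder with random, untrained weights: 784 pixel values are compressed to a 16-dimensional code and expanded back to 784 values. The ReLU hidden activation and the sigmoid output (which keeps reconstructions in [0, 1], like the pixels) are assumptions rather than requirements of the question, and `shallow_autoencoder_forward` is a hypothetical helper name.

```python
import numpy as np

def shallow_autoencoder_forward(X, W_enc, b_enc, W_dec, b_dec):
    """One forward pass of a shallow auto-encoder (hypothetical helper).

    X: (n_samples, 784) pixels in [0, 1].
    Returns the 16-d codes and the reconstructions.
    """
    codes = np.maximum(0.0, X @ W_enc + b_enc)               # encoder: Dense(16) + ReLU (assumed)
    recon = 1.0 / (1.0 + np.exp(-(codes @ W_dec + b_dec)))   # decoder: Dense(784) + sigmoid (assumed)
    return codes, recon

rng = np.random.default_rng(0)
n_pixels, code_dim = 28 * 28, 16
W_enc = rng.normal(scale=0.05, size=(n_pixels, code_dim))
b_enc = np.zeros(code_dim)
W_dec = rng.normal(scale=0.05, size=(code_dim, n_pixels))
b_dec = np.zeros(n_pixels)

X = rng.uniform(0.0, 1.0, size=(5, n_pixels))  # random stand-in for 5 MNIST images
codes, recon = shallow_autoencoder_forward(X, W_enc, b_enc, W_dec, b_dec)
print(codes.shape, recon.shape)  # (5, 16) (5, 784)
print("MSE of the untrained model:", np.mean((X - recon) ** 2))
```

In `keras`, this structure corresponds to a `Dense(16)` layer followed by a `Dense(784)` layer; fitting replaces the random weights with ones that minimise a reconstruction loss (for instance mean squared error), using `X_train` as both input and target.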


**Question 2.** Use the code below to visualize the quality of reconstruction on some test samples.

```python
import matplotlib.pyplot as plt

def plot_reconstruction(img, reconstruction):
    """Show an original image next to its reconstruction."""
    plt.figure()
    plt.subplot(1, 2, 1)
    plt.imshow(img.reshape((28, 28)), cmap="gray")
    plt.title("Original image")
    plt.subplot(1, 2, 2)
    plt.imshow(reconstruction.reshape((28, 28)), cmap="gray")
    plt.title("Reconstructed image")
    plt.show()

# `model` is the auto-encoder fitted in Question 1
preds = model.predict(X_test)
plot_reconstruction(X_test[0], preds[0])
```


**Question 3.** Check whether adding more layers (in both the encoder and the decoder, keeping a mirror 
structure) improves the reconstruction of the images.
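A simple way to keep the mirror structure is to list the encoder layer sizes once and derive the decoder sizes by reversing them. The pure-Python helper below is a hypothetical sketch of that bookkeeping; the sizes 128 and 64 are arbitrary examples.

```python
def mirrored_layer_sizes(input_dim, hidden_sizes):
    """Return (encoder_sizes, decoder_sizes) for a symmetric auto-encoder.

    hidden_sizes lists the encoder layers; its last entry is the code size.
    The decoder reverses the hidden layers and ends with input_dim units.
    """
    encoder_sizes = list(hidden_sizes)
    decoder_sizes = list(reversed(hidden_sizes[:-1])) + [input_dim]
    return encoder_sizes, decoder_sizes

enc, dec = mirrored_layer_sizes(784, [128, 64, 16])
print(enc)  # [128, 64, 16]
print(dec)  # [64, 128, 784]
```

Each size then becomes one `Dense` layer, the encoder layers followed by the decoder layers. Keeping the 16-unit code from Question 1 makes the comparison with the shallow model fair, since only the depth changes.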

Auto-encoders are known to be good image denoisers when trained with noisy images as inputs and the corresponding clean images as targets.

**Question 4.** Using the noisy copies of `X_train` and `X_test` defined below, check the denoising 
capabilities of a network with the same structure as in the previous question.

```python
import numpy as np

# Add Gaussian noise with standard deviation 0.1 to every pixel
X_train_noisy = X_train + .1 * np.random.randn(*X_train.shape)
X_test_noisy = X_test + .1 * np.random.randn(*X_test.shape)

# TODO
```
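A useful reference point when judging a denoiser is the error of doing nothing, i.e. comparing the noisy images directly with the clean ones. With Gaussian noise of standard deviation 0.1, the expected squared error per pixel is 0.1² = 0.01; clipping the noisy images back to [0, 1] (an optional step, not done in the cell above) halves it on pixels that are exactly 0 or 1. The helper names below are hypothetical, and the synthetic all-black images stand in for MNIST.

```python
import numpy as np

def add_gaussian_noise(X, sigma, rng, clip=False):
    """Return a noisy copy of X; optionally clip back to the valid pixel range [0, 1]."""
    X_noisy = X + sigma * rng.standard_normal(X.shape)
    return np.clip(X_noisy, 0.0, 1.0) if clip else X_noisy

def mse(a, b):
    """Mean squared error over all pixels."""
    return np.mean((a - b) ** 2)

rng = np.random.default_rng(0)
X_clean = np.zeros((1000, 28 * 28))  # synthetic all-black "images" standing in for MNIST

baseline = mse(add_gaussian_noise(X_clean, 0.1, rng), X_clean)
baseline_clipped = mse(add_gaussian_noise(X_clean, 0.1, rng, clip=True), X_clean)
print(f"noisy vs clean: {baseline:.4f}")                  # close to 0.1**2 = 0.01
print(f"clipped noisy vs clean: {baseline_clipped:.4f}")  # close to 0.005 on black pixels
```

For the real data, the same comparison between `X_test_noisy` and `X_test` gives the baseline; if the denoising auto-encoder's error on `X_test_noisy` (against `X_test`) is not clearly below it, the network is not actually removing noise.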

## Generative Adversarial Networks (GAN)

In this section, you will be invited to play with two types of GAN models to generate MNIST-like data.

First, you will find below an almost complete implementation of the original GAN model (largely inspired by <https://github.com/eriklindernoren/Keras-GAN>).

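**Question 5.** Fill in the blanks (TODO marks in the `train` method) to complete the code and train a model on MNIST for 1000 epochs. Note that, in this code, an "epoch" is a single training step on one random mini-batch of `batch_size` images, not a full pass over the dataset.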

```python
from keras.datasets import mnist
from keras.layers import Input, Dense, Reshape, Flatten, Dropout
from keras.layers import BatchNormalization, LeakyReLU
from keras.models import Sequential, Model
from keras.optimizers import Adam

import numpy as np

class GAN():
    def __init__(self):
        self.img_rows = 28
        self.img_cols = 28
        self.channels = 1
        self.img_shape = (self.img_rows, self.img_cols, self.channels)
        self.latent_dim = 100

        # Build and compile the discriminator
        # (each compiled model gets its own optimizer instance)
        self.discriminator = self.build_discriminator()
        self.discriminator.compile(loss='binary_crossentropy',
            optimizer=Adam(0.0002, 0.5),
            metrics=['accuracy'])

        # Build the generator
        self.generator = self.build_generator()

        # The generator takes noise as input and generates imgs
        z = Input(shape=(self.latent_dim,))
        img = self.generator(z)

        # For the combined model we will only train the generator
        self.discriminator.trainable = False

        # The discriminator takes generated images as input and determines validity
        validity = self.discriminator(img)

        # The combined model (stacked generator and discriminator)
        # Trains the generator to fool the discriminator
        self.combined = Model(z, validity)
        self.combined.compile(loss='binary_crossentropy', optimizer=Adam(0.0002, 0.5))


    def build_generator(self):

        model = Sequential()

        model.add(Dense(256, input_dim=self.latent_dim))
        model.add(LeakyReLU(alpha=0.2))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Dense(512))
        model.add(LeakyReLU(alpha=0.2))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Dense(1024))
        model.add(LeakyReLU(alpha=0.2))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Dense(np.prod(self.img_shape), activation='tanh'))
        model.add(Reshape(self.img_shape))

        noise = Input(shape=(self.latent_dim,))
        img = model(noise)

        return Model(noise, img)

    def build_discriminator(self):

        model = Sequential()

        model.add(Flatten(input_shape=self.img_shape))
        model.add(Dense(512))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dense(256))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dense(1, activation='sigmoid'))

        img = Input(shape=self.img_shape)
        validity = model(img)

        return Model(img, validity)

    def train(self, epochs, batch_size=128):

        # Load the dataset
        (X_train, _), (_, _) = mnist.load_data()

        # Rescale pixels from [0, 255] to [-1, 1]
        X_train = X_train / 127.5 - 1.
        X_train = np.expand_dims(X_train, axis=3)

        # Adversarial ground truths
        valid = np.ones((batch_size, 1))
        fake = np.zeros((batch_size, 1))

        for epoch in range(epochs):

            # ---------------------
            #  Train Discriminator
            # ---------------------

            # Select a random batch of images
            idx = np.random.randint(0, X_train.shape[0], batch_size)
            imgs = X_train[idx]

            noise = np.random.randn(batch_size, self.latent_dim)

            # Generate a batch of new images
            gen_imgs = self.generator.predict(noise, verbose=0)

            # Train the discriminator
            d_loss_real = self.discriminator.train_on_batch(imgs, None)  # TODO: change None to a reasonable value
            d_loss_fake = self.discriminator.train_on_batch(gen_imgs, None)  # TODO: change None to a reasonable value
            d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

            # ---------------------
            #  Train Generator
            # ---------------------

            noise = np.random.randn(batch_size, None)  # TODO: change None to a reasonable value

            # Train the generator (to have the discriminator label samples as valid)
            g_loss = self.combined.train_on_batch(noise, None)  # TODO: change None to a reasonable value

            # Print the progress
            print("%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" % (epoch, d_loss[0], 100*d_loss[1], g_loss))

gan = GAN()
gan.train(epochs=1000)
```
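Both models are trained with binary cross-entropy, `BCE(y, p) = -[y log(p) + (1 - y) log(1 - p)]`, where `p` is the discriminator's output. The numpy sketch below uses made-up discriminator outputs (purely illustrative) to show what the 0/1 targets mean: the discriminator is penalised when it gives real images a low score or generated images a high one, while the generator is trained with target 1 on generated images, so its loss is low only when the discriminator is fooled. The function name `bce` is hypothetical; it follows the same formula as the `'binary_crossentropy'` loss compiled above.

```python
import numpy as np

def bce(y_true, p, eps=1e-7):
    """Mean binary cross-entropy; p is clipped away from 0 and 1 to avoid log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

batch_size = 4
valid = np.ones((batch_size, 1))
fake = np.zeros((batch_size, 1))

# Hypothetical discriminator outputs: D is fairly confident that real images are real,
# and gives generated images a low probability of being real.
d_real = np.full((batch_size, 1), 0.9)
d_fake = np.full((batch_size, 1), 0.2)

d_loss = 0.5 * (bce(valid, d_real) + bce(fake, d_fake))  # discriminator: real -> 1, fake -> 0
g_loss = bce(valid, d_fake)                                # generator: wants D(G(z)) -> 1
print(f"D loss: {d_loss:.3f}, G loss: {g_loss:.3f}")      # D loss: 0.164, G loss: 1.609
```

Here the generator loss is high because the discriminator confidently rejects the generated samples.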


Now that your model is trained, generate a few images and visualize them with the code below:

```python
import matplotlib.pyplot as plt

n_images = 3
z = np.random.randn(n_images, None)  # TODO: change None to a reasonable value
gen_imgs = gan.generator.predict(z)

# Rescale images from [-1, 1] to [0, 1]
gen_imgs = 0.5 * gen_imgs + 0.5
for i in range(n_images):
    plt.imshow(gen_imgs[i, :, :, 0], cmap='gray')
    plt.show()
```
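The generator ends with a `tanh` activation, whose outputs lie in [-1, 1]; this is why `train` rescales MNIST pixels from [0, 255] to [-1, 1] with `X / 127.5 - 1`, and why the visualization cell maps generated images back to [0, 1] with `0.5 * x + 0.5` before plotting. The short numpy check below confirms that the two mappings are consistent; the function names are hypothetical.

```python
import numpy as np

def to_tanh_range(pixels_uint8):
    """Map raw MNIST pixels in [0, 255] to [-1, 1], as done in GAN.train."""
    return pixels_uint8 / 127.5 - 1.0

def to_unit_range(x):
    """Map values in [-1, 1] (e.g. generator outputs) to [0, 1] for plotting."""
    return 0.5 * x + 0.5

pixels = np.array([0, 51, 255], dtype=np.uint8)
scaled = to_tanh_range(pixels)
print(scaled)
print(to_unit_range(scaled))  # the same values as pixels / 255
```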

The code for a Conditional GAN (CGAN) is quite similar (see below; it is once again largely inspired by the same GitHub repository).

**Question 6.** What is the input fed to the generator to generate a fake sample?

```python
from keras.datasets import mnist
from keras.layers import Input, Dense, Reshape, Flatten, Dropout
from keras.layers import BatchNormalization, LeakyReLU, Multiply, Embedding
from keras.models import Sequential, Model
from keras.optimizers import Adam

import numpy as np


class CGAN():
    def __init__(self):
        # Input shape
        self.img_rows = 28
        self.img_cols = 28
        self.channels = 1
        self.img_shape = (self.img_rows, self.img_cols, self.channels)
        self.num_classes = 10
        self.latent_dim = 100

        # Build and compile the discriminator
        # (each compiled model gets its own optimizer instance)
        self.discriminator = self.build_discriminator()
        self.discriminator.compile(loss=['binary_crossentropy'],
            optimizer=Adam(0.0002, 0.5),
            metrics=['accuracy'])

        # Build the generator
        self.generator = self.build_generator()

        # The generator takes noise and the target label as input
        # and generates the corresponding digit of that label
        noise = Input(shape=(self.latent_dim,))
        label = Input(shape=(1,), dtype='int32')
        img = self.generator([noise, label])

        # For the combined model we will only train the generator
        self.discriminator.trainable = False

        # The discriminator takes the generated image and its label as input
        # and determines validity
        valid = self.discriminator([img, label])

        # The combined model (stacked generator and discriminator)
        # Trains generator to fool discriminator
        self.combined = Model([noise, label], valid)
        self.combined.compile(loss=['binary_crossentropy'],
            optimizer=Adam(0.0002, 0.5))

    def build_generator(self):

        model = Sequential()

        model.add(Dense(256, input_dim=self.latent_dim))
        model.add(LeakyReLU(alpha=0.2))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Dense(512))
        model.add(LeakyReLU(alpha=0.2))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Dense(1024))
        model.add(LeakyReLU(alpha=0.2))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Dense(np.prod(self.img_shape), activation='tanh'))
        model.add(Reshape(self.img_shape))

        noise = Input(shape=(self.latent_dim,))
        label = Input(shape=(1,), dtype='int32')
        label_embedding = Flatten()(Embedding(self.num_classes, self.latent_dim)(label))

        model_input = Multiply()([noise, label_embedding])
        img = model(model_input)

        return Model([noise, label], img)

    def build_discriminator(self):

        model = Sequential()

        model.add(Dense(512, input_dim=np.prod(self.img_shape)))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dense(512))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.4))
        model.add(Dense(512))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.4))
        model.add(Dense(1, activation='sigmoid'))

        img = Input(shape=self.img_shape)
        label = Input(shape=(1,), dtype='int32')

        label_embedding = Flatten()(Embedding(self.num_classes, np.prod(self.img_shape))(label))
        flat_img = Flatten()(img)

        model_input = Multiply()([flat_img, label_embedding])

        validity = model(model_input)

        return Model([img, label], validity)

    def train(self, epochs, batch_size=128, sample_interval=50):

        # Load the dataset
        (X_train, y_train), (_, _) = mnist.load_data()

        # Configure input: pixels rescaled to [-1, 1], labels of shape (n, 1)
        X_train = (X_train.astype(np.float32) - 127.5) / 127.5
        X_train = np.expand_dims(X_train, axis=3)
        y_train = y_train.reshape(-1, 1)

        # Adversarial ground truths
        valid = np.ones((batch_size, 1))
        fake = np.zeros((batch_size, 1))

        for epoch in range(epochs):

            # ---------------------
            #  Train Discriminator
            # ---------------------

            # Select a random batch of images and their labels
            idx = np.random.randint(0, X_train.shape[0], batch_size)
            imgs, labels = X_train[idx], y_train[idx]

            # Sample noise as generator input
            noise = np.random.normal(0, 1, (batch_size, self.latent_dim))

            # Generate a batch of new images
            gen_imgs = self.generator.predict([noise, labels], verbose=0)

            # Train the discriminator
            d_loss_real = self.discriminator.train_on_batch([imgs, labels], valid)
            d_loss_fake = self.discriminator.train_on_batch([gen_imgs, labels], fake)
            d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

            # ---------------------
            #  Train Generator
            # ---------------------

            # Condition on labels
            sampled_labels = np.random.randint(0, self.num_classes, batch_size).reshape(-1, 1)

            # Train the generator
            g_loss = self.combined.train_on_batch([noise, sampled_labels], valid)

            # Print the progress
            print("%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" % (epoch, d_loss[0], 100*d_loss[1], g_loss))

cgan = CGAN()
cgan.train(epochs=100)
```
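In `build_generator`, the digit label is turned into a learnt vector of size `latent_dim` by an `Embedding` layer and multiplied element-wise with the noise vector (`Multiply`). The numpy sketch below reproduces that combination with a random table standing in for the learnt embedding weights, only to show the shapes involved; it is an illustration, not the keras implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, latent_dim, batch_size = 10, 100, 4

# Stand-in for the weights of Embedding(num_classes, latent_dim): one row per digit class.
embedding_table = rng.normal(size=(num_classes, latent_dim))

noise = rng.normal(size=(batch_size, latent_dim))  # z ~ N(0, I)
labels = np.array([[8], [8], [3], [0]])            # shape (batch_size, 1), like y_train.reshape(-1, 1)

# Embedding lookup gives (batch_size, 1, latent_dim); Flatten removes the middle axis.
label_embedding = embedding_table[labels].reshape(batch_size, latent_dim)

# Multiply(): element-wise product, so the label modulates every coordinate of z.
model_input = noise * label_embedding
print(model_input.shape)  # (4, 100)
```

Two samples with the same label share the same embedding row, so the generator can only tell them apart through the noise. The discriminator combines the label with the flattened image in the same way, using an embedding of size 784 (`np.prod(self.img_shape)`).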

**Question 7.** Fit the model for 1000 epochs and, once fitted, generate a few fake "8" handwritten digits (take inspiration from the code above to show the generated images).
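One way to approach this is to feed the trained generator a batch of noise vectors together with a label array filled with the digit of interest. The helper below is a hypothetical sketch: it takes any `predict`-like callable (for the trained model, `cgan.generator.predict`) and is exercised here with a dummy generator so that it runs without training.

```python
import numpy as np

def generate_digit(predict, digit, n_images, latent_dim=100, seed=None):
    """Generate n_images samples of `digit` with a conditional generator (hypothetical helper).

    `predict` is assumed to accept [noise, labels] and return images in [-1, 1]
    with shape (n_images, 28, 28, 1), like cgan.generator.predict.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0, 1, (n_images, latent_dim))
    labels = np.full((n_images, 1), digit, dtype=np.int32)  # same label for every sample
    imgs = predict([noise, labels])
    return 0.5 * imgs + 0.5  # rescale from [-1, 1] to [0, 1] for plotting

# Dummy stand-in for cgan.generator.predict, only used to check the shapes.
def dummy_predict(inputs):
    noise, labels = inputs
    return np.tanh(np.tile(noise, (1, 8))[:, :784]).reshape(-1, 28, 28, 1)

eights = generate_digit(dummy_predict, digit=8, n_images=3, seed=0)
print(eights.shape)  # (3, 28, 28, 1)
```

With the trained model, `generate_digit(cgan.generator.predict, digit=8, n_images=3)` returns images that can be displayed with the same `plt.imshow(..., cmap='gray')` loop as for the GAN.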

## Diffusion models

For this section on diffusion models, we will not implement diffusion models ourselves but will instead rely on pre-trained models hosted on HuggingFace.
To do so, we will use the `diffusers` library provided by HuggingFace, which must first be installed:


```python
!pip install diffusers
```

The following code loads a pre-trained model hosted on HuggingFace and uses it to generate images.
HuggingFace-hosted models can be found at: <https://huggingface.co/models>

**Question 8.** Use two different models (trained on different training sets) to generate 4 images with each. Observe the impact of the training set on the generated images.

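```python
import torch
from diffusers import DDPMPipeline
import matplotlib.pyplot as plt

def gen_images(model_id, n_images, n_steps=1000):
    # Load the pre-trained model and its noise scheduler
    pipe = DDPMPipeline.from_pretrained(model_id)
    # Use the GPU when available (generation on CPU is very slow)
    pipe.to("cuda" if torch.cuda.is_available() else "cpu")

    # Run the pipeline in inference mode (sample random noise, then denoise it);
    # the result is a list of PIL images
    return pipe(batch_size=n_images, num_inference_steps=n_steps).images
```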
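`DDPMPipeline` generates images by starting from pure Gaussian noise and applying a learnt denoiser for `num_inference_steps` steps. The denoiser is trained to undo a fixed forward process that gradually adds noise to a clean image `x_0`; in the standard DDPM formulation, a noisy version at step `t` can be sampled in one shot as `x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps` with `eps ~ N(0, I)`, where `alpha_bar_t` is the cumulative product of `(1 - beta_s)`. The numpy sketch below implements only this forward process, assuming the linear `beta` schedule from 1e-4 to 0.02 over 1000 steps used in the original DDPM paper; it does not reproduce the `diffusers` internals, and the function names are hypothetical.

```python
import numpy as np

def linear_alpha_bars(n_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, n_steps)
    return np.cumprod(1.0 - betas)

def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

rng = np.random.default_rng(0)
alpha_bars = linear_alpha_bars()
x0 = np.ones((32, 32))  # stand-in for a clean image scaled to [-1, 1]

for t in (0, 250, 999):
    xt = forward_diffuse(x0, t, alpha_bars, rng)
    print(f"t={t:4d}  signal weight={np.sqrt(alpha_bars[t]):.3f}  "
          f"mean={xt.mean():.3f}  std={xt.std():.3f}")
```

By the last step the signal weight is close to 0, so `x_T` is essentially pure noise. Sampling runs the process in reverse, starting from such noise; using fewer `num_inference_steps` is faster but can degrade the generated images.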



Stable Diffusion is a diffusion model that generates images from a text prompt.
The idea behind Stable Diffusion is that each denoising step is conditioned on a high-dimensional representation (an embedding) of the text prompt, which forces the model to generate images related to that prompt.

**Question 9.** Use Stable Diffusion v1-5, available [here](https://huggingface.co/runwayml/stable-diffusion-v1-5), to generate an image from a text prompt of your choice.

```python
!pip install transformers
```