#Generative models
In this lab we will experiment with:

- Convolutional autoencoders
- Latent space visualization and interpolation
- Upsampling techniques
- Variational autoencoders
- Deep Convolutional GANs (DCGANs)

If you want to experiment with Denoising Autoencoders, revisit Lab 3 (task 5):
https://github.com/aivclab/dlcourse/blob/master/Lab3_FunWithMNIST.ipynb

**Before we start - remember to set runtime to GPU**

**NOTE:** In case you have trouble running Keras/TensorFlow in Colab, try one of the following:

In [None]:
# Try this
#!pip install --upgrade tensorflow==1.8.0

# ... or this
#%tensorflow_version 1.x

# Check TensorFlow version
#import tensorflow as tf
#print(tf.__version__)

##1. Download the MNIST dataset
As usual:

In [None]:
from __future__ import print_function
from tensorflow import keras
from keras.datasets import mnist
from keras import backend as K

num_classes = 10

# input image dimensions
img_rows, img_cols = 28, 28

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

# Pre-process inputs
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# Convert class indices to one-hot vectors
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

In [None]:
# Input shape: 28 x 28 x 1 = image with one color channel
print('input_shape :',input_shape)

# Pre-process inputs
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# to_categorical converts class indices to one-hot vectors
print('y_train shape:', y_train.shape)

##2. Task 1: Convolutional Autoencoder
Here is an example of a Convolutional Autoencoder (CAE) for MNIST.

**Note:** There are many ways to implement CAEs. This one is designed to map the input image down to a 2D latent space, so that you can plot the latent vectors in 2D. Also note how we define the encoder and the decoder separately and combine them afterwards to form the final CAE model.
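
To follow how the spatial size changes through the layers, it can help to do the shape arithmetic by hand before building the model. The sketch below is plain Python, not Keras: the helper names (`conv_same`, `conv_valid`, `conv_transpose_same`, `trace`) are made up for illustration, and the formulas assume the usual Keras conventions for `padding='same'` and `padding='valid'`.

```python
import math

def conv_same(size, stride):
    """Output size of a Conv2D layer with padding='same'."""
    return math.ceil(size / stride)

def conv_valid(size, kernel, stride=1):
    """Output size of a Conv2D layer with padding='valid'."""
    return (size - kernel) // stride + 1

def conv_transpose_same(size, stride):
    """Output size of a Conv2DTranspose layer with padding='same'."""
    return size * stride

def trace(size, layers):
    """Apply a list of (function, kwargs) pairs and record every intermediate size."""
    sizes = [size]
    for fn, kwargs in layers:
        size = fn(size, **kwargs)
        sizes.append(size)
    return sizes

# Encoder: ZeroPadding2D(2) adds 2 pixels on each side, then three stride-2 convolutions.
padded = 28 + 2 * 2
encoder_sizes = trace(padded, [(conv_same, {"stride": 2})] * 3)

# Decoder: three stride-2 transposed convolutions starting from the encoder's final size.
decoder_sizes = trace(encoder_sizes[-1], [(conv_transpose_same, {"stride": 2})] * 3)

print("encoder:", encoder_sizes)
print("decoder:", decoder_sizes)
```

Note that a stride-1 convolution with `padding='same'` keeps the size unchanged, whereas `conv_valid` shrinks it by `kernel - 1` pixels.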


###2.1 Your task
The shape of the decoder's output should match the input shape (28x28x1), but it doesn't. Try for yourself. Your task is to fix this problem by modifying this line of code:

```
decoded = Conv2D(1, kernel_size=(3, 3), padding='same', activation='sigmoid')(x)
```


In [None]:
from keras.layers import Input, Dropout, Flatten, Dense, Conv2D, MaxPooling2D
from keras.models import Model
from keras.layers import UpSampling2D, ZeroPadding2D, Conv2DTranspose, Reshape

# Number of latent dimensions
latent_dim = 2

# Encoder (convolutional base)
inputs = Input(shape=(28, 28, 1))
x = ZeroPadding2D(padding=(2, 2))(inputs)
x = Conv2D(8, kernel_size=(3, 3), strides=(2,2), activation='relu', padding='same')(x)
x = Conv2D(16, kernel_size=(3, 3), strides=(2,2), activation='relu', padding='same')(x)
x = Conv2D(32, kernel_size=(3, 3), strides=(2,2), activation='relu', padding='same')(x)

# shape info needed to build decoder model
shape = K.int_shape(x)

x = Flatten()(x)
encoded = Dense(latent_dim)(x)
encoder = Model(inputs, encoded)
encoder.summary()
print(("shape of encoded", K.int_shape(encoded)))

# Decoder (upsampling)
encoding = Input(shape=(1, 1, latent_dim))
x = Dense(shape[1] * shape[2] * shape[3], activation='relu')(encoding)
x = Reshape((shape[1], shape[2], shape[3]))(x)
x = Conv2DTranspose(32, (3,3), strides=(2,2), padding='same')(x)
x = Conv2DTranspose(16, (3,3), strides=(2,2), padding='same')(x)
x = Conv2DTranspose(8, (3,3), strides=(2,2), padding='same')(x)
decoded = Conv2D(1, kernel_size=(3, 3), padding='same', activation='sigmoid')(x) # Fix this line !!!
decoder = Model(encoding, decoded)
decoder.summary()
print(("shape of decoded", K.int_shape(decoded)))

x = encoder(inputs)
predictions = decoder(x)
autoencoder = Model(inputs=inputs, outputs=predictions)

###2.2 Questions:
1. What does the `ZeroPadding2D` layer do?
2. What is the shape of the data before and after zero padding? (Note: for downsampling and upsampling it is more convenient if the shape of the data is a power of 2).
3. What is the purpose of the `Reshape` layer in the decoder?


###2.3 Training
Let's train the autoencoder for 30 epochs (add more epochs to improve results):

In [None]:
autoencoder.compile(optimizer='adam', loss='mse')
history = autoencoder.fit(x_train, x_train, epochs=30, batch_size=256,
               shuffle=True, validation_data=(x_test, x_test), verbose=1)

In [None]:
import matplotlib.pyplot as plt
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])

###2.4 Plot the latent space representation
To get some intuition about what our autoencoder has learned, we can plot the latent representation of the training data:

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib as mpl
mpl.rc('image', cmap='jet')

# Get latent representation
z = encoder.predict(x_train,batch_size=32)

# Plot
plt.figure(figsize=(12, 10))
plt.scatter(z[:, 0], z[:, 1], c=np.argmax(y_train,axis=1))
plt.colorbar()
plt.xlabel("z[0]")
plt.ylabel("z[1]")
plt.show()

Rather than plotting the latent representation of the training samples, we could also use the CAE to *generate* new samples. We do this by generating latent vectors that span a 2D grid (defined by `grid_x` and `grid_y` below) and then feed each latent vector on the grid into the decoder to generate an image: 

In [None]:
import numpy as np
import matplotlib.pyplot as plt
n = 20
digit_size = 28
figure = np.zeros((digit_size * n, digit_size * n))
# linearly spaced coordinates corresponding to the 2D plot
# of digit classes in the latent space
grid_x = np.linspace(-6, 3, n)        # Task : Set range according to your latent representation
grid_y = np.linspace(-3, 6, n)[::-1]  # Task : Set range according to your latent representation

for i, yi in enumerate(grid_y):
    for j, xi in enumerate(grid_x):
        z_sample = np.array([[xi, yi]])
        x_decoded = decoder.predict(z_sample.reshape(1,1,1,2))
        digit = x_decoded[0].reshape(digit_size, digit_size)
        figure[i * digit_size: (i + 1) * digit_size,
                j * digit_size: (j + 1) * digit_size] = digit

plt.figure(figsize=(10, 10))
start_range = digit_size // 2
end_range = n * digit_size + start_range + 1
pixel_range = np.arange(start_range, end_range, digit_size)
sample_range_x = np.round(grid_x, 1)
sample_range_y = np.round(grid_y, 1)
plt.xticks(pixel_range, sample_range_x)
plt.yticks(pixel_range, sample_range_y)
plt.xlabel("z[0]")
plt.ylabel("z[1]")
plt.imshow(figure, cmap='Greys_r')

**Sub-task:** To get the best result, modify the x and y ranges (`grid_x` and `grid_y`) so that they approximately match the ranges observed in the previous plot of the training data.
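
One possible way to pick the ranges is to compute them from the latent codes instead of reading them off the plot. The sketch below is only an illustration: `latent_grid` is a hypothetical helper, the 1st/99th percentile cut-offs are an arbitrary choice to ignore outliers, and `z_demo` is random data standing in for the output of `encoder.predict(x_train)`.

```python
import numpy as np

def latent_grid(z, n=20, lower=1.0, upper=99.0):
    """Return (grid_x, grid_y) spanning the central percentile range of 2D codes z."""
    x_min, x_max = np.percentile(z[:, 0], [lower, upper])
    y_min, y_max = np.percentile(z[:, 1], [lower, upper])
    grid_x = np.linspace(x_min, x_max, n)
    grid_y = np.linspace(y_min, y_max, n)[::-1]  # reversed so large z[1] ends up at the top
    return grid_x, grid_y

# Stand-in for encoder.predict(x_train): random 2D codes, used only for illustration.
rng = np.random.default_rng(0)
z_demo = rng.normal(loc=[-1.5, 1.5], scale=[1.5, 1.5], size=(1000, 2))

grid_x, grid_y = latent_grid(z_demo, n=20)
print("z[0] range:", grid_x[0], grid_x[-1])
print("z[1] range:", grid_y[-1], grid_y[0])
```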

###2.5 Question
1. Which digits can the autoencoder generate faithfully, which digits does it have trouble generating? Why?

###2.6 Encoding, decoding and latent space interpolation
Now that we have trained an autoencoder, we can use it to encode existing images and generate new images (from a latent representation). With the latent representation we can also start doing interpolation between training samples.

Your task is to 

1. Encode an image of a 7 and an image of a 9 (or any other pair if you prefer)
2. Decode the encodings to generate reconstructed images
3. Interpolate between the two digits in latent space
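
The interpolation in step 3 is ordinary linear interpolation between two latent vectors. As a minimal NumPy sketch (the function name `interpolate` and the example vectors are made up for illustration and are not taken from a trained model):

```python
import numpy as np

def interpolate(z_start, z_end, steps):
    """Return `steps` points on the straight line from z_start to z_end (inclusive)."""
    z_start = np.asarray(z_start, dtype=float).ravel()
    z_end = np.asarray(z_end, dtype=float).ravel()
    t = np.linspace(0.0, 1.0, steps)[:, None]  # column of mixing weights in [0, 1]
    return (1.0 - t) * z_start + t * z_end

# Two made-up 2D latent vectors
path = interpolate([-2.0, 1.0], [2.0, 3.0], steps=5)
print(path)
```

Each row of `path` is a latent vector that could be passed to the decoder to generate one intermediate image.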

In [None]:
# Draw two samples (a 7 and a 9) and display them
y_test_category = np.argmax(y_test,axis=1)
ix7 = np.where(y_test_category==7)[0][1] # Pick a 7
ix9 = np.where(y_test_category==9)[0][1] # Pick a 9
plt.subplot(221);plt.imshow(x_test[ix7,:].squeeze(),cmap='gray')
plt.subplot(222);plt.imshow(x_test[ix9,:].squeeze(),cmap='gray')

# Subtask 1 (encoding):
# Calculate the latent representation of each sample using the encoder
z7 = # encode x_test[ix7,:]
z9 = # encode x_test[ix9,:]

# Subtask 2 (decoding):
# Reconstruct images from the two latent vectors using the decoder
x_hat_7 = # decode z7
x_hat_9 = # decode z9

# Show reconstruction
plt.subplot(223);plt.imshow(x_hat_7.squeeze(),cmap='gray')
plt.subplot(224);plt.imshow(x_hat_9.squeeze(),cmap='gray')

# Subtask 3 (interpolate):
# (Just run - no changes required)
N = 8
interp_features = np.zeros((N,latent_dim))
for i in range(latent_dim):
  interp_features[:,i] = np.linspace(z7[0,i].squeeze(),z9[0,i].squeeze(),N)

plt.figure(figsize=(20,6))
for i in range(N):
  x = interp_features[i,:].reshape(1,1,1,latent_dim)
  out = decoder.predict(x)
  plt.subplot(1,N,i+1)
  plt.imshow(out.squeeze(),cmap='gray')  

**Perspectives:** Given a dataset of facial images, you could use latent space interpolation to generate images like these:

![alt text](https://github.com/davidsandberg/facenet/wiki/20170708-150701-add_smile.png)

##3. Task 2: Implement a CAE from scratch
The purpose of this task is to test if you can implement a CAE from scratch. **I recommend you skip ahead and complete the tasks on variational autoencoders and GANs first, and then return to this task later**.

Your task is to implement this CAE architecture for MNIST:

![alt text](https://github.com/aivclab/dlcourse/raw/master/data/Lab9_CAE_architecture.png)

**Explanation**:
- "Conv 1", "Conv 2", "Conv 3", "D Conv 1", "D Conv 2", "D Conv 3", and "D Conv 4" are *all* regular 2D convolutions: [Conv2D](https://keras.io/layers/convolutional/#conv2d).
- "M.P" is short for Max Pooling
- "U.S" is short for upsampling. You must use [UpSampling2D](https://keras.io/layers/convolutional/#upsampling2d) and **not** [Conv2DTranspose](https://keras.io/layers/convolutional/#conv2dtranspose). (What's the difference by the way?)
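
As a starting point for thinking about the difference, here is a 1D NumPy sketch. It assumes nearest-neighbour upsampling (the default mode of `UpSampling2D`) and a bare-bones transposed convolution without bias or padding. The helper names are made up for illustration. Upsampling simply repeats values and has nothing to learn, whereas a transposed convolution spreads each input value through a kernel whose weights would be learned during training.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling: repeat every value `factor` times."""
    return np.repeat(x, factor)

def conv_transpose_1d(x, kernel, stride):
    """Transposed convolution: each input value adds a scaled copy of the kernel to the output."""
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    out = np.zeros((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        out[i * stride:i * stride + len(kernel)] += value * kernel
    return out

x = np.array([1.0, 2.0, 3.0])
print(upsample_nearest(x, 2))
print(conv_transpose_1d(x, [1.0, 1.0], 2))       # this particular kernel mimics nearest-neighbour upsampling
print(conv_transpose_1d(x, [0.5, 1.0, 0.5], 2))  # a different kernel gives a different, smoother result
```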

##4. Task 3: Variational Autoencoder
Recall that variational autoencoders (VAEs) are designed to learn a smooth latent space representation. Traditional autoencoders tend to leave gaps in the latent space, which makes interpolation unreliable. The purpose of this task is to see whether this is actually the case in practice.
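
A VAE encoder predicts a distribution for each image rather than a single point, and a latent vector is drawn from it with the so-called reparameterization trick. The NumPy sketch below illustrates the idea, assuming the encoder predicts a mean and a log-variance per latent dimension. The function name `sample_latent` and the numbers are made up for illustration.

```python
import numpy as np

def sample_latent(z_mean, z_log_var, rng):
    """Draw z ~ N(z_mean, exp(z_log_var)) as a deterministic function of noise epsilon."""
    epsilon = rng.standard_normal(z_mean.shape)
    return z_mean + np.exp(0.5 * z_log_var) * epsilon  # exp(0.5 * log_var) is the std. dev.

rng = np.random.default_rng(0)
z_mean = np.full((10000, 2), 1.0)
z_log_var = np.full((10000, 2), np.log(0.25))  # variance 0.25, i.e. standard deviation 0.5

z = sample_latent(z_mean, z_log_var, rng)
print("empirical mean:", z.mean(axis=0))
print("empirical std :", z.std(axis=0))
```

Because the randomness enters only through `epsilon`, gradients can flow through `z_mean` and `z_log_var` during training.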

Below is an implementation of a convolutional VAE.

In [None]:
from keras.layers import UpSampling2D, ZeroPadding2D, Conv2DTranspose, Lambda, Reshape

batch_size = 256
latent_dim = 2

def sampling(args):
    # z_log_var is the log of the variance, so the standard deviation is exp(0.5 * z_log_var)
    z_mean, z_log_var = args
    epsilon = K.random_normal(shape=(batch_size, latent_dim))
    return z_mean + K.exp(0.5 * z_log_var) * epsilon

# VAE model = encoder + decoder
# build encoder model
inputs = Input(shape=(28, 28, 1),name='encoder_input')
x = ZeroPadding2D(padding=(2, 2))(inputs)
x = Conv2D(8, kernel_size=(3, 3), strides=(2,2), activation='relu', padding='same')(x)
x = Conv2D(16, kernel_size=(3, 3), strides=(2,2), activation='relu', padding='same')(x)
x = Conv2D(32, kernel_size=(3, 3), strides=(2,2), activation='relu', padding='same')(x)
x = Conv2D(64, kernel_size=(3, 3), strides=(2,2), activation='relu', padding='same')(x)
x = Conv2D(64, kernel_size=(3, 3), strides=(2,2), activation='relu', padding='same')(x)

# shape info needed to build decoder model
shape = K.int_shape(x)

# generate latent vector Q(z|X)
x = Flatten()(x)
z_mean = Dense(latent_dim, name='z_mean')(x)
z_log_var = Dense(latent_dim, name='z_log_var')(x)

# use reparameterization trick to push the sampling out as input
# note that "output_shape" isn't necessary with the TensorFlow backend
z = Lambda(sampling, output_shape=(latent_dim,), name='z')([z_mean, z_log_var])

# instantiate encoder model
encoder = Model(inputs, [z_mean, z_log_var, z], name='encoder')
encoder.summary()

# build decoder model
latent_inputs = Input(shape=(latent_dim,), name='z_sampling')
x = Dense(shape[1] * shape[2] * shape[3], activation='relu')(latent_inputs)
x = Reshape((shape[1], shape[2], shape[3]))(x)

x = Conv2DTranspose(64, (1,1), strides=(2,2), padding='same')(x)
x = Conv2DTranspose(32, (3,3), strides=(2,2), padding='same')(x)
x = Conv2DTranspose(16, (3,3), strides=(2,2), padding='same')(x)
x = Conv2DTranspose(8, (3,3), strides=(2,2), padding='same')(x)
x = Conv2DTranspose(8, (3,3), strides=(2,2), padding='same')(x)
outputs = Conv2D(1, kernel_size=(5, 5), padding='valid', activation='sigmoid')(x)

# instantiate decoder model
decoder = Model(latent_inputs, outputs, name='decoder')
decoder.summary()

# instantiate VAE model
outputs = decoder(encoder(inputs)[2])
vae = Model(inputs, outputs, name='vae')

###4.1 Questions
1. What does the `sampling` function do?
2. The encoder outputs three variables: `[z_mean, z_log_var, z]`. What do they represent?

###4.2 Loss function
The loss consists of two terms:

- A reconstruction term (or similarity term)
- and a KL divergence term

You can read more about it here: https://towardsdatascience.com/intuitively-understanding-variational-autoencoders-1bfe67eb5daf

The KL term is:

![alt text](https://miro.medium.com/max/520/1*uEAxCmyVKxzZOJG6afkCCg.png)
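
In case the image does not load: for a diagonal Gaussian with mean `z_mean` and log-variance `z_log_var`, the KL divergence to a standard normal is `0.5 * sum(exp(z_log_var) + z_mean^2 - z_log_var - 1)`, summed over the latent dimensions. A small NumPy sketch with made-up inputs (the function name `kl_to_standard_normal` is just for illustration):

```python
import numpy as np

def kl_to_standard_normal(z_mean, z_log_var):
    """KL( N(z_mean, exp(z_log_var)) || N(0, I) ), one value per row."""
    return 0.5 * np.sum(np.exp(z_log_var) + np.square(z_mean) - z_log_var - 1.0, axis=-1)

z_mean = np.array([[0.0, 0.0],     # already a standard normal
                   [1.0, 0.0],     # mean shifted in the first dimension
                   [0.0, 0.0]])    # variance too large in the first dimension
z_log_var = np.array([[0.0, 0.0],
                      [0.0, 0.0],
                      [np.log(4.0), 0.0]])

print(kl_to_standard_normal(z_mean, z_log_var))
```

The term is zero only when the predicted distribution is exactly a standard normal, and it grows when the mean moves away from 0 or the variance moves away from 1.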

**Sub-task:** Identify the individual terms in the code block below.

In [None]:
from keras.losses import mse
reconstruction_loss = mse(K.flatten(inputs), K.flatten(outputs))
reconstruction_loss *= 28 * 28
kl_loss = K.exp(z_log_var) + K.square(z_mean) - z_log_var - 1
kl_loss = K.sum(kl_loss, axis=-1)
kl_loss *= 0.5
vae_loss = K.mean(reconstruction_loss + kl_loss)
vae.add_loss(vae_loss)
vae.compile(optimizer='rmsprop')
vae.summary()

###4.3 Training

In [None]:
num_samples = int(np.floor(x_train.shape[0] / batch_size) * batch_size)
vae.fit(x_train[0:num_samples,:], epochs=30, batch_size=batch_size,
        shuffle=True, verbose=1)

###4.4 Plot the latent space representation

In [None]:
z_mean, _, _ = encoder.predict(x_train[0:num_samples,:],
                                batch_size=batch_size)
import matplotlib as mpl
mpl.rc('image', cmap='jet')
plt.figure(figsize=(12, 10))
plt.scatter(z_mean[:, 0], z_mean[:, 1], c=np.argmax(y_train[0:num_samples,:],axis=1))
plt.colorbar()
plt.xlabel("z[0]")
plt.ylabel("z[1]")
plt.show()

In [None]:
n = 20
digit_size = 28
figure = np.zeros((digit_size * n, digit_size * n))
# linearly spaced coordinates corresponding to the 2D plot
# of digit classes in the latent space
grid_x = np.linspace(-2, 2, n)
grid_y = np.linspace(-2, 2, n)[::-1]

for i, yi in enumerate(grid_y):
    for j, xi in enumerate(grid_x):
        z_sample = np.array([[xi, yi]])
        x_decoded = decoder.predict(z_sample)
        digit = x_decoded[0].reshape(digit_size, digit_size)
        figure[i * digit_size: (i + 1) * digit_size,
                j * digit_size: (j + 1) * digit_size] = digit

plt.figure(figsize=(10, 10))
start_range = digit_size // 2
end_range = n * digit_size + start_range + 1
pixel_range = np.arange(start_range, end_range, digit_size)
sample_range_x = np.round(grid_x, 1)
sample_range_y = np.round(grid_y, 1)
plt.xticks(pixel_range, sample_range_x)
plt.yticks(pixel_range, sample_range_y)
plt.xlabel("z[0]")
plt.ylabel("z[1]")
plt.imshow(figure, cmap='Greys_r')

###4.5 Questions
1. What do you think of this latent representation? In terms of quality? In terms of smoothness? Compare to the same plot for the traditional autoencoder.
2. Which digits does the model faithfully reconstruct? Which digits does it have trouble reconstructing? Why?
3. What happens if you set the weight of the KL term to, say, 5 (`kl_loss *= 5`), and re-train the model?

###4.6 Encoding, decoding and latent space interpolation
Like we did for the traditional autoencoder (see section 2.6), your task is to 

1. Encode an image of a 7 and an image of a 9 (**warning - this is trickier than you might think!!!**) 
2. Decode the encodings (to generate reconstructed images)
3. Interpolate between the two digits in latent space

In [None]:
# Draw two samples (a 7 and a 9) and display them
y_test_category = np.argmax(y_test,axis=1)
ix7 = np.where(y_test_category==7)[0][1]
ix9 = np.where(y_test_category==9)[0][1]
plt.subplot(221);plt.imshow(x_test[ix7,:].squeeze(),cmap='gray')
plt.subplot(222);plt.imshow(x_test[ix9,:].squeeze(),cmap='gray')

# Subtask 1 (encoding): Calculate the latent representation of each sample
z7 = # encode x_test[ix7,:]
z9 = # encode x_test[ix9,:]

# Subtask 2 (decoding): Reconstruct images from the two latent vectors
x_hat_7 = # decode z7
x_hat_9 = # decode z9

# Show reconstruction
plt.subplot(223);plt.imshow(x_hat_7.squeeze(),cmap='gray')
plt.subplot(224);plt.imshow(x_hat_9.squeeze(),cmap='gray')

# Subtask 3 (interpolate): Just run - no changes required
N = 8
interp_features = np.zeros((N,latent_dim))
for i in range(latent_dim):
  interp_features[:,i] = np.linspace(z7[0,i].squeeze(),z9[0,i].squeeze(),N)

plt.figure(figsize=(20,6))
for i in range(N):
  x = interp_features[i,:].reshape(1,latent_dim)  # the VAE decoder expects input of shape (latent_dim,)
  out = decoder.predict(x)
  plt.subplot(1,N,i+1)
  plt.imshow(out.squeeze(),cmap='gray')  

**Note:** If you want to make nicer reconstructions and better interpolations, increase the latent dimensionality (`latent_dim`) and re-train the model.

##5. Task 4: Generative Adversarial Networks
Below is an implementation of a Deep Convolutional GAN (DCGAN) for MNIST.

Code here: https://github.com/eriklindernoren/Keras-GAN/blob/master/dcgan/dcgan.py

###5.1 Recap of GANs
A DCGAN is a generative model that learns to map random noise vectors into images. Unlike an autoencoder, which encodes and decodes an image into itself, a DCGAN learns to generate images that look real. This means that you need a dataset of real images to compare with.

The network consists of two sub-networks that are trained in tandem:

- The **Generator** takes a random noise vector and maps it into an image.
- The **Discriminator** takes an input image, which is either **"real"** (i.e., picked from the database of real images) or **"fake"** (i.e., generated by the Generator). It then learns to distinguish between real and fake images.

The two networks compete against each other, and at some point the Generator becomes so good at generating realistic fakes that the Discriminator can no longer distinguish fakes from real images.
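
To make the competition concrete, here is a NumPy sketch of the two losses, assuming both networks are trained with binary cross-entropy where real images are labelled 1 and fakes 0. The function names and the discriminator outputs are made up for illustration.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))))

def gan_losses(d_real, d_fake):
    """d_real / d_fake are the discriminator's outputs on real / generated images."""
    # Discriminator: real images should score 1, fakes should score 0.
    d_loss = 0.5 * (binary_crossentropy(np.ones_like(d_real), d_real)
                    + binary_crossentropy(np.zeros_like(d_fake), d_fake))
    # Generator: wants the discriminator to score its fakes as 1.
    g_loss = binary_crossentropy(np.ones_like(d_fake), d_fake)
    return d_loss, g_loss

# A confident discriminator versus one that can only guess
confident = gan_losses(d_real=np.array([0.9, 0.95]), d_fake=np.array([0.1, 0.05]))
fooled = gan_losses(d_real=np.array([0.5, 0.5]), d_fake=np.array([0.5, 0.5]))
print("confident discriminator (d_loss, g_loss):", confident)
print("fooled discriminator    (d_loss, g_loss):", fooled)
```

When the discriminator outputs 0.5 for every image, both losses equal ln 2 (about 0.693), which corresponds to pure guessing.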

GANs are notoriously hard to train, and the implementation in this task is just a toy example. The training loop looks like this:

![alt text](https://3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com/wp-content/uploads/2019/05/Summary-of-the-Generative-Adversarial-Network-Training-Algorithm-1024x669.png)


###5.2 Your tasks
1. Run the code block to start training the model. In the meantime go through the code and see if you can identify the major steps of the training loop.

2. What is the dimensionality of the latent space in this example? Change it to 2 instead.

3. Extend the code such that you can train a DCGAN and subsequently make it generate images based on some 2D latent vector that you specify. Use this to make a plot of the 2D latent space, like we did above. What do you observe?

In [None]:
from __future__ import print_function, division

from keras.datasets import mnist
from keras.layers import Input, Dense, Reshape, Flatten, Dropout
from keras.layers import BatchNormalization, Activation, ZeroPadding2D
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.convolutional import UpSampling2D, Conv2D
from keras.models import Sequential, Model
from keras.optimizers import Adam
import matplotlib.pyplot as plt

import sys
import numpy as np

class DCGAN():
    def __init__(self):
        # Input shape
        self.img_rows = 28
        self.img_cols = 28
        self.channels = 1
        self.img_shape = (self.img_rows, self.img_cols, self.channels)
        self.latent_dim = 10

        optimizer = Adam(0.0002, 0.5)

        # Build and compile the discriminator
        self.discriminator = self.build_discriminator()
        self.discriminator.compile(loss='binary_crossentropy',
            optimizer=optimizer,
            metrics=['accuracy'])

        # Build the generator
        self.generator = self.build_generator()

        # The generator takes noise as input and generates imgs
        z = Input(shape=(self.latent_dim,))
        img = self.generator(z)

        # For the combined model we will only train the generator
        self.discriminator.trainable = False

        # The discriminator takes generated images as input and determines validity
        valid = self.discriminator(img)

        # The combined model  (stacked generator and discriminator)
        # Trains the generator to fool the discriminator
        self.combined = Model(z, valid)
        self.combined.compile(loss='binary_crossentropy', optimizer=optimizer)

    def build_generator(self):

        model = Sequential()

        model.add(Dense(128 * 7 * 7, activation="relu", input_dim=self.latent_dim))
        model.add(Reshape((7, 7, 128)))
        model.add(UpSampling2D())
        model.add(Conv2D(128, kernel_size=3, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Activation("relu"))
        model.add(UpSampling2D())
        model.add(Conv2D(64, kernel_size=3, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Activation("relu"))
        model.add(Conv2D(self.channels, kernel_size=3, padding="same"))
        model.add(Activation("tanh"))

        model.summary()

        noise = Input(shape=(self.latent_dim,))
        img = model(noise)

        return Model(noise, img)

    def build_discriminator(self):

        model = Sequential()

        model.add(Conv2D(32, kernel_size=3, strides=2, input_shape=self.img_shape, padding="same"))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))
        model.add(Conv2D(64, kernel_size=3, strides=2, padding="same"))
        model.add(ZeroPadding2D(padding=((0,1),(0,1))))
        model.add(BatchNormalization(momentum=0.8))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))
        model.add(Conv2D(128, kernel_size=3, strides=2, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))
        model.add(Conv2D(256, kernel_size=3, strides=1, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))
        model.add(Flatten())
        model.add(Dense(1, activation='sigmoid'))

        model.summary()

        img = Input(shape=self.img_shape)
        validity = model(img)

        return Model(img, validity)

    def train(self, epochs, batch_size=128, save_interval=500):

        # Load the dataset
        (X_train, _), (_, _) = mnist.load_data()

        # Rescale -1 to 1
        X_train = X_train / 127.5 - 1.
        X_train = np.expand_dims(X_train, axis=3)

        # Adversarial ground truths
        valid = np.ones((batch_size, 1))
        fake = np.zeros((batch_size, 1))

        for epoch in range(epochs):

            # ---------------------
            #  Train Discriminator
            # ---------------------

            # Select a random batch of images
            idx = np.random.randint(0, X_train.shape[0], batch_size)
            imgs = X_train[idx]

            # Sample noise and generate a batch of new images
            noise = np.random.normal(0, 1, (batch_size, self.latent_dim))
            gen_imgs = self.generator.predict(noise)

            # Train the discriminator (real classified as ones and generated as zeros)
            d_loss_real = self.discriminator.train_on_batch(imgs, valid)
            d_loss_fake = self.discriminator.train_on_batch(gen_imgs, fake)
            d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

            # ---------------------
            #  Train Generator
            # ---------------------

            # Train the generator (wants discriminator to mistake images as real)
            g_loss = self.combined.train_on_batch(noise, valid)

            # Plot the progress
            print ("%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" % (epoch, d_loss[0], 100*d_loss[1], g_loss))

            # If at save interval => save generated image samples
            if epoch % save_interval == 0:
                self.save_imgs(epoch)

    def save_imgs(self, epoch):
        r, c = 5, 5
        noise = np.random.normal(0, 1, (r * c, self.latent_dim))
        gen_imgs = self.generator.predict(noise)

        # Rescale images 0 - 1
        gen_imgs = 0.5 * gen_imgs + 0.5

        fig, axs = plt.subplots(r, c)
        cnt = 0
        for i in range(r):
            for j in range(c):
                axs[i,j].imshow(gen_imgs[cnt, :,:,0], cmap='gray')
                axs[i,j].axis('off')
                cnt += 1
        #fig.savefig("mnist_%d.png" % epoch)
        plt.show()

if __name__ == '__main__':
    dcgan = DCGAN()
    dcgan.train(epochs=4000, batch_size=32, save_interval=500)

###5.3 Conditional GAN (optional)
The original GAN has no knowledge of the data's class labels, so you cannot ask it for a specific digit. A conditional GAN (CGAN) addresses this by telling both the generator and the discriminator what the class label is. In the original formulation, a one-hot label vector y is concatenated to the random noise vector z (the code below instead multiplies the noise with a learned label embedding), which results in an architecture that looks like this:

![alt text](https://paper-attachments.dropbox.com/s_D85DDA7D01FD04AEE96825C4B90F1126BC7D080CA4F2947D4A5DEC07FAD6122C_1559840765144_Screenshot+2019-06-06+at+10.35.29+PM.png)
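
The two ways of feeding the label to the generator can be compared in a few lines of NumPy. This is only a sketch of the input construction, not of a full CGAN: the `embedding` matrix is random here, whereas in a real model it would be learned, and the labels are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, latent_dim, num_classes = 4, 100, 10

z = rng.normal(size=(batch_size, latent_dim))  # random noise vectors
labels = np.array([3, 1, 4, 1])                # made-up class labels

# Variant 1: concatenate a one-hot label vector to the noise vector
one_hot = np.eye(num_classes)[labels]
concat_input = np.concatenate([z, one_hot], axis=1)

# Variant 2: multiply the noise element-wise with a per-class embedding vector
embedding = rng.normal(size=(num_classes, latent_dim))  # random stand-in for learned weights
multiplied_input = z * embedding[labels]

print(concat_input.shape)
print(multiplied_input.shape)
```

Concatenation grows the generator input to `latent_dim + num_classes` values, while the multiplicative variant keeps it at `latent_dim`.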

If you have more time, try out the CGAN tutorial:

- https://github.com/eriklindernoren/Keras-GAN#cgan
- https://github.com/eriklindernoren/Keras-GAN/blob/master/cgan/cgan.py

How does it work?


In [None]:
from __future__ import print_function, division

from keras.datasets import mnist
from keras.layers import Input, Dense, Reshape, Flatten, Dropout, multiply
from keras.layers import BatchNormalization, Activation, Embedding, ZeroPadding2D
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.convolutional import UpSampling2D, Conv2D
from keras.models import Sequential, Model
from keras.optimizers import Adam

import matplotlib.pyplot as plt

import numpy as np

class CGAN():
    def __init__(self):
        # Input shape
        self.img_rows = 28
        self.img_cols = 28
        self.channels = 1
        self.img_shape = (self.img_rows, self.img_cols, self.channels)
        self.num_classes = 10
        self.latent_dim = 100

        optimizer = Adam(0.0002, 0.5)

        # Build and compile the discriminator
        self.discriminator = self.build_discriminator()
        self.discriminator.compile(loss=['binary_crossentropy'],
            optimizer=optimizer,
            metrics=['accuracy'])

        # Build the generator
        self.generator = self.build_generator()

        # The generator takes noise and the target label as input
        # and generates the corresponding digit of that label
        noise = Input(shape=(self.latent_dim,))
        label = Input(shape=(1,))
        img = self.generator([noise, label])

        # For the combined model we will only train the generator
        self.discriminator.trainable = False

        # The discriminator takes generated image as input and determines validity
        # and the label of that image
        valid = self.discriminator([img, label])

        # The combined model  (stacked generator and discriminator)
        # Trains generator to fool discriminator
        self.combined = Model([noise, label], valid)
        self.combined.compile(loss=['binary_crossentropy'],
            optimizer=optimizer)

    def build_generator(self):

        model = Sequential()

        model.add(Dense(256, input_dim=self.latent_dim))
        model.add(LeakyReLU(alpha=0.2))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Dense(512))
        model.add(LeakyReLU(alpha=0.2))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Dense(1024))
        model.add(LeakyReLU(alpha=0.2))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Dense(np.prod(self.img_shape), activation='tanh'))
        model.add(Reshape(self.img_shape))

        model.summary()

        noise = Input(shape=(self.latent_dim,))
        label = Input(shape=(1,), dtype='int32')
        label_embedding = Flatten()(Embedding(self.num_classes, self.latent_dim)(label))

        model_input = multiply([noise, label_embedding])
        img = model(model_input)

        return Model([noise, label], img)

    def build_discriminator(self):

        model = Sequential()

        model.add(Dense(512, input_dim=np.prod(self.img_shape)))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dense(512))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.4))
        model.add(Dense(512))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.4))
        model.add(Dense(1, activation='sigmoid'))
        model.summary()

        img = Input(shape=self.img_shape)
        label = Input(shape=(1,), dtype='int32')

        label_embedding = Flatten()(Embedding(self.num_classes, np.prod(self.img_shape))(label))
        flat_img = Flatten()(img)

        model_input = multiply([flat_img, label_embedding])

        validity = model(model_input)

        return Model([img, label], validity)

    def train(self, epochs, batch_size=128, sample_interval=50):

        # Load the dataset
        (X_train, y_train), (_, _) = mnist.load_data()

        # Configure input
        X_train = (X_train.astype(np.float32) - 127.5) / 127.5
        X_train = np.expand_dims(X_train, axis=3)
        y_train = y_train.reshape(-1, 1)

        # Adversarial ground truths
        valid = np.ones((batch_size, 1))
        fake = np.zeros((batch_size, 1))

        for epoch in range(epochs):

            # ---------------------
            #  Train Discriminator
            # ---------------------

            # Select a random batch of images
            idx = np.random.randint(0, X_train.shape[0], batch_size)
            imgs, labels = X_train[idx], y_train[idx]

            # Sample noise as generator input
            noise = np.random.normal(0, 1, (batch_size, self.latent_dim))

            # Generate a batch of new images
            gen_imgs = self.generator.predict([noise, labels])

            # Train the discriminator
            d_loss_real = self.discriminator.train_on_batch([imgs, labels], valid)
            d_loss_fake = self.discriminator.train_on_batch([gen_imgs, labels], fake)
            d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

            # ---------------------
            #  Train Generator
            # ---------------------

            # Condition on labels
            sampled_labels = np.random.randint(0, 10, batch_size).reshape(-1, 1)

            # Train the generator
            g_loss = self.combined.train_on_batch([noise, sampled_labels], valid)

            # Plot the progress
            print ("%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" % (epoch, d_loss[0], 100*d_loss[1], g_loss))

            # If at save interval => save generated image samples
            if epoch % sample_interval == 0:
                self.sample_images(epoch)

    def sample_images(self, epoch):
        r, c = 2, 5
        noise = np.random.normal(0, 1, (r * c, self.latent_dim))
        sampled_labels = np.arange(0, 10).reshape(-1, 1)

        gen_imgs = self.generator.predict([noise, sampled_labels])

        # Rescale images 0 - 1
        gen_imgs = 0.5 * gen_imgs + 0.5

        fig, axs = plt.subplots(r, c)
        cnt = 0
        for i in range(r):
            for j in range(c):
                axs[i,j].imshow(gen_imgs[cnt,:,:,0], cmap='gray')
                axs[i,j].set_title("Digit: %d" % sampled_labels[cnt])
                axs[i,j].axis('off')
                cnt += 1
        #fig.savefig("images/%d.png" % epoch)
        #plt.close()
        plt.show()


if __name__ == '__main__':
    cgan = CGAN()
    cgan.train(epochs=4000, batch_size=32, sample_interval=200)