<a href="https://colab.research.google.com/github/mallibus/Unige-DL2019/blob/master/UNIGE_DL2019_3_GenerativeDeepLearning_Colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 3. Generative Deep Learning

In [0]:
from __future__ import print_function
import tensorflow as tf
import os
import matplotlib.pyplot as plt
import numpy as np
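# use the Keras bundled with TensorFlow, so that it is not mixed with the standalone keras package
from tensorflow.keras import backend as K
from tensorflow.keras.preprocessing import image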
from scipy.stats import norm
%matplotlib inline


tf.enable_eager_execution()

print("TensorFlow version: {}".format(tf.__version__))
print("Eager execution: {}".format(tf.executing_eagerly()))

# 3. Generative Deep Learning
Until now we have seen models that work with labeled data; now we move to an unsupervised setting: we have data $X$ without labels, and the goal is to learn some hidden or underlying structure of the data (e.g. dimensionality reduction in ML).<br>
More specifically, the goal of generative models is to take as input training samples from some distribution and learn a model that represents that distribution; once we have that model, we can use it to generate new data.<br>
<img src="http://mlclass.epizy.com/lab3_images_notebook/sample.png" width=450px><br>
We want to learn $P_{model}(x)$ similar to $P_{data}(x)$.
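
As a toy illustration of this idea (not part of the lab exercises), the sketch below "learns" $P_{model}(x)$ by fitting a one-dimensional Gaussian to samples and then generates new data from it. The data, the choice of a Gaussian model, and the names `data`, `mu_hat`, `sigma_hat` are assumptions made only for this example.

```python
import numpy as np

rng = np.random.RandomState(0)

# "Training data": samples from a distribution we pretend not to know.
data = rng.normal(loc=2.0, scale=0.5, size=10000)

# "Learning" the model: estimate the parameters of a Gaussian P_model(x).
mu_hat = data.mean()
sigma_hat = data.std()

# "Generation": draw brand-new samples from the learned model.
new_samples = rng.normal(loc=mu_hat, scale=sigma_hat, size=5)

print("estimated mean: %.3f, estimated std: %.3f" % (mu_hat, sigma_hat))
print("generated samples:", new_samples)
```

The neural models in this lab follow the same pattern, but with far more flexible families of distributions than a single Gaussian.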

We will look at three classes of models:
    1. Autoencoders 
    2. Variational Autoencoders (VAEs)
    3. Generative Adversarial Networks (GANs)

## 3.1 Autoencoders
An autoencoder is a neural network that is trained to attempt to copy its input to its output.<br>The network may be viewed as consisting of two parts: an encoder function $h = f(x)$ and a decoder that produces a reconstruction $r = g(h)$.<br>
<img src="http://mlclass.epizy.com/lab3_images_notebook/autoencoder.jpg" width=180px><br>
Traditionally, autoencoders were used for dimensionality reduction or feature learning; recently, theoretical connections with latent variable models have brought autoencoders to the forefront of generative modeling.<br>

Autoencoders automatically learn a “compressed representation” of the input by first compressing it (encoder) and then decompressing it (decoder) so that the output matches the original input.<br>
Learning is driven by a distance function that quantifies the information lost in this lossy compression.<br>
<img src="http://mlclass.epizy.com/lab3_images_notebook/autoenc.jpg" width=650px><br>
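
To make the roles of $f$, $g$ and the distance function concrete, here is a minimal NumPy sketch of a single forward pass through an untrained one-layer encoder and decoder. The weights are random and the batch is synthetic, so the numbers are meaningless; only the data flow and the two loss formulas are the point. All names (`W_enc`, `W_dec`, `mse`, `bce`) are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.RandomState(0)
input_dim, code_dim = 784, 32          # sizes chosen to mirror flattened MNIST

# Untrained, randomly initialised weights (hypothetical, for illustration only).
W_enc = rng.normal(scale=0.01, size=(input_dim, code_dim))
W_dec = rng.normal(scale=0.01, size=(code_dim, input_dim))

x = rng.uniform(size=(4, input_dim))   # a fake batch of 4 "images" in [0, 1]

h = np.maximum(0.0, x.dot(W_enc))      # encoder h = f(x), ReLU activation
r = sigmoid(h.dot(W_dec))              # decoder r = g(h), sigmoid activation

# Two common distance functions between the input x and the reconstruction r.
mse = np.mean((x - r) ** 2)
eps = 1e-7                             # avoids log(0)
bce = -np.mean(x * np.log(r + eps) + (1 - x) * np.log(1 - r + eps))

print("code shape:", h.shape, "reconstruction shape:", r.shape)
print("MSE: %.4f  binary cross-entropy: %.4f" % (mse, bce))
```

Training an autoencoder amounts to adjusting the encoder and decoder weights so that one of these distances becomes small over the training set.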

### Build a simple autoencoder

#### Load MNIST dataset

In [0]:
(x_train, _), (x_test, _) = # --fill here-- #

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.

x_train  = # --fill here-- # reshape data (from 60000*28*28 to 60000*784)
x_test =  # --fill here-- # reshape data (from 10000*28*28 to 10000*784)

x_train.shape, x_test.shape


#### Build a simple autoencoder
Until now we have built models by creating a Sequential model and then adding layers to it.<br>
Another way is to declare each layer by specifying the previous layer (which produces the input for the current layer) and finally build the model:<br>

        e.g.  input_layer = tf.keras.layers.Input(shape=(n,))
              layer1 = tf.keras.layers.Dense(m)(input_layer)
              layer2 = tf.keras.layers.Dense(m)(layer1)
              model = tf.keras.models.Model(input_layer, layer2)

In [0]:
# this is the size of our encoded representations
encoding_dim = 32  # 32 floats -> compression of factor 24.5, assuming the input is 784 floats

# this is our input layer
input_img = tf.keras.layers.Input(shape=(x_train.shape[1],))

# "encoded" is the encoded representation of the input
encoded = tf.keras.layers.Dense(encoding_dim, activation='relu')(input_img)

# "decoded" is the lossy reconstruction of the input
decoded = tf.keras.layers.Dense(x_train.shape[1], activation='sigmoid')(encoded)

# this model maps an input to its reconstruction
autoencoder = # --fill here-- #


# this model maps an input to its encoded representation
encoder = tf.keras.models.Model(input_img, encoded)

# create an input layer for an encoded (32-dimensional) input to be used by the decoder
encoded_input = tf.keras.layers.Input(shape=(encoding_dim,))

# retrieve the last layer of the autoencoder model
decoder_layer = # --fill here-- # 

# create the decoder model
decoder = tf.keras.models.Model(encoded_input, decoder_layer(encoded_input))

#### Compile the model

In [0]:
adam = tf.keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
autoencoder.compile(optimizer=adam, loss='binary_crossentropy')

#### Train the model

In [0]:
history = autoencoder.fit(x_train, x_train, 
                epochs=50,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))

#### Plot training & validation loss values

In [0]:
def plot_hist(history):
    # --fill here-- #

In [0]:
plot_hist(history)

#### Visualize some reconstructed images (outputs of the decoder $r = g(h)$)

In [0]:
encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)

n = 10  # how many digits we will display
plt.figure(figsize=(20, 4))
for i in range(1,n+1):
    # display original
    ax = plt.subplot(2, n, i)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # display reconstruction
    ax = plt.subplot(2, n, i + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

#### Visualize some outputs of the encoder $h = f(x)$

In [0]:
n = 10
plt.figure(figsize=(20, 8))
for i in range(1,n+1):
    ax = plt.subplot(1, n, i)
    plt.imshow(encoded_imgs[i].reshape(4, 8).T)
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

#### Try to use a regularizer (such as L1) in the encoder. How does this affect the results? What can you observe?
Expected observations: it induces sparsity in the encoding, the validation loss value is lower, and the reconstructed images are less accurate.

In [0]:
# --fill here-- #
# hint: Let's train this model for 100 epochs (with the added regularization the model is less likely to overfit and can be 
# trained longer).
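
To see why an L1 term pushes the encoder towards sparse codes, the following sketch computes the penalty for two hand-made codes. It is only an illustration of the formula: the coefficient `l1_coeff` and both example vectors are assumed values. In Keras such a penalty is usually attached through the `activity_regularizer` argument of a layer.

```python
import numpy as np

def l1_penalty(activations, l1_coeff):
    """L1 penalty: coefficient times the sum of absolute activations."""
    return l1_coeff * np.sum(np.abs(activations))

# Two hypothetical 8-dimensional codes.
dense_code = np.array([0.5, 0.4, 0.6, 0.5, 0.3, 0.7, 0.5, 0.5])
sparse_code = np.array([0.0, 0.0, 0.7, 0.0, 0.0, 0.0, 0.5, 0.0])

l1_coeff = 1e-4  # assumed value, not prescribed by the lab

print("penalty (dense code): ", l1_penalty(dense_code, l1_coeff))
print("penalty (sparse code):", l1_penalty(sparse_code, l1_coeff))
```

Because the penalty is added to the reconstruction loss, codes with many zero entries are cheaper, and the optimizer is nudged towards them.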

### Build a deeper autoencoder

In [0]:
input_img = # --fill here-- #
encoded = # --fill here-- #
encoded = # --fill here-- #
encoded = # --fill here-- # units = encoding_dim

decoded = # --fill here-- #
decoded = # --fill here-- #
decoded = # --fill here-- #

autoencoder = # --fill here-- #

#### Compile the model

In [0]:
# --fill here-- #

#### Train the model

In [0]:
# --fill here-- # think about the epochs required

#### Plot training & validation loss values

In [0]:
# --fill here-- #

#### Visualize some reconstructed images

In [0]:
# --fill here-- #

### Build a Convolutional Autoencoder

#### Load MNIST dataset

In [0]:
(x_train, _), (x_test, _) = # --fill here-- #

x_train = # --fill here-- # normalize
x_test = # --fill here-- # normalize


x_train = # --fill here-- # reshape (60000,28,28)->(60000,28,28,1)
x_test = # --fill here-- # reshape (60000,28,28)->(60000,28,28,1)

#### Build a convolutional autoencoder

In [0]:
input_img = tf.keras.layers.Input(shape=(28, 28, 1))  

x = tf.keras.layers.Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = tf.keras.layers.MaxPooling2D((2, 2), padding='same')(x)
x = # --fill here-- # Conv2D
x = # --fill here-- # MaxPooling
x = # --fill here-- # Conv2D
encoded = # --fill here-- # MaxPooling

# at this point the representation is (4, 4, 8) i.e. 128-dimensional

x = tf.keras.layers.Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = tf.keras.layers.UpSampling2D((2, 2))(x)
x = # --fill here-- # Conv2D
x = # --fill here-- # UpSampling
x = # --fill here-- # Conv2D
x = # --fill here-- # UpSampling
decoded = tf.keras.layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = # --fill here-- #
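
The `(4, 4, 8)` shape mentioned in the comment of the cell above can be checked with a little arithmetic: convolutions with `padding='same'` keep the spatial size, while each 2x2 max pooling with `padding='same'` halves it and rounds up. The helper below is a sketch written for this note, not a Keras function.

```python
import math

def same_pool_output(size, pool=2):
    """Spatial size after MaxPooling2D with padding='same' and stride == pool size."""
    return int(math.ceil(size / float(pool)))

size = 28
sizes = [size]
for _ in range(3):            # three MaxPooling2D((2, 2), padding='same') layers
    size = same_pool_output(size)
    sizes.append(size)

channels = 8                  # filters of the last encoder convolution (from the cell's comment)
print("spatial sizes:", sizes)                          # [28, 14, 7, 4]
print("encoded dimensionality:", size * size * channels)  # 128
```

Note that three 2x upsamplings starting from 4 would give 32 rather than 28. One common way to land on 28 again is to use `padding='valid'` in one of the decoder convolutions (16 becomes 14 with a 3x3 kernel) before the last upsampling; treat this as a suggestion to verify with `autoencoder.summary()`.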

#### Compile the model

In [0]:
# --fill here-- #

#### Train the model

In [0]:
# --fill here-- #

#### Plot training & validation loss values

In [0]:
# --fill here-- #

#### Visualize some reconstructed images

In [0]:
# --fill here-- #

Two interesting practical applications of autoencoders are data denoising and dimensionality reduction for data visualization.<br>
In this part of the lab we see the application to image denoising.

### Image Denoising with Convolutional Autoencoder

#### Create noisy samples

In [0]:
# If you have just completed the previous part (Convolutional Autoencoder), the dataset is already loaded;
# otherwise you have to load the data first

noise_factor = # --fill here-- #
x_train_noisy = # --fill here-- # add noise to x_train
x_test_noisy = # --fill here-- # add noise to x_test

x_train_noisy = np.clip(x_train_noisy, 0., 1.) # required to put all the values in [0,1]
x_test_noisy = np.clip(x_test_noisy, 0., 1.)
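
One possible way to corrupt images, shown here on synthetic data rather than on MNIST, is to add zero-mean Gaussian noise scaled by `noise_factor` and then clip. The value `0.5` and the array `x_clean` are assumptions for this sketch; other noise models would work as well.

```python
import numpy as np

rng = np.random.RandomState(0)

# Stand-in for normalised images with shape (batch, 28, 28, 1); not real MNIST.
x_clean = rng.uniform(size=(5, 28, 28, 1)).astype('float32')

noise_factor = 0.5  # hypothetical value; the lab asks you to experiment with it

# Additive zero-mean, unit-variance Gaussian noise scaled by noise_factor.
noise = rng.normal(loc=0.0, scale=1.0, size=x_clean.shape)
x_noisy = x_clean + noise_factor * noise

# Clip back into the valid pixel range [0, 1].
x_noisy = np.clip(x_noisy, 0., 1.)

print(x_noisy.shape, x_noisy.min(), x_noisy.max())
```

The denoising autoencoder is then trained with the noisy images as input and the clean images as target.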

#### Show some noisy samples

In [0]:
n = 10
plt.figure(figsize=(20, 2))
for i in range(1, n+1):  # n+1 so that all n subplots are filled
    ax = plt.subplot(1, n, i)
    plt.imshow(x_train_noisy[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

#### Build a Convolutional Autoencoder

In [0]:
input_img = # --fill here-- #
x = # --fill here-- #
x = # --fill here-- #
x = # --fill here-- #
encoded = # --fill here-- #

# at this point the representation is (7, 7, 32)

x = # --fill here-- #
x = # --fill here-- #
x = # --fill here-- #
x = # --fill here-- #
decoded = # --fill here-- #

autoencoder = # --fill here-- #

#### Compile the model

In [0]:
# --fill here-- #

#### Train the model

In [0]:
# --fill here-- #

#### Visualize reconstructed (denoised) images

In [0]:
# --fill here-- #

#### Try to play with the noise_factor value: is there a threshold beyond which it is no longer possible to reconstruct the image?
#### How does increasing the noise_factor value affect the number of epochs required for training?

## 3.2 Generative Adversarial Networks (GANs)

A GAN is a framework in which two models are trained simultaneously:
     - a generative model G that captures the data distribution
     - a discriminative model D that estimates the probability that a sample came from the training data rather than from G.
<br>
    
<img src="http://mlclass.epizy.com/lab3_images_notebook/gan.png" width=700px><br><br>


$G$ is pitted against an adversary, $D$, that learns to determine whether a sample is from the model distribution or the data distribution.<br>
$G$ generates samples by passing random noise through an MLP; $D$ is also an MLP, so we can train both models using only backpropagation and dropout, and sample from $G$ using only forward propagation.<br>

Schematically, the GAN looks like this:
    1. A generator network maps vectors of shape (latent_dim,) to images of shape (32, 32, 3).
    2. A discriminator network maps images of shape (32, 32, 3) to a binary score estimating the probability that the image is real.
    3. A gan network chains the generator and the discriminator together: gan(x)=discriminator(generator(x)). Thus this gan network maps latent space vectors to the discriminator’s assessment of the realism of these latent vectors as decoded by the generator.
    4. You train the discriminator using examples of real and fake images along with “real”/“fake” labels, just as you train any regular image-classification model.
    5. To train the generator, you use the gradients of the generator’s weights with regard to the loss of the gan model. This means that, at every step, you move the weights of the generator in a direction that makes the discriminator more likely to classify as “real” the images decoded by the generator (you train the generator to fool the discriminator).
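
The two training signals in points 4 and 5 can be illustrated with plain binary cross-entropy on made-up discriminator outputs. This is a sketch, not the lab's training code: the outputs `d_on_real` and `d_on_fake` are invented, and the label convention (0 for real, 1 for generated) is the one used by the labels in the training cell later in this notebook.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

REAL, FAKE = 0.0, 1.0   # convention of the training cell: zeros = real, ones = generated

# Hypothetical discriminator outputs for 3 real and 3 generated images.
d_on_real = np.array([0.1, 0.2, 0.1])
d_on_fake = np.array([0.9, 0.7, 0.8])

# Discriminator loss: real images should score REAL, generated ones FAKE.
d_loss = binary_crossentropy(
    np.concatenate([np.full(3, REAL), np.full(3, FAKE)]),
    np.concatenate([d_on_real, d_on_fake]))

# Generator loss (via the gan model): generated images are labelled REAL on purpose,
# so the loss is high while the discriminator still spots the fakes.
g_loss = binary_crossentropy(np.full(3, REAL), d_on_fake)

print("discriminator loss: %.3f" % d_loss)
print("generator loss:     %.3f" % g_loss)
```

Here the discriminator is doing well, so its loss is small and the generator's loss is large; gradient steps on the generator loss change the generator so that its images move the discriminator output towards the "real" target.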

#### Load cifar10 dataset

In [0]:
(x_train, y_train), (_, _) = tf.keras.datasets.cifar10.load_data() 

# select only one class (defined as 0 airplane in example)
x_train = x_train[y_train.flatten() == 0] 
# 0 airplane, 1 automobile, 2 bird, 3 cat, 4 deer, 5 dog, 6 frog, 7 horse, 8 ship, 9 truck

#normalize
x_train = x_train.astype('float32') / 255.

#### Set the network parameters

In [0]:
latent_dim = 32
height = x_train.shape[1]
width = x_train.shape[2]
channels = x_train.shape[3]

#### 1. Generator
Instead of max pooling, it is recommended to use strided convolutions for downsampling, and a LeakyReLU layer instead of a ReLU activation.<br>
LeakyReLU is similar to ReLU, but it relaxes sparsity constraints by allowing small negative activation values.<br><br>
<img src="http://mlclass.epizy.com/lab3_images_notebook/leakyrelu.jpeg" width=600px>
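
The difference between the two activations is easy to see numerically. In this small NumPy sketch, `alpha` is the slope applied to negative inputs; the value 0.3 is assumed here because, to the best of my knowledge, it is the default of `tf.keras.layers.LeakyReLU`.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.3):
    """alpha is the slope for negative inputs (0.3 is assumed here)."""
    return np.where(x >= 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("ReLU:     ", relu(x))
print("LeakyReLU:", leaky_relu(x))
```

ReLU maps every negative input to zero, while LeakyReLU keeps a scaled-down negative value, so a small gradient still flows for negative pre-activations.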

In [0]:
generator_input = tf.keras.layers.Input(shape=(latent_dim,))

# Transforms the input into a 16 × 16 128-channel feature map
x = tf.keras.layers.Dense(128 * 16 * 16)(generator_input)
x = tf.keras.layers.LeakyReLU()(x)
x = tf.keras.layers.Reshape((16, 16, 128))(x)

x = tf.keras.layers.Conv2D(256, 5, padding='same')(x)
x = tf.keras.layers.LeakyReLU()(x)

# Upsamples to 32 × 32
x = tf.keras.layers.Conv2DTranspose(256, 4, strides=2, padding='same')(x)
x = tf.keras.layers.LeakyReLU()(x)

x = tf.keras.layers.Conv2D(256, 5, padding='same')(x) 
x = tf.keras.layers.LeakyReLU()(x)
x = tf.keras.layers.Conv2D(256, 5, padding='same')(x)
x = tf.keras.layers.LeakyReLU()(x) 
x = tf.keras.layers.Conv2D(channels, 7, activation='tanh', padding='same')(x)

# Instantiates the generator model, which maps the input of shape (latent_dim,) into an image of shape (32, 32, 3)
generator = tf.keras.models.Model(generator_input, x) 
generator.summary()

#### 2. Discriminator

In [0]:
discriminator_input = tf.keras.layers.Input(shape=(height, width, channels))

x = tf.keras.layers.Conv2D(128, 3)(discriminator_input)
x = tf.keras.layers.LeakyReLU()(x)
x = tf.keras.layers.Conv2D(128, 4, strides=2)(x)
x = tf.keras.layers.LeakyReLU()(x)
x = tf.keras.layers.Conv2D(128, 4, strides=2)(x)
x = tf.keras.layers.LeakyReLU()(x)
x = tf.keras.layers.Conv2D(128, 4, strides=2)(x)
x = tf.keras.layers.LeakyReLU()(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dropout(0.4)(x)

# Classification layer
x = tf.keras.layers.Dense(1, activation='sigmoid')(x)

# Instantiates the discriminator model, which turns a (32, 32, 3) input into a binary classification decision (fake/real)
discriminator = tf.keras.models.Model(discriminator_input, x)
discriminator.summary()
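
The shapes printed by `discriminator.summary()` can be anticipated by hand. The convolutions above use the default `padding='valid'`, so each one shrinks the feature map according to its kernel size and stride. The helper `conv_valid_output` is written for this note and is not a Keras function.

```python
def conv_valid_output(size, kernel, stride=1):
    """Spatial size after a Conv2D with padding='valid' (the Keras default)."""
    return (size - kernel) // stride + 1

size = 32                                   # CIFAR-10 images are 32 x 32
layers = [(3, 1), (4, 2), (4, 2), (4, 2)]   # (kernel, stride) of the four Conv2D layers

sizes = [size]
for kernel, stride in layers:
    size = conv_valid_output(size, kernel, stride)
    sizes.append(size)

filters = 128
print("spatial sizes:", sizes)                    # [32, 30, 14, 6, 2]
print("flattened units:", size * size * filters)  # 512
```

The strided convolutions therefore play the downsampling role that max pooling had in the autoencoders, and the final `Dense(1)` layer receives 512 features.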

#### Compile the model

In [0]:
# another suggestion is lr=0.0005 if the discriminator is prevailing
discriminator_optimizer = tf.keras.optimizers.RMSprop(lr=0.0008,clipvalue=1.0,decay=1e-8)

# --fill here-- # compile the discriminator: loss to be used is binary crossentropy

#### 3. Adversarial Network (chains the generator and the discriminator)

In [0]:
#Sets discriminator weights to non-trainable (this will only apply to the gan model)
discriminator.trainable = False 

gan_input = tf.keras.layers.Input(shape=(latent_dim,)) 
gan_output = discriminator(generator(gan_input))

gan = tf.keras.models.Model(gan_input, gan_output)

#### Compile the model

In [0]:
gan_optimizer = tf.keras.optimizers.RMSprop(lr=0.0004, clipvalue=1.0,decay=1e-8)

# --fill here-- # compile the gan: loss to be used is binary crossentropy

#### Train the gan 
Schematically, the training process looks like: <br>
    - For each iteration:
        1. Draw random points in the latent space (random noise)
        2. Generate images with generator using this random noise
        3. Mix the generated images with real ones
        4. Train discriminator using these mixed images (“real”/“fake” (generated by the generator))
        5. Draw new random points in the latent space
        6. Train gan using these random vectors, with targets that all say “these are real images.” This updates the weights of the generator (only, because the discriminator is frozen inside gan) to move them toward getting the discriminator to predict “these are real images” for generated images: this trains the generator to fool the discriminator.
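
The data-handling part of these steps can be sketched without any neural network. In the example below, `fake_generator` is a placeholder standing in for `generator.predict`, and `real_images` is random data standing in for a CIFAR-10 batch; both are assumptions made so that the snippet runs on its own. The label convention (ones for generated, zeros for real) follows the labels used in the training cell.

```python
import numpy as np

rng = np.random.RandomState(0)
latent_dim, batch_size = 32, 20
height, width, channels = 32, 32, 3

def fake_generator(z):
    """Placeholder for generator.predict: returns random 'images' (assumption)."""
    return rng.uniform(-1.0, 1.0, size=(len(z), height, width, channels))

real_images = rng.uniform(size=(batch_size, height, width, channels))  # stand-in data

# Steps 1-2: sample latent points and generate images from them.
random_latent_vectors = rng.normal(size=(batch_size, latent_dim))
generated_images = fake_generator(random_latent_vectors)

# Step 3: generated images first, then real ones, matching the label order below.
combined_images = np.concatenate([generated_images, real_images])
labels = np.concatenate([np.ones((batch_size, 1)), np.zeros((batch_size, 1))])
labels += 0.05 * rng.uniform(size=labels.shape)   # small label noise

# Steps 5-6: fresh latent points, all labelled "real" (zeros in this convention).
random_latent_vectors = rng.normal(size=(batch_size, latent_dim))
misleading_targets = np.zeros((batch_size, 1))

print(combined_images.shape, labels.shape, misleading_targets.shape)
```

In the real loop, `combined_images` and `labels` go to `discriminator.train_on_batch`, while the second set of latent vectors and `misleading_targets` go to `gan.train_on_batch`.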

In [0]:
#!!! PREDEFINED ONLY 100 ITERATIONS, BUT WILL NEED MORE FOR DECENT RESULTS !!!#
iterations = 100
batch_size = 20
save_dir = 'generated_images'
start = 0

for step in range(iterations):
    
    # 1. sample random point in the latent space
    random_latent_vectors = # --fill here-- # use np.random.normal
    
    # 2. generate images with generator
    generated_images = generator.predict(random_latent_vectors)
    
    # 3. mix them with real images
    stop = start + batch_size
    real_images = x_train[start: stop]
    combined_images = # --fill here-- # use np.concatenate (generated images first, to match the label order below)
    
    # assembles labels, discriminating real from fake images
    labels = np.concatenate([np.ones((batch_size, 1)),np.zeros((batch_size, 1))])
    
    # adds random noise to the labels
    labels += 0.05 * np.random.random(labels.shape)
    
    # 4. train the discriminator
    d_loss = discriminator.train_on_batch(combined_images, labels)
    
    # 5. draw new random points in the latent space
    random_latent_vectors = # --fill here-- #
    
    # assembles latent space labels that say “these are all real images” (all zeros)
    misleading_targets = # --fill here-- #
    
    # 6. trains the generator (via the gan model, where the discriminator weights are frozen)
    # the train_on_batch method runs a single gradient update on a single batch of data.
    a_loss = gan.train_on_batch(random_latent_vectors,misleading_targets)
    
    start += batch_size
    if start > len(x_train) - batch_size:
        start = 0

    if step % 10 == 0:
        gan.save_weights('gan.h5')
        print('discriminator loss:%s; adversarial loss:%s'%(d_loss, a_loss))
    
    if step % 100 == 0:
        imgG = image.array_to_img(generated_images[0] * 255., scale=False)
        plt.subplot(1, 2, 1)
        plt.imshow(imgG)
        #img.save(os.path.join(save_dir,'generated' + str(step) + '.png'))
        imgR = image.array_to_img(real_images[0] * 255., scale=False)
        plt.subplot(1, 2, 2)
        plt.imshow(imgR)
        plt.show()
        #img.save(os.path.join(save_dir,'real' + str(step) + '.png'))

When training, you may see the adversarial loss begin to increase considerably while the discriminator loss tends to zero: this means the discriminator is ending up dominating the generator.<br>
If that’s the case, try reducing the discriminator learning rate and increasing the dropout rate of the discriminator.

#### Plot some examples from training set

In [0]:
plt.figure(figsize=(8,8))

for i in range(36):
    plt.subplot(6, 6, i + 1)
    plt.axis("off")
    plt.imshow(image.array_to_img(x_train[i] * 255., scale=False))

plt.subplots_adjust(hspace=0, wspace=0)
plt.show()

#### Plot some generated images

In [0]:
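plt.figure(figsize=(8,8))

# generated_images from the training loop holds only batch_size (20) images,
# so sample 36 new latent points to fill the 6 x 6 grid
generated_images = generator.predict(np.random.normal(size=(36, latent_dim)))

for i in range(36):
    plt.subplot(6, 6, i + 1)
    plt.axis("off")
    plt.imshow(image.array_to_img(generated_images[i] * 255., scale=False), aspect='auto')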

plt.subplots_adjust(hspace=0, wspace=0)
plt.show()