


# Practical 5: Variational AutoEncoder

© Machine Learning Summer School - Telkom University

---


<table  class="tfo-notebook-buttons" align="left"><tr><td>
    
<a href="https://colab.research.google.com/github/adf-telkomuniv/MLSS2020_Telkom/blob/master/practical%205/MLSS2020TU%20-%20Practical%205.ipynb" source="blank" ><img src="https://colab.research.google.com/assets/colab-badge.svg"></a>
    
</td><td>
<a href="https://github.com/adf-telkomuniv/MLSS2020_Telkom/blob/master/practical%205/MLSS2020TU%20-%20Practical%205.ipynb" source="blank" ><img src="https://i.ibb.co/6NxqGSF/pinpng-com-github-logo-png-small.png"></a>
    
</td></tr></table>


An **Autoencoder** is a neural network trained in an unsupervised manner, using backpropagation, to produce an output that is as close as possible to its input. It takes a high-dimensional input such as an image or a vector, passes it through the network, and compresses it into a smaller representation from which the input is then reconstructed.

<p align='center'>
<img src='https://i.ibb.co/cb3dQLv/ae.png' width=80% />


The basic idea behind a **Variational Autoencoder** is that, instead of mapping an input to a fixed vector, the input is mapped to a distribution. Structurally, the main difference from a plain autoencoder is that the bottleneck vector is replaced with two vectors: one representing the mean of the distribution and the other representing its spread (the standard deviation or, as in this notebook, the log-variance).

<p align='center'>
<img src="https://i.ibb.co/Sv2Tx2R/vae2.png" width=80% />


In this notebook we will examine the difference between a Vanilla (Convolutional) AutoEncoder and a Variational AutoEncoder in generating MNIST images.

---
---
# [Part 0] Import Libraries and Load Data

---
## 1 - Import Library

In [None]:
import tensorflow as tf
from tensorflow.keras.models import Model

from tensorflow.keras.layers import Lambda, Input, Dense
from tensorflow.keras.layers import Conv2D, Flatten
from tensorflow.keras.layers import Reshape, Conv2DTranspose

from tensorflow.keras import datasets
from tensorflow.keras.losses import mse, binary_crossentropy
from tensorflow.keras.utils import plot_model
from tensorflow.keras import backend as K

from IPython.display import Image


import numpy as np
import imageio, glob, os, datetime
import matplotlib.pyplot as plt
np.set_printoptions(precision=4)

%matplotlib inline
%load_ext tensorboard

tf.random.set_seed(13)

---
## 2 - Load MNIST Dataset

We will work with MNIST dataset this time. However, if you ever feel bored using it, you can use the other MNIST-like datasets such as the Clothing [Fashion-MNIST](https://www.tensorflow.org/datasets/catalog/fashion_mnist), Hiragana [Kuzushiji-MNIST](https://www.tensorflow.org/datasets/catalog/kmnist), or the Mini-SketchRNN data that we've prepared.

MNIST Dataset (enabled by default)

In [None]:
(X_train, y_train), (X_test, y_test) = datasets.mnist.load_data()
labels = range(10)

Uncomment to use Fashion-MNIST Dataset

In [None]:
# (X_train, y_train), (X_test, y_test) = datasets.fashion_mnist.load_data()
# labels = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

Uncomment to use Mini-Sketch Dataset

In [None]:
# !wget -q 'https://raw.githubusercontent.com/adf-telkomuniv/MLSS2020_Telkom/master/recources/mini-sketch.npy'
# (X_train, y_train), (X_test, y_test), labels = np.load('mini-sketch.npy', allow_pickle=True)

Reshape and Normalize Data

In [None]:
image_size = X_train.shape[1]

X_train = np.reshape(X_train, [-1, image_size, image_size, 1])
X_test  = np.reshape(X_test, [-1, image_size, image_size, 1])

X_train = X_train.astype('float32') / 255
X_test  = X_test.astype('float32') / 255

Visualize the first 20 images

In [None]:
print('Labels:',labels)

fig, ax = plt.subplots(2,10,figsize=(18,5))
fig.subplots_adjust(hspace=0.1, wspace=0.1)
for j in range(0,2):
    for i in range(0, 10):
        ax[j,i].imshow(X_train[i+j*10].reshape(28, 28), cmap='gray')
        ax[j,i].set_title(labels[y_train[i+j*10]] )
        ax[j,i].axis('off')
plt.show()

---
## 3 - Helper Functions

Below are several helper functions to visualize the generated image

---
### a. Plot Latent Space

This function visualizes the latent space distribution extracted from the `encoder` model

In [None]:
def plot_latent_space(encoder_model, data, batch_size=128, vae=False):

    x, y = data
    if vae:
        # the VAE encoder returns [z_mean, z_log_var, z]; plot the mean
        z, _, _ = encoder_model.predict(x, batch_size=batch_size)
    else:
        z       = encoder_model.predict(x, batch_size=batch_size)

    print('z range: ('+str(np.min(z))+','+str(np.max(z))+')')
    
    plt.figure(figsize=(12, 10))
    plt.scatter(z[:, 0], z[:, 1], c=y)
    plt.colorbar()
    plt.xlabel("z[0]")
    plt.ylabel("z[1]")
    plt.show()


---
### b. Generate Image

This function generates &nbsp;$ n $ &nbsp;images from random latent vectors &nbsp;$ z $  &nbsp;. 

The default range of &nbsp;$ z $  &nbsp; is $(-1..1)$

In [None]:
def generate_image(decoder_model, z_range=(-1,1), n=5):
    a,b = z_range
    z_sample = np.random.uniform(a, b, (n, 2))
    X_decoded = decoder_model.predict(z_sample)

    fig, ax = plt.subplots(1,n,figsize=(15,4.5))
    fig.subplots_adjust(hspace=0.1, wspace=0.1)
    for i in range(n):
        digit = X_decoded[i].reshape(28, 28)
        ax[i].imshow(digit, cmap='gray')
        ax[i].set_title(str(z_sample[i]))
        ax[i].axis('off')
    plt.show()

---
### c. Plot Interpolating Images
This function generates an &nbsp;$n\times n$&nbsp; grid of interpolated images decoded from the latent space &nbsp;$z$&nbsp;

The default range of &nbsp;$ z $  &nbsp; is $(-1..1)$

In [None]:
def plot_interpolating(decoder_model, z_range=(-1,1), n=20, 
                       save=False, filename='img.png', 
                       figsize=(10, 10)):

    # display an n x n 2D manifold of digits
    a,b = z_range
    digit_size = 28
    figure = np.zeros((digit_size * n, digit_size * n))

    # linearly spaced coordinates corresponding to the 2D plot
    # of digit classes in the latent space
    grid_x = np.linspace(a, b, n)
    grid_y = np.linspace(a, b, n)[::-1]
    
    for i, yi in enumerate(grid_y):
        for j, xi in enumerate(grid_x):
            z_sample = np.array([[xi, yi]])
            X_decoded = decoder_model.predict(z_sample)
            digit = X_decoded[0].reshape(digit_size, digit_size)
            figure[i * digit_size: (i + 1) * digit_size,
                   j * digit_size: (j + 1) * digit_size] = digit

    start_range    = digit_size // 2
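    # one tick per tile: the last tick sits at the centre of tile n-1,
    # so that the number of ticks matches the n grid labels
    end_range      = (n - 1) * digit_size + start_range + 1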
    pixel_range    = np.arange(start_range, end_range, digit_size)
    sample_range_x = np.round(grid_x, 1)
    sample_range_y = np.round(grid_y, 1)

    fig = plt.figure(figsize=figsize)
    plt.xticks(pixel_range, sample_range_x)
    plt.yticks(pixel_range, sample_range_y)
    plt.xlabel("z[0]")
    plt.ylabel("z[1]")
    plt.imshow(figure, cmap='Greys_r')

    if save:
        plt.savefig(filename)
        plt.close(fig)
    else:
        print('range:(',a,':',b,')')
        plt.show()
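
The relationship between tile positions and latent values in the stitched figure can be checked without any plotting. The sketch below is plain Python; the helper names `tick_positions` and `grid_values` are hypothetical and only mirror the arithmetic described above: each 28-pixel tile has its centre at `digit_size // 2 + k * digit_size`, and each tile corresponds to one evenly spaced latent value.

```python
def tick_positions(n, digit_size=28):
    """Pixel coordinate of the centre of each of the n tiles along one axis."""
    start = digit_size // 2
    return [start + k * digit_size for k in range(n)]


def grid_values(a, b, n):
    """n evenly spaced latent values from a to b inclusive (like np.linspace)."""
    if n == 1:
        return [float(a)]
    step = (b - a) / (n - 1)
    return [a + k * step for k in range(n)]


ticks = tick_positions(5)
values = grid_values(-1, 1, 5)
print(ticks)   # [14, 42, 70, 98, 126]
print(values)  # [-1.0, -0.5, 0.0, 0.5, 1.0]
```

There is exactly one tick per latent value, which is what the axis labelling requires.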

---
### d. Save Image Callback

This class defines a Keras callback that saves the interpolation image generated at the end of each training epoch

In [None]:
class SaveImage(tf.keras.callbacks.Callback):
    def __init__(self, decoder=None, base_dir=None):
        super(SaveImage, self).__init__()
        self.decoder = decoder
        self.base_dir = base_dir

    def on_epoch_end(self, epoch, logs={}):
        filename = self.base_dir+'/image'+str(epoch)+'.png'
        plot_interpolating(self.decoder, z_range=(-1,1), n=5,
                           save=True, filename=filename, 
                           figsize=(5,5))
        

---
### e. Plot History

In [None]:
def plot_history(history):
    plt.rcParams['figure.figsize'] = [6, 4]
    # use the history object passed in, not a global variable
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('Model loss')
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Val'])
    plt.show()

---
### f. Generate GIF

This function generates a GIF animation from the saved images

In [None]:
def show_gif(base_dir, anim_file):    
    with imageio.get_writer(anim_file, mode='I') as writer:
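        # sort by epoch number so that image10.png comes after image9.png
        filenames = glob.glob(base_dir+'/image*.png')
        filenames = sorted(filenames, key=lambda f: int(os.path.basename(f)[len('image'):-len('.png')]))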
        last = -1
        for i,filename in enumerate(filenames):
            frame = (i**0.5)
            if round(frame) > round(last):
                last = frame
            else:
                continue
            image = imageio.imread(filename)
            writer.append_data(image)
            writer.append_data(image)
        image = imageio.imread(filename)
        writer.append_data(image)
    print('GIF saved as', anim_file)
    
    with open(anim_file,'rb') as f:
      display(Image(data=f.read(), format='png'))
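
The loop in `show_gif` does not use every saved image: it keeps a frame only when the rounded square root of its index increases, so early epochs are sampled densely and later ones sparsely. The sketch below isolates that rule in plain Python (the helper name `selected_frames` is hypothetical) to show which indices survive for a 30-epoch run.

```python
def selected_frames(num_files):
    """Indices kept by the square-root thinning rule used in show_gif."""
    kept = []
    last = -1
    for i in range(num_files):
        frame = i ** 0.5
        if round(frame) > round(last):
            last = frame
            kept.append(i)
    return kept


print(selected_frames(30))  # [0, 1, 3, 7, 13, 21]
```

In `show_gif` each kept frame is written twice, and the last file in the list is appended once more at the end.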

---
---
# [Part 1] Convolutional AutoEncoder

Now let's build our Convolutional AutoEncoder

If the input is denoted by $x$, the encoder by $E$, and the decoder by $D$, the latent code is $z = E(x)$ and the reconstruction is $\hat{x} = D(z) = D(E(x))$. To encourage faithful reconstruction, we minimize the squared error between $x$ and $\hat{x}$.

<center>
<img src='https://i.ibb.co/1bfHYBS/ae2.png' width=50%>
</center>



$$
loss = ||x-\hat{x}||^2 = ||x-D(z)||^2 = ||x-D(E(x))||^2
$$
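
As a small worked example of the squared-error loss, the sketch below evaluates it by hand on a made-up 4-pixel "image" and a made-up reconstruction. The values and the helper name `squared_error` are illustrative assumptions, not taken from the notebook's data.

```python
def squared_error(x, x_hat):
    """Squared L2 distance between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


x = [0.0, 0.5, 1.0, 1.0]       # toy input pixels in [0, 1]
x_hat = [0.1, 0.5, 0.8, 1.0]   # toy reconstruction
loss = squared_error(x, x_hat)
print(round(loss, 4))  # 0.05
```

Only the two pixels that differ contribute: $0.1^2 + 0.2^2 = 0.05$.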

<br />

The space of representations is often called the latent space. We are interested in autoencoders because this latent space can be a lower-dimensional and more useful representation of our data. We may also generate new data examples with an autoencoder, but let's return to this later.



---
## 1 - Conv-AE Encoder

First, define the Encoder model
> <font color='red'>**EXERCISE:** </font> Complete the Encoder

The architecture is as follows:
<pre>
    * the input shape is 3-dimensional <font color='blue'><b>(28,28,1)</b></font>  
    * <b>conv</b> layer with <font color='blue'><b>16</b></font> filters of <font color='blue'><b>3x3</b></font>, using stride <font color='blue'><b>2</b></font>, and relu activation
    * <b>conv</b> layer with <font color='blue'><b>32</b></font> filters of <font color='blue'><b>3x3</b></font>, using stride <font color='blue'><b>2</b></font>, and relu activation
    * flatten layer
    * <b>dense</b> layer with <font color='blue'><b>16</b></font> neurons using relu activation
    * output <b>dense</b> layer with <font color='blue'><b>2</b></font> neurons without activation
</pre>
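
To see how the feature-map size evolves through this encoder, the sketch below applies the usual rule for a strided convolution with `padding='same'`, where the spatial output size is `ceil(input / stride)`. The helper name `conv_same_output` is hypothetical, and `padding='same'` is taken from the code comments in the cell that follows.

```python
import math


def conv_same_output(size, stride):
    """Spatial output size of a convolution with padding='same'."""
    return math.ceil(size / stride)


size = 28
for filters in (16, 32):
    size = conv_same_output(size, stride=2)
    print(filters, size)   # prints "16 14", then "32 7"

flattened = size * size * 32
print(flattened)  # 1568
```

The flatten layer therefore sees $7 \times 7 \times 32 = 1568$ values, which is also the size the decoder's first dense layer expands back to.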

In [None]:
input_shape = (image_size, image_size, 1)
latent_dim  = 2

# create Input() with shape of input_shape
encoder_input = Input(shape=input_shape, name='encoder_input')

# add conv2d layer to encoder_input with 16 filters, kernel size 3, and strides 2
# use relu activation and padding same
x = ??(encoder_input)

# add conv2d layer to x with 32 filters, kernel size 3, and strides 2
# use relu activation and padding same
x = ??

# add flatten layer to x 
x = ??

# add dense layer to x with 16 neurons and relu activation
x = ??

# add output layer to x using dense layer with latent_dim neurons, without activation
encoder_output = Dense(latent_dim, name='encoder_output')(x)


---

Then to instantiate encoder model


In [None]:
ae_encoder = Model(encoder_input, encoder_output, name='ae_encoder')

ae_encoder.summary()

Visualize the network architecture

In [None]:
plot_model(ae_encoder, show_shapes=True, show_layer_names=False, dpi=65, rankdir='LR')

---
## 2 - Conv-AE Decoder

Next, we define the Decoder model

> <font color='red'>**EXERCISE:** </font> Complete the Decoder

The architecture is as follows:
<pre>
    * the input shape is 1-dimensional <font color='blue'><b>(2,)</b></font>  
    * <b>dense</b> layer with <font color='blue'><b>7*7*32</b></font> neurons using relu activation
    * <b>reshape</b> layer to convert input into <font color='blue'><b>(7,7,32)</b></font> 3-dimensional matrix
    * <b>conv2dtranspose</b> layer with <font color='blue'><b>32</b></font> filters of <font color='blue'><b>3x3</b></font>, using stride <font color='blue'><b>2</b></font>, and relu activation
    * <b>conv2dtranspose</b> layer with <font color='blue'><b>16</b></font> filters of <font color='blue'><b>3x3</b></font>, using stride <font color='blue'><b>2</b></font>, and relu activation
    * output <b>conv2dtranspose</b> layer with <font color='blue'><b>1</b></font> filter of <font color='blue'><b>3x3</b></font>, and <b>sigmoid</b> activation
</pre>
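
The decoder mirrors the encoder's downsampling. As a sketch, assuming `padding='same'` on every transposed convolution (as the code comments in the next cell request) and the default stride of 1 on the output layer, each layer multiplies the spatial size by its stride. The helper name `conv_transpose_same_output` is hypothetical.

```python
def conv_transpose_same_output(size, stride):
    """Spatial output size of a transposed convolution with padding='same'."""
    return size * stride


shape = (7, 7, 32)   # output of the reshape layer
for filters, stride in ((32, 2), (16, 2), (1, 1)):
    side = conv_transpose_same_output(shape[0], stride)
    shape = (side, side, filters)
    print(shape)
# prints (14, 14, 32), then (28, 28, 16), then (28, 28, 1)
```

The final shape matches the `(28, 28, 1)` input images, so the reconstruction can be compared with the input pixel by pixel.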

In [None]:
# create Input() with shape of latent_dim
decoder_input = Input(shape=(latent_dim,), name='decoder_input')


# add dense layer to decoder_input with 7*7*32 neurons and relu activation
x = ??(decoder_input)

# add reshape layer to x to reshape back into (7, 7, 32)
x = ??

# add Conv2DTranspose to x with 32 filters, kernel size 3, and strides 2
# use relu activation and padding same
x = ??

# add Conv2DTranspose to x with 16 filters, kernel size 3, and strides 2
# use relu activation and padding same
x = ??


# add output layer to x using Conv2DTranspose layer with 1 filter,
# kernel size 3, sigmoid activation, and padding same
decoder_output = Conv2DTranspose(filters=1, kernel_size=3, activation='sigmoid', 
                                 padding='same', name='decoder_output')(x)



---

Then to instantiate decoder model


In [None]:
ae_decoder = Model(decoder_input, decoder_output, name='ae_decoder')

ae_decoder.summary()

Visualize the network architecture

In [None]:
plot_model(ae_decoder, show_shapes=True, show_layer_names=False, dpi=65, rankdir='LR')

---
## 3 - Conv-AE Complete 

Lastly, we combine the Encoder and Decoder into a single AutoEncoder model

In [None]:
# call ae_encoder() function with input encoder_input
encoded        = ae_encoder(encoder_input)

# call ae_decoder() function with input encoded
decoder_output = ae_decoder(encoded)

# call Model() function with input encoder_input and decoder_output
autoenc = Model(encoder_input, decoder_output, name='conv_autoencoder')

autoenc.summary()

Visualize the network architecture

In [None]:
plot_model(autoenc, show_shapes=True, show_layer_names=False, dpi=55, rankdir='LR', expand_nested=True)

---
## 4 - Train Convolutional AutoEncoder

The next step is to train the model

---
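First, let's compile the model using the `rmsprop` optimizer and the `binary_crossentropy` loss. Binary cross-entropy is used here in place of the squared error from the formula above; both are common reconstruction losses when pixel values lie in $[0, 1]$.

> <font color='red'>**EXERCISE:** </font> Try using another optimizer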

In [None]:
autoenc.compile(optimizer='rmsprop', loss='binary_crossentropy')
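
To make the `binary_crossentropy` loss concrete, here is a minimal plain-Python sketch of the per-pixel formula $-\big(y\log p + (1-y)\log(1-p)\big)$ averaged over pixels. The function below is an illustration, not the Keras implementation; the `eps` clipping is an assumption that follows the common practice of avoiding $\log 0$.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a list of pixels."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)   # keep log() finite
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(y_true)


good = binary_crossentropy([0.0, 1.0], [0.1, 0.9])   # close reconstruction
bad = binary_crossentropy([0.0, 1.0], [0.9, 0.1])    # inverted reconstruction
print(round(good, 4), round(bad, 4))  # 0.1054 2.3026
```

A reconstruction close to the target gives a small loss, while a confidently wrong one is penalised heavily.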

---

Then train the model for 30 epochs

In [None]:
batch_size  = 256
epochs      = 30

base_dir = 'conv_ae'
tf.io.gfile.mkdir(base_dir)
myCallback = SaveImage(ae_decoder, base_dir)

logdir = os.path.join(base_dir, datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)

ae_hist = autoenc.fit(X_train, X_train,
                      epochs=epochs,
                      batch_size=batch_size,
                      validation_data=(X_test, X_test),
                      callbacks = [myCallback, tensorboard_callback],
                      verbose=2
                      )


---
Visualize Loss History

In [None]:
plot_history(ae_hist)

In [None]:
%tensorboard --logdir conv_ae

---
## 5 - Visualize Generated Images

Now let's visualize the result

---
### a. Latent Space Distribution

Plots labels and MNIST digits as function of 2-dim latent vector

You should see that the distribution of latent vector is clustered by class as AutoEncoder is intended to perform Dimensionality Reduction or Feature Extraction

The clustered distribution is good for Classification purposes, but not for Image Generation

In [None]:
batch_size=128
data_test = (X_test, y_test)
plot_latent_space(ae_encoder, data_test, batch_size=batch_size, vae=False)

---
### b. Reconstructed Images

Next, let's visualize the reconstructed images for several test samples

You'll see that, even with a shallow ConvNet and a modest number of epochs, we can already perform dimensionality reduction and reconstruct the reduced data back into an approximation of the original image

In [None]:
ori_img = X_test[:10]
rec_img = autoenc.predict(ori_img, verbose=0)

fig, ax = plt.subplots(2,10,figsize=(15,4.5))
fig.subplots_adjust(hspace=0.1, wspace=0.1)
for i in range(10):
    ax[0,i].imshow(ori_img[i].reshape(28, 28), cmap='gray')
    ax[0,i].set_title(y_test[i])
    ax[0,i].axis('off')
    ax[1,i].imshow(rec_img[i].reshape(28, 28), cmap='gray')
    ax[1,i].axis('off')
plt.show()


---
### c. Randomly Generated

Now let's try to generate &nbsp;$5$&nbsp; new images from a random latent vector &nbsp;$z$&nbsp;

You should see that the result is quite good

The network is able to generate new image from random input

In [None]:
generate_image(ae_decoder)

But what happens if we widen the random vector range to $(-5..5)$?

You might see that some images get corrupted because the decoder was not trained to generate images from that region of the latent space

In [None]:
generate_image(ae_decoder, z_range=(-5,5))

You can try to generate a single image by running the cell below

You can use a manually selected latent vector or generate a random one

In [None]:
z_sample = np.random.uniform(-10, 10, (1, 2))
# z_sample = np.array([[0,0]])

X_decoded = ae_decoder.predict(z_sample)
digit = X_decoded.reshape(28, 28)
plt.imshow(digit, cmap='gray')
plt.axis('off')
plt.show()

---
### d. Plot Interpolation

Now let's plot an interpolation of images decoded from a range of latent vectors

First we generate a $10\times 10$ grid of images over the range $(-1..1)$

In [None]:
plot_interpolating(ae_decoder, n=10)

Now let's try to widen the range to $(-10..10)$. For that we increase the grid to $20\times 20$

Again, you may notice that in some input ranges the generated images become unrecognizable

In [None]:
plot_interpolating( ae_decoder, n=20, z_range=(-10,10))

---
### e. Generate GIF

Now, just for fun, we combine the images saved after each training epoch into a GIF animation

In [None]:
base_dir = 'conv_ae'

show_gif(base_dir, 'conv_autoenc.gif')

---
---
# [Part 2] Variational AutoEncoder

As mentioned earlier, the basic idea behind a **Variational Autoencoder** is that, instead of mapping an input to a fixed vector, the input is mapped to a distribution. 

A variational autoencoder can be defined as an autoencoder whose training is regularised to avoid overfitting and to ensure that the latent space has good properties that enable a generative process.

<p align='center'>
<img src="https://i.ibb.co/k1hNtgs/vae3.png" width=65% />


Structurally, the bottleneck vector is replaced with two vectors: one representing the mean of the distribution and the other representing its spread (the standard deviation or, as in the code below, the log-variance). The latent vector is then sampled from that distribution.



---
## 0 - Random Sampling Function

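The function below is a helper that generates a random latent vector &nbsp;$z$&nbsp; from the input mean (`z_mean`) and log-variance (`z_log_var`). Since $\sigma = \exp(\tfrac{1}{2}\log\sigma^2)$, the sample is $z = \mu + \sigma \cdot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$.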

In [None]:
def sampling(args):
    
    z_mean, z_log_var = args

    batch = K.shape(z_mean)[0]
    dim = K.int_shape(z_mean)[1]

    # by default, random_normal has mean=0 and std=1.0
    epsilon = K.random_normal(shape=(batch, dim))

    return z_mean + K.exp(0.5 * z_log_var) * epsilon
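
The same reparameterization can be tried outside Keras. The sketch below is a plain-Python stand-in for `sampling` (the name `sample_z`, the seed, and the example parameters are illustrative assumptions): it draws many samples and checks empirically that they have roughly the requested mean and standard deviation.

```python
import math
import random


def sample_z(z_mean, z_log_var, rng):
    """Draw z = mean + std * eps with eps ~ N(0, 1), per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(z_mean, z_log_var)]


rng = random.Random(13)
z_mean = [1.0, -2.0]
z_log_var = [math.log(0.25), math.log(4.0)]   # std = 0.5 and 2.0

samples = [sample_z(z_mean, z_log_var, rng) for _ in range(20000)]

stats = []
for d in range(2):
    col = [s[d] for s in samples]
    mean = sum(col) / len(col)
    std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
    stats.append((mean, std))
    print(d, round(mean, 2), round(std, 2))
```

Because the randomness enters only through `eps`, the sample is a differentiable function of the mean and log-variance, which is what lets gradients flow back into the encoder.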
    

---
## 1 - VAE Encoder

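The encoder defines the approximate posterior distribution $q(z|x)$: it takes an observation as input and outputs the parameters (mean and log-variance) of a conditional distribution over the latent variable $z$. The decoder, defined in the next section, plays the role of the generative model $p(x|z)$. Additionally, we use a unit Gaussian prior $p(z)$ for the latent variable.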

<center>
<img src='https://i.ibb.co/4pbr5rL/vae4.png' width=40%>
</center>




Now to define the Encoder model

> <font color='red'>**EXERCISE:** </font> Complete the Encoder

The architecture is as follows:
<pre>
    * the input shape is 3-dimensional <font color='blue'><b>(28,28,1)</b></font>  
    * <b>conv</b> layer with <font color='blue'><b>16</b></font> filters of <font color='blue'><b>3x3</b></font>, using stride <font color='blue'><b>2</b></font>, and relu activation
    * <b>conv</b> layer with <font color='blue'><b>32</b></font> filters of <font color='blue'><b>3x3</b></font>, using stride <font color='blue'><b>2</b></font>, and relu activation
    * flatten layer
    * <b>dense</b> layer with <font color='blue'><b>16</b></font> neurons using relu activation

    * output 1 (mean): <b>dense</b> layer with <font color='blue'><b>2</b></font> neurons without activation
    * output 2 (log-variance): <b>dense</b> layer with <font color='blue'><b>2</b></font> neurons without activation
</pre>

In [None]:
input_shape = (image_size, image_size, 1)
latent_dim  = 2

# create Input() with shape of input_shape
v_encoder_input = Input(shape=input_shape, name='v_encoder_input')

# add conv2d layer to v_encoder_input with 16 filters, kernel size 3, and strides 2
# use relu activation and padding same
x = ??(v_encoder_input)

# add conv2d layer to x with 32 filters, kernel size 3, and strides 2
# use relu activation and padding same
x = ??

# add flatten layer to x 
x = ??

# add dense layer to x with 16 neurons and relu activation
x = ??

# ---

# first output: mean of the latent distribution Q(z|X)
z_mean    = Dense(latent_dim, name='z_mean')(x)

# second output: log-variance of the latent distribution Q(z|X)
z_log_var = Dense(latent_dim, name='z_log_var')(x)

# generate random sampling from z_mean and z_log_var 
z = Lambda(sampling, output_shape=(latent_dim,), name='z')([z_mean, z_log_var])

# combine all outputs into a single list
v_encoder_output = [z_mean, z_log_var, z]



---

Then to instantiate encoder model


In [None]:
vae_encoder = Model(v_encoder_input, v_encoder_output, name='vae_encoder')

vae_encoder.summary()

Visualize the network architecture

In [None]:
plot_model(vae_encoder, show_shapes=True, show_layer_names=False, dpi=65, rankdir='LR')

---
## 2 - VAE Decoder

Then we define the Decoder model exactly the same as the previous Vanilla Decoder

> <font color='red'>**EXERCISE:** </font> Complete the Decoder

The architecture is as follows:
<pre>
    * the input shape is 1-dimensional <font color='blue'><b>(2,)</b></font>  
    * <b>dense</b> layer with <font color='blue'><b>7*7*32</b></font> neurons using relu activation
    * <b>reshape</b> layer to convert input into <font color='blue'><b>(7,7,32)</b></font> 3-dimensional matrix
    * <b>conv2dtranspose</b> layer with <font color='blue'><b>32</b></font> filters of <font color='blue'><b>3x3</b></font>, using stride <font color='blue'><b>2</b></font>, and relu activation
    * <b>conv2dtranspose</b> layer with <font color='blue'><b>16</b></font> filters of <font color='blue'><b>3x3</b></font>, using stride <font color='blue'><b>2</b></font>, and relu activation
    * output <b>conv2dtranspose</b> layer with <font color='blue'><b>1</b></font> filter of <font color='blue'><b>3x3</b></font>, and <b>sigmoid</b> activation
</pre>

In [None]:
# create Input() with shape of latent_dim
# set the name as 'v_decoder_input'
v_decoder_input = Input(shape=(latent_dim,), name='v_decoder_input')


# add dense layer to v_decoder_input with 7*7*32 neurons and relu activation
x = ?? (v_decoder_input)

# add reshape layer to x to reshape back into (7, 7, 32)
x = ??

# add Conv2DTranspose to x with 32 filters, kernel size 3, and strides 2
# use relu activation and padding same
x = ??

# add Conv2DTranspose to x with 16 filters, kernel size 3, and strides 2
# use relu activation and padding same
x = ??


# add output layer to x using Conv2DTranspose layer with 1 filter,
# kernel size 3, sigmoid activation, and padding same
v_decoder_output = Conv2DTranspose(filters=1, kernel_size=3, activation='sigmoid', 
                                 padding='same', name='v_decoder_output')(x)


---

Then to instantiate decoder model


In [None]:
vae_decoder = Model(v_decoder_input, v_decoder_output, name='vae_decoder')

vae_decoder.summary()

Visualize the network architecture

In [None]:
plot_model(vae_decoder, show_shapes=True, show_layer_names=False, dpi=65, rankdir='LR')

---
## 3 - VAE Complete

Now instantiate VAE model = VEncoder + VDecoder

In [None]:
# call vae_encoder() with input v_encoder_input
# and keep the third output, the sampled latent vector z
v_encoded        = vae_encoder(v_encoder_input)[2]

# call vae_decoder() function with input v_encoded
v_decoder_output = vae_decoder(v_encoded)

# call Model() function with input v_encoder_input and v_decoder_output
vae = Model(v_encoder_input, v_decoder_output, name='variational_autoencoder')

vae.summary()

Visualize the network architecture

In [None]:
plot_model(vae, show_shapes=True, show_layer_names=False, dpi=55, rankdir='LR', expand_nested=True)

---
## 4 - Train Variational AutoEncoder

Next to train the Variational AutoEncoder

---
### a. Define Loss

<center>
<img src='https://i.ibb.co/bLLWkcy/vae5.png' width=50%>
</center>


\begin{align}
loss & = C||x-\hat{x}||^2 + KL[\mathcal{N}(\mu_x, \sigma_x), \mathcal{N}(0, I) ]\\\\
& = C||x-f(x)||^2 + KL[\mathcal{N}(g(x), h(x)), \mathcal{N}(0, I) ]\\\\
\end{align}

---

VAEs train by maximizing the evidence lower bound (ELBO) on the marginal log-likelihood:

$$\log p(x) \ge \text{ELBO} = \mathbb{E}_{q(z|x)}\left[\log \frac{p(x, z)}{q(z|x)}\right].$$

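Since the prior $p(z)$ is a unit Gaussian and $q(z|x)$ is a diagonal Gaussian, the KL term has a closed form. Writing $\mu(X)$ for the encoder's mean output (`z_mean`) and $\Sigma(X)$ for its log-variance output (`z_log_var`), we have
$$
D_{KL} = \frac{1}{2}\sum_k\Big(\exp(\Sigma_k(X))+\mu_k^2(X)-1-\Sigma_k(X)\Big)
$$

The total loss is then defined by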

    VAE loss = mse_loss + kl_loss
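
The closed-form KL term is easy to check by hand. The sketch below is a plain-Python illustration (the helper name `kl_to_unit_gaussian` is hypothetical) that evaluates $\frac{1}{2}\sum_k\big(\exp(\text{log\_var}_k)+\mu_k^2-1-\text{log\_var}_k\big)$ for a few latent distributions, taking the mean and log-variance vectors as inputs.

```python
import math


def kl_to_unit_gaussian(z_mean, z_log_var):
    """KL divergence from a diagonal Gaussian N(mean, exp(log_var)) to N(0, I)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(z_mean, z_log_var))


print(kl_to_unit_gaussian([0.0, 0.0], [0.0, 0.0]))  # 0.0
print(kl_to_unit_gaussian([1.0, 0.0], [0.0, 0.0]))  # 0.5
narrow = kl_to_unit_gaussian([0.0, 0.0], [-2.0, -2.0])
print(narrow > 0)  # True
```

The term is zero only when the encoder outputs exactly the prior (zero mean, unit variance); shifting the mean or shrinking the variance both increase it, which is what pulls the latent codes toward the origin.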


In [None]:
from tensorflow.keras.losses import mse

# calculate reconstruction loss
reconstruction_loss = mse(K.flatten(v_encoder_input), K.flatten(v_decoder_output))
reconstruction_loss *= 28 * 28

# calculate KL Loss
kl_loss = K.exp(z_log_var) + K.square(z_mean) - 1 - z_log_var
kl_loss = K.sum(kl_loss, axis=-1)
kl_loss *= 0.5

# calculate VAE Loss
vae_loss = K.mean(reconstruction_loss + kl_loss)

---
### b. Compile Model
Add loss to model and compile it

In [None]:
vae.add_loss(vae_loss)

vae.compile(optimizer='rmsprop')

---
### c. Train DC-VAE

In [None]:
batch_size  = 256
epochs      = 30

base_dir = 'var_ae'
tf.io.gfile.mkdir(base_dir)
myCallback = SaveImage(vae_decoder, base_dir)

logdir = os.path.join(base_dir, datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)

vae_hist = vae.fit(X_train, 
                   epochs=epochs,
                   batch_size=batch_size,
                   validation_data=(X_test, None),
                   callbacks = [myCallback, tensorboard_callback],
                   verbose=2
                   )


---
### d. Visualize Loss History

In [None]:
plot_history(vae_hist)

In [None]:
%tensorboard --logdir var_ae

---
## 5 - Visualize Generated Images

Now let's visualize the result

---
### a. Latent Space Distribution

Plots labels and MNIST digits as function of 2-dim latent vector

You should see that now the latent vector is more smoothly distributed across the space

In [None]:
batch_size=128
data_test = (X_test, y_test)

plot_latent_space(vae_encoder, data_test, batch_size=batch_size, vae=True)

---
### b. Randomly Generated

Now we skip directly to generate &nbsp;$5$&nbsp; new images from a random latent vector &nbsp;$z$&nbsp;

You should see that the result is better

The network is able to generate new image from random input

In [None]:
generate_image(vae_decoder)

But what happens if we widen the random vector range to $(-5..5)$?

You might see that the images are much better than with the Vanilla AutoEncoder, even though the network is not explicitly trained to generate images from that range

In [None]:
generate_image(vae_decoder, z_range=(-5,5))

You can try to generate a single image by running the cell below

You can use a predefined latent vector or generate a random one

In [None]:
z_sample = np.array([[0,0]])
X_decoded = vae_decoder.predict(z_sample)
digit = X_decoded.reshape(28, 28)
plt.imshow(digit, cmap='gray')
plt.axis('off')
plt.show()

---
### c. Plot Interpolation

Next, let's plot an interpolation of images decoded from a range of latent vectors

First we generate a $10\times 10$ grid of images over the range $(-1..1)$

You may see that the interpolation is smoother than with the plain autoencoder

In [None]:
plot_interpolating(vae_decoder, n=10)

Now let's try to widen the range to $(-5..5)$. For that we increase the grid to $20\times 20$

You might see that the interpolation still yields smooth images across the wider range

In [None]:
plot_interpolating( vae_decoder, n=20, z_range=(-5,5))

---
### d. Generate GIF

Now, just for fun, we combine the images saved after each training epoch into a GIF animation


In [None]:
base_dir = 'var_ae'

show_gif(base_dir, 'var_autoenc.gif')

---

# Congratulations

<font size=5> You've Completed Practical 5</font>

<p>Copyright &copy;  <a href=https://www.linkedin.com/in/andityaarifianto/>2020 - ADF</a> </p>