### Autoencoders 

One of the models used for generative modeling is the *autoencoder*. It consists of an **encoder**, which maps an item to a point in a latent space, and a **decoder**, which maps that latent representation back to the original space and tries to re-create the item. Once the model is trained, we can use the decoder alone to generate entirely new items by feeding it points from the latent space.

Encodings are also called *embeddings*.

In [13]:
import numpy as np
from tensorflow.keras import layers 
from tensorflow.keras import models
import tensorflow.keras.backend as K

In [1]:
from tensorflow.keras import datasets
(x_train, y_train), (x_test, y_test) = datasets.fashion_mnist.load_data()

In [6]:
x_train.shape

(60000, 28, 28)

In [9]:
def preprocess(imgs):
    imgs = imgs.astype("float32") / 255.0
    imgs = np.pad(imgs, ((0, 0), (2, 2), (2, 2)), constant_values=0.0)
    imgs = np.expand_dims(imgs, -1)
    return imgs 

x_train = preprocess(x_train)
x_test = preprocess(x_test)

In [10]:
x_train.shape

(60000, 32, 32, 1)
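To make the preprocessing step concrete, here is a minimal sketch that applies the same three operations to a single synthetic 28x28 image (the all-255 image is an illustrative assumption, not part of the dataset). Padding to 32x32 is presumably chosen so that three stride-2 layers halve the size evenly: 32, 16, 8, 4.

```python
import numpy as np

# One synthetic 28x28 "image" with every pixel at the maximum value (assumption
# for illustration only; the notebook uses Fashion-MNIST images instead).
imgs = np.full((1, 28, 28), 255, dtype=np.uint8)

scaled = imgs.astype("float32") / 255.0                    # pixel values in [0, 1]
padded = np.pad(scaled, ((0, 0), (2, 2), (2, 2)),          # 2 zero pixels on each side
                constant_values=0.0)
batch = np.expand_dims(padded, -1)                         # add a channel axis

print(batch.shape)           # (1, 32, 32, 1)
print(batch[0, :3, :3, 0])   # top-left corner: zero border, then the first image pixel
```

The batch axis is left unpadded (`(0, 0)`), so only the height and width grow.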

#### Autoencoder Architecture
The autoencoder consists of an *encoder* followed by a *decoder* and is trained to reconstruct its own input. Because the input has to pass through the low-dimensional encoding, the network learns a latent representation of the data.

In [16]:
encoder_input = layers.Input(
    shape=(32, 32, 1), name="encoder_input"
)
# We can think of each filter as capturing 
# a different set of characteristics 
x = layers.Conv2D(filters=32, kernel_size=(3, 3), strides=2, activation='relu', padding='same')(encoder_input)
x = layers.Conv2D(filters=64, kernel_size=(3, 3), strides=2, activation='relu', padding='same')(x)
x = layers.Conv2D(filters=128, kernel_size=(3, 3), strides=2, activation='relu', padding='same')(x)

shape_before_flattening = K.int_shape(x)[1:]

x = layers.Flatten()(x)
encoder_output = layers.Dense(2, name='encoder_output')(x)

encoder = models.Model(encoder_input, encoder_output)

In [18]:
encoder.summary()

Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 encoder_input (InputLayer)  [(None, 32, 32, 1)]       0         
                                                                 
 conv2d_3 (Conv2D)           (None, 16, 16, 32)        320       
                                                                 
 conv2d_4 (Conv2D)           (None, 8, 8, 64)          18496     
                                                                 
 conv2d_5 (Conv2D)           (None, 4, 4, 128)         73856     
                                                                 
 flatten_1 (Flatten)         (None, 2048)              0         
                                                                 
 encoder_output (Dense)      (None, 2)                 4098      
                                                                 
Total params: 96,770
Trainable params: 96,770
Non-trainable

In [17]:
print(shape_before_flattening)

(4, 4, 128)
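The shapes and parameter counts in the summary can be checked by hand. The sketch below assumes the usual rules: with `padding='same'` a strided convolution produces `ceil(size / stride)` outputs per spatial axis, and a convolution has `(k * k * in_channels + 1) * out_channels` parameters (the `+ 1` is the bias). The helper names are hypothetical.

```python
import math

def conv_same_out(size, stride):
    # Spatial output size of a convolution with padding='same'
    return math.ceil(size / stride)

def conv_params(kernel, in_ch, out_ch):
    # kernel x kernel weights per input channel, plus one bias, per filter
    return (kernel * kernel * in_ch + 1) * out_ch

size, in_ch, total = 32, 1, 0
for out_ch in (32, 64, 128):
    size = conv_same_out(size, 2)
    params = conv_params(3, in_ch, out_ch)
    total += params
    print((size, size, out_ch), params)
    in_ch = out_ch

flat = size * size * in_ch        # units after Flatten
dense_params = (flat + 1) * 2     # Dense layer to the 2-D latent space
total += dense_params
print(flat, dense_params, total)
```

The printed values match the rows of `encoder.summary()`, including the total of 96,770 parameters.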


In [None]:
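decoder_input = layers.Input(shape=(2,), name='decoder_input')

# Project the 2-D latent vector up to 4 * 4 * 128 = 2048 units,
# then reshape it to the encoder's last feature-map shape
x = layers.Dense(int(np.prod(shape_before_flattening)))(decoder_input)
x = layers.Reshape(shape_before_flattening)(x)
# Each stride-2 transposed convolution doubles the spatial size: 4 -> 8 -> 16 -> 32
x = layers.Conv2DTranspose(128, (3, 3), strides=2, activation='relu', padding='same')(x)
x = layers.Conv2DTranspose(64, (3, 3), strides=2, activation='relu', padding='same')(x)
x = layers.Conv2DTranspose(32, (3, 3), strides=2, activation='relu', padding='same')(x)
# A single sigmoid-activated channel keeps output pixels in [0, 1]
decoder_output = layers.Conv2D(1, (3, 3), strides=1, activation='sigmoid', padding='same', name='decoder_output')(x)

decoder = models.Model(decoder_input, decoder_output)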
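To tie the two halves together, here is a minimal, self-contained sketch of how an encoder and decoder like these might be chained and trained to reconstruct their input. Several things are assumptions rather than part of the notebook: the `adam` optimizer, the `binary_crossentropy` loss, the single epoch, and the random stand-in data `x_demo` (in the notebook you would pass `x_train` as both input and target).

```python
import numpy as np
from tensorflow.keras import layers, models

# Encoder: 32x32x1 image -> 2-D latent vector
enc_in = layers.Input(shape=(32, 32, 1), name="encoder_input")
x = enc_in
for filters in (32, 64, 128):
    x = layers.Conv2D(filters, (3, 3), strides=2, activation="relu", padding="same")(x)
shape_before_flattening = tuple(x.shape[1:])  # (4, 4, 128)
x = layers.Flatten()(x)
enc_out = layers.Dense(2, name="encoder_output")(x)
encoder = models.Model(enc_in, enc_out)

# Decoder: 2-D latent vector -> 32x32x1 image
dec_in = layers.Input(shape=(2,), name="decoder_input")
x = layers.Dense(int(np.prod(shape_before_flattening)))(dec_in)
x = layers.Reshape(shape_before_flattening)(x)
for filters in (128, 64, 32):
    x = layers.Conv2DTranspose(filters, (3, 3), strides=2, activation="relu", padding="same")(x)
dec_out = layers.Conv2D(1, (3, 3), strides=1, activation="sigmoid",
                        padding="same", name="decoder_output")(x)
decoder = models.Model(dec_in, dec_out)

# Autoencoder: the decoder applied to the encoder's output
autoencoder = models.Model(enc_in, decoder(enc_out))
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")  # assumed choices

# Stand-in data (assumption): random images with values in [0, 1]
rng = np.random.default_rng(0)
x_demo = rng.random((64, 32, 32, 1)).astype("float32")

# The input is also the target: the network learns to reconstruct it
autoencoder.fit(x_demo, x_demo, epochs=1, batch_size=32, verbose=0)

# Reconstruct existing images, and generate new ones from latent points alone
reconstructions = autoencoder.predict(x_demo[:4], verbose=0)
latent_points = rng.normal(size=(4, 2)).astype("float32")
generated = decoder.predict(latent_points, verbose=0)
print(reconstructions.shape, generated.shape)
```

Because `encoder`, `decoder`, and `autoencoder` share the same layers, training the combined model also trains the two parts, and the decoder can afterwards be called on its own.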