## For the interested reader:

Although the current script is public, it is still a work in progress. I plan to add visualizations that make autoencoders easier to understand, and a well-developed summary after each implemented autoencoder section. I also plan to explore the latent space of the denoising deep autoencoder in more detail, and I still need to start the walkthrough for the variational autoencoder. So far, I have finished the script for a simple fully connected autoencoder, a deep autoencoder, and a denoising deep autoencoder. Lastly, please leave comments on what information and context I should add to improve this walkthrough on autoencoders.

# How do we implement Autoencoders? 


Autoencoders are often grouped with generative neural networks. Each autoencoder contains a latent space, which can learn relevant features in a representation that is either lower or higher dimensional than the input data. In this walkthrough, we always focus on latent spaces that are lower dimensional than the input data, so that we end up with low-dimensional data amenable to further human analysis. The main takeaway is that an autoencoder is a neural network that maps the input data back to itself. However, most successful autoencoders place constraints on this mapping. For example, a denoising autoencoder constrains the mapping by adding noise to the input data and then generating the original data (before the noise was added) as the output. Adding constraints is important for learning useful features in the latent space, because these constraints help keep f (the neural network mapping) from becoming an identity function.
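To make the identity-function concern concrete, here is a minimal NumPy sketch. It is not part of the original script: the random "images", the noise amplitude, and the helper names `identity_model` and `mse` are all assumptions chosen for illustration. It compares the reconstruction loss of a model that merely copies its input under the plain objective and under the denoising objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a batch of 4 flattened 28x28 images scaled to [0, 1].
x = rng.uniform(0.0, 1.0, size=(4, 784))
# Corrupted copies of the same images (uniform noise, amplitude picked for illustration).
x_noisy = x + rng.uniform(0.0, 0.35, size=x.shape)


def identity_model(inputs):
    """A 'network' that has learned nothing except to copy its input."""
    return inputs


def mse(prediction, target):
    """Mean squared error over all pixels."""
    return float(np.mean((prediction - target) ** 2))


plain_loss = mse(identity_model(x), x)          # plain objective: input -> input
denoise_loss = mse(identity_model(x_noisy), x)  # denoising objective: noisy -> clean

print(f"identity on clean input: {plain_loss:.4f}")
print(f"identity on noisy input: {denoise_loss:.4f}")
```

Copying the input is a perfect solution to the plain objective but not to the denoising one, which is why the added noise pushes the network toward learning something about the data.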

We will walk through various autoencoders: 

    1.) Fully connected autoencoder
    2.) Deep Convolutional Autoencoder
    3.) Denoising Convolutional Autoencoder: Added noise to input image + convolutional operations
    4.) Variational Autoencoder (to be completed ...): Assume the latent space follows a Gaussian distribution and constrain it accordingly
    
    
Autoencoders not only lower the dimensionality of the input data and extract important features within the latent space; they can also generate completely new data by exploring the latent space and using the decoder end of the network to output data.
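As a rough sketch of what "using the decoder end of the network" can look like in Keras, the snippet below splits a tiny stand-in autoencoder into an encoder and a decoder. The model here is untrained and every name in it (`autoencoder_demo`, `encoder_demo`, `decoder_demo`, the layer names, and `latent_dim = 2`) is hypothetical, so the decoded output is meaningless. Only the wiring is the point.

```python
import numpy as np
from tensorflow.keras import Input, Model, layers

latent_dim = 2  # hypothetical latent size, chosen only for illustration

# A tiny stand-in autoencoder; in practice this would be a trained model.
inputs = Input(shape=(784,))
code = layers.Dense(latent_dim, activation="relu", name="latent")(inputs)
outputs = layers.Dense(784, activation="sigmoid", name="reconstruction")(code)
autoencoder_demo = Model(inputs, outputs)

# Encoder: input image -> latent vector.
encoder_demo = Model(inputs, code)

# Decoder: latent vector -> image. Reusing the autoencoder's output layer
# means the decoder shares its weights with the full model.
latent_inputs = Input(shape=(latent_dim,))
decoded = autoencoder_demo.get_layer("reconstruction")(latent_inputs)
decoder_demo = Model(latent_inputs, decoded)

# Pick arbitrary points in the latent space and decode them into "images".
z = np.array([[0.0, 0.0], [1.0, 2.0]], dtype="float32")
generated = decoder_demo.predict(z, verbose=0)
print(generated.shape)
```

With a trained model, sweeping `z` over a grid of latent values is one simple way to see what the decoder has learned to produce.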

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
import numpy as np 
import pandas as pd
import os

In [None]:
import tensorflow as tf
print('TensorFlow version:', tf.__version__)
device_name = tf.test.gpu_device_name()
# report the GPU only when one was actually found
if "GPU" not in device_name:
    print("GPU device not found")
else:
    print('Found GPU at: {}'.format(device_name))

In [None]:
# tf.test.is_gpu_available() is deprecated; list the visible GPU devices instead
tf.config.list_physical_devices('GPU')

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.metrics import Precision, Recall, AUC
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Dropout, Input, Flatten,BatchNormalization,Activation
from tensorflow.keras.layers import GlobalMaxPooling2D, GlobalAveragePooling2D
from tensorflow.keras.callbacks import ModelCheckpoint, Callback, EarlyStopping

In [None]:
main_path = '../input/digit-recognizer/'
train_df = pd.read_csv(main_path + 'train.csv')
test_df = pd.read_csv(main_path + 'test.csv')

### Prep the data:

In [None]:
i = 3*32
img_size = (28*28,)
X, y = np.zeros((i,) + img_size, dtype = "float32"), np.zeros((i,) + img_size, dtype = "float32")
for sample in range(i):
    img = train_df.iloc[sample,1:]
    X[ sample,:] = img
    y[ sample,:] = img

In [None]:
img.shape, X.shape, X.reshape(96,784,).shape

In [None]:
# split the training and testing dataframes    
def split_train_DF(df, train_perc):
    # train_perc --> the percentage in the training set
    final_train_df = df.iloc[0:round(train_perc*len((df.label))),:]
    val_df = df.iloc[round(train_perc*len((df.label))):,:]
    
    return final_train_df, val_df    


# sanity check: verifies that the split was done correctly and
# that no data was lost
def verify_traintest_split(dataset, train_set, test_set):
    
    Total, train_A, train_B = len(dataset), len(train_set), len(test_set)
    if Total == (train_A + train_B): print('Splitting the dataset into testing and training is successful ...\n')
    else: print('Splitting the dataset into testing and training failed ...')
    return

In [None]:
from sklearn.model_selection import KFold
import matplotlib.pyplot as plt


train_perc = 0.8
new_train_df, val_df = split_train_DF(train_df, train_perc)
# check if the dataframe was properly split ... 
verify_traintest_split(train_df, new_train_df, val_df)
column_names = train_df.columns

In [None]:
from tensorflow.keras import layers
from tensorflow.keras.models import Model




def get_model(latent_space = 32):

    input_img = Input(shape = (28*28,))
    encoded = Dense(latent_space, activation = 'relu')(input_img)
    decoded = Dense(28*28, activation = 'sigmoid')(encoded)
    return Model(input_img, decoded)


perc_autoencoder = get_model()
opt = tf.keras.optimizers.SGD(learning_rate=0.0100000231231)
perc_autoencoder.compile(optimizer=opt, loss='binary_crossentropy')
perc_autoencoder.summary()

In [None]:
train_df.iloc[:,0]

In [None]:
from tqdm import tqdm 
from sklearn.model_selection import train_test_split
import numpy as np


X = np.zeros((len(train_df), 28*28,), dtype='float32')
y = np.zeros((len(train_df), 1), dtype='float32')

for sample in tqdm(range(len(train_df))):
    X[sample,:] = train_df.iloc[sample,1:].values
    X[sample,:] *= 1/255.0
    y[sample,0] = train_df.iloc[sample, 0]
    
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size = 0.20, random_state = 42)

In [None]:
train_history = perc_autoencoder.fit(X_train,X_train,  validation_data = (X_val,X_val), 
                     epochs = 200, batch_size = 250, verbose = 2, shuffle = True)

In [None]:
plt.figure(figsize = (12,7))
plt.plot(train_history.history['loss'], 'r', linewidth = 3, alpha = 0.45, label = 'training loss')
plt.plot(train_history.history['val_loss'], 'b', linewidth = 3, alpha = 0.45, label = 'validation loss')
plt.xlabel('Epochs', fontsize = 18)
plt.ylabel('Loss', fontsize = 18)
plt.title('Comparing  loss between training and validation datasets', fontsize = 18, fontweight = 'bold')
plt.legend(fontsize = 18)
plt.show()

# Deep Convolutional Autoencoder: 

Earlier, we showed how to construct a shallow autoencoder and train it on MNIST image data represented as vectors. We achieved a poor loss of ~0.3 on both the training and validation data. This poor learning is mainly due to the autoencoder's lack of depth and, more importantly, the lack of convolutional layers in the architecture. Convolutional layers allow the network to learn higher-level spatial features, drastically improving the representations found in the latent space of the autoencoder. So in the next example, we will construct a deep convolutional autoencoder with a latent space that is amenable to exploration and feature extraction.
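One caveat when comparing loss values across the two models: the shallow autoencoder above is compiled with binary cross-entropy, while the convolutional one below uses mean squared error. Binary cross-entropy does not reach zero even for a perfect reconstruction when target pixels lie strictly between 0 and 1, so part of the gap may come from the loss functions themselves. The sketch below illustrates this with made-up pixel values. The helper `binary_crossentropy` is a plain NumPy stand-in written for this example, not the Keras implementation.

```python
import numpy as np


def binary_crossentropy(target, prediction, eps=1e-7):
    """Mean binary cross-entropy, with predictions clipped away from 0 and 1."""
    prediction = np.clip(prediction, eps, 1.0 - eps)
    losses = -(target * np.log(prediction)
               + (1.0 - target) * np.log(1.0 - prediction))
    return float(np.mean(losses))


# Made-up pixel intensities: a few grey values among black and white ones.
pixels = np.array([0.0, 0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 0.0])

# A "perfect" reconstruction: the prediction equals the target exactly.
perfect_bce = binary_crossentropy(pixels, pixels)
perfect_mse = float(np.mean((pixels - pixels) ** 2))

print(f"BCE of a perfect reconstruction: {perfect_bce:.4f}")
print(f"MSE of a perfect reconstruction: {perfect_mse:.4f}")
```

Comparing the models fairly would mean evaluating both with the same metric, for example by computing the mean squared error of the shallow model's reconstructions.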

Before we continue with implementing a deep convolutional autoencoder, let's free up some memory within our RAM ...

In [None]:
import gc
import psutil

# Free up space
process = psutil.Process(os.getpid())
print('Before deleting training data:', process.memory_info().rss)
del X, X_train, X_val; gc.collect()
print('After deleting training data:', process.memory_info().rss)

Here we convert the dataframe columns holding the pixel values into tensors, representing 2D image data rather than 1D vectors.
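The same conversion can also be done without a Python loop. The sketch below uses a small synthetic array as a stand-in for the pixel columns (in the notebook this would be something like `train_df.iloc[:, 1:].to_numpy()`, which is an assumption about how one might extract them) and checks that a single reshape gives the same tensor as the per-sample loop.

```python
import numpy as np

# Stand-in for the pixel columns: 5 images stored as rows of 784 values.
flat_pixels = np.arange(5 * 784, dtype="float32").reshape(5, 784)

# Loop version, mirroring the per-sample approach used in this notebook.
looped = np.zeros((5, 28, 28, 1), dtype="float32")
for sample in range(len(flat_pixels)):
    looped[sample, :, :, 0] = flat_pixels[sample].reshape(28, 28)

# Vectorised version: one reshape for the whole array.
# -1 lets NumPy infer the number of images.
vectorised = flat_pixels.reshape(-1, 28, 28, 1)

print(vectorised.shape, np.array_equal(looped, vectorised))
```

On the full training set the vectorised form avoids tens of thousands of `iloc` lookups, which is usually where most of the time goes.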

In [None]:
X = np.zeros((len(train_df), 28, 28, 1))
y = np.zeros((len(train_df), 1))
for sample in tqdm(range(len(train_df))):
    X[sample,:,:,0] = train_df.iloc[sample,1:].values.reshape(28,28,)
    y[sample,0] = train_df.iloc[sample,0]
print('Final shape:', X.shape, y.shape)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size = 0.20, random_state = 42)

Here we construct the model architecture.

In [None]:
from tensorflow.keras import layers
from tensorflow.keras.models import Model


def gen_DCNN_autoencoder():
    input_layer = Input(shape = (28,28,1))

    x = layers.Conv2D(8, (3,3), padding = 'same', activation = 'relu')(input_layer)
    b = layers.BatchNormalization()(x)
    x = layers.Conv2D(16, (3,3), padding = 'same', activation = 'relu')(b)
    b = layers.BatchNormalization()(x)
    mp = layers.MaxPooling2D((2,2), padding = 'same')(b)
    x = layers.Conv2D(32, (3,3), padding = 'same', activation = 'relu')(mp)
    b = layers.BatchNormalization()(x)
    x = layers.Conv2D(32, (3,3), padding = 'same', activation = 'relu')(b)
    b = layers.BatchNormalization()(x)    

    encoder = layers.MaxPooling2D((2,2), padding = 'same', name = 'encoding_z-space')(b)


    t = layers.Conv2DTranspose(32, (3,3), strides = (2,2),  padding = 'same', activation = 'relu')(encoder)
    b = layers.BatchNormalization()(t)
    t = layers.Conv2DTranspose(32, (3,3), padding = 'same', activation = 'relu')(b)
    b = layers.BatchNormalization()(t)
    t = layers.Conv2DTranspose(16, (3,3),  strides = (2,2), padding = 'same', activation = 'relu')(b)
    b = layers.BatchNormalization()(t)
    t = layers.Conv2DTranspose(8, (3,3), padding = 'same', activation = 'relu')(b)

    decoded = layers.Conv2D(1, (3,3), padding = 'same', activation = 'sigmoid')(t)

    autoencoder = Model(input_layer, decoded)
    autoencoder.compile(optimizer = 'adam', loss = 'mean_squared_error')

    return autoencoder

In [None]:
# normalize the data so that pixel values lie in [0., 1.]
X_train *= 1/255.0
X_val *= 1/255.0

In [None]:
autoencoder = gen_DCNN_autoencoder()
history = autoencoder.fit(X_train, X_train, validation_data = (X_val,X_val), epochs = 30, batch_size = 250, shuffle=True, verbose = 2)

With our loss results on the training and validation datasets, we should be very impressed. However, for autoencoders in general, this is not necessarily a good sign, because our neural network is potentially acting only as an identity function that maps the input image back to itself. This is an issue since we lose the opportunity to learn important feature representations in the latent space, and these feature representations could otherwise be extracted to help us build better classification models. To avoid these issues, we need to add extra constraints to the autoencoder; e.g., we can always decrease the complexity of the model. While this is a plausible approach, we will instead decrease the size of the latent space and also implement a denoising autoencoder in the next section.
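A quick way to check the identity-function concern is to count how many values the encoding layer of `gen_DCNN_autoencoder` holds. The arithmetic below assumes the architecture exactly as defined above: two `MaxPooling2D((2,2), padding='same')` layers and 32 filters in the last encoder convolution. The helper `pooled_size` is introduced here only for illustration.

```python
import math


def pooled_size(size, pool=2):
    """Output length of MaxPooling2D(pool, padding='same') with default strides."""
    return math.ceil(size / pool)


side = 28
for _ in range(2):  # two MaxPooling2D layers in the encoder
    side = pooled_size(side)

channels = 32  # filters in the last encoder convolution
latent_values = side * side * channels
input_values = 28 * 28 * 1

print(f"encoding shape: ({side}, {side}, {channels}) -> {latent_values} values")
print(f"input shape:    (28, 28, 1) -> {input_values} values")
```

Under these assumptions the encoding holds more values than the input image itself, so nothing in the architecture forces the network to compress.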

In [None]:
plt.figure(figsize = (25,5))
plt.plot(history.history['loss'], 'r', linewidth = 3, alpha = 0.45, label = 'training loss')
plt.plot(history.history['val_loss'], 'b', linewidth = 3, alpha = 0.45, label = 'validation loss')
plt.xlabel('Epochs', fontsize = 18)
plt.ylabel('Loss', fontsize = 18)
plt.title('Comparing loss between training and validation datasets', fontsize = 18, fontweight = 'bold')
plt.legend(fontsize = 18)
plt.show()

Below, we juxtapose the original input images with the corresponding images generated by the neural network. We can see that the autoencoder generates a lower-intensity image compared to the original. Overall, however, the original and generated images are essentially identical in terms of their spatial features.
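The "lower intensity" observation can also be put into numbers. The helper below is a hypothetical sketch, not part of the original script: `reconstruction_report` and the synthetic `dimmed` batch are made up for illustration. It returns the per-image mean squared error and the ratio of mean pixel intensity between reconstruction and original.

```python
import numpy as np


def reconstruction_report(original, reconstructed):
    """Per-image MSE and mean-intensity ratio for batches shaped (n, 28, 28, 1)."""
    axes = tuple(range(1, original.ndim))  # average over everything but the batch axis
    per_image_mse = np.mean((original - reconstructed) ** 2, axis=axes)
    # Small constant guards against division by zero for all-black images.
    intensity_ratio = reconstructed.mean(axis=axes) / (original.mean(axis=axes) + 1e-8)
    return per_image_mse, intensity_ratio


rng = np.random.default_rng(0)
original = rng.uniform(0.0, 1.0, size=(3, 28, 28, 1))
dimmed = 0.8 * original  # synthetic "reconstruction": same shapes, lower intensity

per_image_mse, intensity_ratio = reconstruction_report(original, dimmed)
print(per_image_mse.shape, intensity_ratio)
```

In the notebook, one could call it as `reconstruction_report(X_val[:10], autoencoder.predict(X_val[:10]))`; a ratio below 1 would confirm that the reconstructions are dimmer than the originals.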

In [None]:
for sample in range(10):
    plt.figure(figsize = (5,5))
    plt.subplot(1,2,1)
    plt.imshow(X_val[sample].reshape(28,28,), cmap = 'gray')
    plt.title("Original image", fontsize = 18, fontweight = 'bold')
    plt.axis('off')

    plt.subplot(1,2,2)
    plt.imshow(autoencoder.predict(X_val[sample].reshape(1,28,28,1)).reshape(28,28,), cmap = 'gray')
    plt.title("Model prediction", fontsize = 18, fontweight = 'bold')
    plt.axis('off')
    plt.tight_layout()
    plt.show()

Below, we generate an image whose pixels are all drawn from a uniform distribution on [0,1] and compare it with the image the autoencoder generates from it. Even though the output is nonsense, it is interesting to see how drastically the output changes compared to the input.

In [None]:
noise = np.random.uniform(0,1,(28,28,1))
plt.figure(figsize=(12,5))

plt.subplot(1,2,1)
plt.imshow(noise.reshape(28,28,), cmap = 'gray')
plt.title('Uniform distribution between [0,1]', fontsize = 18, fontweight = 'bold')
plt.axis('off')

plt.subplot(1,2,2)
plt.imshow(autoencoder.predict(noise.reshape(1,28,28,1)).reshape(28,28,), cmap = 'gray')
plt.title('Model predicted image', fontsize = 18, fontweight = 'bold')
plt.axis('off')

plt.tight_layout()
plt.show()

# Denoising Deep Autoencoder: 

Here, we add constraints to the previous Deep Autoencoder in two ways: 
    1. Decrease the size of the latent space
    2. Add noise to the input images (the code below draws it from a uniform distribution) and let the neural network generate the original image
    
The benefit of decreasing the latent space is that we lower the dimensionality of the feature space, which allows for additional analysis with models that are much more amenable to humans than deep neural networks, e.g., a logistic regression or an SVM. Additionally, adding noise to an input image and learning a mapping function that generates the original image helps avoid the trivial solution, where the neural network acts as an identity function.
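The noise step can be sketched in a few lines of NumPy. This is an alternative to the per-image `cv.randu` loop used later in this notebook, not a description of it: the function name `add_uniform_noise`, the seed, and especially the optional clipping back to [0, 1] are assumptions introduced here (the notebook's own noise cells do not clip).

```python
import numpy as np


def add_uniform_noise(images, high=0.35, seed=None, clip=True):
    """Add noise drawn uniformly from [0, high) to every pixel of a batch."""
    rng = np.random.default_rng(seed)
    noisy = images + rng.uniform(0.0, high, size=images.shape)
    # Clipping keeps the noisy inputs in the same [0, 1] range as the targets.
    return np.clip(noisy, 0.0, 1.0) if clip else noisy


# Two synthetic "images": black background with a white square in the middle.
clean = np.zeros((2, 28, 28, 1))
clean[:, 10:18, 10:18, 0] = 1.0

noisy = add_uniform_noise(clean, high=0.35, seed=0)
print(noisy.shape, noisy.min(), noisy.max())
```

The network is then trained with `noisy` as the input and `clean` as the target, so copying the input no longer minimises the loss.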

In [None]:

def Denoise_autoencoder(z):
    input_layer = Input(shape = (28,28,1))

    x = layers.Conv2D(8, (3,3), padding = 'same', activation = 'relu')(input_layer)
    b = layers.BatchNormalization()(x)
    mp = layers.MaxPooling2D((2,2), padding = 'same')(b)
    x = layers.Conv2D(16, (3,3), padding = 'same', activation = 'relu')(mp)
    b = layers.BatchNormalization()(x)
    # NOTE: this pooling output is never used; the next convolution reads `b` directly
    mp = layers.MaxPooling2D((2,2), padding = 'same')(b)
    x = layers.Conv2D(16, (3,3), padding = 'same', activation = 'relu')(b)
    b = layers.BatchNormalization()(x)    
    mp = layers.MaxPooling2D((2,2), padding = 'same')(b)
    x = layers.Conv2D(32, (3,3), padding = 'same', activation = 'relu')(mp)
    b = layers.BatchNormalization()(x)    
    mp = layers.MaxPooling2D((2,2), padding = 'same')(b)
    x = layers.Conv2D(64, (3,3), padding = 'same', activation = 'relu')(mp)
    b = layers.BatchNormalization()(x)    
    mp = layers.MaxPooling2D((2,2), padding = 'same')(b)
    x = layers.Conv2D(z, (3,3), padding = 'same', activation = 'relu')(mp)
    b = layers.BatchNormalization()(x)    
    
    
    encoder = layers.MaxPooling2D((2,2), padding = 'same', name = 'encoding_z-space')(b)
    

    t = layers.Conv2DTranspose(64, (3,3), strides = (2,2),  padding = 'valid', activation = 'relu')(encoder)
    b = layers.BatchNormalization()(t)
    t = layers.Conv2DTranspose(32, (3,3),  strides = (2,2), padding = 'valid', activation = 'relu')(b)
    b = layers.BatchNormalization()(t)
    t = layers.Conv2DTranspose(16, (3,3),  strides = (2,2), padding = 'same', activation = 'relu')(b)
    b = layers.BatchNormalization()(t)
    t = layers.Conv2DTranspose(16, (3,3),  strides = (2,2), padding = 'same', activation = 'relu')(b)
    b = layers.BatchNormalization()(t)
    t = layers.Conv2DTranspose(8, (3,3), padding = 'same', activation = 'relu')(b)
    b = layers.BatchNormalization()(t)

    
    
    decoded = layers.Conv2D(1, (3,3), padding = 'same', activation = 'sigmoid')(b)

    opt = tf.keras.optimizers.Adam(learning_rate=0.000100000231231)
    autoencoder = Model(input_layer, decoded)
    autoencoder.compile(optimizer = opt, loss = 'mean_squared_error')

    return autoencoder
denoise_ae = Denoise_autoencoder(3)
denoise_ae.summary()

In [None]:
import cv2 as cv
import random

sample = random.randint(0, 3000)
img = X_train[sample,:,:,0].copy()
noise_uni = cv.randu(img,(0),(1))

# Present the before and after adding noise ...
plt.imshow(X_train[sample,:,:,0].reshape(28,28,), cmap='gray')
plt.show()
plt.imshow((X_train[sample,:,:,0] + noise_uni), cmap='gray')
plt.show()

In [None]:
X_train_noise, X_val_noise = np.zeros((len(X_train), 28,28,1)), np.zeros((len(X_val), 28, 28, 1))

for sample in tqdm(range(len(X_train))):
    
    img = X_train[sample,:,:,0].copy()
    noise_img = cv.randu(img,(0),(0.35))
    X_train_noise[sample,:,:,0] =  X_train[sample,:,:,0] + noise_img
    
    if sample < len(X_val):
        img2 = X_val[sample,:,:,0].copy()
        noise_img2 = cv.randu(img2,(0),(0.35))
        X_val_noise[sample,:,:,0] =  X_val[sample,:,:,0] + noise_img2

In [None]:
plt.imshow(X_val[0,:,:,:].reshape(28,28,))
plt.show()
plt.imshow(X_val_noise[0,:,:,:].reshape(28,28,))
plt.show()

plt.imshow(X_train[0,:,:,:].reshape(28,28,))
plt.show()
plt.imshow(X_train_noise[0,:,:,:].reshape(28,28,))
plt.show()

In [None]:
denoise_ae = Denoise_autoencoder(3)
denoise_ae_history = denoise_ae.fit(X_train_noise, X_train, validation_data = (X_val_noise,X_val), epochs = 30, batch_size = 32, shuffle=True, verbose = 2)

In [None]:
# random.randint includes both endpoints, so stop at the last valid index
alpha = random.randint(0, len(X_val) - 1)

plt.figure(figsize = (15,5))
plt.subplot(1,3,1)
plt.imshow(X_train[alpha,:,:,:].reshape(28,28,))
plt.title('original image')
plt.axis('off')
plt.subplot(1,3,2)
plt.imshow(X_train_noise[alpha,:,:,:].reshape(28,28,))
plt.title('+ noise')
plt.axis('off')
plt.subplot(1,3,3)
plt.imshow(denoise_ae.predict(X_train_noise[alpha,:,:,:].reshape(1,28,28,1)).reshape(28,28,))
plt.title('generated image')
plt.axis('off')
plt.show()



plt.figure(figsize = (15,5))
plt.subplot(1,3,1)
plt.imshow(X_val[alpha,:,:,:].reshape(28,28,))
plt.title('original image')
plt.axis('off')
plt.subplot(1,3,2)
plt.imshow(X_val_noise[alpha,:,:,:].reshape(28,28,))
plt.title('+ noise')
plt.axis('off')
plt.subplot(1,3,3)
plt.imshow(denoise_ae.predict(X_val_noise[alpha,:,:,:].reshape(1,28,28,1)).reshape(28,28,))
plt.title('generated image')
plt.axis('off')
plt.show()

With the additional constraints (lower-dimensional latent space and added noise), our neural network still manages to generate the majority of the digits. Changing the structure of the network and the hyperparameters could help improve the overall results, but here we will move on and explore the latent space.

Below we construct the encoder, which takes input images and outputs low-dimensional vector representations; together these vectors make up the latent space. We will also construct a decoder model, which takes that same low-dimensional vector representation as input and outputs an image of the corresponding label. We can also explore this latent space by generating new vectors and seeing what the decoder generates on the other side.
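To see why the encoding ends up as a single vector of length `z`, it helps to trace the spatial size through `Denoise_autoencoder`. The sketch below assumes the function as written above, where five pooling layers sit on the path from the input to the encoding and the decoder uses two `'valid'` and three `'same'` transposed convolutions. The helpers `pool_same` and `conv_transpose` are illustrative stand-ins for the Keras shape rules, not Keras functions.

```python
import math


def pool_same(size, pool=2):
    """Output length of MaxPooling2D(pool, padding='same') with default strides."""
    return math.ceil(size / pool)


def conv_transpose(size, kernel=3, stride=1, padding="same"):
    """Output length of Conv2DTranspose along one axis (no output_padding)."""
    if padding == "same":
        return size * stride
    return (size - 1) * stride + kernel  # 'valid', assuming kernel >= stride


# Encoder: spatial size after each pooling layer on the path to the encoding.
side = 28
encoder_sides = [side]
for _ in range(5):
    side = pool_same(side)
    encoder_sides.append(side)

# Decoder: (stride, padding) of each Conv2DTranspose layer, in order.
decoder_sides = [side]
for stride, padding in [(2, "valid"), (2, "valid"), (2, "same"), (2, "same"), (1, "same")]:
    side = conv_transpose(side, stride=stride, padding=padding)
    decoder_sides.append(side)

print("encoder:", encoder_sides)
print("decoder:", decoder_sides)
```

The encoder reaches a 1x1 spatial size, so with `z = 3` the encoding has shape `(1, 1, 3)` per image. That is why the latent vector is read out with `[0][0][0]` on the prediction for a single image.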

In [None]:

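# select the encoding layer by name rather than by its position in the layer list
encoder = Model(denoise_ae.input, denoise_ae.get_layer('encoding_z-space').output)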
encoder.summary()

In [None]:

train_latent_space = np.zeros((500, 3))
val_latent_space = np.zeros((500, 3))
for sample in tqdm(range(500)):
    train_latent_space[sample,:] = encoder.predict(X_train[sample,:,:,0].reshape(1,28,28,1))[0][0][0]
    val_latent_space[sample,:] = encoder.predict(X_val[sample,:,:,0].reshape(1,28,28,1))[0][0][0]

In [None]:


fig = plt.figure(figsize = (10, 7)) 
ax = plt.axes(projection ="3d") 
  
# Creating plot 
ax.scatter3D(train_latent_space[:,0], train_latent_space[:,1], train_latent_space[:,2], alpha = 0.4); 
ax.scatter3D(val_latent_space[:,0], val_latent_space[:,1], val_latent_space[:,2], alpha = 0.4); 
plt.title("simple 3D scatter plot") 
# show plot 
plt.show()

plt.figure(figsize = (12,3))
plt.subplot(1,3,1)
plt.scatter(train_latent_space[:,0], train_latent_space[:,1], alpha = 0.4)
plt.scatter(val_latent_space[:,0], val_latent_space[:,1], alpha = 0.4)
#plt.scatter(train_latent_space_8[:,0], train_latent_space_8[:,1])
plt.subplot(1,3,2)
plt.scatter(train_latent_space[:,0], train_latent_space[:,2], alpha = 0.4)
plt.scatter(val_latent_space[:,0], val_latent_space[:,2], alpha = 0.4)

#plt.scatter(train_latent_space_8[:,0], train_latent_space_8[:,2])
plt.subplot(1,3,3)
plt.scatter(train_latent_space[:,1], train_latent_space[:,2], alpha = 0.4)
plt.scatter(val_latent_space[:,1], val_latent_space[:,2], alpha = 0.4)

#plt.scatter(train_latent_space_8[:,1], train_latent_space_8[:,2])
plt.show()