## Building the Sparse Autoencoder

Building a sparse autoencoder is the same as building a plain autoencoder, except that here we add a sparse regularizer to the encoder and decoder layers.


## Import the Libraries

First, let us import the necessary libraries:

In [1]:
import warnings
warnings.filterwarnings('ignore')

#modelling
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras import backend as K
from tensorflow.keras import regularizers
import tensorflow as tf
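# tf.logging was removed in TensorFlow 2.x; the compat.v1 path works in TF 1.13+ and 2.x
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)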

#plotting
import matplotlib.pyplot as plt
%matplotlib inline

#dataset
from keras.datasets import mnist
import numpy as np

Using TensorFlow backend.


## Prepare the Dataset

Let us load the MNIST dataset. An autoencoder is trained to reconstruct its own input, so we don't need the labels. We load only x_train for training and x_test for testing:

In [2]:
(x_train, _), (x_test, _) = mnist.load_data()


Normalize the data by dividing by the maximum pixel value, which is 255:


In [3]:
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
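
As a small illustration of this scaling step, the sketch below applies the same cast and division to a tiny synthetic array. The name `fake_pixels` is a made-up stand-in for raw image data, not part of the notebook:

```python
import numpy as np

# Hypothetical stand-in for a few raw pixels; MNIST images are stored as uint8 in [0, 255].
fake_pixels = np.array([[0, 128, 255]], dtype=np.uint8)

# Cast to float32 first, then divide, so the values land in [0.0, 1.0].
scaled = fake_pixels.astype('float32') / 255

print(scaled.dtype, scaled.min(), scaled.max())
```

Keeping the inputs in [0, 1] matches the range of the sigmoid output layer used later, so the reconstruction can in principle reach every target value.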


Flatten each 28x28 image into a 784-dimensional vector, so that each dataset becomes a 2D array:

In [4]:
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))


The data now has the following shape:


In [5]:
print(x_train.shape, x_test.shape)

((60000, 784), (10000, 784))
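
To see what the reshape does without downloading MNIST, here is a minimal sketch on a synthetic batch. The array `fake_images` is an assumption made for the example:

```python
import numpy as np

# Hypothetical batch of 5 blank "images" with the MNIST shape 28 x 28.
fake_images = np.zeros((5, 28, 28), dtype='float32')

# np.prod over the per-image dimensions gives 28 * 28 = 784 features.
n_features = int(np.prod(fake_images.shape[1:]))
flat = fake_images.reshape((len(fake_images), n_features))

# Passing -1 lets NumPy infer the same feature count.
flat_inferred = fake_images.reshape(len(fake_images), -1)

print(flat.shape, flat_inferred.shape)
```

Both forms produce one row per image and one column per pixel, which is the layout a `Dense` layer expects.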


## Define the Sparse Regularizer

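We know that the sparse regularizer is given as:

$ \beta \sum_{j=1}^{l^{(h)}} \left( \rho \log \frac{\rho}{\hat{\rho}_j} + (1-\rho) \log \frac{1-\rho}{1-\hat{\rho}_j} \right) $

Here $\rho$ is the target sparsity, $\hat{\rho}_j$ is the average activation of hidden unit $j$, $l^{(h)}$ is the number of hidden units, and $\beta$ is the weight of the penalty.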

Define the sparse regularizer:

In [6]:
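def sparse_regularizer(activation_matrix):
    p = 0.01   # target sparsity (rho)
    beta = 3   # weight of the sparsity penalty

    # average activation of each unit over the batch (rho_hat_j)
    p_hat = K.mean(activation_matrix, axis=0)

    # KL divergence per unit; note the explicit parentheses in (1-p)/(1-p_hat)
    KL_divergence = p*(K.log(p/p_hat)) + (1-p)*(K.log((1-p)/(1-p_hat)))

    # sum over the units and scale by beta
    return beta * K.sum(KL_divergence)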
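
To get a feel for how this penalty behaves, the following is a NumPy sketch of the same KL-divergence formula, independent of Keras. The function name `kl_sparsity_penalty` and the clipping step are assumptions added for the illustration:

```python
import numpy as np

def kl_sparsity_penalty(activations, rho=0.01, beta=3.0):
    """NumPy sketch of the KL sparsity penalty for a (batch, units) activation matrix."""
    # Average activation of each hidden unit over the batch.
    rho_hat = activations.mean(axis=0)
    # Clip to keep the logarithms finite (an added safeguard, not part of the formula).
    rho_hat = np.clip(rho_hat, 1e-7, 1 - 1e-7)
    kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
    return beta * kl.sum()

on_target = np.full((4, 3), 0.01)   # every unit averages the target sparsity
too_active = np.full((4, 3), 0.5)   # every unit is active half of the time

print(kl_sparsity_penalty(on_target))
print(kl_sparsity_penalty(too_active))
```

The penalty is essentially zero when the average activation of every unit equals the target sparsity, and it grows as the units become more active than the target.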

## Define the Encoder

Define the encoder, which takes the images as input and returns the encodings.

Define the size of the encodings:

In [7]:
encoding_dim = 200

Set the value of lambda, the coefficient of the L2 weight regularizer:


In [8]:
lambda_ = 0.001 

Define the shape of input:

In [9]:
input_img = Input(shape=(784,))

Define the encoder layer, which takes the images as input and returns the code:

In [10]:
encoder = Dense(encoding_dim,
                activation='sigmoid',
                kernel_regularizer=regularizers.l2(lambda_/2),
                activity_regularizer=sparse_regularizer)(input_img)


## Define the Decoder

Define the decoder, which takes the code returned by the encoder and returns the reconstructed input:

In [11]:
decoder = Dense(784,
                activation='sigmoid',
                kernel_regularizer=regularizers.l2(lambda_/2),
                activity_regularizer=sparse_regularizer)(encoder)
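
The two layers can be pictured as a pair of matrix multiplications followed by sigmoids. The sketch below mimics that forward pass in NumPy with random weights. The weight names, the random batch `x`, and the helper `sigmoid` are assumptions for illustration and are not the trained Keras layers:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical random weights with the same shapes as the two Dense layers.
W_enc, b_enc = rng.normal(scale=0.05, size=(784, 200)), np.zeros(200)
W_dec, b_dec = rng.normal(scale=0.05, size=(200, 784)), np.zeros(784)

x = rng.random((8, 784))               # stand-in batch of 8 flattened images in [0, 1)
code = sigmoid(x @ W_enc + b_enc)      # encoder: 784 -> 200
recon = sigmoid(code @ W_dec + b_dec)  # decoder: 200 -> 784

# L2 weight penalty as regularizers.l2(lambda_/2) computes it: (lambda_/2) * sum(W**2).
lambda_ = 0.001
l2_penalty = (lambda_ / 2) * (np.sum(W_enc ** 2) + np.sum(W_dec ** 2))

print(code.shape, recon.shape, l2_penalty)
```

During training, Keras adds the weight penalty and the activity penalty of each layer to the reconstruction loss, so all three terms are minimized together.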

## Build the Model

Now that we have defined the encoder and decoder layers, we define the model, which takes images as input and returns the output of the decoder layer, that is, the reconstructed image:

In [12]:
model = Model(input_img, decoder)

Let us look at the summary of the model:

In [13]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 200)               157000    
_________________________________________________________________
dense_1 (Dense)              (None, 784)               157584    
Total params: 314,584
Trainable params: 314,584
Non-trainable params: 0
_________________________________________________________________
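
The parameter counts in the summary can be checked by hand: a `Dense` layer has one weight per input-output pair plus one bias per output unit. A short sketch of that arithmetic:

```python
input_dim = 784
encoding_dim = 200

# Weights (inputs x outputs) plus biases (one per output unit).
encoder_params = input_dim * encoding_dim + encoding_dim
decoder_params = encoding_dim * input_dim + input_dim

print(encoder_params, decoder_params, encoder_params + decoder_params)
# → 157000 157584 314584
```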


Compile the model with mean squared error as the loss, and minimize the loss using the SGD optimizer:

In [14]:
model.compile(optimizer='sgd', loss='mse')
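
The `loss='mse'` argument in the compile call measures how far the reconstruction is from the input. As a rough NumPy sketch of that quantity (the helper name `mse_reconstruction_loss` and the toy arrays are assumptions for illustration):

```python
import numpy as np

def mse_reconstruction_loss(x, x_hat):
    """Mean squared error, averaged over features and then over the batch."""
    per_sample = np.mean((x - x_hat) ** 2, axis=-1)
    return per_sample.mean()

x = np.array([[0.0, 1.0], [1.0, 0.0]])
perfect = x.copy()               # reconstruction identical to the input
constant = np.full_like(x, 0.5)  # reconstruction that always outputs 0.5

print(mse_reconstruction_loss(x, perfect))   # → 0.0
print(mse_reconstruction_loss(x, constant))  # → 0.25
```

A perfect reconstruction gives zero loss, while a model that ignores its input and outputs a constant is penalized.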

Now, let us train the model.

Generally, we feed the data to the model as model.fit(x, y), where x is the input and y is the label. But since an autoencoder reconstructs its input, the input and the target of the model are the same. So we feed the data to the model as model.fit(x_train, x_train):

In [15]:
model.fit(x_train, x_train, epochs=10, batch_size=256, shuffle=True, validation_data=(x_test, x_test))

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f947d1c9690>

In the next section, we will learn about another regularized variant of autoencoders called contractive autoencoders.