Autoencoders are artificial neural networks capable of learning efficient representations of the input data, called codings, without any supervision (i.e., the training set is unlabeled).

Autoencoders act as powerful feature detectors, and they can be used for unsupervised pretraining of deep neural networks.

Some autoencoders are also generative models: they can randomly generate new data that looks very similar to the training data.

For example, you could train an autoencoder on pictures of faces, and it would then be able to generate new faces.



Autoencoders work by simply learning to copy their inputs to their outputs. This may sound like a trivial task, but constraining the network in various ways can make it rather difficult. For example, you can limit the size of the internal representation, or you can add noise to the inputs and train the network to recover the original inputs. These constraints prevent the autoencoder from trivially copying the inputs directly to the outputs, which forces it to learn efficient ways of representing the data.
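
As a rough illustration of the noise constraint, the sketch below corrupts a batch of inputs with Gaussian noise while keeping the clean batch as the training target. The names `add_gaussian_noise` and `noise_level` are made up for this example, and NumPy stands in for the actual training framework.

```python
import numpy as np

def add_gaussian_noise(X, noise_level=0.1, seed=0):
    """Return a noisy copy of X; the clean X stays the reconstruction target."""
    rng = np.random.default_rng(seed)
    return X + noise_level * rng.standard_normal(X.shape)

X_clean = np.linspace(0.0, 1.0, 12).reshape(4, 3)   # 4 instances, 3 features
X_noisy = add_gaussian_noise(X_clean, noise_level=0.1)

# The network would be fed X_noisy, while the loss would compare
# its outputs to X_clean.
print(X_noisy.shape)
```

Because the inputs and the targets now differ, copying the inputs straight to the outputs no longer minimizes the loss.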

Autoencoders can be used for dimensionality reduction, feature extraction, unsupervised pretraining or as generative models.

An autoencoder is always composed of two parts: an encoder (or recognition network) that converts the inputs to an internal representation, followed by a decoder (or generative network) that converts the internal representation to the outputs.

In an autoencoder, the number of neurons in the output layer must be equal to the number of inputs. The outputs are often called the reconstructions, since the autoencoder tries to reconstruct the inputs, and the cost function contains a reconstruction loss that penalizes the model when the reconstructions differ from the inputs.
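
A minimal sketch of such a reconstruction loss, assuming the mean squared error is used as the penalty. The helper name `reconstruction_mse` is hypothetical.

```python
import numpy as np

def reconstruction_mse(inputs, reconstructions):
    """Mean squared error between inputs and their reconstructions."""
    inputs = np.asarray(inputs, dtype=float)
    reconstructions = np.asarray(reconstructions, dtype=float)
    # The output layer has one neuron per input feature, so shapes must match.
    assert inputs.shape == reconstructions.shape
    return float(np.mean((reconstructions - inputs) ** 2))

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
perfect = X.copy()
off_by_one = X + 1.0

print(reconstruction_mse(X, perfect))      # 0.0
print(reconstruction_mse(X, off_by_one))   # 1.0
```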

# Performing PCA with an undercomplete linear autoencoder

In [1]:
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior() 




In [None]:
# fully_connected lives in tf.contrib, which is only available in TensorFlow 1.x
from tensorflow.contrib.layers import fully_connected
n_inputs = 3  # 3D inputs
n_hidden = 2  # 2D codings

n_outputs = n_inputs
learning_rate = 0.01

X = tf.placeholder(tf.float32, shape = [None, n_inputs])
hidden = fully_connected(X, n_hidden, activation_fn = None)
outputs = fully_connected(hidden, n_outputs, activation_fn = None)

reconstruction_loss = tf.reduce_mean(tf.square(outputs - X)) # MSE

optimizer = tf.train.AdamOptimizer(learning_rate)
training_op = optimizer.minimize(reconstruction_loss)

init = tf.global_variables_initializer()

This code isn't very different from the MLPs built in past chapters. The two things to note are:
- The number of outputs is equal to the number of inputs.
- To perform simple PCA, we set activation_fn = None (i.e., all neurons are linear) and the cost function is the MSE.
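
The link to PCA can be checked without TensorFlow. The sketch below does not train anything: it computes with an SVD the kind of solution a linear autoencoder with 2 codings and an MSE loss is expected to approach, namely a projection onto the plane spanned by the first two principal directions. The synthetic data and the names `W_encoder` and `W_decoder` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic 3D data lying exactly on a 2D plane (made-up data).
latent = rng.standard_normal((200, 2))
plane = np.array([[1.0, 0.0, 0.5],
                  [0.0, 1.0, -0.3]])
X = latent @ plane
X_centered = X - X.mean(axis=0)

# PCA via SVD: the rows of Vt are the principal directions.
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
W_encoder = Vt[:2].T       # shape (3, 2): plays the role of the hidden layer
W_decoder = W_encoder.T    # shape (2, 3): plays the role of the output layer

codings = X_centered @ W_encoder          # 2D codings
reconstructions = codings @ W_decoder     # back to 3D

mse = np.mean((reconstructions - X_centered) ** 2)
print(codings.shape, mse)
```

Since the data lies on a plane, two linear codings are enough to reconstruct it almost perfectly. A trained linear autoencoder would generally span the same plane, but it need not find the same orthonormal axes as PCA.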


In [None]:

X_train, X_test = [.....]# load the dataset
n_iterations = 1000 # number of training steps (assumed value, not given in the original)

codings = hidden # the output of the hidden layer provides the coding.

with tf.Session() as sess:
    init.run()
    for iteration in range(n_iterations):
        training_op.run(feed_dict={X:X_train}) # no labels
    coding_val = codings.eval(feed_dict = {X:X_test})

# Stacked Autoencoders

Autoencoders can have multiple hidden layers. In this case, they are called stacked autoencoders (or deep autoencoders). Adding more layers helps the autoencoder learn more complex codings.

TensorFlow implementation

In [None]:
n_inputs = 28 * 28 # MNIST images are 28x28 pixels
n_hidden1 = 300
n_hidden2 = 150 # codings
n_hidden3 = n_hidden1
n_outputs = n_inputs

learning_rate = 0.01
l2_reg = 0.001

X = tf.placeholder(tf.float32, shape = [None, n_inputs])

# Note: tf.contrib is only available in TensorFlow 1.x
with tf.contrib.framework.arg_scope([fully_connected], activation_fn = tf.nn.elu,
                                    weights_initializer = tf.contrib.layers.variance_scaling_initializer(),
                                    weights_regularizer = tf.contrib.layers.l2_regularizer(l2_reg)):
    
    hidden1 = fully_connected(X, n_hidden1)
    hidden2 = fully_connected(hidden1, n_hidden2)
    hidden3 = fully_connected(hidden2, n_hidden3)
    outputs = fully_connected(hidden3, n_outputs, activation_fn = None)
    
reconstruction_loss = tf.reduce_mean(tf.square(outputs - X)) # MSE

reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
loss = tf.add_n([reconstruction_loss]+reg_losses)

optimizer = tf.train.AdamOptimizer(learning_rate)
training_op = optimizer.minimize(loss)

init = tf.global_variables_initializer()


n_epochs = 5
batch_size = 150

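# `mnist` is assumed to be loaded beforehand (for example with the TF 1.x helper
# tensorflow.examples.tutorials.mnist.input_data.read_data_sets)
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        n_batches = mnist.train.num_examples // batch_size
        for iteration in range(n_batches):
            X_batch, y_batch = mnist.train.next_batch(batch_size) # labels are not used
            sess.run(training_op, feed_dict = {X: X_batch})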
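
The training loop above relies on a dataset object that hands out mini-batches. When no such helper is available, something along the following lines could be used instead. This is only a sketch: `iterate_minibatches` is a hypothetical helper, the zero-filled array is a stand-in for real images, and its shuffling behavior is not guaranteed to match `mnist.train.next_batch`.

```python
import numpy as np

def iterate_minibatches(X, batch_size, seed=0):
    """Yield shuffled mini-batches of X; leftover instances are dropped."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    n_batches = len(X) // batch_size
    for i in range(n_batches):
        batch_idx = indices[i * batch_size:(i + 1) * batch_size]
        yield X[batch_idx]

X_fake = np.zeros((1000, 28 * 28), dtype=np.float32)  # stand-in for MNIST images
batches = list(iterate_minibatches(X_fake, batch_size=150))
print(len(batches), batches[0].shape)   # 6 (150, 784)
```

Each yielded batch could then be passed as the value for `X` in the `feed_dict`.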
            
            

# Tying weights

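When an autoencoder is neatly symmetrical, like the one just built, a common technique is to tie the weights of the decoder layers to the weights of the encoder layers. This halves the number of weights in the model, speeding up training and limiting the risk of overfitting. Specifically, if the autoencoder has a total of N layers (not counting the input layer), and $W_L$ denotes the connection weights of the L-th layer, then the decoder weights are defined as $W_{N-L+1} = W_L^T$ for L = 1, 2, ..., N/2. The biases are not tied.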
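
To make the saving concrete, the sketch below counts the weights (biases excluded) for the layer sizes of the stacked autoencoder above, with and without tying, and prints which decoder matrix reuses which encoder matrix. The variable names are made up for this example.

```python
# Layer sizes: inputs, hidden1, hidden2 (codings), hidden3, outputs
layer_sizes = [784, 300, 150, 300, 784]

# Untied: one weight matrix per layer.
untied = sum(n_in * n_out
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# Tied: only the encoder matrices are stored; the decoder reuses their transposes.
N = len(layer_sizes) - 1          # number of layers, not counting the input layer
n_encoder_layers = N // 2
tied = sum(n_in * n_out
           for n_in, n_out in zip(layer_sizes[:n_encoder_layers],
                                  layer_sizes[1:n_encoder_layers + 1]))

print(untied, tied)   # 560400 280200

# Which decoder layer is tied to which encoder layer.
for L in range(1, n_encoder_layers + 1):
    print(f"weights{N - L + 1} = transpose(weights{L})")
```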

In [None]:
activation = tf.nn.elu
regularizer = tf.contrib.layers.l2_regularizer(l2_reg)
initializer = tf.contrib.layers.variance_scaling_initializer()

X = tf.placeholder(tf.float32, shape = [None,n_inputs])

weights1_init = initializer([n_inputs, n_hidden1])
weights2_init = initializer([n_hidden1, n_hidden2])

weights1 = tf.Variable(weights1_init, dtype = tf.float32, name = 'weights1')
weights2 = tf.Variable(weights2_init, dtype = tf.float32, name = 'weights2')
weights3 = tf.transpose(weights2, name = 'weights3') # tied weights
weights4 = tf.transpose(weights1, name = 'weights4') # tied weights

biases1 = tf.Variable(tf.zeros(n_hidden1), name = 'biases1')
biases2 = tf.Variable(tf.zeros(n_hidden2), name = 'biases2')
biases3 = tf.Variable(tf.zeros(n_hidden3), name = 'biases3')
biases4 = tf.Variable(tf.zeros(n_outputs), name = 'biases4')

hidden1 = activation(tf.matmul(X, weights1) + biases1)
hidden2 = activation(tf.matmul(hidden1, weights2) + biases2)
hidden3 = activation(tf.matmul(hidden2, weights3) + biases3)
output = tf.matmul(hidden3, weights4) + biases4

reconstruction_loss = tf.reduce_mean(tf.square(output - X)) # MSE
reg_loss = regularizer(weights1) + regularizer(weights2) # only the encoder weights exist as variables

loss = reconstruction_loss + reg_loss

optimizer = tf.train.AdamOptimizer(learning_rate)
training_op = optimizer.minimize(loss)

init = tf.global_variables_initializer()

