<h1>Stacked Denoising Autoencoder</h1>

This is an implementation of a stacked denoising autoencoder used to classify MNIST digits. The implementation is greedy: each encoding layer is trained on its own to reproduce its input, with additive Gaussian noise injected into that input. Each layer is fed the uncorrupted output of the prior layer. Likewise, when the softmax layer is added to the network to classify digits, it is fed the uncorrupted output of the final encoding layer.

The encoding layers and the classifier are trained independently. The second encoding layer is fed the output of the trained first encoding layer, and the classifier is fed the output of the trained second encoding layer. An advantage of this approach is that each layer is trained on its own, which avoids the difficulty of training a larger network all at once. The drawback of a greedy implementation is that there is no guarantee the weights are optimally coordinated across network layers.
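
The greedy, layer-by-layer flow described above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the notebook's TensorFlow code: the helper train_dae_layer, the tanh encoder, the linear decoder, plain gradient descent, and the toy data sizes are all hypothetical choices made to keep the example short.

```python
import numpy as np

def train_dae_layer(data, n_hidden, noise_scale=0.1, lr=0.05, steps=200, seed=0):
    """Train one denoising autoencoder layer; return (encode_fn, losses)."""
    rng = np.random.RandomState(seed)
    n_samples, n_in = data.shape
    w_enc = rng.normal(0.0, 0.1, (n_in, n_hidden))
    b_enc = np.zeros(n_hidden)
    w_dec = rng.normal(0.0, 0.1, (n_hidden, n_in))
    b_dec = np.zeros(n_in)
    losses = []
    for _ in range(steps):
        # Corrupt the input, encode it, then decode it.
        noisy = data + noise_scale * rng.normal(size=data.shape)
        hidden = np.tanh(np.dot(noisy, w_enc) + b_enc)
        recon = np.dot(hidden, w_dec) + b_dec
        # The reconstruction target is the uncorrupted input.
        err = recon - data
        losses.append(0.5 * np.sum(err ** 2) / n_samples)
        # Backpropagate the squared-error cost (averaged over examples).
        d_recon = err / n_samples
        d_hidden = np.dot(d_recon, w_dec.T) * (1.0 - hidden ** 2)
        w_dec -= lr * np.dot(hidden.T, d_recon)
        b_dec -= lr * d_recon.sum(axis=0)
        w_enc -= lr * np.dot(noisy.T, d_hidden)
        b_enc -= lr * d_hidden.sum(axis=0)

    def encode(x):
        # No noise is added when encoding data for the next stage.
        return np.tanh(np.dot(x, w_enc) + b_enc)

    return encode, losses

# Toy stand-in for MNIST: 64 examples with 20 features (hypothetical sizes).
toy = np.random.RandomState(1).uniform(size=(64, 20))

# Stage 1: train the first layer on the raw data, then encode the data.
encode_1, losses_1 = train_dae_layer(toy, n_hidden=8)
codes_1 = encode_1(toy)

# Stage 2: train the second layer on the first layer's codes only.
encode_2, losses_2 = train_dae_layer(codes_1, n_hidden=4)
codes_2 = encode_2(codes_1)
```

Each stage sees only the output of the stage before it, which is why nothing coordinates the weights of the two layers with each other.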

Finally, we will compare classification performance on our encoded dataset (50 dimensions) to performance on the original dataset (784 dimensions). Note that the batch function samples randomly, so this is not a true one-to-one comparison, but it should still give a reasonable indication of relative performance.

I learned about the network here:
http://www.jmlr.org/papers/volume11/vincent10a/vincent10a.pdf

Loosely followed:
https://github.com/tensorflow/models/blob/master/autoencoder/autoencoder_models/DenoisingAutoencoder.py (simple implementation of adding the noise)


In [1]:
#Dependencies:
from __future__ import division, print_function, absolute_import

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

In [2]:
tf.__version__

'1.0.1'

In [3]:
# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


<h2>Architecture variables</h2>

In [4]:
input_dim = 784
batch_size = 128
plots_outdir="./png"
n_hidden1 = 500
n_hidden2 = 50
scale = 0.1

<h2>Encoding Layer 1 Training</h2>

We train the first encoding layer by adding Gaussian noise to its input and then trying to recreate the uncorrupted input.

In [5]:
train_graph = tf.Graph()
with train_graph.as_default():
    x_in = tf.placeholder(tf.float32, (None, input_dim))
    drop = tf.placeholder(tf.float32, None)

In [6]:
with train_graph.as_default():
    with tf.variable_scope('enc_one'):
        enc_w1 = tf.get_variable("enc_1", shape=[input_dim, n_hidden1], initializer=tf.contrib.layers.xavier_initializer())
        enc_b1 = tf.get_variable('b_1', shape=(n_hidden1), initializer = tf.constant_initializer(0.1))
        dec_greed_w1 = tf.get_variable('dec_greed_1', shape=[n_hidden1, input_dim], initializer=tf.contrib.layers.xavier_initializer())
        dec_greed_b1 = tf.get_variable('dec_greed_b1', shape=(input_dim), initializer=tf.constant_initializer(0.1))             

In [11]:
with train_graph.as_default():
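    # Corrupt the input with additive Gaussian noise, then encode it. The noise has shape
    # (input_dim,), so one noise vector is broadcast across every example in the batch.
    # The noise op is part of the graph, so it runs whenever enc_1_hidden is evaluated.
    enc_1_hidden= tf.nn.relu(tf.add(tf.matmul(x_in + scale * tf.random_normal((input_dim,)), enc_w1), enc_b1))
    # Decode back to pixel space; the sigmoid keeps the reconstruction between 0 and 1.
    dec_1_output = tf.nn.sigmoid(tf.add(tf.matmul(enc_1_hidden, dec_greed_w1), dec_greed_b1))
    # Squared-error reconstruction cost, measured against the uncorrupted input.
    cost1 = 0.5 * tf.reduce_sum(tf.pow(tf.subtract(x_in , dec_1_output), 2.0))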
    with tf.variable_scope('optim'):
        optimizer1 = tf.train.AdamOptimizer( name='optim1').minimize(cost1)
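
To make the corruption step and the cost concrete, here is a small NumPy sketch. It is an illustration under stated assumptions and not a replacement for the cell above: the toy batch size and the helper reconstruction_cost are hypothetical. It contrasts a noise vector of shape (features,), which is broadcast so that every example receives the same perturbation, with a noise matrix that perturbs each example independently.

```python
import numpy as np

rng = np.random.RandomState(0)
scale = 0.1                       # same value as `scale` in the notebook
x = rng.uniform(size=(4, 6))      # toy batch: 4 examples, 6 features (hypothetical)

# A noise vector of shape (6,) is broadcast over the batch:
# every row of x receives the same perturbation.
shared_noise = scale * rng.normal(size=(6,))
x_shared = x + shared_noise

# A noise matrix of shape (4, 6) gives each example its own perturbation.
per_example_noise = scale * rng.normal(size=x.shape)
x_independent = x + per_example_noise

def reconstruction_cost(target, output):
    """Half the summed squared error, the same form as cost1."""
    return 0.5 * np.sum((target - output) ** 2)

# Cost of "reconstructing" x with its corrupted copy, for illustration only.
cost_shared = reconstruction_cost(x, x_shared)
```

Which of the two noise shapes is preferable is a design choice; the per-example variant corrupts each image differently within a batch.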


In [12]:
with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(2001):
        x, _= mnist.train.next_batch(batch_size)

        _, c = sess.run([optimizer1, cost1], feed_dict={x_in: x, drop: 0.9})
        if i % 200 == 0:
            print("Iteration: %d cost: %f" %(i, c))
         
    enc_layer_1 = enc_1_hidden.eval(feed_dict={x_in:mnist.train.images, drop: 1.0})
 

Iteration: 0 cost: 12835.944336
Iteration: 200 cost: 1008.949646
Iteration: 400 cost: 552.172302
Iteration: 600 cost: 344.026093
Iteration: 800 cost: 334.097839
Iteration: 1000 cost: 256.179565
Iteration: 1200 cost: 250.946381
Iteration: 1400 cost: 202.875122
Iteration: 1600 cost: 263.229431
Iteration: 1800 cost: 176.801727
Iteration: 2000 cost: 180.986023


<h2>Encoding Layer 2 Training</h2>

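We train the second encoding layer on the output of the trained first encoding layer. The intent is to feed it the uncorrupted encoding; note, however, that the noise term is built into enc_1_hidden in the graph above, so enc_layer_1 is evaluated with that noise still applied.
We then add noise to the second layer's input and attempt to recreate that input.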

In [14]:
def batch(x, y, batch_size):
    index = list(range(len(x)))
    batch_index = np.random.choice(index, batch_size)
    return x[batch_index], y[batch_index]
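
Because np.random.choice samples with replacement by default, a batch drawn this way may contain duplicate examples, and two runs will generally see different batches. If a repeatable comparison is wanted, one option is to pass in a seeded generator. The sketch below assumes a hypothetical helper named seeded_batch and toy arrays; it is not part of the original notebook.

```python
import numpy as np

def seeded_batch(x, y, batch_size, rng):
    """Draw a random batch (with replacement) using the given RandomState."""
    idx = rng.choice(len(x), size=batch_size, replace=True)
    return x[idx], y[idx]

# Toy data: 10 examples with 2 features each, labels 0..9 (hypothetical).
features = np.arange(20).reshape(10, 2)
labels = np.arange(10)

# Two generators with the same seed produce the same batch.
xa, ya = seeded_batch(features, labels, 4, np.random.RandomState(42))
xb, yb = seeded_batch(features, labels, 4, np.random.RandomState(42))
```

Reusing the same seed for both classifier runs would make them draw the same sequence of example indices.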

In [15]:
with train_graph.as_default():
    with tf.variable_scope('enc_tw'):
        enc_w2 = tf.get_variable("enc_21", shape=[n_hidden1, n_hidden2], initializer=tf.contrib.layers.xavier_initializer())
        enc_b2 = tf.get_variable('b_21', shape=(n_hidden2), initializer = tf.constant_initializer(0.1))
        dec_greed_w2 = tf.get_variable('dec_greed_w21', shape=[n_hidden2, n_hidden1], initializer=tf.contrib.layers.xavier_initializer())
        dec_greed_b2 = tf.get_variable('dec_greed_b21', shape=(n_hidden1), initializer=tf.constant_initializer(0.1))  
        

In [17]:
with train_graph.as_default():
    enc_layer1 = tf.placeholder(tf.float32, (None, n_hidden1))
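    # Corrupt the first layer's encoding with Gaussian noise, then encode it to n_hidden2 units.
    enc_2_hidden= tf.nn.elu(tf.add(tf.matmul( enc_layer1 + scale * tf.random_normal((n_hidden1,)), enc_w2), enc_b2))
    # Decode back to the first layer's dimensionality (n_hidden1).
    dec_2_output = tf.nn.elu(tf.add(tf.matmul(enc_2_hidden, dec_greed_w2), dec_greed_b2))
    # Squared-error reconstruction cost, measured against the encoding that was fed in.
    cost2 = 0.5 * tf.reduce_sum(tf.pow(tf.subtract(enc_layer1, dec_2_output), 2.0))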
    with tf.variable_scope('optim51'):
        optimizer2 = tf.train.AdamOptimizer(name='optim51').minimize(cost2)

<h2>Setup to Use Encoder for Classification</h2>
Now we will train the second encoding layer and then transform the inputs using the final layer of the encoder.

In [19]:
with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(10001):
        x, _= batch(enc_layer_1 ,mnist.train.labels, batch_size)

        _, c = sess.run([optimizer2, cost2], feed_dict={enc_layer1: x, drop: 0.9})
        if i % 2000 == 0:
            print("Iteration: %d cost: %f" %(i, c))
    
    enc_layer_2 = enc_2_hidden.eval(feed_dict={enc_layer1:enc_layer_1, drop: 1.0})

Iteration: 0 cost: 68411.554688
Iteration: 2000 cost: 4130.927734
Iteration: 4000 cost: 3347.262207
Iteration: 6000 cost: 2897.959961
Iteration: 8000 cost: 3029.081299
Iteration: 10000 cost: 3110.814453



<h2>Classification Step Encoded Data</h2>
Now that the model is trained, it's just a matter of plugging the encoded data (50 dimensions) into a simple feed-forward network (a single softmax layer) to classify digits.

In [20]:
class_graph = tf.Graph()

In [23]:
with class_graph.as_default():
    y_ = tf.placeholder(tf.float32, (None, 10))
    x = tf.placeholder(tf.float32, (None, 50)) 
    W = tf.Variable(tf.zeros([50, 10]))
    b = tf.Variable(tf.zeros([10]))
    y = tf.matmul(x, W) + b
    
    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
    with tf.variable_scope('opti'):
        train = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
    correct_ = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
    accuracy = tf.reduce_mean(tf.cast(correct_, tf.float32))
    
#basic classifier derived from: 
#https://github.com/tensorflow/tensorflow/blob/r1.1/tensorflow/examples/tutorials/mnist/mnist_softmax.py
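
As a sanity check on the classifier above, the softmax cross-entropy can be written out in NumPy. This is a sketch under stated assumptions (the helper softmax_cross_entropy and the random toy batch are hypothetical): with W and b initialised to zeros, every class is equally likely, so the loss before any training step is ln(10), about 2.3026, whatever the inputs are.

```python
import numpy as np

def softmax_cross_entropy(logits, one_hot_labels):
    """Mean cross-entropy between softmax(logits) and one-hot labels."""
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(np.sum(one_hot_labels * log_probs, axis=1))

n_features, n_classes = 50, 10
W = np.zeros((n_features, n_classes))   # zero initialisation, as in the cell above
b = np.zeros(n_classes)

rng = np.random.RandomState(0)
x = rng.normal(size=(8, n_features))    # toy inputs (hypothetical)
labels = np.eye(n_classes)[rng.randint(0, n_classes, size=8)]

# All logits are zero, so the softmax is uniform over the 10 classes.
initial_loss = softmax_cross_entropy(np.dot(x, W) + b, labels)
```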

In [25]:
with tf.Session(graph=class_graph) as sess:
    sess.run(tf.global_variables_initializer()) 
    for i in range(5000):
        x2, y2 = batch(enc_layer_2, mnist.train.labels, batch_size)
        sess.run(train, feed_dict={x:x2, y_:y2})
        if i % 500 == 0:
            print("Iteration:  %d accuracy: %f " %(i, sess.run(accuracy, feed_dict={x: x2, y_:y2})))

Iteration:  0 accuracy: 0.234375 
Iteration:  500 accuracy: 0.882812 
Iteration:  1000 accuracy: 0.890625 
Iteration:  1500 accuracy: 0.968750 
Iteration:  2000 accuracy: 0.914062 
Iteration:  2500 accuracy: 0.859375 
Iteration:  3000 accuracy: 0.914062 
Iteration:  3500 accuracy: 0.906250 
Iteration:  4000 accuracy: 0.921875 
Iteration:  4500 accuracy: 0.906250 


<h2>Classification Step Original Data</h2>
We will now run the original data (784 dimensions) through the same classifier, resized to take 784 inputs.

In [26]:
class_graph = tf.Graph()

In [27]:
with class_graph.as_default():
    y_ = tf.placeholder(tf.float32, (None, 10))
    x = tf.placeholder(tf.float32, (None, 784)) 
    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))
    y = tf.matmul(x, W) + b
    
    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
    with tf.variable_scope('opti'):
        train = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
    correct_ = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
    accuracy = tf.reduce_mean(tf.cast(correct_, tf.float32))
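
The two classifiers differ only in input width, which also changes how many parameters each has to learn. The arithmetic is sketched below; the helper softmax_param_count is a hypothetical name used only for this illustration.

```python
from __future__ import division

def softmax_param_count(n_inputs, n_classes=10):
    """Weights plus biases of a single softmax layer."""
    return n_inputs * n_classes + n_classes

encoded_params = softmax_param_count(50)     # classifier on the encoded data
original_params = softmax_param_count(784)   # classifier on the raw pixels

# Ratio of input dimensions between the original and the encoded data.
dimension_ratio = 784 / 50
```

The encoded classifier has 510 parameters against 7850 for the raw-pixel classifier, and the input is 15.68 times narrower.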
    

In [28]:
with tf.Session(graph=class_graph) as sess:
    sess.run(tf.global_variables_initializer()) 
    for i in range(5000):
        x2, y2 = batch(mnist.train.images, mnist.train.labels, batch_size)
        sess.run(train, feed_dict={x:x2, y_:y2})
        if i % 500 == 0:
            print("Iteration:  %d accuracy: %f " %(i, sess.run(accuracy, feed_dict={x: x2, y_:y2})))

Iteration:  0 accuracy: 0.328125 
Iteration:  500 accuracy: 0.960938 
Iteration:  1000 accuracy: 0.898438 
Iteration:  1500 accuracy: 0.937500 
Iteration:  2000 accuracy: 0.960938 
Iteration:  2500 accuracy: 0.921875 
Iteration:  3000 accuracy: 0.914062 
Iteration:  3500 accuracy: 0.929688 
Iteration:  4000 accuracy: 0.929688 
Iteration:  4500 accuracy: 0.929688 


<h2>Conclusion</h2>

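Feeding the classifier the encoded data gave roughly 91 to 92% accuracy on training mini-batches over the last few logged iterations, versus roughly 93% for the original data. Both runs fluctuate from batch to batch, with peaks of about 96 to 97%. The encoded data, which is roughly 16 times smaller than the original (50 vs. 784 dimensions), therefore appears to retain most of the features needed to classify the digits. Note that these accuracies are measured on training batches of 128 examples, not on a held-out test set.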
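
Since each logged accuracy is computed on a single batch of 128 examples, it helps to translate the figures back into counts. The sketch below uses the last logged value from each run (0.906250 for the encoded data, 0.929688 for the original data); the standard-error formula for a proportion is a rough guide added here as an assumption, not something computed in the notebook.

```python
from __future__ import division
import math

batch_size = 128
last_encoded = 0.906250    # last logged accuracy, encoded data
last_original = 0.929688   # last logged accuracy, original data

# Convert accuracies back to numbers of correctly classified examples.
correct_encoded = int(round(last_encoded * batch_size))
correct_original = int(round(last_original * batch_size))
gap_in_examples = correct_original - correct_encoded

def proportion_std_err(p, n):
    """Approximate standard error of an accuracy p measured on n examples."""
    return math.sqrt(p * (1.0 - p) / n)

se_encoded = proportion_std_err(last_encoded, batch_size)
se_original = proportion_std_err(last_original, batch_size)
```

The gap between the two final readings is 3 examples out of 128, which is comparable to the sampling noise of a single batch, so evaluating on a larger held-out set would give a firmer comparison.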