# Exercise 4b: Convolutional autoencoder on MNIST dataset

Your challenge, should you choose to accept it, is to tune a convolutional autoencoder on the [MNIST dataset](https://en.wikipedia.org/wiki/MNIST_database).

## Convolutional Autoencoder

<img src="https://cdn-images-1.medium.com/max/1600/1*MxWMSjR0BZzb7bnVGJpdng.png" alt="nn" style="width: 400px;"/>

## MNIST Dataset Review

This example is using MNIST handwritten digits. The dataset contains 60,000 examples for training and 10,000 examples for testing. The digits have been size-normalized and centered in a fixed-size image (28x28 pixels) with values from 0 to 1. For simplicity, each image has been flattened and converted to a 1-D numpy array of 784 features (28*28).

![MNIST Dataset](http://neuralnetworksanddeeplearning.com/images/mnist_100_digits.png)

More info: http://yann.lecun.com/exdb/mnist/
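
As a rough illustration of the scaling and flattening described above, the sketch below builds a random stand-in for one digit (not real MNIST data) and assumes the raw pixels are 8-bit integers in the range 0 to 255. The names `raw_image`, `scaled`, and `flat` are hypothetical.

```python
import numpy as np

# Hypothetical stand-in for one raw 28x28 grayscale digit (assumed uint8 pixels).
rng = np.random.default_rng(0)
raw_image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Scale pixel values to [0, 1], then flatten row by row into 784 features.
scaled = raw_image.astype(np.float32) / 255.0
flat = scaled.reshape(-1)

print(flat.shape)              # one row of 28*28 features
print(flat.min(), flat.max())  # both values lie inside [0, 1]
```

Reshaping `flat` back to `(28, 28)` recovers `scaled` exactly, which is what the notebook relies on when it later turns the flattened rows back into images.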

In [3]:
import numpy as np
import tensorflow as tf 
import matplotlib.pyplot as plt
import gzip, binascii, struct
from IPython.display import Image
%matplotlib inline

### Import, scale, transform data

In the next few cells, we download the MNIST data, with pixel values scaled to the range [0, 1], and load the images as NumPy arrays. The loader also extracts the labels, but we will not use them in this activity. These initial sections are repeated from previous exercises, so they are presented without further comment.

In [4]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz


In [13]:
train_images = mnist.train.images
test_images = mnist.test.images
print("Shape of the training set: ", train_images.shape)
print("Shape of the test set: ", test_images.shape)

Shape of the training set:  (55000, 784)
Shape of the test set:  (10000, 784)


### Reshape data into 2D images

In [14]:
train_images = train_images.reshape(-1, 28, 28, 1)
test_images = test_images.reshape(-1, 28, 28, 1)
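
To see what the reshape does, here is a small NumPy sketch on a dummy batch of three flattened rows (the array `flat_batch` is made up for illustration). The `-1` asks NumPy to infer the batch dimension, and the trailing `1` is the single grayscale channel.

```python
import numpy as np

# Dummy batch: 3 "images", each flattened to 784 values (not real MNIST data).
flat_batch = np.arange(3 * 784, dtype=np.float32).reshape(3, 784)

# -1 lets NumPy infer the number of images; 1 is the channel dimension.
images = flat_batch.reshape(-1, 28, 28, 1)
print(images.shape)

# Pixel (row r, column c) of image i comes from flat index r * 28 + c.
i, r, c = 1, 2, 5
print(images[i, r, c, 0] == flat_batch[i, r * 28 + c])
```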

### Segmenting data into training, test, and validation

The final step in preparing our data is to split it into three sets: training, test, and validation. This isn't the format of the original data set, so we'll take a small slice of the training data and treat that as our validation set.

In [15]:
VALIDATION_SIZE = 5000

# first, let's split the reshaped image data - we will skip doing the labels, since we are doing an autoencoder
validation_images = train_images[:VALIDATION_SIZE, :, :, :]
train_images = train_images[VALIDATION_SIZE:, :, :, :]

train_size = train_images.shape[0]

print('Validation shape', validation_images.shape)
print('Train size', train_size)

Validation shape (5000, 28, 28, 1)
Train size 50000


## Defining the model

Now that we've downloaded and prepared our data, we're ready to define our model.

#### Define Hyperparameters

First, let's define some hyperparameters. You may want to change these later to try for better results!

In [16]:
# this gives us the size of input and output data - you shouldn't change this
IMAGE_SIZE = train_images.shape[1]
CHANNELS = train_images.shape[-1]

# Tunable hyperparameters - tweak away!
BATCH_SIZE = 60

# The random seed that defines initialization.
SEED = 42

# Optimizer hyperparameters, can be tuned
LEARNING_RATE = 0.002
BETA1 = 0.9
BETA2 = 0.999
EPSILON = 1e-08

# Define the optimizer operation - can also pick a different optimizer 
optimizer = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE,
                                  beta1=BETA1,
                                  beta2=BETA2,
                                  epsilon=EPSILON,
                                  )
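
To give some intuition for what `LEARNING_RATE`, `BETA1`, `BETA2`, and `EPSILON` control, here is a minimal sketch of one textbook Adam update for a single scalar parameter. It follows the commonly published form of the algorithm and is not TensorFlow's internal implementation, which may differ in details such as where epsilon is applied. The function `adam_step` is a hypothetical helper.

```python
import math

def adam_step(param, grad, m, v, t,
              lr=0.002, beta1=0.9, beta2=0.999, eps=1e-08):
    """One textbook Adam update for a scalar parameter (illustrative only)."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction, step t >= 1
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# First step from param = 1.0 with gradient 2.0: the move is close to lr,
# regardless of the gradient's magnitude.
param, m, v = adam_step(param=1.0, grad=2.0, m=0.0, v=0.0, t=1)
print(param)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `LEARNING_RATE`. This is one reason the learning rate is usually the first value worth tuning.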

#### Define placeholder tensor to feed data

Normally, we would also define a placeholder tensor for y. Since we are doing an autoencoder, we are training X vs. X, so we only need an X placeholder!

In [17]:
X = tf.placeholder(tf.float32, shape=(None, IMAGE_SIZE, IMAGE_SIZE, CHANNELS))

#### Defining weights for the convolutional autoencoder

Note that this autoencoder uses convolutional layers, so each weight tensor is four-dimensional. For `tf.nn.conv2d` the layout is `[filter_height, filter_width, in_channels, out_channels]`.

The tunable hyperparameters are not the only things that affect your model: your architecture and weight initialization matter too. Try changing the weight initialization or the number of filters, or adding more layers.

In [47]:
# encoder portion where dimensionality decreases
enc1_weights = tf.Variable(
  tf.truncated_normal([3, 3, CHANNELS, 32],  # 32 filters
                      stddev=0.1,
                      seed=SEED))
enc1_biases = tf.Variable(tf.constant(0.1, shape=[32]))

enc2_weights = tf.Variable(
  tf.truncated_normal([3, 3, 32, 32], # 32 filters
                      stddev=0.1,
                      seed=SEED))
enc2_biases = tf.Variable(tf.constant(0.1, shape=[32]))

# decoder portion where dimensionality increases
dec1_weights = tf.Variable(  
  tf.truncated_normal([3, 3, 32, 32],
                      stddev=0.1,
                      seed=SEED))
dec1_biases = tf.Variable(tf.constant(0.1, shape=[32]))

upsample1_weights = tf.Variable(
  tf.truncated_normal([3, 3, 32, 32],
                      stddev=0.1,
                      seed=SEED))

upsample2_weights = tf.Variable(
  tf.truncated_normal([3, 3, 32, 32],
                      stddev=0.1,
                      seed=SEED))

dec2_weights = tf.Variable(
  tf.truncated_normal([3, 3, 32, CHANNELS],
                      stddev=0.1,
                      seed=SEED))
dec2_biases = tf.Variable(tf.constant(0.1, shape=[CHANNELS]))
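
For a sense of model size, the sketch below counts the trainable values implied by the shapes above, assuming `CHANNELS` is 1 as it is for MNIST. The `layers` dictionary is a hypothetical bookkeeping structure, not part of the model. The two upsampling filters have no bias term in this notebook.

```python
import math

CHANNELS = 1  # assumption: single grayscale channel, as in MNIST

# name -> (weight shape, number of bias values); shapes copied from the cell above
layers = {
    "enc1":      ([3, 3, CHANNELS, 32], 32),
    "enc2":      ([3, 3, 32, 32], 32),
    "dec1":      ([3, 3, 32, 32], 32),
    "upsample1": ([3, 3, 32, 32], 0),   # no bias defined for this layer
    "upsample2": ([3, 3, 32, 32], 0),   # no bias defined for this layer
    "dec2":      ([3, 3, 32, CHANNELS], CHANNELS),
}

counts = {name: math.prod(shape) + n_bias
          for name, (shape, n_bias) in layers.items()}
total = sum(counts.values())

for name, count in counts.items():
    print(name, count)
print("total", total)
```

If you change the number of filters, re-running a count like this shows quickly how the parameter total grows.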



In [48]:
def conv_encoder(data, train=False):
    """The convolutional encoder model portion."""
    # first encoding layer
    conv_enc1 = tf.nn.conv2d(data,
                            enc1_weights,
                            strides=[1, 1, 1, 1],
                            padding='SAME')
    # Now 28x28x32

    # Bias and rectified linear non-linearity.
    relu1 = tf.nn.relu(tf.nn.bias_add(conv_enc1, enc1_biases)) 

    # Max pooling. The kernel size spec ksize also follows the layout of
    # the data. Here we have a pooling window of 2, and a stride of 2.
    pool1 = tf.nn.max_pool(relu1,
                          ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1],
                          padding='SAME')
    # Now 14x14x32
    
    # second encoding layer
    conv_enc2 = tf.nn.conv2d(pool1,
                            enc2_weights,
                            strides=[1, 1, 1, 1],
                            padding='SAME')
    # Now 14x14x32

    # Bias and rectified linear non-linearity.
    relu2 = tf.nn.relu(tf.nn.bias_add(conv_enc2, enc2_biases)) 

    # Max pooling. The kernel size spec ksize also follows the layout of
    # the data. Here we have a pooling window of 2, and a stride of 2.
    pool2 = tf.nn.max_pool(relu2,
                          ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1],
                          padding='SAME')
    # Now 7x7x32    
    return pool2


def conv_decoder(encoded, train=False):
    """The convolutional decoder model portion."""
    conv_dec1 = tf.nn.conv2d(encoded,
                            dec1_weights,
                            strides = [1, 1, 1, 1],
                            padding = 'SAME')
    #Now 7x7x32 
    
    relu1 = tf.nn.relu(tf.nn.bias_add(conv_dec1, dec1_biases))
    
    # get the batch size dynamically, since output_shape below cannot contain None
    input_size = tf.shape(encoded)[0]
    
    upsample1 = tf.nn.conv2d_transpose(relu1,
                                      upsample1_weights,
                                      output_shape = [input_size, 14, 14, 32],
                                      strides = [1, 2, 2, 1],
                                      padding = 'SAME')
    # Now 14x14x32
    
    upsample2 = tf.nn.conv2d_transpose(upsample1,
                                      upsample2_weights,
                                      output_shape = [input_size, 28, 28, 32],
                                      strides = [1, 2, 2, 1],
                                      padding = 'SAME')
    # Now 28x28x32
    
    conv_dec2 = tf.nn.conv2d(upsample2,
                            dec2_weights,
                            strides = [1, 1, 1, 1],
                            padding = 'SAME')
    #Now 28x28x1
    
    # using a sigmoid because it forces values to be between 0 and 1...but this might not be the best choice.
    recon = tf.nn.sigmoid(tf.nn.bias_add(conv_dec2, dec2_biases))    
    return recon
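
The size comments in the two functions above can be checked with a little arithmetic. The sketch below assumes the usual `'SAME'` padding rule, where the output size is `ceil(input / stride)`, and takes the transposed convolution's output to be `input * stride`, which matches the `output_shape` values requested in the notebook. The helper names are hypothetical.

```python
import math

def same_padding_size(size, stride):
    """Spatial size after a conv or pool with 'SAME' padding."""
    return math.ceil(size / stride)

def transpose_size(size, stride):
    """Spatial size requested from the strided transposed convolution."""
    return size * stride

steps = [
    ("conv_enc1", same_padding_size, 1),
    ("pool1",     same_padding_size, 2),
    ("conv_enc2", same_padding_size, 1),
    ("pool2",     same_padding_size, 2),
    ("conv_dec1", same_padding_size, 1),
    ("upsample1", transpose_size,    2),
    ("upsample2", transpose_size,    2),
    ("conv_dec2", same_padding_size, 1),
]

current = 28
trace = [("input", current)]
for name, rule, stride in steps:
    current = rule(current, stride)
    trace.append((name, current))

for name, side in trace:
    print(name, side)

# Number of values in the encoded representation (7 x 7 x 32 filters).
encoded_values = 7 * 7 * 32
print("encoded values per image:", encoded_values)
```

With 32 filters the 7x7 encoding holds more values than the 784 input pixels, so reducing the number of filters in the encoder is one way to experiment with a tighter bottleneck.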



Define loss and training operations. We are comparing the reconstructed output to the input.


In [49]:
encoded = conv_encoder(X)
decoded = conv_decoder(encoded)

loss_op = tf.reduce_mean((X - decoded)**2) # pixel-wise MSE
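
The same pixel-wise MSE can be written in plain NumPy, which is handy for sanity-checking reconstructions outside the TensorFlow graph. This is a minimal sketch on dummy arrays; `pixelwise_mse` is a hypothetical helper name.

```python
import numpy as np

def pixelwise_mse(original, reconstruction):
    """Mean of squared differences over every pixel in the batch."""
    return np.mean((original - reconstruction) ** 2)

# Dummy batch of two all-black images (not real MNIST data).
x = np.zeros((2, 28, 28, 1), dtype=np.float32)

perfect = x.copy()   # identical reconstruction
off_by_half = x + 0.5  # every pixel wrong by 0.5

print(pixelwise_mse(x, perfect))      # 0.0
print(pixelwise_mse(x, off_by_half))  # 0.25
```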

In [50]:
train_op = optimizer.minimize(loss_op)

In [51]:
# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

This code will plot your test images vs. the reconstructed images from the autoencoder. This will help you visualize how well your autoencoder is doing! We will call this function in the train block.

In [52]:
def plot_reconstruction(X_orig, X_decoded, n = 10, plotname = None):
    '''
    inputs: X_orig (np array with 784 values per image, e.g. shape (nrows, 28, 28, 1))
            X_decoded (np array of reconstructions, same shape as X_orig)
            n (int, number of images to plot)
            plotname (str, path to save file; if None, the plot is shown instead)
    '''
    fig = plt.figure(figsize=(n*2, 4))
    for i in range(n):
        # display original
        ax = fig.add_subplot(2, n, i + 1)
        plt.imshow(X_orig[i].reshape(28, 28))
        plt.gray()
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)

        # display reconstruction
        ax = fig.add_subplot(2, n, i + 1 + n)
        plt.imshow(X_decoded[i].reshape(28, 28))
        plt.gray()
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
        
    # set the figure title once, after all subplots are drawn
    fig.suptitle('Reconstructed inputs')

    if plotname:
        plt.savefig(plotname)
    else:
        plt.show()       

Next, a function to train your autoencoder.

In [59]:
def train(num_epochs): 
    """Function that trains model """
    train_size = train_images.shape[0]
    steps = num_epochs * train_size // BATCH_SIZE
    steps_per_epoch = train_size // BATCH_SIZE
          
    with tf.Session() as sess:

        # Run the initializer
        sess.run(init)    
          
        for step in range(steps):
            # Compute the offset of the current minibatch in the data.
            # Note that we could use better randomization across epochs.
            offset = (step * BATCH_SIZE) % (train_size - BATCH_SIZE)
            batch_data = train_images[offset:(offset + BATCH_SIZE), :, :, :]
          
            # Run the training operation to update the weights, use a feed_dict to use the batches you created above
            sess.run(train_op, feed_dict={X: batch_data})
        
            # display output if desired
            if step % steps_per_epoch == 0:
                loss = sess.run(loss_op, feed_dict={X: batch_data})
                print("Epoch {}, Minibatch MSE= {:.3f}".format(str(step//steps_per_epoch), loss))
            
                val_loss = sess.run(loss_op, feed_dict = {X: validation_images})
                print("Validation MSE : {}".format(val_loss))

        print("Optimization Finished!")
          
        # Compute the reconstructions and the MSE for the MNIST test images
        test_decoded, test_loss = sess.run([decoded, loss_op], feed_dict={X: test_images})
        print("Test MSE : {}".format(test_loss))
        plot_reconstruction(test_images, test_decoded) 
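
The offset arithmetic in `train` is easier to follow with concrete numbers. The sketch below reproduces it for the notebook's sizes (50,000 training images, batch size 60). It also shows one possible way to shuffle between epochs, which the training loop itself does not do. Both helper functions are hypothetical.

```python
import numpy as np

def sequential_offsets(train_size, batch_size, steps):
    """Start index of each minibatch, using the same formula as train()."""
    return [(step * batch_size) % (train_size - batch_size)
            for step in range(steps)]

def shuffled_batches(num_examples, batch_size, seed=42):
    """Yield index arrays from one random permutation, dropping the remainder."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_examples)
    for start in range(0, num_examples - batch_size + 1, batch_size):
        yield order[start:start + batch_size]

# 833 steps make up one epoch here; step 833 is the first step of the next.
offsets = sequential_offsets(train_size=50000, batch_size=60, steps=834)
print(offsets[0], offsets[832], offsets[833])

batches = list(shuffled_batches(50000, 60))
print(len(batches), batches[0].shape)
```

The wrap-around means the second epoch starts at offset 40 rather than 0, so batches drift slightly between epochs but are never reshuffled. To use the shuffled variant, you would index `train_images` with each yielded array instead of slicing by offset.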

Finally, train your autoencoder. 

With the given inputs, you will most likely get stuck in a local minimum. Happy tuning!

Warning: this one takes longer to train than the fully connected autoencoder! You may want to subset the data to make it go a little faster.
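
One possible way to subset the data is to draw a fixed number of images at random without replacement. This is only a sketch on a dummy array; `subset_images` is a hypothetical helper, and the subset size is an arbitrary choice.

```python
import numpy as np

def subset_images(images, n, seed=42):
    """Return n randomly chosen images (without replacement)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(images.shape[0], size=n, replace=False)
    return images[idx]

# Dummy stand-in for the training array (not real MNIST data).
dummy_images = np.zeros((1000, 28, 28, 1), dtype=np.float32)
small = subset_images(dummy_images, n=100)
print(small.shape)
```

In the notebook you would apply this to `train_images` before calling `train`, for example `train_images = subset_images(train_images, 10000)`. A smaller training set trains faster but may give a higher reconstruction error.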

In [None]:
train(5)