# Deep Learning with TensorFlow
## Recitation Notebook

### Authors: Trevin Gandhi, Jordan Hurwitz

This recitation will consist of two parts:  
1) Building a feedforward Deep Neural Network in TensorFlow and discussing some best practices  
2) Building an RNN (specifically, an LSTM) in TensorFlow and using it for text generation  

### Section 1: Building a Deep Feedforward Neural Network
(Based on the TensorFlow tutorials)

In [1]:
# First, we do the basic setup.
import tensorflow as tf
sess = tf.InteractiveSession()

In [2]:
# We will be training this deep neural network on MNIST,
# so let's first load the dataset.
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [3]:
# Now let's initialize some placeholders

# Here, x is a placeholder for our input data. Since MNIST
# uses 28x28 pixel images, we "unroll" each one into a vector
# of 784 pixels. The `None` indicates that we can feed in an
# arbitrary number of data points. Thus we are saying x is a
# matrix with 784 columns and an arbitrary (to be decided
# when we supply the data) number of rows.
x  = tf.placeholder(tf.float32, [None, 784])

# We define y_ to be the placeholder for our *true* y's.
# We are giving y_ 10 columns because each row will be a
# one-hot vector (one entry per digit class) marking the
# correct classification of the image.
y_ = tf.placeholder(tf.float32, shape=[None, 10])
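
To make the two placeholder shapes concrete, here is a minimal plain-Python sketch of "unrolling" an image and building a one-hot label. The helper names `flatten` and `one_hot` are ours, chosen for illustration (they are not TensorFlow functions), and a tiny 2x3 image stands in for a real 28x28 digit.

```python
# Illustrative only: a tiny 2x3 "image" stands in for a 28x28 MNIST digit.
def flatten(image):
    """Unroll a 2-D list of pixel rows into one flat list, row by row."""
    return [pixel for row in image for pixel in row]

def one_hot(label, num_classes=10):
    """Return a list of zeros with a single 1.0 at index `label`."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

image = [[0.0, 0.5, 1.0],
         [0.2, 0.0, 0.9]]
flat = flatten(image)   # length 6 here; 784 for a real MNIST image
label_vec = one_hot(3)  # 1.0 at index 3, zeros elsewhere
```

Each row we feed into `x` is one such flattened image, and the matching row of `y_` is its one-hot label.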

In [4]:
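# A quick look at tf.truncated_normal: it samples from a normal distribution
# (here with stddev 0.1) and re-draws any value that falls more than two
# standard deviations from the mean.
test = tf.truncated_normal([4, 10], stddev=0.1)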
sess.run(test)

array([[ 0.0527015 ,  0.16926853,  0.02503373, -0.04121994,  0.02684036,
        -0.01603753, -0.06777612, -0.04115577, -0.04395923, -0.09951807],
       [ 0.06317057, -0.06287027,  0.04967545, -0.16697605,  0.00188705,
        -0.09903455,  0.12493267,  0.08164623, -0.05969295,  0.04415423],
       [-0.0604263 ,  0.00805518,  0.09937415, -0.09267912,  0.05353724,
        -0.08030748, -0.08336922,  0.0610524 , -0.0877239 ,  0.03455708],
       [ 0.0082681 , -0.0897309 , -0.03808483, -0.04349993, -0.07961068,
         0.00809762, -0.05464205, -0.12478473, -0.10297551,  0.03020481]], dtype=float32)
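
Every entry in the output above lies within ±0.2, that is, within two standard deviations of the mean. A rough sketch of how such a sample could be produced by rejection sampling is shown below. This is our own illustrative helper in plain Python, not TensorFlow's actual implementation.

```python
import random

def truncated_normal(shape, mean=0.0, stddev=1.0, rng=random):
    """Sample a rows x cols nested list, redrawing any value that falls
    more than two standard deviations from the mean."""
    rows, cols = shape

    def draw():
        while True:
            value = rng.gauss(mean, stddev)
            if abs(value - mean) <= 2 * stddev:
                return value

    return [[draw() for _ in range(cols)] for _ in range(rows)]

sample = truncated_normal([4, 10], stddev=0.1)
```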

In [5]:
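# Here we make a handy function for initializing biases. We start them at a
# small positive constant (0.1) so that ReLU neurons begin in their active region.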
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

In [6]:
# Let's define the first set of weights and biases (corresponding to our first layer)
# We use Xavier initialization for the weights as good practice for when we're training
# deeper networks.
num_neurons = 512
w1 = tf.get_variable("w1", shape=[784, num_neurons], initializer=tf.contrib.layers.xavier_initializer())
b1 = bias_variable([num_neurons])

# Now let's define the computation that takes this layer's input and runs it through
# the neurons. Note that we use the ReLU activation function, which helps mitigate the
# vanishing-gradient problem that saturating activations (sigmoid, tanh) cause in deep networks.
h1 = tf.nn.relu(tf.matmul(x, w1) + b1)

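# We also apply dropout after this layer and the next. Dropout is a form of regularization
# in neural networks where we "turn off" randomly selected neurons during training.
# keep_prob is the probability that a neuron is kept; we make it a placeholder so that
# we can use dropout while training and switch it off (keep_prob = 1.0) when evaluating.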
keep_prob = tf.placeholder(tf.float32)
h1_drop = tf.nn.dropout(h1, keep_prob)
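
As a rough illustration of what `tf.nn.dropout` does to a layer's activations, here is a plain-Python sketch. The `dropout` function below is our own stand-in, not the TensorFlow op. It keeps each value with probability `keep_prob` and scales the survivors by `1 / keep_prob` so that the expected sum is unchanged.

```python
import random

def dropout(activations, keep_prob, rng=random):
    """Keep each activation with probability keep_prob and scale it by
    1 / keep_prob; zero it otherwise (inverted dropout)."""
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

h = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(h, keep_prob=0.5)  # each entry is 0.0 or doubled
eval_out = dropout(h, keep_prob=1.0)   # identical to h: nothing is dropped
```

Because of the rescaling, no extra correction is needed at evaluation time: feeding `keep_prob = 1.0` simply passes the activations through unchanged.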

In [7]:
# Define the second layer, similarly to the first.
w2 = tf.get_variable("w2", shape=[num_neurons, num_neurons], initializer=tf.contrib.layers.xavier_initializer())
b2 = bias_variable([num_neurons])
h2 = tf.nn.relu(tf.matmul(h1_drop, w2) + b2)
h2_drop = tf.nn.dropout(h2, keep_prob)

# And define the third layer to output the logits (unnormalized log probabilities)
w3 = tf.get_variable("w3", shape=[num_neurons, 10], initializer=tf.contrib.layers.xavier_initializer())
b3 = bias_variable([10])
y  = tf.matmul(h2_drop, w3) + b3
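
All three weight matrices above use Xavier initialization. As a sketch of the idea, assuming the initializer's default uniform variant, weights are drawn uniformly from `[-limit, limit]` with `limit = sqrt(6 / (fan_in + fan_out))`, so the scale shrinks as layers get wider. The function `xavier_uniform` below is an illustrative plain-Python helper, not the TensorFlow initializer.

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng=random):
    """Sample a fan_in x fan_out matrix uniformly from [-limit, limit],
    where limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

w = xavier_uniform(784, 512)
limit = math.sqrt(6.0 / (784 + 512))  # about 0.068 for the first layer
```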

In [8]:
# We define our loss function to be cross entropy over softmax probabilities.
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
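
To see what this loss computes for each example, here is a small plain-Python sketch. The helper `softmax_cross_entropy` is our own and works on one example at a time; the toy logits and labels are made up for illustration. Subtracting the largest logit before exponentiating is a standard trick for numerical stability.

```python
import math

def softmax_cross_entropy(logits, labels):
    """Cross entropy between one-hot `labels` and softmax(`logits`)."""
    shift = max(logits)  # subtract the max logit so exp() cannot overflow
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    return -sum(t * lp for t, lp in zip(labels, log_probs))

# Toy batch of two examples with three classes each.
batch_logits = [[2.0, 1.0, 0.1], [0.5, 0.5, 0.5]]
batch_labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
losses = [softmax_cross_entropy(z, t)
          for z, t in zip(batch_logits, batch_labels)]
mean_loss = sum(losses) / len(losses)  # plays the role of tf.reduce_mean
```

The second example has equal logits, so the softmax is uniform and its loss is `log(3)`. The first example puts the largest logit on the correct class and therefore gets a smaller loss.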

In [9]:
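# We will use the `Adam` optimizer. Adam is a variant of Stochastic Gradient
# Descent that adapts the step size for each parameter using running estimates
# of the mean and uncentered variance of its gradients.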
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

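# A prediction is correct when the index of the largest logit matches the
# index of the 1 in the one-hot label.
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

# Accuracy is the fraction of correct predictions in the batch.
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Initialize all variables before training.
sess.run(tf.global_variables_initializer())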

for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i%500 == 0:
        train_accuracy = accuracy.eval(feed_dict={x:batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g"%(i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

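# Evaluate on the 10,000 test images in 20 batches of 500 rather than in one
# large feed, weighting each batch's accuracy by its size and then averaging.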
test_accuracy = 0
for i in range(20):
    batch = mnist.test.next_batch(500)
    test_accuracy += 500 * accuracy.eval(feed_dict={
        x:batch[0], y_: batch[1], keep_prob: 1.0})

print("test accuracy %g"%(test_accuracy / 10000))
# print("test accuracy %g"%accuracy.eval(feed_dict={
#     x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

step 0, training accuracy 0.18
step 500, training accuracy 0.84
step 1000, training accuracy 0.9
step 1500, training accuracy 0.94
step 2000, training accuracy 0.94
step 2500, training accuracy 0.9
step 3000, training accuracy 0.92
step 3500, training accuracy 0.96
step 4000, training accuracy 0.94
step 4500, training accuracy 1
step 5000, training accuracy 0.98
step 5500, training accuracy 1
step 6000, training accuracy 0.98
step 6500, training accuracy 1
step 7000, training accuracy 0.94
step 7500, training accuracy 0.92
step 8000, training accuracy 1
step 8500, training accuracy 1
step 9000, training accuracy 1
step 9500, training accuracy 0.98
step 10000, training accuracy 1
step 10500, training accuracy 0.98
step 11000, training accuracy 0.98
step 11500, training accuracy 0.98
step 12000, training accuracy 0.98
step 12500, training accuracy 1
step 13000, training accuracy 0.98
step 13500, training accuracy 1
step 14000, training accuracy 0.98
step 14500, training accuracy 1
step 1