In [1]:
import tensorflow as tf
#remember the current verbosity, then silence deprecation warnings while loading data
old_v = tf.logging.get_verbosity()
tf.logging.set_verbosity(tf.logging.ERROR)

<h2>Extract MNIST data</h2>
<p style="font-size:20px">You can change the <code>one_hot</code> option to switch between one-hot and integer labels.</p>

In [2]:
from tensorflow.examples.tutorials.mnist import input_data
#get mnist data, with one_hot encoding
mnist = input_data.read_data_sets("MNIST_data/",one_hot=True)
#restore the original logging verbosity
tf.logging.set_verbosity(old_v)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
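
As a rough illustration of what the `one_hot=True` option passed to `read_data_sets` above does to the labels, the sketch below encodes integer digit labels as 10-element indicator vectors in plain Python. The helper name `to_one_hot` is made up for this example and is not part of the MNIST loader.

```python
def to_one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in [0, num_classes)")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

# hypothetical integer labels, as they would look with one_hot=False
labels = [3, 0, 9]
encoded = [to_one_hot(label) for label in labels]
print(encoded[0])

# taking the argmax of each row recovers the original integer label
decoded = [row.index(max(row)) for row in encoded]
print(decoded)
```

Taking the argmax of a one-hot row recovers the integer label, which is the same trick the accuracy computation later applies to `Y`.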


In [3]:
num_train = mnist.train.num_examples #55,000
num_validation = mnist.validation.num_examples #5,000
num_test = mnist.test.num_examples #10,000

<h2>Define hyperparameters</h2>

In [4]:
#learning rate
lr = 0.01
#number of training steps
num_steps = 1500
#batch size
batch_size = 128

#network parameters
n_hidden_1 = 1500
n_hidden_2 = 750
n_hidden_3 = 375
num_input = 784
num_classes = 10
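
To get a feel for how large this network is, the trainable parameters implied by the sizes above can be counted directly. This is a small sketch in plain Python; the helper `count_parameters` is a name chosen for this example, and it assumes fully connected layers with one bias per output unit.

```python
# layer widths: input, three hidden layers, output
layer_sizes = [784, 1500, 750, 375, 10]

def count_parameters(sizes):
    """Count weights and biases of a fully connected network."""
    total = 0
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        total += fan_in * fan_out  # weight matrix
        total += fan_out           # bias vector
    return total

total_params = count_parameters(layer_sizes)
print("{:,} trainable parameters".format(total_params))
```

With these sizes the count comes to roughly 2.6 million parameters, most of them in the first two weight matrices.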

In [5]:
tf.reset_default_graph()

<h2>Define placeholder and Variables</h2>

In [6]:
#tf graph input
X = tf.placeholder(tf.float32,[None,num_input],name='X')
Y = tf.placeholder(tf.int32,[None,num_classes],name='Y')

#layer weights & biases
weights = {
    'W1': tf.Variable(tf.random_normal([num_input, n_hidden_1]),name='W1'),
    'W2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2]),name='W2'),
    'W3': tf.Variable(tf.random_normal([n_hidden_2, n_hidden_3]),name='W3'),
    'Wout': tf.Variable(tf.random_normal([n_hidden_3, num_classes]),name='Wout')
}

biases = {
    'b1': tf.Variable(tf.zeros(shape=[n_hidden_1]),name='b1'),
    'b2': tf.Variable(tf.zeros(shape=[n_hidden_2]),name='b2'),
    'b3': tf.Variable(tf.zeros(shape=[n_hidden_3]),name='b3'),
    'bout': tf.Variable(tf.zeros(shape=[num_classes]),name='bout')
}



<h2>Define neural network</h2>

In [7]:
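#define a neural net model: three hidden layers and a linear output layer
def neural_net(x):
    #hidden layer 1: affine transform followed by ReLU
    layer_1_out = tf.nn.relu(tf.add(tf.matmul(x,weights['W1']),biases['b1']))
    #hidden layers 2 and 3: affine transforms only (no activation)
    layer_2_out = tf.add(tf.matmul(layer_1_out,weights['W2']),biases['b2'])
    layer_3_out = tf.add(tf.matmul(layer_2_out,weights['W3']),biases['b3'])
    #output layer: unnormalized class scores (logits)
    out = tf.add(tf.matmul(layer_3_out,weights['Wout']),biases['bout'])
    return out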

<h2>Define cost function and accuracy</h2>

In [8]:
#unnormalized class scores (logits) for each input
logits = neural_net(X)

#define loss
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits,labels=Y),name='loss')
#define optimizer
optimizer = tf.train.AdamOptimizer(learning_rate=lr)
train_op = optimizer.minimize(loss)

#compare the predicted labels with true labels
correct_pred = tf.equal(tf.argmax(logits,1),tf.argmax(Y,1))

#compute the accuracy by taking average
accuracy = tf.reduce_mean(tf.cast(correct_pred,tf.float32),name='accuracy')

#Initialize the variables
init = tf.global_variables_initializer()
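
The loss and accuracy ops above can be mirrored in plain Python to see what they compute. This is a sketch, not TensorFlow's implementation; the helper names (`softmax`, `cross_entropy`, `argmax`) and the three-class toy batch are invented for illustration.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, one_hot_label):
    """Negative log-probability assigned to the true class."""
    probs = softmax(logits)
    return -sum(y * math.log(p) for y, p in zip(one_hot_label, probs) if y > 0)

def argmax(values):
    return values.index(max(values))

# toy batch: 3 examples, 3 classes (made-up numbers)
batch_logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3], [1.2, 0.2, 0.4]]
batch_labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# mean loss over the batch, analogous to tf.reduce_mean over per-example losses
mean_loss = sum(cross_entropy(l, y)
                for l, y in zip(batch_logits, batch_labels)) / len(batch_logits)

# fraction of examples whose highest-scoring class matches the true class
batch_accuracy = sum(argmax(l) == argmax(y)
                     for l, y in zip(batch_logits, batch_labels)) / len(batch_logits)

print(mean_loss, batch_accuracy)
```

The third toy example is deliberately misclassified, so the accuracy is two out of three while the loss stays positive.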

<h2>Execute training</h2>

In [9]:
with tf.Session() as sess:
    sess.run(init)
    
    for i in range(num_steps):
        #fetch batch
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        #run optimization
        sess.run(train_op, feed_dict={X:batch_x, Y:batch_y})
        #every 500 steps, report accuracy on the current training batch
        if i % 500 == 0:
            acc = sess.run(accuracy, feed_dict={X:batch_x, Y:batch_y})
            print("step {}, Accuracy= {:.3f}".format(i, acc))
    
    print("Training finished!")
    
    print("Testing Accuracy:", sess.run(accuracy, feed_dict={X:mnist.test.images, Y:mnist.test.labels}))

step 0, Accuracy= 0.344
step 500, Accuracy= 0.969
step 1000, Accuracy= 0.977
Training finished!
Testing Accuracy: 0.9607


In [11]:
batch_y.shape

(128, 10)

<h2>Your results</h2>

<h3>Parameters vs Performance table</h3>

In [None]:
from IPython.display import Image
Image(filename='hw1_P3.png')

<h3>Summary of the findings</h3>

<h3> The results above show that, when each parameter is varied independently with the others kept at their default values, switching the activation function from None to ReLU gives the most noticeable improvement in performance. Varying any of the other parameters leaves performance roughly unchanged. Increasing the number of neurons improves performance slightly but not by much, possibly because of overfitting. Adding an extra layer did not help much either, which seems to be because none of the three layers had an activation function in that setting. Increasing the number of epochs or the batch size also had minimal effect. This is probably because the dataset is relatively simple (i.e. not very diverse); a larger batch size may speed up training but does not really affect performance. </h3>
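
The point about extra layers without an activation function can be checked numerically: two stacked affine layers are equivalent to a single affine layer. The sketch below uses tiny random matrices in plain Python; the sizes and helper names (`matmul`, `add_bias`, `random_matrix`) are arbitrary choices for illustration, not the network trained above.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add_bias(m, bias):
    """Add a bias vector to every row of a matrix."""
    return [[v + b for v, b in zip(row, bias)] for row in m]

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
x = random_matrix(4, 5, rng)  # 4 inputs with 5 features each
W2, b2 = random_matrix(5, 3, rng), [rng.gauss(0, 1) for _ in range(3)]
W3, b3 = random_matrix(3, 2, rng), [rng.gauss(0, 1) for _ in range(2)]

# two affine layers applied one after the other, with no activation in between
two_layers = add_bias(matmul(add_bias(matmul(x, W2), b2), W3), b3)

# the same map written as one layer: W = W2 W3, b = b2 W3 + b3
W_combined = matmul(W2, W3)
b_combined = [v + b for v, b in zip(matmul([b2], W3)[0], b3)]
one_layer = add_bias(matmul(x, W_combined), b_combined)

max_diff = max(abs(p - q)
               for r1, r2 in zip(two_layers, one_layer)
               for p, q in zip(r1, r2))
print(max_diff)
```

The difference is at the level of floating-point rounding, so layers without a nonlinearity add parameters but no additional expressive power.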

<h3> The best performance seems to be attained by changing multiple parameters together. In my example, I constructed a three-layer network with hidden sizes 1500, 750 and 375, and used the ReLU activation function for the first layer. Training ran for 1500 steps with batch size 128, or equivalently about 3.5 epochs. Additionally, I used AdamOptimizer instead of Gradient Descent. This way, I was able to consistently get >96% accuracy on the test data. </h3>
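
The epoch count implied by these settings can be checked with a short calculation, assuming the 55,000-example training split reported earlier in the notebook.

```python
num_train = 55000   # size of mnist.train, as noted earlier
num_steps = 1500
batch_size = 128

# each step consumes one batch, so this is the total number of examples seen
examples_seen = num_steps * batch_size

# one epoch is one full pass over the training set
epochs = examples_seen / num_train
print("{} examples seen, about {:.2f} epochs".format(examples_seen, epochs))
```

This works out to 192,000 examples, or a little under 3.5 passes over the training set.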