## Neural Network Implementation Overview

Recall that building and training a neural network takes three steps:
1. Build model
2. Define cost
3. Optimization 
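
As a rough illustration of these three steps outside of any framework, the sketch below fits a one-parameter linear model in plain Python. The toy data, the learning rate, and the function names (`model`, `cost`, `gradient`) are made up for this example and are not part of the tutorial's code.

```python
# Toy data generated from y = 3 * x (an assumption made for this sketch).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

# 1. Build model: a single weight, prediction = w * x.
def model(w, x):
    return w * x

# 2. Define cost: mean squared error over the data set.
def cost(w):
    return sum((model(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Derivative of the cost with respect to w.
def gradient(w):
    return sum(2 * (model(w, x) - y) * x for x, y in zip(xs, ys)) / len(xs)

# 3. Optimization: plain gradient descent.
w = 0.0
learning_rate = 0.05
for step in range(100):
    w -= learning_rate * gradient(w)

print("learned w = {:.4f}, cost = {:.6f}".format(w, cost(w)))
```

The TensorFlow listings in this document have the same shape: a model expression, a cost built on top of it, and an optimizer that repeatedly updates the variables.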

The same three steps apply when coding in TensorFlow. Your code will usually follow the structure below.

![Alt text](./images/dnn_implement/tf_schema.png)


## MNIST
MNIST is a computer vision dataset of grayscale images of handwritten digits from zero to nine. Each image is 28 × 28 pixels and has been flattened into a 784-dimensional vector. Each image also comes with a label telling us which digit it shows.
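
To make the flattening concrete, here is a minimal sketch that turns a 28 × 28 grid into a 784-element vector. It assumes row-major order (rows concatenated one after another); the `image` values are placeholders chosen only so the layout is easy to check.

```python
# A hypothetical 28 x 28 image stored as a list of rows. Each pixel holds its
# row index, purely so that the flattened layout is easy to verify.
image = [[float(row) for _ in range(28)] for row in range(28)]

# Row-major flattening: concatenate the rows one after another.
flat = [pixel for row in image for pixel in row]

print(len(flat))

# Under this layout the pixel at (row, col) lands at index row * 28 + col.
row, col = 5, 7
print(flat[row * 28 + col] == image[row][col])
```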

![Alt text](./images/dnn_implement/Selection_017.png)
![Alt text](./images/dnn_implement/Selection_018.png)


The MNIST data is split into three parts:
1. 55,000 training examples (`mnist.train`), with images of shape [55000, 784]
2. 10,000 test examples (`mnist.test`), with images of shape [10000, 784]
3. 5,000 validation examples (`mnist.validation`), with images of shape [5000, 784]

You can access:  
the training images as `mnist.train.images` (see the pictures below)  
the training labels as `mnist.train.labels` (see the pictures below)  
the test images as `mnist.test.images`  
the test labels as `mnist.test.labels`  

Note that each label is encoded as a "one-hot vector": if the target image shows a 2, the label is [0,0,1,0,0,0,0,0,0,0].
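
A minimal sketch of this encoding in plain Python follows. The helper names `one_hot` and `decode` are ours, introduced only for illustration.

```python
def one_hot(digit, num_classes=10):
    """Return a list with 1.0 at position `digit` and 0.0 elsewhere."""
    vec = [0.0] * num_classes
    vec[digit] = 1.0
    return vec

def decode(vec):
    """Recover the digit: the index of the largest entry (argmax)."""
    return max(range(len(vec)), key=lambda i: vec[i])

label = one_hot(2)
print(label)
print(decode(label))
```

Decoding with an argmax is the same idea the listings below use when they compare `tf.argmax(y_predict, 1)` with `tf.argmax(y, 1)`.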


![Alt text](./images/dnn_implement/Selection_021.png)
![Alt text](./images/dnn_implement/Selection_020.png)

In [9]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
tf.reset_default_graph()

# Load mnist dataset
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

# Define the image input: 784 = 28 * 28. Note that the DNN input is a vector.
# [None, 784] means a batch of any size in which each example is a 784-dimensional vector.
x = tf.placeholder(tf.float32, [None, 784])

# Define the labels. There are 10 classes in total (digits 0-9).
y = tf.placeholder(tf.float32, [None, 10])

# Build model: a single linear layer; y_predict holds the logits.
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y_predict = tf.matmul(x, W) + b


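# Define cost: softmax cross-entropy averaged over the batch
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y_predict))
# Optimization: gradient descent with a learning rate of 0.5
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)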

# Calculate accuracy 
correct_prediction = tf.equal(tf.argmax(y_predict, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for step in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        
        train_step_, cross_entropy_ = sess.run([train_step, cross_entropy], feed_dict={x: batch_xs, y: batch_ys})
        if step % 50 == 0:
            # print cross_entropy every 50 steps
            print("step {}: cross_entropy is {}".format(step, cross_entropy_))
    # Evaluate the trained model on the test set
    accuracy_ = sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels})
    print('Testing...... accuracy is {}'.format(accuracy_))

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
step 0: cross_entropy is 2.3025853633880615
step 50: cross_entropy is 0.34718087315559387
step 100: cross_entropy is 0.35041946172714233
step 150: cross_entropy is 0.3277951180934906
step 200: cross_entropy is 0.3836279809474945
step 250: cross_entropy is 0.5161219239234924
step 300: cross_entropy is 0.23777730762958527
step 350: cross_entropy is 0.37666282057762146
step 400: cross_entropy is 0.33953574299812317
step 450: cross_entropy is 0.3562777638435364
step 500: cross_entropy is 0.2710689604282379
step 550: cross_entropy is 0.3360309898853302
step 600: cross_entropy is 0.23308411240577698
step 650: cross_entropy is 0.30350497364997864
step 700: cross_entropy is 0.2830130457878113
step 750: cross_entropy is 0.34542033076286316
step 800: cross_entropy is 0.34182003140449524
step 850: cross_en
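
The first logged value, 2.3025853, is not arbitrary. With `W` and `b` initialized to zeros, every logit is 0, the softmax assigns probability 1/10 to each class, and the cross-entropy is -ln(1/10) = ln(10). The plain-Python check below reproduces this arithmetic; the helper names are ours, not TensorFlow's.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(one_hot_label, logits):
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs))

logits = [0.0] * 10                      # what an all-zero model outputs
label = [0.0, 0.0, 1.0] + [0.0] * 7      # one-hot label for the digit 2
loss = cross_entropy(label, logits)
print(loss, math.log(10))
```

The same value shows up again in later experiments; whenever the loss sits at about 2.3026, the model is predicting all ten classes with equal probability.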

Let's see whether a DNN with three layers (two hidden layers plus a 10-unit output layer) improves on this result.

In [3]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
tf.reset_default_graph()

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

INPUT_NODE = 784

LAYER1_NODE = 128
LAYER2_NODE = 64
LAYER3_NODE = 10


x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])

W1 = tf.Variable(tf.truncated_normal([INPUT_NODE, LAYER1_NODE], stddev=0.1))
b1 = tf.Variable(tf.truncated_normal([LAYER1_NODE], stddev=0.1))
W2 = tf.Variable(tf.truncated_normal([LAYER1_NODE, LAYER2_NODE], stddev=0.1))
b2 = tf.Variable(tf.truncated_normal([LAYER2_NODE], stddev=0.1))
W3 = tf.Variable(tf.truncated_normal([LAYER2_NODE, LAYER3_NODE], stddev=0.1))
b3 = tf.Variable(tf.truncated_normal([LAYER3_NODE], stddev=0.1))

layer_1 = tf.matmul(x, W1) + b1
out1 = tf.nn.relu(layer_1)
layer_2 = tf.matmul(out1, W2) + b2
out2 = tf.nn.relu(layer_2)
layer_3 = tf.matmul(out2, W3) + b3
out3 = tf.nn.relu(layer_3)

y_predict = out3

cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y_predict))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y_predict, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for step in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        
        train_step_, cross_entropy_ = sess.run([train_step, cross_entropy], feed_dict={x: batch_xs, y: batch_ys})
        if step % 50 == 0:
            print("step {}: cross_entropy is {}".format(step, cross_entropy_))
    accuracy_ = sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels})
    print('Testing...... accuracy is {}'.format(accuracy_))

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
step 0: cross_entropy is 2.329409122467041
step 50: cross_entropy is 1.0036280155181885
step 100: cross_entropy is 1.1816593408584595
step 150: cross_entropy is 0.8087384104728699
step 200: cross_entropy is 0.9154343605041504
step 250: cross_entropy is 0.67885422706604
step 300: cross_entropy is 0.6047723293304443
step 350: cross_entropy is 0.20516090095043182
step 400: cross_entropy is 0.11308196187019348
step 450: cross_entropy is 0.10255002975463867
step 500: cross_entropy is 0.14689059555530548
step 550: cross_entropy is 0.09609723091125488
step 600: cross_entropy is 0.13839812576770782
step 650: cross_entropy is 0.24563270807266235
step 700: cross_entropy is 0.15865127742290497
step 750: cross_entropy is 0.12630750238895416
step 800: cross_entropy is 0.17303167283535004
step 850: cross_entr
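
Every listing scores the model the same way: take the argmax of the prediction and of the one-hot label for each example, compare them, and average the matches. A small plain-Python sketch of that computation, with made-up scores for three examples and three classes:

```python
def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(predictions, labels):
    """Fraction of rows whose predicted class matches the labelled class."""
    correct = [argmax(p) == argmax(t) for p, t in zip(predictions, labels)]
    return sum(correct) / len(correct)

# Three hypothetical examples with 3 classes each (raw scores, not probabilities).
predictions = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.1], [0.0, 0.4, 0.3]]
labels = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]

# The first two rows are classified correctly, the third is not.
print(accuracy(predictions, labels))
```

Because only the position of the largest score matters, the accuracy can be computed on the raw logits without applying a softmax first.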

What happens if we make the learning rate much larger (2 instead of 0.5)?

In [4]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
tf.reset_default_graph()

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

INPUT_NODE = 784

LAYER1_NODE = 128
LAYER2_NODE = 64
LAYER3_NODE = 10


x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])

W1 = tf.Variable(tf.truncated_normal([INPUT_NODE, LAYER1_NODE], stddev=0.1))
b1 = tf.Variable(tf.truncated_normal([LAYER1_NODE], stddev=0.1))
W2 = tf.Variable(tf.truncated_normal([LAYER1_NODE, LAYER2_NODE], stddev=0.1))
b2 = tf.Variable(tf.truncated_normal([LAYER2_NODE], stddev=0.1))
W3 = tf.Variable(tf.truncated_normal([LAYER2_NODE, LAYER3_NODE], stddev=0.1))
b3 = tf.Variable(tf.truncated_normal([LAYER3_NODE], stddev=0.1))

layer_1 = tf.matmul(x, W1) + b1
out1 = tf.nn.relu(layer_1)
layer_2 = tf.matmul(out1, W2) + b2
out2 = tf.nn.relu(layer_2)
layer_3 = tf.matmul(out2, W3) + b3
out3 = tf.nn.relu(layer_3)

y_predict = out3

cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y_predict))
train_step = tf.train.GradientDescentOptimizer(2).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y_predict, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for step in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        
        train_step_, cross_entropy_ = sess.run([train_step, cross_entropy], feed_dict={x: batch_xs, y: batch_ys})
        if step % 50 == 0:
            print("step {}: cross_entropy is {}".format(step, cross_entropy_))
    accuracy_ = sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels})
    print('Testing...... accuracy is {}'.format(accuracy_))

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
step 0: cross_entropy is 2.314532995223999
step 50: cross_entropy is 2.3025853633880615
step 100: cross_entropy is 2.3025853633880615
step 150: cross_entropy is 2.3025853633880615
step 200: cross_entropy is 2.3025853633880615
step 250: cross_entropy is 2.3025853633880615
step 300: cross_entropy is 2.3025853633880615
step 350: cross_entropy is 2.3025853633880615
step 400: cross_entropy is 2.3025853633880615
step 450: cross_entropy is 2.3025853633880615
step 500: cross_entropy is 2.3025853633880615
step 550: cross_entropy is 2.3025853633880615
step 600: cross_entropy is 2.3025853633880615
step 650: cross_entropy is 2.3025853633880615
step 700: cross_entropy is 2.3025853633880615
step 750: cross_entropy is 2.3025853633880615
step 800: cross_entropy is 2.3025853633880615
step 850: cross_entropy is 2
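
The loss is pinned at ln(10) ≈ 2.3025853, the value produced by uniform predictions. One plausible reading, which we have not verified against this particular run, is that an oversized step pushes the last layer's pre-activations negative, after which the ReLU on the output layer emits zeros and passes no gradient back. The general overshoot mechanism is easier to see on a toy problem. The sketch below runs gradient descent on f(w) = w², with learning rates chosen by us for illustration:

```python
def descend(learning_rate, steps=20, w=1.0):
    """Gradient descent on f(w) = w**2, whose gradient is 2 * w."""
    for _ in range(steps):
        w -= learning_rate * 2 * w
    return w

small = descend(0.1)   # each step multiplies w by (1 - 0.2) = 0.8
large = descend(1.1)   # each step multiplies w by (1 - 2.2) = -1.2

print("learning rate 0.1 -> |w| =", abs(small))
print("learning rate 1.1 -> |w| =", abs(large))
```

With the small rate, `w` shrinks toward the minimum at 0; with the large rate, every update overshoots the minimum and lands farther away than it started.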

What happens if we initialize all of our variables to zero?

In [5]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
tf.reset_default_graph()

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

INPUT_NODE = 784

LAYER1_NODE = 128
LAYER2_NODE = 64
LAYER3_NODE = 10


x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])

W1 = tf.Variable(tf.zeros([INPUT_NODE, LAYER1_NODE]))
b1 = tf.Variable(tf.zeros([LAYER1_NODE]))
W2 = tf.Variable(tf.zeros([LAYER1_NODE, LAYER2_NODE]))
b2 = tf.Variable(tf.zeros([LAYER2_NODE]))
W3 = tf.Variable(tf.zeros([LAYER2_NODE, LAYER3_NODE]))
b3 = tf.Variable(tf.zeros([LAYER3_NODE]))

layer_1 = tf.matmul(x, W1) + b1
out1 = tf.nn.relu(layer_1)
layer_2 = tf.matmul(out1, W2) + b2
out2 = tf.nn.relu(layer_2)
layer_3 = tf.matmul(out2, W3) + b3
out3 = tf.nn.relu(layer_3)

y_predict = out3

cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y_predict))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y_predict, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for step in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        
        train_step_, cross_entropy_ = sess.run([train_step, cross_entropy], feed_dict={x: batch_xs, y: batch_ys})
        if step % 50 == 0:
            print("step {}: cross_entropy is {}".format(step, cross_entropy_))
    accuracy_ = sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels})
    print('Testing...... accuracy is {}'.format(accuracy_))

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
step 0: cross_entropy is 2.3025853633880615
step 50: cross_entropy is 2.3025853633880615
step 100: cross_entropy is 2.3025853633880615
step 150: cross_entropy is 2.3025853633880615
step 200: cross_entropy is 2.3025853633880615
step 250: cross_entropy is 2.3025853633880615
step 300: cross_entropy is 2.3025853633880615
step 350: cross_entropy is 2.3025853633880615
step 400: cross_entropy is 2.3025853633880615
step 450: cross_entropy is 2.3025853633880615
step 500: cross_entropy is 2.3025853633880615
step 550: cross_entropy is 2.3025853633880615
step 600: cross_entropy is 2.3025853633880615
step 650: cross_entropy is 2.3025853633880615
step 700: cross_entropy is 2.3025853633880615
step 750: cross_entropy is 2.3025853633880615
step 800: cross_entropy is 2.3025853633880615
step 850: cross_entropy is 
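
The loss never moves from ln(10) because no gradient reaches any variable. The sketch below works through a much smaller network of the same shape (ReLU hidden layer, ReLU on the output, softmax cross-entropy) with hand-written backpropagation. The sizes, input, and label are made up, and the code assumes the ReLU derivative at exactly 0 is taken to be 0, which to our understanding is also what TensorFlow does.

```python
import math

def relu(v):
    return [max(0.0, z) for z in v]

def relu_grad(v):
    # Assumed convention: derivative is 1 for z > 0 and 0 otherwise (including z == 0).
    return [1.0 if z > 0 else 0.0 for z in v]

def affine(W, b, x):
    # W is a list of rows, one row per output unit.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def softmax(v):
    m = max(v)
    e = [math.exp(z - m) for z in v]
    s = sum(e)
    return [t / s for t in e]

x = [0.5, 1.0, 0.25]                 # made-up input
label = [0.0, 1.0]                   # made-up one-hot label, 2 classes
W1 = [[0.0] * 3 for _ in range(4)]   # 3 inputs -> 4 hidden units, all zeros
b1 = [0.0] * 4
W2 = [[0.0] * 4 for _ in range(2)]   # 4 hidden units -> 2 outputs, all zeros
b2 = [0.0] * 2

# Forward pass
z1 = affine(W1, b1, x)
h1 = relu(z1)
z2 = affine(W2, b2, h1)
logits = relu(z2)
probs = softmax(logits)
loss = -sum(t * math.log(p) for t, p in zip(label, probs))

# Backward pass
d_logits = [p - t for p, t in zip(probs, label)]          # softmax + cross-entropy
d_z2 = [g * r for g, r in zip(d_logits, relu_grad(z2))]   # blocked by the output ReLU
grad_b2 = d_z2
grad_W2 = [[g * h for h in h1] for g in d_z2]
d_h1 = [sum(W2[k][j] * d_z2[k] for k in range(2)) for j in range(4)]
d_z1 = [g * r for g, r in zip(d_h1, relu_grad(z1))]
grad_b1 = d_z1
grad_W1 = [[g * xi for xi in x] for g in d_z1]

print("loss:", loss, "(ln 2 for two classes)")
print("grad_b2:", grad_b2)
print("grad_b1:", grad_b1)
```

Every gradient comes out as zero, so gradient descent leaves the variables where they started no matter how many steps are run.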

What happens if we change the batch size to one?

In [6]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
tf.reset_default_graph()

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

INPUT_NODE = 784

LAYER1_NODE = 128
LAYER2_NODE = 64
LAYER3_NODE = 10


x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])

W1 = tf.Variable(tf.truncated_normal([INPUT_NODE, LAYER1_NODE], stddev=0.1))
b1 = tf.Variable(tf.truncated_normal([LAYER1_NODE], stddev=0.1))
W2 = tf.Variable(tf.truncated_normal([LAYER1_NODE, LAYER2_NODE], stddev=0.1))
b2 = tf.Variable(tf.truncated_normal([LAYER2_NODE], stddev=0.1))
W3 = tf.Variable(tf.truncated_normal([LAYER2_NODE, LAYER3_NODE], stddev=0.1))
b3 = tf.Variable(tf.truncated_normal([LAYER3_NODE], stddev=0.1))

layer_1 = tf.matmul(x, W1) + b1
out1 = tf.nn.relu(layer_1)
layer_2 = tf.matmul(out1, W2) + b2
out2 = tf.nn.relu(layer_2)
layer_3 = tf.matmul(out2, W3) + b3
out3 = tf.nn.relu(layer_3)

y_predict = out3

cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y_predict))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y_predict, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for step in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(1)

        train_step_, cross_entropy_ = sess.run([train_step, cross_entropy], feed_dict={x: batch_xs, y: batch_ys})
        if step % 50 == 0:
            print("step {}: cross_entropy is {}".format(step, cross_entropy_))
    accuracy_ = sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels})
    print('Testing...... accuracy is {}'.format(accuracy_))

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
step 0: cross_entropy is 2.4037392139434814
step 50: cross_entropy is 2.3025851249694824
step 100: cross_entropy is 2.3025851249694824
step 150: cross_entropy is 2.3025851249694824
step 200: cross_entropy is 2.3025851249694824
step 250: cross_entropy is 2.3025851249694824
step 300: cross_entropy is 2.3025851249694824
step 350: cross_entropy is 2.3025851249694824
step 400: cross_entropy is 2.3025851249694824
step 450: cross_entropy is 2.3025851249694824
step 500: cross_entropy is 2.3025851249694824
step 550: cross_entropy is 2.3025851249694824
step 600: cross_entropy is 2.3025851249694824
step 650: cross_entropy is 2.3025851249694824
step 700: cross_entropy is 2.3025851249694824
step 750: cross_entropy is 2.3025851249694824
step 800: cross_entropy is 2.3025851249694824
step 850: cross_entropy is 
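
With a batch of one, each update is driven by the gradient of a single example, which is a much noisier estimate than an average over 100 examples. The sketch below illustrates only that noise, not the full training run: it draws made-up per-example "gradients" from a fixed distribution (our assumption) and compares how much the batch average varies for batch sizes 1 and 100.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical per-example gradients: true mean 1.0, standard deviation 2.0.
population = [random.gauss(1.0, 2.0) for _ in range(10000)]

def batch_gradient(batch_size):
    """Average of `batch_size` randomly chosen per-example gradients."""
    return statistics.mean(random.sample(population, batch_size))

def spread(batch_size, trials=500):
    """Standard deviation of the batch gradient over many random batches."""
    return statistics.stdev([batch_gradient(batch_size) for _ in range(trials)])

print("batch size   1:", spread(1))
print("batch size 100:", spread(100))
```

The spread for batch size 1 is roughly ten times larger, in line with the usual 1/sqrt(batch size) scaling. Combined with a learning rate of 0.5, such noisy steps can plausibly knock the network into the flat ln(10) state seen in the log above.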

What happens if we switch to a different optimizer (Adam)?

In [8]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
tf.reset_default_graph()

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

INPUT_NODE = 784

LAYER1_NODE = 128
LAYER2_NODE = 64
LAYER3_NODE = 10


x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])

W1 = tf.Variable(tf.truncated_normal([INPUT_NODE, LAYER1_NODE], stddev=0.1))
b1 = tf.Variable(tf.truncated_normal([LAYER1_NODE], stddev=0.1))
W2 = tf.Variable(tf.truncated_normal([LAYER1_NODE, LAYER2_NODE], stddev=0.1))
b2 = tf.Variable(tf.truncated_normal([LAYER2_NODE], stddev=0.1))
W3 = tf.Variable(tf.truncated_normal([LAYER2_NODE, LAYER3_NODE], stddev=0.1))
b3 = tf.Variable(tf.truncated_normal([LAYER3_NODE], stddev=0.1))

layer_1 = tf.matmul(x, W1) + b1
out1 = tf.nn.relu(layer_1)
layer_2 = tf.matmul(out1, W2) + b2
out2 = tf.nn.relu(layer_2)
layer_3 = tf.matmul(out2, W3) + b3
out3 = tf.nn.relu(layer_3)

y_predict = out3

cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y_predict))
train_step = tf.train.AdamOptimizer(0.005).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y_predict, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for step in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)

        train_step_, cross_entropy_ = sess.run([train_step, cross_entropy], feed_dict={x: batch_xs, y: batch_ys})
        if step % 50 == 0:
            print("step {}: cross_entropy is {}".format(step, cross_entropy_))
    accuracy_ = sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels})
    print('Testing...... accuracy is {}'.format(accuracy_))

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
step 0: cross_entropy is 2.3241727352142334
step 50: cross_entropy is 0.6834895610809326
step 100: cross_entropy is 0.6634771823883057
step 150: cross_entropy is 0.36657410860061646
step 200: cross_entropy is 0.4801430106163025
step 250: cross_entropy is 0.35082969069480896
step 300: cross_entropy is 0.361601859331131
step 350: cross_entropy is 0.4063527584075928
step 400: cross_entropy is 0.3643345534801483
step 450: cross_entropy is 0.32750022411346436
step 500: cross_entropy is 0.34682929515838623
step 550: cross_entropy is 0.3291756510734558
step 600: cross_entropy is 0.44918328523635864
step 650: cross_entropy is 0.2542789876461029
step 700: cross_entropy is 0.27747201919555664
step 750: cross_entropy is 0.24614015221595764
step 800: cross_entropy is 0.1614810973405838
step 850: cross_entro
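
For readers curious about what Adam does differently from plain gradient descent, here is a minimal sketch of the Adam update rule as it is commonly stated, applied to a 1-D toy function. The function `adam_minimize`, the toy objective, and the learning rate of 0.1 are our choices for illustration; TensorFlow's `AdamOptimizer` may differ in implementation details.

```python
import math

def adam_minimize(grad, x, learning_rate=0.1, beta1=0.9, beta2=0.999,
                  epsilon=1e-8, steps=500):
    """Minimize a 1-D function with the Adam update rule, given its gradient."""
    m = 0.0  # running average of gradients (first moment)
    v = 0.0  # running average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)   # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        x -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return x

# f(x) = (x - 3)**2 has gradient 2 * (x - 3) and its minimum at x = 3.
x_min = adam_minimize(lambda x: 2 * (x - 3), x=0.0)
print(x_min)
```

Dividing by the root of the squared-gradient average keeps each step's size close to the learning rate regardless of how large the raw gradient is, which is one reason Adam tends to be less sensitive to the learning-rate setting than plain gradient descent.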

You can also define your model in a function to make your code more elegant.

In [1]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Parameters
learning_rate = 0.001
training_epochs = 15
batch_size = 100
display_step = 1

# Network Parameters
n_hidden_1 = 128 # 1st layer number of features
n_hidden_2 = 64 # 2nd layer number of features
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)

# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])

# Create model
def multilayer_perceptron(x, weights, biases):
  
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    out_1 = tf.nn.relu(layer_1)

    layer_2 = tf.add(tf.matmul(out_1, weights['h2']), biases['b2'])
    out_2 = tf.nn.relu(layer_2)
  
    out_layer = tf.matmul(out_2, weights['out']) + biases['out']
    return out_layer

# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

pred = multilayer_perceptron(x, weights, biases)

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Test model
correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
# Calculate accuracy
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))


with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})

            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch {}, cost= {}".format(epoch+1,avg_cost))

    print("Optimization Finished!")


    print("Accuracy: {}".format(accuracy.eval({x: mnist.test.images, y: mnist.test.labels})))

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Epoch 1, cost= 56.33713734713476
Epoch 2, cost= 15.839478158083825
Epoch 3, cost= 9.99515647151253
Epoch 4, cost= 7.184229089075868
Epoch 5, cost= 5.464726668433704
Epoch 6, cost= 4.318168989528308
Epoch 7, cost= 3.480042829892852
Epoch 8, cost= 2.832105774422263
Epoch 9, cost= 2.3586498567272884
Epoch 10, cost= 1.9433652651513165
Epoch 11, cost= 1.6225692376203025
Epoch 12, cost= 1.3615707165755164
Epoch 13, cost= 1.1293913494581778
Epoch 14, cost= 0.9589313662846374
Epoch 15, cost= 0.7887930416909036
Optimization Finished!
Accuracy: 0.922100

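## TensorBoard Introduction
TensorBoard is the visualization tool that ships with TensorFlow. It helps developers inspect useful information such as scalar curves (loss, accuracy), the computation graph, and histograms of weights and activations.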

In [3]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
tf.reset_default_graph()
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Parameters
learning_rate = 0.001
training_epochs = 15
batch_size = 100
display_step = 1

# Network Parameters
n_hidden_1 = 128 # 1st layer number of features
n_hidden_2 = 64 # 2nd layer number of features
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)

# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])

# Create model
def multilayer_perceptron(x, weights, biases):
  
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    out_1 = tf.nn.relu(layer_1)
    tf.summary.histogram("relu1", out_1)
    
    layer_2 = tf.add(tf.matmul(out_1, weights['h2']), biases['b2'])
    out_2 = tf.nn.relu(layer_2)
    tf.summary.histogram("relu2", out_2)
    
    out_layer = tf.matmul(out_2, weights['out']) + biases['out']
    return out_layer

# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
with tf.name_scope('DNN_Model'):
    pred = multilayer_perceptron(x, weights, biases)

with tf.name_scope('Cost'):
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))

with tf.name_scope('SGD'):
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

with tf.name_scope('Accuracy'):
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

# Create a summary to monitor cost tensor
tf.summary.scalar("loss", cost)
# Create a summary to monitor accuracy tensor
tf.summary.scalar("accuracy", accuracy)
# Create summaries to visualize weights
for var in tf.trainable_variables():
    tf.summary.histogram(var.name.replace(':','_'), var)


# Merge all summaries into a single op
merged_summary_op = tf.summary.merge_all()
    
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    summary_writer = tf.summary.FileWriter('./tensorboard_data', graph=tf.get_default_graph())
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c, summary = sess.run([optimizer, cost, merged_summary_op], feed_dict={x: batch_x, y: batch_y})

            # Write logs at every iteration
            summary_writer.add_summary(summary, epoch * total_batch + i)
            
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch {}, cost= {}".format(epoch+1,avg_cost))

    print("Optimization Finished!")
    print("Accuracy: {}".format(accuracy.eval({x: mnist.test.images, y: mnist.test.labels})))

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Epoch 1, cost= 79.33581046537928
Epoch 2, cost= 15.671157075275076
Epoch 3, cost= 9.610744973529467
Epoch 4, cost= 6.865333985740492
Epoch 5, cost= 5.197681124968961
Epoch 6, cost= 4.114182637929916
Epoch 7, cost= 3.309836923283609
Epoch 8, cost= 2.711409770263864
Epoch 9, cost= 2.238252088237892
Epoch 10, cost= 1.8440492352030478
Epoch 11, cost= 1.5413093381849852
Epoch 12, cost= 1.280805748368518
Epoch 13, cost= 1.0719211351157119
Epoch 14, cost= 0.8993942561862064
Epoch 15, cost= 0.7580923894155875
Optimization Finished!
Accuracy: 0.9203000068664551
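
The step index passed to `summary_writer.add_summary` is `epoch * total_batch + i`, which numbers the batches consecutively across epochs so that the TensorBoard curves have a single continuous x-axis. With the 55,000 training examples and the batch size of 100 used here, the arithmetic works out as follows (the helper `global_step` is ours, introduced for illustration):

```python
num_examples = 55000   # size of mnist.train as stated earlier in this document
batch_size = 100
training_epochs = 15

total_batch = num_examples // batch_size   # batches per epoch

def global_step(epoch, i):
    """Consecutive index of batch `i` (0-based) within 0-based `epoch`."""
    return epoch * total_batch + i

print(total_batch)                                        # 550
print(global_step(0, 0))                                  # 0
print(global_step(1, 0))                                  # 550
print(global_step(training_epochs - 1, total_batch - 1))  # 8249
```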


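Run `tensorboard --logdir [directory_name]` to browse the logged scalars, graph, and histograms. For the listing above, the directory is `./tensorboard_data`, the path passed to `tf.summary.FileWriter`.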


![Alt text](./images/dnn_implement/scalars.png)
![Alt text](./images/dnn_implement/tensor_graph.png)
![Alt text](./images/dnn_implement/histograms.png)