In [1]:
import tensorflow as tf

First, we take our input data and send it to hidden layer 1. To do that, we multiply the input by a set of weights and add a bias. The result passes through an activation function, which decides how strongly each neuron "fires" and what it passes on, either to another hidden layer or to the output layer. We will have three hidden layers in this example, making this a Deep Neural Network.

We then compare the network's output to the intended output. A cost function (also called a **loss function**) measures how wrong we are, and an **optimizer**, the Adam optimizer in this case, minimizes that cost by adjusting the weights and biases. How large each adjustment is depends on the **learning rate**. A lower learning rate means slower training, but usually a more stable path to *better results*. A higher learning rate trains faster, but it can overshoot the minimum, so the *results may suffer*. There are diminishing returns here: you cannot just keep lowering the learning rate and always do better, because training eventually becomes impractically slow.
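
As a rough illustration of how the learning rate affects training, here is a minimal sketch in plain Python. It uses ordinary gradient descent on a made-up one-weight cost, not the Adam optimizer used in this notebook, and the function name `minimize`, the cost `(w - 3)^2`, and the three learning rates are all assumptions chosen for the example.

```python
def minimize(learning_rate, steps=50, start=5.0):
    """Plain gradient descent on the toy cost (w - 3)^2 (hypothetical example)."""
    w = start
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)      # derivative of the cost with respect to w
        w -= learning_rate * grad   # step against the gradient
    return w


# Same number of steps, three different learning rates
for lr in (0.01, 0.1, 1.1):
    w = minimize(lr)
    print(f"learning_rate={lr}: w={w:.4f}, cost={(w - 3.0) ** 2:.6f}")
```

With the smallest rate the weight is still far from the minimum after 50 steps, with the middle rate it gets very close, and with the largest rate every step overshoots and the cost grows instead of shrinking.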

Sending the data straight through the network, from input to output, is what makes this a feed forward neural network. Working backwards from the cost to determine how each weight should be adjusted is back propagation.

We repeat this feed forward and back propagation cycle as many times as we want. One full pass over the training data is called an epoch. We can pick any number of epochs, but too many can cause **overfitting**, where the network memorizes the training data instead of learning patterns that generalize.
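
To make the epoch and batch structure concrete, here is a minimal sketch of the same loop on a toy problem. It fits a single weight in `y = w * x` with mini-batch gradient descent. The `train` function, the dataset, and the hyperparameter values are assumptions for illustration only, and the hand-written gradient stands in for what back propagation would compute in a real network.

```python
def train(data, epochs=10, batch_size=4, learning_rate=0.05):
    """Fit y = w * x; returns the weight and the loss summed over each epoch."""
    w = 0.0
    history = []
    for epoch in range(epochs):
        epoch_loss = 0.0
        # One epoch = one full pass over the data, one batch at a time
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Feed forward: prediction errors and mean squared error for the batch
            errors = [w * x - y for x, y in batch]
            loss = sum(e * e for e in errors) / len(batch)
            # Backward step: gradient of the batch loss with respect to w
            grad = sum(2.0 * e * x for e, (x, _) in zip(errors, batch)) / len(batch)
            w -= learning_rate * grad
            epoch_loss += loss
        history.append(epoch_loss)
    return w, history


# Toy dataset whose targets follow y = 2x
data = [(k / 10.0, 2.0 * k / 10.0) for k in range(1, 21)]
w, history = train(data)
print('learned weight:', w)
for epoch, loss in enumerate(history):
    print('Epoch', epoch, 'loss:', loss)
```

The summed loss shrinks every epoch, and the amount it shrinks by gets smaller each time, which is the same pattern we look for when training the real network.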

In [2]:
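# TensorFlow 1.x helper that downloads MNIST into the current directory.
# one_hot=True encodes each label as a 10-element vector with a single 1.
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True)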

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting ./train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting ./train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting ./t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting ./t10k-labels-idx1-ubyte.gz


In [3]:
# Specifying number of nodes per layer, number of classes and batch size
n_nodes_hl1 = 500
n_nodes_hl2 = 500
n_nodes_hl3 = 500
n_classes = 10
batch_size = 100

# Placeholders for some values in the graph
x = tf.placeholder('float', [None, 784])  # flattened 28x28 images, any batch size
y = tf.placeholder('float')               # one-hot labels, shape left unspecified

In [17]:
train = mnist.train
test = mnist.test

In [18]:
test_images = test.images
test_labels = test.labels

Notice that [None, 784] is passed as the second argument to the first placeholder. This shape argument is optional, but it is useful to be explicit. Here None means the batch dimension can be any size, and 784 is the number of pixels in a flattened 28x28 MNIST image. If you do not specify a shape, TensorFlow will accept a tensor of any shape. If you do specify one, TensorFlow will throw an error when something with a different shape is fed into that placeholder.
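
The idea behind a shape like [None, 784] can be sketched in plain Python. This is only an analogy for the check, not TensorFlow's actual implementation, and the helpers `shape_of` and `is_compatible` are hypothetical names made up for this example.

```python
def shape_of(batch):
    """Return (rows, columns) for a list of equal-length rows."""
    return (len(batch), len(batch[0]))


def is_compatible(declared, actual):
    """True when every declared dimension is None (any size) or equals the actual one."""
    return len(declared) == len(actual) and all(
        d is None or d == a for d, a in zip(declared, actual)
    )


declared = (None, 784)                             # same shape we gave the placeholder x
batch_of_100 = [[0.0] * 784 for _ in range(100)]   # 100 flattened images
single_image = [[0.0] * 784]                       # a batch of one
unflattened = [[0.0] * 28 for _ in range(28)]      # a 28x28 image that was never flattened

print(is_compatible(declared, shape_of(batch_of_100)))  # True
print(is_compatible(declared, shape_of(single_image)))  # True
print(is_compatible(declared, shape_of(unflattened)))   # False
```

Any batch size passes because the first dimension is None, while the unflattened image is rejected because its second dimension is 28 rather than 784.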

In [19]:
# Building the neural net
def neural_network_model(data):
    hidden_1_layer = {'weights': tf.Variable(tf.random_normal([784, n_nodes_hl1])),
                     'biases': tf.Variable(tf.random_normal([n_nodes_hl1]))}
    
    hidden_2_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl1, n_nodes_hl2])),
                     'biases': tf.Variable(tf.random_normal([n_nodes_hl2]))}
    
    hidden_3_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl2, n_nodes_hl3])),
                     'biases': tf.Variable(tf.random_normal([n_nodes_hl3]))}
    
    output_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl3, n_classes])),
                     'biases': tf.Variable(tf.random_normal([n_classes]))}
    
    # Feed forward flow
    l1 = tf.add(tf.matmul(data, hidden_1_layer['weights']), hidden_1_layer['biases'])
    l1 = tf.nn.relu(l1)

    l2 = tf.add(tf.matmul(l1, hidden_2_layer['weights']), hidden_2_layer['biases'])
    l2 = tf.nn.relu(l2)

    l3 = tf.add(tf.matmul(l2, hidden_3_layer['weights']), hidden_3_layer['biases'])
    l3 = tf.nn.relu(l3)

    output = tf.matmul(l3, output_layer['weights']) + output_layer['biases']

    return output

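The first block of the function only creates starting definitions for our weights and biases. These are random values in the shape each layer's matrix should have (this is what tf.random_normal does for us: it outputs random values for the shape we ask for).

The feed forward flow is defined in the second block of the function, under the "Feed forward flow" comment. Each hidden layer multiplies its input by the weights, adds the biases, and applies ReLU, while the output layer returns its raw values without an activation. Nothing has actually been computed yet, though. So far we have only described the graph, and no data flows through it until we run it in a session.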
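
To see what the feed forward arithmetic does with actual numbers, here is a minimal sketch in plain Python that mirrors the structure of neural_network_model. The layer sizes are shrunk to toy values, and the helper names `relu`, `dense`, `make_layer`, and `forward` are assumptions for this example, not part of TensorFlow.

```python
import random


def relu(values):
    """Replace negative values with zero, like tf.nn.relu."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Compute inputs x weights + biases for one example (weights is n_in x n_out)."""
    return [
        sum(i * row[j] for i, row in zip(inputs, weights)) + biases[j]
        for j in range(len(biases))
    ]


def make_layer(n_in, n_out, rng):
    """Random starting weights and biases, drawn from a standard normal distribution."""
    return {
        'weights': [[rng.gauss(0.0, 1.0) for _ in range(n_out)] for _ in range(n_in)],
        'biases': [rng.gauss(0.0, 1.0) for _ in range(n_out)],
    }


def forward(x, layers):
    """ReLU after every hidden layer, raw values from the output layer."""
    *hidden, output_layer = layers
    activation = x
    for layer in hidden:
        activation = relu(dense(activation, layer['weights'], layer['biases']))
    return dense(activation, output_layer['weights'], output_layer['biases'])


rng = random.Random(0)
sizes = [4, 5, 5, 5, 3]  # toy sizes: 4 inputs, three hidden layers of 5 nodes, 3 classes
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
logits = forward([0.5, -1.0, 2.0, 0.0], layers)
print(logits)
```

The result has one raw score per class. Unlike the TensorFlow version, this runs immediately, because there is no graph to build and no session to start.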

In [22]:
# Training and Running the Session
def train_neural_network(x):
    prediction = neural_network_model(x)
    
    # Cost/loss function
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))
    
    # Optimizing our cost/loss function using AdamOptimizer
    # Learning rate is an adjustable parameter
    optimizer = tf.train.AdamOptimizer(learning_rate = 0.001).minimize(cost)
    
    hm_epochs = 10
    
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        
        # Steps for each epoch
        for epoch in range(hm_epochs):
            epoch_loss = 0
            for _ in range(mnist.train.num_examples // batch_size):
                epoch_x, epoch_y = mnist.train.next_batch(batch_size)
                _, c = sess.run([optimizer, cost], feed_dict = {x: epoch_x, y: epoch_y})
                epoch_loss += c
                
            print('Epoch', epoch, 'completed out of', hm_epochs, 'loss:', epoch_loss)
            
        # How many predictions made that perfectly matched labels
        correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
    
        # Ending accuracy on test set
        accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
        print('Accuracy:', accuracy.eval({x: mnist.test.images, y:mnist.test.labels}))
    
    

For each epoch, and for each batch in our data, we run the optimizer and the cost against that batch. The cost returned for each batch is added to epoch_loss, so the number printed at the end of an epoch is the sum of the batch costs, not an average. It should decline from one epoch to the next, and tracking it lets you see the diminishing returns over time. After the last epoch, we measure accuracy on the test set as the fraction of images whose highest-scoring class matches the label.
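
The cost and accuracy used in train_neural_network can be spelled out in plain Python. This is a minimal sketch of the math that softmax cross entropy, argmax, and the mean of the matches correspond to, not TensorFlow's implementation. The function names and the three-class example values are assumptions made up for illustration.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)                             # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, one_hot_label):
    """Negative log of the probability assigned to the correct class."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs) if t > 0)


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(batch_logits, batch_labels):
    """Fraction of examples whose highest score lines up with the one-hot label."""
    hits = [argmax(l) == argmax(t) for l, t in zip(batch_logits, batch_labels)]
    return sum(hits) / len(hits)


# Three made-up examples with three classes each
batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0], [1.0, 1.5, 0.0]]
batch_labels = [[1, 0, 0], [0, 0, 1], [1, 0, 0]]

costs = [cross_entropy(l, t) for l, t in zip(batch_logits, batch_labels)]
mean_cost = sum(costs) / len(costs)
print('cost:', mean_cost)
print('accuracy:', accuracy(batch_logits, batch_labels))
```

The first two examples are classified correctly and the third is not, so the accuracy here is two out of three. The third example also contributes the largest cost, since the network put most of its probability on the wrong class.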

In [23]:
train_neural_network(x)

Epoch 0 completed out of 10 loss: 1609737.10741
Epoch 1 completed out of 10 loss: 404343.848866
Epoch 2 completed out of 10 loss: 223559.825254
Epoch 3 completed out of 10 loss: 128891.431697
Epoch 4 completed out of 10 loss: 81826.0876669
Epoch 5 completed out of 10 loss: 51046.9450561
Epoch 6 completed out of 10 loss: 34235.0298815
Epoch 7 completed out of 10 loss: 27910.37323
Epoch 8 completed out of 10 loss: 20791.6410753
Epoch 9 completed out of 10 loss: 18983.4971714
Accuracy: 0.9512


In [None]:
# Parameters
learning_rate = 0.001
training_epochs = 15
batch_size = 100
display_step = 1