## A graph representation of a simple calculation

This notebook is adapted from the TensorFlow tutorial in this [blog post](http://adventuresinmachinelearning.com/python-tensorflow-tutorial/).

![](http://adventuresinmachinelearning.com/wp-content/uploads/2017/03/Simple-graph-example.png)

```
d = b + c
e = c + 2
a = d * e
```

### Some simple TensorFlow for this simple calculation

The code below only describes the graph. The variables do not hold their values until an initialization operation is run inside a session.

In [9]:
import tensorflow as tf
import numpy as np

In [2]:
# Input constant and variables
const = tf.constant(2.0, name='const')
b = tf.Variable(2.0, name='b')
c = tf.Variable(1.0, name='c')

Now for the operations. These lines only add nodes to the graph; nothing is computed yet.

In [3]:
d = tf.add(b, c, name='d')
e = tf.add(c, const, name='e')
a = tf.multiply(d, e, name='a')

The next step is to set up an operation that initialises the variables.

In [4]:
init_op = tf.global_variables_initializer()

The TensorFlow session is the object in which all operations are run. Using Python's `with` syntax, we can run the graph with the following code:

In [7]:
with tf.Session() as sess:
    # Initialise the (global) variables
    sess.run(init_op)
    
    # Compute the output of the graph
    a_out = sess.run(a)
    
    # Print
    print('Variable a = {}'.format(a_out))

Variable a = 9.0
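
As a sanity check, the same arithmetic can be reproduced in plain Python with no graph or session involved. This is only a sketch of the calculation itself, using the values from the cells above; it does not show how TensorFlow evaluates the graph.

```python
# Plain-Python check of the graph's arithmetic (no TensorFlow involved).
const = 2.0
b = 2.0
c = 1.0

d = b + c        # first addition node
e = c + const    # second addition node
a = d * e        # multiplication node

print('Variable a = {}'.format(a))  # prints "Variable a = 9.0"
```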


### Let's complicate it: what if b were an array of values of unknown length?

We can use a placeholder and declare an array whose first dimension has an unknown size (`None`). The values are supplied later through `feed_dict`.

In [11]:
b = tf.placeholder(tf.float32, shape=[None, 1], name='b')

In [12]:
# Let's redefine the other variables
const = tf.constant(2.0, name='const')
c = tf.Variable(1.0, name='c')

# And operations to the graph
d = tf.add(b, c, name='d')
e = tf.add(c, const, name='e')
a = tf.multiply(d, e, name='a')

In [13]:
init_op = tf.global_variables_initializer()

with tf.Session() as sess:
    # Initialise the (global) variables
    sess.run(init_op)
    
    # Compute the output of the graph
    a_out = sess.run(a, feed_dict={b: np.arange(0, 10)[:, np.newaxis]})
    
    # Print
    print('Variable a = {}'.format(a_out))

Variable a = [[  3.]
 [  6.]
 [  9.]
 [ 12.]
 [ 15.]
 [ 18.]
 [ 21.]
 [ 24.]
 [ 27.]
 [ 30.]]
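
The column of results above comes from the scalar `c` being broadcast across every row of the fed array. A rough NumPy equivalent of the same calculation is sketched below; it mirrors the values fed in the cell above, but it is an illustration in NumPy rather than what TensorFlow runs internally.

```python
import numpy as np

# Same input that was fed to the placeholder: a column of 0..9, shape (10, 1).
b_values = np.arange(0, 10)[:, np.newaxis].astype(np.float32)
c = 1.0
const = 2.0

d = b_values + c   # the scalar c is broadcast to every row
e = c + const      # still a scalar: 3.0
a = d * e          # every row is scaled by e

print(a.shape)
print(a.ravel())
```

Each row is `(b + 1) * 3`, which is why the output runs from 3 to 30 in steps of 3.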


## Neural network time

Let's build a three-layer dense NN!

Get the MNIST data from the TensorFlow examples.

In [14]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


### Set up the network

Summary of steps:
1.  Define parameters.
2.  Input and output layer placeholders.
3.  Input layer to hidden layer tensors (1. weights and 2. bias)
4.  Hidden layer to output layer tensors (1. weights and 2. bias)
5.  Input, hidden layer and output operations.
6.  Cost function for the optimizer.
7.  Optimizer.
8.  Initialization operation.
9.  Accuracy operation.

The input layer has 784 nodes (one per pixel). The output layer has 10 nodes (one per digit class).

In [15]:
learning_rate = 0.5
epochs = 10
batch_size = 100

# Input for x - dim is 28x28 or 784 pixel values
x = tf.placeholder(tf.float32, shape=[None, 784])
# Input for y (the one-hot labels, or output)
y = tf.placeholder(tf.float32, shape=[None, 10])

Now we need to set up the weight and bias variables for the three-layer neural network. There are always L-1 weight tensors and L-1 bias tensors, where L is the number of layers, so in this case we need two of each.


This neural network will have 300 nodes in the hidden layer, so the weight tensor `W1` has shape [784, 300] and the bias `b1` has shape [300]. Likewise, we create `W2` (shape [300, 10]) and `b2` (shape [10]) to connect the hidden layer to the output layer.
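
The L-1 rule and the resulting shapes can be checked with a few lines of plain Python. The sketch below only does bookkeeping on the layer sizes given in the text (784, 300, 10); the names `layer_sizes`, `shapes`, and `total_params` are made up for this illustration.

```python
# Layer sizes taken from the text: 784 inputs, 300 hidden nodes, 10 outputs.
layer_sizes = [784, 300, 10]

# One (weight shape, bias shape) pair per pair of adjacent layers: L - 1 pairs.
shapes = [((n_in, n_out), (n_out,))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

total_params = 0
for i, (w_shape, b_shape) in enumerate(shapes, start=1):
    n_weights = w_shape[0] * w_shape[1]
    n_biases = b_shape[0]
    total_params += n_weights + n_biases
    print('W{} shape {}, b{} shape {}'.format(i, w_shape, i, b_shape))

print('Trainable parameters:', total_params)
```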

In [17]:
# Input layer to hidden layer tensors
W1 = tf.Variable(tf.random_normal([784, 300], stddev=0.03), name='W1')
b1 = tf.Variable(tf.random_normal([300]), name='b1')

# Hidden layer to output layer tensors
W2 = tf.Variable(tf.random_normal([300, 10], stddev=0.03), name='W2')
b2 = tf.Variable(tf.random_normal([10]), name='b2')

The hidden layer operations: first the weighted input to each node, then the ReLU activation function.

In [18]:
hidden_out = tf.add(tf.matmul(x, W1), b1)
hidden_out = tf.nn.relu(hidden_out)
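
To make the two steps concrete, here is a tiny pure-Python sketch of `relu(x W + b)` for a single input row. The helper `dense_relu` and the small 2-input, 3-node example are invented for illustration; the real layer above works on 784 inputs and 300 nodes.

```python
def dense_relu(x, W, b):
    """Return relu(x @ W + b) for one input row x, weight rows W and biases b."""
    n_out = len(b)
    # Weighted input to each of the n_out nodes.
    pre_activation = [
        sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
        for j in range(n_out)
    ]
    # ReLU: negative values are replaced by zero.
    return [max(0.0, v) for v in pre_activation]

x = [1.0, 2.0]                 # 2 input values
W = [[1.0, -1.0, 0.5],         # shape [2, 3]
     [0.0, -1.0, 0.5]]
b = [0.5, 0.0, -2.0]           # one bias per hidden node

hidden = dense_relu(x, W, b)
print(hidden)  # prints [1.5, 0.0, 0.0]
```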

The output layer operations: a softmax turns the output node values into class probabilities.

In [19]:
y_ = tf.nn.softmax(tf.add(tf.matmul(hidden_out, W2), b2))
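
For reference, softmax can be sketched in a few lines of plain Python. The function below is an illustrative stand-in, not TensorFlow's implementation; it subtracts the largest input before exponentiating, a common trick to avoid overflow.

```python
import math

def softmax(logits):
    """Softmax of a list of floats, shifted by the max for numerical stability."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))
```

The outputs are all positive and sum to 1, so they can be read as probabilities over the classes.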


The cost function for the optimizer to work on.

The first line is an operation converting the output `y_` to a clipped version, limited to the range 1e-10 to 0.9999999. This makes sure that we never hit a log(0) operation during training, in either `log(y_clipped)` or `log(1 - y_clipped)`.

The second line is the cross entropy calculation: the per-class terms are summed over the 10 outputs and then averaged over the batch. (See the blog post for more details on this calculation.)

In [21]:
y_clipped = tf.clip_by_value(y_, 1e-10, 0.9999999)
cross_entropy = -tf.reduce_mean(
    tf.reduce_sum(
        y * tf.log(y_clipped) + (1 - y) * tf.log(1 - y_clipped),
        axis=1))
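
The same cost can be written out in plain Python to see what each part does. This is a sketch under the assumption that labels and predictions are given as lists of rows; the helper names `clip` and `cross_entropy` are invented here, and the clipping bounds are copied from the cell above.

```python
import math

def clip(value, lo, hi):
    """Limit value to the closed range [lo, hi]."""
    return max(lo, min(hi, value))

def cross_entropy(y_true, y_pred, lo=1e-10, hi=0.9999999):
    """Mean over rows of -sum_j [y*log(p) + (1-y)*log(1-p)], with p clipped."""
    per_example = []
    for labels, probs in zip(y_true, y_pred):
        total = 0.0
        for y, p in zip(labels, probs):
            p = clip(p, lo, hi)  # keeps both log arguments strictly positive
            total += y * math.log(p) + (1 - y) * math.log(1 - p)
        per_example.append(total)
    return -sum(per_example) / len(per_example)

labels = [[0, 1, 0]]
good = cross_entropy(labels, [[0.05, 0.90, 0.05]])   # confident and correct
bad = cross_entropy(labels, [[0.60, 0.20, 0.20]])    # mostly wrong
edge = cross_entropy(labels, [[0.0, 1.0, 0.0]])      # would hit log(0) unclipped
print(good, bad, edge)
```

A better prediction gives a lower cost, and the clipped edge case stays finite instead of raising a math error.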

Set up the optimizer. Calling `minimize` adds the operations that perform backpropagation and the gradient descent update for you.

In [22]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cross_entropy)
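
The update rule behind plain gradient descent is `w = w - learning_rate * gradient`. The sketch below applies it to a one-variable toy function, f(w) = (w - 3)^2, with a hand-written gradient; the function, starting point, and step count are assumptions chosen for illustration. In the network above TensorFlow derives the gradients itself through backpropagation.

```python
def gradient_descent(grad_fn, w0, learning_rate, steps):
    """Repeatedly apply w <- w - learning_rate * grad_fn(w)."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad_fn(w)
    return w

# Toy problem: f(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0),
                           w0=0.0, learning_rate=0.1, steps=100)
print(w_final)
```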

Set up the variable initialisation operation and an operation to measure the accuracy of our predictions.

In [23]:
init_op = tf.global_variables_initializer()

# Accuracy calculation
correct_prediction = tf.equal(tf.argmax(y, axis=1), tf.argmax(y_, axis=1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, dtype=tf.float32))
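
The accuracy operation compares the index of the largest label entry with the index of the largest predicted probability, then averages the matches. A plain-Python sketch of the same idea follows; the helpers `argmax` and `accuracy` and the four example rows are made up for illustration.

```python
def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(y_true, y_pred):
    """Fraction of rows where the predicted class matches the one-hot label."""
    hits = [argmax(t) == argmax(p) for t, p in zip(y_true, y_pred)]
    return sum(hits) / len(hits)

y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
y_pred = [[0.7, 0.2, 0.1],   # correct
          [0.1, 0.8, 0.1],   # correct
          [0.3, 0.4, 0.3],   # wrong: predicts class 1, label is class 2
          [0.2, 0.5, 0.3]]   # correct

print(accuracy(y_true, y_pred))  # prints 0.75
```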

### Set up the training

In [26]:
with tf.Session() as sess:
    # Run the initialization
    sess.run(init_op)
    # Number of batches per epoch: training samples / minibatch size
    total_batch = int(len(mnist.train.labels) / batch_size)
    for epoch in range(epochs):
        avg_cost = 0
        # Go through the training data one minibatch at a time
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size=batch_size)
            # Run one optimizer step and fetch the cost for this batch
            _, c = sess.run([optimizer, cross_entropy],
                            feed_dict={x: batch_x, y: batch_y})
            avg_cost += c
        print('Epoch ', epoch+1, 'Cost = ', '{:.3f}'.format(avg_cost / total_batch))
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels}))

Epoch  1 Cost =  0.684
Epoch  2 Cost =  0.252
Epoch  3 Cost =  0.188
Epoch  4 Cost =  0.150
Epoch  5 Cost =  0.125
Epoch  6 Cost =  0.105
Epoch  7 Cost =  0.091
Epoch  8 Cost =  0.076
Epoch  9 Cost =  0.066
Epoch  10 Cost =  0.057
0.9748
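
The bookkeeping in the training loop can be sketched without TensorFlow. The example below assumes the training split holds 55,000 images (the notebook does not print this number, so treat it as an assumption) and uses made-up batch costs to show how the per-epoch average is formed.

```python
# Assumed training-set size; the notebook itself does not print this value.
n_train = 55000
batch_size = 100
epochs = 10

total_batch = n_train // batch_size      # minibatches per epoch
total_updates = total_batch * epochs     # optimizer steps over the whole run
print(total_batch, total_updates)

# Made-up costs for three minibatches, only to show the averaging step.
batch_costs = [1.0, 0.5, 0.0]
avg_cost = 0
for cost in batch_costs:
    avg_cost += cost                     # running sum, as in the loop above
avg_cost = avg_cost / len(batch_costs)   # divide once at the end of the epoch
print('Cost = {:.3f}'.format(avg_cost))  # prints "Cost = 0.500"
```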
