# Basic Neural Network

http://adventuresinmachinelearning.com/python-tensorflow-tutorial/

## Simple TensorFlow example 1

Objective: Calculate $a = (b+c) * (c+2)$. 

This function can be broken down to:
$$d = b+c$$
$$e = c+2$$
$$a = d* e$$

Declare _constants_ and _variables_. The first argument is the value to assign; the second argument is an optional string name (useful for later visualizations).

In [1]:
import tensorflow as tf
import numpy as np


In [2]:
# create constant
const = tf.constant(2.0, name="const")

# create variables
b = tf.Variable(2.0, name='b')
c = tf.Variable(1.0, name='c')

Once variables and constants are declared, create _operations_.

In [3]:
# create operations
d = tf.add(b, c, name='d')
e = tf.add(c, const, name='e')
a = tf.multiply(d, e, name='a')

Next, create an operation that initializes all variables in the graph.

In [4]:
# setup variable init
init_op = tf.global_variables_initializer()

Start a TensorFlow session (`tf.Session`) to run operations.

By defining the variables and operations, we have created a graph structure. A `tf.Session` is the object that executes that graph. Open it with Python's `with` syntax so that it is closed automatically.

In [5]:
# start session
with tf.Session() as sess:
    # init variables
    sess.run(init_op)
    # compute output of graph
    a_out = sess.run(a)
    print("Variable a is {}".format(a_out))

Variable a is 9.0
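
As a sanity check, the same decomposition can be evaluated in plain Python without TensorFlow. This is a minimal sketch; the function name `compute_a` is an illustrative assumption and not part of the original tutorial.

```python
def compute_a(b, c):
    """Evaluate a = (b + c) * (c + 2) via the intermediate values d and e."""
    d = b + c      # first add operation
    e = c + 2.0    # second add operation (the constant is 2.0)
    return d * e   # final multiplication


print(compute_a(2.0, 1.0))  # 9.0
```

With $b = 2.0$ and $c = 1.0$ this gives $d = 3$, $e = 3$ and $a = 9$, matching the session output.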


**End of example 1.**

## Example 2: Placeholder

Take the same situation as above, but suppose the value of *b* is not known in advance. In the previous example, *b* and *c* were declared as variables with $b = 2.0$ and $c = 1.0$.

When *b* is unknown, `tf.placeholder` declares only its data type and shape; the actual values are supplied when the graph is run.

Below, the first argument declares the data type of the tensor elements (*float32*). The second argument sets the shape of the data to be "injected" into the placeholder, a *(? x 1)* array. The *None* in the first dimension means that *b* accepts any number of rows.

In [6]:
# create constant
const = tf.constant(2.0, name="const")

# create placeholder and variable
b = tf.placeholder(tf.float32, [None, 1], name='b')
c = tf.Variable(1.0, name='c')

In [7]:
# create operations
d = tf.add(b, c, name='d')
e = tf.add(c, const, name='e')
a = tf.multiply(d, e, name='a')

# setup variable init
init_op = tf.global_variables_initializer()

Now, to run the operation in a TensorFlow session, we have to feed in a value for *b*. This is done through the `feed_dict` argument of `sess.run`.

Below, the fed array is a column vector holding the integers 0 to 9. `feed_dict` takes a Python dictionary whose keys are the placeholder objects (here `b`) and whose values are the data to feed.

In [8]:
with tf.Session() as sess:
    sess.run(init_op)
    a_out = sess.run(a, feed_dict={b: np.arange(0, 10)[:, np.newaxis]})
    print("Variable a is {}".format(a_out))

Variable a is [[ 3.]
 [ 6.]
 [ 9.]
 [12.]
 [15.]
 [18.]
 [21.]
 [24.]
 [27.]
 [30.]]
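
The shape of the fed array and the resulting values can be reproduced with NumPy alone. The sketch below assumes nothing beyond NumPy broadcasting; the `_np` suffixed names are illustrative and chosen so they do not overwrite the graph tensors.

```python
import numpy as np

# Column vector of shape (10, 1), matching the placeholder shape [None, 1].
b_values = np.arange(0, 10)[:, np.newaxis].astype(np.float32)
c_value = 1.0

# The scalars broadcast across every row of the column vector.
d_np = b_values + c_value
e_np = c_value + 2.0
a_np = d_np * e_np

print(b_values.shape)  # (10, 1)
print(a_np.ravel())
```

Each row is computed independently as $(b + 1) \cdot 3$, which is why the output runs from 3 to 30 in steps of 3.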


## Example 3: Neural Network

Create a simple three-layer neural network, then apply it to the MNIST dataset provided by TensorFlow.

About MNIST dataset:
- 28x28 px grayscale images of hand-written digits
- 55,000 training rows
- 10,000 testing rows
- 5,000 validation rows

### Load dataset

First, load the data. The `one_hot` argument specifies whether labels should be one-hot encoded, e.g. "4" becomes [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]. One-hot labels can be compared directly with the 10-node output layer of the neural network.

In [9]:
# load data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Instructions for updating:
Please use tf.one_hot on tensors.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
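
The one-hot encoding described above can be sketched in NumPy as follows. The helper name `one_hot` and its `num_classes` parameter are assumptions made for illustration; this is not the loader's own implementation.

```python
import numpy as np


def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    encoded = np.zeros((len(labels), num_classes), dtype=np.float32)
    # Row i gets a 1 in the column given by labels[i].
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded


print(one_hot([4]))
```

For the label 4, the only non-zero entry sits at index 4, as in the example in the text.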


### Setup placeholders

To begin building the graph structure, set up the training parameters and the placeholders for the training data.

In [10]:
# Python optimization variables - training params
learning_rate = 0.5
epochs = 10
batch_size = 100

# declare training data placeholder
# input x - 28x28 px = 784 nodes
x = tf.placeholder(tf.float32, [None, 784])
# output data placeholder (labels - 10 digits)
y = tf.placeholder(tf.float32, [None, 10])

### Setup weight and bias variables

Set up weight and bias variables for the three-layer NN. A network with $L$ layers needs $L-1$ weight tensors and $L-1$ bias tensors, one pair for each connection between adjacent layers. With $L = 3$, that is two weight tensors and two bias tensors.

In [11]:
# declare weights/bias connecting input to hidden layer
W1 = tf.Variable(tf.random_normal([784, 300], stddev=0.03), name='W1')
b1 = tf.Variable(tf.random_normal([300]), name='b1')
# declare weights/bias connecting hidden layer to output layer
W2 = tf.Variable(tf.random_normal([300, 10], stddev=0.03), name='W2')
b2 = tf.Variable(tf.random_normal([10]), name='b2')

There will be 300 nodes in the hidden layer, so the size of W1 is [784, 300], as there are 784 inputs (28x28 pixels).

`tf.random_normal` creates a tensor of the given shape and populates it with random samples drawn from a normal distribution. The weights use mean 0 and standard deviation 0.03; the biases use the defaults (mean 0, standard deviation 1.0).
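
To get a feel for the tensor shapes and the number of trainable parameters, the same initialization can be mimicked in NumPy. This is a sketch under assumptions: `np.random.default_rng` stands in for `tf.random_normal`, the seed is arbitrary, and the `_np` names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Same shapes as W1, b1, W2 and b2 in the graph.
W1_np = rng.normal(loc=0.0, scale=0.03, size=(784, 300))
b1_np = rng.normal(size=300)
W2_np = rng.normal(loc=0.0, scale=0.03, size=(300, 10))
b2_np = rng.normal(size=10)

n_params = W1_np.size + b1_np.size + W2_np.size + b2_np.size
print(n_params)  # 238510
```

The count is $784 \cdot 300 + 300 + 300 \cdot 10 + 10 = 238510$ trainable values.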

### Setup hidden layer

Set up the node inputs and activation functions of the hidden layer nodes by executing the following equations:

$$z^{(l+1)} = W^{(l)}x + b^{(l)}$$
$$h^{(l+1)} = f(z^{(l+1)})$$

The first equation multiplies the weights with the input vector and adds the bias. Use `tf.matmul` to execute the matrix multiplication. In the code, each row of `x` is one sample, so the product is written `tf.matmul(x, W1)`.

The second equation applies the ReLU (rectified linear unit) activation function. Use the built in ReLU from TensorFlow - `tf.nn.relu`.

In [12]:
# calculate output of hidden layer
hidden_out = tf.add(tf.matmul(x, W1), b1)
hidden_out = tf.nn.relu(hidden_out)
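
The hidden layer computation can be traced by hand on a tiny example. The NumPy sketch below uses toy sizes and values that are assumptions for illustration (2 samples, 3 inputs, 4 hidden nodes); note that samples are stored as rows, so the product is `x @ W` rather than `W @ x`.

```python
import numpy as np


def relu(z):
    """Element-wise max(z, 0)."""
    return np.maximum(z, 0.0)


# Toy data: 2 samples with 3 inputs each, feeding 4 hidden nodes.
x_toy = np.array([[1.0, 2.0, 3.0],
                  [0.0, -1.0, 0.5]])
W_toy = np.full((3, 4), 0.1)
b_toy = np.array([0.0, -1.0, 0.5, -0.2])

z_toy = x_toy @ W_toy + b_toy   # shape (2, 4): one row per sample
h_toy = relu(z_toy)             # negative entries are set to 0
print(h_toy)
```

The bias vector is broadcast across the rows, and ReLU zeroes every negative pre-activation while leaving positive ones unchanged.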

### Setup output layer

Set up the output layer *y_* by multiplying the hidden layer output with the second weight matrix and adding the second bias variable. The output layer uses softmax activation via `tf.nn.softmax`.

In [13]:
# calculate output layer - softmax activated
y_ = tf.nn.softmax(tf.add(tf.matmul(hidden_out, W2), b2))
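
Softmax turns each row of raw scores into a probability distribution. The following NumPy sketch illustrates the idea; it is not TensorFlow's implementation, and subtracting the row maximum is a common numerical-stability trick added here as an assumption.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


probs = softmax(np.array([[2.0, 1.0, 0.1],
                          [0.0, 0.0, 0.0]]))
print(probs.sum(axis=1))  # each row sums to 1
```

Equal scores give a uniform distribution, and the largest score always receives the largest probability.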

### Cost/loss function

Use the cross-entropy cost function:
$$ J = - \frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{n} \left[ y_j^{(i)} \log(y_{j-}^{(i)}) + (1 - y_j^{(i)}) \log(1 - y_{j-}^{(i)}) \right]$$

where:
- $y_j^{(i)}$ is the *i* th training label for output node *j*,
- $y_{j-}^{(i)}$ is the *i* th predicted label for output node *j* (`y_` in the code),
- *m* is the number of training / batch samples,
- *n* is the number of output nodes.

The inner summation (over *n*) adds the logarithmic terms of all output nodes for one sample.

The outer summation (over *m*), together with the factor $1/m$, takes the mean of the inner sums across all training samples.

The cross-entropy loss is a non-negative number, not a probability, and it can exceed 1. It indicates classification performance: the lower the loss, the closer the predictions are to the labels.

In [16]:
y_clipped = tf.clip_by_value(y_, 1e-10, 0.9999999)
cross_entropy = -tf.reduce_mean(tf.reduce_sum(y * tf.log(y_clipped)
                        + (1-y) * tf.log(1-y_clipped), axis=1))

print(y_clipped)

Tensor("clip_by_value_2:0", shape=(?, 10), dtype=float32)


`y_clipped` is a clipped version of `y_`. Clipping prevents $\log(0)$ during training, which would produce infinite or `NaN` values and break the training process. The cross entropy is then calculated from `y_clipped`.

`tf.reduce_sum` adds up the cross-entropy terms of all output nodes for each training sample. As `y_clipped` is (m x 10), the sum is taken across the second axis, hence `axis=1` (Python uses zero-based indexing). If `y_clipped` were (10 x m), the sum would be taken across the first axis (`axis=0`).

`tf.reduce_mean` takes the mean of the tensor and completes the cross entropy cost calculation.
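
The clip, sum and mean steps can be mirrored in NumPy to see what each one contributes. This is a sketch using the same clipping bounds as the cell above; the function name `cross_entropy` and the toy predictions are assumptions for illustration.

```python
import numpy as np


def cross_entropy(y_true, y_pred, eps=1e-10):
    """Mean over samples of the per-sample sum over output nodes."""
    y_clip = np.clip(y_pred, eps, 0.9999999)
    per_node = y_true * np.log(y_clip) + (1 - y_true) * np.log(1 - y_clip)
    per_sample = per_node.sum(axis=1)   # sum over the output nodes
    return -per_sample.mean()           # mean over the batch


y_true = np.array([[0.0, 1.0], [1.0, 0.0]])
good = np.array([[0.1, 0.9], [0.8, 0.2]])   # close to the labels
bad = np.array([[0.9, 0.1], [0.2, 0.8]])    # far from the labels
print(cross_entropy(y_true, good) < cross_entropy(y_true, bad))  # True
```

Thanks to the clipping, even predictions of exactly 0 or 1 yield a finite loss.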

### Setup optimizer

Use the gradient descent optimizer provided by TensorFlow. Initialize it with the learning rate, then call `minimize` on the cost operation. The resulting operation performs backpropagation and one gradient descent step each time it is run.

Before running the operations, set up the variable initialization operator, as well as the operation for evaluating the predictions.

In [17]:
# add optimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cross_entropy)
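
The update rule behind gradient descent is $w \leftarrow w - \eta \, \nabla J(w)$, where $\eta$ is the learning rate. The pure-Python sketch below applies it to a one-dimensional toy loss; the function name `gradient_descent` and the toy loss are assumptions for illustration, not what TensorFlow runs internally.

```python
def gradient_descent(grad, start, learning_rate=0.5, steps=20):
    """Repeatedly step against the gradient: w <- w - learning_rate * grad(w)."""
    w = start
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


# Toy loss f(w) = (w - 3)^2 with derivative f'(w) = 2 * (w - 3).
w_min = gradient_descent(lambda w: 2.0 * (w - 3.0), start=0.0,
                         learning_rate=0.1, steps=100)
print(round(w_min, 4))  # 3.0
```

In the network, the same rule is applied to every weight and bias, with the gradients obtained by backpropagation.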

In [19]:
# setup initialization operator
init_op = tf.global_variables_initializer()

# define accuracy assessment operation
correct_pred = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

print(correct_pred)

Tensor("Equal_1:0", shape=(?,), dtype=bool)


`tf.equal` compares its two arguments element-wise and returns a boolean tensor. `tf.argmax` returns the index of the maximum value along the given axis (like `numpy.argmax`). Thus, `correct_pred` holds one boolean per sample (shape: m) indicating whether the digit was predicted correctly.

Before the mean of the correct predictions can be calculated with `tf.reduce_mean`, the booleans are cast to floats using `tf.cast`.
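
The accuracy calculation can be followed step by step on three hand-made samples. The NumPy sketch below uses made-up labels and predictions (assumptions for illustration) in which the third sample is misclassified.

```python
import numpy as np

# One-hot targets and predicted probabilities for 3 samples, 3 classes.
labels = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]])
preds = np.array([[0.1, 0.2, 0.7], [0.6, 0.3, 0.1], [0.5, 0.4, 0.1]])

# One boolean per sample, shape (3,).
correct = np.argmax(labels, axis=1) == np.argmax(preds, axis=1)
# Cast to float so the mean is the fraction of correct predictions.
acc = correct.astype(np.float32).mean()
print(correct, acc)
```

Two of the three predictions match their labels, so the accuracy is 2/3.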

### Training

Start a session and run the training in batches.

Use `next_batch` to extract batches of the training data. This method is provided by the MNIST dataset object and returns randomized batches of samples.

In [26]:
# start session
with tf.Session() as sess:
    # init variables
    sess.run(init_op)
    total_batch = int(len(mnist.train.labels) / batch_size)
    for epoch in range(epochs):
        avg_cost = 0
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size=batch_size)
            _, c = sess.run([optimizer, cross_entropy],
                           feed_dict={x: batch_x, y: batch_y})
            avg_cost += c / total_batch
        print("Epoch:", (epoch+1), "cost =", "{:.3f}".format(avg_cost))
    print("Training complete")
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels}))
    

Epoch: 1 cost = 0.588
Epoch: 2 cost = 0.210
Epoch: 3 cost = 0.151
Epoch: 4 cost = 0.120
Epoch: 5 cost = 0.097
Epoch: 6 cost = 0.078
Epoch: 7 cost = 0.063
Epoch: 8 cost = 0.051
Epoch: 9 cost = 0.040
Epoch: 10 cost = 0.033
Training complete
0.9783


After extracting the batch samples, two operations are run in a single `sess.run` call: the optimizer operation and the cross entropy. Their outputs are assigned to `_` and `c`. The batch samples (`batch_x` and `batch_y`) are fed to both operations through `feed_dict`.

The output from the optimizer is not important. The cross_entropy output is what is needed to calculate the average cost per epoch.
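
The running update `avg_cost += c / total_batch` is simply the mean of the per-batch costs, accumulated one batch at a time. The sketch below checks this with NumPy; the per-batch costs are random placeholder values (an assumption, not real training output), and 55,000 is the training set size listed earlier.

```python
import numpy as np

n_train, batch_size = 55000, 100
total_batch = n_train // batch_size
print(total_batch)  # 550

# Hypothetical per-batch costs, only to show the running-average update.
rng = np.random.default_rng(0)
batch_costs = rng.uniform(0.0, 1.0, size=total_batch)

avg_cost = 0.0
for cost in batch_costs:
    avg_cost += cost / total_batch   # same update as in the training loop

print(np.isclose(avg_cost, batch_costs.mean()))  # True
```

Each epoch therefore runs 550 optimizer steps, and the printed cost is the average over those steps.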

Once training is complete, the accuracy operation is run, injecting test images and labels to get the prediction accuracy on the test set, which is approximately 98%.

To improve the model, regularization could be added.