## TensorFlow code for a Multilayer Perceptron

In this section we will go through the code for a multilayer perceptron in TensorFlow.

It is built around the implementation by [Aymeric Damien](https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/multilayer_perceptron.py).

First, we set up the required imports and define the location of the MNIST data.

In [None]:
from __future__ import division, print_function, absolute_import
import os
from time import time
from datetime import datetime
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.ERROR)

mnist = input_data.read_data_sets("../scratch/", one_hot=True)  # Load MNIST with one-hot encoded labels
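
The `one_hot=True` argument means each label arrives as a vector of length 10 rather than as a single digit. The sketch below illustrates that encoding in plain Python; the helper names `one_hot` and `from_one_hot` are hypothetical and are not part of TensorFlow or the MNIST loader.

```python
def one_hot(label, n_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    if not 0 <= label < n_classes:
        raise ValueError("label must be in the range [0, n_classes)")
    return [1.0 if i == label else 0.0 for i in range(n_classes)]


def from_one_hot(vector):
    """Recover the class index, i.e. the position of the largest entry."""
    return max(range(len(vector)), key=vector.__getitem__)


encoded = one_hot(3)
print(encoded)                # 1.0 at index 3, 0.0 elsewhere
print(from_one_hot(encoded))  # the original digit
```

Recovering the class index with an argmax is the same operation the notebook later applies to both the labels and the predictions.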

Next we define the hyper-parameters, the network parameters and the graph input placeholders.

In [None]:
# Hyper-Parameters
learning_rate = 0.001 # Initial learning rate
training_epochs = 50 # Number of epochs to train
batch_size = 100 # Number of images per batch
display_step = 2 # How often to output model metrics during training

# Network Parameters
n_hidden_1 = 256 # 1st layer number of neurons
n_hidden_2 = 256 # 2nd layer number of neurons
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)

# tf Graph input placeholders
X = tf.placeholder("float", [None, n_input], name='X') # Input data
Y = tf.placeholder("float", [None, n_classes], name='Y') # Input labels


Initialise weights and biases for the network.

Every neuron assigns a weight to each of its input features (in the first layer these are the pixels; in the second layer they are the features extracted by the first layer). The weights tell the model how important each feature is in making up the next set of features. As we don't yet know how important each feature is, we initialise the weights with values drawn randomly from a normal distribution with mean 0 and variance 1.

Biases are additional constants attached to neurons and added to the weighted input before the activation function is applied.

Generally, the features learned in the hidden layers are not easy to interpret, so these layers are often considered something of a black box.

In [None]:
# Store each layer's weights and biases
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1]), name='h1'),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2]), name='h2'),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]), name='h_out')
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1]), name='b1'),
    'b2': tf.Variable(tf.random_normal([n_hidden_2]), name='b2'),
    'out': tf.Variable(tf.random_normal([n_classes]), name='b_out')
}
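
To get a feel for the size of the model, the shapes above can be turned into a count of trainable parameters. This is a minimal sketch in plain Python; `count_parameters` is a hypothetical helper, and the layer sizes are copied from the network parameters defined earlier.

```python
# n_input, n_hidden_1, n_hidden_2, n_classes from the notebook
layer_sizes = [784, 256, 256, 10]


def count_parameters(sizes):
    """Sum weights (fan_in * fan_out) and biases (fan_out) over all layers."""
    total = 0
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        total += fan_in * fan_out + fan_out
    return total


n_params = count_parameters(layer_sizes)
print(n_params)  # 269322
```

Most of the parameters sit in the first weight matrix (784 x 256), which is worth keeping in mind when changing `n_hidden_1`.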

### Model Creation
The model is 'multi-layer' because it stacks hidden layers between the input and the output; here there are two, defined below as `layer_1` and `layer_2`.

The MLP definition below does two things:

1. It defines the model in `multilayer_perceptron()`.
2. It computes each layer of the network from its (input, weights, biases).

In [None]:
def multilayer_perceptron(x):
    # Hidden fully connected layer with 256 neurons (no activation is applied, so the layer is affine)
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'], name='layer_1')
    # Hidden fully connected layer with 256 neurons
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'], name='layer_2')
    # Output fully connected layer with a neuron for each class
    out_layer = tf.matmul(layer_2, weights['out'], name='out_layer') + biases['out']
    return out_layer
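
As written, `multilayer_perceptron` applies no activation function between layers, so each layer is an affine map. The sketch below uses plain Python lists and small made-up shapes (not the real 784/256/10 sizes) to check that two stacked affine layers compute the same thing as a single affine layer. The helper names are hypothetical. Inserting a non-linearity such as ReLU between the layers is a common way to avoid this collapse, but that is an assumption and not something the code above does.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_bias(matrix, bias):
    """Add a bias vector to every row of a matrix."""
    return [[value + b for value, b in zip(row, bias)] for row in matrix]


def random_matrix(rows, cols, rng):
    """Matrix of samples from a normal distribution with mean 0 and variance 1."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
x = random_matrix(2, 4, rng)                     # 2 samples, 4 input features
w1, b1 = random_matrix(4, 3, rng), [rng.gauss(0.0, 1.0) for _ in range(3)]
w2, b2 = random_matrix(3, 2, rng), [rng.gauss(0.0, 1.0) for _ in range(2)]

# Two stacked affine layers: (x W1 + b1) W2 + b2
stacked = add_bias(matmul(add_bias(matmul(x, w1), b1), w2), b2)

# One equivalent affine layer: x (W1 W2) + (b1 W2 + b2)
w = matmul(w1, w2)
b = add_bias(matmul([b1], w2), b2)[0]
collapsed = add_bias(matmul(x, w), b)

max_diff = max(
    abs(s - c)
    for row_s, row_c in zip(stacked, collapsed)
    for s, c in zip(row_s, row_c)
)
print(max_diff)  # differs from zero only by floating-point rounding
```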

Next we build the model by passing the `X` placeholder as input; the result is the tensor of output logits.

In [None]:
logits = multilayer_perceptron(X)

### Define loss and optimizer

In the following snippet we define the loss operation and the optimiser, and create the operation that initialises the global variables.

`tf.reduce_mean` - Computes the mean of elements across dimensions of a tensor.

`tf.train.AdamOptimizer` - Implements the Adam algorithm, an adaptive gradient method.

`optimizer.minimize` - Computes the gradients of `loss_op` with respect to the trainable variables and applies them.
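
To make "adaptive gradient" more concrete, here is a minimal sketch of the Adam update rule for a single scalar parameter, written in plain Python. It follows the published algorithm with the usual default constants; `adam_minimize` is a hypothetical helper, and TensorFlow's own implementation may differ in detail (for example in how it handles epsilon).

```python
import math


def adam_minimize(grad_fn, x, learning_rate=0.001, beta1=0.9, beta2=0.999,
                  epsilon=1e-8, steps=500):
    """Run `steps` Adam updates on a scalar parameter `x`."""
    m = 0.0  # running mean of the gradients (first moment)
    v = 0.0  # running mean of the squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # correct the bias towards zero at early steps
        v_hat = v / (1 - beta2 ** t)
        x -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return x


# Minimise f(x) = x**2, whose gradient is 2x, starting from x = 1.0
x_final = adam_minimize(lambda x: 2.0 * x, x=1.0)
print(x_final)  # closer to the minimum at 0 than the starting point
```

Because the step is scaled by the running gradient statistics, the first update moves the parameter by roughly `learning_rate`, regardless of the size of the raw gradient.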

In [None]:
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y), name='loss_op')
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op, name='train_op')

# Operation that initialises all variables when it is run in a session
init = tf.global_variables_initializer()
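
The loss above combines a softmax with a cross-entropy and then averages over the batch. The plain-Python sketch below works through the same arithmetic on two made-up samples with three classes; the function names and the example numbers are illustrative assumptions, not TensorFlow code.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, one_hot_label):
    """Negative log-probability assigned to the true class."""
    probs = softmax(logits)
    return -sum(y * math.log(p) for y, p in zip(one_hot_label, probs) if y > 0)


batch_logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.0]]
batch_labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

losses = [cross_entropy(l, y) for l, y in zip(batch_logits, batch_labels)]
mean_loss = sum(losses) / len(losses)  # the role played by tf.reduce_mean
print(mean_loss)
```

A model that spreads its probability evenly over `k` classes has a loss of `log(k)`, which gives a useful baseline: about 2.3 for the ten MNIST classes.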

### Set up TensorBoard

In [None]:
prediction = tf.nn.softmax(logits, name='prediction') # Raw prediction values
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1), name='correct_pred') # Predictions that are correct
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32), name='accuracy') # Calculate accuracy
# Define writer for TensorBoard log output (one timestamped directory per run)
writer = tf.summary.FileWriter(os.path.join(os.getcwd(), "mlp-tb-" + str(datetime.now())), graph=tf.get_default_graph())

# Define and name tensorboard histograms
tf.summary.histogram("loss", loss_op)
tf.summary.histogram("accuracy", accuracy)

# Create a summary to monitor cost tensor
#tf.summary.scalar("loss", loss_op)
# Create a summary to monitor accuracy tensor
#tf.summary.scalar("accuracy", accuracy)

# Merge all summaries into a single output
merged_summary_op = tf.summary.merge_all()
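
The `correct_pred` and `accuracy` operations compare the position of the largest prediction with the position of the 1 in the one-hot label, then average the matches. A minimal plain-Python sketch of the same calculation, using made-up predictions for three classes and hypothetical helper names:

```python
def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy(predictions, one_hot_labels):
    """Fraction of samples whose predicted class matches the label."""
    correct = [argmax(p) == argmax(y) for p, y in zip(predictions, one_hot_labels)]
    return sum(correct) / len(correct)


predictions = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
]
labels = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
]

acc = accuracy(predictions, labels)
print(acc)  # 0.75
```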

### Train and evaluate the model

In [None]:
from matplotlib import pyplot as plt
with tf.Session() as sess:
    sess.run(init)

    # Training loop
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size) # Get number of batches
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size) # Train batch
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([train_op, loss_op], feed_dict={X: batch_x,
                                                            Y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per display_step
        if epoch % display_step == 0:
            loss, acc, summary = sess.run([loss_op, accuracy, merged_summary_op], feed_dict={X: batch_x,
                                                                 Y: batch_y})
            writer.add_summary(summary, epoch) # Write current step output to tensorboard
            
            #print("loss ", loss, "\nacc ", acc,"\nsummary ", summary)
            print("Epoch:", '%04d' % (epoch+1), "cost={:.9f}".format(avg_cost))
    print("Optimization Finished!")

    
    # Test model
    pred = tf.nn.softmax(logits)  # Apply softmax to logits
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(Y, 1))
    
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy:", accuracy.eval({X: mnist.test.images, Y: mnist.test.labels}))
    
    # Build confusion matrix (rows: ground truth labels, columns: model predictions)
    conf_mat = tf.confusion_matrix(tf.argmax(Y, 1), tf.argmax(pred, 1)).eval({X: mnist.test.images, Y: mnist.test.labels})
    %matplotlib inline
    # Plot matrix
    plt.matshow(conf_mat)
    plt.colorbar()
    plt.ylabel('Real Class')
    plt.xlabel('Predicted Class')
    plt.show()
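
The confusion matrix plotted above counts, for every real class (row), how often each class was predicted (column). The sketch below builds one by hand in plain Python for a made-up three-class example; the helper name and the data are illustrative assumptions.

```python
def confusion_matrix(true_classes, predicted_classes, n_classes):
    """matrix[t][p] counts samples of real class t that were predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_classes, predicted_classes):
        matrix[t][p] += 1
    return matrix


true_classes = [0, 1, 2, 2, 1, 0]
predicted_classes = [0, 2, 2, 2, 1, 1]

matrix = confusion_matrix(true_classes, predicted_classes, n_classes=3)
for row in matrix:
    print(row)

# Correct predictions lie on the diagonal
n_correct = sum(matrix[i][i] for i in range(3))
print(n_correct / len(true_classes))
```

Off-diagonal entries show which pairs of classes the model confuses, which the single accuracy figure cannot reveal.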

### Set up TensorBoard using an ngrok tunnel

In [None]:
import time
import subprocess
import os
import signal

def get_process_pid(pstring):
    pid = None
    for line in os.popen("ps ax | grep " + pstring + " | grep -v grep | grep -v defunct"):
        fields = line.split()
        pid = fields[0]
    return pid

LOG_DIR = os.getcwd()
NG_DIR = LOG_DIR
# Uncomment if running locally
#NG_DIR = os.path.dirname(LOG_DIR)
NG_ZIP = os.path.join(NG_DIR, 'ngrok-stable-linux-amd64.zip')
NG_BIN = os.path.join(NG_DIR, 'ngrok')

# Download ngrok binary
if not os.path.isfile(NG_ZIP):
    !wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip \
        -P {NG_DIR}
if not os.path.isfile(NG_BIN):        
    !unzip -o {NG_DIR}/ngrok-stable-linux-amd64.zip -d {NG_DIR}

# If TensorBoard is already running, kill it and restart with the correct logdir
tb_pid = get_process_pid('tensorboard')
if tb_pid:
    print("Killing old tensorboard")
    os.kill(int(tb_pid), signal.SIGKILL)
get_ipython().system_raw(
    'tensorboard --logdir {} --host 0.0.0.0 --port 6006 &'
    .format(LOG_DIR)
)
tb_pid = get_process_pid('tensorboard')
print("Started tensorboard with pid %s" % tb_pid)

# If ngrok is already running do nothing
ng_pid = get_process_pid('ngrok')
if not ng_pid:
    proc = subprocess.Popen([NG_BIN, 'http', '6006'])
    print("Started ngrok with pid %s" % proc.pid)
    time.sleep(5)
else:
    print("ngrok already running")
ng_pid = get_process_pid('ngrok')

# Get ngrok link
try:
    ! curl -s http://localhost:4040/api/tunnels | python3 -c \
        "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"
except Exception:  # Fallback only: a failing '!' shell command does not itself raise in Python
    print("Error getting ngrok link. Retrying...")
    time.sleep(5)
    ! curl -s http://localhost:4040/api/tunnels | python3 -c \
        "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"

In [None]:
# Cleanup
#procs = [tb_pid, ng_pid]
#[os.kill(int(x), signal.SIGKILL) for x in procs if x is not None]
#!rm -rf mlp-tb-*
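
The ngrok lookup in the tunnel cell above reads `['tunnels'][0]['public_url']` from the JSON returned by the local ngrok API. Because `!` shell commands do not raise Python exceptions when they fail, it can be easier to do the parsing in Python and handle the "not ready yet" case explicitly. The sketch below only covers the parsing step; `extract_public_url` is a hypothetical helper and the sample payload is made up to mirror the structure that the cell assumes.

```python
import json


def extract_public_url(payload):
    """Return the first tunnel's public_url, or None if it is not available."""
    try:
        tunnels = json.loads(payload).get("tunnels", [])
    except (ValueError, AttributeError):
        # Not valid JSON, or not a JSON object at the top level
        return None
    if not tunnels:
        return None
    return tunnels[0].get("public_url")


# Made-up payload with the same shape as the one read in the notebook cell
sample = '{"tunnels": [{"public_url": "https://example.ngrok.io"}]}'
print(extract_public_url(sample))
print(extract_public_url('{"tunnels": []}'))  # None: tunnel not up yet
```

A caller can then retry after a short sleep whenever the function returns `None`, which is what the `try`/`except` in the cell is trying to achieve.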

### Experiment
Now try experimenting with the model. What effects do you see when changing the model parameters?
 - `learning_rate`
 - `training_epochs`
 - `batch_size`
 - `n_hidden_1`
 - `n_hidden_2`

Try adding an additional hidden layer to the model. What impact does this have?
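
When experimenting with `batch_size` and `training_epochs`, it helps to know how many weight updates a run performs, since the training loop makes one update per batch. The sketch below reproduces that arithmetic in plain Python. The figure of 55,000 training examples is an assumption about the default split used by the MNIST loader; check `mnist.train.num_examples` for the real value.

```python
def total_updates(num_examples, batch_size, training_epochs):
    """Number of optimiser steps performed by the training loop."""
    batches_per_epoch = num_examples // batch_size  # same as int(num_examples / batch_size)
    return batches_per_epoch * training_epochs


NUM_EXAMPLES = 55000  # assumed size of mnist.train

for batch_size in (50, 100, 500):
    print(batch_size, total_updates(NUM_EXAMPLES, batch_size, training_epochs=50))
```

Larger batches mean fewer, less noisy updates per epoch, so a change in `batch_size` often needs a matching change in `learning_rate` or `training_epochs` to compare like with like.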

## End of MLP Notebook