# TensorFlow code for a Convolutional Neural Network
In this section we walk through the code for a Convolutional Neural Network (CNN) in TensorFlow.

This notebook is built around the implementation by [Aymeric Damien](https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/convolutional_network.py).

First, we set up the required imports and define the location of the MNIST data. With `one_hot=False`, the labels are integer class indices (0-9) rather than one-hot vectors.

In [None]:
from __future__ import division, print_function, absolute_import
import os
from time import time
from datetime import datetime
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.ERROR)

mnist = input_data.read_data_sets("../scratch/", one_hot=False)

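Here are the relevant training and network parameters.
The batch size is 128, which means that each training step feeds at most 128 images into the model. Note that `training_epochs` is used as the number of training steps (batches) when the model is trained, not as the number of full passes over the training set.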

In [None]:
# Training Parameters
learning_rate = 0.001 # Initial learning rate
training_epochs = 2000 # Number of training steps (batches), despite the name
batch_size = 128 # Number of images per batch

# Network Parameters
num_input = 784 # MNIST data input (img shape: 28*28)
num_classes = 10 # MNIST total classes (0-9 digits)
dropout = 0.25 # Dropout, probability to drop a neuron

### Model Creation
The following `conv_net` definition uses 5 types of layers:

1. convolution layers,
2. max pooling layers,
3. layers for reshaping input,
4. fully-connected layers and
5. dropout layers.

#### conv2d layer

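A convolution layer extracts local features, such as edges or strokes, by sliding a small filter (kernel) of learned weights across the width and height of its input. At each position it computes the dot product between the filter and the patch of input values it covers, producing one value of the output feature map; each filter in the layer produces its own feature map. With the default 'valid' padding used in this model, the filter only visits positions where it fits entirely inside the input, so a 5x5 filter turns a 28x28 image into a 24x24 feature map.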
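
The sliding dot product described above can be sketched in NumPy for a single filter and a single input channel. This is an illustration under the assumptions of 'valid' padding and stride 1 (the defaults of `tf.layers.conv2d`); the function names `conv2d_valid` and `relu` are hypothetical. Like TensorFlow, it computes a cross-correlation, that is, the kernel is not flipped.

```python
import numpy as np

def conv2d_valid(image, kernel, bias=0.0):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Dot product between the kernel and the image patch it currently covers
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel) + bias
    return out

def relu(x):
    return np.maximum(x, 0.0)

# A 28x28 input and a 5x5 kernel give a 24x24 feature map, as in Conv_1
rng = np.random.default_rng(0)
image = rng.random((28, 28))
kernel = rng.standard_normal((5, 5))
feature_map = relu(conv2d_valid(image, kernel))
print(feature_map.shape)  # (24, 24)
```

A real convolution layer such as `Conv_1` applies 32 such filters (each spanning all input channels) and stacks the resulting feature maps along the channel axis.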

#### max_pooling2d layer

Pooling layers downsample their input by summarizing each small window (pool) with a single value. In max pooling, each window, for example a 2x2 window, is replaced by its largest value. With a 2x2 pool and a stride of 2, as used in this model, the height and width of the input are halved.
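
A minimal NumPy sketch of 2x2 max pooling with stride 2, assuming 'valid' padding (the default of `tf.layers.max_pooling2d`). The function name `max_pool2d` is hypothetical and works on a single 2-D feature map.

```python
import numpy as np

def max_pool2d(x, pool=2, stride=2):
    """Max pooling over a 2-D array with 'valid' padding."""
    out_h = (x.shape[0] - pool) // stride + 1
    out_w = (x.shape[1] - pool) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + pool, j * stride:j * stride + pool]
            out[i, j] = window.max()  # keep only the largest value in each window
    return out

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 7, 2],
              [3, 2, 4, 6]])
print(max_pool2d(x))
# [[4 5]
#  [3 7]]
```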

#### reshape layer

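There are two layers in this model that reshape the data. The first reshapes each input image from a flat vector of 784 values into a 3D structure of (Height x Width x Channel), here 28 x 28 x 1. The second flattens the 3D output of the last pooling layer back into a 1D vector so that it can be fed into the fully connected layer.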
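
The sketch below traces the tensor shapes through the model with NumPy, assuming the 'valid' padding defaults of `tf.layers.conv2d` and `tf.layers.max_pooling2d`; the helper names `conv_out` and `pool_out` are hypothetical. It shows why the vector produced by the flatten step has 5 x 5 x 64 = 1600 values rather than the original 784.

```python
import numpy as np

def conv_out(size, kernel):
    # Output size of a 'valid' convolution with stride 1
    return size - kernel + 1

def pool_out(size, pool, stride):
    # Output size of 'valid' pooling
    return (size - pool) // stride + 1

batch = np.zeros((128, 784))            # a batch of flat MNIST images
images = batch.reshape(-1, 28, 28, 1)   # [Batch, Height, Width, Channel]
print(images.shape)                     # (128, 28, 28, 1)

size = 28
size = pool_out(conv_out(size, 5), 2, 2)    # Conv_1 + Pool_1: 28 -> 24 -> 12
size = pool_out(conv_out(size, 3), 2, 2)    # Conv_2 + Pool_2: 12 -> 10 -> 5
features = np.zeros((128, size, size, 64))  # output of Pool_2 (64 filters)
flat = features.reshape(features.shape[0], -1)
print(flat.shape)                           # (128, 1600)
```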

#### Fully Connected (FC) layer

A fully connected layer computes Wx + b, with one output per neuron (so the number of neurons equals the number of outputs). Each neuron multiplies every input by its own weight, sums the results, and adds its bias to get its output.
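
A NumPy sketch of one fully connected layer applied to a single example, assuming 1024 inputs (the width of `fc_layer`) and 10 outputs, as in `out_layer`. The weights are random placeholders, not trained values. `tf.layers.dense` stores its weights as an (inputs x outputs) kernel and computes xW + b for a whole batch, which is the same operation with the matrix transposed.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1024)                 # outputs of the previous layer for one image
W = rng.standard_normal((10, 1024))  # one row of weights per neuron (10 outputs)
b = np.zeros(10)                     # one bias per neuron

y = W @ x + b                        # y[k] = sum_i W[k, i] * x[i] + b[k]
print(y.shape)  # (10,)

# The same result, written out for a single neuron k
k = 3
assert np.isclose(y[k], sum(W[k, i] * x[i] for i in range(1024)) + b[k])
```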

#### Dropout Layer

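Dropout randomly sets a fraction of the outputs passed on to the next layer to zero during training. Here `dropout = 0.25`, so roughly a quarter of them are dropped at each step, and `tf.layers.dropout` scales the remaining outputs by 1 / (1 - 0.25) so that their expected sum is unchanged. Because a different random subset is dropped each time, the network cannot rely on any single neuron, which reduces overfitting. At test time dropout is switched off.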
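
A minimal NumPy sketch of this "inverted" dropout, including the training/test switch that the `is_training` argument controls in `conv_net`. The function name `dropout` and its `rng` argument are hypothetical; this is an illustration, not TensorFlow's implementation.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero a random fraction `rate` of x and rescale the rest."""
    if not training:
        return x  # no dropout at test time
    keep = rng.random(x.shape) >= rate  # each unit is kept with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(42)
activations = np.ones(1024)
train_out = dropout(activations, rate=0.25, training=True, rng=rng)
test_out = dropout(activations, rate=0.25, training=False, rng=rng)

print(np.mean(train_out == 0))                # close to 0.25
print(train_out.mean())                       # close to 1.0, thanks to the rescaling
print(np.array_equal(test_out, activations))  # True
```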

In [None]:
# Create the neural network
def conv_net(x_dict, n_classes, dropout, reuse, is_training):
    # Both calls to conv_net use the 'ConvNet' scope; with reuse=True the second
    # call shares the variables created by the first instead of making new ones
    with tf.variable_scope('ConvNet', reuse=reuse):
        # TF Estimator input is a dict, in case of multiple inputs
        x = x_dict['images']

        # MNIST data input is a 1-D vector of 784 features (28*28 pixels)
        # Reshape to match picture format [Height x Width x Channel]
        # The input tensor becomes 4-D: [Batch Size, Height, Width, Channel]
        x = tf.reshape(x, shape=[-1, 28, 28, 1], name='Reshape')

        # Convolution Layer with 32 filters and a kernel size of 5
        conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu, name='Conv_1')
        # Max Pooling (down-sampling) with strides of 2 and kernel size of 2
        conv1 = tf.layers.max_pooling2d(conv1, 2, 2, name='Pool_1')

        # Convolution Layer with 64 filters and a kernel size of 3
        conv2 = tf.layers.conv2d(conv1, 64, 3, activation=tf.nn.relu, name='Conv_2')
        # Max Pooling (down-sampling) with strides of 2 and kernel size of 2
        conv2 = tf.layers.max_pooling2d(conv2, 2, 2, name='Pool_2')

        # Flatten the data to a 1-D vector for the fully connected layer
        fc1 = tf.contrib.layers.flatten(conv2)

        # Fully connected layer with 1024 units (no activation is specified, so it is linear)
        fc1 = tf.layers.dense(fc1, 1024, name='fc_layer')
        # Apply Dropout (if is_training is False, dropout is not applied)
        fc1 = tf.layers.dropout(fc1, rate=dropout, training=is_training, name='fc1_dropout')

        # Output layer, class prediction
        out = tf.layers.dense(fc1, n_classes, name='out_layer')

    return out

### Define model function, loss and optimizer

Define the model function, following the `model_fn(features, labels, mode)` template expected by `tf.estimator.Estimator`.

NOTE: Dropout should only be active during training, so `model_fn` builds two computation graphs, one for training and one for testing, that share the same weights. The `is_training` parameter of `conv_net` controls whether dropout is applied.

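We also use:

`tf.nn.sparse_softmax_cross_entropy_with_logits` - computes, for each example, the cross-entropy between the softmax of the logits and the integer class label.

`tf.reduce_mean` - computes the mean of elements across dimensions of a tensor; here it averages the per-example losses over the batch.

`tf.train.AdamOptimizer` - the Adam optimizer, a variant of gradient descent that adapts the step size of each parameter using running averages of its gradients and of their squares.

`optimizer.minimize` - computes the gradients of `loss_op` with respect to the trainable variables, applies them, and increments the global step.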
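
To show what "adaptive" means here, below is a minimal NumPy sketch of the textbook Adam update, using the default hyperparameters of `tf.train.AdamOptimizer` (beta1=0.9, beta2=0.999, epsilon=1e-8). The function `adam_step` and the toy objective are hypothetical. TensorFlow folds the bias correction into the learning rate, so epsilon enters its formula slightly differently, but the idea is the same: each parameter's step is scaled by a running estimate of its gradient magnitude.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update; returns the new parameter and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad       # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction for the early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Minimize the toy objective f(w) = (w - 3)^2, starting from w = 0
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    grad = 2 * (w - 3)  # derivative of f
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)
print(w)  # approximately 3
```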

In [None]:
def model_fn(features, labels, mode):
    # Build the neural network
    # Because dropout behaves differently at training and prediction time, we
    # need to create 2 distinct computation graphs that still share the same weights.
    logits_train = conv_net(features, num_classes, dropout, reuse=False,
                            is_training=True)
    logits_test = conv_net(features, num_classes, dropout, reuse=True,
                           is_training=False)

    # Predictions
    pred_classes = tf.argmax(logits_test, axis=1, name='pred_classes')
    pred_probas = tf.nn.softmax(logits_test, name='pred_probas')

    # If prediction mode, early return
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions=pred_classes)

    # Define loss and optimizer
    loss_op = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits_train, labels=tf.cast(labels, dtype=tf.int32)), name='loss_op')
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    train_op = optimizer.minimize(loss_op,
                                  global_step=tf.train.get_global_step(), name='train_op')

    # Evaluate the accuracy of the model
    acc_op = tf.metrics.accuracy(labels=labels, predictions=pred_classes, name='acc_op')

    # Include additional scalar for training accuracy
    tf.summary.scalar('accuracy_train', acc_op[1])
    
    # TF Estimators require the model function to return an EstimatorSpec that
    # specifies the ops for training, evaluation and prediction
    estim_specs = tf.estimator.EstimatorSpec(
        mode=mode,
        predictions=pred_classes,
        loss=loss_op,
        train_op=train_op,
        eval_metric_ops={'accuracy_eval': acc_op})

    return estim_specs
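
The loss and prediction ops above can be mirrored in NumPy: `loss_op` applies a softmax to each row of logits, takes the negative log-probability of the correct integer label, and `tf.reduce_mean` averages these values over the batch, while `tf.argmax` picks the highest-scoring class. This is a sketch, not TensorFlow's implementation; the function name `sparse_softmax_cross_entropy` and the toy logits are hypothetical.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy between softmax(logits) and integer class labels."""
    # Subtract the row maximum before exponentiating for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for each example
    return -log_probs[np.arange(len(labels)), labels]

logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.3, 3.0]])
labels = np.array([0, 2])
losses = sparse_softmax_cross_entropy(logits, labels)
loss = losses.mean()                  # what tf.reduce_mean does with the batch losses
pred_classes = logits.argmax(axis=1)  # like tf.argmax(logits_test, axis=1)
print(pred_classes)  # [0 2]
```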

Next we construct the Estimator, passing it the model function and the output directory where checkpoints and TensorBoard summaries are written.

In [None]:
tf.logging.set_verbosity(tf.logging.INFO)
# Timestamped run directory, without spaces or colons in the name
output_dir = os.path.join(os.getcwd(), "cnn-tb-" + datetime.now().strftime("%Y%m%d-%H%M%S"))

model = tf.estimator.Estimator(model_fn, model_dir=output_dir)

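Define the input function for training. `tf.estimator.inputs.numpy_input_fn` returns a function that feeds batches from a dictionary of NumPy arrays, together with the matching labels, into the model. With `num_epochs=None` it cycles through the training data indefinitely, so the length of training is set by the number of steps, and `shuffle=True` randomizes the order of the examples.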

In [None]:
input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'images': mnist.train.images}, y=mnist.train.labels,
    batch_size=batch_size, num_epochs=None, shuffle=True)
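
The generator below is a hypothetical pure-NumPy sketch of the batching behaviour configured above (`batch_size`, `num_epochs=None`, `shuffle=True`): it reshuffles the data on every pass and keeps yielding `({'images': x}, y)` batches forever. It illustrates the idea only and is not how `numpy_input_fn` is implemented internally; the name `batch_generator` and the fake dataset are assumptions.

```python
import numpy as np

def batch_generator(features, labels, batch_size, rng):
    """Yield ({'images': x}, y) batches forever, reshuffling on every pass."""
    n = len(labels)
    while True:                     # num_epochs=None: never stop
        order = rng.permutation(n)  # shuffle=True
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield {'images': features[idx]}, labels[idx]

# Tiny fake dataset standing in for mnist.train (10 images of 784 pixels)
rng = np.random.default_rng(0)
images = rng.random((10, 784)).astype(np.float32)
digits = np.arange(10)
batches = batch_generator(images, digits, batch_size=4, rng=rng)

for _ in range(3):  # one full pass: batches of 4, 4 and 2 images
    x, y = next(batches)
    print(x['images'].shape, y)
```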


### Train the model

In [None]:
model.train(input_fn, steps=training_epochs)
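
`model.train` stops after `steps` batches, so `training_epochs = 2000` trains on 2000 batches of 128 images rather than for 2000 passes over the data. A quick calculation, assuming `mnist.train` holds 55,000 images (the default split from `read_data_sets`, which holds out 5,000 images for validation), shows how many epochs that actually is:

```python
steps = 2000
images_per_batch = 128
train_size = 55000  # assumed size of mnist.train with the default split

images_seen = steps * images_per_batch
epochs = images_seen / train_size
print(images_seen, round(epochs, 2))  # 256000 4.65
```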

### Evaluate the Model

In [None]:
# Define the input function for evaluating
input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'images': mnist.test.images}, y=mnist.test.labels,
    batch_size=batch_size, shuffle=False)
# Use the Estimator 'evaluate' method
eval_results = model.evaluate(input_fn)  # avoid shadowing the built-in eval()

print("Testing Accuracy:", eval_results['accuracy_eval'])

%matplotlib inline
from sklearn.metrics import confusion_matrix
from matplotlib import pyplot as plt

# Ground-truth labels and predicted classes for the test set (same order, since shuffle=False)
labels = list(mnist.test.labels)
predictions = list(model.predict(input_fn))

# Build confusion matrix from ground truth labels and model predictions
conf_mat = confusion_matrix(y_true=labels, y_pred=predictions)
# Plot matrix
plt.matshow(conf_mat)
plt.colorbar()
plt.ylabel('Real Class')
plt.xlabel('Predicted Class')
plt.show()
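
In `sklearn.metrics.confusion_matrix`, entry `[i, j]` counts the examples whose true class is `i` and whose predicted class is `j`, so correct predictions lie on the diagonal and bright off-diagonal cells in the plot show which digits are confused with which. The small sketch below uses made-up labels for three classes (not results from this model) and derives per-class and overall accuracy from the matrix.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up ground truth and predictions for three classes, for illustration only
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 2, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[1 1 0]
#  [0 2 1]
#  [1 0 3]]

# Fraction of each true class that was predicted correctly (per-class recall)
per_class_acc = cm.diagonal() / cm.sum(axis=1)
# Overall accuracy: correct predictions (the diagonal) over all predictions
overall_acc = cm.trace() / cm.sum()
```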

### Set up TensorBoard using an ngrok tunnel

In [None]:
import time
import subprocess
import os
import signal

def get_process_pid(pstring):
    # Return the PID of the last running process whose command line matches pstring, or None
    pid = None
    for line in os.popen("ps ax | grep " + pstring + " | grep -v grep | grep -v defunct"):
        fields = line.split()
        pid = fields[0]
    return pid

LOG_DIR = os.getcwd()
NG_DIR = LOG_DIR
# Uncomment if running locally
#NG_DIR = os.path.dirname(LOG_DIR)
NG_ZIP = os.path.join(NG_DIR, 'ngrok-stable-linux-amd64.zip')
NG_BIN = os.path.join(NG_DIR, 'ngrok')

# Download ngrok binary
if not os.path.isfile(NG_ZIP):
    !wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip \
        -P {NG_DIR}
if not os.path.isfile(NG_BIN):        
    !unzip -o {NG_DIR}/ngrok-stable-linux-amd64.zip -d {NG_DIR}

# If TensorBoard is already running, kill it and restart it with the correct logdir
tb_pid = get_process_pid('tensorboard')
if tb_pid:
    print("Killing old tensorboard")
    os.kill(int(tb_pid), signal.SIGKILL)
get_ipython().system_raw(
    'tensorboard --logdir {} --host 0.0.0.0 --port 6006 &'
    .format(LOG_DIR)
)
tb_pid = get_process_pid('tensorboard')
print ("Started tensorboard with pid %s" % tb_pid)

# If ngrok is already running, do nothing
ng_pid = get_process_pid('ngrok')
if not ng_pid:
    proc = subprocess.Popen([NG_BIN, 'http', '6006'])
    print("Started ngrok with pid %s" % proc.pid)
    time.sleep(5)
else:
    print("ngrok already running")
ng_pid = get_process_pid('ngrok')

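# Get the public ngrok link from ngrok's local API.
# Shell (!) commands do not raise Python exceptions, so query the API from Python
# to be able to detect failures and retry.
import json
from urllib.request import urlopen

def get_ngrok_url(retries=3, delay=5):
    for _ in range(retries):
        try:
            with urlopen('http://localhost:4040/api/tunnels') as resp:
                return json.load(resp)['tunnels'][0]['public_url']
        except (OSError, KeyError, IndexError, ValueError):
            print("Error getting ngrok link. Retrying...")
            time.sleep(delay)
    return None

print(get_ngrok_url())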

In [None]:
# Cleanup
#procs = [tb_pid, ng_pid]
#[os.kill(int(x), signal.SIGKILL) for x in procs if x is not None]
#!rm -rf cnn-tb-*

### Experiment
Now try experimenting with the model. What effects do you see when changing the model parameters?
 - learning_rate
 - training_epochs
 - batch_size
 - dropout

Try adding an extra convolutional layer to the model.

## End of CNN Notebook