# Minimalist TensorFlow implementation of a fully connected neural network
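Some takeaways:
* Give each variable a name, and likewise group related nodes under a `name_scope`, so they are labelled in the graph
* Initialize weights from a truncated normal distribution
* Pass the activation function as a callable, e.g. `tf.nn.relu`
* Make the activation optional with `if activation is not None`
* The output layer is a fully connected layer with `OUTPUT_SIZE` neurons (one per class) and no activation; it produces the logits
* Inputs are fed in batches, so the input placeholder has the flexible shape `(None, INPUT_SIZE)`, where `None` is the batch size
* What does the session run? The training op, which takes one optimisation step to minimize the loss
* The labels can be
    * one-hot encoded: `tf.placeholder(tf.float32, shape=(None, OUTPUT_SIZE), name="y")`, used with `tf.nn.softmax_cross_entropy_with_logits`
    * a class index per example: `tf.placeholder(tf.int64, shape=(None,), name="y")`, used with `tf.nn.sparse_softmax_cross_entropy_with_logits` (as in this notebook)
* More layers with fewer weights can reach similar performance, but they tend to overfit more (training error ~0 while test error stays large)
* How do weight values carry over from one iteration to the next? A `tf.Variable` keeps its value inside the session, so each `sess.run(training_op)` updates it in place and the next run starts from the updated value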

```python
import tensorflow as tf
import numpy as np
from datetime import datetime
```



## Load data

```python
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/")
```

```
Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
```


* `mnist.train` is a `DataSet` object; its `next_batch(batch_size)` method returns the next mini-batch of images and labels
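As a rough picture of what `next_batch` does, here is a simplified stand-in written in NumPy. It is a hypothetical sketch, not the tutorial's actual `DataSet` code: the class name `SimpleDataSet` is made up, the data are assumed to fit in memory, and examples left over at the end of a pass are simply skipped before reshuffling.

```python
import numpy as np

class SimpleDataSet:
    """Hypothetical minimal stand-in for mnist.train: serves shuffled mini-batches."""

    def __init__(self, images, labels, seed=0):
        self.images = images
        self.labels = labels
        self.num_examples = len(images)
        self._rng = np.random.default_rng(seed)
        self._order = self._rng.permutation(self.num_examples)
        self._pos = 0

    def next_batch(self, batch_size):
        # Reshuffle when the remaining examples cannot fill a whole batch
        if self._pos + batch_size > self.num_examples:
            self._order = self._rng.permutation(self.num_examples)
            self._pos = 0
        idx = self._order[self._pos:self._pos + batch_size]
        self._pos += batch_size
        return self.images[idx], self.labels[idx]

# Fake data: 10 "images" of 784 pixels, row i starts with i*784 and has label i
images = np.arange(10 * 784, dtype=np.float32).reshape(10, 784)
labels = np.arange(10) % 10
ds = SimpleDataSet(images, labels)
X_batch, y_batch = ds.next_batch(4)
print(X_batch.shape, y_batch.shape)  # (4, 784) (4,)
```

Images and labels are indexed with the same shuffled positions, so each image stays paired with its label.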

## Create a simple fully connected network (define the computation flow)
* Create placeholders for the input `X` and the labels `y`
* Two hidden layers with ReLU activation
* One output layer producing the logits (the softmax is applied inside the loss)

```python
IMAGE_WIDTH = 28
INPUT_SIZE = IMAGE_WIDTH*IMAGE_WIDTH  # 784 pixels per flattened image
OUTPUT_SIZE = 10  # 10 digits
n_hidden_1 = 100  # Hyperparameter: try other sizes
n_hidden_2 = 50
X = tf.placeholder(tf.float32, shape=(None, INPUT_SIZE), name="X")  # "name" is shown in the graph
y = tf.placeholder(tf.int64, shape=(None,), name="y")  # One class index per example
```
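For intuition about the shapes fed to `X` and `y`, the NumPy sketch below builds fake batches of two different sizes; both fit the placeholder shape `(None, INPUT_SIZE)` because `None` leaves the batch dimension open. The helper `to_one_hot` is hypothetical and shows how the integer labels used here relate to the one-hot alternative.

```python
import numpy as np

IMAGE_WIDTH = 28
INPUT_SIZE = IMAGE_WIDTH * IMAGE_WIDTH
OUTPUT_SIZE = 10

def to_one_hot(labels, num_classes=OUTPUT_SIZE):
    """Hypothetical helper: integer class indices -> one-hot rows (float32)."""
    return np.eye(num_classes, dtype=np.float32)[labels]

# Two batches of different sizes both match the shape (None, INPUT_SIZE)
for batch_size in (5, 50):
    X_batch = np.zeros((batch_size, INPUT_SIZE), dtype=np.float32)
    y_batch = np.arange(batch_size) % OUTPUT_SIZE  # shape (batch_size,), integer labels
    print(X_batch.shape, y_batch.shape, to_one_hot(y_batch).shape)
# (5, 784) (5,) (5, 10)
# (50, 784) (50,) (50, 10)
```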

#### Define a ReLU activation function
* For plotting only, if needed. It cannot serve as a TensorFlow activation: it works on NumPy arrays rather than tensors, so TensorFlow can neither add it to the graph nor compute its gradient. Use `tf.nn.relu` instead

```python
def relu(z):
    """Works for scalars and NumPy arrays, but not for TensorFlow tensors"""
    return np.maximum(z, 0)

assert relu(10) == 10
assert relu(-10) == 0
assert np.array_equal(relu(np.array([10, -10])), np.array([10, 0]))
```
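What the NumPy version lacks is a gradient TensorFlow can use during backpropagation. As a NumPy sketch for intuition (the name `relu_grad` is hypothetical), the ReLU derivative is 1 for positive inputs and 0 for negative ones; the value at exactly 0 is a convention, assumed to be 0 here.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def relu_grad(z):
    """Derivative of ReLU: 1 where z > 0, else 0 (0 chosen at z == 0)."""
    return (np.asarray(z) > 0).astype(float)

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))       # [0. 0. 3.]
print(relu_grad(z))  # [0. 0. 1.]
```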

#### Create a low-level neural network layer

```python
def neural_layer(X, num_neurons, name, activation=None):
    """Input:
        - X: input tensor to the layer, shape (batch_size, n_inputs)
        - num_neurons: size of the layer (number of neurons)
        - name: name of the layer
        - activation: optional activation function applied before outputting
    Return:
        - output tensor, shape (batch_size, num_neurons)
    """
    with tf.name_scope(name):  # Name scope to group related nodes
        # Size of each input; X holds a batch, e.g. 5x784 (5 images at a time)
        n_inputs = int(X.get_shape()[1])

        # Weight matrix, initialized from a truncated normal distribution with std dev = 2/sqrt(n_inputs)
        stddev = 2/np.sqrt(n_inputs)
        init = tf.truncated_normal((n_inputs, num_neurons), stddev=stddev)
        W = tf.Variable(init, name="kernel")

        # Bias, one per neuron
        b = tf.Variable(tf.zeros([num_neurons]), name="bias")

        # Weighted sum of the inputs
        Z = tf.matmul(X, W) + b

        # Activation is optional; if present, it can be e.g. ReLU or sigmoid
        if activation is not None:
            return activation(Z)
        else:
            return Z
```
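The same forward computation can be mirrored in NumPy to check shapes. This is a sketch under assumptions: `neural_layer_np` and `truncated_normal` are hypothetical names, the truncation re-draws values beyond 2 standard deviations (the behaviour documented for `tf.truncated_normal`), and the weights are drawn fresh on each call. In the TensorFlow version, `W` and `b` are variables created once when the graph is built and then updated by training.

```python
import numpy as np

def truncated_normal(shape, stddev, rng):
    """Draw N(0, stddev^2) samples, re-drawing any with magnitude beyond 2 stddev."""
    values = rng.normal(0.0, stddev, size=shape)
    out_of_range = np.abs(values) > 2 * stddev
    while out_of_range.any():
        values[out_of_range] = rng.normal(0.0, stddev, size=out_of_range.sum())
        out_of_range = np.abs(values) > 2 * stddev
    return values

def neural_layer_np(X, num_neurons, activation=None, rng=None):
    """Hypothetical NumPy mirror of neural_layer: Z = X @ W + b, then optional activation."""
    rng = np.random.default_rng(0) if rng is None else rng
    n_inputs = X.shape[1]
    stddev = 2 / np.sqrt(n_inputs)
    W = truncated_normal((n_inputs, num_neurons), stddev, rng)  # (n_inputs, num_neurons)
    b = np.zeros(num_neurons)                                   # one bias per neuron
    Z = X @ W + b                                               # (batch, num_neurons)
    return activation(Z) if activation is not None else Z

rng = np.random.default_rng(42)
X = rng.random((5, 784))  # batch of 5 flattened 28x28 images
h1 = neural_layer_np(X, 100, activation=lambda z: np.maximum(z, 0), rng=rng)
logits = neural_layer_np(h1, 10, rng=rng)
print(h1.shape, logits.shape)  # (5, 100) (5, 10)
```

The hidden layer maps a `(5, 784)` batch to `(5, 100)`, and the output layer maps that to `(5, 10)` logits, one row per image.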

#### Higher-level alternative (wrapper) for a single layer
`tf.layers.dense(X, n_hidden_1, name="hidden_1", activation=tf.nn.relu)` creates the kernel and bias variables itself.

#### Stack the layers into a first multi-layer neural network

```python
with tf.name_scope("dnn"):  # Deep neural network
    hidden_1 = neural_layer(X, n_hidden_1, name="hidden_1", activation=tf.nn.relu)
    hidden_2 = neural_layer(hidden_1, n_hidden_2, name="hidden_2", activation=tf.nn.relu)
    logits = neural_layer(hidden_2, OUTPUT_SIZE, name="outputs")

with tf.name_scope("loss"):  # Define the loss function for optimisation later
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")
```
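For intuition, the loss can be written out in NumPy: per example, sparse softmax cross-entropy is `-log(softmax(logits)[label])`, and the one-hot formulation gives the same numbers. A sketch with hypothetical function names, using the usual max-subtraction for numerical stability:

```python
import numpy as np

def log_softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def sparse_softmax_xent(logits, labels):
    """Per-example cross-entropy from raw logits and integer labels."""
    return -log_softmax(logits)[np.arange(len(labels)), labels]

def softmax_xent(logits, one_hot):
    """Same loss with one-hot labels."""
    return -(one_hot * log_softmax(logits)).sum(axis=1)

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([0, 1])
xentropy = sparse_softmax_xent(logits, labels)
loss = xentropy.mean()  # plays the role of tf.reduce_mean
print(np.allclose(xentropy, softmax_xent(logits, np.eye(3)[labels])))  # True
```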

#### Optimiser to minimize the loss function

```python
learning_rate = 0.01
with tf.name_scope("train"):  # Objective of training is to minimize the loss function
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
    training_op = optimizer.minimize(loss)  # Training operation: one gradient step that reduces the loss
```
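Each run of `training_op` applies one gradient descent update, `W <- W - learning_rate * dloss/dW`, to every trainable variable. The NumPy sketch below does this by hand for a single softmax layer on random data; it is illustrative only (hypothetical helper names, a toy dataset, and `learning_rate = 0.1` rather than the notebook's 0.01 so the drop in loss shows up quickly). It uses the standard fact that the gradient of mean cross-entropy with respect to the logits is `(softmax - one_hot) / n`.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grads(X, y, W, b):
    """Mean cross-entropy of a single softmax layer, with gradients for W and b."""
    n = len(y)
    probs = softmax(X @ W + b)
    loss = -np.log(probs[np.arange(n), y]).mean()
    dZ = probs.copy()
    dZ[np.arange(n), y] -= 1.0  # d loss / d logits = (softmax - one_hot) / n
    dZ /= n
    return loss, X.T @ dZ, dZ.sum(axis=0)

rng = np.random.default_rng(0)
X = rng.random((20, 4))
y = rng.integers(0, 3, size=20)
W = np.zeros((4, 3))
b = np.zeros(3)
learning_rate = 0.1

losses = []
for step in range(100):
    loss, dW, db = loss_and_grads(X, y, W, b)
    losses.append(loss)
    W -= learning_rate * dW  # plain gradient descent update
    b -= learning_rate * db
print(losses[0], losses[-1])
```

The loss starts at `log(3)` (uniform predictions over 3 classes) and decreases as the updates accumulate; TensorFlow obtains the same kind of gradients automatically through backpropagation.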

#### Evaluate

```python
with tf.name_scope("evaluate"):
    correct = tf.nn.in_top_k(logits, y, 1)  # True where the top-1 prediction matches the label y
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
```
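With `k = 1`, `tf.nn.in_top_k` checks whether the label's logit is the largest in its row, which matches comparing `argmax` with the label apart from how ties are handled. A NumPy sketch (hypothetical `in_top_1` helper) of the resulting accuracy:

```python
import numpy as np

def in_top_1(logits, labels):
    """True where the label's logit is the largest in its row (k = 1)."""
    return np.argmax(logits, axis=1) == labels

logits = np.array([[0.1, 2.0, 0.3],
                   [1.5, 0.2, 0.1],
                   [0.0, 0.4, 0.9],
                   [0.3, 0.2, 0.8]])
labels = np.array([1, 0, 2, 0])
correct = in_top_1(logits, labels)
accuracy = correct.astype(np.float32).mean()  # 3 of 4 correct
print(correct, accuracy)
```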

## Execution phase (put the computation flow into action)

```python
# Set some administrative parameters
init_variables = tf.global_variables_initializer()
saver = tf.train.Saver()
n_epochs = 40
batch_size = 50
num_samples = mnist.train.num_examples
n_iterations = num_samples//batch_size  # Mini-batches per epoch
now = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
root_logdir = "tf_logs"
logdir = "{}/run-{}/".format(root_logdir, now)
loss_summary = tf.summary.scalar('loss', loss)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

with tf.Session() as sess:
    # First initialize all variables
    init_variables.run()
    for epoch in range(n_epochs):
        for iteration in range(n_iterations):
            # Get the next mini-batch and take one training step
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        # Train accuracy is measured on the last mini-batch only
        acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
        acc_test = accuracy.eval(feed_dict={X: mnist.test.images, y: mnist.test.labels})
        if epoch % 5 == 0 or epoch == n_epochs-1:
            summary_str = loss_summary.eval(feed_dict={X: X_batch, y: y_batch})  # Record training loss
            step = epoch * n_iterations + iteration  # Global step: training steps taken so far
            file_writer.add_summary(summary_str, step)
            print("Epoch: ", epoch, "\n\tTrain accuracy:", acc_train, "\n\tTest accuracy:", acc_test)
    save_path = saver.save(sess, "./my_model_final.ckpt")

file_writer.close()
```

```
Epoch:  0
	Train accuracy: 0.92
	Test accuracy: 0.9078
Epoch:  5
	Train accuracy: 0.94
	Test accuracy: 0.9482
Epoch:  10
	Train accuracy: 0.98
	Test accuracy: 0.9595
Epoch:  15
	Train accuracy: 0.98
	Test accuracy: 0.9656
Epoch:  20
	Train accuracy: 0.98
	Test accuracy: 0.9702
Epoch:  25
	Train accuracy: 0.98
	Test accuracy: 0.971
Epoch:  30
	Train accuracy: 0.98
	Test accuracy: 0.9729
Epoch:  35
	Train accuracy: 1.0
	Test accuracy: 0.9727
Epoch:  39
	Train accuracy: 1.0
	Test accuracy: 0.9743
```


## Visualisation 
TensorBoard provides the following capabilities:
* Visualise the computation graph
* Train vs test error as a function of epoch

**In code** it takes 3 steps:
* Create the log writer
```python
now = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
root_logdir = "tf_logs"
logdir = "{}/run-{}/".format(root_logdir, now)
loss_summary = tf.summary.scalar('loss', loss)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())
```
* Log the relevant information at certain time steps
```python
if epoch % 5 == 0 or epoch == n_epochs-1:
    summary_str = loss_summary.eval(feed_dict={X: X_batch, y: y_batch})  # Record training loss
    step = epoch * n_iterations + iteration  # Global step: training steps taken so far
    file_writer.add_summary(summary_str, step)
```
* Close the writer with `file_writer.close()`
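The `step` passed to `add_summary` sets the x-axis position in TensorBoard, so it should count the training steps taken so far, `epoch * n_iterations + iteration`. By contrast, `epoch * batch_size + iteration` maps different pairs to the same value, e.g. `(0, 50)` and `(1, 0)` both give 50 when `batch_size = 50`. A plain-Python sketch, assuming 55,000 training examples (the usual size of the MNIST training split after the validation set is held out) and a hypothetical `global_step` helper:

```python
n_epochs = 40
batch_size = 50
num_samples = 55000  # assumption: size of the MNIST training split
n_iterations = num_samples // batch_size  # mini-batches per epoch

def global_step(epoch, iteration):
    """Number of training steps taken so far (0-based)."""
    return epoch * n_iterations + iteration

# Steps logged at the end of every 5th epoch and of the last epoch
logged = [global_step(e, n_iterations - 1)
          for e in range(n_epochs) if e % 5 == 0 or e == n_epochs - 1]
print(n_iterations, logged[:3])  # 1100 [1099, 6599, 12099]
```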

**On the command line**, run `tensorboard --logdir [path_to_log_dir]` (here the log directory is `tf_logs`)

Then open http://localhost:6006/ in a browser to see the result.

**Computation flow**
![](img/architecture.png)

**Train loss vs epochs**
![Train vs epochs](img/train_vs_epochs.png)