# Practicing TensorFlow and TensorBoard with multilayer perceptron

### What is this for?
This is an example of a simple multilayer perceptron coded in tensorflow

### What data am I using?

I am using the mnist dataset.<br>
This dataset is made up of 1797 8x8 images. Each image, like the one shown below, is of a hand-written digit.In order to utilize <br> an 8x8 figure like this, we’d have to first transform it into a feature vector with length 64.
<br>
<br>
<img src="http://theanets.readthedocs.io/en/stable/_images/mnist-digits-small.png" alt="Multilayer Perceptron"/>
<br>
I am loading this dataset from __[sklearn.datasets](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets)__. <br>
This dataset contains 10 classes and approximately 180 samples per class

### What architecture will I use?

I am going to use a multilayer perceptron with two hidden layer, to mantain non-linearity and to explore the benefits of differents hyperparameters

<img src="https://elogeel.files.wordpress.com/2010/05/050510_1627_multilayerp1.png" alt="Multilayer Perceptron"/>


In [1]:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
import tensorflow as tf
import numpy as np
import pandas as pd

  from ._conv import register_converters as _register_converters


In [2]:
digits = load_digits()

###### The dataset label have only one dimension but ten values on it.

<br> We have to only preserve one class per column to make this architecture work, because we need <br>one neuron to output the given probability of a given class.

In [3]:
X_data = digits.data
y_data = digits.target
print(y_data.shape)
print(np.unique(y_data))

(1797,)
[0 1 2 3 4 5 6 7 8 9]


For that matter I am going to use pandas get_dummies function

In [4]:
y_data = pd.get_dummies(y_data).as_matrix()
print(y_data.shape)

(1797, 10)


Then we split the data in separate train and test slices because we have to test the real precision of our model.

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X_data, y_data, stratify= y_data,  test_size=0.28)

In this cell we are going to set the paramaters for our neural network and tensorboard, that is why we are going to log our results, in this case we are going to explore differents learning rates and optimizers to find what is best for our perceptron

In [6]:
#Parameters

n_inputs = X_data.shape[1]
n_outputs = y_data.shape[1]
n_hidden_1 = 90
n_hidden_2 = 30

#log path for every run

logs_path_1_run = "/tmp/mnist/1"
logs_path_2_run = "/tmp/mnist/2"


# command to open tensorboard

# tensorboard --logdir=run1:/tmp/mnist/1 --port=6006
# for multiples inputs tensorboard --logdir=run1:/tmp/mnist/1,run2:/tmp/mnist/2 --port=6006

x = tf.placeholder(tf.float32, [None, n_inputs])
y = tf.placeholder(tf.float32, [None, n_outputs])

We define our architecture based, initializing our neural network with random uniform distribution in our weigths matrices

In [7]:
# input layer
weights_1 = tf.Variable(tf.random_uniform([n_inputs, n_hidden_1], -1.0, 1.0))
# hidden layer
weights_2 = tf.Variable(tf.random_uniform([n_hidden_1, n_hidden_2], -1.0, 1.0))
#output layer
weights_3 = tf.Variable(tf.random_uniform([n_hidden_2, n_outputs], -1.0, 1.0))

#bias input layer
bias_1 = tf.Variable(tf.zeros([n_hidden_1]), name="bias2")

#bias for hidden
bias_2 = tf.Variable(tf.zeros([n_hidden_2]), name="bias1")

#bias output layer
bias_3 = tf.Variable(tf.zeros([n_outputs]), name="output")

#input layer sigmoid
il = tf.sigmoid(tf.matmul(x,weights_1) + bias_1)
hl = tf.sigmoid(tf.matmul(il, weights_2) + bias_2)
ol = tf.sigmoid(tf.matmul(hl, weights_3) + bias_3)

We try differents optimizers and learning rates here.

In [8]:
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels = y, logits = ol))
adam = tf.train.AdamOptimizer(learning_rate=.5e-3).minimize(loss)

gdo = tf.train.GradientDescentOptimizer(learning_rate=.7e-1).minimize(loss)

#predict the batch 
correct_prediction = tf.equal(tf.argmax(ol,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# log the loss
tf.summary.scalar("cost", loss)

# log the accuracy
tf.summary.scalar("accuracy", accuracy)

# merge all the logs
summary_op = tf.summary.merge_all()

init = tf.global_variables_initializer()

Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.



In [9]:
parameters = [
    {
        'logs': logs_path_1_run,
         'opt': adam
    },
    {
        'logs': logs_path_2_run, 
         'opt': gdo
    }
]

In [10]:
def get_batch(X, Y, batch_size, idx):
    return X[batch_size*idx:batch_size*(idx+1)], Y[batch_size*idx:batch_size*(idx+1)]

In [11]:
epochs = 1000
batch_size= 599
total_numbers_of_batches = int(X_data.shape[0]/batch_size)

In [12]:
for prmtr in range(len(parameters)):
    with tf.Session() as sess:
        sess.run(init)
        writer = tf.summary.FileWriter(parameters[prmtr]["logs"], graph=tf.get_default_graph())
        for e in range(epochs):
            for i in range(total_numbers_of_batches):
                batch_x, batch_y = get_batch(X_train, y_train, batch_size, i)
                _, c, summary = sess.run([parameters[prmtr]["opt"], loss, summary_op], feed_dict = {x: batch_x, y: batch_y})
                writer.add_summary(summary, e * batch_size + i)
        # Calculate accuracy
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        print("Accuracy:", accuracy.eval({x: X_test, y: y_test}))
        sess.close()

Accuracy: 0.9563492
Accuracy: 0.9047619


#### Results shown by Tensorboard
<br>
The Accuracy shown by Adam trained a lot faster than sgd (our second run) and converge more smoothly


<img src="img_1.png" alt="acc" style="width: 800px;"/>

<br> 

And our loss curve in adam descends in a good way here too without stumbling too much based on the momentum (read the link below)

<img src="img_2.png" alt="acc" style="width: 800px;"/>

### Conclusions

> So our result tested than no optimizer overfitted and adam converged way more faster than sgd, this is based on how adam works. If you have time you can checkout the __[original paper](https://arxiv.org/abs/1412.6980)__. <br>

> TensorBoard It gives many advantages like watching the loss function change in training time, also give the posibility of check the architecture (this is our multilayer perceptron with adam).

<img src="img_3.png" alt="acc" style="width: 800px;"/>


> TensorBoard also give us the posibility to dive into more detail explanation of neural networks behavior which generally serve to tweak hyperparameters more consciously.