# Basic Feed Forward Network in TensorFlow

In [None]:
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Feed Forward Network on MNIST

Here we build a simple fully connected network for MNIST. The network has 784 input neurons (one per pixel of a flattened 28x28 MNIST image), 2 hidden layers with 256 neurons each, and 10 output neurons (one for each digit).

TensorFlow provides a convenient interface for MNIST data, which makes it easy to test your code on a commonly used dataset. The code below reads the MNIST images and stores the labels as one-hot vectors.

In [2]:
from tensorflow.examples.tutorials.mnist import input_data
MNIST = input_data.read_data_sets("../data/mnist", one_hot = True)

Extracting ../data/mnist/train-images-idx3-ubyte.gz
Extracting ../data/mnist/train-labels-idx1-ubyte.gz
Extracting ../data/mnist/t10k-images-idx3-ubyte.gz
Extracting ../data/mnist/t10k-labels-idx1-ubyte.gz


Create placeholders for X and Y.
* Each MNIST image is 28x28, but the data is already flattened into a 784-dimensional vector when we feed it into the model.
* Each label is 10-dimensional, with one vector element for every possible digit.
* Define the placeholder shapes so that a variable number of images and labels can be fed in each batch. *Dimension 0 is the batch dimension; use `None` for it instead of a fixed size.*
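To make the shapes above concrete, here is a small NumPy sketch (not the TensorFlow loader itself) of what flattening and one-hot encoding produce. The arrays `images` and `digits` are hypothetical stand-in data, not real MNIST samples.

```python
import numpy as np

# Hypothetical stand-in data: 3 grayscale 28x28 images and their digit labels.
images = np.zeros((3, 28, 28), dtype=np.float32)
digits = np.array([7, 0, 3])

# Flatten each image into one row; -1 lets NumPy infer 28 * 28 = 784.
X_batch = images.reshape(len(images), -1)

# One-hot encode: row i has a 1 in column digits[i] and 0 everywhere else.
Y_batch = np.eye(10, dtype=np.float32)[digits]

print(X_batch.shape)  # (3, 784)
print(Y_batch.shape)  # (3, 10)
```

The first dimension (3 here) is the batch size, which is the dimension that should stay unspecified in the placeholders.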

In [3]:
with tf.name_scope('input'):
    X = tf.placeholder(...)
    Y = tf.placeholder(...)

Create a weights variable and a biases variable of the appropriate shapes.
* Initialize the weights variable from a truncated normal distribution using `tf.truncated_normal(...)`. This is better than setting the weights to zero because it breaks the symmetry between units, so they do not all receive identical updates during backpropagation. [Here's a more in-depth discussion](https://datascience.stackexchange.com/a/10930)
* Initialize the bias variable to a small constant, such as 0.1. Do this by passing the value and the appropriate shape to `tf.constant(...)`
* When you multiply the feature matrix X by the weights variable, the result must have a shape compatible with the bias tensor so the two can be added
* Make sure to use `tf.matmul()` when multiplying matrices. Using `*` multiplies element-wise
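The shape rules in the list above can be checked with a short NumPy sketch, where `@` plays the role of `tf.matmul`. The batch size of 5 and the weight scale of 0.1 are arbitrary values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(5, 784))                # batch of 5 flattened images
W = rng.normal(scale=0.1, size=(784, 256))   # weights: inputs x outputs
b = np.full((256,), 0.1)                     # one bias per output unit

# Matrix product: (5, 784) @ (784, 256) -> (5, 256)
Z = X @ W

# b has shape (256,), so it is broadcast across all 5 rows when added.
out = Z + b

# Element-wise multiplication needs broadcast-compatible shapes,
# so X * W raises an error for these two matrices.
try:
    X * W
    elementwise_ok = True
except ValueError:
    elementwise_ok = False

print(out.shape, elementwise_ok)
```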

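Declare each layer in the network and the final logits by:
* Creating variables for weights and biases of the appropriate sizes
* Applying ReLU to $X \cdot W + b$ for the hidden layers. The logits are usually left as the raw linear output, because the softmax is applied later inside the loss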


Network Configurations:
* First layer has 784 input features and 256 output features
* Second layer has 256 input features and 256 output features
* Third layer has 256 input features and 10 output features
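As a rough illustration of this configuration, the following NumPy sketch runs a forward pass through layers of size 784, 256, 256, and 10. It is not the TensorFlow solution: `truncated_normal` is a hypothetical helper that mimics the redraw-beyond-two-standard-deviations behaviour of `tf.truncated_normal`, the `stddev` of 0.1 is an assumed value, and the output layer is assumed to be linear.

```python
import numpy as np

rng = np.random.default_rng(0)

def truncated_normal(shape, stddev=0.1):
    """Draw from N(0, stddev^2), redrawing values beyond 2 standard deviations."""
    values = rng.normal(scale=stddev, size=shape)
    out_of_range = np.abs(values) > 2 * stddev
    while out_of_range.any():
        values[out_of_range] = rng.normal(scale=stddev, size=out_of_range.sum())
        out_of_range = np.abs(values) > 2 * stddev
    return values

def relu(z):
    return np.maximum(z, 0.0)

sizes = [784, 256, 256, 10]
weights = [truncated_normal((n_in, n_out))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.full(n_out, 0.1) for n_out in sizes[1:]]

def forward(x):
    """Apply ReLU on the two hidden layers; return raw logits from the last layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)
    return h @ weights[-1] + biases[-1]

logits = forward(rng.random((4, 784)))
print(logits.shape)  # (4, 10)
```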



In [4]:
with tf.name_scope('network'):
    W1 = tf.Variable(tf.truncated_normal(...))
    b1 = tf.Variable(tf.constant(...))
    layer1 = tf.nn.relu(tf.matmul(...) + ...)

    W2 = tf.Variable(...)
    b2 = tf.Variable(...)
    layer2 = ...

    W_out = ...
    b_out = ...
    logits = ...

Compute the cross-entropy using `tf.nn.softmax_cross_entropy_with_logits`. This applies the softmax function to the logits before calculating the cross-entropy, so pass in the raw logits rather than softmax outputs. The loss is the mean of the cross-entropy over the batch.
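For intuition, the same computation can be sketched in NumPy. This is an illustrative reimplementation under the usual definition of softmax cross-entropy, not TensorFlow's actual code; `softmax_cross_entropy` and the sample values are made up for the example.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy between one-hot labels and softmax(logits)."""
    # Subtracting the row-wise max leaves the softmax unchanged but avoids overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

entropy = softmax_cross_entropy(logits, labels)  # one value per example
loss = entropy.mean()                            # scalar loss for the batch

# The second example has uniform logits, so its cross-entropy is log(3).
print(entropy, loss)
```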

In [5]:
with tf.name_scope('cross_entropy_loss'):
    entropy = tf.nn.softmax_cross_entropy_with_logits(logits=..., labels=...)
    loss = tf.reduce_mean(...)

Declare the optimizer as the `GradientDescentOptimizer` with an appropriate learning rate. Set it to minimize the loss.
* Note: When running the optimizer, if the loss is nan or increasing with each epoch, try decreasing the learning rate
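The effect of the learning rate can be seen on a toy problem. The sketch below minimizes $f(w) = w^2$ with plain gradient descent; the function and the two learning rates are illustrative choices, not values recommended for the MNIST network.

```python
def gradient_descent(learning_rate, steps=20, w=1.0):
    """Minimize f(w) = w**2 by stepping against its gradient, 2*w."""
    losses = []
    for _ in range(steps):
        w = w - learning_rate * 2 * w
        losses.append(w ** 2)
    return losses

small = gradient_descent(learning_rate=0.1)  # each step scales w by 0.8
large = gradient_descent(learning_rate=1.1)  # each step scales w by -1.2

print(small[0] > small[-1])  # True: the loss shrinks
print(large[0] < large[-1])  # True: the loss grows
```

With the larger rate every update overshoots the minimum by more than it corrects, which is the same mechanism that makes a real training loss increase or become `nan`.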

In [6]:
with tf.name_scope('optimizer'):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=...).minimize(...)

Compute the accuracy by:
* using `tf.equal` on the predicted label and the true label
* casting that to a float and computing the mean over all examples
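The two steps above can be sketched in NumPy as follows. The probabilities and labels are made-up values for four examples and three classes.

```python
import numpy as np

# Hypothetical predicted probabilities and one-hot true labels.
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.2, 0.5, 0.3]])
labels = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 0],
                   [0, 1, 0]])

pred_cls = probs.argmax(axis=1)   # predicted class per example
true_cls = labels.argmax(axis=1)  # true class per example

# Boolean vector of hits, cast to float so the mean is the fraction correct.
correct = pred_cls == true_cls
accuracy = correct.astype(np.float32).mean()

print(accuracy)  # 0.75
```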

In [7]:
Y_pred = tf.nn.softmax(...)
y_pred_cls = tf.argmax(..., 1)
y_cls = tf.argmax(..., 1)
accuracy = tf.reduce_mean(tf.cast(..., tf.float32))

### Create summaries for TensorBoard

To run TensorBoard, run this in a terminal from the notebook's directory. The path must match the directory passed to `tf.summary.FileWriter`:

`tensorboard --logdir=logs/train`

In [8]:
tf.summary.scalar('loss', loss)
tf.summary.scalar('accuracy', accuracy)
tf.summary.histogram('Weights_1', W1)
tf.summary.histogram('Bias_1', b1)
tf.summary.histogram('Weights_2', W2)
tf.summary.histogram('Bias_2', b2)
tf.summary.histogram('Weights_out', W_out)
tf.summary.histogram('Bias_out', b_out)

<tf.Tensor 'Bias_out:0' shape=() dtype=string>

Merge all the summaries into a single op so they can be evaluated with one `sess.run` call.

In [9]:
summary_op = tf.summary.merge_all()

Start an `InteractiveSession` and initialize all the global variables.
* For each epoch, run the optimizer on each `X, y` batch and sum the loss over all batches
* Print the total loss after each epoch

We set the batch size to 128 and the number of epochs to 25. Feel free to experiment with these values. Additionally, every 5 epochs (skipping epoch 0) we calculate the validation accuracy.
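The bookkeeping behind the training loop can be worked through in plain Python. The figure of 55,000 training examples is an assumption about the split returned by `read_data_sets`, and `global_step` is a hypothetical helper that mirrors the step index passed to `writer.add_summary`.

```python
num_examples = 55000  # assumed size of MNIST.train
batch_size = 128
epochs = 25

# Integer division drops the final partial batch.
n_batches = num_examples // batch_size

def global_step(epoch, batch):
    """Running batch counter across epochs, used as the summary step."""
    return epoch * n_batches + batch

# Epochs at which validation accuracy is computed (every 5th, skipping epoch 0).
validation_epochs = [i for i in range(epochs) if i % 5 == 0 and i != 0]

print(n_batches)                      # 429
print(global_step(epochs - 1, n_batches - 1))  # 10724
print(validation_epochs)              # [5, 10, 15, 20]
```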

In [None]:
batch_size = 128
epochs = 25
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
writer = tf.summary.FileWriter('logs/train', graph=tf.get_default_graph())
n_batches = MNIST.train.num_examples // batch_size  # full batches per epoch
for i in range(epochs):
    total_loss = 0
    for batch in range(n_batches):
        X_batch, y_batch = MNIST.train.next_batch(batch_size)
        o, l, summary = sess.run([optimizer, loss, summary_op], feed_dict={X: X_batch, Y: y_batch})
        total_loss += l
        writer.add_summary(summary, i*n_batches + batch)
    print("Epoch {0}: {1}".format(i, total_loss))
    if i % 5 == 0 and i!= 0:
        X_val, y_val = MNIST.validation.next_batch(MNIST.validation.num_examples)
        val_accuracy = sess.run(accuracy, feed_dict={X: X_val, Y: y_val})
        print("\tVal Accuracy {0}".format(val_accuracy))

After training and validation are complete, compute and report the test accuracy.

In [None]:
print("Computing accuracy ...")
X_batch, y_batch = MNIST.test.next_batch(MNIST.test.num_examples)
final_accuracy = sess.run(...)

print("Test Accuracy {0}".format(final_accuracy))