# TensorFlow Tutorial - Feedforward Neural Networks

## Introduction
Feedforward Neural Networks (FNNs) are one of the simplest models in neural networks and machine learning, but they have been used to solve a wide range of problems since their creation. FNNs are still used today inside many different algorithms, and they remain a great starting point for designing other models. In this tutorial, we will take apart our FNN Python script ([feedforward_neural_network/main.py](https://github.com/johnsbuck/tensorflow_tutorial/blob/master/tutorial/01_basics/feedforward_neural_network/main.py)) and describe each part used in developing an efficient model.


### What You Should Know
For this tutorial, I suggest learning the basic theoretical design of FNNs beforehand, as it will help you understand the implementation. I recommend watching [Neural Networks Demystified](https://www.youtube.com/watch?v=bxe2T-V8XRs) by Welch Labs, which gives a very good explanation of FNNs.

### What Problem Are We Solving?
We will focus on a textbook problem for our FNN: classifying handwritten digits. This is usually done with the MNIST dataset, a collection of 28-by-28 px grayscale images of the digits 0 to 9, which we will use for both training and testing our FNN.



## Model Development Steps

Generally speaking, there are 5 steps in developing any model for machine learning:

1. Data
	* Import and pre-process one or more datasets for training, testing, and possibly making predictions with our model.
2. Model
	* Design and implement the model so that it can properly use the dataset.
	* This includes setting proper sizes for inputs and outputs based on the dataset specifications and the problem the model is used to solve.
3. Train
	* Using your training dataset, train your model so that it better maps the inputs to their outputs.
4. Test/Evaluate
	* After (and optionally during) training, use your model to predict the outputs for the inputs in your testing dataset and compare them with the actual labels.
5. Predict
	* Use your newly developed model on new, unseen inputs.

For the sake of this tutorial, we will focus on the first 4 steps: importing our MNIST dataset, creating an FNN to classify digits, and training and testing it to create a strong model. The code uses the TensorFlow 1.x graph API (`tf.placeholder`, `tf.Session`); in TensorFlow 2.x these calls are only available through `tf.compat.v1`. Before we begin, let's import TensorFlow.

In [44]:
from __future__ import print_function   # Python3 Print Function
import tensorflow as tf

## Step 1: Data

Because we are using TensorFlow, we will import MNIST using the `read_data_sets` function from the `input_data` module in `tensorflow.examples.tutorials.mnist`. This helper was deprecated in later TensorFlow 1.x releases and is not included in TensorFlow 2.x. Other datasets can be imported and converted either to a TensorFlow tensor or to another compatible data type, such as a NumPy array.

In [45]:
from tensorflow.examples.tutorials.mnist import input_data

print("==> Importing MNIST")
mnist = input_data.read_data_sets("/tmp/tensorflow/mnist/input_data", one_hot=True)
print("\n")
print(mnist)

==> Importing MNIST
Extracting /tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/t10k-labels-idx1-ubyte.gz


Datasets(train=<tensorflow.contrib.learn.python.learn.datasets.mnist.DataSet object at 0x7f0918529f50>, validation=<tensorflow.contrib.learn.python.learn.datasets.mnist.DataSet object at 0x7f08eb8595d0>, test=<tensorflow.contrib.learn.python.learn.datasets.mnist.DataSet object at 0x7f08eb859490>)


As the output shows, the MNIST dataset is split into 3 smaller datasets: train, validation, and test (55,000, 5,000, and 10,000 images with the default settings). The train dataset will be used for training our model, with each 28x28 px image flattened into a vector of 784 pixel values as input. Each training image also comes with an output label: a one-hot vector of 10 numbers that marks which class the image belongs to (e.g. <0 0 0 0 0 0 1 0 0 0> is the label for 6, because the 1 sits at index 6 when counting from 0).
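To make the input and label formats concrete, here is a small NumPy sketch. It assumes (as `read_data_sets` does by default) that each image is flattened into 784 values scaled into [0, 1] and that labels are one-hot encoded. The random image and the `one_hot` helper are illustrative stand-ins, not part of the MNIST API.

```python
import numpy as np

def one_hot(digit, num_classes=10):
    """Hypothetical helper: a vector of zeros with a 1 at index `digit`."""
    label = np.zeros(num_classes, dtype=np.float32)
    label[digit] = 1.0
    return label

# Stand-in for one 28x28 grayscale MNIST image (random pixel values 0-255).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28)).astype(np.float32)

# Flatten to a single row of 28 * 28 = 784 inputs and scale into [0, 1].
flat = image.reshape(784) / 255.0

print(flat.shape)   # (784,)
print(one_hot(6))   # [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
```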

We will use the training dataset to train our model and the similar test dataset to evaluate it after training. The validation dataset is not used in this tutorial.


## Step 2: Model 

In order to define our FNN, we must first define several supporting pieces: the structure of our layers and how their weights and biases are initialized. We will define each piece in its own function and explain the significance of specific choices.

### Weight Initialization

Before creating our layers, we need to plan how to initialize our weights and biases. We can't set all of our weights to the same constant, such as 0. If we did, every neuron in a layer would compute the same output and receive the same gradient during training, so the neurons would stay identical to one another. This problem is commonly referred to as *symmetry*. We break the symmetry by initializing the weights with small random numbers, which makes each neuron's weights different from the others'.
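The symmetry problem can be checked numerically. The sketch below uses a toy 3-2-1 network in NumPy, not the MNIST model, and uses `tanh` instead of ReLU only so that no hidden unit can be inactive by chance. It compares the weight gradients of the two hidden units under a constant initialization and under a small random one.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))        # 4 toy examples with 3 input features
target = rng.normal(size=(4, 1))   # toy regression targets

def hidden_weight_gradient(W1, W2):
    """Gradient of a mean squared error w.r.t. W1 in a 3-2-1 tanh network."""
    h = np.tanh(x @ W1)                    # hidden activations, shape (4, 2)
    out = h @ W2                           # predictions, shape (4, 1)
    d_out = 2.0 * (out - target) / len(x)  # dLoss/dout
    d_h = d_out @ W2.T                     # back through the output layer
    d_z1 = d_h * (1.0 - h ** 2)            # back through tanh
    return x.T @ d_z1                      # shape (3, 2): one column per hidden unit

# Every weight set to the same constant: both hidden units get the same gradient.
g_const = hidden_weight_gradient(np.full((3, 2), 0.5), np.full((2, 1), 0.5))
print(np.allclose(g_const[:, 0], g_const[:, 1]))  # True

# Small random weights: the two hidden units get different gradients.
g_rand = hidden_weight_gradient(rng.normal(0.0, 0.1, (3, 2)), rng.normal(0.0, 0.1, (2, 1)))
print(np.allclose(g_rand[:, 0], g_rand[:, 1]))    # False
```

With identical gradients, a gradient step keeps the two hidden units identical. No amount of training would then make them learn different features.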

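A common choice for networks that use ReLU activations is a normal distribution with mean 0 and variance $\frac{2}{n}$, where $n$ is the number of inputs to the layer (its fan-in), as recommended in the CS231n notes listed in the References. We will use this for our FNN. In `weight_variables` below, $n$ is `shape[0]`, so the standard deviation is $\sqrt{2/n}$.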
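As a quick sanity check of the formula, this NumPy sketch draws a 784-by-500 weight matrix (the shape of our first layer) with standard deviation $\sqrt{2/n}$. It takes $n$ as the number of inputs, just as `weight_variables` does with `shape[0]`, and compares the sample variance with $2/n$. The `he_normal` name is our own; the scheme is commonly called He initialization.

```python
import numpy as np

def he_normal(fan_in, fan_out, rng):
    """Sample a (fan_in, fan_out) matrix from N(0, 2 / fan_in)."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

rng = np.random.default_rng(42)
W = he_normal(784, 500, rng)

print(W.shape)            # (784, 500)
print(W.var(), 2 / 784)   # both values should be close to 0.00255
```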


In [46]:
def weight_variables(shape):
    initial = tf.random_normal(shape, stddev=tf.sqrt(2./shape[0]))
    return tf.Variable(initial, name="W")

### Bias Initialization

Now that we have defined our weights, we can define our biases. The biases don't have to be random like the weights, because the random weights already break the symmetry. This means we can set the biases to a constant, such as 0. Some networks use a small positive constant instead (0.1, 0.01, etc.). With ReLU, this helps ensure that units are active and receive gradient at the start of training, though it is not clear that it consistently helps. For our biases we will choose 0.01.

In [47]:
def bias_variables(shape, constant=0.01):
    initial = tf.constant(constant, shape=shape)
    return tf.Variable(initial, name="B")

### Fully Connected Layer

Now that we have defined our weight and bias initialization functions, we can use them to create the layers for our FNN. In each layer, we multiply the given inputs by our weights and add our bias to the result.

We then apply an *activation function* to the resulting tensor. This introduces non-linearity and makes our model more flexible. For our model, we will use the *Rectified Linear Unit* (ReLU), $\text{ReLU}(z) = \max(0, z)$, as our activation function.

Although each layer could be configured differently, we simplify our workload by reusing the same layer function throughout our FNN.
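In NumPy terms, each fully connected layer computes $\text{ReLU}(xW + b)$. The bias is added after the matrix product and broadcast across every row of the batch. The sketch below uses random stand-in data and a hypothetical `fc_layer_np` helper to check the shapes for a batch of 50 flattened images entering a 500-unit layer.

```python
import numpy as np

def fc_layer_np(inputs, W, b):
    """One fully connected ReLU layer: relu(inputs @ W + b)."""
    return np.maximum(inputs @ W + b, 0.0)

rng = np.random.default_rng(0)
batch = rng.random((50, 784))                          # 50 stand-in flattened images
W = rng.normal(0.0, np.sqrt(2.0 / 784), (784, 500))    # same scheme as weight_variables
b = np.full(500, 0.01)                                 # same constant as bias_variables

hidden = fc_layer_np(batch, W, b)
print(hidden.shape)                 # (50, 500)
print(bool((hidden >= 0).all()))    # True, since ReLU clips negatives to 0
```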

In [None]:
def fc_layer(input, in_size, out_size):
    W = weight_variables([in_size, out_size])
    b = bias_variables([out_size])
    return tf.nn.relu(tf.matmul(input, W) + b)  # add the bias after the matrix product

### Feedforward Neural Network

We will now connect our layers to form an FNN. Along with our fully connected layers, we will have two placeholders, $x$ and $y$. Here $x$ holds the input images we feed in, and $y$ holds the labels that are compared with the result of our FNN.

We will set the input size to 784 (28 * 28 px per image) and the output size to 10 (one class per digit, 0-9), with one hidden layer of size 500 between our input and output.
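A quick way to see the size of this architecture is to count its trainable parameters. Each fully connected layer has an `in_size` by `out_size` weight matrix plus `out_size` biases. A plain-Python sketch of that arithmetic:

```python
layer_sizes = [784, 500, 10]   # input, hidden, output

# Weights (in_size * out_size) plus one bias per output unit, for each layer.
params = [n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

print(params)        # [392500, 5010]
print(sum(params))   # 397510
```

Almost all of the parameters sit in the first layer, because every one of the 784 pixels connects to every hidden unit.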

In [None]:
def fnn():
    x = tf.placeholder(tf.float32, [None, 784])
    y = tf.placeholder(tf.float32, [None, 10])
    model = fc_layer(x, 784, 500)
    model = fc_layer(model, 500, 10)
    return model, x, y

model, x, y = fnn()

## Step 3: Train

In order to train our FNN, we must define two things: a loss function and an optimizer.

### Loss Function

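Our loss function measures how far the outputs of our model are from the labels in our training dataset for the same inputs. We will use softmax cross entropy (see the CS231n notes in the References), with the placeholder $y$ providing the labels and our model providing the logits. `tf.nn.softmax_cross_entropy_with_logits` applies the softmax itself, so the model's outputs are passed in as unnormalized scores. `tf.reduce_mean` then averages the per-example losses over the batch.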
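For intuition about what is computed per example, here is a NumPy sketch of softmax cross entropy, $-\sum_k y_k \log \operatorname{softmax}(z)_k$. It is written from the standard definition rather than from TensorFlow's source. Subtracting each row's maximum logit is a common trick that leaves the result unchanged while avoiding overflow in `exp`.

```python
import numpy as np

def softmax_cross_entropy(labels, logits):
    """Per-example cross entropy between one-hot labels and softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)     # avoid overflow in exp
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
logits = np.array([[0.0, 0.0, 0.0],    # no preference between the 3 classes
                   [1.0, 5.0, 1.0]])   # confident in the correct class

per_example = softmax_cross_entropy(labels, logits)
print(per_example)          # first entry is log(3), about 1.0986; second is about 0.036
print(per_example.mean())   # averaging, as tf.reduce_mean(loss) does
```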

In [50]:
print("-> Defining loss function")
loss = tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=model)
loss = tf.reduce_mean(loss)

-> Defining loss function


### Optimizer

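Our optimizer tries to lower our loss as much as possible. The basic technique is *Stochastic Gradient Descent (SGD)*, which repeatedly moves the weights against the gradient of the loss on a small batch, iteratively approaching a (local) minimum. We will use a variant of SGD called *Adam*, which keeps running averages of the gradients and of their squares to adapt the step size for each weight. We will give it a learning rate of 0.001 (1e-3) to begin with.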
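To show what Adam adds on top of plain SGD, here is a plain-Python sketch of the Adam update rule as described by Kingma and Ba. It is applied to the one-dimensional function $f(w) = (w - 3)^2$. The defaults `beta1=0.9`, `beta2=0.999`, and `eps=1e-8` match the defaults of `tf.train.AdamOptimizer`. TensorFlow arranges the bias correction slightly differently, however, so treat this as an illustration of the idea rather than a re-implementation. The `adam_minimize` helper and the larger learning rate of 0.1 are choices made for this toy example only.

```python
import math

def adam_minimize(grad, w0, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a 1-D function with the Adam update rule, given its gradient."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # running average of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running average of squared gradients
        m_hat = m / (1 - beta1 ** t)           # correct the bias toward 0 caused
        v_hat = v / (1 - beta2 ** t)           # by starting both averages at 0
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# f(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
grad_f = lambda w: 2.0 * (w - 3.0)

print(adam_minimize(grad_f, w0=0.0, lr=0.1, steps=1))     # about 0.1
print(adam_minimize(grad_f, w0=0.0, lr=0.1, steps=1000))  # close to 3
```

On the first step, `m_hat / sqrt(v_hat)` equals the sign of the gradient, so the step has size `lr` even though the raw gradient is -6. This normalization is why Adam is less sensitive to the scale of the gradients than plain SGD.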

In [51]:
print("-> Defining optimizer")
optimizer = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(loss)

-> Defining optimizer


### Accuracy

We will also define an accuracy measure to check our performance during training and the overall performance on our final test dataset. For each example, we compare the index of the highest output of our model with the index of the 1 in $y$. If they match, it is a hit; otherwise it is a miss. The accuracy is the fraction of hits in a given batch or in the full dataset.
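The hit/miss comparison reduces to comparing two `argmax` results row by row. This NumPy sketch, with a hypothetical `accuracy_np` helper and made-up logits, mirrors the `tf.equal`, `tf.cast`, and `tf.reduce_mean` steps in the cell below.

```python
import numpy as np

def accuracy_np(logits, labels):
    """Fraction of rows whose largest logit is at the index of the one-hot label."""
    correct = np.argmax(logits, axis=1) == np.argmax(labels, axis=1)
    return correct.astype(np.float32).mean()

logits = np.array([[2.0, 0.5, 0.1],    # predicts class 0
                   [0.2, 0.1, 3.0],    # predicts class 2
                   [1.0, 4.0, 0.0],    # predicts class 1
                   [0.3, 0.2, 0.9]])   # predicts class 2
labels = np.array([[1, 0, 0],          # hit
                   [0, 0, 1],          # hit
                   [0, 0, 1],          # true class 2, predicted 1: a miss
                   [0, 1, 0]])         # true class 1, predicted 2: a miss

print(accuracy_np(logits, labels))     # 0.5
```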

In [52]:
print("-> Defining accuracy")
correct_prediction = tf.equal(tf.argmax(model, 1), tf.argmax(y, 1))
correct_prediction = tf.cast(correct_prediction, tf.float32)
accuracy = tf.reduce_mean(correct_prediction)

-> Defining accuracy


### Training

Now that we've defined our loss and optimizer, we can finally train our model on our dataset. To do this, we first need to create a TensorFlow session, which we use to initialize the variables in our model and to run our optimizer.

With our session, we will run our training for 20,000 steps, training on a batch of 50 images at each step. Every 500 steps, we also print the accuracy on the current batch to check how we are doing over time.

*Note: 20,000 steps of 50 images means the model sees 1,000,000 images in total. The training dataset holds only 55,000 images, so `next_batch` cycles through it (reshuffling between passes) about 18 times.*
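The arithmetic behind the note, plus a rough picture of how mini-batches are drawn, fits in a few lines of plain Python. The `batches` generator is a hypothetical stand-in for `mnist.train.next_batch` that only captures the idea of reshuffling once per pass. It drops any leftover examples at the end of a pass, which never happens here because 55,000 is divisible by 50.

```python
import random

NUM_TRAIN = 55000   # images in mnist.train (read_data_sets holds out 5,000 for validation)
BATCH_SIZE = 50
STEPS = 20000

steps_per_epoch = NUM_TRAIN // BATCH_SIZE
epochs = STEPS * BATCH_SIZE / NUM_TRAIN
print(steps_per_epoch)    # 1100
print(round(epochs, 2))   # 18.18

def batches(num_examples, batch_size, seed=0):
    """Yield lists of example indices, reshuffling at the start of every pass."""
    rng = random.Random(seed)
    while True:
        order = list(range(num_examples))
        rng.shuffle(order)
        # Any leftover examples (fewer than batch_size) are skipped.
        for start in range(0, num_examples - batch_size + 1, batch_size):
            yield order[start:start + batch_size]

# Usage on a tiny dataset: two batches of 5 cover all 10 examples exactly once.
gen = batches(10, 5)
first_pass = next(gen) + next(gen)
print(sorted(first_pass))   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```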

In [59]:
print("==> Begin Training")
sess = tf.Session()
sess.run(tf.global_variables_initializer())

for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i % 500 == 0:
        train_accuracy = sess.run(accuracy, feed_dict={x: batch[0], y: batch[1]})
        print("Step %d, Training Accuracy: %g" % (i, train_accuracy))
    # Training Step
    sess.run(optimizer, feed_dict={x: batch[0], y: batch[1]})

==> Begin Training
Step 0, Training Accuracy: 0.16
Step 500, Training Accuracy: 0.92
Step 1000, Training Accuracy: 0.94
Step 1500, Training Accuracy: 0.94
Step 2000, Training Accuracy: 0.98
Step 2500, Training Accuracy: 1
Step 3000, Training Accuracy: 0.98
Step 3500, Training Accuracy: 1
Step 4000, Training Accuracy: 0.98
Step 4500, Training Accuracy: 0.98
Step 5000, Training Accuracy: 1
Step 5500, Training Accuracy: 0.98
Step 6000, Training Accuracy: 0.96
Step 6500, Training Accuracy: 0.98
Step 7000, Training Accuracy: 0.98
Step 7500, Training Accuracy: 0.96
Step 8000, Training Accuracy: 1
Step 8500, Training Accuracy: 0.98
Step 9000, Training Accuracy: 1
Step 9500, Training Accuracy: 1
Step 10000, Training Accuracy: 1
Step 10500, Training Accuracy: 0.98
Step 11000, Training Accuracy: 1
Step 11500, Training Accuracy: 0.98
Step 12000, Training Accuracy: 1
Step 12500, Training Accuracy: 0.98
Step 13000, Training Accuracy: 0.98
Step 13500, Training Accuracy: 1
Step 14000, Training Accuracy: 1
Step 14500, Training Accuracy: 1
Step 15000, Training Accuracy: 1
Step 15500, Training Accuracy: 1
Step 16000, Training Accuracy: 0.98
Step 16500, Training Accuracy: 1
Step 17000, Training Accuracy: 1
Step 17500, Training Accuracy: 1
Step 18000, Training Accuracy: 1
Step 18500, Training Accuracy: 1
Step 19000, Training Accuracy: 1
Step 19500, Training Accuracy: 1


As shown, the model reaches a high training accuracy. However, each reading comes from a single 50-image batch of training data, so it is noisy and can overstate how well the model generalizes. To confirm our results, we will evaluate the model on a test dataset it was never trained on.

*Note: We pass our placeholders in the `feed_dict`, giving $x$ the training inputs and $y$ the matching training labels. Any previously defined placeholder can be fed this way.*

## Step 4: Test

Now that we have finished training, we can test our model on data it has never seen. We will evaluate the full test dataset with our accuracy measure.

In [60]:
print("==> Begin Testing")
print("test accuracy %g" % sess.run(accuracy, feed_dict={
    x: mnist.test.images, y: mnist.test.labels}))

sess.close()

==> Begin Testing
test accuracy 0.9811


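Our test accuracy is high, approximately 98% (0.9811 in this run), and only slightly below the training-batch accuracies near the end of training. This suggests that the model generalizes to digits it has not seen instead of merely memorizing the training set. Exact numbers will vary from run to run, since the weights are initialized randomly and the batches are shuffled.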
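As a rough yardstick for how precise a single test figure is, one can treat each of the 10,000 test images as an independent trial and compute the binomial standard error of the measured accuracy. This plain-Python sketch does that for the 0.9811 result above. It covers only sampling noise from the finite test set, not the run-to-run variation caused by random initialization and shuffling.

```python
import math

n_test = 10000   # images in mnist.test
acc = 0.9811     # test accuracy reported above

n_correct = round(acc * n_test)
std_err = math.sqrt(acc * (1 - acc) / n_test)   # binomial standard error

print(n_correct, n_test - n_correct)   # 9811 189
print(round(std_err, 4))               # 0.0014
```

A standard error of about 0.14 percentage points means the test set alone pins this accuracy down to within a few tenths of a percent.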

## Conclusion

From this notebook tutorial, you should have a basic understanding of feedforward neural networks, the five steps of model development, and how to implement, train, and test a model. The full code, with TensorBoard code added, is available in [feedforward_neural_network/main.py](https://github.com/johnsbuck/tensorflow_tutorial/blob/master/tutorial/01_basics/feedforward_neural_network/main.py).

## *References*

Weight & Bias Initialization: [CS231n](https://cs231n.github.io/neural-networks-2/#init)

Cross-Entropy: [CS231n](https://cs231n.github.io/linear-classify/#softmax)