# Edgar Alfredo Briceño Gonzalez

# A01221672

---

Comments are written in `>` Markdown blockquote notation and at the bottom of the file.


# Overview

During this session, you will participate in a supervised learning exercise about digit recognition. You will:
- Build a neural network.
- Train your neural network with a dataset that contains images that represent numbers. 
- Modify the architecture of the neural network to obtain better results.

In [1]:
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# The MNIST data


The MNIST data consists of images of handwritten digits, each paired with the label of the digit it represents. The data set is split into three parts:
- training (mnist.train)
- testing (mnist.test)
- validation (mnist.validation)

Holding data out matters because, in machine learning, a model must be evaluated on separate data that it never sees during the learning phase. After training, you present this held-out data to the model and assess its performance. The two held-out sets serve different purposes. The validation set is typically used to compare models and tune hyperparameters during development. The test set is reserved for the final estimate of performance.
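
The sketch below illustrates holding data out by splitting a list of samples into a training part and a validation part. By default, `input_data.read_data_sets` keeps 5,000 of the 60,000 original training images aside as `mnist.validation` and leaves 55,000 in `mnist.train`. The 10,000 test images come from a separate file. The helper `split_dataset` is hypothetical. Shuffling before the split is an assumption of this sketch and is not necessarily what the MNIST loader does.

```python
import random


def split_dataset(samples, validation_size, seed=0):
    """Shuffle the samples and hold out the first `validation_size` of them.

    Hypothetical helper for illustration; it is not part of the MNIST input_data module.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # assumption: shuffle before splitting
    validation = shuffled[:validation_size]
    train = shuffled[validation_size:]
    return train, validation


# 60,000 original training items -> 55,000 for training + 5,000 for validation
train, validation = split_dataset(range(60000), validation_size=5000)
print(len(train), len(validation))  # 55000 5000
```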

In [2]:
%%capture
mnist = input_data.read_data_sets("./Data/MNIST_data/", one_hot=True)

# Creating the model

Common numerical computing libraries in Python, such as NumPy, rely on external code implemented in other languages for efficiency. However, switching back to Python after every operation adds overhead. That overhead is especially costly when running computations on GPUs or in a distributed manner, where transferring data is expensive.

TensorFlow also does its heavy lifting outside of Python, but it goes a step further to avoid this overhead. Instead of running each expensive operation independently from Python, TensorFlow lets you describe a graph of interacting operations that run entirely outside Python.
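
The toy graph below, written in plain Python, makes the "describe first, run later" idea concrete. The `Node` and `Placeholder` classes are hypothetical and are not TensorFlow's API. Building `y` computes nothing. All the work happens in a single `run` call that receives the placeholder values, much as `sess.run(..., feed_dict=...)` does for a real TensorFlow graph.

```python
class Node:
    """A toy graph node: it stores an operation and its input nodes; nothing runs at build time."""

    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def run(self, feed):
        # Evaluate the input nodes first, then apply this node's operation to their values
        values = [node.run(feed) for node in self.inputs]
        return self.op(*values)


class Placeholder(Node):
    """A toy placeholder: its value is supplied at run time through `feed`."""

    def __init__(self, name):
        super().__init__(None)
        self.name = name

    def run(self, feed):
        return feed[self.name]


# Describe the graph y = a * b + a (no arithmetic happens here)
a = Placeholder("a")
b = Placeholder("b")
product = Node(lambda u, v: u * v, a, b)
y = Node(lambda u, v: u + v, product, a)

# Execute the whole graph in one call, feeding the placeholder values
print(y.run({"a": 3, "b": 4}))  # 15
```

Unlike a real framework, this toy evaluates a node once for each use (here `a` is evaluated twice) and runs in Python. It therefore shows only the structure of the idea, not the performance benefit.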

#### TensorFlow Graph
Create a TensorFlow graph that represents a neural network with no hidden layers and an output layer composed of 10 nodes.

In [3]:
########################
# Write your code here
########################

#control variables
image_size = 28*28
number_digits = 10
next_layer_neurons = 10

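#placeholders to build the graph: a batch of flattened images and their one-hot labels
#(None means the batch size is not fixed in advance)
x = tf.placeholder(tf.float32, [None, image_size])
y_ = tf.placeholder(tf.float32, [None, number_digits])

#initialize weights and bias to zero
W = tf.Variable(tf.zeros([image_size, next_layer_neurons]))
b = tf.Variable(tf.zeros([next_layer_neurons]))

#logits: matrix multiplication plus bias (softmax is applied later, inside the loss)
y = tf.add(tf.matmul(x, W), b)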
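
The cell above only describes the computation `y = xW + b`. The plain-Python sketch below runs the same forward pass with assumed toy sizes (4 pixels instead of 784). It shows the resulting shape, one row of 10 logits per image. It also shows that with all-zero `W` and `b`, the softmax of the logits gives every digit probability 0.1. The helpers `linear` and `softmax` are hypothetical stand-ins for `tf.matmul`/`tf.add` and `tf.nn.softmax`.

```python
import math


def linear(x_batch, W, b):
    """Compute logits = x W + b for a batch given as lists of rows."""
    n_out = len(b)
    return [
        [sum(row[i] * W[i][j] for i in range(len(row))) + b[j] for j in range(n_out)]
        for row in x_batch
    ]


def softmax(logits):
    """Turn one row of logits into probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy sizes (assumed): 4 "pixels" per image, 10 classes, a batch of 2 images
image_size, number_digits = 4, 10
W = [[0.0] * number_digits for _ in range(image_size)]  # like tf.zeros([image_size, 10])
b = [0.0] * number_digits                               # like tf.zeros([10])
x_batch = [[0.0, 0.5, 1.0, 0.2], [0.9, 0.1, 0.0, 0.3]]

logits = linear(x_batch, W, b)
print(len(logits), len(logits[0]))  # 2 10
print(softmax(logits[0])[0])        # 0.1 (zero weights -> uniform prediction)
```

Zero initialization does no harm in this model without hidden layers. The gradient with respect to `W` depends on the input pixels and on the difference between the predicted probabilities and the label, so the columns of `W` still receive different updates.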

# Defining Cost Function and Optimizer

To train the model, you need to define what it means for the results to improve after each iteration. To do so, define a cost function and minimize it. The cost function measures how far you are from the desired outcome, so minimizing this error improves the model.

A common cost function is *cross-entropy*. With softmax outputs, the gradient of the cross-entropy grows with the size of the error, so large errors produce large updates. This avoids the learning slowdown that traditional cost functions (e.g. the quadratic cost) suffer when the output units saturate. In summary, it usually takes less time to train a good model.
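
The plain-Python sketch below computes the softmax cross-entropy for a single example. The helper names are hypothetical. The formula is the standard one that `tf.nn.softmax_cross_entropy_with_logits` computes from logits and one-hot labels. Its gradient with respect to the logits is `softmax(logits) - label`. A confidently wrong prediction therefore yields a gradient close to -1 on the true class and close to +1 on the predicted class, rather than a vanishing one.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, one_hot):
    """-sum(label * log(softmax(logits))) for one example."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


def cross_entropy_grad(logits, one_hot):
    """Gradient of the softmax cross-entropy with respect to the logits."""
    return [p - t for p, t in zip(softmax(logits), one_hot)]


label = [0.0] * 10
label[3] = 1.0                  # the image shows a "3"

confident_wrong = [0.0] * 10
confident_wrong[7] = 8.0        # the model strongly predicts "7"

print(round(cross_entropy([0.0] * 10, label), 4))  # 2.3026 (= ln 10 for a uniform guess)
grad = cross_entropy_grad(confident_wrong, label)
print(grad[3], grad[7])         # close to -1 and +1: a large error gives a large gradient
```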

#### TensorFlow
1. Create a tensor to represent the cross-entropy function. 
1. Create a tensor to represent a Gradient Descent Optimizer that minimizes the cross-entropy.


In [4]:
########################
# Write your code here
########################
#Softmax cross-entropy: applies softmax to the logits y and compares the result with the one-hot labels y_
cross_entropy = tf.reduce_mean(
                    tf.nn.softmax_cross_entropy_with_logits(
                    labels=y_, logits=y))

#Gradient descent with a learning rate of 0.5 minimizes the cross-entropy
learn_rate = 0.5
train_step = tf.train.GradientDescentOptimizer(
                learning_rate=learn_rate).minimize(cross_entropy)
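
`GradientDescentOptimizer` repeatedly moves every variable a step against its gradient: `w <- w - learning_rate * gradient`. The sketch below applies that rule to an assumed one-dimensional toy cost, f(w) = (w - 3)^2, to show how strongly the outcome depends on the step size. A moderate `learning_rate` converges, while one that is too large overshoots further on every step.

```python
def gradient_descent(grad, w0, learning_rate, steps):
    """Plain gradient descent: repeat w <- w - learning_rate * grad(w)."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


def grad_f(w):
    # Gradient of the toy cost f(w) = (w - 3)^2, assumed for illustration
    return 2.0 * (w - 3.0)


print(gradient_descent(grad_f, w0=0.0, learning_rate=0.1, steps=100))  # converges towards 3
print(gradient_descent(grad_f, w0=0.0, learning_rate=1.1, steps=20))   # step too large: diverges
```

The real optimizer does the same for every entry of `W` and `b`, using the gradients of the cross-entropy computed by TensorFlow.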

# Creating a TensorFlow Session

You have defined your model by creating a complete TensorFlow graph. Now you need to launch it: create an interactive session and initialize all the variables defined before.

#### Interactive Session
Create an Interactive Session and run the global variable initializer.

In [5]:
########################
# Write your code here
########################
#A session executes the operations defined in the graph.
sess = tf.InteractiveSession()

#Assign the initial values to all variables (W and b)
tf.global_variables_initializer().run()

# Train

MNIST is a large dataset. Computing every update over the entire training set (batch learning) is too time-consuming. Instead, use small batches of random data. This method is called stochastic training.

#### Stochastic Training
1. In a `for` loop, take 100 random samples from MNIST and run the train step on each resulting batch. 
1. Repeat the process as many times as necessary. 
1. Make sure the whole training dataset is presented to the network.
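
The steps above can be sketched in plain Python as a generator of random mini-batches. `minibatches` is a hypothetical helper that only loosely mirrors `mnist.train.next_batch`. It reshuffles the indices at the start of every pass, so one epoch presents every training example exactly once. With 55,000 training images and batches of 100, one epoch corresponds to 550 training steps.

```python
import random


def minibatches(num_samples, batch_size, seed=0):
    """Yield lists of sample indices forever, reshuffling at the start of every epoch.

    Hypothetical helper; if num_samples is not a multiple of batch_size,
    the last batch of each epoch is smaller.
    """
    rng = random.Random(seed)
    indices = list(range(num_samples))
    while True:
        rng.shuffle(indices)  # new random order for every epoch
        for start in range(0, num_samples, batch_size):
            yield indices[start:start + batch_size]  # slicing copies, so later shuffles don't alter it


# One epoch over 1,000 samples with batches of 100 takes 10 steps
batches = minibatches(num_samples=1000, batch_size=100)
one_epoch = [next(batches) for _ in range(1000 // 100)]
seen = sorted(i for batch in one_epoch for i in batch)
print(seen == list(range(1000)))  # True
```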


In [6]:
########################
# Write your code here
########################
#50 iterations x 1000 images = 50,000 samples, slightly less than
#one pass over the 55,000 training images
iterations = 50
batch_size = 1000

#feed the next shuffled batch at every training step
for _ in range(iterations):
    batch_xs, batch_ys = mnist.train.next_batch(batch_size)
    sess.run(train_step,
             feed_dict={x: batch_xs, y_: batch_ys})

# Evaluate your model

To measure your model's accuracy, compare its predictions with the expected output. Accuracy is the number of correct classifications divided by the size of the testing dataset.
#### Calculating Accuracy
1. Create a Tensor that compares the model's output with the expected output. 
1. Determine the fraction of predictions that are correct.
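
The same computation, sketched in plain Python: take the index of the largest logit in each row (what `tf.argmax(y, 1)` does), compare it with the index of the 1 in the one-hot label, and average the matches. The helper names and toy values are assumptions for illustration.

```python
def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(logits_batch, labels_batch):
    """Fraction of rows whose predicted class equals the one-hot label's class."""
    correct = [argmax(logits) == argmax(label)
               for logits, label in zip(logits_batch, labels_batch)]
    return sum(correct) / len(correct)  # True/False count as 1/0, like tf.cast(..., tf.float32)


logits_batch = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 0.9], [2.0, 1.0, 0.0]]
labels_batch = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
print(accuracy(logits_batch, labels_batch))  # 0.75 (the third row is misclassified)
```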


In [7]:
########################
# Write your code here
########################
#Compare if the prediction matches reality
correct_prediction = tf.equal(tf.argmax(y, 1),
                                tf.argmax(y_, 1))
#Compute accuracy based on binary result
accuracy = tf.reduce_mean(tf.cast(correct_prediction,
                                    tf.float32))
#Print the value of accuracy (Notice the TEST dataset)
print(sess.run(accuracy,feed_dict={x:mnist.test.images,
                                    y_:mnist.test.labels}))


0.8899


# Create your own Neural Network
**Goal:** Train a neural network (NN) with accuracy of ~95% on the testing set.
#### Steps:
* Create a NN with 2 hidden layers and 300 nodes in each.

In [22]:
#Control Variables
image_size = 784 # MNIST data input (img shape: 28*28)
number_digits = 10
next_layer_neurons = 10
n_hidden_1 = 300 # 1st layer number of neurons
n_hidden_2 = 300 # 2nd layer number of neurons
#tf graph input
x = tf.placeholder(tf.float32, [None, image_size])
y_ = tf.placeholder(tf.float32, [None, number_digits])
# layer weights (initialized to zeros) and biases (random normal values)
weights = {
    'h1': tf.Variable(tf.zeros([image_size, n_hidden_1])),
    'h2': tf.Variable(tf.zeros([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.zeros([n_hidden_2, number_digits]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([number_digits]))
}
def neural_net(x):
    #Hidden layer 1: matrix multiplication plus bias (linear, no activation function)
    y0 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    #Hidden layer 2: matrix multiplication plus bias (linear, no activation function)
    y1 = tf.add(tf.matmul(y0, weights['h2']), biases['b2'])
    #Output layer: matrix multiplication plus bias, returns the logits
    y2 = tf.matmul(y1, weights['out']) + biases['out']
    return y2
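
`neural_net` applies no activation function between layers. Its three matrix multiplications therefore compose to a single affine map, so the network can only represent the same kind of function as the model without hidden layers. The plain-Python sketch below checks this with small assumed matrices. Biases are left out for brevity; with them, the composition is still one affine map. The sketch also shows that a nonlinearity between the layers, here a ReLU, breaks the equivalence. In TensorFlow 1.x, the corresponding change would be to wrap a hidden layer's output in a function such as `tf.nn.relu` or `tf.nn.sigmoid`.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def relu(M):
    return [[max(0.0, v) for v in row] for row in M]


x = [[1.0, -2.0]]                # one input row with 2 features (toy sizes, assumed)
W1 = [[1.0, -1.0], [0.5, 2.0]]   # "hidden layer" weights
W2 = [[2.0], [1.0]]              # "output layer" weights

# Two stacked linear layers ...
two_layers = matmul(matmul(x, W1), W2)
# ... equal one linear layer whose weight matrix is W1 W2
one_layer = matmul(x, matmul(W1, W2))
print(two_layers, one_layer)             # [[-5.0]] [[-5.0]]

# With a ReLU between the layers, the result is no longer that single linear map
print(matmul(relu(matmul(x, W1)), W2))   # [[0.0]]
```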



---

> The part above helped me refactor the code. I used a lot of trial and error at the beginning and was not sure of the results, so I ended up redoing the neural network many times. Here are the results of the refactor.

---


In [54]:
#parameters
learn_rate = 0.1
logits = neural_net(x)
prediction = tf.nn.softmax(logits)

#Softmax cross-entropy loss computed from the raw logits
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
                     logits=logits, labels=y_))

#optimizer = tf.train.GradientDescentOptimizer(learning_rate=learn_rate)
#Adam is used here instead of plain gradient descent
optimizer = tf.train.AdamOptimizer(learning_rate=learn_rate)
train_step = optimizer.minimize(cross_entropy)

# Evaluate model: fraction of examples whose predicted class matches the label
correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

#A session executes the operations defined in the graph.
sess = tf.InteractiveSession()

#Assign the initial values to all variables, including the optimizer's internal ones
tf.global_variables_initializer().run()

---

> The cell above defines the optimizer and `train_step`. `train_step` is run on every batch to adjust the weights of the neural network. I replaced the `GradientDescentOptimizer` with the `AdamOptimizer` because, according to the documentation, it is more accurate and self-adjusting, so I needed to modify the `learn_rate` fewer times.

---
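
For reference, the Adam update keeps running averages of the gradient and of the squared gradient and divides one by the square root of the other. Each step therefore has a size of roughly `learning_rate`, regardless of how large the raw gradient is. The one-parameter sketch below uses `beta1=0.9`, `beta2=0.999` and `epsilon=1e-8`, which are the constants from the Adam paper and also the `AdamOptimizer` defaults. It uses an assumed toy cost f(w) = (w - 3)^2. It illustrates the algorithm and is not TensorFlow's implementation.

```python
import math


def adam(grad, w0, learning_rate=0.1, beta1=0.9, beta2=0.999, epsilon=1e-8, steps=1000):
    """Minimal one-parameter Adam."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # running average of the gradient
        v = beta2 * v + (1 - beta2) * g * g    # running average of the squared gradient
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero start of m and v
        v_hat = v / (1 - beta2 ** t)
        w -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return w


def grad_f(w):
    # Gradient of the toy cost f(w) = (w - 3)^2, assumed for illustration
    return 2.0 * (w - 3.0)


print(adam(grad_f, w0=0.0))  # close to 3
```

On the first step, the bias-corrected ratio `m_hat / sqrt(v_hat)` equals the sign of the gradient. The first move is therefore almost exactly `learning_rate` (0.1 here), whatever the scale of the cost.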


* Train the model for 10 epochs.
* Print the accuracy after each epoch.

In [55]:
#Create epochs and batches of information
#Note: each "epoch" below runs 500 steps x 1000 images = 500,000 samples,
#roughly nine passes over the 55,000 training images
epochs = 20
num_steps = 500
batch_size = 1000
display_step = 100

for epoch in range(epochs):
    #Feed the batches
    print("Epoch: " + str(epoch+1))
    for step in range(1, num_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Run optimization step (backprop)
        sess.run(train_step, feed_dict={x: batch_x, y_: batch_y})
        if step % display_step == 0 or step == 1:
            # Calculate loss and accuracy on the current minibatch
            loss, acc = sess.run([cross_entropy, accuracy],
                                 feed_dict={x: batch_x, y_: batch_y})
            print("Step {}, Minibatch Loss= {:.4f}, Training Accuracy= {:.3f}".format(
                step, loss, acc))

    
    #Print the value of accuracy (Notice the TEST dataset)
    print(sess.run(accuracy,feed_dict={x:mnist.test.images,
                                    y_:mnist.test.labels}))
    # Calculate accuracy for MNIST test images
    print("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={x: mnist.test.images,
                                      y_: mnist.test.labels}))


Epoch: 1
Step 1, Minibatch Loss= 21.6030, Training Accuracy= 0.101
Step 100, Minibatch Loss= 1.8083, Training Accuracy= 0.748
Step 200, Minibatch Loss= 1.0136, Training Accuracy= 0.793
Step 300, Minibatch Loss= 0.4827, Training Accuracy= 0.856
Step 400, Minibatch Loss= 0.4172, Training Accuracy= 0.893
Step 500, Minibatch Loss= 0.3111, Training Accuracy= 0.906
0.9032
Testing Accuracy: 0.9032
Epoch: 2
Step 1, Minibatch Loss= 0.3256, Training Accuracy= 0.919
Step 100, Minibatch Loss= 0.3779, Training Accuracy= 0.897
Step 200, Minibatch Loss= 0.2734, Training Accuracy= 0.920
Step 300, Minibatch Loss= 0.3551, Training Accuracy= 0.909
Step 400, Minibatch Loss= 0.3750, Training Accuracy= 0.904
Step 500, Minibatch Loss= 0.2087, Training Accuracy= 0.929
0.9129
Testing Accuracy: 0.9129
Epoch: 3
Step 1, Minibatch Loss= 0.2336, Training Accuracy= 0.932
Step 100, Minibatch Loss= 0.2682, Training Accuracy= 0.926
Step 200, Minibatch Loss= 0.2560, Training Accuracy= 0.929
Step 300, Minibatch Loss= 0.2

* Print the final accuracy of the training set.
* Print the final accuracy of the testing set.

In [49]:
iterations = 500
batch_size = 1000

#feed for batches
for _ in range(iterations):
    batch_x, batch_y = mnist.train.next_batch(batch_size)
    sess.run(train_step,
        feed_dict={x: batch_x, y_: batch_y})
#######################
# Write your code here
########################
#Compare if the prediction matches reality
correct_prediction = tf.equal(tf.argmax(logits, 1),
                                tf.argmax(y_, 1))
#Compute accuracy based on binary result
accuracy = tf.reduce_mean(tf.cast(correct_prediction,
                                    tf.float32))
#Print the value of accuracy (Notice the TEST dataset)
print(sess.run(accuracy,feed_dict={x:mnist.test.images,
                                    y_:mnist.test.labels}))

0.8834


# Summary

During the exercise you:
- Learned how to create a neural network and train it. 
- Tested the performance of your model and found that the definition of the network architecture is important for obtaining better results.
- Learned the importance of the initialization step (random numbers instead of zeroes) and how it affects performance. 
- Reviewed the impact of activation functions on performance.
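
The point about initialization shows up even in a tiny network. If every hidden unit starts with the same incoming weights, all hidden units compute the same activation and receive the same gradient. Gradient descent then keeps them identical, so they behave like a single unit. Zero initialization is the special case where that constant is 0. The sketch below compares a constant initialization with a random Gaussian one. It uses a hypothetical 2-2-1 network in plain Python with sigmoid hidden units, squared loss, and no biases, to keep the example short.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward_backward(x, W1, w2, target):
    """Forward and backward pass of a toy 2-input, 2-hidden-unit, 1-output net with squared loss.

    W1[i][j] connects input i to hidden unit j; w2[j] connects hidden unit j to the output.
    Returns the hidden activations and, per hidden unit, the gradients of its incoming weights.
    """
    n_in, n_hidden = len(x), len(w2)
    h = [sigmoid(sum(x[i] * W1[i][j] for i in range(n_in))) for j in range(n_hidden)]
    out = sum(h[j] * w2[j] for j in range(n_hidden))
    d_out = out - target  # derivative of 0.5 * (out - target) ** 2
    grads = [[d_out * w2[j] * h[j] * (1.0 - h[j]) * x[i] for i in range(n_in)]
             for j in range(n_hidden)]
    return h, grads


x, target = [0.5, -1.0], 1.0

# Constant initialization (0.3 everywhere; all zeros keeps the units identical too)
h_const, grads_const = forward_backward(x, [[0.3, 0.3], [0.3, 0.3]], [0.3, 0.3], target)
print(h_const[0] == h_const[1], grads_const[0] == grads_const[1])  # True True

# Random Gaussian initialization breaks the symmetry
rng = random.Random(0)
W1_rand = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(2)]
w2_rand = [rng.gauss(0.0, 1.0) for _ in range(2)]
h_rand, grads_rand = forward_backward(x, W1_rand, w2_rand, target)
print(h_rand[0] == h_rand[1])  # False
```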


# Conclusions and observations.
This practice taught me how important it is to have a proper `learn_rate` and a large enough number of epochs to achieve an appropriate result. At first I was stunned: I couldn't understand why my result was so far from that of the first model with no hidden layers. It was only later, after some tweaking of the numbers, that it struck me how important it was to have the correct `learn_rate` and `num_steps`. Later I got to experiment with other TensorFlow methods, and they ended up giving better results.

With the `GradientDescentOptimizer` I only achieved poor performance; the best configuration I found was around 40 epochs with a `learn_rate` of 0.1. I also ran other experiments varying `num_steps`, `epochs`, `learn_rate` and `batch_size`. So far, these experiments have shown me that a step count as small as the 50 I used at the beginning is far too little to reach an acceptable training accuracy.

The following table shows the accuracy obtained with different learning rates after 20 epochs.


| learn_rate | accuracy |
| ---------- |:--------:|
| 0.5        | 0.152    |
| 0.2        | 0.122    |
| 0.1        | 0.870    |

The goal of training a neural network (NN) with an accuracy of ~95% was achieved with the `AdamOptimizer`, using a learning rate of 0.1 and 20 epochs.
![proof](https://i.imgur.com/FVBYk2Y.png)

## References
https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer

https://stackoverflow.com/questions/40368697/where-does-next-batch-in-the-tensorflow-tutorial-batch-xs-batch-ys-mnist-trai

https://stackoverflow.com/questions/40951602/what-does-the-question-mark-in-tensorflow-shape-mean

https://towardsdatascience.com/under-the-hood-of-neural-network-forward-propagation-the-dreaded-matrix-multiplication-a5360b33426