# TensorFlow

This is the notebook for the TensorFlow lessons in the DLND. Mostly we are going to deal with TensorFlow in the context of deep learning.

## Linear functions in TF

So returning to the original equation for Neural Networks:

$$
y = xW+b
$$

Here $y$ is the output, $x$ is the input, $W$ is the matrix of weights connecting two layers and $b$ is the bias vector. This is the context in which we are going over TensorFlow.
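
To make the shapes in the equation concrete, here is a minimal sketch in plain Python (no TensorFlow). The helper name `linear` and the numbers are made up for illustration; `x` holds one sample with two features, `W` maps two features to three outputs, and `b` has one bias per output.

```python
def linear(x, W, b):
    """Compute xW + b for a batch of rows x, using plain Python lists."""
    n_out = len(b)
    return [
        [sum(row[i] * W[i][j] for i in range(len(row))) + b[j] for j in range(n_out)]
        for row in x
    ]


x = [[1.0, 2.0]]             # shape (1 sample, 2 features)
W = [[0.5, -1.0, 2.0],       # shape (2 features, 3 outputs)
     [1.5, 0.0, 1.0]]
b = [0.1, 0.2, 0.3]          # shape (3 outputs,)

y = linear(x, W, b)
print(y)  # one row of three values, approximately [3.6, -0.8, 4.3]
```

The output has one row per sample and one column per output, which is why the weights have shape (n_features, n_labels) later on.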

### Weights and Bias

During training, our goal is to modify the weights and biases of all the layers in our network so that it predicts the label with the least amount of error possible. Since we are going to __update our weights and bias__, we cannot use __tf.placeholder()__ or __tf.constant()__, because tensors created that way cannot be modified. We are therefore going to use the __tf.Variable()__ class (note the capital V).

The __tf.Variable__ class creates a tensor whose initial value can be modified, much like a normal Python variable. The tensor stores its state in the session, so it must be initialized manually. For this we use the __tf.global_variables_initializer()__ function, which initializes the state of __ALL the Variable tensors__.

### Initialization


>init=tf.global_variables_initializer()<br>
with tf.Session() as sess:<br>
sess.run(init)

The __tf.global_variables_initializer()__ call returns an operation that will initialize all TensorFlow variables from the graph. You call the operation using a session to initialize all the variables as shown above. Using the __tf.Variable__ class allows us to change the weights and bias, but an initial value needs to be chosen.

Initializing the weights with random numbers from a normal distribution is good practice. Randomizing the weights helps keep the model from becoming stuck in the same place every time you train it. You'll learn more about this in the next lesson, when you study gradient descent.

Similarly, choosing weights from a normal distribution prevents any one weight from overwhelming other weights. You'll use the __tf.truncated_normal()__ function to generate random numbers from a normal distribution.

#### tf.truncated_normal()

>n_features = 120<br>
n_labels = 5<br>
weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))<br>

The __tf.truncated_normal()__ function returns a tensor with random values from a normal distribution whose magnitude is no more than 2 standard deviations from the mean.
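
As a rough illustration of what "truncated" means, the sketch below draws values in plain Python and re-draws any value that falls more than 2 standard deviations from the mean. It is not TensorFlow's implementation; the function name, the `seed` parameter and the shape are assumptions made for this example.

```python
import random


def truncated_normal(shape, mean=0.0, stddev=1.0, seed=None):
    """Return a rows x cols list of normal samples within 2 stddev of the mean."""
    rng = random.Random(seed)
    rows, cols = shape

    def draw():
        # Keep re-drawing until the value lands inside the allowed range.
        while True:
            value = rng.gauss(mean, stddev)
            if abs(value - mean) <= 2 * stddev:
                return value

    return [[draw() for _ in range(cols)] for _ in range(rows)]


weights = truncated_normal((4, 3), seed=0)
for row in weights:
    print([round(v, 3) for v in row])
```

Every printed value lies in [-2, 2], so no single starting weight is far larger than the rest.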

Since the weights are already helping prevent the model from getting stuck, you don't need to randomize the bias. Let's use the simplest solution and set the bias to 0, which is what we have been doing anyway. We can create a tensor of zeros by calling __tf.zeros()__, which returns a tensor with all zeros.

>n_labels = 5<br>
bias = tf.Variable(tf.zeros(n_labels))


## Linear Classifier Quiz

We are going to classify the handwritten digits 0, 1 and 2 from the MNIST dataset using TF.

Since I can't run it here anyway, the code used in the quiz is reproduced below. I have commented it out to prevent it from running.

In [None]:
# # Note: You can't run code in this tab
# import tensorflow as tf

# def get_weights(n_features, n_labels):
#     """
#     Return TensorFlow weights
#     :param n_features: Number of features
#     :param n_labels: Number of labels
#     :return: TensorFlow weights
#     """
#     # TODO: Return weights
#     return tf.Variable(tf.truncated_normal((n_features, n_labels)))
#     # We have a matrix of x,y and since this is weights, we use the tf.truncated_normal() function
#     # We expect to get a tensor which is random and Variable.



# def get_biases(n_labels):
#     """
#     Return TensorFlow bias
#     :param n_labels: Number of labels
#     :return: TensorFlow bias
#     """
#     # TODO: Return biases
#     return tf.Variable(tf.zeros(n_labels))
#     # Again, we want to set our bias to Zeros since we want the simpler solution.


# def linear(input, w, b):
#     """
#     Return linear function in TensorFlow
#     :param input: TensorFlow input
#     :param w: TensorFlow weights
#     :param b: TensorFlow biases
#     :return: TensorFlow linear function
#     """
#     # TODO: Linear Function (xW + b)
#     return tf.add(tf.matmul(input, w), b)
#     # Here, we first multiply x and W so its input and w.
#     # After the multiplication, it is then added to the bias.

Sandbox.py is where we initialized our weights in this quiz.

In [None]:
# import tensorflow as tf
# # Note: You can't run code in this tab
# from tensorflow.examples.tutorials.mnist import input_data
# from quiz import get_weights, get_biases, linear


# def mnist_features_labels(n_labels):
#     """
#     Gets the first <n> labels from the MNIST dataset
#     :param n_labels: Number of labels to use
#     :return: Tuple of feature list and label list
#     """
#     mnist_features = []
#     mnist_labels = []

#     mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

#     # In order to make quizzes run faster, we're only looking at 10000 images
#     for mnist_feature, mnist_label in zip(*mnist.train.next_batch(10000)):

#         # Add features and labels if it's for the first <n>th labels
#         if mnist_label[:n_labels].any():
#             mnist_features.append(mnist_feature)
#             mnist_labels.append(mnist_label[:n_labels])

#     return mnist_features, mnist_labels


# # Number of features (28*28 image is 784 features)
# n_features = 784  # 28x28 is the pixel size.
# # Number of labels
# n_labels = 3  # We classify as either 0, 1 or 2 so 3 labels.

# # Features and Labels
# features = tf.placeholder(tf.float32)
# labels = tf.placeholder(tf.float32)
# # We want to declare our variables features and labels as tensors.
# # Since they are fixed, we can use the tf.placeholder()
# # Here we declare the type as float32.
# # Looking at it, its quite similar to np.


# # Weights and Biases
# w = get_weights(n_features, n_labels)
# b = get_biases(n_labels)
# # This is coming from the function definitions in quiz.py.
# # This is still a definition. We are just declaring our variables.

# # Linear Function xW + b
# logits = linear(features, w, b)
# # We then evaluate y = xW+b, with the function we have declared earlier.
# # Note, this will not evaluate anything yet. It only declares that logits is the result of
# # linear given the arguments features, w and b.

# # Training data
# train_features, train_labels = mnist_features_labels(n_labels)
# # We also declare our training dataset.
# # We have defined the function above.

# with tf.Session() as session:
#     session.run(tf.global_variables_initializer())

#     # Softmax
#     prediction = tf.nn.softmax(logits)
#     # First we evaluate the logits based on its declaration.
#     # Then we use the nn.softmax() function on it. I am guessing, nn is neural net.
#     # Note: This is beginning to look like C code.

#     # Cross entropy
#     # This quantifies how far off the predictions were.
#     # You'll learn more about this in future lessons.
#     cross_entropy = -tf.reduce_sum(labels * tf.log(prediction), reduction_indices=1)
#     # This is our per-sample error function: the cross entropy between labels and prediction.

#     # Training loss
#     # You'll learn more about this in future lessons.
#     loss = tf.reduce_mean(cross_entropy)
#     # This is the mean of the per-sample cross entropy, not a mean squared error.

#     # Rate at which the weights are changed
#     # You'll learn more about this in future lessons.
#     learning_rate = 0.08

#     # Gradient Descent
#     # This is the method used to train the model
#     # You'll learn more about this in future lessons.
#     optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
#     # We will look into this in future lessons.

#     # Run optimizer and get loss
#     _, l = session.run(
#         [optimizer, loss],
#         feed_dict={features: train_features, labels: train_labels})

# # Print loss
# print('Loss: {}'.format(l))

## Softmax on TF

The softmax function squashes its inputs, typically called logits or logit scores, to be between 0 and 1 and also normalizes the outputs such that they all sum to 1. This means the __*output of the softmax function is equivalent to a categorical probability distribution*__. It's the __*perfect function to use as the output activation for a network predicting multiple classes*__.

To declare a softmax, we use the __tf.nn.softmax()__ function, which lives in the tf.nn module for neural networks inside TensorFlow. Its input must be a non-empty tensor of logits of type _half, float32 or float64_. The return value is a tensor with the same type and shape as the logits tensor, this time squashed. Note that softmax is an __activation function__.

In [None]:
# import tensorflow as tf


# def run():
#     output = None
#     logit_data = [2.0, 1.0, 0.1]
#     logits = tf.placeholder(tf.float32)
    
#     # TODO: Calculate the softmax of the logits
#     softmax = tf.nn.softmax(logits)
    
#     with tf.Session() as sess:
#         # TODO: Feed in the logit data
        
#         output = sess.run(softmax, feed_dict={logits: logit_data})

#     return output

# # Note: I commented this out because I was encountering an error. Something with Core.
# # The answer for this is [0.6590012 0.24243298 0.09856589]
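
Since the TensorFlow cell above could not be run here, the following plain-Python sketch recomputes the same softmax by hand as a sanity check. The helper `softmax` is written for this example and is not the TensorFlow function; subtracting the maximum logit is a common numerical-stability step and does not change the result.

```python
import math


def softmax(logits):
    """Softmax of a flat list of logits."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 4) for p in probs])  # [0.659, 0.2424, 0.0986]
```

The values agree with the answer noted above and sum to 1.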

## Cross Entropy in TF

Now that we have applied the activation function to our logits, we can proceed with computing our loss. For this one we are going to __create__ our own cross entropy formula based on:
$$
D(\hat{y},y) = - \sum_j y_j \ln \hat{y}_j
$$

For this, we need two things: the one-hot encoding of our labels for $y$, and the natural log $\ln$ of our output $\hat{y}_j$. We are going to need __tf.reduce_sum()__ for the summation, and __tf.log()__ for the natural log.

Quiz: Print out the cross entropy loss for the following inputs.

In [None]:
# # Solution is available in the other "solution.py" tab
# import tensorflow as tf

# # Declaring the softmax data and One_hot_data
# softmax_data = [0.7, 0.2, 0.1]
# one_hot_data = [1.0, 0.0, 0.0]

# # Defining the type for the individual contents of softmax and one_hot data
# softmax = tf.placeholder(tf.float32)
# one_hot = tf.placeholder(tf.float32)

# # TODO: Print cross entropy from session
# cross_entropy = -tf.reduce_sum(tf.multiply(one_hot,tf.log(softmax)))
# # From the original equation -summation of y ln y_hat.
# # We need y here to be one_hot encoded
# # And we need the log (natural log) of our softmax

# with tf.Session() as sess:
#     print(sess.run(cross_entropy, feed_dict = {one_hot:one_hot_data,softmax:softmax_data}))
#     # feed_dict is quite similar to the expression: for one_hot, softmax in zip(one_hot_data, softmax_data)
#     # That's how I picture it.
    
# # Answer is 0.35667497
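
The same number can be checked without TensorFlow. This is a minimal plain-Python sketch of the formula $-\sum_j y_j \ln \hat{y}_j$; the helper name `cross_entropy` is made up for the example.

```python
import math


def cross_entropy(one_hot, probs):
    """Cross entropy between a one-hot label vector and predicted probabilities."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs))


loss = cross_entropy([1.0, 0.0, 0.0], [0.7, 0.2, 0.1])
print(round(loss, 4))  # 0.3567
```

Because the label is one-hot, only the term for the correct class survives, so the loss is simply $-\ln 0.7$.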

## Mini-batch in TF

Here we go over implementing mini-batching in TensorFlow. Mini-batching is a technique for training on subsets of the dataset instead of all the data at one time. _This provides the ability to train a model, even if a computer lacks the memory to store the entire dataset._

Mini-batching is __computationally inefficient__, since you can't calculate the loss simultaneously across all samples. However, this is a small price to pay in order to be able to run the model at all.

It's also quite useful combined with SGD. The idea is to __randomly shuffle__ the data at the start of each epoch, then create the mini-batches. For each mini-batch, you train the network weights with gradient descent. Since these batches are random, you're performing SGD with each batch.
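
The shuffle-then-batch idea can be sketched in plain Python as follows. The function name `shuffled_batches`, the `seed` parameter and the toy data are assumptions for illustration; the important detail is that features and labels are shuffled with the same ordering so that each pair stays together.

```python
import random


def shuffled_batches(batch_size, features, labels, seed=None):
    """Shuffle samples (keeping feature/label pairs aligned), then split into batches."""
    assert len(features) == len(labels)
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)
    result = []
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        result.append(([features[i] for i in idx], [labels[i] for i in idx]))
    return result


features = [[i, i + 0.5] for i in range(10)]  # 10 toy samples, 2 features each
labels = [i % 2 for i in range(10)]           # toy labels: parity of the sample index

for epoch in range(2):
    # A different seed per epoch gives a different shuffle each time.
    for batch_x, batch_y in shuffled_batches(4, features, labels, seed=epoch):
        print(epoch, len(batch_x), batch_y)  # a gradient descent step would go here
```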

Let's look at the MNIST dataset with weights and a bias to see if your machine can handle it.

In [3]:
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

Extracting /datasets/ud730/mnist\train-images-idx3-ubyte.gz
Extracting /datasets/ud730/mnist\train-labels-idx1-ubyte.gz
Extracting /datasets/ud730/mnist\t10k-images-idx3-ubyte.gz
Extracting /datasets/ud730/mnist\t10k-labels-idx1-ubyte.gz


Note: The code above should run, __BUT__ it sometimes fails with errors about Python or core not being recognized. This usually gets resolved by restarting the kernel.

Back to the discussion.

Question 1

Calculate the memory size of train_features, train_labels, weights, and bias in bytes. Ignore memory for overhead, just calculate the memory required for the stored data.

You may have to look up how much memory a float32 requires.
> To solve any of this, we need to understand what float32 means. It is a floating-point number stored in 32 bits. Recalling from COCIFUN that 1 byte is 8 bits, 32 bits is 4 bytes. We then multiply 4 bytes by the number of elements given by the shape of each variable, and that gives us the answer.

train_features Shape: (55000, 784) Type: float32
> ANS: 172480000<br>

train_labels Shape: (55000, 10) Type: float32
> ANS: 2200000<br>

weights Shape: (784, 10) Type: float32
> ANS: 31360<br>

bias Shape: (10,) Type: float32
> ANS: 40<br>
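
The four answers can be reproduced with a few lines of plain Python. This is just the arithmetic described above (number of elements times 4 bytes); the helper name `float32_bytes` is made up for the example.

```python
from math import prod

FLOAT32_BYTES = 4  # 32 bits / 8 bits per byte


def float32_bytes(shape):
    """Bytes needed to store a float32 array of the given shape (data only, no overhead)."""
    return prod(shape) * FLOAT32_BYTES


shapes = {
    "train_features": (55000, 784),
    "train_labels": (55000, 10),
    "weights": (784, 10),
    "bias": (10,),
}
for name, shape in shapes.items():
    print(name, float32_bytes(shape))
```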

As we can see, our features require about 172 MB of memory. For this case that is still small. But this is only the "Hello, World!" of machine learning. Imagine how much memory would be required to hold the pixels of a huge dataset of large images. This could prove problematic on limited hardware.<br><br>
To help with this, we do __batching__. __Batching__ allows us to process huge sets of data on limited hardware, with the implication that we take a hit on speed. In this case, we will be exploring mini-batching and applying it with TensorFlow.

## Mini-batching in TF


In order to use mini-batching, you must first divide your data into batches.

Unfortunately, it's sometimes impossible to divide the data into batches of exactly equal size. For example, imagine you'd like to create batches of 128 samples each from a dataset of 1000 samples. Since 128 does not evenly divide into 1000, you'd wind up with 7 batches of 128 samples, and 1 batch of 104 samples. (7*128 + 1*104 = 1000)

In that case, the size of the batches would vary, so you need to take advantage of TensorFlow's tf.placeholder() function to receive the varying batch sizes.

Continuing the example, if each sample had n_input = 784 features and n_classes = 10 possible labels, the dimensions for features would be [None, n_input] and labels would be [None, n_classes].

> Features and Labels<br>
features = tf.placeholder(tf.float32, [None, n_input])<br>
labels = tf.placeholder(tf.float32, [None, n_classes])<br>

What does __None__ do here?

The None dimension is a placeholder for the batch size. At runtime, TensorFlow will accept any batch size greater than 0.

Going back to our earlier example, this setup allows you to feed features and labels into the model as either the batches of 128 samples or the single batch of 104 samples.

Question 2

Using the parameters below, how many batches are there, and what is the last batch size?

features is (50000, 400)

labels is (50000, 10)

batch_size is 128

Here I got stuck and needed help figuring out the correct answer. I tried ((50000x400)+(50000x10))/128, which was incorrect. I tried (50000+50000)/128, also incorrect, and had no luck with (400+10)/128 either. I had to ask my mentor what I was doing wrong. Then I moved forward and found an example of batching code below this section in the lesson. It turns out I was looking at it the wrong way. I only have to divide 50000 by 128, which gives 390.625. Since we can't have a fractional batch, we round up and the answer is 391 batches. The first 390 batches are full, so 50000-(390x128) = 80 is the size of the last batch.

>One key thing I got from this is that I was looking at the shape wrong. I thought that batching was done per element. I was wrong. It is done per sample (row). The shape $(50000,400)$ does not mean there are $2 \times 10^7$ samples; that is the number of elements in the dataset. We do not count it that way; we take 1 set of features and 1 set of labels as one sample. So the correct way to think is that a single sample has 400 features that correspond to the 10 labels for that sample. In all, there are 50000 pairs of features and labels, and these are what need to be batched. The number of columns in a single sample (row) is irrelevant in the context of batching.<br>
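
The batch arithmetic can be written as a small plain-Python helper. The name `batch_stats` is made up for this sketch; it only uses the number of samples (the first entry of the shape) and the batch size.

```python
import math


def batch_stats(n_samples, batch_size):
    """Return (number of batches, size of the last batch)."""
    n_batches = math.ceil(n_samples / batch_size)
    last_batch_size = n_samples - (n_batches - 1) * batch_size
    return n_batches, last_batch_size


print(batch_stats(1000, 128))   # (8, 104)
print(batch_stats(50000, 128))  # (391, 80)
```

The first call reproduces the 1000-sample example and the second reproduces the answer to Question 2.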

So now we have this example in the quiz:


In [None]:
# 4 Samples of features
example_features = [
    ['F11','F12','F13','F14'],
    ['F21','F22','F23','F24'],
    ['F31','F32','F33','F34'],
    ['F41','F42','F43','F44']]
# 4 Samples of labels
example_labels = [
    ['L11','L12'],
    ['L21','L22'],
    ['L31','L32'],
    ['L41','L42']]

example_batches = batches(3, example_features, example_labels)

The example_batches variable would be the following:

[
    # 2 batches:
    #   First is a batch of size 3.
    #   Second is a batch of size 1
    [
        # First Batch is size 3
        [
            # 3 samples of features.
            # There are 4 features per sample.
            ['F11', 'F12', 'F13', 'F14'],
            ['F21', 'F22', 'F23', 'F24'],
            ['F31', 'F32', 'F33', 'F34']
        ], [
            # 3 samples of labels.
            # There are 2 labels per sample.
            ['L11', 'L12'],
            ['L21', 'L22'],
            ['L31', 'L32']
        ]
    ], [
        # Second Batch is size 1.
        # Since batch size is 3, there is only one sample left from the 4 samples.
        [
            # 1 sample of features.
            ['F41', 'F42', 'F43', 'F44']
        ], [
            # 1 sample of labels.
            ['L41', 'L42']
        ]
    ]
]

Again, this is where I figured out how batching is actually done. As you can see, the features list is 4 rows by 4 columns, meaning that we have 4 samples with 4 features each, $(4,4)$. Then we have the labels list, which is 4 rows by 2 columns, $(4,2)$. If you look at the resulting example batches, the number of columns does not matter; each row is simply copied whole. What matters is the number of samples, which is given by the first entry of the shape. Here we have 4 samples that we want to batch by 3, so we get 2 batches: the first is full, with 3 rows from the features and labels pair, and the second holds the 1 remaining row.

### Quiz
Below is a simple implementation of a batch in Python.

In [None]:
import math
def batches(batch_size, features, labels):
    """
    Create batches of features and labels
    :param batch_size: The batch size
    :param features: List of features
    :param labels: List of labels
    :return: Batches of (Features, Labels)
    """
    assert len(features) == len(labels)  # Make sure that the length of both features and labels are the same.
    # TODO: Implement batching
    output_batch=[]  # Initialize the output.
    sample_size = len(features)
    for start_i in range(0, sample_size, batch_size):  # start, stop and step
        # Note: sample_size is the number of rows (samples), not the number of elements.
        # For the 4-sample example with batch_size 3, start_i is 0 and then 3.
        end_i = start_i + batch_size
        # end_i is the start index plus batch_size. This variable could be skipped
        # by writing the sum directly in the slices below.
        batch = [features[start_i:end_i], labels[start_i:end_i]]
        # One batch is a pair of feature rows and label rows from start_i to end_i (at most batch_size rows).
        output_batch.append(batch)
        # Append to the list per batch.
        
    return output_batch

The code above only defines a function, so it should not output anything. After this quiz, another quiz builds on it and uses the __batches()__ function to run mini-batching on the MNIST dataset.
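
To check the function against the four-sample example from earlier, the sketch below restates `batches` in a compact form (so the snippet runs on its own) and prints the size of each batch. The compact list-comprehension version is my own restatement, not the quiz solution.

```python
def batches(batch_size, features, labels):
    """Split features and labels into consecutive [features, labels] batches."""
    assert len(features) == len(labels)
    return [
        [features[i:i + batch_size], labels[i:i + batch_size]]
        for i in range(0, len(features), batch_size)
    ]


example_features = [
    ['F11', 'F12', 'F13', 'F14'],
    ['F21', 'F22', 'F23', 'F24'],
    ['F31', 'F32', 'F33', 'F34'],
    ['F41', 'F42', 'F43', 'F44']]
example_labels = [
    ['L11', 'L12'],
    ['L21', 'L22'],
    ['L31', 'L32'],
    ['L41', 'L42']]

example_batches = batches(3, example_features, example_labels)
for batch_features, batch_labels in example_batches:
    print(len(batch_features), len(batch_labels))  # prints "3 3" then "1 1"
```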

In [1]:
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
from helper import batches

learning_rate = 0.001
n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Redefined here for reference; this shadows the batches imported from helper above.
def batches(batch_size, features, labels):
    """
    Create batches of features and labels
    :param batch_size: The batch size
    :param features: List of features
    :param labels: List of labels
    :return: Batches of (Features, Labels)
    """
    assert len(features) == len(labels)
    output_batches = []

    sample_size = len(features)
    for start_i in range(0, sample_size, batch_size):
        end_i = start_i + batch_size
        batch = [features[start_i:end_i], labels[start_i:end_i]]
        output_batches.append(batch)

    return output_batches

# TODO: Set batch size
batch_size = 128
assert batch_size is not None, 'You must set the batch size'

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    
    # TODO: Train optimizer on all batches
    for batch_features, batch_labels in batches(batch_size, train_features, train_labels):
        sess.run(optimizer, feed_dict={features: batch_features, labels: batch_labels})
    # Calculate accuracy for test dataset
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: test_features, labels: test_labels})

print('Test Accuracy: {}'.format(test_accuracy))

KeyboardInterrupt: 

I made the mistake of running my notebook on vanilla Jupyter. I should have switched to my TensorFlow environment. Will check this out tomorrow. In general, the code should output a low accuracy. Improving it is covered in the next topic, epochs.

## Epochs in TF

As a review, an epoch is a single forward and backward pass over the __whole dataset__. Epochs are used to increase the accuracy of the model without needing more data. This section covers epochs in TensorFlow and how to choose the right number of them.

In [None]:
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
from helper import batches  # Helper function created in Mini-batching section


def print_epoch_stats(epoch_i, sess, last_features, last_labels):
    """
    Print cost and validation accuracy of an epoch
    """
    current_cost = sess.run(
        cost,
        feed_dict={features: last_features, labels: last_labels})
    valid_accuracy = sess.run(
        accuracy,
        feed_dict={features: valid_features, labels: valid_labels})
    print('Epoch: {:<4} - Cost: {:<8.3} Valid Accuracy: {:<5.3}'.format(
        epoch_i,
        current_cost,
        valid_accuracy))

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
valid_features = mnist.validation.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
valid_labels = mnist.validation.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
learning_rate = tf.placeholder(tf.float32)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

init = tf.global_variables_initializer()

batch_size = 128
epochs = 10
learn_rate = 0.001

train_batches = batches(batch_size, train_features, train_labels)

with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch_i in range(epochs):

        # Loop over all batches
        for batch_features, batch_labels in train_batches:
            train_feed_dict = {
                features: batch_features,
                labels: batch_labels,
                learning_rate: learn_rate}
            sess.run(optimizer, feed_dict=train_feed_dict)

        # Print cost and validation accuracy of an epoch
        print_epoch_stats(epoch_i, sess, batch_features, batch_labels)

    # Calculate accuracy for test dataset
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: test_features, labels: test_labels})

print('Test Accuracy: {}'.format(test_accuracy))

In [None]:
# Test result of the code above, for verification

# Epoch: 0    - Cost: 11.0     Valid Accuracy: 0.204
# Epoch: 1    - Cost: 9.95     Valid Accuracy: 0.229
# Epoch: 2    - Cost: 9.18     Valid Accuracy: 0.246
# Epoch: 3    - Cost: 8.59     Valid Accuracy: 0.264
# Epoch: 4    - Cost: 8.13     Valid Accuracy: 0.283
# Epoch: 5    - Cost: 7.77     Valid Accuracy: 0.301
# Epoch: 6    - Cost: 7.47     Valid Accuracy: 0.316
# Epoch: 7    - Cost: 7.2      Valid Accuracy: 0.328
# Epoch: 8    - Cost: 6.96     Valid Accuracy: 0.342
# Epoch: 9    - Cost: 6.73     Valid Accuracy: 0.36 
# Test Accuracy: 0.3801000118255615

The code above is a whole run over the MNIST dataset for 10 epochs. As we can see in the results, the model only achieved a validation accuracy of 0.36, while the cost decreased from 11 to 6.73.

Now we increase our epochs to 100. We can do this by changing __epochs__ in the block below, which sits right below the init section of the original code.

In [None]:
batch_size = 128
epochs = 10
learn_rate = 0.001

With __epochs__ set to 100, we get the output below:

In [None]:
...
Epoch: 79   - Cost: 0.111    Valid Accuracy: 0.86
Epoch: 80   - Cost: 0.11     Valid Accuracy: 0.869
Epoch: 81   - Cost: 0.109    Valid Accuracy: 0.869
....
Epoch: 85   - Cost: 0.107    Valid Accuracy: 0.869
Epoch: 86   - Cost: 0.107    Valid Accuracy: 0.869
Epoch: 87   - Cost: 0.106    Valid Accuracy: 0.869
Epoch: 88   - Cost: 0.106    Valid Accuracy: 0.869
Epoch: 89   - Cost: 0.105    Valid Accuracy: 0.869
Epoch: 90   - Cost: 0.105    Valid Accuracy: 0.869
Epoch: 91   - Cost: 0.104    Valid Accuracy: 0.869
Epoch: 92   - Cost: 0.103    Valid Accuracy: 0.869
Epoch: 93   - Cost: 0.103    Valid Accuracy: 0.869
Epoch: 94   - Cost: 0.102    Valid Accuracy: 0.869
Epoch: 95   - Cost: 0.102    Valid Accuracy: 0.869
Epoch: 96   - Cost: 0.101    Valid Accuracy: 0.869
Epoch: 97   - Cost: 0.101    Valid Accuracy: 0.869
Epoch: 98   - Cost: 0.1      Valid Accuracy: 0.869
Epoch: 99   - Cost: 0.1      Valid Accuracy: 0.869
Test Accuracy: 0.8696000006198883

We can see that the accuracy increased to 0.869 and we were able to bring the cost down to 0.1. But we do have a problem here. After epoch 80, our model stopped improving and got stuck at an accuracy of 0.869. We will try changing the learning rate to 0.1 instead of 0.001, which leads to the result below:

In [None]:
Epoch: 76   - Cost: 0.214    Valid Accuracy: 0.752
Epoch: 77   - Cost: 0.21     Valid Accuracy: 0.756
Epoch: 78   - Cost: 0.21     Valid Accuracy: 0.756
...
Epoch: 85   - Cost: 0.207    Valid Accuracy: 0.756
Epoch: 86   - Cost: 0.209    Valid Accuracy: 0.756
Epoch: 87   - Cost: 0.205    Valid Accuracy: 0.756
Epoch: 88   - Cost: 0.208    Valid Accuracy: 0.756
Epoch: 89   - Cost: 0.205    Valid Accuracy: 0.756
Epoch: 90   - Cost: 0.202    Valid Accuracy: 0.756
Epoch: 91   - Cost: 0.207    Valid Accuracy: 0.756
Epoch: 92   - Cost: 0.204    Valid Accuracy: 0.756
Epoch: 93   - Cost: 0.206    Valid Accuracy: 0.756
Epoch: 94   - Cost: 0.202    Valid Accuracy: 0.756
Epoch: 95   - Cost: 0.2974   Valid Accuracy: 0.756
Epoch: 96   - Cost: 0.202    Valid Accuracy: 0.756
Epoch: 97   - Cost: 0.2996   Valid Accuracy: 0.756
Epoch: 98   - Cost: 0.203    Valid Accuracy: 0.756
Epoch: 99   - Cost: 0.2987   Valid Accuracy: 0.756
Test Accuracy: 0.7556000053882599

This time, our accuracy was lower and it still got stuck. So increasing the learning rate was not the answer. We then revert to the previous learning rate of 0.001, but this time change the epochs from 100 to 80. We get the output below for this new setup:

In [None]:
Epoch: 65   - Cost: 0.122    Valid Accuracy: 0.868
Epoch: 66   - Cost: 0.121    Valid Accuracy: 0.868
Epoch: 67   - Cost: 0.12     Valid Accuracy: 0.868
Epoch: 68   - Cost: 0.119    Valid Accuracy: 0.868
Epoch: 69   - Cost: 0.118    Valid Accuracy: 0.868
Epoch: 70   - Cost: 0.118    Valid Accuracy: 0.868
Epoch: 71   - Cost: 0.117    Valid Accuracy: 0.868
Epoch: 72   - Cost: 0.116    Valid Accuracy: 0.868
Epoch: 73   - Cost: 0.115    Valid Accuracy: 0.868
Epoch: 74   - Cost: 0.115    Valid Accuracy: 0.868
Epoch: 75   - Cost: 0.114    Valid Accuracy: 0.868
Epoch: 76   - Cost: 0.113    Valid Accuracy: 0.868
Epoch: 77   - Cost: 0.113    Valid Accuracy: 0.868
Epoch: 78   - Cost: 0.112    Valid Accuracy: 0.868
Epoch: 79   - Cost: 0.111    Valid Accuracy: 0.868
Epoch: 80   - Cost: 0.111    Valid Accuracy: 0.869
Test Accuracy: 0.86909999418258667

Again, this is expected since we have already run this set of hyperparameters before and we expected it to plateau at this epoch. In the upcoming lab, we will go over changing the hyperparameters in TensorFlow.
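
One rough way to spot such a plateau automatically is to look for the last epoch where validation accuracy still improved. The sketch below is my own illustration, not part of the lesson: the function name `plateau_epoch` and the `patience` and `min_delta` parameters are assumptions. The first list is the validation accuracy from the 10-epoch run; the second is the logged tail of the 100-epoch run starting at epoch 79, with the three elided epochs simply left out.

```python
def plateau_epoch(accuracies, patience=5, min_delta=1e-3):
    """Return the index of the last improving epoch once accuracy has failed to
    improve by more than min_delta for `patience` epochs in a row, else None."""
    best = float("-inf")
    best_i = None
    for i, acc in enumerate(accuracies):
        if acc > best + min_delta:
            best, best_i = acc, i
        elif i - best_i >= patience:
            return best_i
    return None


first_run = [0.204, 0.229, 0.246, 0.264, 0.283, 0.301, 0.316, 0.328, 0.342, 0.36]
tail_of_long_run = [0.86, 0.869, 0.869] + [0.869] * 15  # starts at epoch 79

print(plateau_epoch(first_run))         # None: still improving every epoch
print(plateau_epoch(tail_of_long_run))  # 1, i.e. epoch 79 + 1 = 80
```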

There was a quiz about TensorFlow which could (and should) be downloaded. It covers working with TensorFlow, initializing variables and setting hyperparameters. For now we have another quiz below. The idea is to create a multilayer neural network with ReLU as the activation function. It's still quite new to me, but for the most part it is intuitive.

In [1]:
# Solution is available in the other "solution.py" tab
import tensorflow as tf

output = None
hidden_layer_weights = [
    [0.1, 0.2, 0.4],
    [0.4, 0.6, 0.6],
    [0.5, 0.9, 0.1],
    [0.8, 0.2, 0.8]]
out_weights = [
    [0.1, 0.6],
    [0.2, 0.1],
    [0.7, 0.9]]

# Weights and biases
weights = [
    tf.Variable(hidden_layer_weights),
    tf.Variable(out_weights)]
biases = [
    tf.Variable(tf.zeros(3)),
    tf.Variable(tf.zeros(2))]

# Input
features = tf.Variable([[1.0, 2.0, 3.0, 4.0], [-1.0, -2.0, -3.0, -4.0], [11.0, 12.0, 13.0, 14.0]])

# TODO: Create Model
# Hidden Layer with ReLU activation function
hidden_layer = tf.add(tf.matmul(features,weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
logits = tf.add(tf.matmul(hidden_layer,weights[1]), biases[1])

# TODO: Print session results
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(logits))

KeyboardInterrupt: 
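
As a sanity check on the quiz, the same forward pass can be sketched in NumPy. This is only an illustration, assuming NumPy is available; the helper name `relu_forward` is hypothetical and not part of the quiz, and the zero biases are left out because they do not change the result.

```python
import numpy as np

def relu_forward(features, w_hidden, w_out):
    """Two-layer forward pass with zero biases and a ReLU hidden layer."""
    hidden = np.maximum(features @ w_hidden, 0.0)  # ReLU clips negatives to 0
    return hidden @ w_out

hidden_layer_weights = np.array([
    [0.1, 0.2, 0.4],
    [0.4, 0.6, 0.6],
    [0.5, 0.9, 0.1],
    [0.8, 0.2, 0.8]])
out_weights = np.array([
    [0.1, 0.6],
    [0.2, 0.1],
    [0.7, 0.9]])
features = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [-1.0, -2.0, -3.0, -4.0],
    [11.0, 12.0, 13.0, 14.0]])

logits = relu_forward(features, hidden_layer_weights, out_weights)
print(logits)
```

The rows come out as approximately [5.11, 8.44], [0, 0] and [24.01, 38.24]. The second input row is all negative, so ReLU zeroes its hidden layer and both of its logits are 0.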

Moving forward in the TensorFlow environment, we take on how to actually build a __deep neural network__ from the __logistic classifier__ we created previously.

## Walkthrough
<br> The following code listing should help us understand what we are doing in the __deep neural network__ in terms of TensorFlow code.

### Code Portion

#### TensorFlow MNIST

In [1]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".",one_hot=True,reshape=False)
# It's quite similar to pandas' read_csv, this time with more arguments.

# Here we are still going to use the MNIST dataset that is already in TensorFlow as part of the Tutorials.
# The labels are one-hot encoded (one_hot=True) and the data will get batched.

Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use urllib or similar directly.


ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

#### Learning Parameters

In [None]:
import tensorflow as tf

# Parameters
learning_rate = 0.001
# The rate at which change is going to get applied to our system.
training_epochs = 20
# The number of passes we are going to have in our dataset.
batch_size = 128 # NOTE: Decrease this parameter if your memory is not enough
# The number of rows in the data used for each weight update.
display_step = 1

n_input = 784
# The number of features in a given data point. This is a 28*28 image, thus the size.
# Also, remember that one_hot=True from earlier encodes the labels; the pixel inputs already come in a uniform range.
n_classes = 10
# This is the number of labels. For MNIST it's the digits 0-9, so 10 classes.

#### Hidden Layer Parameters

In [None]:
n_hidden_layer = 256
# This is the number of features in the hidden layer. Simply put, the number of hidden nodes.
# It can also be considered the width of the layer.

#### Weights and Biases
Note that we are using a single hidden layer for this example. I think the formula $n+1$, where $n$ is the number of hidden layers, gives the number of weight matrices and of bias vectors we need to define.

In [None]:
# Store layers weight and bias
weights = {
    'hidden_layer': tf.Variable(tf.random_normal([n_input,n_hidden_layer])),
    'out' : tf.Variable(tf.random_normal([n_hidden_layer,n_classes]))
}
# Here we initialize the size and type of our weights.
# For this case we have 1 hidden layer, so we have 2 weight arrays:
# one from input to hidden and one from hidden to output.
# Note that instead of a random seed, we just use the random_normal function of TensorFlow.
# Also, the weights will get changed later in backprop, so they need to be tf.Variable.
bias = {
    'hidden_layer': tf.Variable(tf.random_normal([n_hidden_layer])),
    'out' : tf.Variable(tf.random_normal([n_classes]))
}
# Each bias is a vector with one entry per unit of its layer, so its shape differs from the weights.
# In this case, our bias is also random. But we can set it to zero as well with tf.zeros() instead of random_normal.
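
To make the $n+1$ count concrete, here is a small plain-Python sketch that lists the weight and bias shapes for a given stack of layers. The helper `layer_shapes` is a hypothetical name introduced only for this illustration; it is not a TensorFlow function.

```python
def layer_shapes(n_input, hidden_sizes, n_classes):
    """Return the weight and bias shapes for a fully connected network."""
    sizes = [n_input] + list(hidden_sizes) + [n_classes]
    # One weight matrix per pair of consecutive layers
    weight_shapes = [(a, b) for a, b in zip(sizes[:-1], sizes[1:])]
    # One bias vector per layer after the input, with one entry per unit
    bias_shapes = [(b,) for b in sizes[1:]]
    return weight_shapes, bias_shapes

weight_shapes, bias_shapes = layer_shapes(784, [256], 10)
print(weight_shapes)  # [(784, 256), (256, 10)]
print(bias_shapes)    # [(256,), (10,)]
```

With one hidden layer ($n=1$) we get two weight matrices and two bias vectors. Each bias has as many entries as its layer has units, which is what lets it broadcast across the rows of a batch.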

#### Input
We now define our input type and size. Also, we are going to __flatten__ the input via the __tf.reshape()__ function.<br>
> __The -1 in the shape argument of tf.reshape tells TensorFlow to infer that dimension from the remaining ones. Here it becomes the batch size, so each image is flattened into one row of n_input values. This can be found in the documentation of [tf.reshape()](https://www.tensorflow.org/api_docs/python/tf/reshape)__

In [None]:
# tf Graph input
x = tf.placeholder('float',[None, 28,28,1])
# We define our input as x. Since it is fed in at run time rather than trained, we use tf.placeholder instead of tf.Variable.
# Then we define its type as float; its first dimension is None so that the batch size can vary.
# Then our image size of 28,28 from MNIST and then finally we have 1 as our channel.
# Note that the channel is 1 since this is grayscale; if it's colored then this will change.

y = tf.placeholder('float',[None,n_classes])
# Here we also define our labels as y. Again they are fed in, so it's tf.placeholder().
# Float type with dimensions None for the batch size and n_classes, which will be 10 once we run it.

x_flat = tf.reshape(x,[-1,n_input])
# Note that we cannot map a 28x28 pixel image directly to the classes. We need each image in a single row to correspond to one output.
# This is where we use tf.reshape. We reshape x into a 2-D array of batch size by n_input.
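
The effect of the -1 can be seen with a small NumPy sketch, since `numpy.reshape` infers a -1 dimension the same way. The batch of 5 fake images is an assumption made up for the illustration.

```python
import numpy as np

n_input = 784
# 5 fake grayscale "images" of shape 28x28x1, filled with increasing numbers
batch = np.arange(5 * 28 * 28, dtype=np.float32).reshape(5, 28, 28, 1)

# -1 is inferred as 5, because 5*28*28*1 / 784 = 5
flat = batch.reshape(-1, n_input)
print(flat.shape)  # (5, 784)

# Each row of flat holds the pixels of exactly one image
print(np.array_equal(flat[0], batch[0].ravel()))  # True
```

So the flattening keeps one row per image and only unrolls the 28x28x1 pixels into 784 columns.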


####  Multilayer Perceptron
Let's first review what we have done. First we downloaded our MNIST data from the tutorials portion of TensorFlow. Then we imported TensorFlow and declared our parameters. We then defined our variables for weights and biases as well as our inputs.<br>Now we can move to actually creating our multilayer perceptron. This is similar to the .add() in Keras or the matmul in numpy, though closer to numpy than to Keras.

In [None]:
# Hidden Layer with RELU activation

# Input to Hidden (y=xW+b)
layer_1 = tf.add(tf.matmul(x_flat,weights['hidden_layer']),bias['hidden_layer'])

# Activating the output of linear function of layer_1
layer_1 = tf.nn.relu(layer_1)

# Hidden_layer to output (the output activation is linear, y=x, so it's skipped)
# We define our output as logits, and again the linear function is y=xW+b
logits = tf.add(tf.matmul(layer_1,weights['out']),bias['out'])

#### Optimizer
We are going to define our loss and optimizer here. It's quite similar to the error $y-\hat{y}$ and the error function in the numpy implementation.<br>Here our cost (the loss) is the softmax cross-entropy, and our optimizer is GradientDescentOptimizer, which is the same gradient descent we were using. Note that TensorFlow already has these functions built in, so we no longer have to write the sigmoid, sigmoid_prime and all the other things we were doing in the numpy implementation.

In [None]:
# Defining loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits,labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Read the documentation for the proper mapping of the arguments.
# Note that softmax_cross_entropy_with_logits returns one cross-entropy loss per example; reduce_mean then averages them over the batch.
# So the cost is the mean cross-entropy, not MSE.
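
To see what the cost computes, here is a NumPy sketch of softmax cross-entropy followed by the mean over the batch. The function `softmax_cross_entropy` is a hypothetical helper written for this note, not TensorFlow's implementation, and the logits and labels are made-up values.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy between one-hot labels and softmax(logits)."""
    # Subtracting the row maximum keeps exp() from overflowing
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

per_example = softmax_cross_entropy(logits, labels)  # one loss per row
cost = per_example.mean()                            # the role of reduce_mean
print(per_example, cost)
```

The second row has equal logits, so softmax gives each class a probability of 1/3 and the loss for that row is $\ln 3 \approx 1.0986$.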

#### Session
Now that we have defined all of our variables, our model and our error functions we can now run the model.

Documentation for [tf.global_variables_initializer()](https://www.tensorflow.org/api_docs/python/tf/global_variables_initializer).

In [None]:
# Initializing the variables
init = tf.global_variables_initializer()
# Note that the global variables initializer is a helper function in TensorFlow. It's what it says it is.

# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training Cycle
    for epoch in range(training_epochs):
        total_batch = int(mnist.train.num_examples/batch_size) # The number of full batches in one epoch
        # Looping over all the batches
        for i in range(total_batch):
            # Fetch the next batch of images and labels
            batch_x , batch_y = mnist.train.next_batch(batch_size)
            # Run the optimizer: forward pass, backpropagation and weight update
            sess.run(optimizer, feed_dict={x:batch_x, y:batch_y})
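
The batch arithmetic of the inner loop can be sketched in plain Python. The generator `iterate_batches` is a hypothetical helper and the 1000 examples are a made-up number. It only illustrates how `total_batch` is counted and does not reproduce what `mnist.train.next_batch` does internally.

```python
def iterate_batches(num_examples, batch_size):
    """Yield (start, end) index pairs for the full batches in one epoch."""
    total_batch = num_examples // batch_size  # same value as int(num_examples / batch_size)
    for i in range(total_batch):
        yield i * batch_size, (i + 1) * batch_size

batches = list(iterate_batches(num_examples=1000, batch_size=128))
print(len(batches))  # 7
print(batches[-1])   # (768, 896)
```

Because the division is rounded down, the loop runs 7 full batches and this sketch leaves the remaining 104 of the 1000 examples out of the pass.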

### Adding Layers
The code above can be repeated for more layers. It is intuitive enough to be done with minimal code edits. I still need to figure out how to do the looping, or whether it needs to be done at all. But from what I can figure out, TensorFlow is smart enough to work out which operations it needs to run.<br> Here is how I see it: once __*logits*__ is run, it needs the tensor it depends on (in this case layer_1), and since layer_1 is not yet evaluated it goes to the definition of layer_1, which requires the linear function of layer_1 to be evaluated, which in turn moves one step back in the model. Adding more layers should theoretically mean just extending the variable chain. <br>
>__What matters is that you pipe | your outputs well, feeding each layer's output into the next, so that the model works__. I think this is one great thing about TensorFlow: you no longer need to write the same lines of code over and over again.
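
On the looping question, one possible approach is to keep the weights and biases in lists and chain the layers in a loop. The sketch below does this in NumPy rather than TensorFlow. The function `forward` and the layer sizes are assumptions made for the illustration, and the same loop shape presumably carries over to `tf.matmul` and `tf.nn.relu`.

```python
import numpy as np

def forward(x, weights, biases):
    """Apply each hidden layer with ReLU, then a linear output layer."""
    activation = x
    for w, b in zip(weights[:-1], biases[:-1]):
        # The output of one layer is piped in as the input of the next
        activation = np.maximum(activation @ w + b, 0.0)
    return activation @ weights[-1] + biases[-1]

rng = np.random.default_rng(0)
sizes = [784, 256, 128, 10]  # two hidden layers; the sizes are arbitrary
weights = [rng.normal(size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]

logits = forward(rng.normal(size=(4, 784)), weights, biases)
print(logits.shape)  # (4, 10)
```

Adding a layer then means adding one entry to `sizes`, and the number of weight matrices stays at the number of hidden layers plus one.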

### Save and Restore TensorFlow Models
<br>We all know that training a model, especially a complex one, can consume a lot of time. One problem with closing a TensorFlow session is that you lose all the trained weights and biases. This is frustrating, especially if you want to reuse the model with the training it has already done. For these kinds of situations, TensorFlow can save your progress using the [__tf.train.Saver__](https://www.tensorflow.org/api_docs/python/tf/train/Saver) class. Reading through the documentation, it writes the variables to checkpoint files, which are binary files that map variable names to tensor values. The arguments include options to enable reshape and sharded saving of the weights.<br> From Udacity's explanation, the class will save the __tf.Variable__ data in the file system.<br><br>Here is the code for saving the __*weights*__ and __*bias*__ Tensors of a sample model.

In [None]:
import tensorflow as tf

# Declare the file path for the data to be saved
save_file = './model.ckpt'
# This is the checkpoint path; the saved files will have names starting with model.ckpt

# Declaring the two tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2,3]))
bias = tf.Variable(tf.truncated_normal([3]))
# Just to recall, weights of shape [2,3] map 2 input features to 3 outputs, so there is one bias per output (3 in total)

# Creating an instance of the tf.train.Saver class
saver = tf.train.Saver()
# Note the capital S in Saver

with tf.Session() as sess:
    # Initialize the variables
    sess.run(tf.global_variables_initializer())
    # This is almost always done, initializing
    
    # Show the current values of weights and bias
    print('Weights:')
    print(sess.run(weights)) # or a=sess.run(weights) print('Weights:',a)
    print('Bias:')
    print(sess.run(bias))
    
    # Save the model
    saver.save(sess,save_file)