In [1]:
import tensorflow as tf

# Create a TensorFlow constant tensor
hello_constant = tf.constant('Hello World!')

with tf.Session() as sess:
    # Run the tf.constant operation in the session
    output = sess.run(hello_constant)
    # sess.run() returns bytes for string tensors, so decode before printing
    print(output.decode())

Hello World!


#### Tensor
* In TensorFlow, data isn’t stored as integers, floats, or strings. These values are encapsulated in an object called a tensor. In the case of hello_constant = tf.constant('Hello World!'), hello_constant is a 0-dimensional string tensor, but tensors come in a variety of sizes as shown below:

In [2]:
# A is a 0-dimensional int32 tensor
A = tf.constant(1234) 
# B is a 1-dimensional int32 tensor
B = tf.constant([123,456,789]) 
# C is a 2-dimensional int32 tensor
C = tf.constant([ [123,456,789], [222,333,444] ])
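The dimension of a tensor is the number of nesting levels in its value. As a rough, framework-free illustration, the shapes of A, B and C can be read off equivalent nested Python lists. The helper `shape_of` is hypothetical (not part of TensorFlow) and assumes every inner list has the same length:

```python
def shape_of(value):
    """Return the shape of a scalar or a regular nested list as a tuple."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        # Descend into the first element; assumes all siblings have equal length
        value = value[0] if value else None
    return tuple(shape)

print(shape_of(1234))                                 # ()      -> 0-dimensional
print(shape_of([123, 456, 789]))                      # (3,)    -> 1-dimensional
print(shape_of([[123, 456, 789], [222, 333, 444]]))   # (2, 3)  -> 2-dimensional
```

The length of the returned tuple is the tensor's dimension (its rank).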

* tf.constant() is one of many TensorFlow operations you will use in this lesson. 
* The tensor returned by tf.constant() is called a constant tensor, because the value of the tensor never changes.

### Session

* TensorFlow’s API is built around the idea of a computational graph, a way of visualizing a mathematical process, which you learned about in the MiniFlow lesson. Let’s take the TensorFlow code you ran and turn that into a graph:

![alt text](img/session.png "TF Session")

The code has already created the tensor, hello_constant, from the previous lines. The next step is to evaluate the tensor in a session.

The code creates a session instance, sess, using tf.Session. The sess.run() function then evaluates the tensor and returns the results.

After you run the above, you will see the following printed out:

In [3]:
with tf.Session() as sess:
    output = sess.run(hello_constant).decode()
    print(output)

Hello World!


### Input
* In the last section, you passed a tensor into a session and it returned the result. What if you want to use a non-constant? This is where tf.placeholder() and feed_dict come into play. In this section, you'll go over the basics of feeding data into TensorFlow.

`tf.placeholder()`

* Sadly you can’t just set x to your dataset and put it in TensorFlow, because over time you'll want your TensorFlow model to take in different datasets with different parameters. You need tf.placeholder()!

* `tf.placeholder()` returns a tensor that gets its value from data passed to the `tf.Session.run()` function, allowing you to set the input right before the session runs.

### Session’s feed_dict

```
import tensorflow as tf

x = tf.placeholder(tf.string)

with tf.Session() as sess:
    output = sess.run(x, feed_dict={x: 'Hello World'})
```

Use the feed_dict parameter in `tf.Session.run()` to set the placeholder tensor. The above example sets the tensor x to the string 'Hello World'. It's also possible to set more than one tensor using feed_dict, as shown below.

```
import tensorflow as tf

x = tf.placeholder(tf.string)
y = tf.placeholder(tf.int32)
z = tf.placeholder(tf.float32)

with tf.Session() as sess:
    output = sess.run(x, feed_dict={x: 'Test String', y: 123, z: 45.67})
```

**Note**: If the data passed to the feed_dict doesn’t match the tensor type and can’t be cast into the tensor type, you’ll get the error “ValueError: invalid literal for...”.
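To make the build-then-run idea concrete, here is a toy, plain-Python imitation of a graph with constants, placeholders and a feed dictionary. Everything here (`Constant`, `Placeholder`, `Add`, `run_graph`) is hypothetical and only loosely mimics TensorFlow: building the graph computes nothing, placeholder values arrive only at run time, and a value that can't be converted to the placeholder's type fails with a "ValueError: invalid literal for..." error much like the note above describes.

```python
class Constant:
    """Hypothetical graph node holding a fixed value (not TensorFlow's class)."""
    def __init__(self, value):
        self.value = value

    def evaluate(self, feed_dict):
        return self.value


class Placeholder:
    """Hypothetical graph node whose value is supplied only at run time."""
    def __init__(self, dtype):
        self.dtype = dtype

    def evaluate(self, feed_dict):
        if self not in feed_dict:
            raise ValueError("You must feed a value for this placeholder")
        # Convert the fed value to the placeholder's type; int("Test String")
        # raises "ValueError: invalid literal for int() with base 10: ..."
        return self.dtype(feed_dict[self])


class Add:
    """Hypothetical graph node that adds the values of two other nodes."""
    def __init__(self, a, b):
        self.a, self.b = a, b

    def evaluate(self, feed_dict):
        return self.a.evaluate(feed_dict) + self.b.evaluate(feed_dict)


def run_graph(node, feed_dict=None):
    """Evaluate `node`; nothing is computed until this is called."""
    return node.evaluate(feed_dict or {})


# Building the graph computes nothing yet...
x = Placeholder(int)
total = Add(x, Constant(100))

# ...values are fed in only when the graph is run, and can differ per run.
print(run_graph(total, feed_dict={x: 23}))   # 123
print(run_graph(total, feed_dict={x: 5}))    # 105
```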

**Quiz**
Let's see how well you understand tf.placeholder() and feed_dict. The code below throws an error, but I want you to make it return the number 123. Change the line that calls `sess.run()` so that the code returns the number 123.

**Note**: The quizzes are running TensorFlow version 0.12.1. However, all the code used in this course is compatible with version 1.0. We'll be upgrading our in class quizzes to the newest version in the near future.

In [4]:
import tensorflow as tf

def run():
    output = None
    x = tf.placeholder(tf.int32)

    with tf.Session() as sess:
        # Feed the x tensor 123 through feed_dict
        output = sess.run(x, feed_dict={x: 123})

    return output

print(run())

123


### TensorFlow Math

* Getting the input is great, but now you need to use it. You're going to use basic math functions that everyone knows and loves (add, subtract, multiply, and divide) with tensors. (There are many more math functions you can check out in the documentation.)

#### Addition
`x = tf.add(5, 2)  # 7`

* You’ll start with the add function. The tf.add() function does exactly what you expect it to do. It takes in two numbers, two tensors, or one of each, and returns their sum as a tensor.

#### Subtraction and Multiplication
* Here’s an example with subtraction and multiplication.

```
x = tf.subtract(10, 4) # 6
y = tf.multiply(2, 5)  # 10
```

* The x tensor will evaluate to 6, because 10 - 4 = 6. The y tensor will evaluate to 10, because 2 * 5 = 10. That was easy!

#### Converting types
* It may be necessary to convert between types to make certain operators work together. For example, if you tried the following, it would fail with an exception:

```
tf.subtract(tf.constant(2.0), tf.constant(1))
```

*Fails with ValueError: Tensor conversion requested dtype float32 for Tensor with dtype int32*

* That's because the constant 1 is an integer, but the constant 2.0 is a floating-point value, and subtract expects them to match.

* In cases like these, you can either make sure your data is all of the same type, or you can cast a value to another type. In this case, converting the 2.0 to an integer before subtracting, like so, will give the correct result:

`tf.subtract(tf.cast(tf.constant(2.0), tf.int32), tf.constant(1))   # 1`

### Quiz

* Let's apply what you learned to convert an algorithm to TensorFlow. The code below is a simple algorithm using division and subtraction. Convert the following algorithm in regular Python to TensorFlow and print the results of the session. You can use tf.constant() for the values 10, 2, and 1.

In [5]:
import tensorflow as tf

# TODO: Convert the following to TensorFlow:
# x = 10
# y = 2
# z = x/y - 1

def run():
    output = None

    x = tf.placeholder(tf.int32)
    y = tf.placeholder(tf.int32)

    # tf.divide() on two int32 tensors returns a float64 tensor,
    # so the constant 1 is cast to float64 before subtracting
    z = tf.subtract(tf.divide(x, y), tf.cast(tf.constant(1), tf.float64))

    # TODO: Print z from a session
    with tf.Session() as sess:
        output = sess.run(z, feed_dict={x: 10, y: 2})

    # .item() converts the NumPy scalar to a Python float (np.asscalar is deprecated)
    return output.item()

print(run())

4.0


### Linear functions in TensorFlow

* The most common operation in neural networks is calculating the linear combination of inputs, weights, and biases. As a reminder, we can write the output of the linear operation as

$$\mathbf{y} = \mathbf{x}\mathbf{W} + \mathbf{b}$$

* Here, **W** is a matrix of the weights connecting two layers. The output **y**, the input **x**, and the biases **b** are all vectors.
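A minimal plain-Python sketch of xW + b, assuming x is a batch of row vectors (one sample per row), W has one row per feature and one column per label, and b holds one bias per label. The helpers `matmul` and `linear` are hypothetical stand-ins for what `tf.matmul()` and `tf.add()` do in the quiz below:

```python
def matmul(a, b):
    """Multiply an (m x n) matrix by an (n x k) matrix, both given as lists of rows."""
    n = len(b)
    assert all(len(row) == n for row in a), "inner dimensions must match"
    k = len(b[0])
    return [[sum(row[j] * b[j][c] for j in range(n)) for c in range(k)] for row in a]


def linear(x, w, b):
    """Compute xW + b for a batch x (m x n), weights w (n x k) and bias b (length k)."""
    xw = matmul(x, w)
    # The same bias vector is added to every row of the batch
    return [[v + b_c for v, b_c in zip(row, b)] for row in xw]


x = [[1.0, 2.0]]              # one sample with 2 features
w = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]         # 2 features -> 3 labels
b = [0.5, 0.5, 0.5]
print(linear(x, w, b))        # [[9.5, 12.5, 15.5]]
```

Swapping the operands, `matmul(w, x)`, fails the inner-dimension check, which is the same reason `tf.matmul(a, b)` and `tf.matmul(b, a)` are not interchangeable.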

### Weights and Bias in TensorFlow

* The goal of training a neural network is to modify weights and biases to best predict the labels. In order to use weights and bias, you'll need a Tensor that can be modified. This leaves out tf.placeholder() and tf.constant(), since those Tensors can't be modified. This is where the `tf.Variable` class comes in.

`tf.Variable()`

```
x = tf.Variable(5)
```

* The tf.Variable class creates a tensor with an initial value that can be modified, much like a normal Python variable. This tensor stores its state in the session, so you must initialize the state of the tensor manually. You'll use the tf.global_variables_initializer() function to initialize the state of all the Variable tensors.

### Initialization
```
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    
```

* The `tf.global_variables_initializer()` call returns an operation that will initialize all TensorFlow variables from the graph. You call the operation using a session to initialize all the variables as shown above. Using the **tf.Variable** class allows us to **change the weights and bias**, **but an initial value needs to be chosen**.

* Initializing the weights with random numbers from a normal distribution is **good practice**. Randomizing the weights keeps the model from **becoming stuck in the same place every time you train it**. You'll learn more about this in the next lesson, when you study gradient descent.

* Similarly, **choosing weights from a normal distribution prevents any one weight from overwhelming other weights**. You'll use the `tf.truncated_normal()` function to generate random numbers from a normal distribution.

`tf.truncated_normal()`

```
n_features = 120
n_labels = 5
weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))
```

* The tf.truncated_normal() function returns a tensor with random values from a normal distribution, where any value more than 2 standard deviations from the mean is dropped and re-drawn.

* Since the weights are already helping prevent the model from getting stuck, you don't need to randomize the bias. Let's use the simplest solution, setting the bias to 0.

`tf.zeros()`

```
n_labels = 5
bias = tf.Variable(tf.zeros(n_labels))
```

* The tf.zeros() function returns a tensor with all zeros.
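The truncation rule and the zero bias can be sketched in plain Python. This is only an illustration of the rule (re-draw anything more than 2 standard deviations from the mean); `truncated_normal` here is a hypothetical helper limited to 2-D shapes, not TensorFlow's implementation:

```python
import random

def truncated_normal(shape, mean=0.0, stddev=1.0, seed=None):
    """Return a rows x cols list of normal samples, re-drawing any sample
    more than 2 standard deviations away from the mean."""
    rng = random.Random(seed)
    rows, cols = shape

    def draw():
        while True:
            v = rng.gauss(mean, stddev)
            if abs(v - mean) <= 2 * stddev:
                return v

    return [[draw() for _ in range(cols)] for _ in range(rows)]


n_features = 120
n_labels = 5
weights = truncated_normal((n_features, n_labels), seed=42)  # random starting weights
bias = [0.0] * n_labels                                       # bias starts at zero

print(len(weights), len(weights[0]))                          # 120 5
print(max(abs(v) for row in weights for v in row) <= 2.0)     # True
```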

### Linear Classifier Quiz

* A subset of the MNIST dataset

* You'll be classifying the handwritten numbers 0, 1, and 2 from the MNIST dataset using TensorFlow. The above is a small sample of the data you'll be training on. Notice how some of the 1s are written with a serif at the top and at different angles. The similarities and differences will play a part in shaping the weights of the model.


**Left: Weights for labeling 0. Middle: Weights for labeling 1. Right: Weights for labeling 2.**

* The images above are trained weights for each label (0, 1, and 2). The weights display the unique properties of each digit they have found. Complete this quiz to train your own weights using the MNIST dataset.

### Instructions

* Open quiz.py.
    * Implement get_weights to return a tf.Variable of weights
    * Implement get_biases to return a tf.Variable of biases
    * Implement xW + b in the linear function

* Open sandbox.py
    * Initialize all weights
    * Since xW in xW + b is matrix multiplication, you have to use the tf.matmul() function instead of tf.multiply(). Don't forget that order matters in matrix multiplication, so tf.matmul(a,b) is not the same as tf.matmul(b,a).

In [6]:
# The tf.Variable class creates a tensor with an initial value that can be modified, much like a normal Python variable. This tensor stores its state in the session, so you must initialize the state of the tensor manually. You'll use the tf.global_variables_initializer() function to initialize the state of all the Variable tensors.

x = tf.Variable(5)
x

<tf.Variable 'Variable:0' shape=() dtype=int32_ref>

In [7]:
# Initialization -  returns an operation that will initialize
# all TensorFlow variables from the graph.

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)

In [8]:
# tf.truncated_normal() returns a tensor with random values from a normal
# distribution, re-drawing any value more than 2 standard deviations from the mean.

n_features = 120
n_labels = 5
weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))
weights

<tf.Variable 'Variable_1:0' shape=(120, 5) dtype=float32_ref>

In [9]:
n_labels = 5
bias = tf.Variable(tf.zeros(n_labels))
bias

<tf.Variable 'Variable_2:0' shape=(5,) dtype=float32_ref>

In [10]:
import tensorflow as tf

def get_weights(n_features, n_labels):
    """
    Return TensorFlow weights
    :param n_features: Number of features
    :param n_labels: Number of labels
    :return: TensorFlow weights
    """
    # TODO: Return weights
    weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))
    return weights


def get_biases(n_labels):
    """
    Return TensorFlow bias
    :param n_labels: Number of labels
    :return: TensorFlow bias
    """
    bias = tf.Variable(tf.zeros(n_labels))
    return bias


def linear(input, w, b):
    """
    Return linear function in TensorFlow
    :param input: TensorFlow input
    :param w: TensorFlow weights
    :param b: TensorFlow biases
    :return: TensorFlow linear function
    """
    # TODO: Linear Function (xW + b)
    x = tf.matmul(input, w)
    x = tf.add(x,b)
    
    return x
    
    

In [11]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data


def mnist_features_labels(n_labels):
    """
    Gets the first <n> labels from the MNIST dataset
    :param n_labels: Number of labels to use
    :return: Tuple of feature list and label list
    """
    mnist_features = []
    mnist_labels = []

    mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

    # In order to make quizzes run faster, we're only looking at 10000 images
    for mnist_feature, mnist_label in zip(*mnist.train.next_batch(10000)):

        # Add features and labels if it's for the first <n>th labels
        if mnist_label[:n_labels].any():
            mnist_features.append(mnist_feature)
            mnist_labels.append(mnist_label[:n_labels])

    return mnist_features, mnist_labels


# Number of features (28*28 image is 784 features)
n_features = 784
# Number of labels
n_labels = 3

# Features and Labels
features = tf.placeholder(tf.float32)
labels = tf.placeholder(tf.float32)

# Weights and Biases
w = get_weights(n_features, n_labels)
b = get_biases(n_labels)

# Linear Function xW + b
logits = linear(features, w, b)

# Training data
train_features, train_labels = mnist_features_labels(n_labels)

with tf.Session() as session:
    # TODO: Initialize session variables
    init = tf.global_variables_initializer()
    session.run(init)
        
    # Softmax
    prediction = tf.nn.softmax(logits)

    # Cross entropy
    # This quantifies how far off the predictions were.
    # You'll learn more about this in future lessons.
    cross_entropy = -tf.reduce_sum(labels * tf.log(prediction), axis=1)

    # Training loss
    # You'll learn more about this in future lessons.
    loss = tf.reduce_mean(cross_entropy)

    # Rate at which the weights are changed
    # You'll learn more about this in future lessons.
    learning_rate = 0.08

    # Gradient Descent
    # This is the method used to train the model
    # You'll learn more about this in future lessons.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    # Run optimizer and get loss
    _, l = session.run(
        [optimizer, loss],
        feed_dict={features: train_features, labels: train_labels})

# Print loss
print('Loss: {}'.format(l))


Extracting /datasets/ud730/mnist\train-images-idx3-ubyte.gz
Extracting /datasets/ud730/mnist\train-labels-idx1-ubyte.gz
Extracting /datasets/ud730/mnist\t10k-images-idx3-ubyte.gz
Extracting /datasets/ud730/mnist\t10k-labels-idx1-ubyte.gz
Loss: 8.307246208190918


### TensorFlow Softmax
* The softmax function squashes its inputs, typically called logits or logit scores, to be between 0 and 1 and also normalizes the outputs such that they all sum to 1. This means the output of the softmax function is equivalent to a categorical probability distribution. It's the perfect function to use as the output activation for a network predicting multiple classes.
 
![alt text](img/softmax-input-output.png "Softmax")

* Example of the softmax function at work.

#### TensorFlow Softmax
* We're using TensorFlow to build neural networks and, appropriately, there's a function for calculating softmax.

`x = tf.nn.softmax([2.0, 1.0, 0.2])`

* Easy as that! tf.nn.softmax() implements the softmax function for you. It takes in logits and returns softmax activations.
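Under the hood, softmax exponentiates each logit and divides by the sum of the exponentials, $\mathrm{softmax}(z)_i = e^{z_i} / \sum_j e^{z_j}$. A plain-Python sketch of that formula (the helper name is our own, and TensorFlow's implementation may differ in detail) gives the same probabilities as the quiz below for the logits `[2.0, 1.0, 0.1]`:

```python
import math

def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    # Subtracting the largest logit avoids overflow in exp() without changing the result
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 4) for p in probs])   # [0.659, 0.2424, 0.0986]
```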

### Quiz
* Use the softmax function in the quiz below to return the softmax of the logits.

In [12]:
import tensorflow as tf

def run_smax():
    output = None
    logit_data = [2.0, 1.0, 0.1]
    logits = tf.placeholder(tf.float32)
    
    # TODO: Calculate the softmax of the logits
    softmax = tf.nn.softmax(logits)   
    
    with tf.Session() as sess:
        # TODO: Feed in the logit data
        output = sess.run(softmax, feed_dict={logits:logit_data})
        
    return output

In [13]:
run_smax()

array([0.6590012 , 0.24243298, 0.09856589], dtype=float32)

### Cross Entropy in TensorFlow

* As with the softmax function, TensorFlow has a function to do the cross entropy calculations for us.

 ![alt-text](img/softmax-input-output.png "Softmax map")

#### Cross entropy loss function

* Let's take what you learned from the video and create a cross entropy function in TensorFlow. To create a cross entropy function in TensorFlow, you'll need to use two new functions:

```
tf.reduce_sum()
tf.log()
```

#### Reduce Sum

```
x = tf.reduce_sum([1, 2, 3, 4, 5])  # 15
```

* The `tf.reduce_sum()` function takes an array of numbers and sums them together.

#### Natural Log

`x = tf.log(100.0)  # 4.60517`
* This function does exactly what you would expect it to do. tf.log() takes the natural log of a number.
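Put together, these two functions compute cross entropy as $D(S, L) = -\sum_i L_i \log(S_i)$, where $S$ is the softmax output and $L$ the one-hot label. A plain-Python sketch of the same formula, with `math.log` standing in for `tf.log()` and `sum()` for `tf.reduce_sum()` (the helper name `cross_entropy` is hypothetical):

```python
import math

def cross_entropy(softmax_probs, one_hot_labels):
    """-sum(label * log(prob)) for a single sample."""
    return -sum(y * math.log(p) for y, p in zip(one_hot_labels, softmax_probs))

# Only the true class contributes, so this equals -log(0.7)
print(round(cross_entropy([0.7, 0.2, 0.1], [1.0, 0.0, 0.0]), 4))   # 0.3567
```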

### Quiz

![alt-text](img/cross-entropy-diagram.png "cross-entropy")

Print the cross entropy using softmax_data and one_hot_data.

In [14]:
import tensorflow as tf

softmax_data = [0.7, 0.2, 0.1]
one_hot_data = [1.0, 0.0, 0.0]

softmax = tf.placeholder(tf.float32)
one_hot = tf.placeholder(tf.float32)

entropy = -tf.reduce_sum(tf.multiply(one_hot, tf.log(softmax)))

feed_dict = {softmax:softmax_data, one_hot:one_hot_data}

# TODO: Print cross entropy from session

with tf.Session() as session:
    output = session.run(entropy, feed_dict=feed_dict)

output

0.35667497

### Mini-batching
* In this section, you'll go over what mini-batching is and how to apply it in TensorFlow.

* Mini-batching is a technique for training on subsets of the dataset instead of all the data at one time. This provides the ability to train a model, even if a computer lacks the memory to store the entire dataset.

* Mini-batching is computationally inefficient, since you can't calculate the loss simultaneously across all samples. However, this is a small price to pay in order to be able to run the model at all.

* It's also quite useful combined with stochastic gradient descent (SGD). The idea is to randomly shuffle the data at the start of each epoch, then create the mini-batches. For each mini-batch, you train the network weights with gradient descent. Since these batches are random, you're performing SGD with each batch.
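The shuffle-then-batch loop described above can be sketched in plain Python. This is a minimal illustration that assumes a toy one-feature model `y = w*x + b` trained with a mean-squared-error loss (both chosen only for the example, not the MNIST model used later); `minibatch_sgd` is a hypothetical helper:

```python
import random

def minibatch_sgd(xs, ys, batch_size, epochs, lr, seed=0):
    """Fit y ~ w*x + b with mini-batch SGD on a mean-squared-error loss."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # reshuffle at the start of every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            n = len(batch)  # the last batch may be smaller
            # Gradients of mean((w*x + b - y)^2) over this batch only
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / n
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / n
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

xs = [i / 10 for i in range(50)]       # 0.0 .. 4.9
ys = [2.0 * x + 1.0 for x in xs]       # noise-free line y = 2x + 1
w, b = minibatch_sgd(xs, ys, batch_size=8, epochs=500, lr=0.02)
print(round(w, 3), round(b, 3))        # should be close to 2.0 and 1.0
```

Each update only looks at one batch, so memory use depends on `batch_size` rather than on the size of the whole dataset.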

**Let's look at the MNIST dataset with weights and a bias to see if your machine can handle it.**

In [15]:
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

Extracting /datasets/ud730/mnist\train-images-idx3-ubyte.gz
Extracting /datasets/ud730/mnist\train-labels-idx1-ubyte.gz
Extracting /datasets/ud730/mnist\t10k-images-idx3-ubyte.gz
Extracting /datasets/ud730/mnist\t10k-labels-idx1-ubyte.gz


In [16]:
train_labels.size*4

2200000

In [17]:
((784*10 + train_features.size + train_labels.size + 10) * 4)

174711400

In [18]:
784*10*4

31360

* The total memory space required for the inputs, weights and bias is around 174 megabytes, which isn't that much memory. You could train this whole dataset on most CPUs and GPUs.

* But larger datasets that you'll use in the future are measured in gigabytes or more. It's possible to purchase more memory, but it's expensive. A Titan X GPU with 12 GB of memory costs over $1,000.

* Instead, in order to run large models on your machine, you'll learn how to use mini-batching.
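The memory estimate above is just element count times 4 bytes per float32. A plain-Python sketch of that arithmetic, assuming 55,000 training images (consistent with `train_labels.size*4 == 2200000` above); the helper `float32_bytes` is hypothetical:

```python
BYTES_PER_FLOAT32 = 4

def float32_bytes(*shape):
    """Bytes needed to store a float32 array with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * BYTES_PER_FLOAT32

sizes = {
    "train_features": float32_bytes(55000, 784),  # 55,000 images x 784 pixels
    "train_labels": float32_bytes(55000, 10),     # one-hot labels, 10 classes
    "weights": float32_bytes(784, 10),            # n_input x n_classes
    "bias": float32_bytes(10),                    # n_classes
}
for name, n_bytes in sizes.items():
    print(name, n_bytes)
print("total", sum(sizes.values()))  # total 174711400, about 174.7 MB
```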

### TensorFlow Mini-batching

* In order to use mini-batching, you must first divide your data into batches.

* Unfortunately, it's sometimes impossible to divide the data into batches of exactly equal size. For example, imagine you'd like to create batches of 128 samples each from a dataset of 1000 samples. Since 128 does not evenly divide into 1000, you'd wind up with 7 batches of 128 samples, and 1 batch of 104 samples. (7*128 + 1*104 = 1000)

* In that case, the size of the batches would vary, so you need to take advantage of TensorFlow's tf.placeholder() function to receive the varying batch sizes.

* Continuing the example, if each sample had n_input = 784 features and n_classes = 10 possible labels, the dimensions for features would be [None, n_input] and labels would be [None, n_classes].

```
# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])
```

**What does None do here?**

* The None dimension is a placeholder for the batch size. At runtime, TensorFlow will accept any batch size greater than 0.

* Going back to our earlier example, this setup allows you to feed features and labels into the model as either the batches of 128 samples or the single batch of 104 samples.
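The rule for a shape like `[None, n_input]` can be mimicked in plain Python: every fixed dimension must match exactly, and `None` matches any size. The helper `shape_matches` is hypothetical and only illustrates the rule; TensorFlow performs its own check when you feed a placeholder:

```python
def shape_matches(pattern, shape):
    """True if `shape` fits `pattern`, where None in the pattern matches any size."""
    return len(pattern) == len(shape) and all(
        p is None or p == s for p, s in zip(pattern, shape))

features_pattern = [None, 784]   # like tf.placeholder(tf.float32, [None, n_input])
print(shape_matches(features_pattern, (128, 784)))   # True  (full batch)
print(shape_matches(features_pattern, (104, 784)))   # True  (last, smaller batch)
print(shape_matches(features_pattern, (128, 400)))   # False (wrong feature count)
```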

**Question 2**
* Using the parameters below, how many batches are there, and what is the last batch size?

```
features is (50000, 400)

labels is (50000, 10)

batch_size is 128
```

In [19]:
'batches: ', (50000 + 128 - 1) // 128  # ceiling division: the last, partial batch counts too

('batches: ', 391)

In [20]:
'last_batch_size', 50000 % 128

('last_batch_size', 80)
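The arithmetic behind Question 2 is ceiling division for the batch count and a remainder for the last batch. A small plain-Python sketch (the helper `batch_stats` is hypothetical) that also reproduces the 1000-sample example from earlier in this section:

```python
def batch_stats(n_samples, batch_size):
    """Return (number of batches, size of the last batch) when splitting
    n_samples into batches of at most batch_size samples."""
    if n_samples == 0:
        return 0, 0
    n_batches = -(-n_samples // batch_size)   # ceiling division
    last = n_samples - (n_batches - 1) * batch_size
    return n_batches, last

print(batch_stats(1000, 128))    # (8, 104)
print(batch_stats(50000, 128))   # (391, 80)
```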

**Question 3**

* Implement the `batches` function to batch `features` and `labels`. The function should return each batch with a maximum size of `batch_size`. To help you with the quiz, look at the following example output of a working `batches` function.


```
# 4 Samples of features
example_features = [
    ['F11','F12','F13','F14'],
    ['F21','F22','F23','F24'],
    ['F31','F32','F33','F34'],
    ['F41','F42','F43','F44']]

# 4 Samples of labels
example_labels = [
    ['L11','L12'],
    ['L21','L22'],
    ['L31','L32'],
    ['L41','L42']]

example_batches = batches(3, example_features, example_labels)
```

The **`example_batches`** *variable* would be the following:

```
[
    # 2 batches:
    #   First is a batch of size 3.
    #   Second is a batch of size 1
    [
        # First Batch is size 3
        [
            # 3 samples of features.
            # There are 4 features per sample.
            ['F11', 'F12', 'F13', 'F14'],
            ['F21', 'F22', 'F23', 'F24'],
            ['F31', 'F32', 'F33', 'F34']
        ], [
            # 3 samples of labels.
            # There are 2 labels per sample.
            ['L11', 'L12'],
            ['L21', 'L22'],
            ['L31', 'L32']
        ]
    ], [
        # Second Batch is size 1.
        # Since batch size is 3, there is only one sample left from the 4 samples.
        [
            # 1 sample of features.
            ['F41', 'F42', 'F43', 'F44']
        ], [
            # 1 sample of labels.
            ['L41', 'L42']
        ]
    ]
]
```


In [21]:
import math
def batches(batch_size, features, labels):
    """
    Create batches of features and labels
    :param batch_size: The batch size
    :param features: List of features
    :param labels: List of labels
    :return: Batches of (Features, Labels)
    """
    assert len(features) == len(labels)
    # TODO: Implement batching
    output = []
    for start in range(0,len(features),batch_size):
        end = start + batch_size
        minibatch = [features[start:end],labels[start:end]]
        output.append(minibatch)
        
    return output

* Let's use mini-batching to feed batches of MNIST features and labels into a linear model.

* Set the batch size and run the optimizer over all the batches with the batches function. The recommended batch size is 128. If you have memory restrictions, feel free to make it smaller.

In [22]:
import math
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np

learning_rate = 0.001
n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


# TODO: Set batch size
batch_size = 32
assert batch_size is not None, 'You must set the batch size'

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    
    # TODO: Train optimizer on all batches
    for batch_features, batch_labels in batches(batch_size, train_features, train_labels):
        sess.run(optimizer, feed_dict={features: batch_features, labels: batch_labels})

    # Calculate accuracy for test dataset
    test_accuracy = sess.run(accuracy,
                             feed_dict={features: test_features, labels: test_labels})

print('Test Accuracy: {}'.format(test_accuracy))


Extracting /datasets/ud730/mnist\train-images-idx3-ubyte.gz
Extracting /datasets/ud730/mnist\train-labels-idx1-ubyte.gz
Extracting /datasets/ud730/mnist\t10k-images-idx3-ubyte.gz
Extracting /datasets/ud730/mnist\t10k-labels-idx1-ubyte.gz

Test Accuracy: 0.21379999816417694


### Epochs

* An epoch is a single forward and backward pass of the whole dataset. Training for several epochs can increase the accuracy of the model without requiring more data. This section will cover epochs in TensorFlow and how to choose the right number of epochs.
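One common heuristic for choosing the number of epochs, often called early stopping with patience, is to stop once validation accuracy has not improved for a few epochs. A plain-Python sketch of that rule; the function name, the `patience` and `min_delta` parameters, and the accuracy history are all invented for illustration and are not taken from the runs in this section:

```python
def epochs_until_plateau(valid_accuracies, patience=5, min_delta=0.0):
    """Return how many epochs were worth running: the count up to and including
    the best epoch, scanning until `patience` epochs pass without validation
    accuracy improving by more than `min_delta`."""
    if not valid_accuracies:
        return 0
    best_acc = float("-inf")
    best_epoch = 0
    for epoch, acc in enumerate(valid_accuracies):
        if acc > best_acc + min_delta:
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop early
    return best_epoch + 1

# Toy validation-accuracy history, one value per epoch
history = [0.20, 0.23, 0.25, 0.26, 0.26, 0.26, 0.26, 0.26, 0.26, 0.27]
print(epochs_until_plateau(history, patience=5))   # 4
```

Note that the late improvement at the last epoch is never seen; a larger `patience` trades extra training time for a lower risk of stopping too soon.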

* The following TensorFlow code trains a model using 10 epochs.

```
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
from helper import batches  # Helper function created in Mini-batching section


def print_epoch_stats(epoch_i, sess, last_features, last_labels):
    """
    Print cost and validation accuracy of an epoch
    """
    current_cost = sess.run(
        cost,
        feed_dict={features: last_features, labels: last_labels})
    valid_accuracy = sess.run(
        accuracy,
        feed_dict={features: valid_features, labels: valid_labels})
    print('Epoch: {:<4} - Cost: {:<8.3} Valid Accuracy: {:<5.3}'.format(
        epoch_i,
        current_cost,
        valid_accuracy))

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
valid_features = mnist.validation.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
valid_labels = mnist.validation.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
learning_rate = tf.placeholder(tf.float32)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

init = tf.global_variables_initializer()

batch_size = 128
epochs = 10
learn_rate = 0.001

train_batches = batches(batch_size, train_features, train_labels)

with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch_i in range(epochs):

        # Loop over all batches
        for batch_features, batch_labels in train_batches:
            train_feed_dict = {
                features: batch_features,
                labels: batch_labels,
                learning_rate: learn_rate}
            sess.run(optimizer, feed_dict=train_feed_dict)

        # Print cost and validation accuracy of an epoch
        print_epoch_stats(epoch_i, sess, batch_features, batch_labels)

    # Calculate accuracy for test dataset
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: test_features, labels: test_labels})

print('Test Accuracy: {}'.format(test_accuracy))
```

**Running the code will output the following:**

```
Epoch: 0    - Cost: 11.0     Valid Accuracy: 0.204
Epoch: 1    - Cost: 9.95     Valid Accuracy: 0.229
Epoch: 2    - Cost: 9.18     Valid Accuracy: 0.246
Epoch: 3    - Cost: 8.59     Valid Accuracy: 0.264
Epoch: 4    - Cost: 8.13     Valid Accuracy: 0.283
Epoch: 5    - Cost: 7.77     Valid Accuracy: 0.301
Epoch: 6    - Cost: 7.47     Valid Accuracy: 0.316
Epoch: 7    - Cost: 7.2      Valid Accuracy: 0.328
Epoch: 8    - Cost: 6.96     Valid Accuracy: 0.342
Epoch: 9    - Cost: 6.73     Valid Accuracy: 0.36 
Test Accuracy: 0.3801000118255615
```
Each epoch attempts to move to a lower cost, leading to better accuracy.

This model continues to improve accuracy up to Epoch 9. Let's increase the number of epochs to 100.

```
...
Epoch: 79   - Cost: 0.111    Valid Accuracy: 0.86
Epoch: 80   - Cost: 0.11     Valid Accuracy: 0.869
Epoch: 81   - Cost: 0.109    Valid Accuracy: 0.869
...
Epoch: 85   - Cost: 0.107    Valid Accuracy: 0.869
Epoch: 86   - Cost: 0.107    Valid Accuracy: 0.869
Epoch: 87   - Cost: 0.106    Valid Accuracy: 0.869
Epoch: 88   - Cost: 0.106    Valid Accuracy: 0.869
Epoch: 89   - Cost: 0.105    Valid Accuracy: 0.869
Epoch: 90   - Cost: 0.105    Valid Accuracy: 0.869
Epoch: 91   - Cost: 0.104    Valid Accuracy: 0.869
Epoch: 92   - Cost: 0.103    Valid Accuracy: 0.869
Epoch: 93   - Cost: 0.103    Valid Accuracy: 0.869
Epoch: 94   - Cost: 0.102    Valid Accuracy: 0.869
Epoch: 95   - Cost: 0.102    Valid Accuracy: 0.869
Epoch: 96   - Cost: 0.101    Valid Accuracy: 0.869
Epoch: 97   - Cost: 0.101    Valid Accuracy: 0.869
Epoch: 98   - Cost: 0.1      Valid Accuracy: 0.869
Epoch: 99   - Cost: 0.1      Valid Accuracy: 0.869
Test Accuracy: 0.8696000006198883
```

From looking at the output above, you can see the model doesn't increase the validation accuracy after epoch 80. Let's see what happens when we increase the learning rate.

```
learn_rate = 0.1

Epoch: 76   - Cost: 0.214    Valid Accuracy: 0.752
Epoch: 77   - Cost: 0.21     Valid Accuracy: 0.756
Epoch: 78   - Cost: 0.21     Valid Accuracy: 0.756
...
Epoch: 85   - Cost: 0.207    Valid Accuracy: 0.756
Epoch: 86   - Cost: 0.209    Valid Accuracy: 0.756
Epoch: 87   - Cost: 0.205    Valid Accuracy: 0.756
Epoch: 88   - Cost: 0.208    Valid Accuracy: 0.756
Epoch: 89   - Cost: 0.205    Valid Accuracy: 0.756
Epoch: 90   - Cost: 0.202    Valid Accuracy: 0.756
Epoch: 91   - Cost: 0.207    Valid Accuracy: 0.756
Epoch: 92   - Cost: 0.204    Valid Accuracy: 0.756
Epoch: 93   - Cost: 0.206    Valid Accuracy: 0.756
Epoch: 94   - Cost: 0.202    Valid Accuracy: 0.756
Epoch: 95   - Cost: 0.2974   Valid Accuracy: 0.756
Epoch: 96   - Cost: 0.202    Valid Accuracy: 0.756
Epoch: 97   - Cost: 0.2996   Valid Accuracy: 0.756
Epoch: 98   - Cost: 0.203    Valid Accuracy: 0.756
Epoch: 99   - Cost: 0.2987   Valid Accuracy: 0.756
Test Accuracy: 0.7556000053882599
```

* It looks like the learning rate was increased too much: the final accuracy was lower, and the model stopped improving earlier. Let's keep the previous learning rate, but change the number of epochs to 80.

```
Epoch: 65   - Cost: 0.122    Valid Accuracy: 0.868
Epoch: 66   - Cost: 0.121    Valid Accuracy: 0.868
Epoch: 67   - Cost: 0.12     Valid Accuracy: 0.868
Epoch: 68   - Cost: 0.119    Valid Accuracy: 0.868
Epoch: 69   - Cost: 0.118    Valid Accuracy: 0.868
Epoch: 70   - Cost: 0.118    Valid Accuracy: 0.868
Epoch: 71   - Cost: 0.117    Valid Accuracy: 0.868
Epoch: 72   - Cost: 0.116    Valid Accuracy: 0.868
Epoch: 73   - Cost: 0.115    Valid Accuracy: 0.868
Epoch: 74   - Cost: 0.115    Valid Accuracy: 0.868
Epoch: 75   - Cost: 0.114    Valid Accuracy: 0.868
Epoch: 76   - Cost: 0.113    Valid Accuracy: 0.868
Epoch: 77   - Cost: 0.113    Valid Accuracy: 0.868
Epoch: 78   - Cost: 0.112    Valid Accuracy: 0.868
Epoch: 79   - Cost: 0.111    Valid Accuracy: 0.868
Epoch: 80   - Cost: 0.111    Valid Accuracy: 0.869
Test Accuracy: 0.86909999418258667
```
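The choice of 80 epochs can also be made programmatically from the 100-epoch log: pick the earliest epoch at which validation accuracy reaches its maximum. `best_epoch` below is a hypothetical helper (not part of TensorFlow), applied to a few (epoch, accuracy) pairs copied from that log.

```python
def best_epoch(history):
    """Return the earliest epoch whose validation accuracy equals the maximum.

    `history` is a list of (epoch, valid_accuracy) pairs, e.g. parsed
    from a training log. This helper is hypothetical, for illustration.
    """
    best_acc = max(acc for _, acc in history)
    return min(epoch for epoch, acc in history if acc == best_acc)


# A few (epoch, validation accuracy) pairs copied from the 100-epoch log
history = [(79, 0.86), (80, 0.869), (81, 0.869), (85, 0.869), (99, 0.869)]
print(best_epoch(history))  # 80
```

A more robust variant of this idea (usually called early stopping) would stop training once accuracy has not improved for several epochs, instead of scanning a finished log.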

* The accuracy only reached **0.86**, but that could be because the learning rate was too high. Lowering the learning rate would require more epochs, but could ultimately achieve better accuracy.
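The trade-off between learning rate and number of epochs is easy to see on a toy problem. The sketch below is not the MNIST model: it runs plain gradient descent on the one-dimensional function $f(w) = w^2$ (gradient $2w$) and counts how many steps each learning rate needs to bring $|w|$ below a tolerance. The function name, starting point, and tolerance are assumptions chosen for illustration.

```python
def steps_to_converge(learning_rate, w=1.0, tol=1e-3, max_steps=10000):
    """Gradient descent on f(w) = w**2; return steps until |w| < tol, or None."""
    for step in range(1, max_steps + 1):
        w = w - learning_rate * 2 * w  # the gradient of w**2 is 2w
        if abs(w) < tol:
            return step
        if abs(w) > 1e6:  # the updates are blowing up
            return None
    return None


for lr in (1.1, 0.1, 0.01):
    print(lr, steps_to_converge(lr))
```

In this toy case a learning rate of 0.01 needs about ten times as many steps as 0.1, while 1.1 overshoots the minimum on every step and diverges. Real networks are messier, but the same pattern is why a lower learning rate is usually paired with more epochs.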

* In the upcoming TensorFlow Lab, you'll get the opportunity to choose your own learning rate, epoch count, and batch size to improve the model's accuracy.

# MNIST NOTEBOOK LAB

## TensorFlow Neural Network Lab

#### TensorFlow Lab

* We've prepared a Jupyter notebook that will guide you through the process of creating a single layer neural network in TensorFlow. You'll implement data normalization, then build and train the network with TensorFlow.

![alt-text](img/nmn.png "not so mnist")

#### Getting the notebook
* The notebook and all related files are available from our GitHub repository. Either clone the repository or download it as a Zip file.

* Use `Git` to `clone` the repository.

`git clone https://github.com/udacity/deep-learning.git`
* If you're unfamiliar with Git and GitHub, I highly recommend checking out our course. If you'd rather not use Git, you can download the repository as a Zip archive from https://github.com/udacity/deep-learning.

![alt-text](img/download-repo.png "repo")

* Download the repository contents as a Zip file using the green button on the top right.

* If you download the Zip file, be sure to extract it (usually by double-clicking it). The most recent versions of all our code will be available from the repository, so it's the best place to get up-to-date files.

* Once you have the repo cloned or downloaded, change directories into the repo, then the intro-to-tensorflow directory. In there you'll find the lab notebook, as well as Conda environment files for installing all the necessary packages.

### Windows Instructions
* We've provided a Conda environment file for you to easily install all the necessary packages. In the intro-to-tensorflow directory, enter

`conda env create -f environment_win.yml`
* This will create an environment called dlnd-tf-lab. You can enter the environment with the command

`activate dlnd-tf-lab`
* All the necessary packages should be installed for you.

### OS X and Linux Instructions
* We've provided a Conda environment file for you to easily install all the necessary packages. In the intro-to-tensorflow directory, enter

`conda env create -f environment.yml`
* This will create an environment called dlnd-tf-lab. You can enter the environment with the command

`source activate dlnd-tf-lab`
* All the necessary packages should be installed for you.

### View the Notebook
* In the directory with the notebook file, start your `Jupyter notebook` server:

`jupyter notebook`
* This should open a browser window for you. If it doesn't, go to http://localhost:8888/tree. The port number might be different if you have other notebook servers running, so try 8889 instead of 8888 if you can't find the right server.

* You should see the notebook *intro_to_tensorflow.ipynb*; this is the notebook you'll be working on. The notebook has 3 problems for you to solve:

* **Problem 1**: Normalize the features
* **Problem 2**: Use `TensorFlow` operations to create features, labels, weight, and biases tensors
* **Problem 3**: Tune the learning rate, number of steps, and batch size for the best accuracy
* This is a self-assessed lab. Compare your answers to the provided solutions. If you have any difficulty completing the lab, Udacity provides a few services to answer any questions you might have.
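Problem 1 asks you to normalize the features. One common choice for grayscale image data is min-max scaling, which maps pixel values from [0, 255] linearly into a target range [a, b]. The sketch below assumes that approach and a target range of [0.1, 0.9]; check the notebook for the exact range and function it expects, since `normalize_grayscale` here is only a hypothetical name.

```python
import numpy as np


def normalize_grayscale(image_data, a=0.1, b=0.9, x_min=0.0, x_max=255.0):
    """Min-max scale pixel values from [x_min, x_max] into [a, b].

    The default target range [0.1, 0.9] is an assumption for illustration.
    """
    image_data = np.asarray(image_data, dtype=np.float64)
    return a + (image_data - x_min) * (b - a) / (x_max - x_min)


pixels = np.array([0, 51, 255])
print(normalize_grayscale(pixels))
```

Keeping inputs in a small, bounded range like this tends to make gradient descent better behaved than feeding raw 0-255 values.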

#### Help
* Remember that you can get assistance from your mentor, the Forums (click the link on the left side of the classroom), or the Slack channel. You can also review the concepts from the previous lessons.

### Multilayer Neural Networks
* In the previous lessons and the lab, you learned how to build a neural network of one layer. Now, you'll learn how to build multilayer neural networks with TensorFlow. Adding a hidden layer to a network allows it to model more complex functions. Also, using a non-linear activation function on the hidden layer lets it model non-linear functions.

![alt-text](img/two-layer-network.png "network")

* The first thing we'll learn to implement in TensorFlow is a ReLU hidden layer. A ReLU, or rectified linear unit, is a non-linear function. The ReLU function is 0 for negative inputs and $x$ for all inputs $x > 0$, i.e. $f(x) = \max(0, x)$.
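Outside TensorFlow, the same function is a one-liner. This NumPy sketch only illustrates the math; in the model itself you would use `tf.nn.relu()`.

```python
import numpy as np


def relu(x):
    """Rectified linear unit: 0 for negative inputs, x otherwise."""
    return np.maximum(0, x)


print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))
```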

* As before, the following concepts build on the knowledge from the Deep Neural Networks lesson. If you need a refresher, you can go back and watch them again:

* ReLU
* Feedforward
* Dropout

![alt-text](img/relu-network.png "relu")

### *TensorFlow ReLUs*
* TensorFlow provides the ReLU function as tf.nn.relu(), as shown below.

### Hidden Layer with ReLU activation function
```
hidden_layer = tf.add(tf.matmul(features, hidden_weights), hidden_biases)
hidden_layer = tf.nn.relu(hidden_layer)

output = tf.add(tf.matmul(hidden_layer, output_weights), output_biases)
```
The above code applies the `tf.nn.relu()` function to the hidden_layer, setting any negative values in it to zero and acting like an on/off switch for each hidden unit. Adding additional layers, like the output layer, after an activation function turns the model into a nonlinear function. This nonlinearity allows the network to solve more complex problems.
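Why the activation matters can be checked numerically: without it, two stacked linear layers collapse into a single linear layer, because $(xW_1)W_2 = x(W_1W_2)$. The NumPy sketch below reuses the inputs and weights from the quiz below (its biases are all zero, so they are omitted).

```python
import numpy as np

# Inputs and weights taken from the quiz (biases are all zero)
x = np.array([[1.0, 2.0, 3.0, 4.0],
              [-1.0, -2.0, -3.0, -4.0],
              [11.0, 12.0, 13.0, 14.0]])
w1 = np.array([[0.1, 0.2, 0.4],
               [0.4, 0.6, 0.6],
               [0.5, 0.9, 0.1],
               [0.8, 0.2, 0.8]])
w2 = np.array([[0.1, 0.6],
               [0.2, 0.1],
               [0.7, 0.9]])

two_linear = (x @ w1) @ w2               # two layers, no activation
one_linear = x @ (w1 @ w2)               # a single equivalent linear layer
with_relu = np.maximum(0, x @ w1) @ w2   # ReLU between the layers

print(np.allclose(two_linear, one_linear))  # True
print(np.allclose(with_relu, one_linear))   # False
```

The ReLU version differs only on the second input, whose hidden values are all negative and get switched off.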

**Quiz**
Below you'll use the `ReLU` function to turn a linear single layer network into a non-linear multilayer network.



In [23]:
import tensorflow as tf

output = None
hidden_layer_weights = [
    [0.1, 0.2, 0.4],
    [0.4, 0.6, 0.6],
    [0.5, 0.9, 0.1],
    [0.8, 0.2, 0.8]]
out_weights = [
    [0.1, 0.6],
    [0.2, 0.1],
    [0.7, 0.9]]

# Weights and biases
weights = [
    tf.Variable(hidden_layer_weights),
    tf.Variable(out_weights)]
biases = [
    tf.Variable(tf.zeros(3)),
    tf.Variable(tf.zeros(2))]

# Input
features = tf.Variable([[1.0, 2.0, 3.0, 4.0], 
                        [-1.0, -2.0, -3.0, -4.0],
                        [11.0, 12.0, 13.0, 14.0]])

# Create model: hidden layer with ReLU activation, then a linear output layer
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])


# Print session results
with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    output = session.run(logits)
    print(output)

[[ 5.1099997  8.44     ]
 [ 0.         0.       ]
 [24.010002  38.24     ]]
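The quiz output can be cross-checked without a TensorFlow session. The NumPy sketch below recomputes the same forward pass (hidden layer, ReLU, linear output) with the quiz's values; the helper name `forward` is hypothetical.

```python
import numpy as np


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Hidden layer with ReLU, then a linear output layer (xW + b)."""
    hidden = np.maximum(0, x @ w_hidden + b_hidden)
    return hidden @ w_out + b_out


features = np.array([[1.0, 2.0, 3.0, 4.0],
                     [-1.0, -2.0, -3.0, -4.0],
                     [11.0, 12.0, 13.0, 14.0]])
hidden_layer_weights = np.array([[0.1, 0.2, 0.4],
                                 [0.4, 0.6, 0.6],
                                 [0.5, 0.9, 0.1],
                                 [0.8, 0.2, 0.8]])
out_weights = np.array([[0.1, 0.6],
                        [0.2, 0.1],
                        [0.7, 0.9]])

output = forward(features, hidden_layer_weights, np.zeros(3),
                 out_weights, np.zeros(2))
print(output)  # approximately [[5.11 8.44] [0. 0.] [24.01 38.24]]
```

The second row is all zeros because every hidden pre-activation for that input is negative, so the ReLU switches all three hidden units off.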


### Deep Neural Network in TensorFlow
* You've seen how to build a logistic classifier using TensorFlow. Now you're going to see how to use the logistic classifier to build a deep neural network.

#### Step by Step

* In the following walkthrough, we'll step through TensorFlow code written to classify the handwritten digits in the MNIST database. If you would like to run the network on your computer, the file is provided [here](https://d17h27t6h515a5.cloudfront.net/topher/2017/February/58a61a3a_multilayer-perceptron/multilayer-perceptron.zip "Aymeric Damien's GitHub repository"). You can find this and many more examples of TensorFlow at Aymeric Damien's GitHub repository.

#### Code
```
# TensorFlow MNIST
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True, reshape=False)
```
* You'll use the MNIST dataset provided by TensorFlow, which batches and one-hot encodes the data for you.
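One-hot encoding turns each integer class label into a vector with a single 1 at the label's index. `input_data.read_data_sets(..., one_hot=True)` does this for you; the NumPy sketch below (with a hypothetical `one_hot` helper) only shows what the resulting labels look like for MNIST's 10 classes.

```python
import numpy as np


def one_hot(labels, n_classes=10):
    """Return an array of shape (len(labels), n_classes) with one 1 per row."""
    return np.eye(n_classes, dtype=np.float32)[np.asarray(labels)]


encoded = one_hot([3, 0, 9])
print(encoded.shape)  # (3, 10)
print(encoded[0])     # a 1 at index 3, zeros elsewhere
```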

#### Learning Parameters
```
import tensorflow as tf

# Parameters
learning_rate = 0.001
training_epochs = 20
batch_size = 128  # Decrease batch size if you don't have enough memory
display_step = 1

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

```
* The focus here is on the architecture of multilayer neural networks, not parameter tuning, so here we'll just give you the learning parameters.

#### Hidden Layer Parameters
`n_hidden_layer = 256 # layer number of features`

* The variable n_hidden_layer determines the size of the hidden layer in the neural network. This is also known as the width of a layer.

#### Weights and Biases
```
# Store layers weight & bias
weights = {
    'hidden_layer': tf.Variable(tf.random_normal([n_input, n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_hidden_layer, n_classes]))
}
biases = {
    'hidden_layer': tf.Variable(tf.random_normal([n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

```

* Deep neural networks use multiple layers, with each layer requiring its own weight and bias. The 'hidden_layer' weight and bias are for the hidden layer. The 'out' weight and bias are for the output layer. If the neural network were deeper, there would be weights and biases for each additional layer.
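The shapes in these dictionaries also tell you how many trainable parameters the network has. The short calculation below uses the values from the parameter cells above (784 inputs, 256 hidden units, 10 classes); `count_parameters` is a hypothetical helper, not a TensorFlow function.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of each layer, input first, e.g. [784, 256, 10].
    Each layer contributes an (n_in x n_out) weight matrix plus n_out biases.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total


n_input, n_hidden_layer, n_classes = 784, 256, 10
print(count_parameters([n_input, n_hidden_layer, n_classes]))  # 203530
```

Almost all of these (200,704) sit in the 'hidden_layer' weight matrix, which is why widening the hidden layer grows the model quickly.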

#### Input

```
# tf Graph input
x = tf.placeholder("float", [None, 28, 28, 1])
y = tf.placeholder("float", [None, n_classes])

x_flat = tf.reshape(x, [-1, n_input])

```
* The MNIST data is made up of 28px by 28px images with a single channel. The tf.reshape() function above reshapes the 28px by 28px matrices in x into row vectors of 784px.
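The effect of `tf.reshape(x, [-1, n_input])` can be reproduced with NumPy's `reshape`, where `-1` likewise means "infer this dimension from the others". Both flatten in row-major order. The batch of five blank images below is made-up data used only to show the shapes.

```python
import numpy as np

n_input = 784  # 28 * 28 * 1

# A made-up batch of 5 single-channel 28x28 images, shaped like the placeholder x
x = np.zeros((5, 28, 28, 1), dtype=np.float32)
x[0, 0, 1, 0] = 1.0  # mark one pixel so we can find it after flattening

x_flat = x.reshape(-1, n_input)
print(x_flat.shape)  # (5, 784)
print(x_flat[0, 1])  # 1.0: row 0, column 1 of image 0 becomes element 1
```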

#### Multilayer Perceptron

![alt-text](img/multi-layer.png "MLP")

```
# Hidden layer with RELU activation
layer_1 = tf.add(tf.matmul(x_flat, weights['hidden_layer']),\
    biases['hidden_layer'])
layer_1 = tf.nn.relu(layer_1)
# Output layer with linear activation
logits = tf.add(tf.matmul(layer_1, weights['out']), biases['out'])
```
* You've seen the linear function `tf.add(tf.matmul(x_flat, weights['hidden_layer']), biases['hidden_layer'])` before, also known as `xw + b`. Combining linear functions together using a `ReLU` will give you a two layer network.

#### Optimizer
```
# Define loss and optimizer
cost = tf.reduce_mean(\
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
    .minimize(cost)

```
* This is the same optimization technique used in the Intro to TensorFlow lab.
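For reference, `tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(...))` computes the batch average of $-\sum_i y_i \log(\mathrm{softmax}(z)_i)$, where $z$ are the logits and $y$ the one-hot labels. The NumPy sketch below spells that out for understanding only; it is not a replacement for the TensorFlow op.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch of logits and one-hot labels."""
    # Subtract the row max before exponentiating for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return np.mean(-(labels * log_softmax).sum(axis=1))


logits = np.array([[0.0, 0.0], [2.0, 0.0]])
labels = np.array([[1.0, 0.0], [1.0, 0.0]])
print(softmax_cross_entropy(logits, labels))
```

The first example is maximally uncertain (loss $\ln 2$), while the second puts more weight on the correct class and contributes a smaller loss.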

#### Session

```
# Initializing the variables
init = tf.global_variables_initializer()


# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop)
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})


```

* The MNIST library in TensorFlow provides the ability to receive the dataset in batches. Calling the `mnist.train.next_batch()` function returns a subset of the training data.
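A minimal stand-in for `mnist.train.next_batch()` is a generator that slices shuffled arrays into consecutive batches. The sketch below is hypothetical (`get_batches` is not a TensorFlow function) and, unlike `next_batch()`, makes exactly one pass over the data. It also shows the difference between the two ways this notebook computes the number of batches, assuming the 55,000 training examples that `input_data` provides: `int(...)` drops the final partial batch, while `math.ceil(...)` (used later when saving a model) keeps it.

```python
import math

import numpy as np


def get_batches(features, labels, batch_size, seed=0):
    """Yield (batch_features, batch_labels) pairs covering the data once."""
    order = np.random.RandomState(seed).permutation(len(features))
    for start in range(0, len(features), batch_size):
        idx = order[start:start + batch_size]
        yield features[idx], labels[idx]


features = np.arange(20, dtype=np.float32).reshape(10, 2)
labels = np.arange(10)
print([len(batch_labels) for _, batch_labels in get_batches(features, labels, 4)])
# [4, 4, 2]

n_examples, batch_size = 55000, 128
print(int(n_examples / batch_size), math.ceil(n_examples / batch_size))  # 429 430
```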

#### Deeper Neural Network

![alt-text](img/layers.png "deeper learning")

That's it! Going from one layer to two is easy. Adding more layers to the network allows you to solve more complicated problems.

In [24]:
# TensorFlow MNIST
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True, reshape=False)

import tensorflow as tf

# Parameters
learning_rate = 0.001
training_epochs = 20
batch_size = 128  # Decrease batch size if you don't have enough memory
display_step = 1

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Hidden Layer Parameters
n_hidden_layer = 256 # layer number of features

# Weights and Biases - Store layers weight & bias

weights = {'hidden_layer': tf.Variable(tf.random_normal([n_input, n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_hidden_layer, n_classes]))}

biases = {'hidden_layer': tf.Variable(tf.random_normal([n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_classes]))}

# tf Graph input
x = tf.placeholder("float", [None, 28, 28, 1])
y = tf.placeholder("float", [None, n_classes])

x_flat = tf.reshape(x, [-1, n_input])

# The MNIST data is made up of 28px by 28px images with a single channel. 
# The tf.reshape() function above reshapes the 28px by 28px matrices 
# in x into row vectors of 784px.

# Hidden layer with RELU activation
layer_1 = tf.add(tf.matmul(x_flat, weights['hidden_layer']),biases['hidden_layer'])
layer_1 = tf.nn.relu(layer_1)
# Output layer with linear activation
logits = tf.add(tf.matmul(layer_1, weights['out']), biases['out'])

# Optimizer
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Initializing the variables
init = tf.global_variables_initializer()


# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop)
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})


Extracting .\train-images-idx3-ubyte.gz
Extracting .\train-labels-idx1-ubyte.gz
Extracting .\t10k-images-idx3-ubyte.gz
Extracting .\t10k-labels-idx1-ubyte.gz


### Save and Restore TensorFlow Models
* Training a model can take hours. But once you close your TensorFlow session, you lose all the trained weights and biases. If you wanted to reuse the model in the future, you would have to train it all over again!

* Fortunately, TensorFlow gives you the ability to save your progress using a class called tf.train.Saver. This class provides the functionality to save any tf.Variable to your file system.

### Saving Variables
* Let's start with a simple example of saving weights and bias Tensors. For the first example you'll just save two variables. Later examples will save all the weights in a practical model.

```
import tensorflow as tf

# The file path to save the data
save_file = './model.ckpt'

# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

with tf.Session() as sess:
    # Initialize all the Variables
    sess.run(tf.global_variables_initializer())

    # Show the values of weights and bias
    print('Weights:')
    print(sess.run(weights))
    print('Bias:')
    print(sess.run(bias))

    # Save the model
    saver.save(sess, save_file)
```

```
Weights:
[[-0.97990924 1.03016174 0.74119264]
 [-0.82581609 -0.07361362 -0.86653847]]
Bias:
[ 1.62978125 -0.37812829 0.64723819]
```
* The Tensors weights and bias are set to random values using the tf.truncated_normal() function. The values are then saved to the save_file location, "model.ckpt", using the tf.train.Saver.save() function. (The ".ckpt" extension stands for "checkpoint".)

* *If you're using TensorFlow 0.11.0RC1 or newer, a file called "model.ckpt.meta" will also be created. This file contains the TensorFlow graph.*

### Loading Variables
* Now that the Tensor Variables are saved, let's load them back into a new model.

```
# Remove the previous weights and bias
tf.reset_default_graph()

# Two Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

with tf.Session() as sess:
    # Load the weights and bias
    saver.restore(sess, save_file)

    # Show the values of weights and bias
    print('Weights:')
    print(sess.run(weights))
    print('Bias:')
    print(sess.run(bias))
```

```
Weights:
[[-0.97990924 1.03016174 0.74119264]
 [-0.82581609 -0.07361362 -0.86653847]]
Bias:
[ 1.62978125 -0.37812829 0.64723819]
```

* You'll notice you still need to create the weights and bias Tensors in Python. The tf.train.Saver.restore() function loads the saved data into weights and bias.

* Since `tf.train.Saver.restore()` sets all the TensorFlow Variables, you don't need to call `tf.global_variables_initializer()`.
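Conceptually, a checkpoint maps each variable's name to its saved value, and restoring copies those values back into variables with matching names and shapes. The sketch below mimics that idea with NumPy's `np.savez`/`np.load`; it is only an analogy, not TensorFlow's checkpoint format, and `save_variables`/`restore_variables` are hypothetical helpers.

```python
import os
import tempfile

import numpy as np


def save_variables(path, variables):
    """Write a dict of name -> array to an .npz file."""
    np.savez(path, **variables)


def restore_variables(path, variables):
    """Overwrite each array in `variables` in place with the saved value of the same name."""
    with np.load(path) as saved:
        for name, value in variables.items():
            if saved[name].shape != value.shape:
                raise ValueError('shape mismatch for {}: {} vs {}'.format(
                    name, value.shape, saved[name].shape))
            value[...] = saved[name]


rng = np.random.RandomState(0)
path = os.path.join(tempfile.mkdtemp(), 'model.npz')
save_variables(path, {'Variable': rng.randn(2, 3), 'Variable_1': rng.randn(3)})

# "New model": fresh arrays whose contents are replaced by the restore,
# so no separate initialization step matters
new_vars = {'Variable': np.zeros((2, 3)), 'Variable_1': np.zeros(3)}
restore_variables(path, new_vars)
print(new_vars['Variable_1'])
```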

### Save a Trained Model
* Let's see how to train a model and save its weights.

* First start with a model:

```
# Remove previous Tensors and Operations
tf.reset_default_graph()

from tensorflow.examples.tutorials.mnist import input_data
import numpy as np

learning_rate = 0.001
n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('.', one_hot=True)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
cost = tf.reduce_mean(\
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
    .minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
```
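The accuracy op compares the index of the largest logit with the index of the 1 in each one-hot label and averages the matches. Here is the same computation in NumPy on a made-up batch of three examples, for illustration only.

```python
import numpy as np

logits = np.array([[0.1, 0.9],
                   [0.8, 0.2],
                   [0.3, 0.7]])
labels = np.array([[0.0, 1.0],
                   [1.0, 0.0],
                   [1.0, 0.0]])

# Same steps as tf.equal(tf.argmax(...)) followed by tf.reduce_mean(tf.cast(...))
correct_prediction = np.equal(np.argmax(logits, 1), np.argmax(labels, 1))
accuracy = np.mean(correct_prediction.astype(np.float32))
print(accuracy)  # about 0.667: the third prediction is wrong
```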
* Let's train that model, then save the weights:

```
import math

save_file = './train_model.ckpt'
batch_size = 128
n_epochs = 100

saver = tf.train.Saver()

# Launch the graph
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Training cycle
    for epoch in range(n_epochs):
        total_batch = math.ceil(mnist.train.num_examples / batch_size)

        # Loop over all batches
        for i in range(total_batch):
            batch_features, batch_labels = mnist.train.next_batch(batch_size)
            sess.run(
                optimizer,
                feed_dict={features: batch_features, labels: batch_labels})

        # Print status for every 10 epochs
        if epoch % 10 == 0:
            valid_accuracy = sess.run(
                accuracy,
                feed_dict={
                    features: mnist.validation.images,
                    labels: mnist.validation.labels})
            print('Epoch {:<3} - Validation Accuracy: {}'.format(
                epoch,
                valid_accuracy))

    # Save the model
    saver.save(sess, save_file)
    print('Trained Model Saved.')
```

```
Epoch 0 - Validation Accuracy: 0.06859999895095825
Epoch 10 - Validation Accuracy: 0.20239999890327454
Epoch 20 - Validation Accuracy: 0.36980000138282776
Epoch 30 - Validation Accuracy: 0.48820000886917114
Epoch 40 - Validation Accuracy: 0.5601999759674072
Epoch 50 - Validation Accuracy: 0.6097999811172485
Epoch 60 - Validation Accuracy: 0.6425999999046326
Epoch 70 - Validation Accuracy: 0.6733999848365784
Epoch 80 - Validation Accuracy: 0.6916000247001648
Epoch 90 - Validation Accuracy: 0.7113999724388123
Trained Model Saved.
```

### Load a Trained Model

* Let's load the weights and bias from the saved checkpoint file, then check the test accuracy.

```
saver = tf.train.Saver()

# Launch the graph
with tf.Session() as sess:
    saver.restore(sess, save_file)

    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: mnist.test.images, labels: mnist.test.labels})

print('Test Accuracy: {}'.format(test_accuracy))
```

```
Test Accuracy: 0.7229999899864197
```

* That's it! You now know how to save and load a trained model in TensorFlow. Let's look at loading weights and biases into modified models in the next section.

In [25]:
import tensorflow as tf
save_file = './model.ckpt'

weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

with tf.Session() as sess:
    # Initialize all the Variables
    sess.run(tf.global_variables_initializer())
    # Show the values of weights and bias
    print('Weights:')
    print(sess.run(weights))
    print('Bias:')
    print(sess.run(bias))

    # Save the model
    saver.save(sess, save_file)

Weights:
[[ 1.0397928   0.40142336 -0.09841327]
 [-1.2256839  -0.67117643  0.4775671 ]]
Bias:
[ 0.6085107   0.2436265  -0.35508323]


In [26]:
# Remove the previous weights and bias
tf.reset_default_graph()

# Two Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

with tf.Session() as sess:
    # Load the weights and bias
    saver.restore(sess, save_file)

    # Show the values of weights and bias
    print('Weights:')
    print(sess.run(weights))
    print('Bias:')
    print(sess.run(bias))

INFO:tensorflow:Restoring parameters from ./model.ckpt


InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [2,3] rhs shape= []
	 [[Node: save/Assign = Assign[T=DT_FLOAT, _class=["loc:@Variable"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Variable, save/RestoreV2/_3)]]



In [None]:
# Remove previous Tensors and Operations
tf.reset_default_graph()

from tensorflow.examples.tutorials.mnist import input_data
import numpy as np

learning_rate = 0.001
n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('.', one_hot=True)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
cost = tf.reduce_mean(\
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
    .minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

In [None]:
import math

save_file = './train_model.ckpt'
batch_size = 128
n_epochs = 100

saver = tf.train.Saver()

# Launch the graph
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Training cycle
    for epoch in range(n_epochs):
        total_batch = math.ceil(mnist.train.num_examples / batch_size)

        # Loop over all batches
        for i in range(total_batch):
            batch_features, batch_labels = mnist.train.next_batch(batch_size)
            sess.run(
                optimizer,
                feed_dict={features: batch_features, labels: batch_labels})

        # Print status for every 10 epochs
        if epoch % 10 == 0:
            valid_accuracy = sess.run(
                accuracy,
                feed_dict={
                    features: mnist.validation.images,
                    labels: mnist.validation.labels})
            print('Epoch {:<3} - Validation Accuracy: {}'.format(
                epoch,
                valid_accuracy))

    # Save the model
    saver.save(sess, save_file)
    print('Trained Model Saved.')


In [None]:
saver = tf.train.Saver()

# Launch the graph
with tf.Session() as sess:
    saver.restore(sess, save_file)

    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: mnist.test.images, labels: mnist.test.labels})

print('Test Accuracy: {}'.format(test_accuracy))

### Finetuning

#### Loading the Weights and Biases into a New Model
* Sometimes you might want to adjust, or "finetune," a model that you have already trained and saved.

* However, loading saved Variables directly into a modified model can generate errors. Let's go over how to avoid these problems.

* Naming Error
     * TensorFlow uses a string identifier, called `name`, for Tensors and Operations.
     * If a name is not given, TensorFlow will create one automatically. 
     * TensorFlow will give the first node the name `<Type>`, and then give the name `<Type>_<number>` for the subsequent nodes. 
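
The naming rule can be imitated with a small counter. `AutoNamer` below is a hypothetical stand-in, not a TensorFlow class. It hands out `Variable`, `Variable_1`, ... in creation order. It also appends `:0`, the output index that TensorFlow adds to tensor names. A fresh `AutoNamer` plays the role of a graph after `tf.reset_default_graph()`. The sketch shows that swapping the order in which `weights` and `bias` are created also swaps their names.

```python
from collections import defaultdict

class AutoNamer:
    """Hypothetical imitation of TensorFlow's automatic naming within one graph."""

    def __init__(self):
        self.counts = defaultdict(int)

    def name_for(self, op_type):
        n = self.counts[op_type]
        self.counts[op_type] += 1
        # First node gets the bare type name; later ones get _1, _2, ...
        op_name = op_type if n == 0 else '{}_{}'.format(op_type, n)
        return op_name + ':0'  # ':0' marks the operation's first output tensor

# Graph used for saving: weights created before bias
save_graph = AutoNamer()
saved_names = {'weights': save_graph.name_for('Variable'),
               'bias': save_graph.name_for('Variable')}

# Fresh graph used for loading: bias created before weights
load_graph = AutoNamer()
load_names = {'bias': load_graph.name_for('Variable'),
              'weights': load_graph.name_for('Variable')}

print(saved_names['weights'], saved_names['bias'])  # Variable:0 Variable_1:0
print(load_names['weights'], load_names['bias'])    # Variable_1:0 Variable:0
```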
     
**Let's see how this can affect loading a model with a different order of weights and bias:**

In [None]:
import tensorflow as tf

# Remove the previous weights and bias
tf.reset_default_graph()

save_file = 'model.ckpt'

# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Save Weights: {}'.format(weights.name))
print('Save Bias: {}'.format(bias.name))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, save_file)

# Remove the previous weights and bias
tf.reset_default_graph()

# Two Variables: weights and bias
bias = tf.Variable(tf.truncated_normal([3]))
weights = tf.Variable(tf.truncated_normal([2, 3]))

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Load Weights: {}'.format(weights.name))
print('Load Bias: {}'.format(bias.name))

with tf.Session() as sess:
    # Load the weights and bias - ERROR
    saver.restore(sess, save_file)

The code above prints out the following:

```
Save Weights: Variable:0
Save Bias: Variable_1:0
Load Weights: Variable_1:0
Load Bias: Variable:0
...
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match.
...
```

You'll notice that the `name` properties of `weights` and `bias` differ from the ones used when you saved the model. This is why the code produces the "Assign requires shapes of both tensors to match" error: `saver.restore(sess, save_file)` tries to load the weight data into `bias` and the bias data into `weights`.
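
One way to picture the failure is to treat the checkpoint as a dictionary from variable name to saved value. Restoring then means copying each value into the variable that has the same name. The `restore` and `shape_of` helpers below are a hypothetical plain-Python model of that lookup and its shape check. They are not the actual `tf.train.Saver` implementation.

```python
def shape_of(value):
    """Shape of a nested list, e.g. a 2x3 list of lists -> (2, 3)."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)

def restore(checkpoint, variables):
    """Hypothetical restore: copy each saved value into the same-named variable."""
    restored = {}
    for name, current in variables.items():
        saved = checkpoint[name]
        if shape_of(saved) != shape_of(current):
            msg = 'Assign requires shapes of both tensors to match: {} has {} but checkpoint has {}'
            raise ValueError(msg.format(name, shape_of(current), shape_of(saved)))
        restored[name] = saved
    return restored

# Checkpoint written when weights (2x3) was created first and bias (3,) second
checkpoint = {'Variable': [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
              'Variable_1': [0.0, 0.0, 0.0]}

# New graph created bias first, so bias is now the variable called 'Variable'
new_variables = {'Variable': [0.0, 0.0, 0.0],                    # bias, shape (3,)
                 'Variable_1': [[0.0] * 3 for _ in range(2)]}  # weights, shape (2, 3)

try:
    restore(checkpoint, new_variables)
except ValueError as err:
    print(err)
    # Assign requires shapes of both tensors to match: Variable has (3,) but checkpoint has (2, 3)
```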

Instead of letting TensorFlow set the name property, let's set it manually:

In [None]:
import tensorflow as tf

tf.reset_default_graph()

save_file = 'model.ckpt'

# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]), name='weights_0')
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Save Weights: {}'.format(weights.name))
print('Save Bias: {}'.format(bias.name))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, save_file)

# Remove the previous weights and bias
tf.reset_default_graph()

# Two Variables: weights and bias
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')
weights = tf.Variable(tf.truncated_normal([2, 3]), name='weights_0')

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Load Weights: {}'.format(weights.name))
print('Load Bias: {}'.format(bias.name))

with tf.Session() as sess:
    # Load the weights and bias - No Error
    saver.restore(sess, save_file)

print('Loaded Weights and Bias successfully.')

```
Save Weights: weights_0:0
Save Bias: bias_0:0
Load Weights: weights_0:0
Load Bias: bias_0:0
Loaded Weights and Bias successfully.
```

That worked! The Tensor names match and the data loaded correctly.

### TensorFlow Dropout

![alt-text](img/dropout-node.jpeg "Dropout")

* Figure 1: [Taken from the paper "Dropout: A Simple Way to Prevent Neural Networks from Overfitting" ](https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf)

* Dropout is a regularization technique for reducing overfitting. The technique temporarily drops units (artificial neurons) from the network, along with all of those units' incoming and outgoing connections. Figure 1 illustrates how dropout works.

* TensorFlow provides the `tf.nn.dropout()` function, which you can use to implement dropout.

* Let's look at an example of how to use `tf.nn.dropout()`.

```
keep_prob = tf.placeholder(tf.float32) # probability to keep units

hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])
```

The code above illustrates how to apply dropout to a neural network.

The `tf.nn.dropout()` function takes in two parameters:

* **hidden_layer**: the tensor to which you would like to apply dropout
* **keep_prob**: the probability of keeping (i.e. not dropping) any given unit
    `keep_prob` lets you adjust the expected fraction of units to drop. To compensate for the dropped units, `tf.nn.dropout()` multiplies every kept (i.e. not dropped) unit by `1/keep_prob`.

* During training, a good starting value for keep_prob is 0.5.

* During testing, use a keep_prob value of 1.0. This keeps all units, so the model uses its full capacity and its predictions are deterministic.
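
The scaling rule can be sketched in plain Python. The function below is an illustrative re-implementation of inverted dropout on a flat list, not TensorFlow's code:

* each unit survives with probability `keep_prob`;
* survivors are multiplied by `1/keep_prob`;
* the dropped units become 0.

Because of the scaling, each unit's expected value is unchanged. This is why `keep_prob=1.0` can be used at test time without any extra rescaling.

```python
import random

def dropout(units, keep_prob, rng=random):
    """Inverted dropout on a list of activations (illustrative, not TensorFlow's code).

    Each unit is kept with probability keep_prob and scaled by 1/keep_prob;
    dropped units become 0.0.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError('keep_prob must be in (0, 1]')
    return [u / keep_prob if rng.random() < keep_prob else 0.0 for u in units]

rng = random.Random(0)
hidden = [1.0, 2.0, 3.0, 4.0]

print(dropout(hidden, 0.5, rng))  # each entry is either 0.0 or twice its input
print(dropout(hidden, 1.0, rng))  # [1.0, 2.0, 3.0, 4.0]: nothing is dropped

# Averaged over many draws, each unit keeps its original expected value
trials = 20000
mean = [0.0] * len(hidden)
for _ in range(trials):
    for i, value in enumerate(dropout(hidden, 0.5, rng)):
        mean[i] += value / trials
print(mean)  # approximately [1.0, 2.0, 3.0, 4.0]
```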

**Quiz 1**
* **What's wrong with the code?**

* Take a look at the code snippet below. Do you see what's wrong?

* There's nothing wrong with the syntax; however, the test accuracy is extremely low.

```

keep_prob = tf.placeholder(tf.float32) # probability to keep units

hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

```
```

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for epoch_i in range(epochs):
        for batch_i in range(batches):
            ....

            sess.run(optimizer, feed_dict={
                features: batch_features,
                labels: batch_labels,
                keep_prob: 0.5})

    validation_accuracy = sess.run(accuracy, feed_dict={
        features: test_features,
        labels: test_labels,
        keep_prob: 0.5})
        
```
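* **Answer:** The evaluation run feeds `keep_prob: 0.5`, so roughly half of the hidden units are still dropped at random while accuracy is measured. For validation and testing, `keep_prob` should be 1.0 so that every unit is kept.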

**Quiz 2**

* This quiz starts with the code from the ReLU quiz and adds a dropout layer. Build a model with a ReLU layer and a dropout layer, using the `keep_prob` placeholder to pass in a probability of 0.5. Print the `logits` from the model.

* **Note**: Output will be different every time the code is run. This is caused by dropout randomizing the units it drops.

In [31]:
import tensorflow as tf

hidden_layer_weights = [
    [0.1, 0.2, 0.4],
    [0.4, 0.6, 0.6],
    [0.5, 0.9, 0.1],
    [0.8, 0.2, 0.8]]
out_weights = [
    [0.1, 0.6],
    [0.2, 0.1],
    [0.7, 0.9]]

# Weights and biases
weights = [
    tf.Variable(hidden_layer_weights),
    tf.Variable(out_weights)]
biases = [
    tf.Variable(tf.zeros(3)),
    tf.Variable(tf.zeros(2))]

# Input
features = tf.Variable([[0.0, 2.0, 3.0, 4.0], [0.1, 0.2, 0.3, 0.4], [11.0, 12.0, 13.0, 14.0]])

# TODO: Create Model with Dropout

keep_prob = tf.placeholder(tf.float32) # probability to keep units

hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])


# TODO: Print logits from a session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(logits, feed_dict={keep_prob: 0.5}))

[[ 2.98        7.5400004 ]
 [ 0.30800003  0.7700001 ]
 [38.460003   71.700005  ]]
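
The printed values can be checked by hand. The sketch below redoes the quiz's forward pass in plain Python. The `forward` helper and its explicit `masks` argument are illustrative and not part of the quiz. With no units dropped, it gives the deterministic logits.

The masks below are one choice consistent with the sample output above (1 = keep, 0 = drop). Rows 1 and 2 keep hidden units 0 and 1, and row 3 keeps units 0 and 2, each scaled by `1/0.5`; this reproduces that output. TensorFlow prints float32 values such as `7.5400004`, while plain Python uses float64.

```python
hidden_layer_weights = [
    [0.1, 0.2, 0.4],
    [0.4, 0.6, 0.6],
    [0.5, 0.9, 0.1],
    [0.8, 0.2, 0.8]]
out_weights = [
    [0.1, 0.6],
    [0.2, 0.1],
    [0.7, 0.9]]
features = [[0.0, 2.0, 3.0, 4.0], [0.1, 0.2, 0.3, 0.4], [11.0, 12.0, 13.0, 14.0]]

def matmul(a, b):
    """Matrix product of two lists of lists."""
    return [[sum(x * b[k][j] for k, x in enumerate(row)) for j in range(len(b[0]))]
            for row in a]

def forward(x, keep_prob=1.0, masks=None):
    """ReLU hidden layer, optional dropout mask per row, linear output (biases are zero, as in the quiz)."""
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, hidden_layer_weights)]
    if masks is not None:
        hidden = [[h * m / keep_prob for h, m in zip(row, mask)]
                  for row, mask in zip(hidden, masks)]
    return matmul(hidden, out_weights)

def rounded(m, digits=3):
    return [[round(v, digits) for v in row] for row in m]

print(rounded(forward(features)))
# [[4.78, 8.0], [0.511, 0.844], [24.01, 38.24]]

masks = [[1, 1, 0], [1, 1, 0], [1, 0, 1]]
print(rounded(forward(features, keep_prob=0.5, masks=masks)))
# [[2.98, 7.54], [0.308, 0.77], [38.46, 71.7]]
```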
