# TensorFlow Tutorials

In this Jupyter notebook we will go through 14 different topics:
1. **Installing TensorFlow on your machine**
2. **Hello world example**
3. **Tensors**
4. **Session**
5. **Linear functions**
6. **Softmax function**
7. **Cross Entropy**
8. **Mini-batching**
9. **Epochs**
10. **Rectified Linear Units (ReLUs)**
11. **Deep Neural Network**
12. **Save and Restore TensorFlow Models**
13. **Finetune**
14. **Dropout regularization**

Before learning about each of them, let's import all of the libraries that we are going to use here:

In [1]:
# Import the libraries
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import numpy as np
import math

<br>

## 1. Install TensorFlow

<hr>


### 1.1. Windows

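Run the following commands to set up your environment:

1. <code>conda create -n tensorflow python=3.5</code>
2. <code>activate tensorflow</code>
3. <code>conda install pandas matplotlib jupyter notebook scipy scikit-learn</code>
4. <code>pip install tensorflow==1.15</code> (this notebook uses the TensorFlow 1.x API, such as <code>tf.Session</code> and <code>tf.placeholder</code>, which is not available at the top level in TensorFlow 2)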

<br>

### 1.2. OS X or Linux

As usual, we'll be using Conda to install TensorFlow. You might already have a TensorFlow environment, but check to make sure you have all the necessary packages.

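Run the following commands to set up your environment:

1. <code>conda create -n tensorflow python=3.5</code>
2. <code>source activate tensorflow</code>
3. <code>conda install pandas matplotlib jupyter notebook scipy scikit-learn</code>
4. <code>pip install tensorflow==1.15</code> (this notebook uses the TensorFlow 1.x API, such as <code>tf.Session</code> and <code>tf.placeholder</code>, which is not available at the top level in TensorFlow 2)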

<br>

## 2. Hello World

<hr>

Now let's write TensorFlow code that prints "Hello World!".

In [2]:
# Create a constant tensor 
hello_constant = tf.constant('Hello World!')

In [3]:
# Create a Session and run the tensor inside of it
with tf.Session() as sess:
    output = sess.run(hello_constant)
    print(output)

b'Hello World!'


<br>

## 3. Tensor

<hr>

In TensorFlow, data isn’t stored as integers, floats, or strings. These values are encapsulated in an object called a tensor. In the case of <code>hello_constant = tf.constant('Hello World!')</code>, hello_constant is a 0-dimensional string tensor, but tensors come in a variety of sizes as shown below:

In [5]:
# Create a 0-dimensional constant tensor 
A = tf.constant(1234) 

# Create a Session and run the tensor inside of it
with tf.Session() as sess:
    output = sess.run(A)
    print(output)

1234


In [6]:
# Create a 1-dimensional constant tensor 
B = tf.constant([123,456,789]) 

# Create a Session and run the tensor inside of it
with tf.Session() as sess:
    output = sess.run(B)
    print(output)

[123 456 789]


In [8]:
# Create a 2-dimensional constant tensor 
C = tf.constant([[123,456,789], [222,333,444]])

# Create a Session and run the tensor inside of it
with tf.Session() as sess:
    output = sess.run(C)
    print(output)

[[123 456 789]
 [222 333 444]]


<code>tf.constant()</code> is one of many TensorFlow operations you will use in this lesson. The tensor returned by <code>tf.constant()</code> is called a constant tensor, because the value of the tensor never changes.
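To see how nested values map to a tensor's rank and shape, here is a small plain-Python sketch (no TensorFlow required). The helper shape_of is hypothetical and assumes rectangular nested lists; for the three constants above it reports the shapes (), (3,) and (2, 3), and the rank is simply the length of the shape.

```python
def shape_of(value):
    """Return the shape of a rectangular nested list as a tuple.

    Hypothetical helper for illustration only: it inspects just the first
    element at each level, so it assumes every row has the same length.
    """
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


for value in (1234, [123, 456, 789], [[123, 456, 789], [222, 333, 444]]):
    shape = shape_of(value)
    print('rank', len(shape), 'shape', shape)
```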

<br>

## 4. Session

<hr>

TensorFlow’s API is built around the idea of a computational graph, a way of visualizing a mathematical process, which you learned about in the MiniFlow lesson. Let’s take the TensorFlow code you ran and turn it into a graph:

<img width="500px" src="assets/session.png">

A "TensorFlow Session", as shown above, is an environment for running a graph. The session is in charge of allocating the operations to GPU(s) and/or CPU(s), including remote machines. Let’s see how you use it.

In [9]:
# Create a constant tensor 
hello_constant = tf.constant('Hello World!')

# Create a Session and run the tensor inside of it
with tf.Session() as sess:
    output = sess.run(hello_constant)
    print(output)

b'Hello World!'


The code first creates the tensor <code>hello_constant</code>. The next step is to evaluate the tensor in a session.

The code creates a session instance, <code>sess</code>, using <code>tf.Session</code>. The <code>sess.run()</code> function then evaluates the tensor and returns the results.

<br>

## 5. Linear functions

<hr>

The most common operation in neural networks is calculating the linear combination of inputs, weights, and biases. As a reminder, we can write the output of the linear operation as 

$$y = Wx + b$$

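Here, W is a matrix of the weights connecting two layers. The output y, the input x, and the biases b are all vectors. The TensorFlow code in this notebook stores each input as a row, so the same operation is written there as xW + b.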
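As a minimal plain-Python sketch of the row-vector form used later in this notebook (compare tf.add(tf.matmul(input, w), b) in the linear() function), here is xW + b for a single input row. The function name linear_row and the numbers are made up for illustration:

```python
def linear_row(x, W, b):
    """Compute xW + b for one input row x using plain lists (no TensorFlow).

    x has n_features entries, W is n_features x n_labels, b has n_labels entries.
    """
    n_features, n_labels = len(W), len(b)
    assert len(x) == n_features
    return [sum(x[i] * W[i][j] for i in range(n_features)) + b[j]
            for j in range(n_labels)]


# Made-up example values: 2 features, 3 labels
x_example = [1.0, 2.0]
W_example = [[0.5, -1.0, 0.0],
             [0.25, 1.0, 2.0]]
b_example = [0.1, 0.2, 0.3]
print(linear_row(x_example, W_example, b_example))  # approximately [1.1, 1.2, 4.3]
```

Each output entry is a weighted sum of the inputs plus that label's bias, which is exactly what tf.matmul followed by tf.add computes for every row of a batch.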

The goal of training a neural network is to modify weights and biases to best predict the labels. To use weights and biases, you'll need a tensor that can be modified. This rules out tf.placeholder() and tf.constant(), since those tensors can't be modified. This is where the tf.Variable class comes in.

**tf.Variable()**

In [6]:
# Create a variable that holds number 5
x = tf.Variable(5)


The tf.Variable class creates a tensor with an initial value that can be modified, much like a normal Python variable. This tensor stores its state in the session, so you must initialize the state of the tensor manually. You'll use the tf.global_variables_initializer() function to initialize the state of all the Variable tensors.

In [7]:
# Initialize the variables
init = tf.global_variables_initializer()

# Open a session and run the initialization in it
with tf.Session() as sess:
    sess.run(init)

The tf.global_variables_initializer() call returns an operation that will initialize all TensorFlow variables from the graph. You call the operation using a session to initialize all the variables as shown above. Using the tf.Variable class allows us to change the weights and bias, but an initial value needs to be chosen.

Initializing the weights with random numbers from a normal distribution is good practice. Randomizing the weights helps keep the model from becoming stuck in the same place every time you train it. You'll learn more about this in the next lesson, when you study gradient descent.

Similarly, choosing weights from a normal distribution prevents any one weight from overwhelming other weights. You'll use the tf.truncated_normal() function to generate random numbers from a normal distribution.

**tf.truncated_normal()**

In [8]:
# Create a weights variable initialized from a truncated normal distribution
n_features = 120
n_labels = 5
weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))

The tf.truncated_normal() function returns a tensor with random values from a normal distribution, where any value more than 2 standard deviations from the mean is dropped and re-drawn. As a result, every value lies within 2 standard deviations of the mean.
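The "drop and re-draw" behaviour can be sketched in plain Python with rejection sampling. This only illustrates the distribution described above; the function name is hypothetical and TensorFlow's actual sampling code may work differently:

```python
import random


def truncated_normal_samples(n, mean=0.0, stddev=1.0, seed=None):
    """Draw n values from a normal distribution, re-drawing any value that
    falls more than 2 standard deviations from the mean (rejection sampling)."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2 * stddev:
            samples.append(value)
    return samples


tn_samples = truncated_normal_samples(10000, seed=0)
print(min(tn_samples), max(tn_samples))  # both lie within [-2.0, 2.0]
```

Because roughly 95% of normal draws already fall within 2 standard deviations, only a small fraction of draws has to be repeated.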

Since the weights are already helping prevent the model from getting stuck, you don't need to randomize the bias. Let's use the simplest solution, setting the bias to 0.

**tf.zeros()**

In [9]:
# Create a bias variable initialized to zeros
n_labels = 5
bias = tf.Variable(tf.zeros(n_labels))

<br>

## 6. TensorFlow Softmax

<hr>

The softmax function squashes its inputs, typically called logits or logit scores, to be between 0 and 1 and also normalizes the outputs such that they all sum to 1. This means the output of the softmax function is equivalent to a categorical probability distribution. It's the perfect function to use as the output activation for a network predicting multiple classes.

<img width="400px" src="assets/softmax-input-output.png">

We're using TensorFlow to build neural networks and, appropriately, there's a function for calculating softmax.

In [13]:
# Apply softmax function to the following array
values = [1.2, 0.9, 0.4]
x = tf.nn.softmax(values)

# Open a session and run the softmax in it
with tf.Session() as sess:
    output = sess.run(x)
    print(output)

[0.45659032 0.3382504  0.20515925]


Easy as that! tf.nn.softmax() implements the softmax function for you. It takes in logits and returns softmax activations.
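Under the hood, softmax computes $\sigma(z)_i = e^{z_i} / \sum_j e^{z_j}$. A minimal plain-Python sketch (the name softmax_py is just for illustration) agrees with the values printed above, up to float32 rounding. Subtracting the largest logit first is a common numerical-stability trick that doesn't change the result:

```python
import math


def softmax_py(logits):
    """Plain-Python softmax over a list of logits."""
    # Shifting by the max logit avoids overflow in exp() and leaves the output unchanged
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax_py([1.2, 0.9, 0.4])
print(probs)       # approximately [0.4566, 0.3383, 0.2052]
print(sum(probs))  # 1.0, up to floating-point rounding
```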

In [14]:
# Again use the softmax function for the following array
# This time, feed the values in through a placeholder
logit_data = [1.2, 0.9, 0.4]
logits = tf.placeholder(tf.float32)
softmax = tf.nn.softmax(logits)

# Open a session and run the softmax in it
with tf.Session() as sess:
    output = sess.run(softmax, feed_dict={logits: logit_data})
    print(output)

[0.45659032 0.3382504  0.20515925]


Recall that the tf.zeros() function returns a tensor with all zeros. Now let's put these pieces together.

You'll be classifying the handwritten digits 0, 1, and 2 from the MNIST dataset using TensorFlow. In this data, some of the 1s are written with a serif at the top and at different angles. These similarities and differences will play a part in shaping the weights of the model.

In [15]:
# Write a function for returning weights
def get_weights(n_features, n_labels):
    """
    Return TensorFlow weights
    :param n_features: Number of features
    :param n_labels: Number of labels
    :return: TensorFlow weights
    """
    return tf.Variable(tf.truncated_normal((n_features, n_labels)))


# Write a function for returning biases
def get_biases(n_labels):
    """
    Return TensorFlow bias
    :param n_labels: Number of labels
    :return: TensorFlow bias
    """
    return tf.Variable(tf.zeros(n_labels))


# Write a function for returning linear function
def linear(input, w, b):
    """
    Return linear function in TensorFlow
    :param input: TensorFlow input
    :param w: TensorFlow weights
    :param b: TensorFlow biases
    :return: TensorFlow linear function
    """
    return tf.add(tf.matmul(input, w), b)

In [16]:
# Get the number of features (28*28 image is 784 features)
n_features = 784

# Get the number of labels
n_labels = 3

In [17]:
# Create placeholders for features and labels
features = tf.placeholder(tf.float32)
labels = tf.placeholder(tf.float32)

In [21]:
# Get the weights and biases
w = get_weights(n_features, n_labels)
b = get_biases(n_labels)

In [22]:
# Get the linear function (xW + b)
logits = linear(features, w, b)

In [24]:
### Make the features and labels ready for training

# Initialize an empty array for features and labels
mnist_features = []
mnist_labels = []

# Load the MNIST dataset
mnist = input_data.read_data_sets('datasets/ud730/mnist', one_hot=True)

# Iterating through features and labels
for mnist_feature, mnist_label in zip(*mnist.train.next_batch(10000)):

    # Keep only the samples whose label is one of the first n_labels digits (0, 1 or 2)
    if mnist_label[:n_labels].any():
        mnist_features.append(mnist_feature)
        mnist_labels.append(mnist_label[:n_labels])

# Assign the features and labels for training
train_features = mnist_features
train_labels = mnist_labels

Extracting datasets/ud730/mnist/train-images-idx3-ubyte.gz
Extracting datasets/ud730/mnist/train-labels-idx1-ubyte.gz
Extracting datasets/ud730/mnist/t10k-images-idx3-ubyte.gz
Extracting datasets/ud730/mnist/t10k-labels-idx1-ubyte.gz


In [26]:
# Session
with tf.Session() as session:
    
    # Initialize the variables
    session.run(tf.global_variables_initializer())

    # Softmax
    prediction = tf.nn.softmax(logits)

    # Cross entropy
    # This quantifies how far off the predictions were.
    # You'll learn more about this in future lessons.
    cross_entropy = -tf.reduce_sum(labels * tf.log(prediction), reduction_indices=1)

    # Training loss
    # You'll learn more about this in future lessons.
    loss = tf.reduce_mean(cross_entropy)

    # Rate at which the weights are changed
    # You'll learn more about this in future lessons.
    learning_rate = 0.08

    # Gradient Descent
    # This is the method used to train the model
    # You'll learn more about this in future lessons.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    # Run optimizer and get loss
    _, l = session.run(
        [optimizer, loss],
        feed_dict={features: train_features, labels: train_labels})

# Print loss
print('Loss: {}'.format(l))

Loss: 5.5296101570129395


<br>

## 7. Cross Entropy

<hr>

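As with the softmax function, TensorFlow has a function that does the cross entropy calculation for us (tf.nn.softmax_cross_entropy_with_logits(), used later in this notebook). First, though, let's build it ourselves.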


<img width="500px" src="assets/cross-entropy-diagram.png">

To create a cross entropy function in TensorFlow, you'll need to use two new functions:

- tf.reduce_sum()
- tf.log()


In [27]:
# Apply reduce_sum to the following array
values = [1, 2, 3, 4, 5]
x = tf.reduce_sum(values)  

# Open a session and run the tensor in it
with tf.Session() as sess:
    output = sess.run(x)
    print(output)

15


The tf.reduce_sum() function takes an array of numbers and sums them together.

In [28]:
# Apply log to the following value
value = 100.0
x = tf.log(value) 

# Open a session and run the tensor in it
with tf.Session() as sess:
    output = sess.run(x)
    print(output)

4.6051702


This function does exactly what you would expect: tf.log() takes the natural log of a number. Now compute and print the cross entropy using softmax_data and one_hot_data.

In [4]:
# Softmax and one hot encoded data
softmax_data = [0.7, 0.2, 0.1]
one_hot_data = [1.0, 0.0, 0.0]

# Create placeholder for softmax and one hot
softmax = tf.placeholder(tf.float32)
one_hot = tf.placeholder(tf.float32)

# Get the cross-entropy
cross_entropy = -tf.reduce_sum(tf.multiply(one_hot, tf.log(softmax)))

# Session
with tf.Session() as sess:
    
    # Printing the cross entropy of given arrays
    print(sess.run(cross_entropy, feed_dict={softmax: softmax_data, one_hot: one_hot_data}))

0.35667497
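The cell above computes $D(S, L) = -\sum_i L_i \ln S_i$ for the softmax output S and one-hot label L. Because L is one-hot, only the term for the correct class survives, so the result is $-\ln(0.7) \approx 0.3567$. Here is the same formula as a plain-Python sketch (the function name is illustrative):

```python
import math


def cross_entropy_py(softmax_output, one_hot_label):
    """Cross entropy D(S, L) = -sum_i L_i * ln(S_i) for plain lists."""
    return -sum(l * math.log(s) for s, l in zip(softmax_output, one_hot_label))


ce = cross_entropy_py([0.7, 0.2, 0.1], [1.0, 0.0, 0.0])
print(ce)  # approximately 0.3567, i.e. -ln(0.7)
```

A confident correct prediction (S close to 1 for the true class) gives a cross entropy close to 0, while a low probability for the true class makes it grow quickly.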


<br>

## 8. Mini-batching

<hr>

In this section, you'll go over what mini-batching is and how to apply it in TensorFlow.

Mini-batching is a technique for training on subsets of the dataset instead of all the data at one time. This provides the ability to train a model, even if a computer lacks the memory to store the entire dataset.

Mini-batching is computationally inefficient, since you can't calculate the loss simultaneously across all samples. However, this is a small price to pay in order to be able to run the model at all.

It's also quite useful combined with stochastic gradient descent (SGD). The idea is to randomly shuffle the data at the start of each epoch, then create the mini-batches. For each mini-batch, you train the network weights with gradient descent. Since these batches are random, you're performing SGD with each batch.
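The shuffle-then-batch idea can be sketched in plain Python. The generator shuffled_batches and the toy data are hypothetical; note that the batches() function used later in this notebook does not shuffle:

```python
import random


def shuffled_batches(features, labels, batch_size, seed=None):
    """Yield (features, labels) mini-batches in a fresh random order."""
    assert len(features) == len(labels)
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch_idx = indices[start:start + batch_size]
        yield [features[i] for i in batch_idx], [labels[i] for i in batch_idx]


# Toy data: 10 samples whose label is just the parity of the feature
toy_features = list(range(10))
toy_labels = [x % 2 for x in toy_features]

for epoch in range(2):
    # A different seed per epoch gives a different batch composition each epoch
    for batch_x, batch_y in shuffled_batches(toy_features, toy_labels, 4, seed=epoch):
        print(epoch, batch_x, batch_y)
```

Every sample appears exactly once per epoch, and features stay paired with their labels because both are indexed with the same shuffled indices.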

Let's look at the MNIST dataset with weights and a bias to see if your machine can handle it.

In [29]:
n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)
learning_rate = 0.001

In [30]:
# Load the MNIST dataset
mnist = input_data.read_data_sets('datasets/ud730/mnist', one_hot=True)

Extracting datasets/ud730/mnist/train-images-idx3-ubyte.gz
Extracting datasets/ud730/mnist/train-labels-idx1-ubyte.gz
Extracting datasets/ud730/mnist/t10k-images-idx3-ubyte.gz
Extracting datasets/ud730/mnist/t10k-labels-idx1-ubyte.gz


In [32]:
# Get the features 
train_features = mnist.train.images
test_features = mnist.test.images

# Get the labels 
train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

In [33]:
# Get the weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

In order to use mini-batching, you must first divide your data into batches.

Unfortunately, it's sometimes impossible to divide the data into batches of exactly equal size. For example, imagine you'd like to create batches of 128 samples each from a dataset of 1000 samples. Since 128 does not evenly divide into 1000, you'd wind up with 7 batches of 128 samples, and 1 batch of 104 samples. (7*128 + 1*104 = 1000)
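The arithmetic in that example can be checked with a few lines of plain Python (the variable names are illustrative):

```python
import math

n_samples = 1000
example_batch_size = 128

# A new batch starts every example_batch_size samples; the last one takes whatever is left
batch_sizes = [min(example_batch_size, n_samples - start)
               for start in range(0, n_samples, example_batch_size)]

print(math.ceil(n_samples / example_batch_size))  # 8 batches in total
print(batch_sizes)  # seven batches of 128, then one of 104
```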

In that case, the size of the batches would vary, so you need to take advantage of TensorFlow's tf.placeholder() function to receive the varying batch sizes.

Continuing the example, if each sample had n_input = 784 features and n_classes = 10 possible labels, the dimensions for features would be [None, n_input] and labels would be [None, n_classes].

In [14]:
# Create placeholder for features and labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

What does None do here?

The None dimension is a placeholder for the batch size. At runtime, TensorFlow will accept any batch size greater than 0.

Going back to our earlier example, this setup allows you to feed features and labels into the model as either the batches of 128 samples or the single batch of 104 samples.

Implement the batches function to batch features and labels. The function should return each batch with a maximum size of batch_size. The cells below define some sample features and labels, followed by a working implementation of batches.

In [36]:
# Sample example for features
example_features = [
    ['F11','F12','F13','F14'],
    ['F21','F22','F23','F24'],
    ['F31','F32','F33','F34'],
    ['F41','F42','F43','F44']]

# Sample example for labels
example_labels = [
    ['L11','L12'],
    ['L21','L22'],
    ['L31','L32'],
    ['L41','L42']]

In [39]:
# Function for creating batches
def batches(batch_size, features, labels):
    """
    Create batches of features and labels
    :param batch_size: The batch size
    :param features: List of features
    :param labels: List of labels
    :return: Batches of (Features, Labels)
    """
    # Make sure the length of features and labels are the same
    assert len(features) == len(labels)
    
    # Initialize an empty array for batches
    output_batches = []
    
    # Get the length of our dataset
    sample_size = len(features)
    
    # Step through the dataset in increments of batch_size
    for start_i in range(0, sample_size, batch_size):
        
        # Get the next index
        end_i = start_i + batch_size
        
        # Create the batch
        batch = [features[start_i:end_i], labels[start_i:end_i]]
        
        # Append the batch to output_batches array
        output_batches.append(batch)
        
    return output_batches

Let's use mini-batching to feed batches of MNIST features and labels into a linear model.

Set the batch size and run the optimizer over all the batches with the batches function. The recommended batch size is 128. If you have memory restrictions, feel free to make it smaller.

In [45]:
# Create logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss function
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))

# Define an optimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Set the batch size
batch_size = 128

# Initialize the variables
init = tf.global_variables_initializer()

# Session
with tf.Session() as sess:
    
    # Run the variable initialization
    sess.run(init)
    
    # Train optimizer on all batches
    for batch_features, batch_labels in batches(batch_size, train_features, train_labels):
        sess.run(optimizer, feed_dict={features: batch_features, labels: batch_labels})

    # Calculate accuracy for test dataset
    test_accuracy = sess.run(accuracy, feed_dict={features: test_features, labels: test_labels})

print('Test Accuracy: {:.2f}'.format(test_accuracy))

Test Accuracy: 0.70


The accuracy is low, partly because the model has seen each training sample only once. You can train a model on the dataset multiple times; you'll go over this subject in the next section, where we talk about "epochs".

<br>

## 9. Epochs

<hr>

An epoch is a single forward and backward pass of the whole dataset. This is used to increase the accuracy of the model without requiring more data. This section will cover epochs in TensorFlow and how to choose the right number of epochs.
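To make the idea concrete without TensorFlow, here is a plain-Python sketch that fits a hypothetical line y = 2x + 1 with mini-batch gradient descent on mean squared error, running many epochs over the same small dataset. The data, learning rate and epoch count are made-up assumptions chosen only for illustration:

```python
# Hypothetical toy data lying exactly on the line y = 2x + 1
toy_x = [i / 10 for i in range(20)]
toy_y = [2 * x + 1 for x in toy_x]

toy_w, toy_b = 0.0, 0.0  # parameters to learn
toy_learning_rate = 0.1
toy_batch_size = 5
toy_epochs = 200


def mean_squared_error(w, b):
    """Mean squared error of the line w * x + b over the whole toy dataset."""
    return sum((w * x + b - y) ** 2 for x, y in zip(toy_x, toy_y)) / len(toy_x)


losses = []
for epoch in range(toy_epochs):
    # One epoch: one pass over every mini-batch of the training data
    for start in range(0, len(toy_x), toy_batch_size):
        bx = toy_x[start:start + toy_batch_size]
        by = toy_y[start:start + toy_batch_size]
        # Gradient of this batch's mean squared error with respect to w and b
        errors = [toy_w * x + toy_b - y for x, y in zip(bx, by)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, bx)) / len(bx)
        grad_b = 2 * sum(errors) / len(bx)
        toy_w -= toy_learning_rate * grad_w
        toy_b -= toy_learning_rate * grad_b
    losses.append(mean_squared_error(toy_w, toy_b))

print('loss after epoch 1:', losses[0])
print('loss after the last epoch:', losses[-1])
print('learned w, b:', toy_w, toy_b)  # close to 2 and 1
```

The dataset never grows, yet the loss keeps shrinking from epoch to epoch because each pass applies more gradient updates to the same parameters.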

The following TensorFlow code trains a model for the number of epochs set by the epochs hyperparameter (80 below).

In [46]:
n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

In [47]:
# Load the MNIST dataset
mnist = input_data.read_data_sets('datasets/ud730/mnist', one_hot=True)

Extracting datasets/ud730/mnist/train-images-idx3-ubyte.gz
Extracting datasets/ud730/mnist/train-labels-idx1-ubyte.gz
Extracting datasets/ud730/mnist/t10k-images-idx3-ubyte.gz
Extracting datasets/ud730/mnist/t10k-labels-idx1-ubyte.gz


In [48]:
# Get the features
train_features = mnist.train.images
valid_features = mnist.validation.images
test_features = mnist.test.images

In [49]:
# Get the labels
train_labels = mnist.train.labels.astype(np.float32)
valid_labels = mnist.validation.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

In [50]:
# Create placeholder for features and labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

In [51]:
# Get the weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

In [52]:
# Create the logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

In [53]:
# Create a placeholder for learning rate
learning_rate = tf.placeholder(tf.float32)

# Define the loss function
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))

# Define the optimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

In [54]:
# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

In [55]:
# Initialize the variables
init = tf.global_variables_initializer()

In [56]:
# Hyperparameters
batch_size = 128
epochs = 80
learn_rate = 0.1

In [57]:
# Create batches for the training
train_batches = batches(batch_size, train_features, train_labels)

In [59]:
### Training

# Session
with tf.Session() as sess:
    
    # Run the variable initialization
    sess.run(init)

    # Loop epoch times
    for epoch_i in range(epochs):

        # Loop over all batches and get the feature and label
        for batch_features, batch_labels in train_batches:
            
            # Create a feed_dict for training optimizer
            train_feed_dict = {features: batch_features,
                               labels: batch_labels,
                               learning_rate: learn_rate}
            
            # Train the optimizer
            sess.run(optimizer, feed_dict=train_feed_dict)

        # Get the cost on the last training batch
        current_cost = sess.run(cost, feed_dict={features: batch_features, labels: batch_labels})
        
        # Get the accuracy for validation set
        valid_accuracy = sess.run(accuracy, feed_dict={features: valid_features, labels: valid_labels})
        
        # Print the cost and accuracy
        print('Epoch: {:<4} - Cost: {:<8.3} Valid Accuracy: {:<5.3}'.format(epoch_i,
                                                                        current_cost,
                                                                        valid_accuracy))

    # Calculate accuracy for test dataset
    test_accuracy = sess.run(accuracy, feed_dict={features: test_features, labels: test_labels})

print('Test Accuracy: {}'.format(test_accuracy))

Epoch: 0    - Cost: 1.4      Valid Accuracy: 0.718
Epoch: 1    - Cost: 0.97     Valid Accuracy: 0.794
Epoch: 2    - Cost: 0.813    Valid Accuracy: 0.825
Epoch: 3    - Cost: 0.728    Valid Accuracy: 0.845
Epoch: 4    - Cost: 0.675    Valid Accuracy: 0.856
Epoch: 5    - Cost: 0.639    Valid Accuracy: 0.863
Epoch: 6    - Cost: 0.613    Valid Accuracy: 0.868
Epoch: 7    - Cost: 0.591    Valid Accuracy: 0.871
Epoch: 8    - Cost: 0.572    Valid Accuracy: 0.875
Epoch: 9    - Cost: 0.554    Valid Accuracy: 0.877
Epoch: 10   - Cost: 0.539    Valid Accuracy: 0.879
Epoch: 11   - Cost: 0.525    Valid Accuracy: 0.881
Epoch: 12   - Cost: 0.511    Valid Accuracy: 0.882
Epoch: 13   - Cost: 0.499    Valid Accuracy: 0.886
Epoch: 14   - Cost: 0.487    Valid Accuracy: 0.887
Epoch: 15   - Cost: 0.476    Valid Accuracy: 0.889
Epoch: 16   - Cost: 0.466    Valid Accuracy: 0.891
Epoch: 17   - Cost: 0.456    Valid Accuracy: 0.893
Epoch: 18   - Cost: 0.446    Valid Accuracy: 0.894
Epoch: 19   - Cost: 0.437    Va

<br>

## 10. ReLUs

<hr>

TensorFlow provides the ReLU function as tf.nn.relu(), as shown below.

<pre><code># Hidden Layer with ReLU activation function
hidden_layer = tf.add(tf.matmul(features, hidden_weights), hidden_biases)
hidden_layer = tf.nn.relu(hidden_layer)
output = tf.add(tf.matmul(hidden_layer, output_weights), output_biases)</code></pre>

The above code applies the tf.nn.relu() function to hidden_layer, which sets every negative value in the layer's output to zero and leaves positive values unchanged, acting like an on/off switch for each unit. Adding additional layers, like the output layer, after an activation function turns the model into a nonlinear function. This nonlinearity allows the network to solve more complex problems.

Below you'll use the ReLU function to turn a linear single layer network into a non-linear multilayer network.

<img width="500px" src="assets/relu-network.png">

In [6]:
# Weights for hidden layer & output
hidden_layer_weights = [[0.1, 0.2, 0.4],
                        [0.4, 0.6, 0.6],
                        [0.5, 0.9, 0.1],
                        [0.8, 0.2, 0.8]]

out_weights = [[0.1, 0.6],
               [0.2, 0.1],
               [0.7, 0.9]]

In [7]:
# Weights and biases
weights = [tf.Variable(hidden_layer_weights),
           tf.Variable(out_weights)]

biases = [tf.Variable(tf.zeros(3)),
          tf.Variable(tf.zeros(2))]

In [8]:
# Input
features = tf.Variable([[1.0, 2.0, 3.0, 4.0], 
                        [-1.0, -2.0, -3.0, -4.0], 
                        [11.0, 12.0, 13.0, 14.0]])

In [9]:
# 1st hidden layer
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])

# Apply RELU
hidden_layer = tf.nn.relu(hidden_layer)

# Get the output
logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

In [10]:
# Session
with tf.Session() as sess:
    # Initialize the variables
    sess.run(tf.global_variables_initializer())
    
    # Print the logits
    print(sess.run(logits))

[[ 5.11      8.440001]
 [ 0.        0.      ]
 [24.010002 38.239998]]
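To check these logits by hand, here is a plain-Python sketch of the same forward pass using the weights and inputs above. The helpers matmul and relu are written here for illustration and are not TensorFlow functions:

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def relu(matrix):
    """Element-wise ReLU: replace negative entries with 0."""
    return [[max(0.0, v) for v in row] for row in matrix]


hidden_layer_weights = [[0.1, 0.2, 0.4],
                        [0.4, 0.6, 0.6],
                        [0.5, 0.9, 0.1],
                        [0.8, 0.2, 0.8]]
out_weights = [[0.1, 0.6],
               [0.2, 0.1],
               [0.7, 0.9]]
relu_inputs = [[1.0, 2.0, 3.0, 4.0],
               [-1.0, -2.0, -3.0, -4.0],
               [11.0, 12.0, 13.0, 14.0]]

# The biases in the example above are all zeros, so adding them is skipped here
hidden = relu(matmul(relu_inputs, hidden_layer_weights))
logits_by_hand = matmul(hidden, out_weights)
print(logits_by_hand)  # approximately [[5.11, 8.44], [0.0, 0.0], [24.01, 38.24]]
```

The second input row produces only negative hidden values, so ReLU zeroes them all and the output for that row is 0, which is exactly the row of zeros TensorFlow printed.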


<br>

## 11. Deep Neural Network

<hr>

You've seen how to build a logistic classifier using TensorFlow. Now you're going to see how to use the logistic classifier to build a deep neural network. In the following walkthrough, we'll step through TensorFlow code written to classify the handwritten digits in the MNIST database.

In [3]:
# Load the dataset
mnist = input_data.read_data_sets(".", one_hot=True, reshape=False)

Extracting ./train-images-idx3-ubyte.gz
Extracting ./train-labels-idx1-ubyte.gz
Extracting ./t10k-images-idx3-ubyte.gz
Extracting ./t10k-labels-idx1-ubyte.gz


In [4]:
# Parameters
learning_rate = 0.001
training_epochs = 20
batch_size = 128  # Decrease batch size if you don't have enough memory
display_step = 1

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)
n_hidden_layer = 256  # number of units in the hidden layer

In [6]:
# Store layers weight & bias
weights = {'hidden_layer': tf.Variable(tf.random_normal([n_input, n_hidden_layer])),
           'out': tf.Variable(tf.random_normal([n_hidden_layer, n_classes]))}
biases = {'hidden_layer': tf.Variable(tf.random_normal([n_hidden_layer])),
          'out': tf.Variable(tf.random_normal([n_classes]))}

In [7]:
# Create placeholder for input and output
x = tf.placeholder("float", [None, 28, 28, 1])
y = tf.placeholder("float", [None, n_classes])

x_flat = tf.reshape(x, [-1, n_input])

In [8]:
# 1st Hidden layer
layer_1 = tf.add(tf.matmul(x_flat, weights['hidden_layer']), biases['hidden_layer'])

# Apply RELU
layer_1 = tf.nn.relu(layer_1)

In [9]:
# Output layer with linear activation
logits = tf.add(tf.matmul(layer_1, weights['out']), biases['out'])

In [11]:
# Define the loss function
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))

# Define the optimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
    .minimize(cost)

In [12]:
# Initialize the variables
init = tf.global_variables_initializer()

# Session (launching the graph)
with tf.Session() as sess:
    
    # Run the variable initialization
    sess.run(init)
    
    # Loop epoch times
    for epoch in range(training_epochs):
        
        # Get the total number of batches
        total_batch = int(mnist.train.num_examples/batch_size)
        
        # Loop over batches
        for i in range(total_batch):
            
            # Get the input and output
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            
            # Run optimization op (backprop) and cost op (to get loss value)
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})

That's it! Going from one layer to two is easy. Adding more layers to the network allows you to solve more complicated problems.
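To get a feel for the size of this two-layer network, here is the arithmetic for its trainable parameters, using the n_input, n_hidden_layer and n_classes values defined above:

```python
n_input = 784        # 28*28 pixels
n_hidden_layer = 256
n_classes = 10

hidden_params = n_input * n_hidden_layer + n_hidden_layer  # weights + biases
output_params = n_hidden_layer * n_classes + n_classes     # weights + biases

print(hidden_params, output_params, hidden_params + output_params)  # 200960 2570 203530
```

Almost all of the parameters sit in the first layer, because every one of the 784 inputs connects to every hidden unit.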

<br>

## 12. Save and Restore TensorFlow Models

<hr>

Training a model can take hours. But once you close your TensorFlow session, you lose all the trained weights and biases. If you wanted to reuse the model in the future, you would have to train it all over again!

Fortunately, TensorFlow gives you the ability to save your progress using a class called tf.train.Saver. This class provides the functionality to save any tf.Variable to your file system.

Let's start with a simple example of saving weights and bias Tensors. For the first example you'll just save two variables. Later examples will save all the weights in a practical model.

In [4]:
# Path of the saved model
save_file = './model.ckpt'

# Create two tensor variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

# Session
with tf.Session() as sess:
    
    # Initialize all the Variables
    sess.run(tf.global_variables_initializer())

    # Show the values of weights
    print("Weights: \n", sess.run(weights), "\n")
    
    # Show the values of bias
    print("Bias: \n", sess.run(bias))

    # Save the model
    saver.save(sess, save_file)

Weights: 
 [[ 1.0848122   0.48442674 -0.21687426]
 [-0.89066917  0.35860372 -0.05830318]] 

Bias: 
 [ 0.5093955   0.02881323 -0.9548363 ]


The Tensors weights and bias are set to random values using the tf.truncated_normal() function. The values are then saved to the save_file location, "model.ckpt", using the tf.train.Saver.save() function. (The ".ckpt" extension stands for "checkpoint".)

If you're using TensorFlow 0.11.0RC1 or newer, a file called "model.ckpt.meta" will also be created. This file contains the TensorFlow graph.

Now that the Tensor Variables are saved, let's load them back into a new model.

In [6]:
# Path of the saved model
save_file = './model.ckpt'

# Remove the previous weights and bias
tf.reset_default_graph()

# Create two tensor variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

# Session
with tf.Session() as sess:
    
    # Load the weights and bias
    saver.restore(sess, save_file)

    # Show the values of weights
    print("Weights: \n", sess.run(weights), "\n")
    
    # Show the values of bias
    print("Bias: \n", sess.run(bias))

INFO:tensorflow:Restoring parameters from ./model.ckpt
Weights: 
 [[ 1.1212208  -0.5115597  -0.2281799 ]
 [ 0.9504565   0.49948028 -0.7823087 ]] 

Bias: 
 [-0.0563532  -1.6072345   0.83857113]


You'll notice you still need to create the weights and bias Tensors in Python. The tf.train.Saver.restore() function loads the saved data into weights and bias.

Since tf.train.Saver.restore() sets all the TensorFlow Variables, you don't need to call tf.global_variables_initializer().
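Conceptually, a checkpoint maps each variable's name to its saved value, and restoring assigns those values over whatever the new variables started with. The toy sketch below mimics that idea with a plain dict and the json module. The save, restore and random_variables helpers and the file name are hypothetical, and this is not TensorFlow's actual checkpoint format.

```python
import json
import os
import random
import tempfile


def save(variables, path):
    """Toy stand-in for Saver.save: write a {name: value} mapping to disk."""
    with open(path, "w") as f:
        json.dump(variables, f)


def restore(variables, path):
    """Toy stand-in for Saver.restore: overwrite each variable with its saved value."""
    with open(path) as f:
        stored = json.load(f)
    for name in variables:
        variables[name] = stored[name]


def random_variables(rng):
    # Plain Gaussian values standing in for tf.truncated_normal([2, 3]) and ([3])
    return {"weights": [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(2)],
            "bias": [rng.gauss(0.0, 1.0) for _ in range(3)]}


rng = random.Random(0)
path = os.path.join(tempfile.mkdtemp(), "toy_model.json")  # hypothetical file name

saved_vars = random_variables(rng)
save(saved_vars, path)

# A "new model" starts from different random values...
new_vars = random_variables(rng)
print(new_vars == saved_vars)  # False

# ...and restoring replaces every one of them, so no separate initialization is needed
restore(new_vars, path)
print(new_vars == saved_vars)  # True
```

Because restore() writes every variable it knows about, running an initializer first would only produce values that are immediately overwritten.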

Let's see how to train a model and save its weights.

First start with a model:

In [9]:
# Remove previous Tensors and Operations
tf.reset_default_graph()

# Parameters
learning_rate = 0.001
n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Load the MNIST dataset
mnist = input_data.read_data_sets('.', one_hot=True)

# Create placeholders for features and labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define the loss function
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))

# Define the optimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Calculate the accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

Extracting ./train-images-idx3-ubyte.gz
Extracting ./train-labels-idx1-ubyte.gz
Extracting ./t10k-images-idx3-ubyte.gz
Extracting ./t10k-labels-idx1-ubyte.gz
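
The accuracy operation in this cell compares, for each example, the index of the largest logit with the index of the 1 in the one-hot label, then averages the matches. A plain-Python equivalent on a small batch is sketched below; the logits and labels are made up for illustration, and the argmax and accuracy helpers are hypothetical.

```python
def argmax(row):
    # Index of the largest entry; the first one wins on ties
    return max(range(len(row)), key=lambda i: row[i])


def accuracy(logits, one_hot_labels):
    # Mirrors tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
    # followed by tf.reduce_mean(tf.cast(..., tf.float32))
    correct = [argmax(l) == argmax(y) for l, y in zip(logits, one_hot_labels)]
    return sum(correct) / len(correct)


# Hypothetical logits for 3 examples and 4 classes, with one-hot labels
logits = [[2.0, 0.1, -1.0, 0.3],   # predicts class 0, label 0: correct
          [0.2, 0.1, 3.0, 0.0],    # predicts class 2, label 2: correct
          [1.0, 1.5, 0.2, 0.1]]    # predicts class 1, label 3: wrong
labels = [[1, 0, 0, 0],
          [0, 0, 1, 0],
          [0, 0, 0, 1]]

print(accuracy(logits, labels))  # 2 of 3 correct
```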


Let's train that model, then save the weights:

In [11]:
# Path for the saved model
save_file = './train_model.ckpt'

# Hyperparameters
batch_size = 128
n_epochs = 100

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

# Session
with tf.Session() as sess:
    
    # Initialize the variables
    sess.run(tf.global_variables_initializer())

    # Loop over the epochs
    for epoch in range(n_epochs):
        
        # Get the total number of batches
        total_batch = math.ceil(mnist.train.num_examples / batch_size)

        # Loop over batches
        for i in range(total_batch):
            
            # Get the next batch of features and labels
            batch_features, batch_labels = mnist.train.next_batch(batch_size)
            
            # Train the optimizer
            sess.run(optimizer, feed_dict={features: batch_features, labels: batch_labels})

        # Print validation accuracy every 10 epochs
        if epoch % 10 == 0:
            valid_accuracy = sess.run(accuracy, feed_dict={features: mnist.validation.images,
                                                           labels: mnist.validation.labels})
            print('Epoch {:<3} - Validation Accuracy: {:.2f}'.format(epoch, valid_accuracy))

    # Save the model
    saver.save(sess, save_file)
    print('Trained Model Saved.')

Epoch 0   - Validation Accuracy: 0.18
Epoch 10  - Validation Accuracy: 0.32
Epoch 20  - Validation Accuracy: 0.46
Epoch 30  - Validation Accuracy: 0.54
Epoch 40  - Validation Accuracy: 0.59
Epoch 50  - Validation Accuracy: 0.63
Epoch 60  - Validation Accuracy: 0.66
Epoch 70  - Validation Accuracy: 0.69
Epoch 80  - Validation Accuracy: 0.70
Epoch 90  - Validation Accuracy: 0.72
Trained Model Saved.
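
Each epoch in the cell above runs total_batch = math.ceil(mnist.train.num_examples / batch_size) optimizer steps. The arithmetic is sketched below, assuming the default read_data_sets split that leaves 55,000 training images (5,000 of the 60,000 held out for validation); that figure is an assumption about the default split, not something printed in this notebook.

```python
import math

num_examples = 55000  # assumed size of mnist.train with the default validation split
batch_size = 128
n_epochs = 100

# 55000 / 128 = 429.6875, so one partial batch's worth rounds up to a full step
total_batch = math.ceil(num_examples / batch_size)
total_steps = total_batch * n_epochs

print(total_batch, total_steps)  # 430 43000
```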


Let's load the weights and bias from the checkpoint file, then check the test accuracy.

In [13]:
# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

# Session
with tf.Session() as sess:
    
    # Load the weights and bias
    saver.restore(sess, save_file)

    # Get the accuracy for the test set
    test_accuracy = sess.run(accuracy, feed_dict={features: mnist.test.images, labels: mnist.test.labels})

print('Test Accuracy: {:.2f}'.format(test_accuracy))

INFO:tensorflow:Restoring parameters from ./train_model.ckpt
Test Accuracy: 0.73


That's it! You now know how to save and load a trained model in TensorFlow. Let's look at loading weights and biases into modified models in the next section.

<br>

## 13. Finetune

<hr>

Sometimes you might want to adjust, or "finetune", a model that you have already trained and saved.

However, loading saved Variables directly into a modified model can generate errors. Let's go over how to avoid these problems.

TensorFlow identifies Tensors and Operations by a string property called <code>name</code>. If you don't supply a name, TensorFlow creates one automatically: the first node of a given type is named <code>&lt;Type&gt;</code>, and subsequent nodes are named <code>&lt;Type&gt;_&lt;number&gt;</code>. (The printed tensor names also carry a <code>:0</code> suffix, the index of the operation's output.) Let's see how this affects loading a model when the weights and bias are created in a different order:

In [14]:
# Remove the previous weights and bias
tf.reset_default_graph()

# Path to the saved model
save_file = './model.ckpt'

# Create two tensor variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Save Weights: {}'.format(weights.name))
print('Save Bias: {}'.format(bias.name))

# Session
with tf.Session() as sess:
    
    # Initialize the variables
    sess.run(tf.global_variables_initializer())
    
    # Save the model
    saver.save(sess, save_file)

# Remove the previous weights and bias
tf.reset_default_graph()

# Two Variables: weights and bias
bias = tf.Variable(tf.truncated_normal([3]))
weights = tf.Variable(tf.truncated_normal([2, 3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Load Weights: {}'.format(weights.name))
print('Load Bias: {}'.format(bias.name))

# Session
with tf.Session() as sess:
    
    # Load the weights and bias - ERROR
    saver.restore(sess, save_file)

Save Weights: Variable:0
Save Bias: Variable_1:0
Load Weights: Variable_1:0
Load Bias: Variable:0
INFO:tensorflow:Restoring parameters from ./model.ckpt


InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Assign requires shapes of both tensors to match. lhs shape= [3] rhs shape= [2,3]
	 [[node save/Assign (defined at <ipython-input-14-703b4ac2493d>:34) ]]


The code above raises an error. Notice that the name properties of weights and bias are swapped relative to when the model was saved: because bias is now created first, it receives the name Variable:0, which the checkpoint associates with the [2, 3] weights. This is why the code produces the "Assign requires shapes of both tensors to match" error: saver.restore(sess, save_file) is trying to load the weight data into bias and the bias data into weights.
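The two behaviours behind this error can be mimicked in plain Python: unnamed variables are named Variable, Variable_1, ... in creation order, and the saver restores values by name. The Graph class and the save/restore helpers below are hypothetical stand-ins, not TensorFlow's implementation; the toy raises ValueError where TensorFlow raises InvalidArgumentError, and it stores only shapes because values don't matter for the mismatch.

```python
class Graph:
    """Toy graph: unnamed variables become Variable, Variable_1, ... in creation order."""

    def __init__(self):
        self.counts = {}     # base name -> how many times it has been used
        self.variables = {}  # op name -> shape

    def variable(self, shape, name="Variable"):
        n = self.counts.get(name, 0)
        self.counts[name] = n + 1
        op_name = name if n == 0 else "{}_{}".format(name, n)
        self.variables[op_name] = shape
        return op_name + ":0"  # tensor name, the form printed by weights.name


def save(graph):
    """Toy checkpoint: record each variable's shape under its op name."""
    return dict(graph.variables)


def restore(graph, checkpoint):
    """Assign saved entries by name, failing if the shapes disagree."""
    for op_name, shape in graph.variables.items():
        saved_shape = checkpoint[op_name]
        if saved_shape != shape:
            raise ValueError("{}: lhs shape= {} rhs shape= {}".format(
                op_name, list(shape), list(saved_shape)))


# Save: weights are created first, then bias
g = Graph()
save_weights = g.variable((2, 3))
save_bias = g.variable((3,))
ckpt = save(g)
print("Save:", save_weights, save_bias)  # Save: Variable:0 Variable_1:0

# Load: bias is created first, so the automatic names are swapped
g = Graph()
load_bias = g.variable((3,))
load_weights = g.variable((2, 3))
print("Load:", load_weights, load_bias)  # Load: Variable_1:0 Variable:0

error_message = None
try:
    restore(g, ckpt)
except ValueError as err:
    error_message = str(err)
print(error_message)  # Variable: lhs shape= [3] rhs shape= [2, 3]

# Explicit names make the creation order irrelevant
g = Graph()
g.variable((2, 3), name="weights_0")
g.variable((3,), name="bias_0")
ckpt = save(g)
g = Graph()
g.variable((3,), name="bias_0")
g.variable((2, 3), name="weights_0")
restore(g, ckpt)  # no error: each name maps to a value of the right shape
```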

Instead of letting TensorFlow set the name property, let's set it manually:

In [15]:
# Remove the previous weights and bias
tf.reset_default_graph()

# Path to the saved model
save_file = 'model.ckpt'

# Create two tensor variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]), name='weights_0')
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Save Weights: {}'.format(weights.name))
print('Save Bias: {}'.format(bias.name))

# Session
with tf.Session() as sess:
    
    # Initialize the variables
    sess.run(tf.global_variables_initializer())
    
    # Save the model
    saver.save(sess, save_file)

# Remove the previous weights and bias
tf.reset_default_graph()

# Create two tensor variables: weights and bias
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')
weights = tf.Variable(tf.truncated_normal([2, 3]), name='weights_0')

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Load Weights: {}'.format(weights.name))
print('Load Bias: {}'.format(bias.name))

# Session
with tf.Session() as sess:
    
    # Load the weights and bias - No Error
    saver.restore(sess, save_file)

print('Loaded Weights and Bias successfully.')

Save Weights: weights_0:0
Save Bias: bias_0:0
Load Weights: weights_0:0
Load Bias: bias_0:0
INFO:tensorflow:Restoring parameters from model.ckpt
Loaded Weights and Bias successfully.


That worked! The Tensor names match and the data loaded correctly.

<br>

## 14. Dropout

<hr>

Dropout is a regularization technique for reducing overfitting. The technique temporarily drops units (artificial neurons) from the network, along with all of those units' incoming and outgoing connections. Figure 1 illustrates how dropout works.

<img width="600px" src="assets/dropout-node.jpeg">

TensorFlow provides the tf.nn.dropout() function, which you can use to implement dropout.

Let's look at an example of how to use tf.nn.dropout().

<code>keep_prob = tf.placeholder(tf.float32) # probability to keep units
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)
logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])</code>

The code above illustrates how to apply dropout to a neural network.

In the example above, tf.nn.dropout() is given two arguments:

- hidden_layer: the tensor to which you would like to apply dropout
- keep_prob: the probability of keeping (i.e. not dropping) any given unit

keep_prob controls how many units are dropped: on average, a fraction 1 - keep_prob of the units is set to zero. To compensate for the dropped units, tf.nn.dropout() multiplies every unit that is kept (i.e. not dropped) by 1/keep_prob.
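The behaviour described above (zero each unit with probability 1 - keep_prob, scale the survivors by 1/keep_prob) can be sketched in plain Python. This dropout function is a hypothetical illustration of the technique, not TensorFlow's implementation; the seeded random.Random only makes the example repeatable.

```python
import random


def dropout(values, keep_prob, rng):
    """Inverted dropout: keep each value with probability keep_prob, scaled by 1/keep_prob."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    # rng.random() is in [0, 1), so keep_prob=1.0 keeps every unit
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(42)
hidden = [5.5, 4.7, 4.7]

out = dropout(hidden, 0.5, rng)
print(out)                        # each entry is either 0.0 or twice its input
print(dropout(hidden, 1.0, rng))  # [5.5, 4.7, 4.7] (nothing dropped, nothing rescaled)
```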

During training, a good starting value for keep_prob is 0.5.

During testing, use a keep_prob value of 1.0 so that all units are kept and the full network is used to make predictions; the outputs are then also deterministic.
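A one-line calculation shows why no extra rescaling is needed at test time. With inverted dropout, a unit with value x is kept with probability p = keep_prob and scaled to x/p, or set to 0 otherwise, so its expected value during training is:

$$
\mathbb{E}[\text{output}] = p \cdot \frac{x}{p} + (1 - p) \cdot 0 = x
$$

For example, with x = 5.5 and p = 0.5: 0.5 · 11 + 0.5 · 0 = 5.5. Evaluating with keep_prob = 1.0 therefore passes on each unit's expected training-time value.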

Now let's start with the code from the ReLU Quiz and apply a dropout layer. Build a model with a ReLU layer and dropout layer using the keep_prob placeholder to pass in a probability of 0.5. Print the logits from the model.

Note: The output will be different every time the code is run, because dropout randomly chooses which units to drop.



In [16]:
# Weights for hidden layer and output layer
hidden_layer_weights = [[0.1, 0.2, 0.4],
                        [0.4, 0.6, 0.6],
                        [0.5, 0.9, 0.1],
                        [0.8, 0.2, 0.8]]

out_weights = [[0.1, 0.6],
               [0.2, 0.1],
               [0.7, 0.9]]

In [17]:
# Weights and biases
weights = [tf.Variable(hidden_layer_weights),
           tf.Variable(out_weights)]

biases = [tf.Variable(tf.zeros(3)),
          tf.Variable(tf.zeros(2))]

In [18]:
# Input
features = tf.Variable([[0.0, 2.0, 3.0, 4.0], 
                        [0.1, 0.2, 0.3, 0.4], 
                        [11.0, 12.0, 13.0, 14.0]])

In [19]:
# Create placeholder for keep_prob
keep_prob = tf.placeholder(tf.float32)

In [21]:
# 1st hidden layer
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])

# Apply the ReLU activation function
hidden_layer = tf.nn.relu(hidden_layer)

# Apply dropout
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

# Get the logits
logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

In [22]:
# Session
with tf.Session() as sess:
    
    # Initialize the variables
    sess.run(tf.global_variables_initializer())
    
    # Print the logits
    print(sess.run(logits, feed_dict={keep_prob: 0.5}))

[[ 1.1         6.6000004 ]
 [ 0.30800003  0.7700001 ]
 [48.020004   76.479996  ]]
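
The printed logits can be checked by hand. The plain-Python sketch below repeats the forward pass with the same features and weights (the biases are zero). With keep_prob = 1.0 it gives the deterministic logits; with keep_prob = 0.5, the 0/1 mask shown reproduces the output above up to float32 rounding. That mask was worked out from the printed values, not read from TensorFlow, and the matmul and forward helpers are hypothetical.

```python
import math

features = [[0.0, 2.0, 3.0, 4.0],
            [0.1, 0.2, 0.3, 0.4],
            [11.0, 12.0, 13.0, 14.0]]
hidden_layer_weights = [[0.1, 0.2, 0.4],
                        [0.4, 0.6, 0.6],
                        [0.5, 0.9, 0.1],
                        [0.8, 0.2, 0.8]]
out_weights = [[0.1, 0.6],
               [0.2, 0.1],
               [0.7, 0.9]]


def matmul(a, b):
    # Plain-Python matrix product of a (n x k) and b (k x m)
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def forward(mask, keep_prob):
    """ReLU hidden layer, dropout with a fixed 0/1 mask, then the output layer."""
    hidden = [[max(0.0, v) for v in row] for row in matmul(features, hidden_layer_weights)]
    dropped = [[v * m / keep_prob for v, m in zip(row, mrow)]
               for row, mrow in zip(hidden, mask)]
    return matmul(dropped, out_weights)


# keep_prob = 1.0: every unit kept, deterministic logits
print(forward([[1, 1, 1]] * 3, 1.0))  # approx [[4.78, 8.0], [0.511, 0.844], [24.01, 38.24]]

# keep_prob = 0.5 with the mask that matches the printed output
mask = [[1, 0, 0],
        [1, 1, 0],
        [1, 1, 1]]
print(forward(mask, 0.5))  # approx [[1.1, 6.6], [0.308, 0.77], [48.02, 76.48]]
```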
