TensorFlow is a powerful open-source software library for numerical computation. Computations are defined as graphs of mathematical operations, which are then run in optimized C++. These operations can also be run in parallel on multiple CPUs or GPUs.

# Installation

In [None]:
# Verify that TensorFlow has been installed
import tensorflow 

print(tensorflow.__version__)

# Creating Your First Graph and Running It in a Session

In [None]:
import tensorflow as tf

x = tf.Variable(3, name="x")
y = tf.Variable(4, name="y")
f = x*x*y + y + 2

Note: the code above does **not** perform any computation; all it does is create a graph. Even the variables are not initialized yet. A TensorFlow *session* must be created in order to initialize the variables and evaluate the graph. A session takes care of placing the operations onto the computation devices on your computer (CPU, GPU) and running them, and it holds the variable values. Here's an example:

In [None]:
sess = tf.Session()
sess.run(x.initializer)
sess.run(y.initializer)
result = sess.run(f)
print(result)

This can also be done in the following way:

In [None]:
with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()
    print(result)

Inside the *with* block, the session is set as the default session. Calling x.initializer.run() is then equivalent to calling tf.get_default_session().run(x.initializer), and f.eval() is equivalent to calling tf.get_default_session().run(f). The *with* block also automatically closes the session at the end of the block.

Here's a quicker way to initialize variables. Note that global_variables_initializer() does not initialize them immediately; it creates a node in the graph that initializes all variables when it is run:

In [None]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()
    result = f.eval()
    print(result)

Lastly, there is also an interactive session, which sets itself as the default session as soon as it is created, so it does away with the *with* block. However, you must manually close it when done.

In [None]:
sess = tf.InteractiveSession()
init.run()
result = f.eval()
print(result)
sess.close()

# Managing Graphs

Any node you create is automatically added to the default graph. To add it to another graph, temporarily make that graph the default one using a *with* block:

In [None]:
graph = tf.Graph()
with graph.as_default():
    x2 = tf.Variable(2)

print(x2.graph is graph, x2.graph is tf.get_default_graph())

# Lifecycle of a Node Value

When you evaluate a node, TensorFlow automatically determines the set of nodes that it depends on and evaluates those nodes first. For example, consider the following:

In [None]:
w = tf.constant(3)
x = w + 2
y = x + 5
z = x * 3

with tf.Session() as sess:
    print(y.eval(), z.eval())  

To evaluate $y$, TensorFlow first evaluates $w$, then $x$, and then $y$. To evaluate $z$, it must evaluate $w$ and $x$ again, because it does not reuse the results of the previous evaluation. All node values are dropped between graph runs, except variable values, which are maintained by the session until it is closed.

To evaluate $y$ and $z$ efficiently, without evaluating $w$ and $x$ twice, ask TensorFlow to evaluate them in a single graph run:

In [None]:
with tf.Session() as sess:
    y_val, z_val = sess.run([y, z])
    print(y_val, z_val)  
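
The difference between the two approaches can be imitated in plain Python. The sketch below is a toy model and not how TensorFlow is implemented: the hypothetical Node class and run() helper are made up for illustration, and a per-run cache stands in for a single graph run.

```python
class Node:
    """Toy graph node (hypothetical, for illustration only)."""

    def __init__(self, name, func, inputs=()):
        self.name = name
        self.func = func
        self.inputs = inputs
        self.eval_count = 0

    def evaluate(self, cache):
        # Within one run, each node is computed at most once.
        if self.name not in cache:
            args = [node.evaluate(cache) for node in self.inputs]
            cache[self.name] = self.func(*args)
            self.eval_count += 1
        return cache[self.name]


def run(fetches):
    """Evaluate the requested nodes in a single run with a fresh cache."""
    cache = {}
    return [node.evaluate(cache) for node in fetches]


def build_graph():
    w = Node("w", lambda: 3)
    x = Node("x", lambda w_val: w_val + 2, (w,))
    y = Node("y", lambda x_val: x_val + 5, (x,))
    z = Node("z", lambda x_val: x_val * 3, (x,))
    return w, x, y, z


# Two separate runs: the cache is discarded in between, so x is computed twice.
w, x, y, z = build_graph()
separate_results = run([y]) + run([z])
separate_x_count = x.eval_count

# One combined run: x is computed once and shared by y and z.
w, x, y, z = build_graph()
combined_results = run([y, z])
combined_x_count = x.eval_count

print(separate_results, separate_x_count)
print(combined_results, combined_x_count)
```

Both approaches return the same values; only the number of times the shared node is computed differs.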

# Linear Regression with TensorFlow

All inputs and outputs of TensorFlow operations are multidimensional arrays called **tensors**. To see this, refer to the code below, which solves the Normal Equation $\hat{\theta} = (X^T X)^{-1} X^T y$ (from the Linear Regression chapter) using TensorFlow operations:

In [None]:
import numpy as np
from sklearn.datasets import fetch_california_housing

def reset_graph(seed=42):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

reset_graph()

housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

with tf.Session() as sess:
    theta_value = theta.eval()
    
print(theta_value)
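
As a sanity check on the formula itself, the same Normal Equation can be written in NumPy. This is a minimal sketch on a small synthetic dataset; the names X_demo, y_demo and true_theta are made up for this example and are not part of the housing code above.

```python
import numpy as np

rng = np.random.RandomState(42)

# Hypothetical synthetic data: y = 4 + 3 * x1 - 2 * x2, with no noise
m_demo = 50
features = rng.rand(m_demo, 2)
true_theta = np.array([[4.0], [3.0], [-2.0]])

# Add the bias column of ones, as done for the housing data
X_demo = np.c_[np.ones((m_demo, 1)), features]
y_demo = X_demo.dot(true_theta)

# Normal Equation: theta = (X^T X)^-1 X^T y
theta_demo = np.linalg.inv(X_demo.T.dot(X_demo)).dot(X_demo.T).dot(y_demo)
print(theta_demo.ravel())
```

Because the synthetic targets contain no noise, the recovered parameters should match true_theta up to floating-point error.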

# Implementing Gradient Descent

## *Manually Computing the Gradients*

Notes on the code below:

1. The random_uniform() function creates a node that generates a tensor of random values, much like NumPy's rand() function.
2. The assign() function creates a node that assigns a new value to a variable (in this case it updates theta, the weight vector).
3. The main loop runs the training step n_epochs times and prints the current MSE every 100 epochs.

In [None]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]

reset_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
gradients = 2/m * tf.matmul(tf.transpose(X), error)
training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)
    
    best_theta = theta.eval()
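
The update rule used in the graph above can also be followed step by step in NumPy. The sketch below assumes a small synthetic dataset with a known parameter vector; all names ending in _demo, as well as true_theta, are hypothetical and chosen so they do not clash with the notebook's variables.

```python
import numpy as np

rng = np.random.RandomState(42)

# Hypothetical synthetic data with a known parameter vector
m_demo = 200
features = rng.randn(m_demo, 2)  # already roughly standardized
X_demo = np.c_[np.ones((m_demo, 1)), features]
true_theta = np.array([[1.0], [2.0], [-3.0]])
y_demo = X_demo.dot(true_theta)

learning_rate_demo = 0.1
n_epochs_demo = 500
theta_demo = rng.uniform(-1.0, 1.0, size=(3, 1))

for epoch in range(n_epochs_demo):
    error = X_demo.dot(theta_demo) - y_demo
    # Same gradient formula as in the graph: 2/m * X^T (X theta - y)
    gradients_demo = 2 / m_demo * X_demo.T.dot(error)
    theta_demo = theta_demo - learning_rate_demo * gradients_demo

final_mse = float(np.mean(np.square(X_demo.dot(theta_demo) - y_demo)))
print(theta_demo.ravel(), final_mse)
```

With noise-free targets the parameters should converge to true_theta and the MSE should approach zero.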

## *Using autodiff*

The above code relies on the user knowing how to derive the gradient of the cost function by hand. When this is tedious or error-prone, TensorFlow can compute the gradients for you using its autodiff feature. Simply replace the gradients = ... line in the code above with the following:

In [None]:
gradients = tf.gradients(mse, [theta])[0]

The gradients() function takes an op (here mse) as its first argument and a list of variables (here just theta) as its second. It returns a list of ops, one per variable, that compute the gradient of the op with respect to that variable.
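
One way to gain confidence in a gradient, whether derived by hand or returned by autodiff, is to compare it with a finite-difference estimate. The NumPy sketch below does this for the MSE gradient formula used earlier. Note that finite differences are a different technique from the autodiff that TensorFlow uses; they serve here only as an independent check, and all names are hypothetical.

```python
import numpy as np

rng = np.random.RandomState(0)
X_demo = np.c_[np.ones((20, 1)), rng.randn(20, 2)]
y_demo = rng.randn(20, 1)
theta_demo = rng.randn(3, 1)


def mse_value(theta):
    """Mean squared error of the linear model for a given theta."""
    return float(np.mean(np.square(X_demo.dot(theta) - y_demo)))


# Analytic gradient, as in the manual version: 2/m * X^T (X theta - y)
analytic = 2 / len(X_demo) * X_demo.T.dot(X_demo.dot(theta_demo) - y_demo)

# Central finite differences, one parameter at a time
eps = 1e-6
numeric = np.zeros_like(theta_demo)
for i in range(theta_demo.shape[0]):
    step = np.zeros_like(theta_demo)
    step[i, 0] = eps
    numeric[i, 0] = (mse_value(theta_demo + step) - mse_value(theta_demo - step)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))
```

The printed difference should be tiny, which indicates that the analytic formula and the numerical estimate agree.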

## *Using an Optimizer*

TensorFlow provides a few optimizers out of the box. One example is the Gradient Descent optimizer: 

In [None]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

# Feeding Data to the Training Algorithm

In order to use the preceding code to implement Mini-batch Gradient Descent, we need to replace X and y with a different mini-batch at every iteration. This is done with *placeholder* nodes (created via the $placeholder()$ function). These nodes do not perform any computation; they simply output the data you feed them at runtime. Here's an example of this feature:

In [None]:
reset_graph()

A = tf.placeholder(tf.float32, shape=(None, 3)) # Here 'None' means we accept any number of rows 
                                                # but it must have 3 columns
B = A + 5
with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict={A: [[1, 2, 3]]})
    B_val_2 = B.eval(feed_dict={A: [[4, 5, 6], [7, 8, 9]]})

print("B_val_1: ", B_val_1)
print("B_val_2: ", B_val_2)
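
To see which values to expect from the two evaluations, the same arithmetic can be reproduced in NumPy. This is only a stand-in for the graph above; the array names are hypothetical.

```python
import numpy as np

# NumPy stand-ins for the two feeds; each has 3 columns, as the placeholder requires
A_val_1 = np.array([[1, 2, 3]], dtype=np.float32)
A_val_2 = np.array([[4, 5, 6], [7, 8, 9]], dtype=np.float32)

expected_1 = A_val_1 + 5  # shape (1, 3)
expected_2 = A_val_2 + 5  # shape (2, 3)

print(expected_1)
print(expected_2)
```

The number of rows differs between the two feeds, which is what the None dimension in the placeholder's shape allows.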

To implement Mini-batch Gradient Descent, change the definitions of X and y in the construction phase to make them placeholder nodes:

In [None]:
X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")

Define the batch size and the number of iterations:

In [None]:
batch_size = 100
n_batches = int(np.ceil(m/batch_size))
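
The rounding up matters because the number of rows is usually not a multiple of the batch size. The short sketch below works through the arithmetic, assuming the 20,640 rows of the California housing data and a sequential split; the names ending in _demo are hypothetical.

```python
import math

m_demo = 20640  # number of rows in the California housing data
batch_size_demo = 100

n_batches_demo = math.ceil(m_demo / batch_size_demo)

# Sizes of consecutive slices if the data were split sequentially
batch_sizes = [
    min(batch_size_demo, m_demo - start)
    for start in range(0, m_demo, batch_size_demo)
]

print(n_batches_demo, batch_sizes[-1])
```

With a sequential split the last batch is smaller than the others, and together the batches cover every row exactly once.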

Lastly, in the execution phase, fetch the mini-batches one by one and pass them to TensorFlow through the feed_dict argument when running the training op:

In [None]:
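def fetch_batch(epoch, batch_index, batch_size):
    # One possible implementation: draw a random mini-batch, seeded per batch for reproducibility
    np.random.seed(epoch * n_batches + batch_index)
    indices = np.random.randint(m, size=batch_size)
    X_batch = scaled_housing_data_plus_bias[indices]
    y_batch = housing.target.reshape(-1, 1)[indices]
    return X_batch, y_batch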

with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
    
    best_theta = theta.eval()

# Saving and Restoring Models

To save and restore a model, create a *saver* node at the end of the construction phase and call its $save()$ function whenever you want to create a checkpoint. Here's an example: 

In [None]:
# Variable initializations
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)
    
    # Computations 
    
    save_path = saver.save(sess, "path/to_file/my_model_final.ckpt")

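Here's an example of restoring the model. Instead of running the init node, call the saver's $restore()$ function at the beginning of the execution phase:

In [None]:
# Variable initializations

with tf.Session() as sess:
    saver.restore(sess, "path/to_file/my_model_final.ckpt")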
    
    # Computations 

It is also possible to save only specific variables:

In [None]:
saver = tf.train.Saver({"weights": theta})

# Visualizing the Graph and Training Curves Using TensorBoard

With TensorFlow we no longer have to rely on just the $print()$ function; we can use TensorBoard to visualize the graph and the training statistics. However, you must explicitly tell TensorFlow what information to save and where to save it. Save each run's information in a different directory, or else TensorBoard will merge the statistics from different runs (including a timestamp in the directory name solves this problem automatically).

To use TensorBoard, first define the log directory that the run will write to:

In [None]:
from datetime import datetime

now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
root_logdir = "tf_logs"
logdir = "{}/run-{}/".format(root_logdir, now)

Next, you must tell TensorFlow which data to save:

In [None]:
mse_summary = tf.summary.scalar('MSE', mse)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

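Next, inside the training loop, evaluate the summary and write it to the log file every 10 or 100 iterations:

In [None]:
if batch_index % 10 == 0:
    summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
    step = epoch * n_batches + batch_index
    file_writer.add_summary(summary_str, step)

Lastly, close the file writer: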

In [None]:
file_writer.close()

To actually see this data you must start the TensorBoard server and point it to the directory where your log files are stored. By default, the server should be accessible on port 6006 of your local host.

# Name Scopes

You can group related nodes into name scopes in order to better organize them:

In [None]:
with tf.name_scope("some_name") as scope:
    pass  # Node definitions go here

# Modularity 

TensorFlow has some features built in to help you keep your code free of repetition. For example, the following code is very repetitive:

In [None]:
reset_graph()

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")

w1 = tf.Variable(tf.random_normal((n_features, 1)), name="weights1")
w2 = tf.Variable(tf.random_normal((n_features, 1)), name="weights2")
b1 = tf.Variable(0.0, name="bias1")
b2 = tf.Variable(0.0, name="bias2")

z1 = tf.add(tf.matmul(X, w1), b1, name="z1")
z2 = tf.add(tf.matmul(X, w2), b2, name="z2")

relu1 = tf.maximum(z1, 0., name="relu1")
relu2 = tf.maximum(z2, 0., name="relu2")  # Copy-and-paste code like this is easy to get wrong (e.g. leaving z1 here)

output = tf.add(relu1, relu2, name="output")

To reduce repetition, factor the repeated code into a function and call it in a loop. TensorFlow keeps the node names unique by automatically appending an index number to any name that already exists:

In [None]:
reset_graph()

def relu(X):
    with tf.name_scope("relu"):
        w_shape = (int(X.get_shape()[1]), 1)
        w = tf.Variable(tf.random_normal(w_shape), name="weights")
        b = tf.Variable(0.0, name="bias")
        z = tf.add(tf.matmul(X, w), b, name="z")
        return tf.maximum(z, 0., name="relu")

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name="output")

In the code above, each call to relu() opens a name scope called "relu". TensorFlow keeps scope names unique by appending an underscore and an index number ("relu_1", "relu_2", and so on), and it does the same for node names that would otherwise collide.
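
The naming scheme can be imitated in a few lines of plain Python. The sketch below is a toy imitation and not TensorFlow's own code; the NameRegistry class is hypothetical.

```python
from collections import defaultdict


class NameRegistry:
    """Toy imitation of unique-name generation (not TensorFlow's own code)."""

    def __init__(self):
        self._counts = defaultdict(int)

    def unique_name(self, name):
        count = self._counts[name]
        self._counts[name] += 1
        # First use keeps the plain name; later uses get "_1", "_2", ...
        return name if count == 0 else "{}_{}".format(name, count)


registry = NameRegistry()
scope_names = [registry.unique_name("relu") for _ in range(5)]
print(scope_names)
```

The first request keeps the plain name and every later request for the same name receives the next index.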

# Sharing Variables

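The $get\_variable()$ function allows you to share variables between different parts of the graph: it creates the variable if it does not exist yet, or reuses it if it does. Simply tell a variable scope to reuse variables (by setting the argument $reuse=True$ in $variable\_scope()$) and TensorFlow will make sure that each call to $get\_variable()$ with the same name returns the same variable.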
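
The create-or-reuse behaviour can be modelled with a small dictionary-backed store. The sketch below is a toy model in plain Python, not TensorFlow's implementation: the VariableStore class, its method signature and the "threshold" variable are all hypothetical and only illustrate the idea.

```python
class VariableStore:
    """Toy model of variable sharing (hypothetical, not TensorFlow's code)."""

    def __init__(self):
        self._variables = {}

    def get_variable(self, scope, name, initial_value=0.0, reuse=False):
        full_name = "{}/{}".format(scope, name)
        if reuse:
            # Reusing requires that the variable was created earlier.
            if full_name not in self._variables:
                raise ValueError("{} does not exist".format(full_name))
            return self._variables[full_name]
        if full_name in self._variables:
            raise ValueError("{} already exists".format(full_name))
        self._variables[full_name] = {"name": full_name, "value": initial_value}
        return self._variables[full_name]


store = VariableStore()
threshold = store.get_variable("relu", "threshold", initial_value=0.0)
shared = store.get_variable("relu", "threshold", reuse=True)
print(shared is threshold)
```

In this model, asking for an existing name without reuse raises an error, which guards against sharing a variable by accident.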

# Exercises