# Working with TensorFlow

This notebook covers the basics of coding with TensorFlow, using the TensorFlow 1.x graph-and-session API.

# Setup

First, let's make sure this notebook works well in both Python 2 and 3, import a few common modules, ensure Matplotlib plots figures inline, and prepare a function to save the figures:

In [None]:
# To support both python 2 and python 3
from __future__ import division, print_function, unicode_literals

# Common imports
import numpy as np
import os
import tensorflow as tf  # reset_graph() below relies on the TensorFlow 1.x API

# to make this notebook's output stable across runs
def reset_graph(seed=42):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

# To plot pretty figures
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12

# Where to save the figures
PROJECT_ROOT_DIR = "."
CHAPTER_ID = "tensorflow"

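def save_fig(fig_id, tight_layout=True):
    images_dir = os.path.join(PROJECT_ROOT_DIR, "images", CHAPTER_ID)
    if not os.path.isdir(images_dir):
        os.makedirs(images_dir)  # create the output folder if it does not exist yet
    path = os.path.join(images_dir, fig_id + ".png")
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format='png', dpi=300)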

# Creating and running a graph

The basic principle of TensorFlow is simple: you first define in Python a graph of computations to perform (as covered in the lecture), and then TensorFlow takes that graph and runs it efficiently.
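To make these two steps concrete, here is a minimal pure-Python sketch of the idea, not TensorFlow's implementation: the hypothetical classes `Node`, `Const` and `Op` overload `+` and `*`, so writing an expression only builds a tree of nodes, and nothing is computed until `evaluate()` is called on the result.

```python
class Node(object):
    """Hypothetical graph node: combining nodes builds new nodes, it computes nothing."""
    def __add__(self, other):
        return Op(lambda a, b: a + b, self, wrap(other))

    def __radd__(self, other):
        return Op(lambda a, b: a + b, wrap(other), self)

    def __mul__(self, other):
        return Op(lambda a, b: a * b, self, wrap(other))

    def __rmul__(self, other):
        return Op(lambda a, b: a * b, wrap(other), self)

class Const(Node):
    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value

class Op(Node):
    def __init__(self, fn, left, right):
        self.fn, self.left, self.right = fn, left, right

    def evaluate(self):
        # evaluate the inputs first, then apply this node's operation
        return self.fn(self.left.evaluate(), self.right.evaluate())

def wrap(value):
    """Turn plain Python numbers into constant nodes."""
    return value if isinstance(value, Node) else Const(value)

x_demo = Const(3)
y_demo = Const(4)
f_demo = x_demo * x_demo * y_demo + y_demo + 2   # defining: only builds a tree of Op nodes
print(type(f_demo).__name__)                     # Op
print(f_demo.evaluate())                         # running: 3*3*4 + 4 + 2 = 42
```

TensorFlow adds much more on top (variables, sessions, device placement, optimized execution), but the split between defining the graph and running it is the same.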

The following code creates the graph shown on the lecture slide at the beginning of Chapter 14.

In [None]:
import tensorflow as tf

reset_graph()  # reset the default graph before building a new one

x = tf.Variable(3, name="x")
y = tf.Variable(4, name="y")
f = x*x*y + y + 2

In [None]:
f

The most important thing to understand is that the code above does not actually perform any computation, even though it looks like it does (especially the last line). It just creates a computation graph.

In fact, even the variables are not initialized yet.

To evaluate this graph, you need to open a TensorFlow session and use it to initialize the variables and evaluate `f`. A TensorFlow session takes care of placing the operations onto devices such as CPUs and GPUs and running them, and it holds all the variable values.

The following code creates a session, initializes the variables, evaluates `f`, and then closes the session (which frees up resources).



In [None]:
sess = tf.Session()
sess.run(x.initializer)
sess.run(y.initializer)
result = sess.run(f)
print(result)

In [None]:
sess.close()

Having to repeat `sess.run()` all the time is a bit cumbersome, but there is a better way:

In [None]:
with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()

In [None]:
result

In [None]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()
    result = f.eval()

Inside the `with` block, as above, the session is set as the default session. Calling `x.initializer.run()` is equivalent to calling `tf.get_default_session().run(x.initializer)`, and similarly `f.eval()` is equivalent to calling `tf.get_default_session().run(f)`. This makes the code easier to read. Moreover, the session is automatically closed at the end of the block.

Instead of manually running the initializer for every single variable, you can use the `global_variables_initializer()` function, as in the second cell above. Note that it does not actually perform the initialization immediately, but rather creates a node in the graph that will initialize all variables when it is run.
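The "default session" behaviour of the `with` block can be mimicked in plain Python. The sketch below uses a hypothetical `MiniSession` class and a hypothetical `get_default_session()` function (not TensorFlow's code) to show the idea: entering the block registers the session as the default, and leaving it restores the previous default and closes the session.

```python
_default_sessions = []  # stack of sessions currently acting as the default

def get_default_session():
    """Return the innermost default session, or None outside any with block."""
    return _default_sessions[-1] if _default_sessions else None

class MiniSession(object):
    """Hypothetical stand-in for a session, showing only the default/close logic."""
    def __init__(self):
        self.closed = False

    def run(self, fn):
        if self.closed:
            raise RuntimeError("Attempted to use a closed session.")
        return fn()  # a real session would execute graph nodes here

    def close(self):
        self.closed = True

    def __enter__(self):
        _default_sessions.append(self)   # becomes the default session
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        _default_sessions.pop()          # restore the previous default, if any
        self.close()                     # resources are freed automatically
        return False                     # let any exception propagate

with MiniSession() as mini_sess:
    inside_is_default = get_default_session() is mini_sess
    mini_result = get_default_session().run(lambda: 3 * 3 * 4 + 4 + 2)

print(inside_is_default, mini_result, mini_sess.closed, get_default_session())
# True 42 True None
```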

In [None]:
result

Inside Jupyter or within a Python shell you may prefer to create an `InteractiveSession`. The only difference from a regular `Session` is that when an `InteractiveSession` is created it automatically sets itself as the default session, so you don’t need a `with` block (but you do need to close the session manually when you are done with it):


In [None]:
init = tf.global_variables_initializer()

In [None]:
sess = tf.InteractiveSession()
init.run()
result = f.eval()
print(result)

In [None]:
sess.close()

In [None]:
result

# Managing graphs

Any node you create is automatically added to the default graph (this part was not covered in the lecture, but you can learn it on your own):

In [None]:
reset_graph()

x1 = tf.Variable(1)
x1.graph is tf.get_default_graph()

In most cases this is fine, but sometimes you may want to manage multiple independent graphs. You can do this by creating a new Graph and temporarily making it the
default graph inside a with block, like so:
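The mechanism can be pictured as a stack of graphs: every new node registers itself with whichever graph is currently the default, and `as_default()` temporarily pushes another graph on top. The classes below (`MiniGraph`, `MiniVariable`) are a hypothetical sketch of that idea, not TensorFlow's implementation.

```python
from contextlib import contextmanager

class MiniGraph(object):
    def __init__(self):
        self.nodes = []

    @contextmanager
    def as_default(self):
        _graph_stack.append(self)   # make this graph the default
        try:
            yield self
        finally:
            _graph_stack.pop()      # restore the previous default graph

_graph_stack = [MiniGraph()]        # the global default graph

def get_default_graph():
    return _graph_stack[-1]

class MiniVariable(object):
    def __init__(self, value):
        self.value = value
        self.graph = get_default_graph()   # register with the current default graph
        self.graph.nodes.append(self)

v1 = MiniVariable(1)
other_graph = MiniGraph()
with other_graph.as_default():
    v2 = MiniVariable(2)

print(v1.graph is get_default_graph())  # True
print(v2.graph is other_graph)          # True
print(v2.graph is get_default_graph())  # False: the block has ended
```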

In [None]:
graph = tf.Graph()
with graph.as_default():
    x2 = tf.Variable(2)

x2.graph is graph

In [None]:
# In Jupyter (or in a Python shell), it is common to run the same commands more than once
# while you are experimenting.
# As a result, you may end up with a default graph containing many duplicate nodes.
# One solution is to restart the Jupyter kernel (or the Python shell), but a more convenient
# solution is to just reset the default graph by running tf.reset_default_graph()

x2.graph is tf.get_default_graph()

## Lifecycle of a node value

When you evaluate a node, TensorFlow automatically determines the set of nodes
that it depends on and it evaluates these nodes first.

For example, consider the following code:

In [None]:
w = tf.constant(3)
x = w + 2
y = x + 5
z = x * 3

with tf.Session() as sess:
    print(y.eval())  # 10
    print(z.eval())  # 15

In [None]:
with tf.Session() as sess:
    y_val, z_val = sess.run([y, z])
    print(y_val)  # 10
    print(z_val)  # 15

First, this code defines a very simple graph. Then it starts a session and runs the graph to evaluate `y`: TensorFlow automatically detects that `y` depends on `x`, which depends on `w`, so it first evaluates `w`, then `x`, then `y`, and returns the value of `y`. Finally, the code runs the graph to evaluate `z`. Once again, TensorFlow detects that it must first evaluate `w` and `x`. It is important to note that it will not reuse the result of the previous evaluation of `w` and `x`. In short, the first cell evaluates `w` and `x` twice. To avoid this, ask TensorFlow to evaluate `y` and `z` in a single graph run, as the second cell does with `sess.run([y, z])`: there, `w` and `x` are evaluated only once.
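The pure-Python sketch below (a hypothetical `run_graph` helper, not TensorFlow code) mimics this behaviour: each node lists its inputs, evaluation recurses into the dependencies first, and a counter records how often each node is computed. Two separate runs compute `w` and `x` twice, whereas a single run that fetches both `y` and `z` shares one cache and computes them once.

```python
from collections import Counter

# Each node maps to (function, list of input nodes); mirrors w, x, y, z above
node_specs = {
    "w": (lambda: 3, []),
    "x": (lambda w: w + 2, ["w"]),
    "y": (lambda x: x + 5, ["x"]),
    "z": (lambda x: x * 3, ["x"]),
}

def run_graph(fetches, counter):
    """Evaluate the requested nodes in one 'graph run'; values are cached only within this run."""
    cache = {}

    def evaluate(name):
        if name not in cache:
            fn, inputs = node_specs[name]
            args = [evaluate(dep) for dep in inputs]  # dependencies first
            cache[name] = fn(*args)
            counter[name] += 1
        return cache[name]

    return [evaluate(name) for name in fetches]

separate = Counter()
print(run_graph(["y"], separate), run_graph(["z"], separate))  # [10] [15]
print(separate["w"], separate["x"])                            # 2 2

single = Counter()
print(run_graph(["y", "z"], single))                           # [10, 15]
print(single["w"], single["x"])                                # 1 1
```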

# Linear Regression

TensorFlow operations (also called ops for short) can take any number of inputs and
produce any number of outputs. For example, the addition and multiplication ops
each take two inputs and produce one output. 

Constants and variables take no input (they are called source ops). The inputs and outputs are multidimensional arrays, called tensors (hence the name “tensor flow”).

Just like NumPy arrays, tensors have a type and a shape. In fact, in the Python API, evaluating a tensor returns a NumPy ndarray. Tensors typically contain floats, but you can also use them to carry strings (arbitrary byte arrays).

For example, the following code manipulates 2D arrays to perform Linear Regression on the California housing dataset. 



## Using the Normal Equation

It starts by fetching the dataset; then it adds an extra bias input feature ($x_0 = 1$) to all training instances (it does so using NumPy, so it runs immediately); then it creates two TensorFlow constant nodes, `X` and `y`, to hold this data and the targets, and it uses some of the matrix operations provided by TensorFlow to define `theta`.


These matrix functions (`transpose()`, `matmul()`, and `matrix_inverse()`) are self-explanatory, but as usual they do not perform any computations immediately; instead, they create nodes in the graph that will perform them when the graph is run.

You may recognize that the definition of `theta` corresponds to the Normal Equation, $\hat{\theta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$, introduced in Session 3. Finally, the code creates a session and uses it to evaluate `theta`.

In [None]:
# We can use TensorFlow to implement Linear Regression based on its Normal Equation,
# train it on the California housing data,
# and then inspect the results (the values of the model parameters, theta)

import numpy as np
from sklearn.datasets import fetch_california_housing

reset_graph()

housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

with tf.Session() as sess:
    theta_value = theta.eval()

In [None]:
theta_value

Compare with pure NumPy

In [None]:
# Alternatively, we may use NumPy to compute theta based on the model's "Normal Equation".

X = housing_data_plus_bias
y = housing.target.reshape(-1, 1)
theta_numpy = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

print(theta_numpy)
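Explicitly inverting $\mathbf{X}^T \mathbf{X}$ works here, but it is usually more numerically robust to solve the linear system $(\mathbf{X}^T \mathbf{X})\,\theta = \mathbf{X}^T \mathbf{y}$ directly, or to call a least-squares routine. The sketch below checks, on small synthetic data made up for illustration (with known parameters), that `np.linalg.solve` and `np.linalg.lstsq` give the same $\theta$ as the explicit inverse.

```python
import numpy as np

rng = np.random.RandomState(42)
m_demo = 200
X_demo = np.c_[np.ones((m_demo, 1)), rng.randn(m_demo, 2)]   # bias column + 2 features
theta_true = np.array([[4.0], [3.0], [-2.0]])                  # known parameters (synthetic)
y_demo = X_demo.dot(theta_true) + 0.1 * rng.randn(m_demo, 1)   # add small Gaussian noise

theta_inv = np.linalg.inv(X_demo.T.dot(X_demo)).dot(X_demo.T).dot(y_demo)
theta_solve = np.linalg.solve(X_demo.T.dot(X_demo), X_demo.T.dot(y_demo))
theta_lstsq = np.linalg.lstsq(X_demo, y_demo, rcond=None)[0]

print(np.allclose(theta_inv, theta_solve), np.allclose(theta_solve, theta_lstsq))  # True True
print(theta_solve.ravel())  # close to [4, 3, -2]
```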

Compare with Scikit-Learn

In [None]:
# Of course, we can also train the LinearRegression model from the Scikit-Learn API directly,
# without implementing the Normal Equation ourselves

from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(housing.data, housing.target.reshape(-1, 1))

print(np.r_[lin_reg.intercept_.reshape(-1, 1), lin_reg.coef_.T])

## Using Batch Gradient Descent

Before you look at the following code, please review Section 2, "Implementing Gradient Descent with TensorFlow", of Chapter 14 in my lecture slides.

Gradient Descent requires scaling the feature vectors first. We could do this using TF, but let's just use Scikit-Learn for now.
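For reference, `StandardScaler` standardizes each column to zero mean and unit variance, $x' = (x - \mu) / \sigma$, where $\sigma$ is the population standard deviation. A NumPy-only sketch of the same transformation, with a hypothetical `standardize()` helper and made-up data, is:

```python
import numpy as np

def standardize(data):
    """Scale each column to zero mean and unit variance (population std, ddof=0)."""
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std[std == 0.0] = 1.0   # constant columns: avoid division by zero (they become all zeros)
    return (data - mean) / std

demo_data = np.array([[1.0, 200.0],
                      [2.0, 400.0],
                      [3.0, 600.0]])
scaled_demo = standardize(demo_data)
print(scaled_demo.mean(axis=0))   # approximately [0, 0]
print(scaled_demo.std(axis=0))    # approximately [1, 1]

zeroed_bias = standardize(np.ones((3, 1)))
print(zeroed_bias.ravel())        # [0. 0. 0.]
```

The last lines show why the bias column of ones is appended only after scaling: standardizing a constant column would turn it into zeros.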

In [None]:
# scaling the feature vectors first

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]

In [None]:
print(scaled_housing_data_plus_bias.mean(axis=0))
print(scaled_housing_data_plus_bias.mean(axis=1))
print(scaled_housing_data_plus_bias.mean())
print(scaled_housing_data_plus_bias.shape)

### Manually computing the gradients

The code below was discussed in the lecture.
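For reference, the cell below computes the MSE cost, its gradient with respect to $\theta$, and the Batch Gradient Descent update step, where $\eta$ is the learning rate and $m$ the number of instances:

$$
\mathrm{MSE}(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \theta^T \mathbf{x}^{(i)} - y^{(i)} \right)^2
$$

$$
\nabla_{\theta}\, \mathrm{MSE}(\theta) = \frac{2}{m}\, \mathbf{X}^T \left( \mathbf{X}\theta - \mathbf{y} \right)
$$

$$
\theta^{(\text{next step})} = \theta - \eta\, \nabla_{\theta}\, \mathrm{MSE}(\theta)
$$

These correspond line by line to `mse`, `gradients = 2/m * tf.matmul(tf.transpose(X), error)` and `tf.assign(theta, theta - learning_rate * gradients)`.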

In [None]:
# See the lecture slides for an explanation of this code, if you need help
# First, construct the graph
reset_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
gradients = 2/m * tf.matmul(tf.transpose(X), error)                 # gradient of the MSE w.r.t. theta
training_op = tf.assign(theta, theta - learning_rate * gradients)  # one Gradient Descent step

# Then, we run or evaluate the graph

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)
    
    best_theta = theta.eval()

In [None]:
best_theta

### Using autodiff

The preceding code works fine, but it requires mathematically deriving the gradients
from the cost function (MSE). 


In the case of Linear Regression, it is reasonably easy, but if you had to do this with deep neural networks you would get quite a headache: it would be tedious and error-prone. 

You could use symbolic differentiation to automatically find the equations for the partial derivatives for you, as discussed in my lecture.
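Whichever way the gradients are obtained, a hand-derived formula is easy to sanity-check numerically. The NumPy sketch below uses synthetic data and hypothetical helpers (`mse_np`, `analytic_gradient`, `numerical_gradient`, not part of the notebook) to compare $\frac{2}{m}\mathbf{X}^T(\mathbf{X}\theta - \mathbf{y})$ with a central finite-difference approximation of each partial derivative.

```python
import numpy as np

rng = np.random.RandomState(0)
X_chk = np.c_[np.ones((50, 1)), rng.randn(50, 3)]   # bias + 3 synthetic features
y_chk = rng.randn(50, 1)
theta_chk = rng.randn(4, 1)

def mse_np(theta):
    error = X_chk.dot(theta) - y_chk
    return np.mean(error ** 2)

def analytic_gradient(theta):
    # the formula used in the "Manually computing the gradients" cell
    return 2.0 / len(X_chk) * X_chk.T.dot(X_chk.dot(theta) - y_chk)

def numerical_gradient(f, theta, eps=1e-6):
    grad = np.zeros_like(theta)
    for i in range(theta.shape[0]):
        step = np.zeros_like(theta)
        step[i, 0] = eps
        grad[i, 0] = (f(theta + step) - f(theta - step)) / (2 * eps)  # central difference
    return grad

print(np.allclose(analytic_gradient(theta_chk),
                  numerical_gradient(mse_np, theta_chk), atol=1e-5))  # True
```

Such a check needs one or two cost evaluations per parameter, which is why it is useful for debugging but far too slow to replace proper gradient computation in a large model.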



Same as above except for the `gradients = ...` line:

In [None]:
reset_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

How could you find the partial derivatives?

Fortunately, TensorFlow’s autodiff feature comes to the rescue: it can automatically and efficiently compute the gradients for you. Simply replace the `gradients = ...` line in the Gradient Descent code in the previous section with the following line, and the code will continue to work just fine:
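TensorFlow's autodiff works in reverse mode: one forward pass computes every node's value, then one backward pass applies the chain rule from the output back to the inputs, yielding all partial derivatives at once. The tiny scalar sketch below (a hypothetical `Var` class and `backward()` function, not TensorFlow's code) applies the idea to $f(x, y) = x^2 y + y + 2$ from the start of the notebook, where $\partial f / \partial x = 2xy$ and $\partial f / \partial y = x^2 + 1$.

```python
class Var(object):
    """Scalar node that records its inputs and local derivatives (hypothetical sketch)."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents      # list of (input_node, local_derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value, [(self, other.value), (other, self.value)])

    __rmul__ = __mul__

def backward(output):
    """Reverse pass: propagate d(output)/d(node) from the output down to the inputs."""
    order, seen = [], set()

    def visit(node):                # topological order via depth-first search
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):    # each node is processed after all nodes that use it
        for parent, local in node.parents:
            parent.grad += local * node.grad   # chain rule

x_var = Var(3.0)
y_var = Var(4.0)
f_var = x_var * x_var * y_var + y_var + 2
backward(f_var)
print(f_var.value, x_var.grad, y_var.grad)  # 42.0 24.0 10.0
```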

In [None]:
gradients = tf.gradients(mse, [theta])[0]

In [None]:
training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)
    
    best_theta = theta.eval()

print("Best theta:")
print(best_theta)

### Using a `GradientDescentOptimizer`

So TensorFlow computes the gradients for you. 

But it gets even easier: it also provides a number of optimizers, including a Gradient Descent optimizer. 

You can simply replace the preceding gradients = ... and training_op = ... lines
with the following code, and once again everything will just work fine:
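According to the TensorFlow 1.x documentation, `minimize()` simply combines `compute_gradients()` and `apply_gradients()`: it computes the gradients of the cost with respect to the trainable variables, then applies the update $\theta \leftarrow \theta - \eta\, \nabla_\theta \mathrm{MSE}(\theta)$. As a NumPy-only sketch on synthetic data (the class `PlainGradientDescent` is hypothetical, not TensorFlow's), repeating those two steps converges to the same parameters as the Normal Equation:

```python
import numpy as np

class PlainGradientDescent(object):
    """Hypothetical optimizer: one step = theta - learning_rate * gradient."""
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

    def step(self, theta, gradient):
        return theta - self.learning_rate * gradient

rng = np.random.RandomState(42)
m_gd = 500
X_gd = np.c_[np.ones((m_gd, 1)), rng.randn(m_gd, 2)]   # bias + 2 standardized-like features
y_gd = X_gd.dot(np.array([[1.0], [2.0], [-3.0]])) + 0.1 * rng.randn(m_gd, 1)

theta_gd = rng.uniform(-1.0, 1.0, size=(3, 1))          # random init, as in the notebook
optimizer_gd = PlainGradientDescent(learning_rate=0.1)
for _ in range(1000):
    gradient = 2.0 / m_gd * X_gd.T.dot(X_gd.dot(theta_gd) - y_gd)  # "compute gradients"
    theta_gd = optimizer_gd.step(theta_gd, gradient)                # "apply gradients"

theta_normal = np.linalg.solve(X_gd.T.dot(X_gd), X_gd.T.dot(y_gd))
print(np.allclose(theta_gd, theta_normal, atol=1e-6))  # True
```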

In [None]:
reset_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

In [None]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

In [None]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)
    
    best_theta = theta.eval()

print("Best theta:")
print(best_theta)

### Using a momentum optimizer
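Plain Gradient Descent takes steps that depend only on the current gradient; momentum optimization accumulates past gradients, so directions that stay consistent build up speed. The documentation of `tf.train.MomentumOptimizer` describes its (non-Nesterov) update as `accumulation = momentum * accumulation + gradient` followed by `variable -= learning_rate * accumulation`. The NumPy sketch below applies that rule to a simple ill-conditioned quadratic, chosen only for illustration, and compares it with plain Gradient Descent after the same number of steps.

```python
import numpy as np

A_demo = np.diag([1.0, 100.0])   # ill-conditioned quadratic f(t) = 0.5 * t^T A t, minimum at [0, 0]

def quad_gradient(t):
    return A_demo.dot(t)         # gradient of 0.5 * t^T A t

learning_rate_demo, momentum_demo, n_steps = 0.01, 0.9, 200

# Plain Gradient Descent
t_plain = np.array([1.0, 1.0])
for _ in range(n_steps):
    t_plain = t_plain - learning_rate_demo * quad_gradient(t_plain)

# Momentum, following the accumulator rule quoted above
t_momentum = np.array([1.0, 1.0])
accumulation = np.zeros_like(t_momentum)
for _ in range(n_steps):
    accumulation = momentum_demo * accumulation + quad_gradient(t_momentum)
    t_momentum = t_momentum - learning_rate_demo * accumulation

print(np.linalg.norm(t_plain))     # about 0.13: still far from the minimum
print(np.linalg.norm(t_momentum))  # far smaller: much closer to the minimum
```

The learning rate and momentum values here are illustrative; on the well-scaled housing features, the benefit of momentum may be much less visible.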

In [None]:
reset_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

In [None]:
optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate,
                                       momentum=0.9)

In [None]:
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

In [None]:
with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        sess.run(training_op)
    
    best_theta = theta.eval()

print("Best theta:")
print(best_theta)

# Feeding data to the training algorithm

Let’s try to modify the previous code to implement Mini-batch Gradient Descent. 

For this, we need a way to replace X and y at every iteration with the next mini-batch. The
simplest way to do this is to use placeholder nodes. 

These nodes are special because they don’t actually perform any computation; they just output the data you tell them to output at runtime. They are typically used to pass the training data to TensorFlow during training. If you don’t specify a value at runtime for a placeholder, you get an exception.
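A placeholder can be pictured as a named hole in the graph whose value is supplied only at run time. The pure-Python sketch below (`Placeholder`, `AddConst` and the `evaluate()` methods are hypothetical names, not TensorFlow's API) shows both behaviours described above: a fed value flows through the computation, and a missing value raises an exception (a `KeyError` in this sketch).

```python
class Placeholder(object):
    """Hypothetical placeholder: outputs whatever value is fed for it at run time."""
    def __init__(self, name):
        self.name = name

    def evaluate(self, feed_dict):
        if self not in feed_dict:
            raise KeyError("You must feed a value for placeholder %r" % self.name)
        return feed_dict[self]

class AddConst(object):
    """Hypothetical op computing node + constant, element by element."""
    def __init__(self, node, constant):
        self.node, self.constant = node, constant

    def evaluate(self, feed_dict):
        return [v + self.constant for v in self.node.evaluate(feed_dict)]

A_ph = Placeholder("A")
B_node = AddConst(A_ph, 5)                  # building the graph needs no data yet
print(B_node.evaluate({A_ph: [1, 2, 3]}))   # [6, 7, 8]

try:
    B_node.evaluate({})                     # nothing fed for A_ph
    missing_feed_raised = False
except KeyError as exc:
    missing_feed_raised = True
    print("Error:", exc)
```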

## Placeholder nodes

In [None]:
# See the lecture slide that explains the code below:

reset_graph()

A = tf.placeholder(tf.float32, shape=(None, 3))
B = A + 5
with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict={A: [[1, 2, 3]]})
    B_val_2 = B.eval(feed_dict={A: [[4, 5, 6], [7, 8, 9]]})

print(B_val_1)

In [None]:
print(B_val_2)

## Mini-batch Gradient Descent

In [None]:
n_epochs = 1000
learning_rate = 0.01

To implement Mini-batch Gradient Descent, we only need to modify the existing code
slightly. First change the definition of X and y in the construction phase to make them
placeholder nodes:

In [None]:
reset_graph()

X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")

In [None]:
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

In [None]:
n_epochs = 10

Then define the batch size and compute the total number of batches:

In [None]:
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

Finally, in the execution phase, fetch the mini-batches one by one, then provide the
value of X and y via the feed_dict parameter when evaluating a node that depends
on either of them.

In [None]:
def fetch_batch(epoch, batch_index, batch_size):
    np.random.seed(epoch * n_batches + batch_index)  # reproducible, different seed per batch
    indices = np.random.randint(m, size=batch_size)  # sample row indices (with replacement)
    X_batch = scaled_housing_data_plus_bias[indices] # inputs for this mini-batch
    y_batch = housing.target.reshape(-1, 1)[indices] # matching targets
    return X_batch, y_batch

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

    best_theta = theta.eval()

In [None]:
best_theta
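`fetch_batch()` above samples row indices with replacement, so within an epoch some instances may be used several times and others not at all. A common alternative, sketched here in NumPy with a hypothetical `iterate_minibatches()` helper and stand-in data, shuffles the indices once per epoch and slices them into `ceil(m / batch_size)` batches, so every instance is used exactly once per epoch. With the 20,640 California housing instances and `batch_size = 100`, that gives `ceil(206.4) = 207` batches, the last one holding only 40 instances.

```python
import numpy as np

def iterate_minibatches(X_all, y_all, batch_size, rng):
    """Yield (X_batch, y_batch) pairs covering every row exactly once, in random order."""
    indices = rng.permutation(len(X_all))          # shuffle once per epoch
    for start in range(0, len(X_all), batch_size):
        batch_idx = indices[start:start + batch_size]
        yield X_all[batch_idx], y_all[batch_idx]

rng = np.random.RandomState(42)
X_mb = np.arange(20640 * 2, dtype=np.float64).reshape(20640, 2)  # stand-in data, same row count
y_mb = np.arange(20640, dtype=np.float64).reshape(-1, 1)

batches = list(iterate_minibatches(X_mb, y_mb, batch_size=100, rng=rng))
print(len(batches), int(np.ceil(20640 / 100)))     # 207 207
print(batches[-1][0].shape)                         # (40, 2)
seen_rows = np.sort(np.concatenate([yb.ravel() for _, yb in batches]))
print(np.array_equal(seen_rows, y_mb.ravel()))      # True: each row used once
```

In the training loop, each `(X_batch, y_batch)` pair would be passed through `feed_dict` exactly as in the cell above.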

# Saving and restoring a model (Optional)

This part was not covered in the lecture; please study it on your own.

In [None]:
reset_graph()

n_epochs = 1000                                                                       # not shown in the book
learning_rate = 0.01                                                                  # not shown

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")            # not shown
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")            # not shown
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")                                      # not shown
error = y_pred - y                                                                    # not shown
mse = tf.reduce_mean(tf.square(error), name="mse")                                    # not shown
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)            # not shown
training_op = optimizer.minimize(mse)                                                 # not shown

init = tf.global_variables_initializer()
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())                                # not shown
            save_path = saver.save(sess, "/tmp/my_model.ckpt")
        sess.run(training_op)
    
    best_theta = theta.eval()
    save_path = saver.save(sess, "/tmp/my_model_final.ckpt")

In [None]:
best_theta

In [None]:
with tf.Session() as sess:
    saver.restore(sess, "/tmp/my_model_final.ckpt")
    best_theta_restored = theta.eval() # not shown in the book

In [None]:
np.allclose(best_theta, best_theta_restored)

If you want to have a saver that loads and restores `theta` with a different name, such as `"weights"`:

In [None]:
saver = tf.train.Saver({"weights": theta})
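The dictionary passed to `tf.train.Saver` maps the name stored in the checkpoint to the variable in the graph. As an analogy only (this is NumPy's `.npz` format, not a TensorFlow checkpoint, and the file name is hypothetical), the sketch below saves an array under the key `"weights"` to a temporary file and restores it by that name:

```python
import os
import tempfile
import numpy as np

theta_demo = np.array([[1.5], [-0.5], [2.0]])         # stand-in parameter values

tmp_dir = tempfile.mkdtemp()
path_demo = os.path.join(tmp_dir, "params_demo.npz")   # hypothetical file name
np.savez(path_demo, weights=theta_demo)                # stored under the name "weights"

with np.load(path_demo) as restored:
    theta_restored_demo = restored["weights"]          # looked up by that name

print(np.allclose(theta_demo, theta_restored_demo))    # True
```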

By default the saver also saves the graph structure itself in a second file with the extension `.meta`. You can use the function `tf.train.import_meta_graph()` to restore the graph structure. This function loads the graph into the default graph and returns a `Saver` that can then be used to restore the graph state (i.e., the variable values):

In [None]:
reset_graph()
# notice that we start with an empty graph.

saver = tf.train.import_meta_graph("/tmp/my_model_final.ckpt.meta")  # this loads the graph structure
theta = tf.get_default_graph().get_tensor_by_name("theta:0") # not shown in the book

with tf.Session() as sess:
    saver.restore(sess, "/tmp/my_model_final.ckpt")  # this restores the graph's state
    best_theta_restored = theta.eval() # not shown in the book

In [None]:
np.allclose(best_theta, best_theta_restored)

This means that you can import a pretrained model without having to have the corresponding Python code to build the graph. This is very handy when you keep tweaking and saving your model: you can load a previously saved model without having to search for the version of the code that built it.