## Chapter 9. Up and Running with TensorFlow

Basic principle: (1) define a graph of computations to perform; (2) TensorFlow takes that graph and runs it efficiently using optimized C++ code.

<div style="width:400 px; font-size:100%; text-align:center;"> <center><img src="img/fig9-1.png" width=400px alt="fig9-1" style="padding-bottom:1.0em;padding-top:2.0em;"></center>_Figure 9-1. A simple computation graph_</div>

Importantly, the graph can be broken into chunks that run in parallel across multiple CPUs or GPUs, and TensorFlow also supports distributed computing.

Characteristics: clean design, scalability, flexibility, and great documentation (in short: flexible, scalable, and production-ready).

Highlights:
 - All types of platforms
 - API _TF.Learn_ ($tensorflow.contrib.learn$)
 - _TF-slim_ ($tensorflow.contrib.slim$)
 - APIs <font color=blue>Keras</font> or <font color=blue>Pretty Tensor</font>
 - Its main Python API offers much more flexibility (at the cost of higher complexity) to create all sorts of computations
 - Highly efficient C++ implementations
 - Advanced optimization nodes to search for the parameters that minimize a cost function, with the gradients computed automatically (_automatic differentiation_, or _autodiff_)
 - Visualization tool _TensorBoard_
 - Cloud computing
 - Great community: https://github.com/jtoy/awesome-tensorflow
 
### Installation

Install in a virtual environment
>$ pip3 install --upgrade tensorflow

Check version
>$ python3 -c 'import tensorflow; print(tensorflow.__version__)'

### Creating Your First Graph and Running It in a Session

1. Create a computation graph:

In [39]:
import tensorflow as tf
x = tf.Variable(3, name="x")
y = tf.Variable(4, name="y")
f = x*x*y + y + 2

This code does not actually perform any computation; it does not even initialize the variables. To evaluate this graph, open a TensorFlow _session_ and use it to initialize the variables and evaluate f. A TensorFlow session takes care of placing the operations onto devices such as CPUs and GPUs and running them, and it holds all the variable values.

In [40]:
sess = tf.Session()
sess.run(x.initializer)
sess.run(y.initializer)
result = sess.run(f)
print(result)
sess.close()

42
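
The split between building a graph and running it can be illustrated without TensorFlow. The following is a minimal pure-Python sketch; the names `Node`, `as_node`, and `run` are hypothetical and this is not how TensorFlow is implemented. Building `f` only records operations, and the value 42 is computed only when `run(f)` is called.

```python
# Toy deferred-computation graph (illustrative only; not TensorFlow's implementation).
class Node:
    """A graph node: an operation plus the nodes it takes as inputs."""

    def __init__(self, op, inputs=()):
        self.op = op          # callable that computes this node's value
        self.inputs = inputs  # upstream nodes

    def __add__(self, other):
        return Node(lambda a, b: a + b, (self, as_node(other)))

    def __mul__(self, other):
        return Node(lambda a, b: a * b, (self, as_node(other)))


def as_node(value):
    """Wrap plain numbers as constant nodes so they can join the graph."""
    return value if isinstance(value, Node) else Node(lambda: value)


def run(node):
    """Evaluate a node by first evaluating everything it depends on."""
    return node.op(*(run(i) for i in node.inputs))


# Construction phase: only builds the graph, nothing is computed yet.
x = as_node(3)
y = as_node(4)
f = x * x * y + y + 2

# Execution phase: the value is computed only now.
result = run(f)
print(result)  # 42
```

Here `run` plays roughly the role that `sess.run()` plays above, minus device placement and variable state.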


Repeating $sess.run()$ every time is a bit cumbersome; a $with$ block offers a shorter alternative:

In [41]:
with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()

Inside the with block, the session is set as the default session. Calling x.initializer.run() is equivalent to calling tf.get_default_session().run(x.initializer), and similarly f.eval() is equivalent to calling tf.get_default_session().run(f). Moreover, the session is automatically closed at the end of the block.

Instead of manually running the initializer for every single variable, you can use the global_variables_initializer() function. Note that it does not actually perform the initialization immediately, but rather creates a node in the graph that will initialize all variables when it is run:

In [42]:
init = tf.global_variables_initializer() # prepare an init node

with tf.Session() as sess:
    init.run() # actually initialize all the variables
    result = f.eval()

Inside Jupyter or within a Python shell you may prefer to create an $InteractiveSession$. The only
difference from a regular $Session$ is that when an $InteractiveSession$ is created it automatically sets itself as the default session, so you don't need a with block (but you do need to close the session manually when you are done with it):

>sess = tf.InteractiveSession()
<br>
>init.run()
<br>
>result = f.eval()
<br>
>print(result)
<br>
>42
<br>
>sess.close()

A TensorFlow program is typically split into two parts: 
1. Construction phase: builds a computation graph representing the ML model and the computations required to train it.
2. Execution phase: runs a loop that evaluates a training step repeatedly, gradually improving the model parameters.

### Managing Graphs

Any node you create is automatically added to the default graph:

In [43]:
x1 = tf.Variable(1)
x1.graph is tf.get_default_graph()

True

To manage multiple independent graphs, create a new Graph and temporarily make it the default graph inside a $with$ block:

In [44]:
>>> graph = tf.Graph()
>>> with graph.as_default():
...     x2 = tf.Variable(2)
...
>>> x2.graph is graph

True

In [45]:
>>> x2.graph is tf.get_default_graph()

False

<font color=blue>_TIP_</font>
>In Jupyter (or in a Python shell), it is common to run the same commands more than once while you are experimenting. As a result, you may end up with a default graph containing many duplicate nodes. One solution is to restart the Jupyter kernel (or the Python shell), but a more convenient solution is to just reset the default graph by running tf.reset_default_graph().

### Lifecycle of a Node Value

When evaluating a node, TensorFlow automatically determines the set of nodes that it depends on and it evaluates these nodes first.

In [46]:
w = tf.constant(3)
x = w + 2
y = x + 5
z = x * 3

with tf.Session() as sess:
    print(y.eval()) # 10
    print(z.eval()) # 15

10
15


IMPORTANT: TensorFlow will <font color=red>NOT</font> reuse the result of the previous evaluation of $w$ and $x$. In short, the preceding code evaluates $w$ and $x$ twice.

All node values are dropped between graph runs, except variable values, which are maintained by the session across graph runs. A variable starts its life when its initializer is run, and it ends when the session is closed.

If you want to evaluate $y$ and $z$ efficiently, without evaluating $w$ and $x$ twice as in the previous code, you must ask TensorFlow to evaluate both $y$ and $z$ in just one graph run:

In [47]:
with tf.Session() as sess:
    y_val, z_val = sess.run([y, z])
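
To make the "evaluated twice" point concrete, here is a toy pure-Python model of graph runs. It is only an analogy: the `cache` dictionary and the `eval_counts` counter are hypothetical stand-ins, not TensorFlow internals. Each separate run starts with an empty cache, so `w` and `x` are recomputed; a single combined run shares one cache.

```python
# Toy model of graph runs (illustrative only; not how TensorFlow is implemented).
eval_counts = {"w": 0, "x": 0}


def w():
    eval_counts["w"] += 1
    return 3


def x(cache):
    # Within one run, reuse x if it has already been computed.
    if "x" not in cache:
        eval_counts["x"] += 1
        cache["x"] = w() + 2
    return cache["x"]


def y(cache):
    return x(cache) + 5


def z(cache):
    return x(cache) * 3


# Two separate runs: each starts with an empty cache, so w and x are evaluated twice.
y_val = y({})
z_val = z({})
separate_counts = dict(eval_counts)

# One combined run: y and z share a cache, so w and x are evaluated once.
eval_counts.update(w=0, x=0)
shared = {}
y_val2, z_val2 = y(shared), z(shared)
combined_counts = dict(eval_counts)

print(y_val, z_val, separate_counts)    # 10 15 {'w': 2, 'x': 2}
print(y_val2, z_val2, combined_counts)  # 10 15 {'w': 1, 'x': 1}
```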

<font color=red>_WARNING_</font>
>In single-process TensorFlow, multiple sessions do not share any state, even if they reuse the same graph (each session would have its own copy of every variable). In distributed TensorFlow, variable state is stored on the servers, not in the sessions, so multiple sessions can share the same variables.

### Linear Regression with TensorFlow

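TensorFlow operations (_ops_) can take any number of inputs and produce any number of outputs. Constants and variables take no input (they are called _source ops_). The inputs and outputs are multidimensional arrays, called _tensors_.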

The following code manipulates 2D arrays to perform Linear Regression on the California housing dataset.

In [48]:
import numpy as np
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

with tf.Session() as sess:
    theta_value = theta.eval()

The main benefit of this code versus computing the Normal Equation directly using NumPy is that TensorFlow will automatically run it on your GPU, provided you have one and installed TensorFlow with GPU support.
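
For comparison, the same Normal Equation can be written in plain NumPy. The sketch below uses a small synthetic dataset (an assumption made so the example runs without downloading the housing data); `true_theta`, the seed, and the noise level are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)                  # fixed seed, synthetic data (assumption)
m = 200
features = rng.normal(size=(m, 2))
true_theta = np.array([[4.0], [3.0], [-2.0]])    # bias term, then two weights

X = np.c_[np.ones((m, 1)), features]             # add the bias column, as in the text
y = X @ true_theta + 0.01 * rng.normal(size=(m, 1))

# Normal Equation: theta = (X^T X)^-1 X^T y
theta = np.linalg.inv(X.T @ X) @ X.T @ y
print(theta.ravel())
```

The recovered `theta` should land close to `true_theta`, since the added noise is small.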

### Implementing Gradient Descent

<font color=red>_WARNING_</font>
>When using Gradient Descent, remember that it is important to first normalize the input feature vectors, or else training may be much slower.

Gradient Descent converges much faster when the feature vectors are scaled first. We could do this using TensorFlow, but let's just use Scikit-Learn for now.

In [51]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]

In [52]:
print(scaled_housing_data_plus_bias.mean(axis=0))
print(scaled_housing_data_plus_bias.mean(axis=1))
print(scaled_housing_data_plus_bias.mean())
print(scaled_housing_data_plus_bias.shape)

[ 1.00000000e+00  6.60969987e-17  5.50808322e-18  6.60969987e-17
 -1.06030602e-16 -1.10161664e-17  3.44255201e-18 -1.07958431e-15
 -8.52651283e-15]
[ 0.38915536  0.36424355  0.5116157  ... -0.06612179 -0.06360587
  0.01359031]
0.11111111111111005
(20640, 9)
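
The printed numbers follow from what standardization does: every scaled column has mean 0, the bias column has mean 1, so the overall mean is 1/9 ≈ 0.111. A minimal NumPy sketch of the same arithmetic follows, using random stand-in data with 8 features (an assumption; it is not the housing dataset).

```python
import numpy as np

rng = np.random.default_rng(0)                        # synthetic stand-in data (assumption)
data = rng.normal(loc=5.0, scale=3.0, size=(1000, 8))

# Standardize each column: subtract its mean, divide by its standard deviation.
scaled = (data - data.mean(axis=0)) / data.std(axis=0)
scaled_plus_bias = np.c_[np.ones((len(scaled), 1)), scaled]

col_means = scaled_plus_bias.mean(axis=0)   # 1 for the bias column, ~0 elsewhere
overall_mean = scaled_plus_bias.mean()      # ~1/9: one column of ones out of nine
print(col_means.round(6), overall_mean)
```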


#### Manually Computing the Gradients

The assign() function creates a node that will assign a new value to a variable.

In [53]:
n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
gradients = 2/m * tf.matmul(tf.transpose(X), error)
training_op = tf.assign(theta, theta - learning_rate * gradients)

# # TF autodiff 
# gradients = tf.gradients(mse, [theta])[0]

# # Gradient Descent optimizer
# optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
# optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.9) # momentum optimizer
# training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)

    best_theta = theta.eval()

Epoch 0 MSE = 4.6820836
Epoch 100 MSE = 0.7089329
Epoch 200 MSE = 0.6406445
Epoch 300 MSE = 0.6128869
Epoch 400 MSE = 0.5923592
Epoch 500 MSE = 0.5768639
Epoch 600 MSE = 0.565109
Epoch 700 MSE = 0.55614835
Epoch 800 MSE = 0.54928374
Epoch 900 MSE = 0.54399776
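
The same update rule can be checked outside TensorFlow. Below is a minimal NumPy sketch of Batch Gradient Descent using the gradient formula from the cell above, `2/m * X^T (X theta - y)`. The data is synthetic and noise-free (an assumption), so the result can be compared against the hypothetical `true_theta`.

```python
import numpy as np

rng = np.random.default_rng(42)                 # synthetic data (assumption)
m, n = 500, 3
features = rng.normal(size=(m, n))              # already roughly standardized
X = np.c_[np.ones((m, 1)), features]
true_theta = np.array([[2.0], [0.5], [-1.0], [3.0]])
y = X @ true_theta

n_epochs = 1000
learning_rate = 0.01
theta = rng.uniform(-1.0, 1.0, size=(n + 1, 1))

initial_mse = float(np.mean((X @ theta - y) ** 2))
for epoch in range(n_epochs):
    error = X @ theta - y
    gradients = 2 / m * X.T @ error             # same formula as the TensorFlow version
    theta = theta - learning_rate * gradients   # plays the role of tf.assign
final_mse = float(np.mean((X @ theta - y) ** 2))
print(initial_mse, final_mse)
```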


#### Using autodiff

One option is _symbolic differentiation_, which derives the partial derivatives automatically, but the resulting code would not necessarily be very efficient.

TensorFlow's autodiff feature can automatically and efficiently compute the gradients.
>gradients = tf.gradients(mse, [theta])[0]

The gradients() function takes an op (in this case mse) and a list of variables (in this case just theta), and it creates a list of ops (one per variable) to compute the gradients of the op with regard to each variable. So the gradients node will compute the gradient vector of the MSE with regard to theta.
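
To give a feel for how gradients can be derived from a graph, here is a toy reverse-mode autodiff sketch in pure Python. It is far simpler than what TensorFlow does, and the `Value` class and `backward` function are hypothetical names. It reuses the function from the first graph in these notes, f = x·x·y + y + 2, whose partial derivatives are 2xy and x² + 1.

```python
# Toy reverse-mode autodiff (illustrative only; far simpler than TensorFlow's).
class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents   # pairs of (parent node, local derivative)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, ((self, other.data), (other, self.data)))


def backward(output):
    """Propagate d(output)/d(node) from the output back to every input."""
    order, seen = [], set()

    def visit(node):              # topological sort of the graph
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # chain rule, starting from the output
        for parent, local in node.parents:
            parent.grad += local * node.grad


x, y = Value(3.0), Value(4.0)
f = x * x * y + y + 2
backward(f)
print(f.data, x.grad, y.grad)  # 42.0 24.0 10.0
```

At x = 3 and y = 4, the analytic values are 2xy = 24 and x² + 1 = 10, which matches the printed gradients.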

#### Using an Optimizer

Replace the preceding gradients = ... and training_op = ... lines with the following code:
>optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
<br>
>training_op = optimizer.minimize(mse)

To use a different type of optimizer, change just one line:
>optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.9)
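
As a rough picture of what a momentum optimizer adds to plain Gradient Descent, the NumPy sketch below keeps a running `velocity` of past gradients and steps along it. The update form used here (velocity = momentum × velocity + gradient) is a common formulation and is stated as an assumption, not a description of TensorFlow's internals. The cost function and the `momentum_descent` helper are hypothetical.

```python
import numpy as np


def momentum_descent(grad_fn, theta, learning_rate=0.01, momentum=0.9, n_steps=500):
    """Gradient descent with a momentum (velocity) term."""
    velocity = np.zeros_like(theta)
    for _ in range(n_steps):
        velocity = momentum * velocity + grad_fn(theta)  # accumulate past gradients
        theta = theta - learning_rate * velocity
    return theta


# Hypothetical cost: squared distance to a target point, minimized at `target`.
target = np.array([1.0, -2.0])
grad_fn = lambda theta: 2 * (theta - target)

theta = momentum_descent(grad_fn, np.zeros(2))
print(theta)
```

With `momentum=0.0` the loop reduces to ordinary Gradient Descent, which makes the two easy to compare on the same cost.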

### Feeding Data to the Training Algorithm


