# Up and Running with TensorFlow

TensorFlow is a powerful open source software library for numerical computation, particularly well suited to and fine-tuned for large-scale machine learning.

Basically, we define a graph of computations to perform, and TensorFlow takes that graph and runs it efficiently using optimized C++ code.

Most importantly, TensorFlow can run operations in parallel across multiple CPUs or GPUs, so it can train huge neural networks on massive amounts of data. It powers Google services such as Google Speech, Google Photos, and Google Search.

## Installation 

In [1]:
import tensorflow as tf

print(tf.__version__)

1.11.0


## Creating a graph and running it 

In [2]:
x = tf.Variable(3, name='x')
y = tf.Variable(4, name='y')
f = x*x*y + y + 2

The code above does not perform any computation; it only builds the computation graph. In fact, the variables are not even initialized yet. To evaluate the graph, we need to open a TensorFlow session, initialize the variables, and compute the function. A *session* is responsible for placing the operations onto devices such as CPUs and GPUs and running them, and it holds the variable values. We usually close the session once the computation is done to free up resources.

In [3]:
#Open a session
sess = tf.Session()
print('TF session opened')

#Initialize variables
sess.run(x.initializer)
sess.run(y.initializer)

#Compute function
result = sess.run(f)
print(result)

#Close session
sess.close()
print('TF session closed')

TF session opened
42
TF session closed


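It is cumbersome to open and close the session manually. Instead, we can use a `with` block, which automatically closes the session at the end of the block. Inside the block, the session is also set as the default session, so `x.initializer.run()` is equivalent to `tf.get_default_session().run(x.initializer)` and `f.eval()` is equivalent to `tf.get_default_session().run(f)`: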

In [4]:
with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()
    print(result)

42


Also, instead of initializing each variable, we can create a node that will then initialize all variables when it is run:

In [5]:
# Create a node that initializes all variables when it is run
init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()
    result = f.eval()
    print(result)

42


We can also use an `InteractiveSession`, which sets itself as the default session when it is created, so no `with` block is needed. However, we must then close the session manually.

In [6]:
sess = tf.InteractiveSession()
init.run()
result = f.eval()
print(result)

42


In [7]:
sess.close()

Typically, a TensorFlow program is split into two parts: a construction phase and an execution phase. During the construction phase, we simply build the computation graph, which represents all the equations and steps needed to train a model. The execution phase then usually consists of a loop that repeatedly runs a training step on batches of data, gradually improving the model's parameters.
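
The split between the two phases can be illustrated without TensorFlow at all. The sketch below is a toy, pure-Python stand-in; the names `Node`, `const`, and `run` are hypothetical and not part of the TensorFlow API. Building the expression only records the operations, and nothing is computed until `run` is called.

```python
import operator


class Node:
    """A toy graph node: an operation plus the nodes it takes as inputs."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = inputs
        self.value = value  # only used by constant (source) nodes

    def __add__(self, other):
        return Node(operator.add, (self, _wrap(other)))

    def __mul__(self, other):
        return Node(operator.mul, (self, _wrap(other)))


def _wrap(x):
    """Turn plain Python numbers into constant nodes."""
    return x if isinstance(x, Node) else const(x)


def const(value):
    """Create a source node that takes no inputs."""
    return Node(op=None, value=value)


def run(node):
    """Execution phase: recursively evaluate a node's inputs, then the node."""
    if node.op is None:
        return node.value
    return node.op(*(run(i) for i in node.inputs))


# Construction phase: this only builds the graph, nothing is computed yet.
a = const(3)
b = const(4)
g = a * a * b + b + 2

# Execution phase: the graph is evaluated only now.
result = run(g)
print(result)  # 42
```

In this toy version `g` is just a tree of `Node` objects, much as `f` in the earlier cells is a graph node rather than a number until a session runs it.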

## Managing graphs 

Any node created is automatically added to the default graph:

In [8]:
x1 = tf.Variable(1)
x1.graph is tf.get_default_graph()

True

In most cases this is what we want, but sometimes we need to manage multiple independent graphs. We can do so by creating a new `Graph` and temporarily making it the default graph inside a `with` block:

In [9]:
graph = tf.Graph()

with graph.as_default():
    x2 = tf.Variable(2)
    
x2.graph is graph

True

In [10]:
x2.graph is tf.get_default_graph()

False

## Lifecycle of a node value 

When we evaluate a node, TensorFlow automatically determines the set of nodes it depends on and evaluates those nodes first. For example:

In [11]:
w = tf.constant(3)
x = w + 2
y = x + 5
z = x * 3

with tf.Session() as sess:
    print(y.eval())
    print(z.eval())

10
15


Here, TensorFlow detected that `y` depends on `x`, which depends on `w`. Therefore, it evaluated `w` and then `x` before evaluating `y`. It then did the same for `z`, which also depends on `x` and `w`.

However, all node values are dropped between graph runs, so `w` and `x` were evaluated twice: once for `y` and once for `z`. Only variable values are kept across graph runs, because the session maintains them. A variable's life starts when its initializer is run and ends when the session is closed.

To avoid evaluating the same two nodes twice, we must ask TensorFlow to evaluate `y` and `z` in a single graph run:

In [12]:
with tf.Session() as sess:
    y_val, z_val = sess.run([y, z])
    print(y_val)
    print(z_val)

10
15
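
The difference between the two approaches can be made visible by counting evaluations. The following is a toy, pure-Python evaluator, not TensorFlow; `make_node`, `evaluate`, and `run` are hypothetical names. It mirrors the `w`, `x`, `y`, `z` example, and its cache lives only for one call to `run`, which plays the role of a single graph run.

```python
eval_counts = {}


def make_node(name, fn, *inputs):
    """A toy node: a name, a function, and the nodes it depends on."""
    return {"name": name, "fn": fn, "inputs": inputs}


def evaluate(node, cache):
    """Evaluate a node, reusing values already computed in this run."""
    name = node["name"]
    if name not in cache:
        args = [evaluate(i, cache) for i in node["inputs"]]
        cache[name] = node["fn"](*args)
        eval_counts[name] = eval_counts.get(name, 0) + 1
    return cache[name]


def run(fetches):
    """One graph run: the cache is discarded when this call returns."""
    cache = {}
    return [evaluate(node, cache) for node in fetches]


w = make_node("w", lambda: 3)
x = make_node("x", lambda w_val: w_val + 2, w)
y = make_node("y", lambda x_val: x_val + 5, x)
z = make_node("z", lambda x_val: x_val * 3, x)

# Two separate runs: values are dropped in between, so x is computed twice.
eval_counts.clear()
run([y])
run([z])
separate_runs = eval_counts["x"]

# A single run fetching both: x is computed once and shared.
eval_counts.clear()
y_val, z_val = run([y, z])
single_run = eval_counts["x"]

print(separate_runs, single_run, y_val, z_val)  # 2 1 10 15
```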


## Linear regression with TensorFlow

TensorFlow operations (ops) can take any number of inputs and produce any number of outputs. For example, a multiplication op takes two inputs and produces one output. Constants and variables take no inputs (they are called source ops).

The inputs and outputs are multidimensional arrays called *tensors*. 

Now we will perform linear regression with TensorFlow on the California housing dataset. First, we fetch the dataset and add a bias input feature (a column of ones) to the data. Then, we create the constant nodes `X` and `y` to hold the data and the targets, respectively. Finally, we compute `theta` with the Normal Equation.

In [13]:
import numpy as np
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name='y')
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

with tf.Session() as sess:
    theta_value = theta.eval()
    print(theta_value)

[[-3.7185181e+01]
 [ 4.3633747e-01]
 [ 9.3952334e-03]
 [-1.0711310e-01]
 [ 6.4479220e-01]
 [-4.0338000e-06]
 [-3.7813708e-03]
 [-4.2348403e-01]
 [-4.3721911e-01]]
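
The expression assigned to `theta` is the Normal Equation, theta = (XᵀX)⁻¹ Xᵀ y. As a sanity check that does not depend on TensorFlow, the same formula can be sketched in NumPy on a small synthetic dataset and compared with a least-squares solver. The data and the `*_demo` names below are made up for illustration and are not part of the original notebook.

```python
import numpy as np

rng = np.random.RandomState(0)

# Synthetic data (an assumption for illustration): y = 4 + 3*x1 - 2*x2 + noise
m_demo = 200
features = rng.rand(m_demo, 2)
targets = 4 + features @ np.array([[3.0], [-2.0]]) + 0.01 * rng.randn(m_demo, 1)

# Add the bias column of ones, as done for the housing data
X_demo = np.c_[np.ones((m_demo, 1)), features]

# Normal Equation: theta = (X^T X)^-1 X^T y
theta_normal = np.linalg.inv(X_demo.T @ X_demo) @ X_demo.T @ targets

# Least-squares solver, which avoids forming the explicit inverse
theta_lstsq, *_ = np.linalg.lstsq(X_demo, targets, rcond=None)

print(theta_normal.ravel())
print(np.allclose(theta_normal, theta_lstsq))
```

The explicit inverse is kept here only to match the TensorFlow code; on ill-conditioned data a least-squares solver is usually the more numerically robust choice.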


## Implementing gradient descent 

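Instead of solving for `theta` in closed form, we can use batch gradient descent. We can do this in several ways. First, we will compute the gradients manually; then we will use TensorFlow's automatic differentiation (autodiff), and finally an out-of-the-box optimizer. In all cases, we first standardize the input features, since gradient descent converges much more slowly on unscaled data.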

### Manually compute the gradients 

In [14]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]

In [15]:
n_epochs = 1000
learning_rate = 0.1

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name='y')

theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0), name='theta')

y_pred = tf.matmul(X, theta, name='predictions')

error = y_pred - y

mse = tf.reduce_mean(tf.square(error), name='mse')

gradients = 2/m * tf.matmul(tf.transpose(X), error)

training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print('Epoch ', epoch, 'MSE = ', mse.eval())
        sess.run(training_op)
    best_theta = theta.eval()

Epoch  0 MSE =  9.590753
Epoch  100 MSE =  0.5256891
Epoch  200 MSE =  0.52454007
Epoch  300 MSE =  0.52435577
Epoch  400 MSE =  0.5243265
Epoch  500 MSE =  0.52432185
Epoch  600 MSE =  0.5243209
Epoch  700 MSE =  0.52432114
Epoch  800 MSE =  0.524321
Epoch  900 MSE =  0.52432084


In [16]:
best_theta

array([[ 2.0685577 ],
       [ 0.8296303 ],
       [ 0.11875372],
       [-0.2655477 ],
       [ 0.30571342],
       [-0.00450238],
       [-0.03932669],
       [-0.8998599 ],
       [-0.8705166 ]], dtype=float32)
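
The `gradients` node in the cell above follows from differentiating the MSE cost with respect to `theta`. Written out, with $m$ training instances, input matrix $X$ (including the bias column), target vector $y$, and learning rate $\eta$ (`learning_rate` in the code), the three lines below correspond to `mse`, `gradients`, and `training_op`:

```latex
\begin{aligned}
\mathrm{MSE}(\theta) &= \frac{1}{m}\,(X\theta - y)^\top (X\theta - y) \\
\nabla_\theta\,\mathrm{MSE}(\theta) &= \frac{2}{m}\,X^\top (X\theta - y) \\
\theta &\leftarrow \theta - \eta\,\nabla_\theta\,\mathrm{MSE}(\theta)
\end{aligned}
```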

### Using autodiff 

In [17]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.1

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name='y')

theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0), name='theta')

y_pred = tf.matmul(X, theta, name='predictions')

error = y_pred - y

mse = tf.reduce_mean(tf.square(error), name='mse')

gradients = tf.gradients(mse, [theta])[0]

training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print('Epoch ', epoch, 'MSE = ', mse.eval())
        sess.run(training_op)
    best_theta = theta.eval()
    
print(best_theta)

Epoch  0 MSE =  8.617682
Epoch  100 MSE =  0.5299263
Epoch  200 MSE =  0.52459085
Epoch  300 MSE =  0.5243409
Epoch  400 MSE =  0.5243231
Epoch  500 MSE =  0.52432126
Epoch  600 MSE =  0.524321
Epoch  700 MSE =  0.52432096
Epoch  800 MSE =  0.5243207
Epoch  900 MSE =  0.5243209
[[ 2.0685577 ]
 [ 0.82962626]
 [ 0.11875296]
 [-0.26553985]
 [ 0.30570698]
 [-0.00450263]
 [-0.03932653]
 [-0.8998698 ]
 [-0.87052596]]
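
The only change from the manual version is that `tf.gradients` derives the gradient of `mse` with respect to `theta` for us. A useful habit when switching between a hand-written gradient and an automatic one is to check the formula numerically. The sketch below does this in NumPy with central finite differences, which is a different technique from the autodiff TensorFlow uses; it runs on a small synthetic problem, and all `*_demo` names are made up for illustration.

```python
import numpy as np

rng = np.random.RandomState(42)

# Small synthetic problem (made up for illustration)
X_demo = np.c_[np.ones((50, 1)), rng.randn(50, 3)]
y_demo = rng.randn(50, 1)
theta_demo = rng.uniform(-1.0, 1.0, size=(4, 1))
m_demo = X_demo.shape[0]


def mse_demo(theta):
    """Mean squared error of the linear model X_demo @ theta."""
    error = X_demo @ theta - y_demo
    return float(np.mean(error ** 2))


# Analytic gradient, same formula as the manual version: 2/m * X^T (X theta - y)
analytic = (2 / m_demo) * (X_demo.T @ (X_demo @ theta_demo - y_demo))

# Central finite differences, one coordinate of theta at a time
eps = 1e-6
numeric = np.zeros_like(theta_demo)
for i in range(theta_demo.shape[0]):
    step = np.zeros_like(theta_demo)
    step[i, 0] = eps
    numeric[i, 0] = (mse_demo(theta_demo + step) - mse_demo(theta_demo - step)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))
```

Finite differences need two cost evaluations per parameter, which is one reason they serve as a check rather than as a training method.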


### Using an optimizer 

In [18]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.1

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name='y')

theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0), name='theta')

y_pred = tf.matmul(X, theta, name='predictions')

error = y_pred - y

mse = tf.reduce_mean(tf.square(error), name='mse')

optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.9)

training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print('Epoch ', epoch, 'MSE = ', mse.eval())
        sess.run(training_op)
    best_theta = theta.eval()
    
print(best_theta)

Epoch  0 MSE =  9.460548
Epoch  100 MSE =  0.52438575
Epoch  200 MSE =  0.52432096
Epoch  300 MSE =  0.5243209
Epoch  400 MSE =  0.52432096
Epoch  500 MSE =  0.52432096
Epoch  600 MSE =  0.52432096
Epoch  700 MSE =  0.52432096
Epoch  800 MSE =  0.52432096
Epoch  900 MSE =  0.52432096
[[ 2.0685582 ]
 [ 0.82961947]
 [ 0.1187517 ]
 [-0.26552707]
 [ 0.30569637]
 [-0.00450299]
 [-0.03932628]
 [-0.8998851 ]
 [-0.8705405 ]]
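
With the momentum optimizer, the MSE has already reached about 0.5244 by epoch 100, whereas the two plain gradient descent runs above were still at roughly 0.526 to 0.530 at that point. To our understanding, the update rule documented for `tf.train.MomentumOptimizer` (without Nesterov momentum) accumulates past gradients and then steps along the accumulation. A minimal NumPy sketch of that rule on a synthetic regression problem follows; the data and the `*_demo`, `eta`, and `beta` names are assumptions for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)

# Synthetic regression problem (made up for illustration)
X_demo = np.c_[np.ones((100, 1)), rng.randn(100, 3)]
true_theta = np.array([[2.0], [0.5], [-1.0], [0.25]])
y_demo = X_demo @ true_theta + 0.1 * rng.randn(100, 1)
m_demo = X_demo.shape[0]

eta = 0.1   # learning rate
beta = 0.9  # momentum

theta_demo = rng.uniform(-1.0, 1.0, size=(4, 1))
accumulation = np.zeros_like(theta_demo)

for step in range(500):
    gradient = (2 / m_demo) * (X_demo.T @ (X_demo @ theta_demo - y_demo))
    # Momentum update: blend the new gradient into the running accumulation,
    # then move theta along the accumulated direction
    accumulation = beta * accumulation + gradient
    theta_demo = theta_demo - eta * accumulation

# Reference solution from a least-squares solver
theta_ref, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)
print(np.allclose(theta_demo, theta_ref, atol=1e-6))
```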


## Feeding data to the training algorithm 

In [19]:
n_epochs = 1000
learning_rate = 0.01

In [20]:
tf.reset_default_graph()

X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")

In [21]:
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

In [22]:
n_epochs = 10
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

In [23]:
def fetch_batch(epoch, batch_index, batch_size):
    # Seed per (epoch, batch) so that every run draws the same batches
    np.random.seed(epoch * n_batches + batch_index)
    # Sample batch_size row indices at random (with replacement)
    indices = np.random.randint(m, size=batch_size)
    X_batch = scaled_housing_data_plus_bias[indices]
    y_batch = housing.target.reshape(-1, 1)[indices]
    return X_batch, y_batch

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

    best_theta = theta.eval()

In [24]:
best_theta

array([[ 2.070016  ],
       [ 0.8204561 ],
       [ 0.1173173 ],
       [-0.22739051],
       [ 0.3113402 ],
       [ 0.00353193],
       [-0.01126994],
       [-0.91643935],
       [-0.8795008 ]], dtype=float32)
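
Note that `fetch_batch` draws each batch by sampling row indices with `np.random.randint`, that is, with replacement, so within one epoch some rows may be seen several times and others not at all. A common alternative is to shuffle the indices once per epoch and slice them into consecutive batches, so that every instance is visited exactly once per epoch. The NumPy sketch below shows this on a tiny made-up dataset; `iterate_minibatches` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np


def iterate_minibatches(X_data, y_data, batch_size, rng):
    """Yield (X_batch, y_batch) pairs covering every row exactly once."""
    n_rows = X_data.shape[0]
    shuffled = rng.permutation(n_rows)
    for start in range(0, n_rows, batch_size):
        batch_idx = shuffled[start:start + batch_size]
        yield X_data[batch_idx], y_data[batch_idx]


# Tiny synthetic dataset (made up for illustration): 10 rows, 2 features
X_toy = np.arange(20, dtype=np.float32).reshape(10, 2)
y_toy = np.arange(10, dtype=np.float32).reshape(-1, 1)

rng = np.random.RandomState(42)
batches = list(iterate_minibatches(X_toy, y_toy, batch_size=4, rng=rng))

batch_sizes = [len(X_b) for X_b, _ in batches]
seen = sorted(int(v) for _, y_b in batches for v in y_b.ravel())
print(batch_sizes)              # [4, 4, 2]
print(seen == list(range(10)))  # True
```

In the training loop, each yielded pair would be passed to the placeholders in the same way as before, through `feed_dict={X: X_batch, y: y_batch}`.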

## Saving and restoring models 

In [27]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
            # Checkpoint the model every 100 epochs
            save_path = saver.save(sess, "/tmp/my_model.ckpt")
        sess.run(training_op)
    
    best_theta = theta.eval()
    save_path = saver.save(sess, "tmp/my_model_final.ckpt")

Epoch 0 MSE = 2.7544262
Epoch 100 MSE = 0.632222
Epoch 200 MSE = 0.5727805
Epoch 300 MSE = 0.5585007
Epoch 400 MSE = 0.54907
Epoch 500 MSE = 0.54228795
Epoch 600 MSE = 0.5373789
Epoch 700 MSE = 0.533822
Epoch 800 MSE = 0.5312425
Epoch 900 MSE = 0.5293704


In [29]:
with tf.Session() as sess:
    saver.restore(sess, "tmp/my_model_final.ckpt")
    best_theta_restored = theta.eval()

INFO:tensorflow:Restoring parameters from tmp/my_model_final.ckpt


## Visualize the graph with TensorBoard

In [31]:
tf.reset_default_graph()

In [32]:
from datetime import datetime

now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
root_logdir = "tf_logs"
logdir = "{}/run-{}/".format(root_logdir, now)

In [33]:
n_epochs = 1000
learning_rate = 0.01

X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

In [34]:
mse_summary = tf.summary.scalar('MSE', mse)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

In [35]:
n_epochs = 10
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

In [36]:
with tf.Session() as sess:                                                        
    sess.run(init)                                                                

    for epoch in range(n_epochs):                                                 
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            if batch_index % 10 == 0:
                summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
                step = epoch * n_batches + batch_index
                file_writer.add_summary(summary_str, step)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

    best_theta = theta.eval()                                               

In [37]:
file_writer.close()

In [38]:
best_theta

array([[ 2.070016  ],
       [ 0.8204561 ],
       [ 0.1173173 ],
       [-0.22739051],
       [ 0.3113402 ],
       [ 0.00353193],
       [-0.01126994],
       [-0.91643935],
       [-0.8795008 ]], dtype=float32)