In [1]:
import tensorflow as tf

In [2]:
tf.__version__

'1.10.1'

### Creating a compuation graph and running it in a Session

In [3]:
# Creates a computation graph
x = tf.Variable(3, name='x')
y = tf.Variable(4, name='y')
f = x*x*y + y + 2

In [4]:
type(x)

tensorflow.python.ops.variables.Variable

In [5]:
type(f)

tensorflow.python.framework.ops.Tensor

This code does not actually perform any computation. It **just creates a computation graph**. In fact, even the variables are not initialized yet.

To evaluate this graph, u need to open a **TF `session`** and use it initialize the variables and evaluate `f`. A TF `session` takes care of placing the operations onto devices such as CPUs and GPUs and running them, and it holds all the variable values.

+ way 1

In [6]:
sess = tf.Session()   # open a TF session
# initialization
sess.run(x.initializer) 
sess.run(y.initializer)

result = sess.run(f)   # evaluate f

In [7]:
result

42

> Having to repeat `sess.run()` all the time is a bit cumbersome(麻烦).

In [8]:
sess.close()   # close the session (which frees up resources)

+ way 2

In [9]:
# The session is automatically closed at the end of the block
with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()

In [10]:
result

42

Calling `x.initializer.run()` is equivalent to calling `tf.get_default_session().run(x.initializer)`.  
Calling `f.eval()` is equivalent to calling `tf.get_default_session.run(f)`

+ way 3

Use the `global_variables_initializer()` function to avoid manually running the initializer for every single variable.

In [12]:
init = tf.global_variables_initializer()   # prepare an init node

with tf.Session() as sess:
    init.run()   # actually initialize all the variables
    result = f.eval()

result

42

+ way 4

Inside Jupyter or within a Python shell u may prefer to create an `InteractiveSession`, which automatically sets itself as the default session, so u don't need a `with` block (but do need to close the session manually).

In [13]:
init = tf.global_variables_initializer()   # prepare an init node

sess = tf.InteractiveSession()
init.run()
result = f.eval()
result

42

In [14]:
sess.close()

**A TF program is typically split into two parts:**
1. construction phase----typically builds a computation graph representing the ML model and the computations required to train it.
2. execution phase----generally runs a loop that evaluates a training step repeatedly (e.g., one step per mini-batch), gradually improving the model params.

### Managing Graphs

In [15]:
tf.get_default_graph()

<tensorflow.python.framework.ops.Graph at 0x7f08fee06710>

In [16]:
x.graph

<tensorflow.python.framework.ops.Graph at 0x7f08fee06710>

In [17]:
y.graph

<tensorflow.python.framework.ops.Graph at 0x7f08fee06710>

In [18]:
f.graph

<tensorflow.python.framework.ops.Graph at 0x7f08fee06710>

**Any node u create is automatically added to the default graph:**

In [22]:
x1 = tf.Variable(1, name='x1')

In [23]:
x1.graph

<tensorflow.python.framework.ops.Graph at 0x7f08fee06710>

In [24]:
x1.graph is tf.get_default_graph()

True

Creating a new **Graph** and temporarily making it the default graph inside a `with` block. Thus, u can manage multiple independent graphs.

In [25]:
graph = tf.Graph()
with graph.as_default():
    x2 = tf.Variable(2, name='x2')

In [26]:
x2.graph

<tensorflow.python.framework.ops.Graph at 0x7f08fedd4908>

In [27]:
x2.graph is graph

True

In [28]:
x2.graph is tf.get_default_graph()

False

### Lifecycle of a Node Value

When u evaluate a node, TF automatically determines the set of nodes that it depends on and it evaluates these nodes first.

In [29]:
# Define a simple graph
w = tf.constant(3)
x = w + 2
y = x + 5
z = x * 3

In [30]:
with tf.Session() as sess:
    print(y.eval())
    print(z.eval())

10
15


> 计算`y`时，TF自动探测到y取决于x，x取决于w，所以先计算w，再x，再y，然后返回y的值。然后，计算z，类似于计算y。  
**前面的代码会两次计算`w`和`x`。**

All node values are dropped between graph runs, except variable values, which are maintained by the session across graphs runs. A variable starts its life when its initializer is run, and it ends when the session is closed.

In [33]:
with tf.Session() as sess:
    y_val, z_val = sess.run([y, z])
    print(y_val)
    print(z_val)

10
15


### Linear Regression with TF

In [42]:
import numpy as np
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
m, n = housing.data.shape
# add an extra bias input features (x0=1) to all training instances
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

tf.reset_default_graph()

X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name='X')
# y.shape -> (20640, 1)
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name='y')
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

with tf.Session() as sess:
    theta_value = theta.eval()

Downloading Cal. housing from https://ndownloader.figshare.com/files/5976036 to /home/bingli/scikit_learn_data


In [43]:
theta_value

array([[-3.7185181e+01],
       [ 4.3633747e-01],
       [ 9.3952334e-03],
       [-1.0711310e-01],
       [ 6.4479220e-01],
       [-4.0338000e-06],
       [-3.7813708e-03],
       [-4.2348403e-01],
       [-4.3721911e-01]], dtype=float32)

In [52]:
from sklearn.linear_model import LinearRegression

linear_reg = LinearRegression()
linear_reg.fit(housing.data, housing.target.reshape(-1, 1))

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [53]:
linear_reg.coef_

array([[ 4.36693293e-01,  9.43577803e-03, -1.07322041e-01,
         6.45065694e-01, -3.97638942e-06, -3.78654265e-03,
        -4.21314378e-01, -4.34513755e-01]])

In [55]:
linear_reg.coef_.shape

(1, 8)

In [54]:
linear_reg.intercept_

array([-36.94192021])

In [57]:
np.r_[linear_reg.intercept_.reshape(-1, 1),
      linear_reg.coef_.reshape(-1, 1)]

array([[-3.69419202e+01],
       [ 4.36693293e-01],
       [ 9.43577803e-03],
       [-1.07322041e-01],
       [ 6.45065694e-01],
       [-3.97638942e-06],
       [-3.78654265e-03],
       [-4.21314378e-01],
       [-4.34513755e-01]])

### Implementing Gradient Descent

**try batch gradient descent:**
1. do this by manually computing the gradients
2. use TF's autodiff feature to let TF compute the gradients automatically
3. use a couple of TF's out-of-the-box optimizers

Gradient Descent requires scaling the feature vectors first, or else training may be much slower.

In [58]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
housing_data_scaled = scaler.fit_transform(housing.data)
housing_data_scaled_plus_bias = np.c_[np.ones((m, 1)), housing_data_scaled]

In [67]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(housing_data_scaled_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name='y')
theta = tf.Variable(tf.random_uniform((n+1, 1), -1.0, 1.0), name='theta')
y_pred = tf.matmul(X, theta, name='prediction')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')
gradients = 2 / m * tf.matmul(tf.transpose(X), error)
training_op = tf.assign(theta, theta - gradients * learning_rate)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print('Epoch', epoch, 'MSE=', mse.eval())
        sess.run(training_op)
    best_theta = theta.eval()

Epoch 0 MSE= 11.24972
Epoch 100 MSE= 0.86877763
Epoch 200 MSE= 0.65710276
Epoch 300 MSE= 0.6158784
Epoch 400 MSE= 0.5901362
Epoch 500 MSE= 0.5717854
Epoch 600 MSE= 0.55856186
Epoch 700 MSE= 0.54902375
Epoch 800 MSE= 0.5421429
Epoch 900 MSE= 0.53717923


In [68]:
best_theta

array([[ 2.0685523 ],
       [ 0.7677974 ],
       [ 0.14515601],
       [-0.07765672],
       [ 0.11908948],
       [ 0.00590412],
       [-0.04024149],
       [-0.7654486 ],
       [-0.7254704 ]], dtype=float32)

### Using autodiff

**TF's autodiff** feature automatically and effectively compute the gradients.

`tf.gradients()`function takes an op and a list of variables, and it creates a list of ops (one per variable) to compute the gradients of the op with regard to each variable.

In [69]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(housing_data_scaled_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name='y')
theta = tf.Variable(tf.random_uniform((n+1, 1), -1.0, 1.0, seed=42), name='theta')
y_pred = tf.matmul(X, theta, name='prediction')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')

In [70]:
gradients = tf.gradients(mse, [theta])[0]

In [82]:
tf.gradients(mse, [theta])

[<tf.Tensor 'gradients_2/prediction_grad/MatMul_1:0' shape=(9, 1) dtype=float32>]

In [75]:
gradients.shape

TensorShape([Dimension(9), Dimension(1)])

In [74]:
training_op = tf.assign(theta, theta - gradients * learning_rate)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print('Epoch', epoch, 'MSE=', mse.eval())
        sess.run(training_op)
    best_theta = theta.eval()

print('Best theta:')
print(best_theta)

Epoch 0 MSE= 2.7544262
Epoch 100 MSE= 0.6322219
Epoch 200 MSE= 0.5727803
Epoch 300 MSE= 0.5585008
Epoch 400 MSE= 0.54907
Epoch 500 MSE= 0.54228795
Epoch 600 MSE= 0.5373791
Epoch 700 MSE= 0.53382194
Epoch 800 MSE= 0.5312425
Epoch 900 MSE= 0.5293705
Best theta:
[[ 2.06855226e+00]
 [ 7.74078071e-01]
 [ 1.31192386e-01]
 [-1.17845066e-01]
 [ 1.64778143e-01]
 [ 7.44077959e-04]
 [-3.91945131e-02]
 [-8.61356676e-01]
 [-8.23479772e-01]]


### Using an optimizer

In [83]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(housing_data_scaled_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name='y')
theta = tf.Variable(tf.random_uniform((n + 1, 1), -1., 1., seed=42), name='theta')
y_pred = tf.matmul(X, theta, name='prediction')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')

In [84]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

In [87]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print('Epoch', epoch, 'MSE=', mse.eval())
        sess.run(training_op)
    best_theta = theta.eval()

print('Best theta:')
print(best_theta)

Epoch 0 MSE= 2.7544262
Epoch 100 MSE= 0.63222194
Epoch 200 MSE= 0.5727803
Epoch 300 MSE= 0.5585008
Epoch 400 MSE= 0.54907
Epoch 500 MSE= 0.54228795
Epoch 600 MSE= 0.5373791
Epoch 700 MSE= 0.53382194
Epoch 800 MSE= 0.5312425
Epoch 900 MSE= 0.5293705
Best theta:
[[ 2.06855226e+00]
 [ 7.74078071e-01]
 [ 1.31192386e-01]
 [-1.17845066e-01]
 [ 1.64778143e-01]
 [ 7.44078017e-04]
 [-3.91945131e-02]
 [-8.61356676e-01]
 [-8.23479772e-01]]


### Using a momentum optimizer

The momentum optimizer often converges much faster than Gradient Descent.

In [88]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(housing_data_scaled_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name='y')
theta = tf.Variable(tf.random_uniform((n + 1, 1), -1., 1., seed=42), name='theta')
y_pred = tf.matmul(X, theta, name='prediction')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')

In [89]:
optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate,
                                       momentum=0.9)
training_op = optimizer.minimize(mse)

In [90]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print('Epoch', epoch, 'MSE=', mse.eval())
        sess.run(training_op)
    best_theta = theta.eval()

print('Best theta:')
print(best_theta)

Epoch 0 MSE= 2.7544262
Epoch 100 MSE= 0.5273161
Epoch 200 MSE= 0.5244147
Epoch 300 MSE= 0.5243281
Epoch 400 MSE= 0.5243218
Epoch 500 MSE= 0.52432114
Epoch 600 MSE= 0.524321
Epoch 700 MSE= 0.524321
Epoch 800 MSE= 0.524321
Epoch 900 MSE= 0.52432096
Best theta:
[[ 2.068558  ]
 [ 0.82961667]
 [ 0.11875114]
 [-0.265522  ]
 [ 0.30569217]
 [-0.00450316]
 [-0.03932617]
 [-0.89989185]
 [-0.87054676]]


### Feeding data to the training algorithm

#### Placeholder nodes

**Placeholder nodes** are special because they don't perform any computation, they just output the data u tell them to output at runtime. They are typically used to pass the training data to TF during training. If u dont't specify a value at runtime for a placeholder, u get an exception.

In [91]:
A = tf.placeholder(tf.float32, shape=(None, 3))
B = A + 5

In [92]:
type(A)

tensorflow.python.framework.ops.Tensor

In [96]:
with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict={A: [[1, 2, 3]]})
    B_val_2 = B.eval(feed_dict={A: [[1, 2, 3], [7, 8, 9]]})

In [97]:
B_val_1

array([[6., 7., 8.]], dtype=float32)

In [98]:
B_val_2

array([[ 6.,  7.,  8.],
       [12., 13., 14.]], dtype=float32)

#### Mini-Batch Gradient Descent

In [99]:
n_epochs = 1000
learning_rate = 0.01

In [100]:
tf.reset_default_graph()

X = tf.placeholder(tf.float32, shape=(None, n+1), name='X')
y = tf.placeholder(tf.float32, shape=(None, 1), name='y')

In [105]:
theta = tf.Variable(tf.random_uniform((n+1, 1), -1., 1., seed=42), name='theta')
y_pred = tf.matmul(X, theta, name='prediction')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')

optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

In [103]:
n_epochs = 10

In [104]:
# Define the batch size and compute the total number of batches:
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

In the execution phase, fetch the mini-batches one by one, then provide the value of `X` and `y` via the feed_dict param when evaluating a node that depends on either of them.

In [106]:
def fetch_batch(epoch, batch_index, batch_size):
    np.random.seed(epoch * n_batches + batch_index)
    indices = np.random.randint(m, size=batch_size)
    X_batch = housing_data_scaled_plus_bias[indices]
    y_batch = housing.target.reshape(-1, 1)[indices]
    return X_batch, y_batch

with tf.Session() as sess:
    sess.run(init)
    
    for epcoh in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
    best_theta = theta.eval()

In [107]:
best_theta

array([[ 1.2565614e+00],
       [-4.5977593e-03],
       [-2.9244909e-01],
       [ 1.5032510e-01],
       [-5.1998848e-01],
       [ 3.0148393e-01],
       [-6.9579689e+01],
       [-1.5456535e+00],
       [-5.6844789e-01]], dtype=float32)

### Saving and restoring models

+ save

Just creates a `Saver` node at the end of the construction phase (after all variables nodes are created); then, in the execution phase, just call its `save()` method whenever u want to save the model, passing it the session and path of the checkpoint file:

In [110]:
n_epochs = 1000
learning_rate = 0.01

tf.reset_default_graph()

X = tf.constant(housing_data_scaled_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name='y')

theta = tf.Variable(tf.random_uniform((n+1, 1), -1., 1., seed=42), name='theta')
y_pred = tf.matmul(X, theta, name='prediction')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')

optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

# Create a Saver
saver = tf.train.Saver()

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            # checkpoint every 100 epochs
            save_path = saver.save(sess, '/home/bingli/Documents/my_model.ckpt')
        sess.run(training_op)
    best_theta = theta.eval()
    save_path = saver.save(sess, '/home/bingli/Documents/my_model_final.ckpt')

+ restore a model

Create a `Saver` at the end of construction phase just like before, but then at the begining of the execution phase, instead of initializing the variables using the `init` node, u call the `restore()` method of the `Saver` object:

In [112]:
with tf.Session() as sess:
    saver.restore(sess, '/home/bingli/Documents/my_model_final.ckpt')
    best_theta_restored = theta.eval()

INFO:tensorflow:Restoring parameters from /home/bingli/Documents/my_model_final.ckpt


In [113]:
np.allclose(best_theta, best_theta_restored)

True

By default, a `Saver` saves and restores all variables under their own name, but if u need more control, u can specify which variables to save or restore, and what names to use.

In [114]:
# save or restore only the theta variable under the name weights
saver = tf.train.Saver({'weights': theta})

By default, the Saver also saves the graph structure itself in a second file with the extension .meta. U can use the func `tf.train.import_meta_graph()` to restore the graph structure. This func loads the graph into the default graph and returns a Saver that can then be used to restore the graph state (i.e., the variabel values):

In [115]:
tf.reset_default_graph()

# Load the graph structure
saver = tf.train.import_meta_graph('/home/bingli/Documents/my_model_final.ckpt.meta')
theta = tf.get_default_graph().get_tensor_by_name('theta:0')

with tf.Session() as sess:
    # restores the graph's state
    saver.restore(sess, '/home/bingli/Documents/my_model_final.ckpt')
    best_theta_restored = theta.eval()

INFO:tensorflow:Restoring parameters from /home/bingli/Documents/my_model_final.ckpt


In [116]:
np.allclose(best_theta, best_theta_restored)

True