In [0]:
import numpy as np
import tensorflow as tf
tf.reset_default_graph()

#Creating Your First Graph and Running It in a Session

In [0]:
x = tf.Variable(3, name='x')
y = tf.Variable(4, name='y')
f = x*x*y + y + 2

This code does not actually perform any computation; it just creates a computation graph. The variables are not even initialized yet.

To evaluate this graph, you need to open a TensorFlow session and use it to initialize the variables and evaluate f.

In [5]:
# create a session
sess = tf.Session()

# initialize all variables
sess.run(x.initializer)
sess.run(y.initializer)

# evaluate f
result = sess.run(f)
print(result)

42


In [0]:
# close the session to free up resources
sess.close()

Having to repeat sess.run() all the time is a bit cumbersome, but fortunately there is a better way:

In [0]:
with tf.Session() as sess:
  x.initializer.run()
  y.initializer.run()
  result = f.eval()

In [8]:
result

42

Inside the **with** block, the session is set as the default session.

Calling **x.initializer.run()** is equivalent to calling **tf.get_default_session().run(x.initializer)**, and similarly **f.eval()** is equivalent to calling **tf.get_default_session().run(f)**. This makes the code easier to read. Moreover, the session is automatically closed at the end of the block.
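To make the "default session" idea concrete, here is a toy pure-Python sketch. It is not TensorFlow's implementation, and **ToySession**, **current_default()** and **eval_node()** are hypothetical names. Entering the **with** block pushes the session onto a stack of defaults. The eval-style helper runs on whichever session is on top. Leaving the block pops the session off the stack and closes it.

```python
# Toy model of a "default session"; this is not TensorFlow code.
_default_stack = []


class ToySession:
    def __init__(self):
        self.closed = False

    def run(self, fn):
        # "Running" a node here just means calling a zero-argument function.
        if self.closed:
            raise RuntimeError("Attempted to use a closed session")
        return fn()

    def close(self):
        self.closed = True

    def __enter__(self):
        _default_stack.append(self)  # becomes the default session
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        _default_stack.pop()         # restore the previous default (if any)
        self.close()                 # closed automatically at the end of the block
        return False                 # do not swallow exceptions


def current_default():
    return _default_stack[-1] if _default_stack else None


def eval_node(fn):
    # Analogue of f.eval(): run on the current default session.
    sess = current_default()
    if sess is None:
        raise RuntimeError("No default session is registered")
    return sess.run(fn)


with ToySession() as sess:
    result = eval_node(lambda: 3 * 3 * 4 + 4 + 2)  # x*x*y + y + 2 with x=3, y=4

print(result)             # 42
print(current_default())  # None
print(sess.closed)        # True
```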

Instead of manually running the initializer for every single variable, you can use the **global_variables_initializer()** function.

In [9]:
init = tf.global_variables_initializer() # prepare an init node
with tf.Session() as sess:
  init.run() # actually initialize all the variables
  result = f.eval()

result

42

When an **InteractiveSession** is created, it automatically sets itself as the default session, so you don’t need a **with** block (but you do need to close the session manually when you are done with it):

In [11]:
init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
init.run()
result = f.eval()
print(result)

42




In [0]:
sess.close()

In [15]:
result

42

A TensorFlow program is typically split into two parts: the first part builds a computation graph (this is called the construction phase), and the second part runs it (this is the execution phase).

#Managing Graphs
Any node you create is automatically added to the default graph:

In [17]:
tf.reset_default_graph()
x1 = tf.Variable(1)
x1.graph is tf.get_default_graph()

True

You can manage multiple independent graphs by creating a new **Graph** and temporarily making it the default graph inside a **with** block:

In [19]:
graph = tf.Graph()
with graph.as_default():
  x2 = tf.Variable(2)

x2.graph is graph

True

In [20]:
x2.graph is tf.get_default_graph()

False

It is common to run the same commands more than once while you are experimenting. As a result, you may end up with a default graph containing many duplicate nodes. One solution is to restart the Jupyter kernel (or the Python shell), but a more convenient solution is to just reset the default graph by running **tf.reset_default_graph()**.

#Lifecycle of a Node Value
When you evaluate a node, TensorFlow automatically determines the set of nodes that it depends on and it evaluates these nodes first.

In [21]:
w = tf.constant(3)
x = w + 2
y = x + 5
z = x * 3

with tf.Session() as sess:
  print(y.eval()) # 10
  print(z.eval()) # 15

10
15


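The preceding code evaluates w and x twice: once for y and once for z. TensorFlow does not reuse the result of the previous evaluation. All node values are dropped between graph runs, except variable values, which the session maintains across graph runs.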

A variable starts its life when its initializer is run, and it ends when the session is closed.

To evaluate y and z efficiently, without evaluating w and x twice, ask TensorFlow to evaluate both y and z in just one graph run:

In [22]:
with tf.Session() as sess:
  y_val, z_val = sess.run([y, z])
  print(y_val) # 10
  print(z_val) # 15

10
15
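The difference between two runs and one combined run can be mimicked with a toy lazily evaluated graph. In the pure-Python sketch below, **Node** and **run()** are hypothetical names and not TensorFlow's implementation. Building the nodes computes nothing. Each **run()** evaluates every node it needs at most once and keeps nothing afterwards (variables, which TensorFlow does keep across runs, are not modelled). A counter records how often each node is evaluated.

```python
class Node:
    """Toy graph node: records an operation and its inputs; computes nothing when created."""

    def __init__(self, op, *inputs, name):
        self.op = op
        self.inputs = inputs
        self.name = name


eval_counts = {}


def run(fetches):
    """Evaluate a list of nodes in ONE run; values are cached only for this run."""
    cache = {}

    def evaluate(node):
        if node in cache:
            return cache[node]
        args = [evaluate(inp) for inp in node.inputs]
        eval_counts[node.name] = eval_counts.get(node.name, 0) + 1
        cache[node] = node.op(*args)
        return cache[node]

    return [evaluate(node) for node in fetches]


# Construction phase: nothing is computed yet.
w = Node(lambda: 3, name="w")
x = Node(lambda a: a + 2, w, name="x")
y = Node(lambda a: a + 5, x, name="y")
z = Node(lambda a: a * 3, x, name="z")

# Two separate runs: w and x are evaluated twice.
run([y])
run([z])
separate_runs = dict(eval_counts)
print(separate_runs)  # {'w': 2, 'x': 2, 'y': 1, 'z': 1}

# One run fetching both: w and x are evaluated once.
eval_counts.clear()
y_val, z_val = run([y, z])
print(y_val, z_val)   # 10 15
print(eval_counts)    # {'w': 1, 'x': 1, 'y': 1, 'z': 1}
```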


#Linear Regression with TensorFlow
##Using the Normal Equation

In [0]:
from sklearn.datasets import fetch_california_housing

tf.reset_default_graph()

housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

with tf.Session() as sess:
    theta_value = theta.eval()

In [28]:
theta_value

array([[-3.7112991e+01],
       [ 4.3611991e-01],
       [ 9.4082914e-03],
       [-1.0654381e-01],
       [ 6.4201808e-01],
       [-4.0360574e-06],
       [-3.7822633e-03],
       [-4.2303962e-01],
       [-4.3648642e-01]], dtype=float32)

Compare with pure NumPy:

In [29]:
X = housing_data_plus_bias
y = housing.target.reshape(-1, 1)
theta_numpy = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

print(theta_numpy)

[[-3.69419202e+01]
 [ 4.36693293e-01]
 [ 9.43577803e-03]
 [-1.07322041e-01]
 [ 6.45065694e-01]
 [-3.97638942e-06]
 [-3.78654265e-03]
 [-4.21314378e-01]
 [-4.34513755e-01]]


Compare with Scikit-Learn:

In [30]:
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(housing.data, housing.target.reshape(-1, 1))

print(np.r_[lin_reg.intercept_.reshape(-1, 1), lin_reg.coef_.T])

[[-3.69419202e+01]
 [ 4.36693293e-01]
 [ 9.43577803e-03]
 [-1.07322041e-01]
 [ 6.45065694e-01]
 [-3.97638942e-06]
 [-3.78654265e-03]
 [-4.21314378e-01]
 [-4.34513755e-01]]


The main benefit of this code versus computing the Normal Equation directly using NumPy is that TensorFlow will automatically run this on your GPU card if you have one.
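The Normal Equation is $\hat{\theta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$. The sketch below uses synthetic data; all values are illustrative and do not come from the housing dataset. It checks the explicit-inverse formula against NumPy's least-squares solver. It then repeats the computation in float32, the dtype used in the TensorFlow cell above, which shifts the result slightly. That lower precision, combined with the poorly conditioned $\mathbf{X}^T\mathbf{X}$ of the unscaled housing features, is a likely reason (not verified here) why the TensorFlow intercept (about -37.11) differs from the NumPy and Scikit-Learn one (about -36.94).

```python
import numpy as np

rng = np.random.default_rng(42)
m, n = 1000, 3
X_raw = rng.normal(size=(m, n))
true_theta = np.array([[4.0], [1.5], [-2.0], [0.5]])  # bias first, then n weights
X_b = np.c_[np.ones((m, 1)), X_raw]                   # prepend the bias column
y = X_b @ true_theta + 0.1 * rng.normal(size=(m, 1))

# Normal Equation: theta = (X^T X)^{-1} X^T y
theta_normal = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

# A numerically safer equivalent: least squares without an explicit inverse.
theta_lstsq, *_ = np.linalg.lstsq(X_b, y, rcond=None)

# The same formula in float32 (the dtype used by the TensorFlow cell) drifts slightly.
X32, y32 = X_b.astype(np.float32), y.astype(np.float32)
theta_32 = np.linalg.inv(X32.T @ X32) @ X32.T @ y32

print(np.allclose(theta_normal, theta_lstsq))
print(np.abs(theta_32 - theta_normal).max())
```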

#Implementing Gradient Descent
When using Gradient Descent, remember that it is important to first normalize the input feature vectors, or else training may be much slower. You can do this using TensorFlow, NumPy, Scikit-Learn’s StandardScaler, or any other solution you prefer.

In [0]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]

In [33]:
print(scaled_housing_data_plus_bias.mean(axis=0))

[ 1.00000000e+00  6.60969987e-17  5.50808322e-18  6.60969987e-17
 -1.06030602e-16 -1.10161664e-17  3.44255201e-18 -1.07958431e-15
 -8.52651283e-15]


In [35]:
print(scaled_housing_data_plus_bias.mean(axis=1))

[ 0.38915536  0.36424355  0.5116157  ... -0.06612179 -0.06360587
  0.01359031]


In [36]:
print(scaled_housing_data_plus_bias.mean())

0.11111111111111005


In [37]:
print(scaled_housing_data_plus_bias.shape)

(20640, 9)
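StandardScaler standardizes each column as $(x - \mu)/\sigma$, using the column mean and the population standard deviation (ddof=0). The NumPy sketch below does the same by hand on illustrative random data.

The sketch also explains the last two checks in the cells above. Only the column means (axis=0) are meaningful here. The overall mean of 0.111... is simply 1/9: the bias column of ones is averaged with eight columns whose means are essentially 0.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two illustrative features on very different scales.
data = np.c_[rng.normal(5.0, 2.0, size=500), rng.normal(3000.0, 900.0, size=500)]

mu = data.mean(axis=0)
sigma = data.std(axis=0)             # population std (ddof=0), like StandardScaler
scaled = (data - mu) / sigma
scaled_plus_bias = np.c_[np.ones((len(data), 1)), scaled]

print(scaled.mean(axis=0))           # both close to 0 (up to rounding error)
print(scaled.std(axis=0))            # both close to 1
print(scaled_plus_bias.mean())       # close to 1/3: one column of ones, two zero-mean columns
```

If Scikit-Learn is installed, `StandardScaler().fit_transform(data)` should return the same array as `scaled`.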


##Manually Computing the Gradients
* The **random_uniform()** function creates a node in the graph that will generate a tensor containing random values, given its shape and value range.
* The **assign()** function creates a node that will assign a new value to a variable. In this case, it implements the Batch Gradient Descent step $\theta^{(\text{next step})} = \theta - \eta \nabla_{\theta}\,\mathrm{MSE}(\theta)$. Here $\eta$ is the learning rate, and $\nabla_{\theta}\,\mathrm{MSE}(\theta) = \frac{2}{m}\mathbf{X}^T(\mathbf{X}\theta - \mathbf{y})$ is what the **gradients** node computes.
* The main loop executes the training step over and over again (**n_epochs** times), and every 100 iterations it prints out the current Mean Squared Error (**mse**). You should see the MSE go down at every iteration.

In [38]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name='y')
theta = tf.Variable(tf.random_uniform([n+1, 1], -1.0, 1.0, seed=42), name='theta')
# n is the number of features (from m, n = housing.data.shape); the extra 1 is for the bias term
y_pred = tf.matmul(X, theta, name='predictions')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')
gradients = 2/m * tf.matmul(tf.transpose(X), error)
training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
  sess.run(init)
  
  for epoch in range(n_epochs):
    if epoch % 100 == 0:
      print('Epoch', epoch, 'MSE=', mse.eval())
    sess.run(training_op)
    
  best_theta = theta.eval()

Epoch 0 MSE= 2.7544262
Epoch 100 MSE= 0.632222
Epoch 200 MSE= 0.5727805
Epoch 300 MSE= 0.55850065
Epoch 400 MSE= 0.54907
Epoch 500 MSE= 0.54228795
Epoch 600 MSE= 0.53737885
Epoch 700 MSE= 0.5338219
Epoch 800 MSE= 0.53124243
Epoch 900 MSE= 0.5293705


In [39]:
best_theta

array([[ 2.0685523e+00],
       [ 7.7407807e-01],
       [ 1.3119237e-01],
       [-1.1784508e-01],
       [ 1.6477816e-01],
       [ 7.4407709e-04],
       [-3.9194513e-02],
       [-8.6135662e-01],
       [-8.2347977e-01]], dtype=float32)
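For comparison, here is the same Batch Gradient Descent loop written directly in NumPy. The synthetic data, seed, and true coefficients are illustrative assumptions. Each iteration computes $\nabla_{\theta}\,\mathrm{MSE}(\theta) = \frac{2}{m}\mathbf{X}^T(\mathbf{X}\theta - \mathbf{y})$, like the **gradients** node, and then applies the update that **training_op** performs.

```python
import numpy as np

rng = np.random.default_rng(42)
m, n = 500, 3
X_b = np.c_[np.ones((m, 1)), rng.normal(size=(m, n))]  # bias column + zero-mean features
true_theta = np.array([[2.0], [0.8], [-0.5], [0.3]])
y = X_b @ true_theta + 0.1 * rng.normal(size=(m, 1))

eta, n_epochs = 0.01, 1000
theta = rng.uniform(-1.0, 1.0, size=(n + 1, 1))        # random init in [-1, 1)

for epoch in range(n_epochs):
    error = X_b @ theta - y
    gradients = 2 / m * X_b.T @ error                   # gradient of the MSE
    theta = theta - eta * gradients                     # the Batch GD step

mse = np.mean((X_b @ theta - y) ** 2)
theta_best = np.linalg.lstsq(X_b, y, rcond=None)[0]     # closed-form optimum for comparison
print(mse)
print(np.abs(theta - theta_best).max())                 # should be very small
```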

##Using autodiff
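The code above works fine, but it requires deriving the gradients of the cost function (MSE) by hand. That is reasonably easy for Linear Regression, but it quickly becomes tedious and error-prone for more complex models such as deep neural networks. You could use *symbolic differentiation* to find the equations for the partial derivatives automatically, but the resulting code would not necessarily be very efficient.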

TensorFlow’s autodiff feature comes to the rescue: it can automatically and efficiently compute the gradients for you.

In [0]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

In [0]:
gradients = tf.gradients(mse, [theta])[0]

The **gradients()** function takes an op (in this case **mse**) and a list of variables (in this case just **theta**). It creates a list of ops, one per variable, that compute the gradients of the op with regard to each variable. So the **gradients** node will compute the gradient vector of the MSE with regard to **theta**.

In [43]:
training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)
    
    best_theta = theta.eval()

print("Best theta:")
print(best_theta)

Epoch 0 MSE = 2.7544262
Epoch 100 MSE = 0.632222
Epoch 200 MSE = 0.57278043
Epoch 300 MSE = 0.5585007
Epoch 400 MSE = 0.54907
Epoch 500 MSE = 0.5422879
Epoch 600 MSE = 0.53737885
Epoch 700 MSE = 0.53382194
Epoch 800 MSE = 0.5312425
Epoch 900 MSE = 0.5293704
Best theta:
[[ 2.0685525e+00]
 [ 7.7407801e-01]
 [ 1.3119237e-01]
 [-1.1784505e-01]
 [ 1.6477814e-01]
 [ 7.4407551e-04]
 [-3.9194509e-02]
 [-8.6135679e-01]
 [-8.2347989e-01]]
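The nearly identical training runs suggest that **tf.gradients()** and the hand-derived formula agree. A quick way to check the formula itself is to compare it with central finite differences. The NumPy sketch below does this on random data. **mse**, **analytic_grad** and **numeric_grad** are hypothetical helper names.

```python
import numpy as np


def mse(theta, X, y):
    return np.mean((X @ theta - y) ** 2)


def analytic_grad(theta, X, y):
    # The hand-derived formula used in the manual-gradient cell: (2/m) X^T (X theta - y)
    return 2 / len(X) * X.T @ (X @ theta - y)


def numeric_grad(theta, X, y, h=1e-6):
    """Central finite differences, one component of theta at a time."""
    grad = np.zeros_like(theta)
    for j in range(theta.shape[0]):
        step = np.zeros_like(theta)
        step[j, 0] = h
        grad[j, 0] = (mse(theta + step, X, y) - mse(theta - step, X, y)) / (2 * h)
    return grad


rng = np.random.default_rng(1)
X = np.c_[np.ones((200, 1)), rng.normal(size=(200, 3))]
y = rng.normal(size=(200, 1))
theta = rng.uniform(-1.0, 1.0, size=(4, 1))

max_diff = np.max(np.abs(analytic_grad(theta, X, y) - numeric_grad(theta, X, y)))
print(max_diff)  # tiny: only rounding error, since the MSE is quadratic in theta
```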


How could you find the partial derivatives of the following function with regard to a and b?

In [0]:
def my_func(a, b):
  z = 0
  for i in range(100):
    z = a * np.cos(z + i) + z * np.sin(b - i)
  return z

In [45]:
my_func(0.2, 0.3)

-0.21253923284754914

In [0]:
tf.reset_default_graph()

a = tf.Variable(0.2, name='a')
b = tf.Variable(0.3, name='b')
z = tf.constant(0.0, name='z0')

for i in range(100):
  z = a * tf.cos(z + i) + z * tf.sin(b - i)
  
grads = tf.gradients(z, [a, b])
init = tf.global_variables_initializer()

Let's compute the function at $a=0.2$ and $b=0.3$, and the partial derivatives at that point with regard to $a$ and to $b$:

In [48]:
with tf.Session() as sess:
  init.run()
  print(z.eval())
  print(sess.run(grads))

-0.21253741
[-1.1388494, 0.19671395]
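As a sanity check that does not use TensorFlow, you can estimate the same partial derivatives with central finite differences, $\partial f/\partial a \approx \frac{f(a+h, b) - f(a-h, b)}{2h}$. This needs two extra function evaluations per input, so it scales poorly to models with many parameters. It is also only approximate, which is why autodiff is preferred.

In the pure-Python sketch below, **partials()** is a hypothetical helper, and **math.cos**/**math.sin** replace the NumPy versions. The estimates should land close to the values TensorFlow printed above. Small differences are expected because TensorFlow computed in float32.

```python
import math


def my_func(a, b):
    z = 0.0
    for i in range(100):
        z = a * math.cos(z + i) + z * math.sin(b - i)
    return z


def partials(f, a, b, h=1e-6):
    """Central finite-difference estimates of df/da and df/db at (a, b)."""
    df_da = (f(a + h, b) - f(a - h, b)) / (2 * h)
    df_db = (f(a, b + h) - f(a, b - h)) / (2 * h)
    return df_da, df_db


da, db = partials(my_func, 0.2, 0.3)
print(my_func(0.2, 0.3))
print(da, db)
```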


##Using an Optimizer
TensorFlow also provides a number of optimizers out of the box, including a Gradient Descent optimizer.

###Using a GradientDescentOptimizer

In [0]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

In [0]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

In [51]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
  sess.run(init)
  
  for epoch in range(n_epochs):
    if epoch % 100 == 0:
      print('Epoch', epoch, 'MSE=', mse.eval())
    sess.run(training_op)
    
  best_theta = theta.eval()
  
print('Best theta:')
print(best_theta)

Epoch 0 MSE= 2.7544262
Epoch 100 MSE= 0.632222
Epoch 200 MSE= 0.57278043
Epoch 300 MSE= 0.5585007
Epoch 400 MSE= 0.54907
Epoch 500 MSE= 0.54228795
Epoch 600 MSE= 0.53737885
Epoch 700 MSE= 0.53382194
Epoch 800 MSE= 0.53124255
Epoch 900 MSE= 0.5293704
Best theta:
[[ 2.0685525e+00]
 [ 7.7407807e-01]
 [ 1.3119237e-01]
 [-1.1784508e-01]
 [ 1.6477816e-01]
 [ 7.4407790e-04]
 [-3.9194509e-02]
 [-8.6135662e-01]
 [-8.2347977e-01]]


###Using a momentum optimizer

In [0]:
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

In [0]:
optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.9)

In [0]:
training_op = optimizer.minimize(mse)
init = tf.global_variables_initializer()

In [55]:
with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        sess.run(training_op)
    
    best_theta = theta.eval()

print("Best theta:")
print(best_theta)

Best theta:
[[ 2.068558  ]
 [ 0.82961667]
 [ 0.11875115]
 [-0.26552212]
 [ 0.30569226]
 [-0.00450316]
 [-0.03932616]
 [-0.89989173]
 [-0.8705467 ]]
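After the same 1,000 epochs, the momentum run ends at noticeably different coefficients than plain Gradient Descent. One plausible reason (not verified here) is that momentum makes faster progress along directions where the cost function is shallow, such as those created by correlated features.

The NumPy sketch below illustrates this on synthetic data with two strongly correlated features. It uses the update rule documented for **tf.train.MomentumOptimizer**: accumulation = momentum * accumulation + gradient, then variable -= learning_rate * accumulation. With momentum=0.0 it reduces to plain Gradient Descent. The data and the **train()** helper are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 1000
x1 = rng.normal(size=m)
x2 = x1 + 0.1 * rng.normal(size=m)           # strongly correlated with x1
X_b = np.c_[np.ones(m), x1, x2]
y = (X_b @ np.array([1.0, 2.0, -1.0]) + 0.1 * rng.normal(size=m)).reshape(-1, 1)


def mse_and_grad(theta):
    error = X_b @ theta - y
    return np.mean(error ** 2), 2 / m * X_b.T @ error


def train(momentum, learning_rate=0.01, n_steps=1000):
    theta = np.zeros((3, 1))
    accumulation = np.zeros_like(theta)
    for _ in range(n_steps):
        _, gradient = mse_and_grad(theta)
        accumulation = momentum * accumulation + gradient
        theta = theta - learning_rate * accumulation
    return mse_and_grad(theta)[0]


mse_gd = train(momentum=0.0)         # plain Gradient Descent
mse_momentum = train(momentum=0.9)   # same learning rate, with momentum
print(mse_gd, mse_momentum)          # momentum reaches a lower MSE on this data
```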


#Feeding Data to the Training Algorithm
**Placeholder** nodes are useful when you want to replace X and y in the previous code with the next mini-batch at every iteration. These nodes are special because they don’t actually perform any computation; they just output the data you tell them to output at runtime. They are typically used to pass the training data to TensorFlow during training.

To create a placeholder node, you must call the placeholder() function and specify the output tensor’s data type.

Optionally, you can also specify its shape; specifying **None** for a dimension means "any size".

In [0]:
A = tf.placeholder(tf.float32, shape=(None, 3))

Here the **shape** argument means that A must have rank 2 (i.e., two dimensions) and exactly three columns. Because the first dimension is **None**, A can have any number of rows.

In the next cell, B = A + 5 depends on A. To evaluate B, pass a **feed_dict** to the **eval()** method that specifies the value of A:

In [56]:
tf.reset_default_graph()

A = tf.placeholder(tf.float32, shape=(None, 3))
B = A + 5
with tf.Session() as sess:
  B_val_1 = B.eval(feed_dict={A:[[1, 2, 3]]})
  B_val_2 = B.eval(feed_dict={A:[[4, 5, 6], [7, 8, 9]]})
  
print(B_val_1)

[[6. 7. 8.]]


In [57]:
print(B_val_2)

[[ 9. 10. 11.]
 [12. 13. 14.]]
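TensorFlow checks every value you feed against the placeholder's shape and raises an error if they are incompatible, with **None** acting as a wildcard. The hypothetical **check_shape()** helper below mimics that rule in NumPy; it is not a TensorFlow function. The sketch then computes what B = A + 5 returns for the second feed.

```python
import numpy as np


def check_shape(value, shape):
    """Hypothetical helper: True if value's shape matches `shape`, where None matches any size."""
    value = np.asarray(value)
    if value.ndim != len(shape):
        return False                       # the rank must match exactly
    return all(expected is None or expected == actual
               for expected, actual in zip(shape, value.shape))


A_shape = (None, 3)                        # same spec as tf.placeholder(tf.float32, shape=(None, 3))
print(check_shape([[1, 2, 3]], A_shape))              # True: 1 row, 3 columns
print(check_shape([[4, 5, 6], [7, 8, 9]], A_shape))   # True: 2 rows, 3 columns
print(check_shape([[1, 2]], A_shape))                 # False: only 2 columns
print(check_shape([1, 2, 3], A_shape))                # False: rank 1

B = np.asarray([[4, 5, 6], [7, 8, 9]], dtype=np.float32) + 5   # what B = A + 5 computes
print(B)
```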


To implement Mini-batch Gradient Descent, replace the X and y constants with placeholders and feed a new mini-batch at every iteration:

In [0]:
n_epochs = 1000
learning_rate = 0.01

In [0]:
tf.reset_default_graph()

X = tf.placeholder(tf.float32, shape=(None, n+1), name='X')
y = tf.placeholder(tf.float32, shape=(None, 1), name='y')

In [0]:
theta = tf.Variable(tf.random_uniform([n+1, 1], -1.0, 1.0, seed=42), name='theta')
y_pred = tf.matmul(X, theta, name='predictions')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

In [0]:
n_epochs = 10
batch_size = 100
n_batches = int(np.ceil(m / batch_size))
# recall: m,n = housing.data.shape, so m=#samples, n=#features

In [0]:
def fetch_batch(epoch, batch_index, batch_size):
  np.random.seed(epoch * n_batches + batch_index)
  indices = np.random.randint(m, size=batch_size)
  X_batch = scaled_housing_data_plus_bias[indices]
  y_batch = housing.target.reshape(-1, 1)[indices]
  return X_batch, y_batch

In [69]:
with tf.Session() as sess:
  sess.run(init)
  
  for epoch in range(n_epochs):
    for batch_index in range(n_batches):
      X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
      sess.run(training_op, feed_dict={X:X_batch, y:y_batch})
      
  best_theta = theta.eval()
  
best_theta

array([[ 2.070016  ],
       [ 0.8204561 ],
       [ 0.11731729],
       [-0.22739056],
       [ 0.31134024],
       [ 0.00353192],
       [-0.01126995],
       [-0.9164395 ],
       [-0.8795009 ]], dtype=float32)
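Note that **fetch_batch()** draws indices with **np.random.randint()**, that is, with replacement. As a result, one "epoch" of n_batches mini-batches does not visit every instance exactly once; in expectation only about $1 - 1/e \approx 63\%$ of the instances appear. A common alternative is to shuffle the indices once per epoch and slice them into consecutive batches.

The generator below, **shuffled_batches()**, is a hypothetical sketch of that approach. Inside the training loop, you would index **scaled_housing_data_plus_bias** and the targets with each yielded index array.

```python
import numpy as np


def shuffled_batches(m, batch_size, seed):
    """Hypothetical alternative to fetch_batch(): yield index arrays covering all m instances once."""
    rng = np.random.default_rng(seed)
    permutation = rng.permutation(m)
    for start in range(0, m, batch_size):
        yield permutation[start:start + batch_size]


m, batch_size = 20640, 100
batches = list(shuffled_batches(m, batch_size, seed=0))
all_indices = np.concatenate(batches)
print(len(batches))                                          # 207 = ceil(20640 / 100)
print(np.array_equal(np.sort(all_indices), np.arange(m)))    # True: every instance exactly once

# By contrast, sampling with replacement (as fetch_batch does) misses many instances.
rng = np.random.default_rng(0)
with_replacement = rng.integers(0, m, size=len(batches) * batch_size)
print(len(np.unique(with_replacement)) / m)                  # roughly 0.63
```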

#Saving and Restoring Models
To save a model, create a **Saver** node at the end of the construction phase (after all variable nodes are created). Then, in the execution phase, call its **save()** method whenever you want to save the model, passing it the session and the path of the checkpoint file:

In [0]:
tf.reset_default_graph()

n_epochs = 1000     
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X") 
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions") 
error = y_pred - y 
mse = tf.reduce_mean(tf.square(error), name="mse") 
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse) 

init = tf.global_variables_initializer()

# create a Saver node (after all variables have been created)
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
            # save a checkpoint every 100 epochs
            save_path = saver.save(sess, "/tmp/my_model.ckpt")
        sess.run(training_op)

    best_theta = theta.eval()
    # save the final model
    save_path = saver.save(sess, "/tmp/my_model_final.ckpt")

To restore a model, create a **Saver** at the end of the construction phase just like before, but then at the beginning of the execution phase, instead of initializing the variables using the **init** node, you call the **restore()** method of the **Saver** object:

In [0]:
with tf.Session() as sess:
    # restore the model (instead of running the init node)
    saver.restore(sess, "/tmp/my_model_final.ckpt")
    best_theta_restored = theta.eval()

In [0]:
np.allclose(best_theta, best_theta_restored)

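By default, a **Saver** saves and restores all variables under their own names. If you want to save and restore theta under a different name, such as **weights**, pass it a dictionary that maps names to variables: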

In [0]:
saver = tf.train.Saver({"weights": theta})

By default the saver also saves the graph structure itself in a second file with the extension .meta. You can use the function **tf.train.import_meta_graph()** to restore the graph structure. This function loads the graph into the default graph and returns a **Saver** that can then be used to restore the graph state (i.e., the variable values):

In [0]:
tf.reset_default_graph()
# notice that we start with an empty graph.

saver = tf.train.import_meta_graph("/tmp/my_model_final.ckpt.meta")  # this loads the graph structure
theta = tf.get_default_graph().get_tensor_by_name("theta:0")

with tf.Session() as sess:
    saver.restore(sess, "/tmp/my_model_final.ckpt")  # this restores the graph's state
    best_theta_restored = theta.eval()

In [0]:
np.allclose(best_theta, best_theta_restored)

This means that you can import a pretrained model without having to have the corresponding Python code to build the graph. This is very handy when you keep tweaking and saving your model: you can load a previously saved model without having to search for the version of the code that built it.

#Visualizing the Graph and Training Curves Using TensorBoard

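To visualize the graph within Jupyter, we will use a TensorBoard server available online at https://tensorboard.appspot.com/ (so this will not work if you do not have Internet access). Note that **show_graph()** is not part of TensorFlow. It comes from a helper module, tensorflow_graph_in_jupyter.py, which must be importable (e.g., placed in the working directory). Alternatively, you could use a tool like tfgraphviz.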

In [0]:
from tensorflow_graph_in_jupyter import show_graph
show_graph(tf.get_default_graph())

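Next, tweak the program so that it writes the graph definition and some training stats (for example, the training error, MSE) to a log directory that TensorBoard will read from. Use a different log directory every time you run the program; otherwise TensorBoard will merge stats from different runs, which messes up the visualizations. The simplest solution is to include a timestamp in the log directory name: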

In [0]:
tf.reset_default_graph()

from datetime import datetime

now = datetime.utcnow().strftime('%Y%m%d%H%M%S')
root_logdir = 'tf_logs'
logdir = '{}/run-{}/'.format(root_logdir, now)

In [0]:
n_epochs = 1000
learning_rate = 0.01

X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

In [0]:
mse_summary = tf.summary.scalar('MSE', mse)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

* The first line creates a node in the graph that will evaluate the MSE value and write it to a TensorBoard-compatible binary log string called a *summary*.
* The second line creates a **FileWriter** that you will use to write summaries to logfiles in the log directory.
  * The first parameter indicates the path of the log directory (in this case something like *tf_logs/run-20160906091959/*, relative to the current directory).
  * The second (optional) parameter is the graph you want to visualize.

Upon creation, the **FileWriter** creates the log directory if it does not already exist (and its parent directories if needed), and writes the graph definition in a binary logfile called an *events file*.
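The log directory name is built with **str.format()**, so each **{}** placeholder must receive the right argument: the root directory first, then the timestamp. The hypothetical **make_logdir()** helper below packages this as a function. It also accepts an explicit time so that the result is reproducible, and it produces the *tf_logs/run-20160906091959/* form described above.

```python
from datetime import datetime, timezone


def make_logdir(root_logdir="tf_logs", now=None):
    """Return a per-run log directory such as 'tf_logs/run-20160906091959/'."""
    if now is None:
        now = datetime.now(timezone.utc)      # current UTC time by default
    return "{}/run-{}/".format(root_logdir, now.strftime("%Y%m%d%H%M%S"))


print(make_logdir(now=datetime(2016, 9, 6, 9, 19, 59)))  # tf_logs/run-20160906091959/
print(make_logdir())                                     # a new name every second
```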

In [0]:
n_epochs = 10
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

In [0]:
with tf.Session() as sess: 
    sess.run(init)

    for epoch in range(n_epochs): 
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            if batch_index % 10 == 0:
                summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
                step = epoch * n_batches + batch_index
                file_writer.add_summary(summary_str, step)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

    best_theta = theta.eval()  

The **mse_summary** node will output a summary that you can then write to the events file using the **file_writer**.

Finally, you want to close the FileWriter at the end of the program:

In [0]:
file_writer.close()

In [77]:
best_theta

array([[ 2.070016  ],
       [ 0.8204561 ],
       [ 0.11731729],
       [-0.22739056],
       [ 0.31134024],
       [ 0.00353192],
       [-0.01126995],
       [-0.9164395 ],
       [-0.8795009 ]], dtype=float32)

In [82]:
!tensorboard --logdir tf_logs

TensorBoard 1.11.0 at http://28d6a589cddd:6006 (Press CTRL+C to quit)

^C


#Name Scopes
To avoid the graph becoming cluttered with thousands of nodes, you can create *name scopes* to group related nodes.

For example, we can define the **error** and **mse** ops within a name scope called **loss**:

In [0]:
tf.reset_default_graph()
now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
root_logdir = "tf_logs"
logdir = "{}/run-{}/".format(root_logdir, now)

n_epochs = 1000
learning_rate = 0.01

X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")

In [0]:
with tf.name_scope('loss') as scope:
  error = y_pred - y
  mse = tf.reduce_mean(tf.square(error), name='mse')

In [0]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

mse_summary = tf.summary.scalar('MSE', mse)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

In [86]:
n_epochs = 10
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            if batch_index % 10 == 0:
                summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
                step = epoch * n_batches + batch_index
                file_writer.add_summary(summary_str, step)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

    best_theta = theta.eval()

file_writer.flush()
file_writer.close()
print("Best theta:")
print(best_theta)

Best theta:
[[ 2.070016  ]
 [ 0.8204561 ]
 [ 0.11731729]
 [-0.22739056]
 [ 0.31134024]
 [ 0.00353192]
 [-0.01126995]
 [-0.9164395 ]
 [-0.8795009 ]]


In [87]:
print(error.op.name)

loss/sub


In [88]:
print(mse.op.name)

loss/mse


In TensorBoard, the mse and error nodes now appear inside the loss namespace, which appears collapsed by default.

Here is another toy example. Note how TensorFlow makes repeated names unique by appending an index:

In [89]:
tf.reset_default_graph()

a1 = tf.Variable(0, name='a')    # name == "a"
a2 = tf.Variable(0, name='a')    # name == "a_1"

with tf.name_scope('param'):     # name == "param"
  a3 = tf.Variable(0, name='a')  # name == "param/a"

with tf.name_scope('param'):     # name == "param_1"
  a4 = tf.Variable(0, name='a')  # name == "param_1/a"
  
for node in (a1, a2, a3, a4):
  print(node.op.name)

a
a_1
param/a
param_1/a
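The naming rule above can be mimicked with a small counter per name. This toy sketch reproduces the four names printed above. **make_name_uniquifier()** is a hypothetical helper, not TensorFlow's implementation, and it ignores edge cases such as an explicit name that collides with a generated one.

```python
def make_name_uniquifier():
    """Toy naming rule: the first use of a name is kept, repeats get _1, _2, ... appended."""
    counts = {}

    def unique_name(name):
        if name not in counts:
            counts[name] = 0
            return name
        counts[name] += 1
        return "{}_{}".format(name, counts[name])

    return unique_name


unique_name = make_name_uniquifier()
names = [unique_name("a"), unique_name("a")]   # a1, a2 created outside any scope
scope = unique_name("param")                   # first name_scope('param')
names.append(unique_name(scope + "/a"))
scope = unique_name("param")                   # second name_scope('param') becomes 'param_1'
names.append(unique_name(scope + "/a"))
print(names)  # ['a', 'a_1', 'param/a', 'param_1/a']
```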


#Modularity
Suppose you want to create a graph that adds the output of two rectified linear units (ReLU).

The following code does the job, but it’s quite repetitive:

In [0]:
tf.reset_default_graph()

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")

w1 = tf.Variable(tf.random_normal((n_features, 1)), name="weights1")
w2 = tf.Variable(tf.random_normal((n_features, 1)), name="weights2")
b1 = tf.Variable(0.0, name="bias1")
b2 = tf.Variable(0.0, name="bias2")

z1 = tf.add(tf.matmul(X, w1), b1, name="z1")
z2 = tf.add(tf.matmul(X, w2), b2, name="z2")

relu1 = tf.maximum(z1, 0., name="relu1")
relu2 = tf.maximum(z2, 0., name="relu2")

output = tf.add(relu1, relu2, name="output")

TensorFlow lets you stay DRY (Don't Repeat Yourself): just write a Python function that builds a ReLU, and call it as many times as you need. The **add_n()** function creates an operation that computes the sum of a list of tensors:

In [0]:
tf.reset_default_graph()

def relu(X):
  w_shape = (int(X.get_shape()[1]), 1)
  w = tf.Variable(tf.random_normal(w_shape), name='weights')
  b = tf.Variable(0.0, name='bias')
  z = tf.add(tf.matmul(X, w), b, name='z')
  return tf.maximum(z, 0., name='relu')

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name='X')
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name='output')

In [0]:
file_writer = tf.summary.FileWriter('logs/relu1', tf.get_default_graph())
file_writer.close()

Using name scopes, you can make the graph much clearer.

In [0]:
tf.reset_default_graph()

def relu(X):
  with tf.name_scope('relu'):
    w_shape = (int(X.get_shape()[1]), 1)
    w = tf.Variable(tf.random_normal(w_shape), name='weights')
    b = tf.Variable(0.0, name='bias')
    z = tf.add(tf.matmul(X, w), b, name='z')
    return tf.maximum(z, 0., name='max')

In [0]:
n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name='X')
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name='output')

file_writer = tf.summary.FileWriter('logs/relu2', tf.get_default_graph())
file_writer.close()
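For intuition about what this graph computes, here is a NumPy analogue. It is illustrative only; the batch, the seed, and this NumPy **relu()** are assumptions. Each call draws fresh weights, just as each call to the TensorFlow **relu()** creates new **weights** and **bias** variables. **tf.add_n()** corresponds to an element-wise sum of same-shaped arrays.

```python
import numpy as np

rng = np.random.default_rng(42)


def relu(X):
    """NumPy analogue of the relu() builder: fresh weights and a zero bias on every call."""
    w = rng.normal(size=(X.shape[1], 1))   # like tf.random_normal(w_shape)
    b = 0.0
    z = X @ w + b
    return np.maximum(z, 0.0)


n_features = 3
X = rng.normal(size=(4, n_features))       # a batch of 4 illustrative instances
relus = [relu(X) for _ in range(5)]
output = np.sum(relus, axis=0)             # element-wise sum, like tf.add_n(relus)
print(output.shape)  # (4, 1)
```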

#Sharing Variables
If you want to share a variable between various components of your graph, one simple option is to create it first, then pass it as a parameter to the functions that need it.

For example, suppose you want to control the ReLU threshold using a shared threshold variable for all ReLUs.

In [0]:
tf.reset_default_graph()

def relu(X, threshold):
    with tf.name_scope("relu"):
        w_shape = (int(X.get_shape()[1]), 1) 
        w = tf.Variable(tf.random_normal(w_shape), name="weights")
        b = tf.Variable(0.0, name="bias") 
        z = tf.add(tf.matmul(X, w), b, name="z") 
        return tf.maximum(z, threshold, name="max")

threshold = tf.Variable(0.0, name="threshold")
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")

# create threshold variable then pass it to the relu()
relus = [relu(X, threshold) for i in range(5)]
output = tf.add_n(relus, name="output")

However, if there are many shared parameters such as this one, it will be painful to have to pass them around as parameters all the time.

Another option is to store the shared variable as an attribute of the **relu()** function, creating it upon the first call:

In [0]:
tf.reset_default_graph()

def relu(X):
    with tf.name_scope("relu"):
        if not hasattr(relu, "threshold"):
            relu.threshold = tf.Variable(0.0, name="threshold")
        w_shape = int(X.get_shape()[1]), 1  
        w = tf.Variable(tf.random_normal(w_shape), name="weights") 
        b = tf.Variable(0.0, name="bias") 
        z = tf.add(tf.matmul(X, w), b, name="z")  
        return tf.maximum(z, relu.threshold, name="max")

In [0]:
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name="output")
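The function-attribute trick is plain Python, so it can be shown without TensorFlow. In this sketch, **make_threshold()** is a hypothetical stand-in for creating the variable. The object is created on the first call, and the same one is returned afterwards.

One caveat: the attribute outlives **tf.reset_default_graph()**. If you reset the graph but keep the same function object, **relu.threshold** would still refer to a variable from the old graph. The cells in this notebook redefine **relu()** after each reset, which avoids this.

```python
def make_threshold():
    """Hypothetical stand-in for tf.Variable(0.0, name="threshold")."""
    return {"name": "threshold", "value": 0.0}


def relu(x):
    # Create the shared object on the first call only; later calls reuse it.
    if not hasattr(relu, "threshold"):
        relu.threshold = make_threshold()
    return max(x, relu.threshold["value"])


outputs = [relu(v) for v in (-2.0, 1.5, 0.3)]
first_threshold = relu.threshold
relu(5.0)
print(outputs)                              # [0.0, 1.5, 0.3]
print(relu.threshold is first_threshold)    # True: one object shared by every call
```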

TensorFlow offers another option, which may lead to slightly cleaner and more modular code than the previous solutions. 

Use the get_variable() function to create the shared variable if it does not exist yet, or reuse it if it already exists. The desired behavior (creating or reusing) is controlled by an attribute of the current **variable_scope()**.

For example, the following code will create a variable named **"relu/threshold"** (as a scalar, since **shape=()**, and using **0.0** as the initial value):

In [0]:
tf.reset_default_graph()

with tf.variable_scope("relu"):
    threshold = tf.get_variable("threshold", shape=(),
                                initializer=tf.constant_initializer(0.0))

Note that if the variable has already been created by an earlier call to **get_variable()**, this code will raise an exception.

If you want to reuse a variable, you need to explicitly say so by setting the variable scope’s **reuse** attribute to **True**

In [0]:
with tf.variable_scope("relu", reuse=True):
    threshold = tf.get_variable("threshold")

Alternatively, you can set the reuse attribute to True inside the block by calling the scope’s **reuse_variables()** method:

In [0]:
with tf.variable_scope("relu") as scope:
    scope.reuse_variables()  # from here on, reuse=True in this scope
    threshold = tf.get_variable("threshold")

Once reuse is set to True, it cannot be set back to False within the block. Moreover, if you define other variable scopes inside this one,
they will automatically inherit **reuse=True**. Lastly, only variables created by **get_variable()** can be reused this way.
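The create-or-reuse rules of **get_variable()** can be summarized with a toy store. **ToyVariableStore** below is a hypothetical pure-Python model, not TensorFlow's implementation. With reuse=False, asking for an existing name is an error. With reuse=True, asking for a missing name is an error. Otherwise the same object comes back each time.

```python
class ToyVariableStore:
    """Toy model of get_variable() create-or-reuse rules (not TensorFlow's implementation)."""

    def __init__(self):
        self._vars = {}

    def get_variable(self, scope, name, reuse=False, initial_value=None):
        full_name = "{}/{}".format(scope, name)
        if reuse:
            if full_name not in self._vars:
                raise ValueError("Variable {} does not exist".format(full_name))
            return self._vars[full_name]
        if full_name in self._vars:
            raise ValueError("Variable {} already exists".format(full_name))
        self._vars[full_name] = {"name": full_name, "value": initial_value}
        return self._vars[full_name]


store = ToyVariableStore()
t1 = store.get_variable("relu", "threshold", initial_value=0.0)   # creates relu/threshold
t2 = store.get_variable("relu", "threshold", reuse=True)          # reuses it
print(t1 is t2)  # True

try:
    store.get_variable("relu", "threshold")                       # reuse=False, but it exists
except ValueError as e:
    err_msg = str(e)
print(err_msg)   # Variable relu/threshold already exists
```

Later TensorFlow 1.x releases also accept reuse=tf.AUTO_REUSE, which creates the variable if it does not exist and reuses it otherwise.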

Now you have all the pieces you need to make the relu() function access the threshold variable without having to pass it as a parameter:

In [0]:
tf.reset_default_graph()

def relu(X):
    with tf.variable_scope("relu", reuse=True):
        threshold = tf.get_variable("threshold")
        w_shape = int(X.get_shape()[1]), 1  
        w = tf.Variable(tf.random_normal(w_shape), name="weights") 
        b = tf.Variable(0.0, name="bias")
        z = tf.add(tf.matmul(X, w), b, name="z") 
        return tf.maximum(z, threshold, name="max")

X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
with tf.variable_scope("relu"):
    threshold = tf.get_variable("threshold", shape=(),
                                initializer=tf.constant_initializer(0.0))
relus = [relu(X) for relu_index in range(5)]
output = tf.add_n(relus, name="output")

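This code first defines the **relu()** function, then creates the **relu/threshold** variable (as a scalar that will later be initialized to **0.0**) and builds five ReLUs by calling the **relu()** function. The **relu()** function reuses the **relu/threshold** variable, and creates the other ReLU nodes.

In the next cell, **relu()** calls **get_variable()** itself, inside a **relu** variable scope. The calls are wrapped in an outer variable scope. The first call creates the threshold variable; after **scope.reuse_variables()**, the remaining four calls inherit **reuse=True** and reuse it: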

In [0]:
tf.reset_default_graph()

def relu(X):
    with tf.variable_scope("relu"):
        threshold = tf.get_variable("threshold", shape=(), initializer=tf.constant_initializer(0.0))
        w_shape = (int(X.get_shape()[1]), 1)
        w = tf.Variable(tf.random_normal(w_shape), name="weights")
        b = tf.Variable(0.0, name="bias")
        z = tf.add(tf.matmul(X, w), b, name="z")
        return tf.maximum(z, threshold, name="max")

X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
with tf.variable_scope("", default_name="") as scope:
    first_relu = relu(X)     # create the shared variable
    scope.reuse_variables()  # then reuse it
    relus = [first_relu] + [relu(X) for i in range(4)]
output = tf.add_n(relus, name="output")

file_writer = tf.summary.FileWriter("logs/relu8", tf.get_default_graph())
file_writer.close()

The following code creates the threshold variable within the **relu()** function upon the first call, then reuses it in subsequent calls.

Now the **relu()** function does not have to worry about name scopes or variable sharing: it just calls **get_variable()**, which will create or reuse the threshold variable (it does not need to know which is the case). The rest of the code calls **relu()** five times, making sure to set **reuse=False** on the first call, and **reuse=True** for the other calls.

In [0]:
tf.reset_default_graph()

def relu(X):
    threshold = tf.get_variable("threshold", shape=(),
                                initializer=tf.constant_initializer(0.0))
    w_shape = (int(X.get_shape()[1]), 1)                        # not shown in the book
    w = tf.Variable(tf.random_normal(w_shape), name="weights")  # not shown
    b = tf.Variable(0.0, name="bias")                           # not shown
    z = tf.add(tf.matmul(X, w), b, name="z")                    # not shown
    return tf.maximum(z, threshold, name="max")

X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = []
for relu_index in range(5):
    with tf.variable_scope("relu", reuse=(relu_index >= 1)) as scope:
        relus.append(relu(X))
output = tf.add_n(relus, name="output")

In [0]:
file_writer = tf.summary.FileWriter("logs/relu9", tf.get_default_graph())
file_writer.close()

#Extra explanation on name scope and variable scope

In [95]:
tf.reset_default_graph()

with tf.variable_scope("my_scope"):
    x0 = tf.get_variable("x", shape=(), initializer=tf.constant_initializer(0.))
    x1 = tf.Variable(0., name="x")
    x2 = tf.Variable(0., name="x")

with tf.variable_scope("my_scope", reuse=True):
    x3 = tf.get_variable("x")
    x4 = tf.Variable(0., name="x")

with tf.variable_scope("", default_name="", reuse=True):
    x5 = tf.get_variable("my_scope/x")

print("x0:", x0.op.name)
print("x1:", x1.op.name)
print("x2:", x2.op.name)
print("x3:", x3.op.name)
print("x4:", x4.op.name)
print("x5:", x5.op.name)
print(x0 is x3 and x3 is x5)

x0: my_scope/x
x1: my_scope/x_1
x2: my_scope/x_2
x3: my_scope/x
x4: my_scope_1/x
x5: my_scope/x
True


The first **variable_scope()** block creates the shared variable **x0**, named **my_scope/x**. For all operations other than shared variables (including non-shared variables), the variable scope acts like a regular name scope, which is why the two variables **x1** and **x2** have names prefixed with **my_scope/**. Note, however, that TensorFlow makes their names unique by adding an index: **my_scope/x_1** and **my_scope/x_2**.

The second **variable_scope()** block reuses the shared variables in scope **my_scope**, which is why **x0** is **x3**. Once again, for all operations other than shared variables it acts as a name scope, and since it is a separate block from the first one, TensorFlow makes the scope name unique (**my_scope_1**), so the variable **x4** is named **my_scope_1/x**.

The third block shows another way to get a handle on the shared variable **my_scope/x** by creating a **variable_scope()** at the root scope (whose name is an empty string), then calling **get_variable()** with the full name of the shared variable (i.e. **"my_scope/x"**).
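The `_1` and `_2` suffixes come from TensorFlow keeping every operation name in the graph unique. The class below is a simplified, hypothetical model of that bookkeeping (real TensorFlow also copes with cases such as a user explicitly naming an op `x_1`). Replaying the creations from the cell above in order yields the same names that were printed. `x3` and `x5` are left out because they are the existing **my_scope/x** variable, not new operations.

```python
from collections import defaultdict


class ToyNameRegistry:
    """Hypothetical, simplified model of how a graph keeps op names unique."""

    def __init__(self):
        self._uses = defaultdict(int)  # requested name -> times requested so far

    def unique_name(self, name):
        count = self._uses[name]
        self._uses[name] += 1
        # The first request keeps the name as-is; later ones get _1, _2, ...
        return name if count == 0 else f"{name}_{count}"


names = ToyNameRegistry()
scope_a = names.unique_name("my_scope")  # first variable_scope block
x0 = names.unique_name(f"{scope_a}/x")   # shared variable from get_variable()
x1 = names.unique_name(f"{scope_a}/x")   # tf.Variable(0., name="x")
x2 = names.unique_name(f"{scope_a}/x")   # tf.Variable(0., name="x")
scope_b = names.unique_name("my_scope")  # second block reopens "my_scope"
x4 = names.unique_name(f"{scope_b}/x")   # tf.Variable(0., name="x")
print(x0, x1, x2, x4)
# my_scope/x my_scope/x_1 my_scope/x_2 my_scope_1/x
```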



#Strings

In [96]:
tf.reset_default_graph()

text = np.array("Do you want some café?".split())
text_tensor = tf.constant(text)

with tf.Session() as sess:
    print(text_tensor.eval())

[b'Do' b'you' b'want' b'some' b'caf\xc3\xa9?']
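The `b'...'` prefixes and the `\xc3\xa9` escape show that the string tensor comes back as byte strings: `é` (U+00E9) is stored as its two-byte UTF-8 encoding. The plain Python/NumPy lines below, which do not use TensorFlow, reproduce those bytes and decode them back to text. Decoding is only an illustration of what one might do with such results, not something the cell above does.

```python
import numpy as np

text = np.array("Do you want some café?".split())
print(text.dtype)                       # <U5: NumPy keeps Unicode text (longest word is 5 chars)

encoded = [word.encode("utf-8") for word in text]
print(encoded[-1])                      # b'caf\xc3\xa9?'
print(len(text[-1]), len(encoded[-1]))  # 5 6: "é" takes two bytes in UTF-8

decoded = [word.decode("utf-8") for word in encoded]
print(decoded[-1])                      # café?
```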
