# Up and Running with TensorFlow

_TensorFlow_ is a powerful open source library well suited to large-scale machine learning tasks. It can run computations in parallel across multiple CPUs and GPUs and train a network with millions of parameters on billions of instances with millions of features. In short, it is flexible, scalable, and production-ready. Here are some of its qualities:
* Runs on Windows, Linux, macOS, iOS, and Android.
* Provides a simple Python API called _TF.Learn_ that lets users train a neural net with just a few lines of code.
* Is extensible through high-level APIs built on top of it, like Keras or Pretty Tensor.
* Uses highly efficient C++ implementations of many standard ML operations, with a focus on neural networks.
* Provides several advanced optimization nodes to minimize cost functions. These rely on TensorFlow's _automatic differentiation_ (_autodiff_), its ability to automatically compute the gradients of the cost functions we define.
* Includes _TensorBoard_, a visualization tool for viewing computational graphs, learning curves, etc.
* Works with Google's cloud computing API.
* Has a dedicated development community.


## Creating Your First Graph and Running It in a Session

Here's code for a simple graph:

In [2]:
import tensorflow as tf

x = tf.Variable(3, name="x")
y = tf.Variable(4, name="y")
f = x*x*y + y + 2

The most important thing to understand is that this code doesn't actually compute anything: it just sets up a computation graph. Even the variables are not initialized yet. To evaluate the graph we need to open a TensorFlow _session_ and use it to initialize the variables and evaluate `f`. The session takes care of placing the operations onto _devices_ such as CPUs and GPUs and running them, and it holds the variable values. Here's code that creates a session, initializes the variables, evaluates `f`, and then closes the session:

In [3]:
sess = tf.Session()
sess.run(x.initializer)
sess.run(y.initializer)
result = sess.run(f)
print(result)
sess.close()

42


Or written a different way:

In [4]:
with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()
    print(result)

42


Inside the `with` block, the session is set as the default session. Calling `x.initializer.run()` is equivalent to calling `tf.get_default_session().run(x.initializer)`. Similarly, `f.eval()` is equivalent to `tf.get_default_session().run(f)`. The session is also conveniently closed at the end of the `with` block.

Instead of manually running the initializer of every single variable, we can use the `global_variables_initializer()` function. It doesn't perform the initialization immediately, but instead creates a node in the graph that will initialize all variables when it is run:

In [5]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run() #actually initialize the variables
    result = f.eval()
    print(result)

42


In Jupyter or the Python shell, it may be easier to create an `InteractiveSession`, which automatically sets itself as the default session. We then don't need a `with` block, but we do need to close the session manually when we are done.

In [6]:
sess = tf.InteractiveSession()
init.run()
result = f.eval()
print(result)
sess.close()

42


TensorFlow programs are typically split into two parts: the first builds a computation graph (called the _construction phase_) and the second runs it (the _execution phase_). The construction phase builds the computation graph which usually represents an ML model and the computations required to train it. The execution phase runs a loop that evaluates a training step repeatedly, gradually improving the model parameters.
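
As a rough illustration of the two phases, the sketch below mimics them in plain Python. The `Node` class and the `as_node()` and `run()` helpers are hypothetical toy names, not part of TensorFlow. The only point is that building `f` creates a graph of objects, and no arithmetic happens until `run()` is called.

```python
class Node:
    """A toy graph node: it records an operation and its inputs, nothing more."""
    def __init__(self, op, inputs=(), value=None):
        self.op, self.inputs, self.value = op, inputs, value

    def __add__(self, other):
        return Node("add", (self, as_node(other)))

    def __mul__(self, other):
        return Node("mul", (self, as_node(other)))


def as_node(x):
    """Wrap a plain number in a constant node; leave existing nodes untouched."""
    return x if isinstance(x, Node) else Node("const", value=x)


def run(node):
    """Execution phase: evaluate a node's inputs first, then the node itself."""
    if node.op == "const":
        return node.value
    left, right = (run(i) for i in node.inputs)
    return left + right if node.op == "add" else left * right


# Construction phase: this only builds Node objects
x = as_node(3)
y = as_node(4)
f = x * x * y + y + 2

# Execution phase: the arithmetic happens here
result = run(f)
print(result)  # 42
```

Here `f` is a `Node`, not a number, which is the same situation as `f` in the TensorFlow code before a session runs it.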

## Managing Graphs

Any node created is automatically added to the default graph:

In [7]:
x1 = tf.Variable(1)
x1.graph is tf.get_default_graph()

True

This is usually fine, but sometimes we may want to manage multiple independent graphs. We can do this by creating a new `Graph` and temporarily making it the default graph inside a `with` block:

In [8]:
graph = tf.Graph()
with graph.as_default():
    x2 = tf.Variable(2)
print(x2.graph is graph)
print(x2.graph is tf.get_default_graph())

True
False


*Note: in Jupyter or the Python shell, running the same commands multiple times can leave the default graph with many duplicate nodes. We can reset it by restarting the Jupyter kernel or Python shell, or by calling the `tf.reset_default_graph()` function.*

## Lifecycle of a Node Value

TensorFlow automatically identifies the dependencies of a node and evaluates those dependencies first, like so:

In [9]:
w = tf.constant(3)
x = w + 2
y = x + 5
z = x * 3

with tf.Session() as sess:
    print(y.eval())
    print(z.eval())

10
15


First, the code defines a very simple graph. Then it starts a session and runs the graph to evaluate `y`: TensorFlow detects that `y` depends on `x`, which depends on `w`, so it first evaluates `w`, then `x`, then `y`. Next, the code runs the graph to evaluate `z`: TensorFlow again detects that `z` depends on `x`, which depends on `w`, and evaluates `w`, then `x`, then `z`. It does not reuse the results of the previous evaluation of `w` and `x`, so these nodes are evaluated twice.

All node values are dropped between graph runs, except variable values, which the session maintains across runs. A variable starts its life when its initializer is run and ends it when the session is closed.

To efficiently evaluate `y` and `z` in the above example, we must ask TensorFlow to evaluate both `y` and `z` in just one graph run like so:

In [10]:
with tf.Session() as sess:
    y_val, z_val = sess.run([y, z])
    print(y_val)
    print(z_val)

10
15
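
The cost of separate runs can be mimicked in plain Python. This is only a sketch under stated assumptions: the `eval_*` functions and the `cache` dictionary are hypothetical stand-ins for graph nodes and for the values kept alive during one graph run. They do not reflect how TensorFlow is implemented.

```python
eval_counts = {"w": 0, "x": 0}

def eval_w():
    eval_counts["w"] += 1
    return 3

def eval_x(cache=None):
    # A per-run cache stands in for values that live only during one graph run
    if cache is not None and "x" in cache:
        return cache["x"]
    eval_counts["x"] += 1
    value = eval_w() + 2
    if cache is not None:
        cache["x"] = value
    return value

def eval_y(cache=None):
    return eval_x(cache) + 5

def eval_z(cache=None):
    return eval_x(cache) * 3

# Two separate "runs": nothing is shared, so w and x are computed twice
y_val, z_val = eval_y(), eval_z()
separate_counts = dict(eval_counts)

# One "run" for both outputs: x is computed once and reused
eval_counts.update(w=0, x=0)
run_cache = {}
y_val2, z_val2 = eval_y(run_cache), eval_z(run_cache)
single_counts = dict(eval_counts)

print(y_val, z_val, separate_counts)   # 10 15 {'w': 2, 'x': 2}
print(y_val2, z_val2, single_counts)   # 10 15 {'w': 1, 'x': 1}
```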


*Warn: in single-process TensorFlow, multiple sessions do not share any state even when reusing the same graph. In distributed TensorFlow (chapter 12) variable state is stored on the servers, not sessions, so multiple sessions can share the same variables.*

## Linear Regression with TensorFlow

TensorFlow operations (or _ops_) can take any number of inputs and return any number of outputs. _Source ops_ like constants and variables take no inputs. The inputs and outputs are multidimensional arrays called _tensors_ and have a type and shape like NumPy arrays. The Python API actually represents the values with NumPy ndarrays. They typically contain floats, but can also contain strings or other arbitrary byte arrays.

In this example, we will work with the California housing dataset from chapter 2. The code starts by fetching the data, then adds an extra bias input feature ($x_0 = 1$) to all training instances using NumPy. It then creates two TensorFlow constant nodes, `X` and `y`, to hold the data and target values. After this it uses some matrix operations to define `theta`. These functions (`transpose()`, `matmul()`, and `matrix_inverse()`) do not perform any computations immediately; they create nodes in the graph that perform them when the graph is run. `theta` corresponds directly to the Normal Equation, $\hat{\theta} = (X^T \cdot X)^{-1} \cdot X^T \cdot y$. Finally, the code creates a session and uses it to evaluate `theta`:

In [11]:
import numpy as np
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

with tf.Session() as sess:
    theta_value = theta.eval()

The main benefit of this code over computing the Normal Equation directly with NumPy is that TensorFlow will automatically run it on the GPU if one is available (and TensorFlow was installed with GPU support).
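
To make the Normal Equation concrete, here is a minimal plain-Python sketch for the smallest interesting case: a bias term plus a single feature, so that $X^T \cdot X$ is a 2×2 matrix that can be inverted in closed form. The function name `normal_equation_1d` and the toy data are assumptions made for illustration; they are not part of the housing example.

```python
def normal_equation_1d(xs, ys):
    """Solve theta = (X^T X)^{-1} X^T y where each row of X is [1, x]."""
    m = len(xs)
    # Entries of the 2x2 matrix X^T X = [[m, s_x], [s_x, s_xx]]
    s_x = sum(xs)
    s_xx = sum(x * x for x in xs)
    # Entries of the vector X^T y = [s_y, s_xy]
    s_y = sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys))
    # Closed-form inverse of a 2x2 matrix, applied to X^T y
    det = m * s_xx - s_x * s_x
    theta0 = (s_xx * s_y - s_x * s_xy) / det
    theta1 = (m * s_xy - s_x * s_y) / det
    return theta0, theta1


xs = [0.0, 1.0, 2.0, 3.0]
ys = [4.0 + 3.0 * x for x in xs]   # noise-free line y = 4 + 3x
theta0, theta1 = normal_equation_1d(xs, ys)
print(theta0, theta1)  # 4.0 3.0
```

Because the toy targets lie exactly on a line, the solution recovers the intercept and slope used to generate them.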

## Implementing Gradient Descent

Here we'll implement Batch Gradient Descent (Chapter 4) in TensorFlow. First we'll compute the gradients manually, then we'll let TensorFlow's autodiff feature compute them automatically, and finally we'll use some built-in optimizers.

*Warn: Remember to normalize the input feature vectors first, otherwise training may be drastically slower. This can be done in any of the frameworks we've used already. The code below only divides the whole data matrix by its norm, which is a crude substitute for per-feature standardization and is probably why the MSE it prints decreases so slowly.*

### Manually Computing the Gradients

A few notes before we get into the code:
* The `random_uniform()` function creates a node in the graph that generates a tensor of random values, given its shape and value range.
* The `assign()` function creates a node that assigns a new value to a variable. We use it below to implement the Gradient Descent step $\theta^{(\text{next step})} = \theta - \eta \nabla_{\theta}\text{MSE}(\theta)$.
* The main loop executes the training step `n_epochs` times, printing the MSE every 100 iterations. The MSE should go down at every iteration.

In [17]:
n_epochs = 1000
learning_rate = 0.01

# Rescale the data by the norm of the whole matrix (not per-feature standardization)
scaled_housing_data_plus_bias = housing_data_plus_bias / np.linalg.norm(housing_data_plus_bias)

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
gradients = 2/m * tf.matmul(tf.transpose(X), error)
training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE = ", mse.eval())
        sess.run(training_op)
    
    best_theta = theta.eval()

Epoch 0 MSE =  5.62933
Epoch 100 MSE =  5.62883
Epoch 200 MSE =  5.62833
Epoch 300 MSE =  5.62781
Epoch 400 MSE =  5.62732
Epoch 500 MSE =  5.62682
Epoch 600 MSE =  5.62632
Epoch 700 MSE =  5.62581
Epoch 800 MSE =  5.62531
Epoch 900 MSE =  5.62481
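
For comparison, the same update rule can be sketched in plain Python on a tiny dataset. This is an illustration only: the function name `batch_gradient_descent`, the toy data, the zero initialization (instead of random values), and the learning rate are all assumptions chosen so that the example converges quickly.

```python
def batch_gradient_descent(X, y, learning_rate=0.1, n_epochs=1000):
    """Repeat theta <- theta - eta * (2/m) * X^T (X theta - y)."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    history = []
    for epoch in range(n_epochs):
        # Prediction error for every training instance
        errors = [sum(t * xi for t, xi in zip(theta, row)) - target
                  for row, target in zip(X, y)]
        history.append(sum(e * e for e in errors) / m)   # MSE before this step
        # Gradient of the MSE with regard to each parameter
        gradients = [2.0 / m * sum(e * row[j] for e, row in zip(errors, X))
                     for j in range(n)]
        theta = [t - learning_rate * g for t, g in zip(theta, gradients)]
    return theta, history


xs = [-1.0, -0.5, 0.0, 0.5, 1.0]             # a single, already centered feature
X = [[1.0, x] for x in xs]                    # bias column plus the feature
y = [4.0 + 3.0 * x for x in xs]               # noise-free targets
theta, history = batch_gradient_descent(X, y)
print([round(t, 4) for t in theta])           # [4.0, 3.0]
print(history[0] > history[-1])               # True
```

With a well-scaled feature the MSE drops quickly, which is the behavior the normalization warning above is about.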


### Using autodiff

Although manually computing the gradient isn't necessarily hard for tasks like Linear Regression, this would be much more difficult and error-prone to perform on a neural network. We could use _symbolic differentiation_ to find the equations for the partial derivatives, but even then the code would likely not be very efficient. 

For example, take the function $f(x) = \exp(\exp(\exp(x)))$. From calculus, $f^{\prime}(x) = \exp(x)\times\exp(\exp(x))\times\exp(\exp(\exp(x)))$. Coding $f(x)$ and $f^{\prime}(x)$ separately, exactly as they appear, results in inefficient code that calls the exponential function nine times. If we instead had a function that computed $\exp(x)$, then $\exp(\exp(x))$, then $\exp(\exp(\exp(x)))$, and returned all three, the third term would give us $f(x)$ directly and the product of all three would give us $f^{\prime}(x)$, with only three calls to the exponential function.

What if the function we need to differentiate is defined by arbitrary code? Good luck finding the partial derivatives here:
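
The shared-subexpression idea can be sketched in a few lines of plain Python. The function name `f_and_derivative` is a hypothetical choice for this illustration. The check at the end compares the result against a simple finite-difference estimate.

```python
import math

def f_and_derivative(x):
    """Return f(x) = exp(exp(exp(x))) and f'(x) using only three exp calls."""
    a = math.exp(x)      # exp(x)
    b = math.exp(a)      # exp(exp(x))
    c = math.exp(b)      # exp(exp(exp(x))), which is f(x)
    return c, a * b * c  # f'(x) is the product of the three terms


value, slope = f_and_derivative(0.0)

# Sanity check with a central finite difference
h = 1e-6
estimate = (f_and_derivative(h)[0] - f_and_derivative(-h)[0]) / (2 * h)
print(value, slope, estimate)
```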

In [None]:
def my_func(a, b):
    z = 0
    for i in range(100):
        z = a * np.cos(z + i) + z * np.sin(b - i)
    return z

Luckily, TensorFlow's autodiff feature automatically and efficiently computes these gradients for us. Just replace the `gradients =` line in the Gradient Descent code with the following line:

In [18]:
gradients = tf.gradients(mse, [theta])[0]

The `gradients()` function takes an op (in this case `mse`) and a list of variables (in this case just `theta`) and creates a list of ops (one per variable) to compute the gradients of the op with regard to each variable. So the `gradients` node will compute the gradient vector of the MSE with regard to `theta`.

There are four main approaches to computing gradients automatically, summarized in the table below. TensorFlow uses reverse-mode autodiff, which is efficient and accurate when there are many inputs and few outputs, a common case with neural networks. It computes all the partial derivatives of the outputs with regard to all the inputs in just $n_{outputs} + 1$ graph traversals.

Technique | # of graph traversals | Accuracy | Supports arbitrary code | Comment
:--- | :--- | :--- | :--- | :---
Numerical differentiation | $n_{inputs} + 1$ | Low | Yes | Trivial to implement
Symbolic differentiation | N/A | High | No | Builds a very different graph
Forward-mode autodiff | $n_{inputs}$ | High | Yes | Uses dual numbers
Reverse-mode autodiff | $n_{outputs} + 1$ | High | Yes | Implemented in TensorFlow
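
To give a feel for one row of the table, here is a minimal sketch of forward-mode autodiff with dual numbers in plain Python. The `Dual` class is a hypothetical toy written for this illustration (it supports only addition and multiplication) and is unrelated to TensorFlow's own reverse-mode implementation. It differentiates $f(x, y) = x^2 y + y + 2$, the function from the first graph, and needs one forward pass per input.

```python
class Dual:
    """A dual number a + b*eps with eps**2 == 0; `deriv` carries the derivative."""
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    @staticmethod
    def lift(x):
        """Treat plain numbers as constants (derivative 0)."""
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)
    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (uv)' = u v' + u' v
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)
    __rmul__ = __mul__


def f(x, y):
    return x * x * y + y + 2


# One forward pass per input: seed that input's derivative with 1
df_dx = f(Dual(3.0, 1.0), Dual(4.0)).deriv
df_dy = f(Dual(3.0), Dual(4.0, 1.0)).deriv
value = f(Dual(3.0), Dual(4.0)).value
print(value, df_dx, df_dy)  # 42.0 24.0 10.0
```

The two passes for two inputs match the $n_{inputs}$ entry in the table. With thousands of parameters this count is what makes reverse-mode the better fit for neural networks.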

### Using an Optimizer

Not only does TensorFlow compute the gradients automatically, it also provides a number of built-in optimizers (like the `GradientDescentOptimizer`) that further simplify ML work. We can replace the `gradients =` and `training_op =` lines with the following lines and everything will work as before:

In [None]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

If we want to use a different type of optimizer, we only need to change one line. For example, a momentum optimizer (which often converges much faster than Gradient Descent, see chapter 11) can be defined like this:

In [None]:
optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.9)

## Feeding Data to the Training Algorithm

Let's try to implement Mini-batch Gradient Descent. To start, we need a way to replace `X` and `y` with the next mini-batch at each iteration. We can do this with placeholder nodes, which don't perform any computation: they just output the data we tell them to output at runtime. They are typically used to pass the training data to TensorFlow during training. If we don't specify a value for a placeholder at runtime, we get an exception.

We'll create these nodes with the `placeholder()` function and specify the output tensor's data type. Optionally, we can also specify a shape to enforce. Specifying `None` for a dimension means any size will be accepted. In the following code, we create a placeholder node `A` and a node `B = A + 5`. When we evaluate `B`, we pass a `feed_dict` to the `eval()` method that specifies the value of `A`. `A` must have rank 2 (i.e., it must be two-dimensional) and exactly three columns, but it can have any number of rows.

In [20]:
A = tf.placeholder(tf.float32, shape=(None, 3))
B = A + 5
with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict={A: [[1, 2, 3]]})
    B_val_2 = B.eval(feed_dict={A: [[4, 5, 6], [7, 8, 9]]})
    
print(B_val_1)
print(B_val_2)

[[ 6.  7.  8.]]
[[  9.  10.  11.]
 [ 12.  13.  14.]]


*Note: We can feed the output of **any** operation, not just placeholders. In that case TensorFlow does not evaluate the operation and uses the fed values instead.*

To implement Mini-batch Gradient Descent, we only need to tweak the code slightly by changing the definitions for `X` and `y` in the construction phase:

In [21]:
X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")

We'll define the batch size and total number of batches:

In [22]:
batch_size = 100
n_batches = int(np.ceil(m / batch_size))
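
The rounding up matters when the dataset size is not a multiple of the batch size. The plain-Python sketch below uses a hypothetical helper, `batch_slices`, and a made-up dataset of 250 rows to show the index ranges each batch would cover.

```python
import math

def batch_slices(m, batch_size):
    """Return the (start, end) row range of every mini-batch; the last may be shorter."""
    n_batches = math.ceil(m / batch_size)
    return [(i * batch_size, min((i + 1) * batch_size, m))
            for i in range(n_batches)]


slices = batch_slices(m=250, batch_size=100)
print(slices)  # [(0, 100), (100, 200), (200, 250)]
```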

In the execution phase we'll fetch the mini-batches one by one and supply their values to `X` and `y` via the `feed_dict` parameter whenever we evaluate a node that depends on them:

In [None]:
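# For illustration, slice batches from the in-memory array;
# in practice the data would typically be loaded from disk
def fetch_batch(epoch, batch_index, batch_size):
    start = batch_index * batch_size
    end = start + batch_size
    X_batch = scaled_housing_data_plus_bias[start:end]
    y_batch = housing.target.reshape(-1, 1)[start:end]
    return X_batch, y_batch

with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
    
    best_theta = theta.eval()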

*Note: we don't need to pass in `X` or `y` when evaluating `theta` because it doesn't depend on either of them.*

## Saving and Restoring Models

Once a model is trained, it is often useful to save its parameters to disk so we can retrieve and use the model whenever we want. Saving checkpoints at regular intervals during training also lets us resume from the last checkpoint after a crash or other interruption, rather than starting over.

We can do this quickly in TensorFlow by creating a `Saver` node at the end of the construction phase (after all variable nodes are created). Then, in the execution phase, we can call its `save()` method whenever we want to save the model, passing it the session and the path of the checkpoint file:

In [None]:
#some code
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0), name="theta")
#more code
init = tf.global_variables_initializer()
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        if epoch % 100 == 0: #checkpoint every 100 epochs:
            save_path = saver.save(sess, "/tmp/model_partial.ckpt")
        
        sess.run(training_op)
        
    best_theta = theta.eval()
    save_path = saver.save(sess, "/tmp/model_final.ckpt")

Restoring a model is just as easy: we still create a `Saver` at the end of the construction phase, but at the beginning of the execution phase we call its `restore()` method instead of initializing the variables with the `init` node.

In [None]:
with tf.Session() as sess:
    saver.restore(sess, "/tmp/model_final.ckpt")

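By default, a `Saver` saves and restores all variables under their own names, but we can optionally specify which variables to save or restore and what names to use. For example, the following `Saver` will save or restore only the `theta` variable, under the name `weights`: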

In [25]:
saver = tf.train.Saver({"weights": theta})

## Visualizing the Graph and Training Curves Using TensorBoard

We can use TensorBoard to create interactive displays of things like training stats and our graph definition in our web browser. We can use this visualization to identify errors, find bottlenecks, etc.

First we'll need to tweak our program to write our graph definition and some training stats to a log directory. This directory should change between every run, otherwise TensorBoard will merge stats and give us misleading visuals. We can do this quickly by adding a timestamp to the directory name. Here's what we'll add at the beginning:

In [None]:
from datetime import datetime

now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
root_logdir = "tf_logs"
logdir="{}/run-{}/".format(root_logdir,now)

And at the end of the construction phase:

In [None]:
mse_summary = tf.summary.scalar('MSE', mse)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

The first line creates a node that evaluates the MSE value and writes it to a TensorBoard-compatible log string called a _summary_. The second line creates a `FileWriter` that we will use to write summaries to the log directory. The first parameter is the path of the log directory, while the second (optional) parameter is the graph we want to visualize. The `FileWriter` automatically creates the log directory if it doesn't already exist and writes the graph definition to a binary log file called an _events file_.

Next we need to update the execution phase to evaluate the `mse_summary` node regularly during training. This outputs a summary that we can then write to the events file using the `file_writer`, like so:

In [None]:
#code
for batch_index in range(n_batches):
    X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
    if batch_index % 10 == 0:
        summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
        step = epoch * n_batches + batch_index
        file_writer.add_summary(summary_str, step)
    sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
# more code

*Warn: logging data at every training step would significantly slow training.*

Finally we'll close the `FileWriter`:

In [None]:
file_writer.close()

Here's the completed program:

In [19]:
import numpy as np
import tensorflow as tf
from datetime import datetime
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
m, n = housing.data.shape

# Standardize the features before adding the bias column; with the raw features,
# Gradient Descent at this learning rate is likely to diverge
scaled_housing_data = StandardScaler().fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]

now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
root_logdir = "tf_logs"
logdir = "{}/run-{}/".format(root_logdir, now)
batch_size = 100
n_batches = int(np.ceil(m / batch_size))
n_epochs = 100
learning_rate = 0.01

def fetch_batch(epoch, batch_index, batch_size):
    # Slicing past the end of an array is safe, so the last batch is simply smaller
    start = batch_size * batch_index
    end = start + batch_size
    X_batch = scaled_housing_data_plus_bias[start:end]
    y_batch = housing.target[start:end].reshape(-1, 1)
    return X_batch, y_batch

X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")

theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0), name="theta")

y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()
mse_summary = tf.summary.scalar('MSE', mse)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            save_path = saver.save(sess, "/tmp/model_partial.ckpt")
            
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            if batch_index % 10 == 0:
                summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
                step = epoch * n_batches + batch_index
                file_writer.add_summary(summary_str, step)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
    
    best_theta = theta.eval()
    save_path = saver.save(sess, "/tmp/model_final.ckpt")
    
file_writer.close()

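From here we can open our working directory and type **`ls -l tf_logs/run*`** to list the contents of our log directory. Running the program above a second time will create a second log directory. To view the logs, we can start the TensorBoard server with `tensorboard --logdir tf_logs/` and open the address it prints in a web browser.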

## Name Scopes

To avoid clutter we can use _name scopes_ to group related nodes. We'll modify the previous code to add `error` and `mse` to the `"loss"` name scope:

In [None]:
with tf.name_scope("loss") as scope:
    error = y_pred - y
    mse = tf.reduce_mean(tf.square(error), name="mse")

The name of each op defined within the scope is now prefixed with `"loss/"`, and in TensorBoard the `error` and `mse` nodes now appear inside the `loss` namespace.

## Modularity

Suppose we want to add the outputs of two _rectified linear units_ (ReLU). A ReLU computes a linear function of its inputs and outputs the result if it is positive, and 0 otherwise.

$$h_{w, b}(X) = \max(X\cdot w + b, 0)$$

The following code does this but it's very repetitive:

In [21]:
n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")

w1 = tf.Variable(tf.random_normal((n_features, 1)), name="weights1")
w2 = tf.Variable(tf.random_normal((n_features, 1)), name="weights2")
b1 = tf.Variable(0.0, name="bias1")
b2 = tf.Variable(0.0, name="bias2")

z1 = tf.add(tf.matmul(X, w1), b1, name="z1")
z2 = tf.add(tf.matmul(X, w2), b2, name="z2")

relu1 = tf.maximum(z1, 0., name="relu1")
relu2 = tf.maximum(z2, 0., name="relu2")

output = tf.add(relu1, relu2, name="output")

This kind of code is error prone and hard to maintain. TensorFlow allows us to stay DRY (Don't Repeat Yourself): simply create a function to build a ReLU:

In [22]:
def relu(X):
    w_shape = (int(X.get_shape()[1]), 1)
    w = tf.Variable(tf.random_normal(w_shape), name="weights")
    b = tf.Variable(0.0, name="bias")
    z = tf.add(tf.matmul(X, w), b, name="z")
    return tf.maximum(z, 0., name="relu")

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name="output")

TensorFlow automatically checks whether a name has been used before, and if it has, appends an underscore followed by an index to make the name unique. In TensorBoard, such a series of nodes appears as a collapsed list. We can clean this up further by putting the content of `relu()` inside a name scope:

In [23]:
def relu(X):
    with tf.name_scope("relu") as scope:
        w_shape = (int(X.get_shape()[1]), 1)
        w = tf.Variable(tf.random_normal(w_shape), name="weights")
        b = tf.Variable(0.0, name="bias")
        z = tf.add(tf.matmul(X, w), b, name="z")
        return tf.maximum(z, 0., name="relu")

## Sharing Variables

A simple option to share a variable between components in a graph is to create it first, then pass it as a parameter to the functions that need it. If we want to control the ReLU threshold using a shared `threshold` variable for all ReLUs, we can create it and pass it to the `relu()` function.

In [24]:
def relu(X, threshold):
    with tf.name_scope("relu") as scope:
        w_shape = (int(X.get_shape()[1]), 1)
        w = tf.Variable(tf.random_normal(w_shape), name="weights")
        b = tf.Variable(0.0, name="bias")
        z = tf.add(tf.matmul(X, w), b, name="z")
        return tf.maximum(z, threshold, name="relu")

This works fine, but when the number of parameters is large it becomes difficult to keep track of which parameters are needed where. Some people use a dictionary of values or a class to control this, and another option might be setting the shared variable as an attribute like so:

In [25]:
def relu(X):
    with tf.name_scope("relu") as scope:
        if not hasattr(relu, "threshold"):
            relu.threshold = tf.Variable(0.0, name="threshold")
        w_shape = (int(X.get_shape()[1]), 1)
        w = tf.Variable(tf.random_normal(w_shape), name="weights")
        b = tf.Variable(0.0, name="bias")
        z = tf.add(tf.matmul(X, w), b, name="z")
        return tf.maximum(z, relu.threshold, name="max")

TensorFlow offers another option that may result in cleaner, more modular code: we can use the `get_variable()` function to create a shared variable if it doesn't exist yet, or reuse it if it does. The desired behavior (creating or reusing) is controlled by an attribute of the current `variable_scope()`. For example, the following code creates a variable named `"relu/threshold"`:

In [None]:
with tf.variable_scope("relu"):
    threshold = tf.get_variable("threshold", shape=(), initializer=tf.constant_initializer(0.0))

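If the variable has already been created by an earlier call to `get_variable()`, this code raises an exception, which prevents reusing variables by mistake. To reuse a variable we need to say so explicitly by setting the variable scope's `reuse` attribute to `True` (in which case we don't have to specify the shape or the initializer):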

In [None]:
with tf.variable_scope("relu", reuse=True):
    threshold = tf.get_variable("threshold")

This will fetch the existing `"relu/threshold"` variable, or raise an exception if it does not exist or was not created using `get_variable()`. Alternatively, we can set the `reuse` attribute to `True` inside the block by calling the scope's `reuse_variables()` method.

*Warn: once `reuse` is `True`, it cannot be set back to `False` within the block. Variable scopes defined within the block will inherit this value. Only variables created with `get_variable()` can be reused this way.*

Now our `relu()` function can access the threshold value without needing it to be passed as a parameter:

In [27]:
def relu(X):
    with tf.variable_scope("relu", reuse=True):
        threshold = tf.get_variable("threshold")
        w_shape = (int(X.get_shape()[1]), 1)
        w = tf.Variable(tf.random_normal(w_shape), name="weights")
        b = tf.Variable(0.0, name="bias")
        z = tf.add(tf.matmul(X, w), b, name="z")
        return tf.maximum(z, threshold, name="max")
    
n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
with tf.variable_scope("relu"):
    threshold = tf.get_variable("threshold", shape=(), initializer=tf.constant_initializer(0.0))
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name="output")

This code first defines the `relu()` function, then creates the `relu/threshold` variable and builds five ReLUs by calling the `relu()` function. The `relu()` function reuses the `relu/threshold` variable and creates the other ReLU nodes.

*Note: variables created with `get_variable()` are always named using the name of their `variable_scope` as a prefix (e.g., `relu/threshold`), but for all other nodes (including variables created with `tf.Variable()`) the variable scope acts like a new name scope.*

We can move the `threshold` definition inside the `relu()` function such that it is initialized on the first call and reused thereafter:

In [32]:
def relu(X):
    threshold = tf.get_variable("threshold", shape=(), initializer=tf.constant_initializer(0.0))
    w_shape = (int(X.get_shape()[1]), 1)
    w = tf.Variable(tf.random_normal(w_shape), name="weights")
    b = tf.Variable(0.0, name="bias")
    z = tf.add(tf.matmul(X, w), b, name="z")
    return tf.maximum(z, threshold, name="max")
        
    
n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = []
for relu_index in range(5):
    with tf.variable_scope("relu", reuse=(relu_index >= 1)) as scope:
        relus.append(relu(X))
output = tf.add_n(relus, name="output")