# Going Deeper - The Mechanics of TensorFlow

In the previous chapter, we trained a multilayer perceptron to classify MNIST digits, using various aspects of the TensorFlow Python API. That was a great way to get some hands-on experience with neural network training and machine learning in TensorFlow.

In this chapter, we shift our focus squarely onto TensorFlow itself and explore in detail the mechanics and features that it offers:

* Key features and advantages of TensorFlow
* TensorFlow ranks and tensors
* Understanding and working with TensorFlow graphs
* Working with TensorFlow variables
* TensorFlow operations with different scopes
* Common tensor transformations: working with ranks, shapes, and types
* Transforming tensors as multidimensional arrays
* Saving and restoring a model in TensorFlow
* Visualizing neural network graphs with TensorBoard

We will stay hands-on in this chapter, of course, and implement graphs throughout the chapter to explore the main TensorFlow features and concepts. Along the way, we will also revisit a regression model, explore neural network graph visualization with TensorBoard, and suggest some ways that you could explore visualizing more of the graphs that you will make through this chapter. 

# Key features of TensorFlow

TensorFlow gives us a scalable, multiplatform programming interface for implementing and running machine learning algorithms. The TensorFlow API has been relatively stable and mature since its 1.0 release in 2017. There are other deep learning libraries available, but at the time of writing many of them were still more experimental by comparison. 

A key feature of TensorFlow that we already noted is its ability to work with single or multiple GPUs. This allows users to train machine learning models very efficiently on large-scale systems. 

TensorFlow has strong growth drivers. Its development is funded and supported by Google, and so a large team of software engineers works on improvements continuously. TensorFlow also has strong support from open source developers, who avidly contribute and provide user feedback. This has made the TensorFlow library more useful to both academic researchers and developers in industry. A further consequence of these factors is that TensorFlow has extensive documentation and tutorials to help new users. 

Last but not least among these key features, TensorFlow supports mobile deployment, which makes it a very suitable tool for production. 

# TensorFlow ranks and tensors

The TensorFlow library lets users define operations and functions over tensors as computational graphs. Tensors are a generalizable mathematical notation for multidimensional arrays holding data values, where the dimensionality of a tensor is typically referred to as its **rank**. 

So far, we have worked mostly with tensors of rank zero to two. For instance, a scalar, a single number such as an integer or float, is a tensor of rank 0. A vector is a tensor of rank 1, and a matrix is a tensor of rank 2. But it does not stop there. The tensor notation can be generalized to higher dimensions, as we will see in the next chapter, when we work with an input of rank 3 and weight tensors of rank 4 to support images with multiple color channels. 

To make the concept of a **tensor** more intuitive, consider the following figure, which represents tensors of ranks 0 and 1 in the first row, and tensors of ranks 2 and 3 in the second row:

<img src='images/14_01.png'>

# How to get the rank and shape of a tensor

We can use the *tf.rank* function to get the rank of a tensor. It is important to note that *tf.rank* will return a tensor as output, and in order to get the actual value, we will need to evaluate that tensor. 

In addition to the tensor rank, we can also get the shape of a TensorFlow tensor (similar to the shape of a NumPy array). For example, if $x$ is a tensor, we can get its shape using *x.get_shape()*, which will return an object of a special class called *TensorShape*.

The following code example illustrates how to use the *tf.rank* function and the *get_shape* method to retrieve the rank and shape of tensor objects. The shapes are available while the graph is being defined, whereas the ranks are evaluated in a TensorFlow session:

In [1]:
import tensorflow as tf
import numpy as np

g = tf.Graph()

## define the computation graph
with g.as_default():
    ## define tensors t1, t2, t3
    t1 = tf.constant(np.pi)
    t2 = tf.constant([1, 2, 3, 4])
    t3 = tf.constant([[1, 2], [3, 4]])
    
    ## get their ranks
    r1 = tf.rank(t1)
    r2 = tf.rank(t2)
    r3 = tf.rank(t3)
    
    ## get their shapes
    s1 = t1.get_shape()
    s2 = t2.get_shape()
    s3 = t3.get_shape()
    
    print('Shapes:', s1, s2, s3)
    
with tf.Session(graph=g) as sess:
    print('Ranks:', r1.eval(), r2.eval(), r3.eval())

Shapes: () (4,) (2, 2)
Ranks: 0 1 2


As we can see, the rank of the *t1* tensor is 0 since it is just a scalar (corresponding to the *()* shape). The rank of the *t2* vector is 1, and since it has four elements, its shape is the one-element tuple *(4,)*. Lastly, the rank of the $2 \times 2$ matrix *t3* is 2; thus, its corresponding shape is given by the *(2, 2)* tuple. 
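To make the link between nesting depth, rank, and shape concrete without a TensorFlow session, here is a minimal pure-Python sketch. The helper *shape_of* is a hypothetical name introduced only for this illustration (it is not part of TensorFlow), and it assumes regularly nested lists:

```python
import math


def shape_of(value):
    """Return the shape of a regularly nested list as a tuple.

    Hypothetical helper for illustration only; it assumes that all
    sublists at a given depth have the same length (no ragged nesting).
    """
    shape = []
    while isinstance(value, (list, tuple)):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


## the same values as t1, t2, and t3 in the example above
examples = {
    't1': math.pi,
    't2': [1, 2, 3, 4],
    't3': [[1, 2], [3, 4]],
}

for name, value in examples.items():
    shape = shape_of(value)
    ## the rank is the number of dimensions, that is, len(shape)
    print(name, 'rank:', len(shape), 'shape:', shape)
```

Running this sketch reports ranks 0, 1, and 2 with shapes *()*, *(4,)*, and *(2, 2)*, which matches the TensorFlow output shown earlier.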

# Understanding TensorFlow's computation graphs

TensorFlow relies on building a computation graph at its core, and it uses this computation graph to derive relationships between tensors from the input all the way to the output. Let's say we have three rank 0 tensors (scalars) *a*, *b*, and *c*, and we want to evaluate $z = 2 \times (a - b) + c$. This evaluation can be represented as a computation graph, as shown in the following figure: 

<img src='images/14_02.png'>

As we can see, the computation graph is simply a network of nodes. Each node represents an operation, which applies a function to its input tensor or tensors and returns zero or more tensors as the output. 

TensorFlow builds this computation graph and uses it to compute the gradients accordingly. The individual steps for building and compiling such a computation graph in TensorFlow are as follows: 

1. Instantiate a new, empty computation graph. 
2. Add nodes (the tensors and operations) to the computation graph. 
3. Execute the graph:
    1. Start a new session
    2. Initialize the variables in the graph
    3. Run the computation graph in this session
   
So, let's create a graph for evaluating $z = 2 \times (a - b) + c$, as shown in the previous figure, where *a*, *b*, and *c* are scalars (single numbers). Here, we define them as TensorFlow constants. A graph can be created by calling *tf.Graph()*, then nodes can be added to it as follows:

In [2]:
g = tf.Graph()

with g.as_default():
    a = tf.constant(1, name='a')
    b = tf.constant(2, name='b')
    c = tf.constant(3, name='c')
    
    z = 2*(a-b) + c

In this code, we added nodes to the *g* graph using *with g.as_default()*. If we do not explicitly create a graph, there is always a default graph, and therefore, all the nodes are added to the default graph. In this book, we try to avoid working with the default graph for clarity. This approach is especially useful when we are developing code in a Jupyter notebook, as we avoid piling up unwanted nodes in the default graph by accident. 

A TensorFlow session is an environment in which the operations and tensors of a graph can be executed. A session object is created by calling *tf.Session*, which can receive an existing graph (here, *g*) as an argument, as in *tf.Session(graph=g)*. Otherwise, it will launch the default graph, which might be empty. 

After launching a graph in a TensorFlow session, we can execute its nodes; that is, evaluate its tensors or execute its operators. Evaluating each individual tensor involves calling its *eval* method inside the current session. When evaluating a specific tensor in the graph, TensorFlow has to execute all the preceding nodes in the graph until it reaches that particular one. In case there are one or more placeholders, they would need to be fed, as we will see in the next section. 

Quite similarly, executing an operation can be done using its *run* method. For example, suppose *train_op* is an operator that does not return any tensor. This operator can be executed as *train_op.run()*. Furthermore, there is a universal way of running both tensors and operators: *tf.Session().run()*. Using this method, as we will see later on as well, multiple tensors and operators can be placed in a list or tuple. As a result, *tf.Session().run()* will return a list or tuple of the same size. 

Here, we will launch the previous graph in a TensorFlow session and evaluate the tensor *z* as follows:

In [3]:
with tf.Session(graph=g) as sess:
    print('2*(a-b)+c => ', sess.run(z))

2*(a-b)+c =>  1


Remember that we define tensors and operations in a computation graph context within TensorFlow. A TensorFlow session is then used to execute the operations in the graph and fetch and evaluate the results. 
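To illustrate the separation between defining a graph and executing it, the following toy sketch rebuilds $z = 2 \times (a - b) + c$ in plain Python. It is only an analogy under our own assumptions and not how TensorFlow is implemented; the names *Node*, *constant*, and *run* are hypothetical and were chosen for this illustration:

```python
class Node:
    """A toy graph node: an operation plus the nodes it takes as input."""

    def __init__(self, name, op, inputs=()):
        self.name = name
        self.op = op          # function applied to the input values
        self.inputs = inputs  # upstream nodes this node depends on


def constant(value, name):
    ## a constant has no inputs; its operation just returns the value
    return Node(name, lambda: value)


def run(node, trace):
    """Evaluate `node` after evaluating every node it depends on."""
    args = [run(n, trace) for n in node.inputs]
    trace.append(node.name)   # record the execution order
    return node.op(*args)


## define the graph: nothing is computed at this point
a = constant(1, 'a')
b = constant(2, 'b')
c = constant(3, 'c')
r1 = Node('sub', lambda x, y: x - y, (a, b))
r2 = Node('mul', lambda x: 2 * x, (r1,))
z = Node('add', lambda x, y: x + y, (r2, c))

## execute the graph: only now are the operations applied
trace = []
print('2*(a-b)+c => ', run(z, trace))
print('Execution order:', trace)
```

The result is 1, as before, and the recorded order (*a*, *b*, *sub*, *mul*, *c*, *add*) shows that evaluating *z* first executes all the nodes that precede it in the graph.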

In this section, we saw how to define a computation graph, how to add nodes to it, and how to evaluate the tensors in a graph within a TensorFlow session. We will now take a deeper look into the different types of nodes that can appear in a computation graph, including placeholders and variables. Along the way, we will see some other operators that do not return a tensor as the output. 

# Placeholders in TensorFlow

TensorFlow has special mechanisms for feeding data. One of these mechanisms is the use of placeholders, which are predefined tensors with specific types and shapes. 

These tensors are added to the computation graph using the *tf.placeholder* function, and they do not contain any data. However, upon the execution of certain nodes in the graph, these placeholders need to be fed with data arrays. 

In the following sections, we will see how to define placeholders in a graph and how to feed them with data values upon execution. 

## Defining placeholders

As you now know, placeholders are defined using the *tf.placeholder* function. When we define placeholders, we need to decide what their shape and type should be, according to the shape and type of the data that will be fed through them upon execution. 

Let's start with a simple example. In the following code, we will define the same graph that was shown in the previous section for evaluating $z = 2 \times (a-b) + c$. This time, however, we use placeholders for the scalars *a*, *b*, and *c*. Also, we store the intermediate tensors as *r1* and *r2*, as follows:

In [5]:
import tensorflow as tf

g = tf.Graph()
with g.as_default():
    tf_a = tf.placeholder(tf.int32, shape=[], name='tf_a')
    tf_b = tf.placeholder(tf.int32, shape=[], name='tf_b')
    tf_c = tf.placeholder(tf.int32, shape=[], name='tf_c')
    
    r1 = tf_a - tf_b
    r2 = 2*r1
    z = r2 + tf_c

In this code, we defined three placeholders, named *tf_a*, *tf_b*, and *tf_c*, using type *tf.int32* (32-bit integers) and set their shape via *shape=[]* since they are scalars (tensors of rank 0). In this book, we always prefix the names of placeholder objects with *tf_* for clarity and to be able to distinguish them from the other tensors. 

Note that in the previous code example, we were dealing with scalars, and therefore, their shapes were specified as *shape=[]*. However, it is very straightforward to define placeholders of higher dimensions. For example, a rank 3 placeholder of type *float* and shape $3 \times 4 \times 5$ can be defined as *tf.placeholder(dtype=tf.float32, shape=[3, 4, 5])*.

## Feeding placeholders with data

When we execute a node in the graph, we need to create a Python **dictionary** to feed the values of placeholders with data arrays. We do this according to the type and shape of the placeholders. This dictionary is passed as the input argument *feed_dict* to a session's *run* method. 

In the previous graph, we added three placeholders of the type *tf.int32* to feed scalars for computing *z*. Now, in order to evaluate the result tensor *z*, we can feed arbitrary integer values (here, 1, 2, and 3) to the placeholders, as follows:

In [6]:
with tf.Session(graph=g) as sess:
    feed = {tf_a: 1, tf_b: 2, tf_c: 3}
    print('z:', sess.run(z, feed_dict=feed))

z: 1


Note that we only need to feed the placeholders that the evaluated node actually depends on. For example, evaluating *r1* requires only *tf_a* and *tf_b*; feeding *tf_c* as well does not cause any error, it is just redundant to do so. However, if a placeholder is needed for the execution of a particular node, and is not provided via the *feed_dict* argument, it will cause a runtime error. 
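The feeding rules can be mimicked with a small plain-Python sketch. This is a toy model under our own assumptions, not TensorFlow code: the classes *Placeholder* and *Op* and the function *run* are hypothetical names, and a *KeyError* stands in for the runtime error that TensorFlow would raise:

```python
class Placeholder:
    """A toy input node that holds no data until it is fed."""

    def __init__(self, name):
        self.name = name


class Op:
    """A toy operation node applying `fn` to its input nodes."""

    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs


def run(node, feed_dict):
    if isinstance(node, Placeholder):
        if node not in feed_dict:
            ## stand-in for the runtime error of an unfed placeholder
            raise KeyError('placeholder %s was not fed' % node.name)
        return feed_dict[node]
    return node.fn(*(run(n, feed_dict) for n in node.inputs))


tf_a = Placeholder('tf_a')
tf_b = Placeholder('tf_b')
tf_c = Placeholder('tf_c')
r1 = Op(lambda x, y: x - y, tf_a, tf_b)
r2 = Op(lambda x: 2 * x, r1)
z = Op(lambda x, y: x + y, r2, tf_c)

## r1 depends only on tf_a and tf_b, so tf_c can be omitted
print('r1:', run(r1, {tf_a: 1, tf_b: 2}))
## feeding tf_c anyway is redundant but harmless
print('r1:', run(r1, {tf_a: 1, tf_b: 2, tf_c: 3}))
## z depends on all three placeholders
print('z:', run(z, {tf_a: 1, tf_b: 2, tf_c: 3}))

try:
    run(z, {tf_a: 1, tf_b: 2})
except KeyError as err:
    print('Error:', err)
```

Here, both evaluations of *r1* give -1 and *z* gives 1, while the last call fails because *tf_c* lies on the path to *z* but was not fed.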

## Defining placeholders for data arrays with varying batch sizes

Sometimes, when we are developing a neural network model, we may deal with mini-batches of data that have different sizes. For example, we may train a neural network with a specific mini-batch size, but we want to use the network to make predictions on one or more data points. 

A useful feature of placeholders is that we can specify *None* for the dimension that is varying in size. For example, we can create a placeholder of rank 2, where the first dimension is unknown (or may vary), as shown here:

In [7]:
import tensorflow as tf

g = tf.Graph()

with g.as_default():
    tf_x = tf.placeholder(tf.float32, 
                          shape=[None, 2], 
                          name='tf_x')
    
    x_mean = tf.reduce_mean(tf_x, axis=0, name='mean')

Then, we can evaluate *x_mean* with two different inputs, *x1* and *x2*, which are NumPy arrays of shape *(5, 2)* and *(10, 2)*, as follows:

In [10]:
import numpy as np

np.random.seed(123)
np.set_printoptions(precision=2)

with tf.Session(graph=g) as sess:
    x1 = np.random.uniform(low=0, high=1, size=(5, 2))
    print('Feeding data with shape ', x1.shape)
    print('Result:', sess.run(x_mean, feed_dict={tf_x: x1}))
    
    x2 = np.random.uniform(low=0, high=1, size=(10, 2))
    print('Feeding data with shape ', x2.shape)
    print('Result:', sess.run(x_mean, feed_dict={tf_x: x2}))

Feeding data with shape  (5, 2)
Result: [0.62 0.47]
Feeding data with shape  (10, 2)
Result: [0.46 0.49]


Lastly, if we try printing the object *tf_x*, we will get *Tensor("tf_x:0", shape=(?, 2), dtype=float32)*, which shows that the shape of this tensor is *(?, 2)*; the question mark denotes the dimension that we declared as *None*.
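The role of *None* in a placeholder shape can be summarized as a simple compatibility rule. The following sketch expresses that rule in plain Python; the function *is_compatible* is a hypothetical helper written for this illustration and is not a TensorFlow function:

```python
def is_compatible(declared, actual):
    """Check an actual array shape against a declared placeholder shape.

    A None entry in `declared` matches any size along that dimension;
    all other entries must match exactly, and the ranks must agree.
    """
    if len(declared) != len(actual):
        return False
    return all(d is None or d == a for d, a in zip(declared, actual))


declared = (None, 2)  # the same shape as tf_x in the example above

for actual in [(5, 2), (10, 2), (5, 3), (2,)]:
    print(actual, '->', is_compatible(declared, actual))
```

The shapes *(5, 2)* and *(10, 2)* are accepted, whereas *(5, 3)* has the wrong second dimension and *(2,)* has the wrong rank.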

# Variables in TensorFlow

In the context of TensorFlow, variables are a special type of tensor objects that allow us to store and update the parameters of our models in a TensorFlow session during training. The following sections explain how we can define variables in a graph, initialize those variables in a session, organize variables via the so-called variable scope, and reuse existing variables. 

## Defining variables

TensorFlow variables store the parameters of a model that can be updated during training, for example, the weights in the input, hidden, and output layers of a neural network. When we define a variable, we need to initialize it with a tensor of values. 

TensorFlow provides two ways for dealing with variables: 
* *tf.Variable(<initial-value>, name='variable-name')*
* *tf.get_variable(name, ...)*

The first one, *tf.Variable*, is a class that creates an object for a new variable and adds it to the graph. Note that *tf.Variable* does not have an explicit way to determine *shape* and *dtype*; the shape and type are set to be the same as those of the initial values. 

The second option, *tf.get_variable*, can be used to **reuse** an existing variable with a given name (if the name exists in the graph) or create a new one if the name does not exist. In this case, the name becomes critical; that is probably why it has to be placed as the first argument to this function. Furthermore, *tf.get_variable* provides an explicit way to set *shape* and *dtype*; these parameters are only required when creating a new variable, not reusing existing ones. 

The advantage of *tf.get_variable* over *tf.Variable* is twofold: *tf.get_variable* allows us to reuse existing variables and it already uses the popular Xavier/Glorot initialization scheme by default. Besides the initializer, the *get_variable* function provides other parameters to control the tensor, such as adding a regularizer for the variable. 

**Xavier/Glorot initialization**

In the early development of deep learning, it was observed that random uniform or random normal weight initialization could often result in poor performance of the model during training. 

In 2010, Xavier Glorot and Yoshua Bengio investigated the effect of initialization and proposed a novel, more robust initialization scheme to facilitate the training of deep networks. 

The general idea behind Xavier initialization is to roughly balance the variance of the gradients across different layers. Otherwise, one layer may get too much attention during training while the other layer lags behind. 

According to the research paper by Glorot and Bengio, if we want to initialize the weights from a uniform distribution, we should choose the interval of this uniform distribution as follows: 

$$W \sim \text{Uniform} \left(-\frac{\sqrt{6}}{\sqrt{n_{in} + n_{out}}}, \frac{\sqrt{6}}{\sqrt{n_{in} + n_{out}}} \right)$$

Here, $n_{in}$ is the number of input neurons that are multiplied with the weights, and $n_{out}$ is the number of output neurons that feed into the next layer. For initializing the weights from a Gaussian (normal) distribution, the authors recommended choosing the standard deviation of this Gaussian to be $\sigma = \frac{\sqrt{2}}{\sqrt{n_{in} + n_{out}}}$.

TensorFlow supports Xavier initialization in both uniform and normal distributions of weights. 
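As a quick numerical check of these two formulas, the following sketch computes the uniform interval and the Gaussian standard deviation for a layer with 100 inputs and 50 outputs, using only the Python standard library. The function names and the layer sizes are assumptions made for this illustration; this is not TensorFlow's own initializer code:

```python
import math
import random


def glorot_uniform_limit(n_in, n_out):
    ## half-width of the uniform interval: sqrt(6) / sqrt(n_in + n_out)
    return math.sqrt(6.0) / math.sqrt(n_in + n_out)


def glorot_normal_stddev(n_in, n_out):
    ## standard deviation of the Gaussian: sqrt(2) / sqrt(n_in + n_out)
    return math.sqrt(2.0) / math.sqrt(n_in + n_out)


n_in, n_out = 100, 50  # arbitrary layer sizes chosen for illustration
limit = glorot_uniform_limit(n_in, n_out)
sigma = glorot_normal_stddev(n_in, n_out)

print('Uniform interval: [-%.3f, %.3f]' % (limit, limit))
print('Normal std. dev.: %.3f' % sigma)

## draw an (n_in x n_out) weight matrix from the uniform variant
random.seed(123)
weights = [[random.uniform(-limit, limit) for _ in range(n_out)]
           for _ in range(n_in)]
flat = [w for row in weights for w in row]
print('Largest absolute weight: %.3f' % max(abs(w) for w in flat))
```

For these sizes, the interval is $[-0.2, 0.2]$ and $\sigma \approx 0.115$. Note that a uniform distribution on $[-a, a]$ has variance $a^2/3$, so both variants give the weights the same variance, $2 / (n_{in} + n_{out})$.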

In either initialization technique, it is important to note that the initial values are not set until we launch the graph in *tf.Session* and explicitly run the initializer operator in that session. In fact, the required memory for a graph is not allocated until we initialize the variables in a TensorFlow session. 

Here is an example of creating a variable object where the initial values are created from a NumPy array. The *dtype* data type of this tensor is *tf.int64*, which is automatically **inferred** from its NumPy array input: 

In [11]:
import tensorflow as tf
import numpy as np

g1 = tf.Graph()

with g1.as_default():
    w = tf.Variable(np.array([[1, 2, 3, 4], 
                              [5, 6, 7, 8]]), name='w')
    print(w)

<tf.Variable 'w:0' shape=(2, 4) dtype=int64_ref>


## Initializing variables

Here, it is critical to understand that tensors defined as variables are not allocated in memory and contain no values until they are initialized. Therefore, before executing any node in the computation graph, we *must* initialize the variables that are within the path to the node that we want to execute. 

This initialization process refers to allocating memory for the associated tensors and assigning their initial values. TensorFlow provides a function named *tf.global_variables_initializer* that returns an operator for initializing all the variables that exist in a computation graph. Then, executing this operator will initialize the variables as follows:

In [13]:
with tf.Session(graph=g1) as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(w))

[[1 2 3 4]
 [5 6 7 8]]


We can also store this operator in an object such as *init_op = tf.global_variables_initializer()* and execute this operator later using *sess.run(init_op)* or *init_op.run()*. However, we need to make sure that this operator is created after we define all the variables. 

For example, in the following code, we define the variable *w1*, then we define the operator *init_op*, followed by the variable *w2*: 

In [14]:
import tensorflow as tf

g2 = tf.Graph()

with g2.as_default():
    w1 = tf.Variable(1, name='w1')
    init_op = tf.global_variables_initializer()
    w2 = tf.Variable(2, name='w2')

Now, let's evaluate *w1* as follows:

In [15]:
with tf.Session(graph=g2) as sess:
    sess.run(init_op)
    print('w1:', sess.run(w1))

w1: 1


This works fine. Now, let's try evaluating *w2*: 

In [16]:
with tf.Session(graph=g2) as sess:
    sess.run(init_op)
    print('w2:', sess.run(w2))

FailedPreconditionError: Attempting to use uninitialized value w2
	 [[Node: _retval_w2_0_0 = _Retval[T=DT_INT32, index=0, _device="/job:localhost/replica:0/task:0/device:CPU:0"](w2)]]

As shown in the code example, executing the graph raises an error because *w2* was not initialized via *sess.run(init_op)*, and therefore, could not be evaluated. The operator *init_op* was defined prior to adding *w2* to the graph; thus, executing *init_op* will not initialize *w2*. 
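The reason for this behavior is that the initializer only covers the variables that exist at the moment it is created. The following toy model in plain Python mimics that snapshot behavior; *ToyGraph* and its methods are hypothetical names used as an analogy and do not reflect TensorFlow's internals:

```python
class ToyGraph:
    """A toy graph that only keeps track of variables."""

    def __init__(self):
        self.variables = {}  # variable name -> initial value

    def variable(self, initial_value, name):
        self.variables[name] = initial_value
        return name

    def global_initializer(self):
        ## snapshot of the variables that exist right now
        pending = dict(self.variables)

        def init(session_values):
            session_values.update(pending)

        return init


graph = ToyGraph()
w1 = graph.variable(1, name='w1')
init_op = graph.global_initializer()  # created before w2 exists
w2 = graph.variable(2, name='w2')

values = {}  # plays the role of a session's variable storage
init_op(values)
print('Initialized variables:', sorted(values))
print('w2 initialized?', w2 in values)
```

Only *w1* ends up initialized. Creating the initializer after both variables have been defined would cover *w1* and *w2* alike.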

## Variable scope

In this subsection, we are going to discuss *scoping*, which is an important concept in TensorFlow, and especially useful if we are constructing large neural network graphs. 

With variable scopes, we can organize the variables into separate subparts. When we create a variable scope, the names of operations and tensors that are created within that scope are prefixed with that scope, and those scopes can further be nested. For example, if we have two subnetworks, where each subnetwork has several layers, we can define two scopes named *'net_A'* and *'net_B'*, respectively. Then, each layer will be defined within one of these scopes. 

Let's see how the variable names will turn out in the following code example:

In [17]:
import tensorflow as tf

g = tf.Graph()

with g.as_default():
    with tf.variable_scope('net_A'):
        with tf.variable_scope('layer-1'):
            w1 = tf.Variable(tf.random_normal(shape=(10, 4)), name='weights')
        with tf.variable_scope('layer-2'):
            w2 = tf.Variable(tf.random_normal(shape=(20, 10)), name='weights')
    with tf.variable_scope('net_B'):
        with tf.variable_scope('layer-1'):
            w3 = tf.Variable(tf.random_normal(shape=(10, 4)), name='weights')
        
    print(w1)
    print(w2)
    print(w3)

<tf.Variable 'net_A/layer-1/weights:0' shape=(10, 4) dtype=float32_ref>
<tf.Variable 'net_A/layer-2/weights:0' shape=(20, 10) dtype=float32_ref>
<tf.Variable 'net_B/layer-1/weights:0' shape=(10, 4) dtype=float32_ref>


Notice that the variable names are now prefixed with their nested scopes, separated by the forward slash *'/'* symbol. 

## Reusing variables

Let's imagine that we are developing a somewhat complex neural network model that has a classifier whose input data comes from more than one source. For example, we will assume that we have data $(X_A, y_A)$ coming from source $A$ and data $(X_B, y_B)$ coming from source $B$. In this example, we will design our graph in such a way that it will use the data from only one source as input tensor to build the network. Then, we can feed the data from the other source to the same classifier. 

In the following example, we assume that data from source $A$ is fed through a placeholder, and source $B$ is the output of a generator network. We will build the generator network by calling the *build_generator* function within the *generator* scope, then we will add a classifier by calling *build_classifier* within the *classifier* scope: 

In [18]:
import tensorflow as tf

def build_classifier(data, labels, n_classes=2):
    data_shape = data.get_shape().as_list()
    weights = tf.get_variable(name='weights', 
                              shape=(data_shape[1], 
                                     n_classes), 
                              dtype=tf.float32)
    bias = tf.get_variable(name='bias', 
                           initializer=tf.zeros(shape=n_classes))
    logits = tf.add(tf.matmul(data, weights), bias, name='logits')
    return logits, tf.nn.softmax(logits)

def build_generator(data, n_hidden):
    data_shape = data.get_shape().as_list()
    w1 = tf.Variable(tf.random_normal(shape=(data_shape[1], n_hidden)), name='w1')
    b1 = tf.Variable(tf.zeros(shape=n_hidden), name='b1')
    hidden = tf.add(tf.matmul(data, w1), b1, name='hidden_pre-activation')
    hidden = tf.nn.relu(hidden, 'hidden_activation')
    
    w2 = tf.Variable(tf.random_normal(shape=(n_hidden, data_shape[1])), name='w2')
    b2 = tf.Variable(tf.zeros(shape=data_shape[1]), name='b2')
    
    output = tf.add(tf.matmul(hidden, w2), b2, name='output')
    return output, tf.nn.sigmoid(output)

batch_size = 64
g = tf.Graph()

with g.as_default():
    tf_X = tf.placeholder(shape=(batch_size, 100), 
                          dtype=tf.float32, 
                          name='tf_X')
    with tf.variable_scope('generator'):
        gen_out1 = build_generator(data=tf_X, n_hidden=50)
    
    with tf.variable_scope('classifier') as scope:
        cls_out1 = build_classifier(data=tf_X, 
                                    labels=tf.ones(shape=batch_size))
        scope.reuse_variables()
        cls_out2 = build_classifier(data=gen_out1[1], labels=tf.zeros(shape=batch_size))

Notice that we have called the *build_classifier* function two times. The first call causes the building of the network. Then, we call *scope.reuse_variables()* and call that function again. As a result, the second call does not create new variables; instead, it reuses the same variables. Alternatively, we could reuse the variables by specifying the *reuse=True* parameter, as follows:

In [19]:
g = tf.Graph()

with g.as_default():
    tf_X = tf.placeholder(shape=(batch_size, 100), 
                          dtype=tf.float32, 
                          name='tf_X')
    with tf.variable_scope('generator'):
        gen_out1 = build_generator(data=tf_X, n_hidden=50)
    
    with tf.variable_scope('classifier'):
        cls_out1 = build_classifier(data=tf_X, 
                                    labels=tf.ones(shape=batch_size))
        
    with tf.variable_scope('classifier', reuse=True):
        cls_out2 = build_classifier(data=gen_out1[1], labels=tf.zeros(shape=batch_size))
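To see why the second call shares parameters instead of creating new ones, it can help to think of variable reuse as a lookup by name. The following toy store in plain Python sketches the idea; *VariableStore* is a hypothetical class for this illustration, not TensorFlow's implementation, although it mirrors the rule that creating an existing name without reuse, or reusing a missing name, is an error:

```python
class VariableStore:
    """A toy name-to-variable registry with a reuse flag."""

    def __init__(self):
        self._vars = {}

    def get_variable(self, name, initial_value=None, reuse=False):
        if reuse:
            if name not in self._vars:
                raise ValueError('Variable %s does not exist' % name)
            return self._vars[name]
        if name in self._vars:
            raise ValueError('Variable %s already exists' % name)
        ## a one-element list serves as a mutable box for the value
        self._vars[name] = [initial_value]
        return self._vars[name]


store = VariableStore()

## the first call creates the variable ...
w_first = store.get_variable('classifier/weights', initial_value=0.5)
## ... and the second call with reuse=True returns the same object
w_again = store.get_variable('classifier/weights', reuse=True)

print('Same object:', w_first is w_again)

## an update through one handle is visible through the other
w_first[0] = 0.75
print('Shared value:', w_again[0])
```

Because both handles refer to the same stored object, an update made while training on one data source is automatically seen when the classifier is applied to the other source.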

While we have discussed how to define computational graphs and variables in TensorFlow, a detailed discussion of how we can compute gradients in a computational graph is beyond the scope of this book. Instead, we use TensorFlow's convenient optimizer classes, which perform backpropagation automatically for us. 