# TensorFlow tutorial
_MILA, November 2017_

## Protip: browsing through the TensorFlow API

The [devdocs.io](http://devdocs.io/) website is an amazing resource to browse through the TensorFlow Python API (as well as many other APIs such as numpy or the Python API itself).

# Using TensorFlow at MILA

The most straightforward way to access TensorFlow using the MILA software stack is through the `TODO: WRITEME` conda environment. To activate the `TODO: WRITEME` environment, use the command

```bash
source activate TODO: WRITEME
```

To return to the default environment, use the bash command

```bash
source deactivate
```

# Installing TensorFlow outside MILA

Follow the [online documentation](https://www.tensorflow.org/install/), which describes how to install TensorFlow for all major platforms (Linux, macOS, Windows) in various ways (`virtualenv`, native `pip`, Docker, Anaconda).

# Importing TensorFlow

TensorFlow is imported as a Python package using the following statement:

In [1]:
import tensorflow as tf

_Note: the `tensorflow` package is usually aliased to `tf` for convenience._

# Paradigm

TensorFlow separates the _definition_ of computation from its _execution_.

Computation is defined via a [_dataflow_](https://en.wikipedia.org/wiki/Dataflow_programming) _graph_, i.e., a graph where nodes represent units of computation and the edges represent the data consumed or produced by the computation.

TensorFlow calls these edges _tensors_ (not to be confused with the mathematical object of the same name). In TensorFlow parlance, a tensor is simply a multi-dimensional array of a certain data type.
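
To make the split between definition and execution concrete, here is a toy dataflow graph in plain Python. It is only a sketch of the idea and not how TensorFlow is implemented; the names `Node`, `graph_input`, `constant`, `add`, `mul`, and `run` are all hypothetical helpers invented for this illustration.

```python
# Toy illustration of a dataflow graph; not TensorFlow's implementation.
class Node:
    """A unit of computation; `inputs` are the edges that feed it."""
    def __init__(self, fn, inputs=()):
        self.fn = fn
        self.inputs = tuple(inputs)

def graph_input():
    # A node with no computation of its own: its value is supplied at run time.
    return Node(fn=None)

def constant(value):
    return Node(fn=lambda: value)

def add(a, b):
    return Node(fn=lambda u, v: u + v, inputs=(a, b))

def mul(a, b):
    return Node(fn=lambda u, v: u * v, inputs=(a, b))

def run(node, feed=None):
    """Execute the graph: evaluate `node` after evaluating its inputs."""
    feed = feed or {}
    if node in feed:
        return feed[node]
    if node.fn is None:
        raise ValueError('no value was supplied for a graph input')
    return node.fn(*(run(n, feed) for n in node.inputs))

# Definition: this only builds the graph, nothing is computed yet.
x = graph_input()
y = add(mul(constant(3.0), x), constant(2.0))

# Execution: the same graph can be run several times with different inputs.
print(run(y, feed={x: 1.0}))  # 5.0
print(run(y, feed={x: 4.0}))  # 14.0
```

Building `y` performs no arithmetic; the multiplication and addition happen only inside `run`, which plays the role of a session in this toy model.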

# Constant, variable, placeholder, and random tensors

Many types of tensors may be used as input to the computation graph. We will cover four of them here: constant, variable, placeholder, and random tensors.

## Constant

A constant tensor always evaluates to the same value. It can be created using the `tf.constant` function:

In [2]:
c = tf.constant(value=42.0, name='c')
print(c)

Tensor("c:0", shape=(), dtype=float32)


The `name` argument is not strictly necessary, but it is considered good practice to name things in TensorFlow, as it facilitates visualizing the computation graph and debugging.

To get the value associated with a constant tensor, we evaluate it within a session:

In [3]:
with tf.Session() as session:
    print('The value for c is {}'.format(session.run(c)))

The value for c is 42.0


The value for a constant _always_ stays the same, be it within the same session or across different sessions:

In [4]:
with tf.Session() as session:
    print('The value for c is {}'.format(session.run(c)))
    print('The value for c is {}'.format(session.run(c)))

with tf.Session() as session:
    print('The value for c is {}'.format(session.run(c)))

The value for c is 42.0
The value for c is 42.0
The value for c is 42.0


## Variable

It can be useful for an input tensor's value to evolve across the lifetime of a session. For instance, a tensor's value can represent the weights of a neural network which we want to update using gradient descent.

Tensors with this property are called _variables_. The preferred way to create variables is via `tf.get_variable`:

In [5]:
v = tf.get_variable(
    name='v', shape=[2], dtype=tf.float32,
    initializer=tf.zeros_initializer())
print(v)

<tf.Variable 'v:0' shape=(2,) dtype=float32_ref>


This time, the `name` argument is required. This is because TensorFlow refers to variables by name. As such, TensorFlow expects the name for the variable to be unique. Trying to create a variable with the same name will result in an error:

In [6]:
# Throughout this tutorial, we will wrap statements that
# we know will cause an error to be raised with a try-except
# block to print the error message only, and not the whole
# stack trace.
try:
    tf.get_variable(name='v')
except ValueError as e:
    print(str(e).split('\n')[0])

Variable v already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:


_(In case you are wondering, there is a way to bypass this behavior and retrieve by name a variable which has already been created. More on that later.)_

Note that a variable's value only makes sense _within the context of a session_. Furthermore, a variable's initial value has to be set before it can be used. See what happens if we try to evaluate `v` within a session:

In [7]:
with tf.Session() as session:
    try:
        session.run(v)
    except tf.errors.FailedPreconditionError as e:
        print(e)

Attempting to use uninitialized value v
	 [[Node: _retval_v_0_0 = _Retval[T=DT_FLOAT, index=0, _device="/job:localhost/replica:0/task:0/cpu:0"](v)]]


TensorFlow provides a function, `tf.global_variables_initializer`, which returns an op that, when run, sets every variable in the graph to its initial value:

In [8]:
with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    print('The value for v is {}'.format(session.run(v)))

The value for v is [ 0.  0.]


A variable's value persists across `session.run` calls within a session until it is updated by running an assignment op. For instance, the op returned by `tf.assign_add` (or, equivalently, by the variable's own `assign_add` method) can be used to increment a variable's value:

In [9]:
with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    # The value for v persists across session.run calls...
    print('The value for v is {}'.format(session.run(v)))
    print('The value for v is {}'.format(session.run(v)))
    # ... until it is updated by running an assignment op.
    session.run(v.assign_add([1, 2]))
    print('The value for v is {}'.format(session.run(v)))

The value for v is [ 0.  0.]
The value for v is [ 0.  0.]
The value for v is [ 1.  2.]


In addition to `tf.assign_add`, the `tf.assign_sub` and `tf.assign` functions return ops which decrement and assign a variable's value, respectively.
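
As a minimal sketch of those two functions, the example below overwrites a variable with `tf.assign`, decrements it with `tf.assign_sub`, and then opens a second session to show that the updated value does not carry over. The variable name `w` is made up for this illustration, and the graph is built inside its own `tf.Graph` so that it does not clash with the variables defined above. The `tensorflow.compat.v1` import is an assumption for readers on TensorFlow 1.14 or later.

```python
# Assumption: TensorFlow >= 1.14 (or 2.x), where the 1.x API lives under
# tensorflow.compat.v1. On older 1.x releases, use `import tensorflow as tf`.
import tensorflow.compat.v1 as tf

with tf.Graph().as_default():
    # `w` is a hypothetical variable used only in this example.
    w = tf.get_variable(name='w', shape=[2], dtype=tf.float32,
                        initializer=tf.zeros_initializer())
    set_w = tf.assign(w, [5.0, 5.0])      # overwrite the value
    dec_w = tf.assign_sub(w, [1.0, 2.0])  # subtract from the value

    with tf.Session() as session:
        session.run(tf.global_variables_initializer())
        session.run(set_w)
        after_assign = session.run(w)
        session.run(dec_w)
        after_sub = session.run(w)

    # A new session starts from scratch: the variable must be
    # initialized again and holds its initial value.
    with tf.Session() as session:
        session.run(tf.global_variables_initializer())
        in_new_session = session.run(w)

print(after_assign, after_sub, in_new_session)
```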

## Placeholder

Oftentimes the computation we define depends on data which we don't yet have. For instance, the output of a neural network depends on a user-defined input which is only specified at runtime.

_Placeholder_ tensors are used to represent this data. They can be created via `tf.placeholder`:

In [10]:
p = tf.placeholder(dtype=tf.float32, shape=[], name='p')
print(p)

Tensor("p:0", shape=(), dtype=float32)


Once again, the `name` argument is optional, but it is good practice to provide it.

Because it has no pre-defined value, evaluating a placeholder tensor raises an error:

In [11]:
with tf.Session() as session:
    try:
        session.run(p)
    except tf.errors.InvalidArgumentError as e:
        # Cutting through the error message...
        print('\n'.join(str(e).split('\n')[-3:]))

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'p' with dtype float
	 [[Node: p = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]



Its value must be _explicitly_ passed to `session.run` via the `feed_dict` argument, which expects a `dict` mapping tensors to their value:

In [12]:
with tf.Session() as session:
    # feed p the value 42
    print('The value for p is {}'.format(session.run(p, feed_dict={p: 42})))
    # feed p the value 21
    print('The value for p is {}'.format(session.run(p, feed_dict={p: 21})))

The value for p is 42.0
The value for p is 21.0


## Random

Another useful input tensor to have in our toolbox is the random tensor. The random seed can be set globally via `tf.set_random_seed`:

In [13]:
tf.set_random_seed(1234)

There are many random distributions to choose from in TensorFlow. Let's look at `tf.random_uniform`:

In [14]:
r = tf.random_uniform(
    shape=[], minval=0.0, maxval=1.0, dtype=tf.float32, name='r')

A random tensor's value changes randomly between `session.run` calls, but the sequence of those random values stays the same across different sessions:

In [15]:
with tf.Session() as session:
    print('The value for r is {}'.format(session.run(r)))
    print('The value for r is {}'.format(session.run(r)))

with tf.Session() as session:
    print('The value for r is {}'.format(session.run(r)))
    print('The value for r is {}'.format(session.run(r)))

The value for r is 0.8478444814682007
The value for r is 0.23446130752563477
The value for r is 0.8478444814682007
The value for r is 0.23446130752563477


# Combining tensors

Tensors can be combined in various ways using what TensorFlow calls operations, or _ops_. Ops can take zero or more tensors as input and produce zero or more tensors as output, with or without side effects.

We have already dealt with ops when initializing or assigning values to variables, but there are _many_ more TensorFlow functions which can be used to create ops.

The best way to discover new useful ops is to browse the [TensorFlow Python API](https://www.tensorflow.org/api_docs/python/). For instance, we can discover that there exists a function, `tf.add`, which adds two tensors together and returns a tensor representing the output:

In [16]:
one_plus_three = tf.add(1, 3)

with tf.Session() as session:
    print('1 + 3 = {}'.format(session.run(one_plus_three)))

1 + 3 = 4
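
Ops can also take other tensors as input, and Python's arithmetic operators are overloaded on tensors so that `a + b` builds an addition op just like `tf.add(a, b)`. The sketch below shows both forms; the tensor names `a`, `b`, and `explicit_sum` are chosen for this illustration, and the `tensorflow.compat.v1` import is an assumption for readers on TensorFlow 1.14 or later.

```python
# Assumption: TensorFlow >= 1.14 (or 2.x), where the 1.x API lives under
# tensorflow.compat.v1. On older 1.x releases, use `import tensorflow as tf`.
import tensorflow.compat.v1 as tf

with tf.Graph().as_default():
    a = tf.placeholder(dtype=tf.float32, shape=[], name='a')
    b = tf.constant(3.0, name='b')

    explicit = tf.add(a, b, name='explicit_sum')  # explicit function call
    overloaded = a + b                            # operator overloading

    with tf.Session() as session:
        explicit_val, overloaded_val = session.run(
            [explicit, overloaded], feed_dict={a: 1.0})

print(explicit_val, overloaded_val)
```

Both tensors evaluate to the same value; the explicit form is mainly useful when you want to give the op a name.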


# Gradients

TensorFlow supports _automatic differentiation_, i.e., it can compute the derivative of scalars with respect to tensors in the graph and represent the result as a symbolic expression.

Take for instance the linear equation `y = 3x + 2`:

In [17]:
x = tf.placeholder(dtype=tf.float32, shape=[], name='x')
y = 3 * x + 2

We can compute the derivative of `y` with respect to `x` with the `tf.gradients` function:

In [18]:
dy_dx, = tf.gradients(ys=y, xs=[x])

We can verify that the gradient evaluates to 3 as expected:

In [19]:
with tf.Session() as session:
    dy_dx_val = session.run(dy_dx)
    print('The gradient of y with respect '
          'to x is {}'.format(dy_dx_val))

The gradient of y with respect to x is 3.0


_Note: Some of you may have noticed that TensorFlow did not complain despite no value being provided for the `x` placeholder. This is because even though `x` is part of the computation graph, the derivative of `y` with respect to `x` does not involve `x`, and therefore evaluating it does not require a value to be passed for `x`._
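
For contrast, here is a sketch of a case where the derivative does depend on the input. With `y = x ** 2` the gradient is `2 * x`, so evaluating it requires a value for `x`. The example is built in its own graph, and the `tensorflow.compat.v1` import is an assumption for readers on TensorFlow 1.14 or later.

```python
# Assumption: TensorFlow >= 1.14 (or 2.x), where the 1.x API lives under
# tensorflow.compat.v1. On older 1.x releases, use `import tensorflow as tf`.
import tensorflow.compat.v1 as tf

with tf.Graph().as_default():
    x = tf.placeholder(dtype=tf.float32, shape=[], name='x')
    y = x ** 2
    dy_dx, = tf.gradients(ys=y, xs=[x])

    with tf.Session() as session:
        # The gradient 2 * x involves x, so a value must be fed.
        grad_val = session.run(dy_dx, feed_dict={x: 3.0})

        # Without a value for x, evaluation fails.
        needs_feed = False
        try:
            session.run(dy_dx)
        except tf.errors.InvalidArgumentError:
            needs_feed = True

print('dy/dx at x = 3 is {}'.format(grad_val))
print('Evaluating without feeding x failed: {}'.format(needs_feed))
```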

# Exercise

Find the minimum of the expression

$$2(x - 2)^2 + 2(y + 3)^2$$

using gradient descent by filling in the following code block:

In [20]:
with tf.Graph().as_default():
    tf.set_random_seed(1234)

    # Create two scalar variables, x and y, initialized at random.
    # x = WRITEME.
    # y = WRITEME.

    # Create a tensor z whose value represents the expression
    #     2(x - 2)^2 + 2(y + 3)^2
    # z = WRITEME.
    
    # Compute the gradients of z with respect to x and y.
    # dx, dy = WRITEME.
    
    # Create an assignment expression for x using the update rule
    #    x <- x - 0.1 * dz/dx
    # and do the same for y.
    # x_update = WRITEME.
    # y_update = WRITEME.
    
    with tf.Session() as session:
        # Run the global initializer op for x and y.
        # WRITEME.
        
        for _ in range(10):
            pass
            # Run the update ops for x and y.
            # WRITEME.
            
            # Retrieve the values for x, y, and z, and print them.
            # x_val, y_val, z_val = WRITEME.
            # print('x = {:4.2f}, y = {:4.2f}, z = {:4.2f}'.format(x_val, y_val, z_val))

## Solution

In [21]:
with tf.Graph().as_default():
    tf.set_random_seed(1234)

    # Create two scalar variables, x and y, initialized at random.
    x = tf.get_variable(name='x', shape=[], dtype=tf.float32,
                        initializer=tf.random_normal_initializer())
    y = tf.get_variable(name='y', shape=[], dtype=tf.float32,
                        initializer=tf.random_normal_initializer())

    # Create a tensor z whose value represents the expression
    #     2(x - 2)^2 + 2(y + 3)^2
    z = 2 * (x - 2) ** 2 + 2 * (y + 3) ** 2
    
    # Compute the gradients of z with respect to x and y.
    dx, dy = tf.gradients(z, [x, y])
    
    # Create an assignment expression for x using the update rule
    #    x <- x - 0.1 * dz/dx
    # and do the same for y.
    x_update = tf.assign_sub(x, 0.1 * dx)
    y_update = tf.assign_sub(y, 0.1 * dy)
    
    with tf.Session() as session:
        # Run the global initializer op for x and y.
        session.run(tf.global_variables_initializer())
        
        for _ in range(10):
            # Run the update ops for x and y.
            session.run([x_update, y_update])
            
            # Retrieve the values for x, y, and z, and print them.
            x_val, y_val, z_val = session.run([x, y, z])
            print('x = {:4.2f}, y = {:4.2f}, z = {:4.2f}'.format(x_val, y_val, z_val))

x = 0.72, y = -1.62, z = 7.10
x = 1.23, y = -2.17, z = 2.55
x = 1.54, y = -2.50, z = 0.92
x = 1.72, y = -2.70, z = 0.33
x = 1.83, y = -2.82, z = 0.12
x = 1.90, y = -2.89, z = 0.04
x = 1.94, y = -2.94, z = 0.02
x = 1.96, y = -2.96, z = 0.01
x = 1.98, y = -2.98, z = 0.00
x = 1.99, y = -2.99, z = 0.00


# Optimization made easy

The solution to the exercise above can be shortened quite a bit by taking advantage of TensorFlow's optimization features. Here is the graph we were working with:

In [22]:
tf.set_random_seed(1234)
x = tf.get_variable(name='x', shape=[], dtype=tf.float32,
                    initializer=tf.random_normal_initializer())
y = tf.get_variable(name='y', shape=[], dtype=tf.float32,
                    initializer=tf.random_normal_initializer())

z = 2 * (x - 2) ** 2 + 2 * (y + 3) ** 2

TensorFlow provides utility classes to facilitate optimization in computation graphs. These classes inherit from `tf.train.Optimizer`. Let's look at the simplest one, `tf.train.GradientDescentOptimizer`.

We instantiate `tf.train.GradientDescentOptimizer` by passing it a scalar learning rate. Note that the learning rate itself can be symbolic (i.e., a tensor), and is therefore allowed to vary across a session.

In [23]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)

We can then call the optimizer's `minimize` method to obtain an op which, when evaluated, performs one gradient descent step on the variables we specify:

In [24]:
update_op = optimizer.minimize(loss=z, var_list=tf.trainable_variables())

Here we took advantage of the fact that variables created via `tf.get_variable` are trainable by default, and that all trainable variables in the graph can be accessed as a list using the `tf.trainable_variables` function.

(Note: you can pass `trainable=False` to `tf.get_variable` to exclude a certain variable from ending up in that list.)
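
A minimal sketch of that flag, using two hypothetical variables `w` and `step` in a fresh graph: only `w` shows up in `tf.trainable_variables`, while `tf.global_variables` lists both. The `tensorflow.compat.v1` import is an assumption for readers on TensorFlow 1.14 or later.

```python
# Assumption: TensorFlow >= 1.14 (or 2.x), where the 1.x API lives under
# tensorflow.compat.v1. On older 1.x releases, use `import tensorflow as tf`.
import tensorflow.compat.v1 as tf

with tf.Graph().as_default():
    # `w` and `step` are made-up names for this illustration.
    w = tf.get_variable(name='w', shape=[], dtype=tf.float32,
                        initializer=tf.zeros_initializer())
    step = tf.get_variable(name='step', shape=[], dtype=tf.float32,
                           initializer=tf.zeros_initializer(),
                           trainable=False)

    trainable_names = [var.name for var in tf.trainable_variables()]
    all_names = [var.name for var in tf.global_variables()]

print('Trainable variables: {}'.format(trainable_names))
print('All variables: {}'.format(all_names))
```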

The code then proceeds as before:

In [25]:
with tf.Session() as session:
    session.run(tf.global_variables_initializer())

    for _ in range(10):
        session.run(update_op)
        x_val, y_val, z_val = session.run([x, y, z])
        print('x = {:4.2f}, y = {:4.2f}, z = {:4.2f}'.format(x_val, y_val, z_val))

x = 0.21, y = -1.30, z = 12.17
x = 0.93, y = -1.98, z = 4.38
x = 1.36, y = -2.39, z = 1.58
x = 1.61, y = -2.63, z = 0.57
x = 1.77, y = -2.78, z = 0.20
x = 1.86, y = -2.87, z = 0.07
x = 1.92, y = -2.92, z = 0.03
x = 1.95, y = -2.95, z = 0.01
x = 1.97, y = -2.97, z = 0.00
x = 1.98, y = -2.98, z = 0.00
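
The earlier remark that the learning rate can be symbolic can be sketched as follows: the learning rate is a placeholder whose value is fed at each step. This is an illustrative one-variable version of the problem above, built in its own graph with `x` initialized to zero; the placeholder name `learning_rate` is made up for this example, and the `tensorflow.compat.v1` import is an assumption for readers on TensorFlow 1.14 or later.

```python
# Assumption: TensorFlow >= 1.14 (or 2.x), where the 1.x API lives under
# tensorflow.compat.v1. On older 1.x releases, use `import tensorflow as tf`.
import tensorflow.compat.v1 as tf

with tf.Graph().as_default():
    x = tf.get_variable(name='x', shape=[], dtype=tf.float32,
                        initializer=tf.zeros_initializer())
    z = 2 * (x - 2) ** 2

    # The learning rate is a tensor whose value is supplied at run time.
    learning_rate = tf.placeholder(dtype=tf.float32, shape=[],
                                   name='learning_rate')
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
    update_op = optimizer.minimize(loss=z, var_list=[x])

    with tf.Session() as session:
        session.run(tf.global_variables_initializer())
        # First step with a learning rate of 0.1, second step with 0.05.
        session.run(update_op, feed_dict={learning_rate: 0.1})
        x_after_first = session.run(x)
        session.run(update_op, feed_dict={learning_rate: 0.05})
        x_after_second = session.run(x)

print(x_after_first, x_after_second)
```

Starting from `x = 0`, the gradient `4 * (x - 2)` is -8, so the first step moves `x` to 0.8; the gradient there is -4.8, so the second step with the smaller learning rate moves `x` to 1.04.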


# Control flow

**TODO: WRITEME**

# Scaling up to large computation graphs

We have now covered the bare minimum that would allow you to do machine learning with TensorFlow. We have not covered _every_ TensorFlow op, but you now possess the knowledge required to browse through the [TensorFlow Python API](https://www.tensorflow.org/api_docs/python/) and find what you need.

We will now concentrate on ways to scale what you learned to actual machine learning problems without increasing the maintenance complexity too much.

## Variable and name scopes

TensorFlow uses a soft convention for op and variable names: an op or variable that is part of a hierarchy should have a name that conveys its location in the hierarchy, with the `'/'` character being used to separate different levels in the hierarchy. For instance, a good name for the bias vector of the second layer of the model would be `'model/layer2/b'`.

In order to reduce code duplication and facilitate maintenance, TensorFlow provides two context managers, `tf.name_scope` and `tf.variable_scope`. Ops and variables created inside them have the name of the enclosing scope prepended to their own name. The difference between the two is that `tf.variable_scope` operates on _all_ names, whereas `tf.name_scope` operates on all names _except_ those of variables created with `tf.get_variable`:

In [26]:
# Variable scopes operate on all tensors
with tf.variable_scope('foo'):
    # Scopes can be nested
    with tf.variable_scope('bar'):
        print(tf.get_variable('a', shape=[]).name)
        print(tf.constant(0.0, name='b').name)
# Name scopes do not operate on variables
with tf.name_scope('machine'):
    with tf.name_scope('learning'):
        print(tf.get_variable('a', shape=[]).name)
        print(tf.constant(0.0, name='b').name)

foo/bar/a:0
foo/bar/b:0
a:0
machine/learning/b:0
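
Variable scopes are also the mechanism for retrieving an existing variable by name, as hinted at by the `reuse=True` suggestion in the error message seen earlier. The sketch below re-enters a scope with `reuse=True` and gets back the same variable instead of an error; the scope name `model` and variable name `w` are made up for this illustration, and the `tensorflow.compat.v1` import is an assumption for readers on TensorFlow 1.14 or later.

```python
# Assumption: TensorFlow >= 1.14 (or 2.x), where the 1.x API lives under
# tensorflow.compat.v1. On older 1.x releases, use `import tensorflow as tf`.
import tensorflow.compat.v1 as tf

with tf.Graph().as_default():
    # First use of the scope: the variable is created.
    with tf.variable_scope('model'):
        w = tf.get_variable(name='w', shape=[2], dtype=tf.float32)

    # Re-entering the scope with reuse=True retrieves the existing
    # variable rather than raising "Variable ... already exists".
    with tf.variable_scope('model', reuse=True):
        w_again = tf.get_variable(name='w', shape=[2], dtype=tf.float32)

print(w.name)
print(w_again is w)
```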


# Graph visualization with TensorBoard

**TODO: WRITEME**