# AutoGrad API in TensorFlow

In [0]:
%tensorflow_version 2.x

TensorFlow 2.x selected.


In [0]:
import tensorflow as tf

# Gradient Tape

TensorFlow provides the tf.GradientTape API for automatic differentiation: computing the gradient of a computation with respect to its inputs (usually tf.Variables). TensorFlow "records" all operations executed inside the context of a tf.GradientTape onto a "tape".

TensorFlow then uses that tape and the gradients associated with each recorded operation to compute the gradients of a recorded computation using reverse-mode differentiation.
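A minimal sketch of which values a tape actually tracks (an illustration added here, not part of the original notebook): trainable tf.Variables are watched automatically, while plain tensors such as the result of tf.constant or tf.ones must be passed to tape.watch(). That is why the examples below call tape.watch(x). Asking for a gradient with respect to an unwatched tensor returns None rather than raising an error.

```python
import tensorflow as tf

c = tf.constant(3.0)  # plain tensor: NOT watched unless tape.watch(c) is called
v = tf.Variable(3.0)  # trainable variable: watched automatically

with tf.GradientTape() as tape_c:
    y_c = c * c
grad_c = tape_c.gradient(y_c, c)  # None: nothing connecting c to y_c was recorded

with tf.GradientTape() as tape_v:
    y_v = v * v
grad_v = tape_v.gradient(y_v, v)  # d(v^2)/dv = 2v = 6.0

print(grad_c)
print(grad_v)
```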

In [0]:
x = tf.ones((2, 2))

In [0]:
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.reduce_sum(x)
    z = tf.multiply(y, y)

In [0]:
dz_dx = tape.gradient(z, x)

In [0]:
print(dz_dx)

tf.Tensor(
[[8. 8.]
 [8. 8.]], shape=(2, 2), dtype=float32)
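A hand derivation (added for illustration) of why every entry is 8: with the 2×2 input of ones,

$$
y = \sum_{i,j} x_{ij} = 4, \qquad z = y^2, \qquad
\frac{\partial z}{\partial x_{ij}} = 2y \cdot \frac{\partial y}{\partial x_{ij}} = 2 \cdot 4 \cdot 1 = 8 .
$$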


You can also request gradients of the output with respect to intermediate values computed during a "recorded" tf.GradientTape context.

In [0]:
x = tf.ones((2, 2))
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.reduce_sum(x)
    z = tf.multiply(y, y)

In [0]:
dz_dy = tape.gradient(z, y)
print(dz_dy)

tf.Tensor(8.0, shape=(), dtype=float32)
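Both gradients can also be requested from one tape in a single call: passing a list of sources to gradient() returns a list of gradients, so the default non-persistent tape is enough. A short sketch that reuses the same computation:

```python
import tensorflow as tf

x = tf.ones((2, 2))
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.reduce_sum(x)
    z = tf.multiply(y, y)

# One gradient() call with a list of sources returns one gradient per source.
dz_dx, dz_dy = tape.gradient(z, [x, y])
print(dz_dx)  # 2x2 tensor filled with 8.0
print(dz_dy)  # scalar 8.0
```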


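By default, the resources held by a GradientTape are released as soon as the GradientTape.gradient() method is called, so a tape can compute only one set of gradients. To compute multiple gradients over the same computation, create a persistent gradient tape with tf.GradientTape(persistent=True). A persistent tape allows multiple calls to gradient(); its resources are released only when the tape object is garbage collected, which is why the tape is deleted with del tape once it is no longer needed.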
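A sketch of what happens without persistent=True (the exact error message may vary between TensorFlow versions): the first gradient() call succeeds, and a second call on the same tape raises a RuntimeError because the tape's resources have already been released.

```python
import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as tape:  # persistent=False is the default
    tape.watch(x)
    y = x * x

first = tape.gradient(y, x)  # 2x = 6.0; the tape releases its resources here

try:
    tape.gradient(y, x)  # second call on a non-persistent tape
    second_call_failed = False
except RuntimeError as err:
    second_call_failed = True
    print("RuntimeError:", err)

print(first, second_call_failed)
```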

In [0]:
x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)
    y = x * x
    z = y * y

In [0]:
dz_dx = tape.gradient(z, x)
dz_dy = tape.gradient(z, y)
dy_dx = tape.gradient(y, x)
print(dz_dx, dz_dy, dy_dx)

tf.Tensor(108.0, shape=(), dtype=float32) tf.Tensor(18.0, shape=(), dtype=float32) tf.Tensor(6.0, shape=(), dtype=float32)


In [0]:
del tape
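A hand check (added for illustration) of the three values printed above, with x = 3. The last line confirms that the chain rule ties them together:

$$
y = x^2 = 9, \qquad z = y^2 = x^4 = 81
$$

$$
\frac{dz}{dx} = 4x^3 = 108, \qquad \frac{dz}{dy} = 2y = 18, \qquad \frac{dy}{dx} = 2x = 6
$$

$$
\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx} = 18 \cdot 6 = 108
$$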

## Recording control flow

Because tapes record operations as they are executed, Python control flow (for example, if statements and for or while loops) is handled naturally: only the operations that actually run are recorded.

In [0]:
def f(x, y):
    output = 1.0
    for i in range(y):
        if i > 1 and i < 5:
            output = tf.multiply(output, x)
    return output
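Working through f by hand (an added derivation): the multiplication runs only for i in {2, 3, 4} with i < y, so for k(y) ≥ 1

$$
f(x, y) = x^{k(y)}, \qquad k(y) = \max\bigl(0, \min(y, 5) - 2\bigr), \qquad
\frac{\partial f}{\partial x} = k(y)\, x^{k(y) - 1}.
$$

At x = 2 this gives 3 · 2² = 12 for y = 6 and for y = 5, and 2 · 2 = 4 for y = 4. For y ≤ 2 the loop never multiplies, so the output is the constant 1.0 and does not depend on x.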

In [0]:
def grad(x, y):
    with tf.GradientTape() as tape:
        tape.watch(x)
        out = f(x, y)
    
    return tape.gradient(out, x)

In [0]:
x = tf.convert_to_tensor(2.0)

In [0]:
x

<tf.Tensor: shape=(), dtype=float32, numpy=2.0>

In [0]:
print(grad(x, 6))

tf.Tensor(12.0, shape=(), dtype=float32)


In [0]:
print(grad(x, 5))

tf.Tensor(12.0, shape=(), dtype=float32)


In [0]:
print(grad(x, 4))

tf.Tensor(4.0, shape=(), dtype=float32)
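The same mechanism covers while loops and branches that depend on a tensor's value. In the sketch below (piecewise, repeated_square, and grad_of are hypothetical helpers chosen for illustration), only the branch that actually executes is recorded, so the gradient of piecewise differs on each side of zero, and each pass through the while loop adds one multiplication to the tape.

```python
import tensorflow as tf

def piecewise(x):
    # Hypothetical helper: the Python `if` inspects the eager tensor's value,
    # and only the branch that runs is recorded on the tape.
    if x > 0:
        return 3.0 * x
    return -x

def repeated_square(x, n):
    # Hypothetical helper: squares x n times with a Python while loop.
    i = 0
    while i < n:
        x = x * x
        i += 1
    return x

def grad_of(fn, value, *args):
    # Gradient of fn(x, *args) with respect to x, evaluated at `value`.
    x = tf.constant(value)
    with tf.GradientTape() as tape:
        tape.watch(x)
        out = fn(x, *args)
    return tape.gradient(out, x)

print(grad_of(piecewise, 2.0))           # slope of 3x: 3.0
print(grad_of(piecewise, -2.0))          # slope of -x: -1.0
print(grad_of(repeated_square, 1.5, 2))  # d(x^4)/dx = 4 * 1.5^3 = 13.5
```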


# Higher Order Gradients

Operations inside the GradientTape context manager are recorded for automatic differentiation. If gradients are computed inside that context, the gradient computation is recorded as well. As a result, the same API works for higher-order gradients. For example:

In [0]:
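x = tf.Variable(1.0)  # tf.Variables are watched automatically

with tf.GradientTape() as tape1:
    with tf.GradientTape() as tape2:
        y = x * x * x
    # Still inside tape1, so tape1 records this gradient computation
    dy_dx = tape2.gradient(y, x)
d2y_dx2 = tape1.gradient(dy_dx, x)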


In [0]:
print(dy_dx)
print(d2y_dx2)

tf.Tensor(3.0, shape=(), dtype=float32)
tf.Tensor(6.0, shape=(), dtype=float32)
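Nesting one more tape gives a third derivative. This sketch evaluates at x = 2 rather than 1, so the three derivatives are easy to tell apart. It follows the same pattern: each gradient call is made after its own tape's context has closed but while the outer tapes are still recording, so they can differentiate it again.

```python
import tensorflow as tf

x = tf.Variable(2.0)  # variables are watched automatically by every tape

with tf.GradientTape() as t3:
    with tf.GradientTape() as t2:
        with tf.GradientTape() as t1:
            y = x * x * x
        dy_dx = t1.gradient(y, x)    # 3x^2 = 12, recorded by t2 and t3
    d2y_dx2 = t2.gradient(dy_dx, x)  # 6x = 12, recorded by t3
d3y_dx3 = t3.gradient(d2y_dx2, x)    # 6

print(dy_dx, d2y_dx2, d3y_dx3)
```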


# Using @tf.function

In TensorFlow 2.0, eager execution is turned on by default. The user interface is intuitive and flexible (running one-off operations is much easier and faster), but this can come at the expense of performance and deployability.

To get peak performance and to make your model deployable anywhere, use tf.function to make graphs out of your programs. Thanks to AutoGraph, a surprising amount of Python code just works with tf.function, but there are still pitfalls to be wary of.

The main takeaways and recommendations are:

- Don't rely on Python side effects like object mutation or list appends.
- tf.function works best with TensorFlow ops, rather than NumPy ops or Python primitives.
- When in doubt, use the `for x in y` idiom.
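A sketch of the first recommendation (python_calls, tf_calls, and double are hypothetical names): a Python list append inside a tf.function runs only while the function is being traced, whereas a TensorFlow op such as tf.Variable.assign_add becomes part of the graph and runs on every call. Because all three calls below pass a scalar float32 tensor, the function is traced only once.

```python
import tensorflow as tf

python_calls = []          # Python-side state: mutated only while tracing
tf_calls = tf.Variable(0)  # TensorFlow state: updated on every call

@tf.function
def double(x):
    python_calls.append(1)  # Python side effect, runs at trace time only
    tf_calls.assign_add(1)  # TensorFlow op in the graph, runs every call
    return 2.0 * x

for value in [1.0, 2.0, 3.0]:
    double(tf.constant(value))  # same dtype and shape each time: one trace

print(len(python_calls))      # 1
print(int(tf_calls.numpy()))  # 3
```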


In [0]:
import traceback
import contextlib

In [0]:
# Some helper code to demonstrate the kinds of errors you might encounter.
@contextlib.contextmanager
def assert_raises(error_class):
    try:
        yield
    except error_class as e:
        print('Caught expected exception \n  {}:'.format(error_class))
        traceback.print_exc(limit=2)
    except Exception as e:
        raise e
    else:
        raise Exception('Expected {} to be raised but no error was raised!'.format(
            error_class))


## Basics

A tf.function you define is just like a core TensorFlow operation: you can execute it eagerly, you can use it in a graph, it has gradients, and so on.
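One way to see the graph behind a tf.function is to ask it for a concrete function. This sketch redefines add so it runs on its own. The exact list of op types in the traced graph may differ between TensorFlow versions, but it should include input placeholders and an AddV2 op for the addition.

```python
import tensorflow as tf

@tf.function
def add(a, b):
    return a + b

# Trace add for 1-D float32 inputs of any length; this builds a graph-backed
# ConcreteFunction without running the computation.
concrete = add.get_concrete_function(
    tf.TensorSpec(shape=[None], dtype=tf.float32),
    tf.TensorSpec(shape=[None], dtype=tf.float32),
)

# The traced tf.Graph holds placeholders for the inputs plus the addition op.
op_types = [op.type for op in concrete.graph.get_operations()]
print(op_types)

# The concrete function is callable with tensors that match its signature.
result = concrete(tf.constant([1.0, 2.0]), tf.constant([3.0, 4.0]))
print(result)
```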

In [0]:
@tf.function
def add(a, b):
    return a + b

In [0]:
add(tf.ones([2, 2]), tf.ones([2, 2]))

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[2., 2.],
       [2., 2.]], dtype=float32)>

In [0]:
v = tf.Variable(1.0)
with tf.GradientTape() as tape:
    result = add(v, 10)

In [0]:
tape.gradient(result, v)

<tf.Tensor: shape=(), dtype=float32, numpy=1.0>

You can call a tf.function from inside another tf.function:

In [0]:
@tf.function
def dense_layer(x, w, b):
    return add(tf.matmul(x, w), b)

In [0]:
dense_layer(tf.ones([3, 2]), tf.ones([2, 2]), tf.ones([2]))

<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[3., 3.],
       [3., 3.],
       [3., 3.]], dtype=float32)>
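Because nested tf.functions remain differentiable, a tape can compute gradients straight through dense_layer. This sketch redefines both functions so it runs on its own; using tf.reduce_sum of the layer output as a stand-in loss is an assumption made for illustration. Each weight w[k][j] contributes x[i][k] = 1 to each of the 3 rows, and each bias b[j] appears once per row, so every gradient entry is 3.

```python
import tensorflow as tf

@tf.function
def add(a, b):
    return a + b

@tf.function
def dense_layer(x, w, b):
    return add(tf.matmul(x, w), b)  # inner tf.function called from an outer one

x = tf.ones([3, 2])
w = tf.Variable(tf.ones([2, 2]))
b = tf.Variable(tf.zeros([2]))

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(dense_layer(x, w, b))  # stand-in scalar loss

dw, db = tape.gradient(loss, [w, b])
print(dw)  # 2x2 tensor filled with 3.0
print(db)  # [3.0, 3.0]
```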