https://github.com/tensorflow/docs/blob/master/site/en/tutorials/eager/automatic_differentiation.ipynb

This notebook covers automatic differentiation, a key technique for optimizing machine learning models.

In [1]:
import tensorflow as tf

tf.enable_eager_execution()

## Gradient tapes
TensorFlow provides the tf.GradientTape API for automatic differentiation: computing the gradient of a computation with respect to its input variables. TensorFlow "records" all operations executed inside the context of a tf.GradientTape onto a "tape". TensorFlow then uses that tape and the gradients associated with each recorded operation to compute the gradients of the "recorded" computation using reverse-mode differentiation.
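To make the "tape" idea concrete, here is a minimal pure-Python sketch of recording operations and replaying them in reverse. It illustrates the general technique only, not how TensorFlow implements it; the names `Tape`, `Value`, and `gradient` are hypothetical and are not part of the TensorFlow API.

```python
# Toy reverse-mode differentiation on scalars.
# Tape, Value and gradient are illustrative names, NOT TensorFlow APIs.

class Tape:
    def __init__(self):
        # Each record is (output, [(input, local_gradient), ...]),
        # appended in the order the operations were executed.
        self.records = []


class Value:
    def __init__(self, data, tape):
        self.data = data
        self.tape = tape

    def __add__(self, other):
        out = Value(self.data + other.data, self.tape)
        # d(a + b)/da = 1 and d(a + b)/db = 1
        self.tape.records.append((out, [(self, 1.0), (other, 1.0)]))
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, self.tape)
        # d(a * b)/da = b and d(a * b)/db = a
        self.tape.records.append((out, [(self, other.data), (other, self.data)]))
        return out


def gradient(tape, target, source):
    """Walk the tape backwards, accumulating d(target)/d(node) for each node."""
    grads = {id(target): 1.0}
    for out, inputs in reversed(tape.records):
        upstream = grads.get(id(out), 0.0)
        for inp, local in inputs:
            grads[id(inp)] = grads.get(id(inp), 0.0) + upstream * local
    return grads.get(id(source), 0.0)


tape = Tape()
x = Value(3.0, tape)
y = x * x          # recorded
z = y * y          # recorded
dz_dx = gradient(tape, z, x)  # 4 * x**3 at x = 3
dy_dx = gradient(tape, y, x)  # 2 * x at x = 3
print(dz_dx, dy_dx)
```

The backward pass visits each recorded operation exactly once, which is why reverse mode is well suited to computing the gradient of a single scalar output with respect to many inputs.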

In [2]:
x = tf.ones((2, 2))
x

<tf.Tensor: id=2, shape=(2, 2), dtype=float32, numpy=
array([[1., 1.],
       [1., 1.]], dtype=float32)>

In [3]:
with tf.GradientTape() as t:
  t.watch(x)
  y = tf.reduce_sum(x)
  z = tf.multiply(y, y)

In [6]:
y

<tf.Tensor: id=5, shape=(), dtype=float32, numpy=4.0>

In [7]:
z

<tf.Tensor: id=6, shape=(), dtype=float32, numpy=16.0>

In [4]:
# Derivative of z with respect to the original input tensor x
dz_dx = t.gradient(z, x)

In [5]:
dz_dx

<tf.Tensor: id=15, shape=(2, 2), dtype=float32, numpy=
array([[8., 8.],
       [8., 8.]], dtype=float32)>
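The value 8 in every entry follows from the chain rule: z = y², and y is the sum of all entries of x, so dz/dx_ij = 2·y·1 = 2·4 = 8. As a rough sanity check, the sketch below compares that analytic value with a central finite difference in plain Python. The helper `z_of` and the step size `h` are assumptions made for this illustration and do not appear in the original notebook.

```python
# Finite-difference check of dz/dx for z = (sum of x)**2 at x = ones((2, 2)).
# z_of and h are illustrative choices, not part of the tutorial.

def z_of(x):
    y = sum(sum(row) for row in x)  # mirrors tf.reduce_sum(x)
    return y * y                    # mirrors tf.multiply(y, y)


x = [[1.0, 1.0], [1.0, 1.0]]
h = 1e-4

numeric = []
for i in range(2):
    row = []
    for j in range(2):
        plus = [r[:] for r in x]
        minus = [r[:] for r in x]
        plus[i][j] += h
        minus[i][j] -= h
        row.append((z_of(plus) - z_of(minus)) / (2.0 * h))
    numeric.append(row)

analytic = 2.0 * sum(sum(row) for row in x)  # 2 * y

print(numeric)
print(analytic)
```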

In [8]:
x = tf.ones((2, 2))

with tf.GradientTape() as t:
  t.watch(x)
  y = tf.reduce_sum(x)
  z = tf.multiply(y, y)

# Use the tape to compute the derivative of z with respect to the
# intermediate value y.
dz_dy = t.gradient(z, y)
dz_dy

<tf.Tensor: id=28, shape=(), dtype=float32, numpy=8.0>

By default, the resources held by a GradientTape are released as soon as the GradientTape.gradient() method is called. To compute multiple gradients over the same computation, create a persistent gradient tape. This allows multiple calls to the gradient() method, because resources are released only when the tape object is garbage collected. For example:

In [9]:
x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as t:
  t.watch(x)
  y = x * x
  z = y * y
dz_dx = t.gradient(z, x)  # 108.0 (4*x^3 at x = 3)
print(dz_dx)
dy_dx = t.gradient(y, x)  # 6.0
print(dy_dx)
del t  # Drop the reference to the tape

tf.Tensor(108.0, shape=(), dtype=float32)
tf.Tensor(6.0, shape=(), dtype=float32)


## Recording control flow
Because tapes record operations as they are executed, Python control flow (for example, if statements and for or while loops) is naturally handled:

In [11]:
def f(x, y):
  output = 1.0
  for i in range(y):
    if i > 1 and i < 5:
      output = tf.multiply(output, x)
  return output

def grad(x, y):
  with tf.GradientTape() as t:
    t.watch(x)
    out = f(x, y)
  return t.gradient(out, x)

x = tf.convert_to_tensor(2.0)

In [12]:
grad(x, 6).numpy()

12.0

In [16]:
grad(x, 7).numpy()

12.0

In [17]:
grad(x, 5).numpy()

12.0

In [18]:
grad(x, 4).numpy()

4.0

In [13]:
f(x, 6)

<tf.Tensor: id=69, shape=(), dtype=float32, numpy=8.0>

In [14]:
f(x, 5)

<tf.Tensor: id=74, shape=(), dtype=float32, numpy=8.0>

In [15]:
f(x, 4)

<tf.Tensor: id=78, shape=(), dtype=float32, numpy=4.0>
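The outputs above can be explained by counting how many times the loop body actually multiplies by x. If the multiplication runs k times, then f(x, y) = x^k and the gradient with respect to x is k·x^(k-1). The sketch below reproduces the numbers in plain Python; `num_multiplies` and `analytic_grad` are hypothetical helper names introduced only for this illustration.

```python
# Reproduce the tutorial's f(x, y) and grad(x, y) values analytically.
# num_multiplies and analytic_grad are illustrative helpers, not TensorFlow APIs.

def num_multiplies(y):
    """Count the loop iterations i in range(y) that satisfy 1 < i < 5."""
    return sum(1 for i in range(y) if 1 < i < 5)


def analytic_grad(x, y):
    k = num_multiplies(y)
    if k == 0:
        return 0.0  # f is the constant 1.0, so its derivative is zero
    return k * x ** (k - 1)


x = 2.0
for y in (4, 5, 6, 7):
    k = num_multiplies(y)
    print(y, k, x ** k, analytic_grad(x, y))
```

For y = 5, 6, and 7 the condition holds for i = 2, 3, 4, so k = 3 and the gradient is 3·2² = 12. For y = 4 only i = 2, 3 qualify, so k = 2 and the gradient is 2·2 = 4.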

## Higher-order gradients
Operations inside of the GradientTape context manager are recorded for automatic differentiation. 

If gradients are computed in that context, then the gradient computation is recorded as well. As a result, the exact same API works for higher-order gradients. For example:

In [20]:
x = tf.Variable(1.0)  # Create a TensorFlow variable initialized to 1.0

with tf.GradientTape() as t:
  with tf.GradientTape() as t2:
    y = x * x * x
  # Compute the gradient inside the 't' context manager
  # which means the gradient computation is differentiable as well.
  dy_dx = t2.gradient(y, x)
d2y_dx2 = t.gradient(dy_dx, x)

In [21]:
dy_dx

<tf.Tensor: id=134, shape=(), dtype=float32, numpy=3.0>

In [22]:
d2y_dx2

<tf.Tensor: id=149, shape=(), dtype=float32, numpy=6.0>
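For y = x³ the exact derivatives are dy/dx = 3x² and d²y/dx² = 6x, which give 3 and 6 at x = 1, matching the outputs above. The nesting idea can also be mimicked numerically: a derivative operator applied to its own result yields a second derivative. The following is only a rough analogy using central finite differences, not automatic differentiation; `derivative`, `cube`, and the step size `h` are assumptions made for this sketch.

```python
# Nested numerical differentiation as an analogy for nested gradient tapes.
# derivative, cube and h are illustrative choices, not part of the tutorial.

def derivative(f, h=1e-3):
    """Return a function that approximates f' with a central difference."""
    def df(x):
        return (f(x + h) - f(x - h)) / (2.0 * h)
    return df


def cube(x):
    return x * x * x


dy_dx_fn = derivative(cube)         # approximates 3 * x**2
d2y_dx2_fn = derivative(dy_dx_fn)   # approximates 6 * x

first = dy_dx_fn(1.0)
second = d2y_dx2_fn(1.0)
print(first, second)
```

Unlike this approximation, the nested tapes compute both derivatives exactly (up to floating-point rounding), because the inner gradient computation is itself made of recorded, differentiable operations.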