Reference: https://www.tensorflow.org/api_docs/python/tf/GradientTape?hl=zh_tw

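TensorFlow typically uses tf.GradientTape to record the forward pass (forward propagation) and then runs backpropagation over the recorded operations to obtain the gradients automatically.

Computing derivatives with tf.GradientTape in this way is TensorFlow's automatic differentiation mechanism.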

Power rule:
f(x) = x^n

Derivative:
f'(x) = n * x^(n-1)

Example:
y = x^2
Derivative:
dy/dx = 2 * x^(2-1) = 2x
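
The power rule can be sanity-checked numerically without TensorFlow. The sketch below is only an illustration: the helper names `power` and `numerical_derivative` and the step size `h` are assumptions made for this example, not part of any library.

```python
def power(x, n):
    """Evaluate f(x) = x^n."""
    return x ** n


def numerical_derivative(f, x, h=1e-5):
    """Approximate f'(x) with a central difference of step h."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


x, n = 3.0, 2
analytic = n * x ** (n - 1)  # power rule: n * x^(n-1)
numeric = numerical_derivative(lambda t: power(t, n), x)
print(analytic, numeric)  # the two values should agree closely
```

At x = 3 the power rule gives 2 * 3 = 6, which is the value the GradientTape cells are expected to reproduce.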

In [None]:
import tensorflow as tf
import numpy as np
#https://blog.csdn.net/walilk/article/details/50978864

In [None]:
#For example, consider the function y = x * x. The gradient at x = 3.0 can be computed as:
x = tf.constant(3.0)
with tf.GradientTape() as g:
  g.watch(x)
  y = x * x
dy_dx = g.gradient(y, x)

print(dy_dx)

tf.Tensor(6.0, shape=(), dtype=float32)


In [None]:
#GradientTapes can be nested to compute higher-order derivatives. For example,
x = tf.constant(5.0)
with tf.GradientTape() as g:
  g.watch(x)
  with tf.GradientTape() as gg:
    gg.watch(x)
    y = x * x
  dy_dx = gg.gradient(y, x)  # dy_dx = 2 * x
d2y_dx2 = g.gradient(dy_dx, x)  # d2y_dx2 = 2
print(dy_dx)
print(d2y_dx2)

tf.Tensor(10.0, shape=(), dtype=float32)
tf.Tensor(2.0, shape=(), dtype=float32)


In [None]:
# By default, GradientTape automatically watches any trainable variables that are accessed inside the context.
# If you want fine-grained control over which variables are watched, you can disable automatic tracking
# by passing watch_accessed_variables=False to the tape constructor:
x = tf.Variable(2.0)
w = tf.Variable(5.0)
with tf.GradientTape(
    watch_accessed_variables=False, persistent=True) as tape:
  tape.watch(x)
  y = x ** 2  # Gradients will be available for `x`.
  z = w ** 3  # No gradients will be available as `w` isn't being watched.
dy_dx = tape.gradient(y, x)
print(dy_dx)

# No gradients will be available as `w` isn't being watched.
dz_dw = tape.gradient(z, w)
print(dz_dw)

tf.Tensor(4.0, shape=(), dtype=float32)
None
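
For contrast, here is a minimal sketch of the default behavior, with `watch_accessed_variables` left at its default of True. A trainable `tf.Variable` is watched automatically, while a `tf.constant` is not unless `tape.watch` is called on it. The names `v` and `c` are illustrative only.

```python
import tensorflow as tf

v = tf.Variable(2.0)  # trainable variable: watched automatically
c = tf.constant(2.0)  # constant tensor: not watched unless tape.watch(c) is called

# persistent=True allows gradient() to be called more than once on this tape.
with tf.GradientTape(persistent=True) as tape:
    y = v ** 2
    z = c ** 2

dy_dv = tape.gradient(y, v)  # gradient is available for the variable
dz_dc = tape.gradient(z, c)  # None, because c was never watched
del tape  # release the resources held by the persistent tape

print(dy_dv)
print(dz_dc)
```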


**Methods**

batch_jacobian: Computes and stacks per-example jacobians.

In [None]:
with tf.GradientTape() as g:
  x = tf.constant([[1., 2.], [3., 4.]], dtype=tf.float32)
  g.watch(x)
  y = x * x
batch_jacobian = g.batch_jacobian(y, x)
# batch_jacobian is [[[2,  0], [0,  4]], [[6,  0], [0,  8]]]
batch_jacobian

<tf.Tensor: shape=(2, 2, 2), dtype=float32, numpy=
array([[[2., 0.],
        [0., 4.]],

       [[6., 0.],
        [0., 8.]]], dtype=float32)>

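Each row of x = [[1., 2.], [3., 4.]] is treated as one example. Because y[b, i] = x[b, i]^2 depends only on x[b, i], each per-example Jacobian is diagonal with entries 2 * x[b, i]: diag(2, 4) for the first row and diag(6, 8) for the second.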
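
One way to check the batch_jacobian result is to build the expected per-example diagonal matrices directly. This is a sketch under the assumption that y = x * x elementwise, as in the cell above; `tf.linalg.diag` turns each row of 2 * x into a diagonal matrix.

```python
import tensorflow as tf

x = tf.constant([[1., 2.], [3., 4.]], dtype=tf.float32)
with tf.GradientTape() as g:
    g.watch(x)
    y = x * x

jac = g.batch_jacobian(y, x)        # shape (batch, y_dim, x_dim) = (2, 2, 2)
expected = tf.linalg.diag(2.0 * x)  # one diagonal matrix per row of x

# True when every entry matches within a small tolerance.
matches = bool(tf.reduce_all(tf.abs(jac - expected) < 1e-6))
print(matches)
```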

reset(): Clears all information stored in this tape.

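In [None]:
x = tf.Variable(3.0)
k = 5.0  # illustrative threshold (assumed; the original leaves `loss` and `k` unspecified)
with tf.GradientTape() as t:
  loss = x * x  # placeholder loss, for illustration only
  if loss > k:
    t.reset()  # discard everything recorded on the tape so far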
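
To make the effect of reset() concrete, the following sketch records one term, resets the tape, and then records a second term. Only the operations recorded after the reset contribute to the gradient. The two loss terms are made up for illustration.

```python
import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as t:
    loss = x * x     # recorded, then discarded by reset()
    t.reset()
    loss += 2.0 * x  # recorded after the reset

# Only the 2.0 * x term is differentiated; x * x is treated as a constant.
grad = t.gradient(loss, x)
print(grad)
```

Without the reset, both terms would be differentiated and the gradient at x = 3 would be 2 * 3 + 2 = 8 instead.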

stop_recording(): Temporarily stops recording operations on this tape.

Operations executed while this context manager is active will not be recorded on the tape. This is useful for reducing the memory used by tracing all computations.

In [None]:
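x = tf.Variable(4.0)  # a Variable is watched automatically, unlike a constant

with tf.GradientTape() as tape:
  with tape.stop_recording():
    y = x ** 2  # executed while recording is paused, so it is not on the tape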
dy_dx = tape.gradient(y, x)
print(dy_dx)

None
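
stop_recording() can also pause recording for only part of the computation. In the sketch below, the operation inside the paused block is left off the tape, while the operations before and after it are still differentiated. The variable names and the factor 3.0 are illustrative assumptions.

```python
import tensorflow as tf

x = tf.Variable(4.0)
with tf.GradientTape() as tape:
    y = x ** 2                 # recorded
    with tape.stop_recording():
        y_abs = tf.abs(y)      # bookkeeping only; not recorded
    loss = 3.0 * y             # recording has resumed

# d(3 * x^2)/dx = 6 * x, evaluated at x = 4
grad = tape.gradient(loss, x)
print(grad)
```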
