`tf.GradientTape` records the operations performed on tensors so that TensorFlow can compute gradients through them by automatic differentiation.

##Imports

In [2]:
import tensorflow as tf

##Basics of Gradient Tape

In [3]:
#Define a 2x2 array of 1's 
x = tf.ones((2,2))

with tf.GradientTape() as t:
  #Ask the tape to track x with 'watch'; a plain tensor such as x is not tracked unless it is watched
  t.watch(x)

  #Define y as the sum of the elements in x
  y = tf.reduce_sum(x)

  #Let z be the square of y
  z = tf.square(y)

#Get the derivative of z wrt the original input tensor x
dz_dx = t.gradient(z, x)

print(dz_dx)

tf.Tensor(
[[8. 8.]
 [8. 8.]], shape=(2, 2), dtype=float32)
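Every entry of the gradient is 8 because z = (sum of x)^2, so dz/dx_ij = 2 * sum(x) = 2 * 4. As a sanity check that does not depend on TensorFlow, the sketch below estimates the same gradient with central differences. The helper names `z_of` and `numerical_gradient` are illustrative and not part of any library.

```python
def z_of(x):
    """z = (sum of all elements of x) ** 2, mirroring the cell above."""
    return sum(sum(row) for row in x) ** 2


def numerical_gradient(f, x, eps=1e-4):
    """Central-difference estimate of df/dx for a matrix stored as nested lists."""
    grad = []
    for i, row in enumerate(x):
        grad_row = []
        for j in range(len(row)):
            # Perturb one element at a time, leaving the others unchanged
            plus = [r[:] for r in x]
            minus = [r[:] for r in x]
            plus[i][j] += eps
            minus[i][j] -= eps
            grad_row.append((f(plus) - f(minus)) / (2 * eps))
        grad.append(grad_row)
    return grad


x = [[1.0, 1.0], [1.0, 1.0]]
grad = numerical_gradient(z_of, x)
print(grad)
```

Each printed entry should be approximately 8, matching the tensor returned by `t.gradient(z, x)`.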


##Gradient tape expires after one use by default

If the goal is to compute multiple gradients, note that by default GradientTape is not persistent (`persistent=False`).

This means that the gradient tape expires after it has been used to calculate one gradient.

In [4]:
x = tf.constant(3.0)

#Notice that persistent is False by default
with tf.GradientTape() as t:
  t.watch(x)

  y = x**2

  z = y**2

#Compute dz/dx
#4 * x^3 at x=3 --> 108.0
dz_dx = t.gradient(z, x)
print(dz_dx)

tf.Tensor(108.0, shape=(), dtype=float32)


##Gradient Tape Expired 

Calling `t.gradient` a second time on the non-persistent tape from the previous cell raises a `RuntimeError`.

In [5]:
try:
  dy_dx = t.gradient(y, x)
  print(dy_dx)

except RuntimeError as e:
  print('The error message is:')
  print(e)

The error message is:
A non-persistent GradientTape can only be used to compute one set of gradients (or jacobians)
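"One set of gradients" does not mean one source tensor. A non-persistent tape can still return the gradient of one target with respect to several sources, as long as they are requested in a single `gradient` call. A minimal sketch, where the tensors `a` and `b` and the target `out` are made up for illustration:

```python
import tensorflow as tf

a = tf.constant(2.0)
b = tf.constant(5.0)

with tf.GradientTape() as tape:
    # watch accepts a list, so both constants are tracked
    tape.watch([a, b])
    out = a**2 * b

# One call, two sources: d(out)/da = 2ab and d(out)/db = a^2
da, db = tape.gradient(out, [a, b])
print(da)
print(db)
```

With a = 2 and b = 5 the two gradients are 20 and 4. A second call to `tape.gradient` after this one would raise the same `RuntimeError` as above.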


##Making the gradient tape persistent

Pass `persistent=True` when creating the tape so that `gradient` can be called on it more than once.

In [6]:
x = tf.constant(3.0)

#Set persistent=True so that GradientTape() can be reused
with tf.GradientTape(persistent=True) as t:
  t.watch(x)

  y = x**2

  z = y**2

#Compute dz/dx
dz_dx = t.gradient(z, x)
print(dz_dx)

tf.Tensor(108.0, shape=(), dtype=float32)


Reuse the same tape to compute a second gradient.

In [7]:
dy_dx = t.gradient(y, x)
print(dy_dx)

tf.Tensor(6.0, shape=(), dtype=float32)
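Both results follow from the chain rule, which is what the tape applies when it walks back through the recorded operations. The plain-Python sketch below reproduces the two numbers by hand; it only illustrates the arithmetic and is not how TensorFlow is implemented.

```python
x = 3.0
y = x ** 2              # 9.0

dy_dx = 2 * x           # derivative of x^2 with respect to x
dz_dy = 2 * y           # derivative of y^2 with respect to y
dz_dx = dz_dy * dy_dx   # chain rule: dz/dx = dz/dy * dy/dx

print(dy_dx, dz_dx)     # 6.0 108.0
```

The product 18 * 6 = 108 agrees with the closed form 4 * x^3 at x = 3.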


A persistent tape keeps the resources it recorded until the tape object is garbage collected, so delete the variable `t` once it is no longer needed.

In [8]:
del t

##Nested Gradient Tapes

Nested tapes are used to compute higher-order derivatives.



In [10]:
x = tf.Variable(1.0)

with tf.GradientTape() as tape_2:
  with tf.GradientTape() as tape_1:
    y = x**3

  #The first gradient must be computed inside the outer `with` block so that tape_2 records it
  dy_dx = tape_1.gradient(y, x)
d2y_dx2 = tape_2.gradient(dy_dx, x)

print(dy_dx)
print(d2y_dx2)

tf.Tensor(3.0, shape=(), dtype=float32)
tf.Tensor(6.0, shape=(), dtype=float32)
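Note that the nested example never calls `watch`. That works because `x` is a `tf.Variable`, and trainable variables are watched automatically, whereas the earlier cells used `tf.constant` and had to watch it explicitly. A small sketch of the difference, using the made-up names `v` and `c`:

```python
import tensorflow as tf

v = tf.Variable(2.0)   # trainable variable: watched automatically
c = tf.constant(2.0)   # constant: ignored unless tape.watch(c) is called

with tf.GradientTape() as tape:
    y = v**2 + c**2

dv, dc = tape.gradient(y, [v, c])
print(dv)  # gradient with respect to the variable
print(dc)  # None, because the constant was never watched
```

A `None` gradient is therefore often a sign that a tensor was not watched, rather than that the derivative is zero.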


The first gradient calculation can also be placed inside the inner `with` block; the result is the same.

In [12]:
x = tf.Variable(1.0)

with tf.GradientTape() as tape_2:
  with tf.GradientTape() as tape_1:
    y = x**3

    #The first gradient can also be inside the inner `with` block
    dy_dx = tape_1.gradient(y, x)
d2y_dx2 = tape_2.gradient(dy_dx, x)

print(dy_dx)
print(d2y_dx2)

tf.Tensor(3.0, shape=(), dtype=float32)
tf.Tensor(6.0, shape=(), dtype=float32)
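The same pattern extends to more levels: each additional tape must record the gradient computation of the tape inside it. As a sketch, a third derivative could be written with three tapes. The function x^4 and the name `tape_3` are chosen here purely for illustration.

```python
import tensorflow as tf

x = tf.Variable(2.0)

with tf.GradientTape() as tape_3:
    with tf.GradientTape() as tape_2:
        with tf.GradientTape() as tape_1:
            y = x**4
        # Computed while tape_2 and tape_3 are still recording
        dy_dx = tape_1.gradient(y, x)      # 4x^3
    # Computed while tape_3 is still recording
    d2y_dx2 = tape_2.gradient(dy_dx, x)    # 12x^2
d3y_dx3 = tape_3.gradient(d2y_dx2, x)      # 24x

print(dy_dx)
print(d2y_dx2)
print(d3y_dx3)
```

At x = 2 the three values should be approximately 32, 48 and 48. If a gradient call were moved outside the `with` block of the next tape out, that outer tape would have nothing recorded for it and its own gradient would come back as `None`.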
