In [0]:
#https://www.tensorflow.org/tutorials/customization/autodiff

In [2]:
from __future__ import absolute_import, division, print_function, unicode_literals

# TensorFlow and tf.keras
try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass
import tensorflow as tf
from tensorflow import keras
import numpy as np

TensorFlow 2.x selected.


#Gradient tapes

TensorFlow provides the `tf.GradientTape` API for automatic differentiation, that is, computing the gradient of a computation with respect to its input variables. TensorFlow "records" all operations executed inside the context of a `tf.GradientTape` onto a "tape". TensorFlow then uses that tape and the gradients associated with each recorded operation to compute the gradients of a "recorded" computation using reverse-mode differentiation.

For example:

In [0]:
x = tf.ones((2, 2))  # 2x2 tensor of ones

with tf.GradientTape() as t:
  t.watch(x)  # x is a plain tensor, not a tf.Variable, so the tape must be told to track it
  y = tf.reduce_sum(x)  # sums all four elements into a scalar: y = 4
  z = tf.multiply(y, y)  # z = y * y = 16

# Derivative of z with respect to the original input tensor x.
# z = (sum of x)^2, so dz/dx[i][j] = 2 * (sum of x) = 2 * 4 = 8 for every element.
dz_dx = t.gradient(z, x)  # (2, 2) tensor filled with 8.0
for i in [0, 1]:
  for j in [0, 1]:
    assert dz_dx[i][j].numpy() == 8.0  # assert raises AssertionError if the condition is False


In [17]:
dz_dx

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[8., 8.],
       [8., 8.]], dtype=float32)>
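
To make the "tape" idea concrete, here is a toy reverse-mode tape in plain Python that reproduces the result above. It is only a sketch of the general technique, not how TensorFlow is implemented: `Value`, `ToyTape`, and their methods are hypothetical names invented for this illustration, and only scalar add and multiply are supported.

```python
# Toy reverse-mode tape (illustration only, not TensorFlow's implementation).

class Value:
    """Hypothetical scalar wrapper; it only holds a number."""
    def __init__(self, data):
        self.data = float(data)


class ToyTape:
    """Records each operation as (output, [(input, local_gradient), ...])."""
    def __init__(self):
        self.records = []

    def add(self, a, b):
        out = Value(a.data + b.data)
        # d(a + b)/da = 1 and d(a + b)/db = 1
        self.records.append((out, [(a, 1.0), (b, 1.0)]))
        return out

    def mul(self, a, b):
        out = Value(a.data * b.data)
        # d(a * b)/da = b and d(a * b)/db = a
        self.records.append((out, [(a, b.data), (b, a.data)]))
        return out

    def gradient(self, target, sources):
        # Walk the tape backwards, accumulating upstream * local gradients.
        grads = {id(target): 1.0}
        for out, inputs in reversed(self.records):
            upstream = grads.get(id(out), 0.0)
            for inp, local in inputs:
                grads[id(inp)] = grads.get(id(inp), 0.0) + upstream * local
        return [grads.get(id(s), 0.0) for s in sources]


tape = ToyTape()
xs = [Value(1.0) for _ in range(4)]  # the four elements of the 2x2 matrix of ones

y = xs[0]
for v in xs[1:]:
    y = tape.add(y, v)               # y = sum of all elements = 4
z = tape.mul(y, y)                   # z = y * y = 16

grads = tape.gradient(z, xs)         # one gradient per element
print(z.data, grads)
```

Because `y` appears twice in `mul(y, y)`, its gradient is accumulated twice (4 + 4 = 8), and each addition passes that 8 unchanged to every element. This is why every entry of `dz_dx` is 8.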

You can also request gradients of the output with respect to intermediate values computed during a "recorded" `tf.GradientTape` context.

In [0]:
x = tf.ones((2, 2))

with tf.GradientTape() as t:
  t.watch(x)
  y = tf.reduce_sum(x)
  z = tf.multiply(y, y)

# Use the tape to compute the derivative of z with respect to the
# intermediate value y.
dz_dy = t.gradient(z, y)  # scalar 8.0, since z = y^2 gives dz/dy = 2 * y = 2 * 4
assert dz_dy.numpy() == 8.0


In [20]:
dz_dy

<tf.Tensor: shape=(), dtype=float32, numpy=8.0>
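
The examples so far call `t.watch(x)` because `x` is a plain tensor. As a small sketch of what that call changes (the variable names here are made up for illustration), the snippet below compares an unwatched constant, a watched constant, and a `tf.Variable`, which the tape tracks automatically when it is trainable.

```python
import tensorflow as tf

c = tf.constant(3.0)  # plain tensor: not tracked unless watched
v = tf.Variable(3.0)  # trainable variable: tracked automatically

# Constant without watch: the tape records nothing about c.
with tf.GradientTape() as t_unwatched:
    y_c = c * c
grad_unwatched = t_unwatched.gradient(y_c, c)

# Constant with watch: the tape records operations involving c.
with tf.GradientTape() as t_watched:
    t_watched.watch(c)
    y_c = c * c
grad_watched = t_watched.gradient(y_c, c)

# Variable: no watch call needed.
with tf.GradientTape() as t_var:
    y_v = v * v
grad_var = t_var.gradient(y_v, v)

print(grad_unwatched, grad_watched, grad_var)
```

The unwatched case yields `None` instead of a tensor, which is an easy mistake to miss when a gradient silently disappears.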

By default, the resources held by a GradientTape are released as soon as the `GradientTape.gradient()` method is called. To compute multiple gradients over the same computation, create a persistent gradient tape. This allows multiple calls to the `gradient()` method, because resources are released only when the tape object is garbage collected. For example:

In [0]:
x = tf.constant(3.0)  # scalar constant 3.0
with tf.GradientTape(persistent=True) as t:
  t.watch(x)
  y = x * x  # y = x^2 = 9
  z = y * y  # z = y^2 = x^4 = 81
dz_dx = t.gradient(z, x)  # 108.0: z = x^4, so by the power rule dz/dx = 4 * x^3 = 4 * 27
dy_dx = t.gradient(y, x)  # 6.0: y = x^2, so dy/dx = 2 * x
del t  # Drop the reference to the tape
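
For contrast, here is a minimal sketch of what happens without `persistent=True`: the first `gradient()` call succeeds and the second one fails. The `second_call_failed` flag is just a name used for this illustration.

```python
import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as t:  # persistent defaults to False
    t.watch(x)
    y = x * x
    z = y * y

dz_dx = t.gradient(z, x)      # first call works and releases the tape's resources

second_call_failed = False
try:
    t.gradient(y, x)          # second call on a non-persistent tape
except RuntimeError:
    second_call_failed = True

print(dz_dx.numpy(), second_call_failed)
```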


#Recording control flow

Because tapes record operations as they are executed, Python control flow (using ifs and whiles for example) is naturally handled:

In [0]:
# f returns x^k, where k is the number of i in range(y) with 1 < i < 5.
# grad returns the derivative of that result with respect to x, i.e. k * x^(k-1).
def f(x, y):
  output = 1.0
  for i in range(y):
    if i > 1 and i < 5:
      output = tf.multiply(output, x)
  return output

def grad(x, y):
  with tf.GradientTape() as t:
    t.watch(x)
    out = f(x, y)
  return t.gradient(out, x)

x = tf.convert_to_tensor(2.0) #passing a tensor

assert grad(x, 6).numpy() == 12.0
assert grad(x, 5).numpy() == 12.0
assert grad(x, 4).numpy() == 4.0
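
The asserted values can be checked by hand: the loop multiplies by `x` once for each `i` in `range(y)` with `1 < i < 5`, so the output is `x^k` and its derivative is `k * x^(k-1)`. The sketch below repeats `f` and `grad` so that it runs on its own, and compares the tape's result with that formula; the `results` dictionary is only an illustrative helper.

```python
import tensorflow as tf

def f(x, y):
    output = 1.0
    for i in range(y):
        if i > 1 and i < 5:
            output = tf.multiply(output, x)
    return output

def grad(x, y):
    with tf.GradientTape() as t:
        t.watch(x)
        out = f(x, y)
    return t.gradient(out, x)

x = tf.convert_to_tensor(2.0)

results = {}
for y in (4, 5, 6):
    # k = number of multiplications that actually ran for this y
    k = sum(1 for i in range(y) if 1 < i < 5)
    expected = k * 2.0 ** (k - 1)          # k * x^(k-1) at x = 2
    results[y] = (k, expected, float(grad(x, y).numpy()))

print(results)
```

For `y = 4` only `i = 2, 3` pass the condition (`k = 2`, gradient `2 * x = 4`), while `y = 5` and `y = 6` both stop at `i = 4` (`k = 3`, gradient `3 * x^2 = 12`).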


#Higher-order gradients

Operations inside the `GradientTape` context manager are recorded for automatic differentiation. If gradients are computed in that context, then the gradient computation is recorded as well. As a result, the exact same API works for higher-order gradients. For example:

In [0]:
x = tf.Variable(1.0)  # Create a TensorFlow variable initialized to 1.0

with tf.GradientTape() as t:
  with tf.GradientTape() as t2:
    y = x * x * x
  # Compute the gradient inside the 't' context manager
  # which means the gradient computation is differentiable as well.
  dy_dx = t2.gradient(y, x)  # 3.0: y = x^3, so dy/dx = 3 * x^2 = 3 at x = 1
d2y_dx2 = t.gradient(dy_dx, x)  # 6.0: d2y/dx2 = 6 * x = 6 at x = 1

assert dy_dx.numpy() == 3.0
assert d2y_dx2.numpy() == 6.0


<tf.Tensor: shape=(), dtype=float32, numpy=6.0>
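
The same nesting pattern can be extended by one more level. As a sketch, the snippet below evaluates the first three derivatives of `y = x^3` at `x = 2` (a different point from the cell above, chosen so that the first and second derivatives do not coincide with the values at `x = 1`); the tape names `t1`, `t2`, `t3` are arbitrary.

```python
import tensorflow as tf

x = tf.Variable(2.0)

with tf.GradientTape() as t3:
    with tf.GradientTape() as t2:
        with tf.GradientTape() as t1:
            y = x * x * x
        # Computed inside t2, so this gradient computation is itself recorded.
        dy_dx = t1.gradient(y, x)        # 3 * x^2
    # Computed inside t3, so it can be differentiated once more.
    d2y_dx2 = t2.gradient(dy_dx, x)      # 6 * x
d3y_dx3 = t3.gradient(d2y_dx2, x)        # 6

print(dy_dx.numpy(), d2y_dx2.numpy(), d3y_dx3.numpy())
```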