## Introduction
- Track a computation and compute its gradients with `tf.GradientTape`

In [2]:
import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as tape:
    tape.watch(x)
    y = x ** 3

print(tape.gradient(y, x).numpy())

27.0


- By default, `GradientTape` doesn't track constants, so we must explicitly tell it to with `tape.watch(variable)`
- Then perform some computation on the variables we are watching
- Calculate the gradient of a result w.r.t. a variable with `tape.gradient(target, sources)`. Note that `tape.gradient` returns an EagerTensor, which you can convert to an ndarray with `.numpy()`
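
As a small sketch of the first bullet, the snippet below compares a constant the tape was told to watch with one it was not. The names `unwatched` and `watched` are illustrative and not part of the original notebook.

```python
import tensorflow as tf

unwatched = tf.constant(3.0)
watched = tf.constant(3.0)

with tf.GradientTape() as tape:
    tape.watch(watched)  # only this constant is tracked
    y = unwatched ** 3 + watched ** 3

# Ask for both gradients in one call, because a non-persistent tape
# only allows a single call to gradient().
grad_unwatched, grad_watched = tape.gradient(y, [unwatched, watched])

print(grad_unwatched)        # None: the tape never tracked this constant
print(grad_watched.numpy())  # 27.0
```

A gradient of `None` is how the tape signals that it has no recorded path from the target to that source.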

## Automatically Watching Variables

- If `x` were a trainable variable instead of a constant, there would be no need to tell the tape to watch it

In [3]:
x = tf.Variable(3.0, trainable=True)

with tf.GradientTape() as tape:
    y = x ** 3
print(tape.gradient(y, x).numpy())

27.0


## `watch_accessed_variables=False`

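- If we don't want `GradientTape` to watch all trainable variables automatically, we can pass `watch_accessed_variables=False`. Nothing is tracked unless we call `tape.watch` ourselves, so the gradient below is `None`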

In [6]:
x = tf.Variable(3.0, trainable=True)

with tf.GradientTape(watch_accessed_variables=False) as tape:
    y = x ** 3

# x was never watched, so the tape has no gradient for it
print(tape.gradient(y, x))

None

If you have many trainable variables and are not optimizing all of them at once, you may want to set `watch_accessed_variables=False` and watch only the ones you need, to protect yourself from mistakes.
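
A minimal sketch of that pattern, assuming two hypothetical variables named `frozen` and `tuned` where only `tuned` should receive a gradient:

```python
import tensorflow as tf

frozen = tf.Variable(2.0, trainable=True)
tuned = tf.Variable(3.0, trainable=True)

with tf.GradientTape(watch_accessed_variables=False) as tape:
    tape.watch(tuned)  # opt in explicitly; `frozen` stays untracked
    y = frozen * tuned ** 2

grad_frozen, grad_tuned = tape.gradient(y, [frozen, tuned])

print(grad_frozen)         # None
print(grad_tuned.numpy())  # 12.0, i.e. 2 * frozen * tuned
```

Because `frozen` comes back as `None`, an update step that accidentally includes it fails loudly rather than silently changing a variable you meant to leave alone.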

## Higher-Order Derivatives

In [7]:
x = tf.Variable(3.0, trainable=True)

with tf.GradientTape() as tape1:
    with tf.GradientTape() as tape2:
        y = x ** 3
    order_1 = tape2.gradient(y, x)
order_2 = tape1.gradient(order_1, x)

print(order_2.numpy())

18.0
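
The printed value can be checked by hand. For $y = x^3$ evaluated at $x = 3$:

$$
\frac{dy}{dx} = 3x^2 = 27, \qquad \frac{d^2y}{dx^2} = 6x = 18
$$

The inner tape (`tape2`) produces the first derivative, and because that gradient computation happens inside the `tape1` context, the outer tape records it and can differentiate it again.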


## `persistent=True`
If we were to run the following, we might expect both gradients to be printed:

In [8]:
a, b = tf.Variable(6.0, trainable=True), tf.Variable(2.0, trainable=True)
with tf.GradientTape() as tape:
    y1 = a ** 2
    y2 = b ** 3

print(tape.gradient(y1, a).numpy())
print(tape.gradient(y2, b).numpy())

12.0


RuntimeError: GradientTape.gradient can only be called once on non-persistent tapes.

- But in reality, calling `tape.gradient` a second time raises an error.
- This is because, as soon as `tape.gradient` is called, the `GradientTape` releases the resources it holds in order to save memory.
- If we want to call `tape.gradient` more than once on the same tape, we can set `persistent=True`

In [9]:
a, b = tf.Variable(6.0, trainable=True), tf.Variable(2.0, trainable=True)
with tf.GradientTape(persistent=True) as tape:
    y1 = a ** 2
    y2 = b ** 3

print(tape.gradient(y1, a).numpy())
print(tape.gradient(y2, b).numpy())

12.0
12.0
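
A persistent tape keeps its recorded operations until the tape object itself is garbage collected. As a minimal sketch (the names `grad_a` and `grad_b` are illustrative), the reference can be dropped explicitly once every gradient has been computed:

```python
import tensorflow as tf

a = tf.Variable(6.0, trainable=True)
b = tf.Variable(2.0, trainable=True)

with tf.GradientTape(persistent=True) as tape:
    y1 = a ** 2
    y2 = b ** 3

grad_a = tape.gradient(y1, a)
grad_b = tape.gradient(y2, b)

# Drop the reference once all gradients are computed so the tape's
# recorded operations can be garbage collected.
del tape

print(grad_a.numpy(), grad_b.numpy())
```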


## `stop_recording`
- `tape.stop_recording()` temporarily pauses the tape's recording. Operations inside the block are not traced, which reduces overhead
- It is more readable to use `stop_recording` blocks several times to calculate gradients in the middle of a function than to calculate all the gradients at the end of the function.
- For example, I prefer:

In [11]:
a, b = map(lambda x: tf.Variable(x, trainable=True), [6.0, 2.0])
with tf.GradientTape(persistent=True) as tape:
    y1 = a ** 2
    with tape.stop_recording():
        print(tape.gradient(y1, a).numpy())

    y2 = b ** 3
    with tape.stop_recording():
        print(tape.gradient(y2, b).numpy())

12.0
12.0


to:

In [12]:
a, b = map(lambda x: tf.Variable(x, trainable=True), [6.0, 2.0])
with tf.GradientTape(persistent=True) as tape:
    y1 = a ** 2
    y2 = b ** 3

print(tape.gradient(y1, a).numpy())
print(tape.gradient(y2, b).numpy())

12.0
12.0


## `reset()`
- Clears all information stored in this tape.

In [14]:
a, b = map(lambda x: tf.Variable(x, trainable=True), [6.0, 2.0])
with tf.GradientTape() as tape:
    y1 = a ** 2
with tf.GradientTape() as tape:
    y2 = b ** 3
print(tape.gradient(y2, b).numpy())

12.0


This is equivalent to:

In [15]:
a, b = map(lambda x: tf.Variable(x, trainable=True), [6.0, 2.0])
with tf.GradientTape() as tape:
    y1 = a ** 2
    tape.reset()
    y2 = b ** 3
print(tape.gradient(y2, b).numpy())

12.0


In [17]:
print(tape.watched_variables())

(<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=2.0>,)
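
To make the effect of `reset()` visible, the sketch below counts the watched variables just before and just after the call. The names `watched_before` and `watched_after` are illustrative, and the sketch assumes `watched_variables()` can be queried while the tape is still recording.

```python
import tensorflow as tf

a = tf.Variable(6.0, trainable=True)
b = tf.Variable(2.0, trainable=True)

with tf.GradientTape() as tape:
    y1 = a ** 2
    watched_before = len(tape.watched_variables())  # `a` has been recorded
    tape.reset()
    watched_after = len(tape.watched_variables())   # the record is cleared
    y2 = b ** 3

grad_b = tape.gradient(y2, b)

print(watched_before, watched_after)
print(grad_b.numpy())
```

After the reset, only operations recorded from that point on (here `b ** 3`) are available for differentiation.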


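# Advanced Uses
## Linear Regression
- To start off the more advanced uses of `GradientTape`, let's fit a simple linear regression. First we define the loss function, the training data, and the trainable variables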

In [22]:
import numpy as np
import random
# Loss function
def loss(preds, labels):
    return tf.abs(labels - preds)

# training data (float32 to match the dtype of the trainable variables)
train_X = np.asarray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=np.float32)
train_y = np.asarray([i*10+5 for i in train_X], dtype=np.float32) # y = 10x+5

# Trainable variables
a = tf.Variable(random.random(), trainable=True)
b = tf.Variable(random.random(), trainable=True)

Then, we can go ahead and define our step function. The step function will be run every epoch to update the trainable variables, `a` and `b`.

In [28]:
def step(X, y):
    with tf.GradientTape(persistent=True) as tape:
        # Predict
        preds = a * X + b
        # Calculate loss
        reg_loss = loss(preds, y)

    # Calculate gradients
    a_grad, b_grad = tape.gradient(reg_loss, (a, b))

    # update variables
    a.assign_sub(a_grad * 0.001)
    b.assign_sub(b_grad * 0.001)

In [29]:
for epoch in range(100000):
    step(train_X, train_y)

# The model is linear (y = a*x + b), so only a and b are reported
print(f'y ≈ {a.numpy()}x + {b.numpy()}')
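
As a variation on the loop above, here is a minimal sketch that swaps the absolute-error loss for mean squared error, which is not what the original notebook uses. The learning rate, epoch count, zero initial values, and the names `learning_rate` and `mse` are all assumptions chosen so the example converges quickly.

```python
import numpy as np
import tensorflow as tf

# Same synthetic data as above: y = 10x + 5
train_X = np.arange(11, dtype=np.float32)
train_y = 10.0 * train_X + 5.0

a = tf.Variable(0.0)
b = tf.Variable(0.0)
learning_rate = 0.01  # assumed value, small enough for this data to converge


def step(X, y):
    with tf.GradientTape() as tape:
        preds = a * X + b
        mse = tf.reduce_mean(tf.square(y - preds))

    # One gradient call, so a non-persistent tape is enough
    a_grad, b_grad = tape.gradient(mse, (a, b))

    a.assign_sub(learning_rate * a_grad)
    b.assign_sub(learning_rate * b_grad)
    return mse


for epoch in range(2000):
    final_loss = step(train_X, train_y)

print(f'y ≈ {a.numpy()}x + {b.numpy()} (loss {final_loss.numpy()})')
```

Since the data was generated from y = 10x + 5, the fitted coefficients should land close to 10 and 5. Reducing the loss to a scalar before differentiating also makes it explicit what quantity the gradients describe.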

## Conclusion

`tf.GradientTape` is one of the most useful tools a machine learning engineer can have in their arsenal. Its style of programming combines the beauty of mathematics with the power and simplicity of TensorFlow and Keras.