In [12]:
import tensorflow as tf



In [13]:
x = tf.range(4, dtype=tf.float32)
x

<tf.Tensor: shape=(4,), dtype=float32, numpy=array([0., 1., 2., 3.], dtype=float32)>

In [15]:
x = tf.Variable(x)
x

<tf.Variable 'Variable:0' shape=(4,) dtype=float32, numpy=array([0., 1., 2., 3.], dtype=float32)>
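`tf.GradientTape` automatically tracks trainable `tf.Variable` objects, which is why `x` is wrapped in a `tf.Variable` above. A plain `tf.Tensor` is not tracked unless you call `tape.watch` on it; otherwise `tape.gradient` returns `None` for that source. The minimal sketch below contrasts the two cases. The names `x_const`, `tape`, `y_unwatched` and so on are illustrative, chosen so the snippet does not overwrite variables used by later cells.

```python
import tensorflow as tf

x_const = tf.range(4, dtype=tf.float32)  # a plain tf.Tensor, not a tf.Variable

# Not watched: the tape records nothing involving x_const
with tf.GradientTape() as tape:
    y_unwatched = tf.reduce_sum(x_const ** 2)
grad_unwatched = tape.gradient(y_unwatched, x_const)

# Watched: tape.watch tells the tape to track x_const explicitly
with tf.GradientTape() as tape:
    tape.watch(x_const)
    y_watched = tf.reduce_sum(x_const ** 2)
grad_watched = tape.gradient(y_watched, x_const)  # d/dx sum(x^2) = 2x

print(grad_unwatched)  # None
print(grad_watched.numpy())
```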

In [20]:
# Record all computations onto a tape
with tf.GradientTape() as t:
    y = 2 * tf.tensordot(x, x, axes=1)
y

<tf.Tensor: shape=(), dtype=float32, numpy=28.0>

In [21]:
x_grad = t.gradient(y, x)
x_grad

<tf.Tensor: shape=(4,), dtype=float32, numpy=array([ 0.,  4.,  8., 12.], dtype=float32)>
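Since y = 2 xᵀx, the analytical gradient with respect to x is 4x, which matches the output above. A small self-contained check (the names `x_check`, `y_check` and `grad_check` are illustrative, so the notebook's own `x` and `y` are left untouched):

```python
import tensorflow as tf

x_check = tf.Variable(tf.range(4, dtype=tf.float32))
with tf.GradientTape() as tape:
    y_check = 2 * tf.tensordot(x_check, x_check, axes=1)  # y = 2 * x^T x = 28
grad_check = tape.gradient(y_check, x_check)

# The analytical gradient of 2 * x^T x with respect to x is 4x
matches = bool(tf.reduce_all(grad_check == 4 * x_check))
print(matches)  # True
```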

In [22]:
with tf.GradientTape() as t:
    y = tf.reduce_sum(x)
t.gradient(y, x)  # Computed afresh by the new tape; nothing accumulates from the previous call

<tf.Tensor: shape=(4,), dtype=float32, numpy=array([1., 1., 1., 1.], dtype=float32)>
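Each `gradient` call returns a newly computed tensor rather than adding into a stored gradient. A related detail: a tape created with the default `persistent=False` releases its recorded operations after the first `gradient` call, so a second call on the same tape raises a `RuntimeError`. The sketch below demonstrates this; the variable names are illustrative.

```python
import tensorflow as tf

x_once = tf.Variable(tf.range(4, dtype=tf.float32))
with tf.GradientTape() as tape:  # persistent=False is the default
    y_once = tf.reduce_sum(x_once)

first_grad = tape.gradient(y_once, x_once)  # all ones

try:
    tape.gradient(y_once, x_once)  # second call on the same non-persistent tape
    second_call_failed = False
except RuntimeError as err:
    second_call_failed = True
    print("RuntimeError:", err)
```

This is why the detaching example further down creates its tape with `persistent=True` before calling `t.gradient` twice.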

In [31]:
%%markdown
#### Trying a simpler example


In [32]:
with tf.GradientTape() as t:
    y = (2 * (x ** 2)) + (3 * x) + 10
t.gradient(y, x)  # dy/dx = 4x + 3, evaluated elementwise

<tf.Tensor: shape=(4,), dtype=float32, numpy=array([ 3.,  7., 11., 15.], dtype=float32)>
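Here y is a vector, not a scalar. For a non-scalar target, `tape.gradient` differentiates the sum of the target's entries; because each yᵢ depends only on xᵢ, the result is the elementwise derivative 4x + 3, which gives 3, 7, 11, 15 for x = 0, 1, 2, 3. A sketch that checks this against the formula (names are illustrative):

```python
import tensorflow as tf

x_poly = tf.Variable(tf.range(4, dtype=tf.float32))
with tf.GradientTape() as tape:
    # y_poly has one entry per element of x_poly
    y_poly = (2 * (x_poly ** 2)) + (3 * x_poly) + 10
# For a non-scalar target, tape.gradient differentiates the sum of its entries;
# since y_i depends only on x_i, this yields dy_i/dx_i elementwise
grad_poly = tape.gradient(y_poly, x_poly)

expected = 4 * x_poly + 3  # d/dx (2x^2 + 3x + 10) = 4x + 3
print(grad_poly.numpy(), bool(tf.reduce_all(grad_poly == expected)))
```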

In [35]:
%%markdown
### Detaching Computation

Sometimes we wish to move some calculations outside of the recorded computational graph.
For example, say that we use the input to create some auxiliary intermediate terms for which
we do not want to compute a gradient. In this case, we need to detach the respective computational
graph from the final result. The following toy example makes this clearer:
suppose we have z = x * y and y = x * x, but we want to focus on the direct influence of x on z
rather than the influence conveyed via y. In this case, we can create a new variable u that takes
the same value as y but whose provenance (how it was created) has been wiped out. Thus u has no
ancestors in the graph, and gradients do not flow through u to x. In TensorFlow, tf.stop_gradient
produces such a u. Taking the gradient of z = x * u then yields u (not 3 * x * x, as you might
have expected since z = x * x * x).


In [83]:
# Set persistent=True to preserve the compute graph.
# This lets us run t.gradient more than once
with tf.GradientTape(persistent=True) as t:
    y = x ** 3
    u = tf.stop_gradient(y)
    z = 2 * u * x

x_grad = t.gradient(z, x)
x_grad

<tf.Tensor: shape=(4,), dtype=float32, numpy=array([ 0.,  2., 16., 54.], dtype=float32)>
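The cell above uses y = x ** 3 and z = 2 * u * x. The toy example from the prose (y = x * x, z = x * u) can be sketched the same way, side by side with the undetached version, to show that `tf.stop_gradient` changes the gradient from 3 * x * x to u. Variable names are illustrative and distinct from the notebook's, so the persistent tape `t` and `u` used by the next cells are not overwritten.

```python
import tensorflow as tf

x_toy = tf.Variable(tf.range(4, dtype=tf.float32))

# Detached: u_toy carries the value of y = x * x but no link back to x
with tf.GradientTape() as tape:
    y_toy = x_toy * x_toy
    u_toy = tf.stop_gradient(y_toy)
    z_toy = x_toy * u_toy
grad_detached = tape.gradient(z_toy, x_toy)  # equals u = x * x, i.e. 0, 1, 4, 9

# Not detached: z = x * x * x, so gradients also flow through y
with tf.GradientTape() as tape:
    y_full = x_toy * x_toy
    z_full = x_toy * y_full
grad_full = tape.gradient(z_full, x_toy)  # equals 3 * x * x, i.e. 0, 3, 12, 27

print(grad_detached.numpy())
print(grad_full.numpy())
```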

In [84]:
x_grad == 2 * u

<tf.Tensor: shape=(4,), dtype=bool, numpy=array([ True,  True,  True,  True])>

In [85]:
y_grad_based_on_x = t.gradient(y, x)
y_grad_based_on_x

<tf.Tensor: shape=(4,), dtype=float32, numpy=array([ 0.,  3., 12., 27.], dtype=float32)>

In [86]:
y_grad_based_on_x == 3 * (x ** 2)

<tf.Tensor: shape=(4,), dtype=bool, numpy=array([ True,  True,  True,  True])>

In [73]:
%%markdown
### Gradients and Python Control Flow

So far we have reviewed cases where the path from input to output was well defined via a function such as z = x * x * x. Programming offers us a lot more freedom in how we compute results. For instance, we can make them depend on auxiliary variables or condition choices on intermediate results. One benefit of automatic differentiation is that even if building the computational graph of a function required passing through a maze of Python control flow (e.g., conditionals, loops, and arbitrary function calls), we can still calculate the gradient of the resulting variable. To illustrate this, consider the following code snippet, where both the number of iterations of the while loop and the outcome of the if statement depend on the value of the input a.


In [74]:
def f(a):
    b = a * 2
    while tf.norm(b) < 1000:
        b = b * 2
    if tf.reduce_sum(b) > 0:
        c = b
    else:
        c = 100 * b
    return c

In [75]:
%%markdown
Below, we call this function, passing in a random value as input. Since the input is random, we do not know in advance what form the computational graph will take. However, whenever we execute f(a) on a specific input, we realize a specific computational graph: the tape records only the operations that actually ran, and we can subsequently run backward through them.


In [76]:
a = tf.Variable(tf.random.normal(shape=()))
with tf.GradientTape() as t:
    d = f(a)
d_grad = t.gradient(d, a)
d_grad

<tf.Tensor: shape=(), dtype=float32, numpy=1024.0>
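Although the control flow depends on a, for any fixed input f is linear in a: it doubles b some number of times and possibly multiplies by 100. So f(a) = k * a for a constant k determined by the path taken, and the gradient is k = f(a) / a. The sketch below checks this for a random input and also evaluates two fixed inputs that take different branches. The helper `value_and_grad_f` and the inputs 0.5 and -0.5 are hypothetical illustrative choices; `np.isclose` is used for the random case because the `else` branch's multiplication by 100 can introduce rounding, so an exact `==` may occasionally fail.

```python
import numpy as np
import tensorflow as tf

def f(a):
    # Same control-flow function as above
    b = a * 2
    while tf.norm(b) < 1000:
        b = b * 2
    if tf.reduce_sum(b) > 0:
        c = b
    else:
        c = 100 * b
    return c

def value_and_grad_f(start):
    """Hypothetical helper: return (a, f(a), df/da) for a scalar start value."""
    a_var = tf.Variable(start, dtype=tf.float32)
    with tf.GradientTape() as tape:
        d_val = f(a_var)
    return a_var, d_val, tape.gradient(d_val, a_var)

# Random input: f is linear in a, so df/da should equal f(a) / a
a_rand, d_rand, grad_rand = value_and_grad_f(tf.random.normal(shape=()))
linear_ok = bool(np.isclose(grad_rand.numpy(), (d_rand / a_rand).numpy()))
print(linear_ok)

# Fixed inputs: 0.5 takes the `if` branch, -0.5 takes the `else` branch
fixed = {s: value_and_grad_f(s) for s in (0.5, -0.5)}
for s, (_, d_val, g_val) in fixed.items():
    print(f"a = {s}: f(a) = {float(d_val)}, df/da = {float(g_val)}")
```

For a = 0.5, b starts at 1 and is doubled ten times to 1024, so f(a) = 1024 and df/da = 2048. For a = -0.5, b reaches -1024 and the `else` branch multiplies by 100, giving f(a) = -102400 and df/da = 204800. Both gradients equal f(a) / a, as the linearity argument predicts.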