In [1]:
import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf

### .assign_add() blocks gradients

**State stops gradients. When you read from a stateful object, the tape can only observe the current state, not the history that led to it.**

**A tf.Variable has internal state: its value.** When you use the variable, that state is read. **It's normal to calculate a gradient with respect to a variable, but the variable's state blocks gradient calculations from going farther back.** For example:

In [2]:
x0 = tf.Variable(3.0)
x1 = tf.Variable(0.0)

with tf.GradientTape(persistent=True) as tape:
  # Update x1 = x1 + x0.
  x1.assign_add(x0)
  # The tape starts recording from x1.
  y = x1**2   # y = (x1 + x0)**2

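# This doesn't work: assign_add() copied x0's value into x1's state,
# so the tape has no recorded path from y back to x0.
print(tape.gradient(y, x0))   # dy/dx0 would be 2*(x1 + x0), but the result is None
print(tape.gradient(y, x1))   # dy/dx1 = 2*x1 = 6.0, using the updated x1 = 3.0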

None
tf.Tensor(6.0, shape=(), dtype=float32)
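One way to keep the dependency on x0 visible to the tape is to compute the sum as an ordinary tensor instead of updating x1 in place. The sketch below assumes that is acceptable for your use case; `x1_new` is a hypothetical name introduced only for illustration. If x1 must still hold the updated value, it is written back with `assign()` after the gradient has been taken.

```python
import tensorflow as tf

x0 = tf.Variable(3.0)
x1 = tf.Variable(0.0)

with tf.GradientTape() as tape:
    # Hypothetical x1_new: an ordinary tensor rather than an in-place update,
    # so the tape records that the result depends on both x0 and x1.
    x1_new = x1 + x0
    y = x1_new**2

# Trainable variables read inside the tape are watched automatically.
grad_x0, grad_x1 = tape.gradient(y, [x0, x1])
print(grad_x0)  # dy/dx0 = 2*(x1 + x0) = 6.0
print(grad_x1)  # dy/dx1 = 2*(x1 + x0) = 6.0

# Write the new value back only after the gradient has been computed.
x1.assign(x1_new)
```

The state update still happens, but it is moved outside the differentiated computation, so it no longer hides x0 from the tape.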


Similarly, tf.data.Dataset iterators and tf.queues are stateful, and will stop all gradients on tensors that pass through them.
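As a minimal sketch of the dataset case, the example below compares a value computed directly from a variable with the same value passed through a tf.data pipeline. The variable `w` and the multiplier 3.0 are arbitrary choices for illustration. The tensor returned by the iterator comes from stateful iterator ops, so the tape cannot connect it back to `w`.

```python
import tensorflow as tf

w = tf.Variable(2.0)

with tf.GradientTape(persistent=True) as tape:
    # Direct computation: the tape records every op from w to the result.
    direct = (w * 3.0) ** 2

    # Same value, but routed through a dataset iterator. The element is
    # produced by the iterator's state, not by an op the tape can trace to w.
    ds = tf.data.Dataset.from_tensors(w * 3.0)
    via_dataset = next(iter(ds)) ** 2

print(tape.gradient(direct, w))       # d/dw (3w)^2 = 18w = 36.0
print(tape.gradient(via_dataset, w))  # None
del tape  # release the persistent tape's resources
```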

### Drawbacks of tf.py_function

tf.py_function is not portable or particularly performant, cannot be saved with SavedModel, and does not work well in distributed (multi-GPU, TPU) setups.
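For context, here is a minimal sketch of what tf.py_function does: it wraps an arbitrary Python callable (here a hypothetical helper, `numpy_square`, that uses NumPy) as a single opaque op, even inside a tf.function. Because the wrapped body is plain Python rather than TensorFlow ops, it is not serialized into the graph, which is where the portability and SavedModel limitations come from.

```python
import numpy as np
import tensorflow as tf

def numpy_square(x):
    # Hypothetical helper: runs as ordinary Python/NumPy in the calling
    # process, outside the TensorFlow graph.
    return np.square(x.numpy())

@tf.function
def f(x):
    # tf.py_function embeds the Python call as one opaque op; the graph only
    # references the callable, it does not contain its body.
    return tf.py_function(numpy_square, inp=[x], Tout=tf.float32)

print(f(tf.constant(3.0)))  # the squared value, 9.0
```

The wrapped function only runs in the Python process that created it, which is why such a graph cannot simply be reloaded and executed elsewhere.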