# Introduction

I use this notebook to familiarize myself with TensorFlow:

https://www.tensorflow.org/

Here I show how to use TensorFlow to

- compute gradients
- learn how to add or subtract two inputs

# Computing gradients with TensorFlow

## Display TensorFlow version

In [1]:
import tensorflow as tf

In [2]:
tf.__version__

'2.4.1'

## A mini computation graph

In [3]:
a = tf.constant(3)
b = tf.constant(4)



In [4]:
print(a)

tf.Tensor(3, shape=(), dtype=int32)


In [5]:
print(b)

tf.Tensor(4, shape=(), dtype=int32)


In [6]:
a+b

<tf.Tensor: shape=(), dtype=int32, numpy=7>

In [7]:
c = a+b

In [8]:
print(c)

tf.Tensor(7, shape=(), dtype=int32)


In [9]:
c.numpy()

7
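
The values appear immediately because TensorFlow 2.x runs in eager mode by default: each operation executes as soon as it is called and returns a tensor with a concrete value. The short sketch below checks this with `tf.executing_eagerly()` and shows that the `+` operator on two tensors gives the same result as `tf.add`:

```python
import tensorflow as tf

# TensorFlow 2.x executes eagerly by default: operations run immediately
print(tf.executing_eagerly())  # True

a = tf.constant(3)
b = tf.constant(4)

# The + operator on tensors computes the same result as tf.add
c_op = a + b
c_fn = tf.add(a, b)
print(c_op.numpy(), c_fn.numpy())  # 7 7
```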

## Reuse the computation graph?

In [10]:
a = tf.constant(10)
b = tf.constant(20)

In [11]:
print(a)

tf.Tensor(10, shape=(), dtype=int32)


In [12]:
print(b)

tf.Tensor(20, shape=(), dtype=int32)


In [13]:
c

<tf.Tensor: shape=(), dtype=int32, numpy=7>
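
So the answer is no: because `a+b` was evaluated eagerly when In [7] ran, `c` stores the value 7, not a recipe for computing it. Rebinding the names `a` and `b` to new constant tensors does not touch `c`; to get the new sum, the expression has to be evaluated again. A minimal sketch:

```python
import tensorflow as tf

a = tf.constant(3)
b = tf.constant(4)
c = a + b  # evaluated immediately: c holds the value 7

# Rebinding the Python names creates new tensors; c is not updated
a = tf.constant(10)
b = tf.constant(20)
print(c.numpy())  # 7

# A new result requires evaluating the expression again
c_new = a + b
print(c_new.numpy())  # 30
```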

## A reusable computation graph

In [14]:
a = tf.Variable(3)
b = tf.Variable(4)

In [15]:
print(a)

<tf.Variable 'Variable:0' shape=() dtype=int32, numpy=3>


In [16]:
print(b)

<tf.Variable 'Variable:0' shape=() dtype=int32, numpy=4>


In [17]:
c = a+b

In [18]:
print(c)

tf.Tensor(7, shape=(), dtype=int32)


In [19]:
a.assign(10)

<tf.Variable 'UnreadVariable' shape=() dtype=int32, numpy=10>

In [20]:
b.assign(10)

<tf.Variable 'UnreadVariable' shape=() dtype=int32, numpy=10>

In [21]:
c

<tf.Tensor: shape=(), dtype=int32, numpy=7>
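
Even with variables, `c` still holds 7: `assign()` changes the value stored inside the existing `tf.Variable` object, but `c` is an ordinary tensor that was computed from the values the variables had at that moment. The following sketch makes this visible by checking that the variable object stays the same while `c` does not change:

```python
import tensorflow as tf

a = tf.Variable(3)
b = tf.Variable(4)
c = a + b  # reads the current values: c is a tensor holding 7

a_id = id(a)
a.assign(10)  # updates the value stored in the variable, in place
b.assign(10)

print(id(a) == a_id)         # True: still the same Variable object
print(a.numpy(), b.numpy())  # 10 10
print(c.numpy())             # 7: c was computed before the assignments
print((a + b).numpy())       # 20: re-evaluating reads the new values
```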

In [22]:
def compute(a,b):
    c = a+b
    return c

In [23]:
a.assign(10)
b.assign(20)
result = compute(a,b)
result

<tf.Tensor: shape=(), dtype=int32, numpy=30>

In [24]:
id(result)

140302893729744

In [25]:
a.assign(30)
b.assign(30)
result = compute(a,b)
result

<tf.Tensor: shape=(), dtype=int32, numpy=60>

In [26]:
id(result)

140302948907136

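Now `compute()` recomputes `c` from the current values of `a` and `b` on every call; as the different `id()` values show, each call returns a new tensor object.

To have TensorFlow trace the function into a reusable graph, *decorate* it with `@tf.function`: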

In [27]:
@tf.function
def compute(a,b):
    c = a+b
    return c

In [28]:
a.assign(10)
b.assign(20)
result = compute(a,b)
result



<tf.Tensor: shape=(), dtype=int32, numpy=30>

In [29]:
id(result)

140302948900528

In [30]:
a.assign(30)
b.assign(30)
result = compute(a,b)
result

<tf.Tensor: shape=(), dtype=int32, numpy=60>

In [31]:
id(result)

140304368462128

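With `@tf.function`, each call still returns a new tensor object (compare the `id()` values); what TensorFlow can reuse between calls is the traced graph, not the result. See https://www.tensorflow.org/api_docs/python/tf/function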
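
One way to observe the graph reuse is to count how often the Python body of the decorated function actually runs. Python side effects, such as appending to a Python list, execute only while `tf.function` traces the function; later calls with arguments of the same dtype and shape reuse the traced graph. In this sketch the list `traces` is a hypothetical helper for illustration, and the arguments are tensors rather than Python ints, since a new Python scalar value would trigger a retrace:

```python
import tensorflow as tf

traces = []  # hypothetical helper: gets one entry per trace

@tf.function
def compute(a, b):
    # Python side effect: runs only while TensorFlow traces the function
    traces.append("traced")
    return a + b

r1 = compute(tf.constant(10), tf.constant(20))
r2 = compute(tf.constant(30), tf.constant(30))  # same dtype and shape: no retrace
print(r1.numpy(), r2.numpy())  # 30 60
print(len(traces))             # 1
```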

## Gradient tapes

What are *gradient tapes?*

See here:

https://www.tensorflow.org/guide/autodiff

    Computing gradients

    To differentiate automatically, TensorFlow needs to remember what operations happen in what order during the forward pass. Then, during the backward pass, TensorFlow traverses this list of operations in reverse order to compute gradients.
    
    Gradient tapes
    TensorFlow provides the tf.GradientTape API for automatic differentiation; that is, computing the gradient of a computation with respect to some inputs, usually tf.Variables. TensorFlow "records" relevant operations executed inside the context of a tf.GradientTape onto a "tape". TensorFlow then uses that tape to compute the gradients of a "recorded" computation using reverse mode differentiation.
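
To make the "record forward, replay backward" idea concrete, here is a toy tape in plain Python. It is a hypothetical sketch for illustration only (the classes `Var` and `ToyTape` are invented here, support only multiplication, and are not how `tf.GradientTape` is implemented): the forward pass appends each operation with its local derivatives, and `gradient()` walks that list in reverse, applying the chain rule.

```python
class Var:
    """A scalar value the toy tape can differentiate with respect to."""
    def __init__(self, value):
        self.value = value


class ToyTape:
    """Hypothetical tape: records operations during the forward pass."""
    def __init__(self):
        # Each entry: (output, [(input, d_output/d_input), ...]) in execution order
        self.ops = []

    def mul(self, x, y):
        out = Var(x.value * y.value)
        # Local derivatives of out = x * y: d/dx = y, d/dy = x
        self.ops.append((out, [(x, y.value), (y, x.value)]))
        return out

    def gradient(self, target, source):
        grads = {target: 1.0}  # d(target)/d(target) = 1
        # Backward pass: traverse the recorded operations in reverse order
        for out, inputs in reversed(self.ops):
            g_out = grads.get(out, 0.0)
            for inp, local in inputs:
                # Chain rule, summing contributions over all paths
                grads[inp] = grads.get(inp, 0.0) + g_out * local
        return grads.get(source)


tape = ToyTape()
x = Var(3.0)
y = tape.mul(x, x)  # y = x^2
z = tape.mul(y, y)  # z = y^2 = x^4
print(tape.gradient(y, x))  # 6.0
print(tape.gradient(z, x))  # 108.0
```

Unlike a default `tf.GradientTape`, this toy tape never releases its record, so `gradient()` can be called repeatedly.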

## A gradient computation example

In [32]:
x = tf.constant(3.0)
with tf.GradientTape() as t:
  t.watch(x)
  y = x * x
dy_dx = t.gradient(y, x) # Will compute to 6.0

In [33]:
dy_dx

<tf.Tensor: shape=(), dtype=float32, numpy=6.0>

In [34]:
y

<tf.Tensor: shape=(), dtype=float32, numpy=9.0>

In [35]:
x

<tf.Tensor: shape=(), dtype=float32, numpy=3.0>
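
A quick way to sanity-check a tape gradient is to compare it with a central finite difference, $(f(x+h) - f(x-h)) / (2h)$. The sketch below does this for $f(x) = x^2$ at $x = 3$; the step size `h = 1e-3` is an arbitrary choice, and the numeric estimate uses plain Python floats:

```python
import tensorflow as tf

def f(x):
    return x * x  # y = x^2, so dy/dx = 2x

x = tf.constant(3.0)
with tf.GradientTape() as t:
    t.watch(x)
    y = f(x)
dy_dx = t.gradient(y, x)

# Central finite difference with plain Python floats
h = 1e-3
numeric = (f(3.0 + h) - f(3.0 - h)) / (2 * h)

print(dy_dx.numpy())  # 6.0
print(numeric)
```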

## Importance of the `watch()` function

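Note that the statement `t.watch(x)` is important: `x` is a constant, and the tape only records operations on tensors it watches. If we omit the statement, the tape stores no information for computing the gradient of $y=x^2$, and `t.gradient()` returns `None`: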

In [36]:
x = tf.constant(3.0)
with tf.GradientTape() as t:
  #t.watch(x)
  y = x * x
dy_dx = t.gradient(y, x) # Returns None, since x was not watched

In [37]:
dy_dx

In [38]:
print(dy_dx)

None


In [39]:
type(dy_dx)

NoneType
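
`watch()` also accepts a list of tensors, and `gradient()` accepts a list of sources, returning one gradient per source. The sketch below contrasts an unwatched and a watched computation of $y = x_1 x_2$:

```python
import tensorflow as tf

x1 = tf.constant(3.0)
x2 = tf.constant(2.0)

# Without watch(): nothing is recorded for the constants
with tf.GradientTape() as t0:
    y = x1 * x2
unwatched = t0.gradient(y, [x1, x2])
print(unwatched)  # [None, None]

# watch() with a list of tensors
with tf.GradientTape() as t1:
    t1.watch([x1, x2])
    y = x1 * x2
g1, g2 = t1.gradient(y, [x1, x2])
print(g1.numpy(), g2.numpy())  # 2.0 3.0  (dy/dx1 = x2, dy/dx2 = x1)
```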

## Automatically watched tensors

Some tensors are watched automatically, namely trainable `tf.Variable`s.

See https://www.tensorflow.org/api_docs/python/tf/GradientTape

    Trainable variables (created by tf.Variable or tf.compat.v1.get_variable, where trainable=True is default in both cases) are automatically watched. Tensors can be manually watched by invoking the watch method on this context manager.

In [40]:
x = tf.Variable(3.0)
with tf.GradientTape() as t:
  #t.watch(x)  # <-- we now omit this statement
  y = x * x
dy_dx = t.gradient(y, x) # Will compute to 6.0

In [41]:
print(dy_dx)

tf.Tensor(6.0, shape=(), dtype=float32)


So the derivative of `y` with respect to `x` is computed, although we did not explicitly tell TensorFlow to "watch" the variable `x`.
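
Automatic watching has limits. A variable created with `trainable=False` is not watched, and passing `watch_accessed_variables=False` to `tf.GradientTape` turns automatic watching off entirely, so that only explicitly watched variables or tensors are tracked. A short sketch of both cases:

```python
import tensorflow as tf

# Non-trainable variables are not watched automatically
x_frozen = tf.Variable(3.0, trainable=False)
with tf.GradientTape() as t:
    y = x_frozen * x_frozen
grad_frozen = t.gradient(y, x_frozen)
print(grad_frozen)  # None

# Automatic watching disabled: a trainable variable is ignored too
x = tf.Variable(3.0)
with tf.GradientTape(watch_accessed_variables=False) as t:
    y = x * x
grad_off = t.gradient(y, x)
print(grad_off)  # None

# ...unless it is watched explicitly
with tf.GradientTape(watch_accessed_variables=False) as t:
    t.watch(x)
    y = x * x
grad_manual = t.gradient(y, x)
print(grad_manual.numpy())  # 6.0
```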

## Computing the derivative of more than one tensor

In the following example we compute not only $\frac{dz}{dx}$ but also $\frac{dy}{dx}$:

In [42]:
x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as t:
#with tf.GradientTape() as t:
  t.watch(x)
  y = x * x # y=x^2
  z = y * y # z=y^2=(x^2)^2=x^4

dz_dx = t.gradient(z, x)  # 108.0 (4*x^3 at x=3 --> 4*27=108)
dy_dx = t.gradient(y, x)  # 6.0 (2*x at x=3 --> 6)

In [43]:
dz_dx

<tf.Tensor: shape=(), dtype=float32, numpy=108.0>

In [44]:
dy_dx

<tf.Tensor: shape=(), dtype=float32, numpy=6.0>
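
A non-persistent tape can accept a list of targets in a single `gradient()` call, but TensorFlow then returns the gradient of their *sum*, not one gradient per target. Here that gives $\frac{d(z+y)}{dx} = 108 + 6$, so getting $\frac{dz}{dx}$ and $\frac{dy}{dx}$ as separate values still needs a persistent tape (or two tapes):

```python
import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as t:  # non-persistent
    t.watch(x)
    y = x * x  # y = x^2
    z = y * y  # z = x^4

# One call with a list of targets: the result is d(z + y)/dx
d_sum = t.gradient([z, y], x)
print(d_sum.numpy())  # 114.0
```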

Did you notice the flag `persistent=True`?

Here is the explanation of why this flag is needed:

    By default, the resources held by a GradientTape are released as soon as GradientTape.gradient() method is called. To compute multiple gradients over the same computation, create a persistent gradient tape.
    
If we do not set this flag to `True`, the second call to `t.gradient()` raises the following `RuntimeError`:

    RuntimeError: A non-persistent GradientTape can only be used tocompute one set of gradients (or jacobians)
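
The sketch below reproduces this error and then shows the persistent variant. It only checks the exception type, since the exact wording of the message may differ between TensorFlow versions. With a persistent tape, it is good practice to drop the reference with `del` once all gradients are computed, so the tape's resources can be freed:

```python
import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as t:  # persistent=False is the default
    t.watch(x)
    y = x * x
    z = y * y

dz_dx = t.gradient(z, x)  # first call works
try:
    t.gradient(y, x)      # second call on a non-persistent tape
    error_raised = False
except RuntimeError:
    error_raised = True
print(error_raised)  # True

# A persistent tape allows several gradient() calls
with tf.GradientTape(persistent=True) as pt:
    pt.watch(x)
    y = x * x
    z = y * y
dz_dx = pt.gradient(z, x)
dy_dx = pt.gradient(y, x)
del pt  # drop the reference so the tape's resources can be released
print(dz_dx.numpy(), dy_dx.numpy())  # 108.0 6.0
```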