# Some TensorFlow and TensorFlow Probability basics

Based on: https://github.com/tensorflow/probability/blob/master/tensorflow_probability/examples/jupyter_notebooks/A_Tour_of_TensorFlow_Probability.ipynb

In [112]:
import tensorflow as tf
import numpy as np
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go

init_notebook_mode(connected=True)

## Linear algebra

As long as you avoid explicit Python `for` loops, operations are automatically vectorized over whole tensors (and thus faster).
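As a small illustration of the difference, the sketch below computes row-wise dot products twice: once with a Python loop and once with a single vectorized call. The tensors `u` and `v` and their shapes are made up for this example.

```python
import tensorflow as tf

# Hypothetical data: 100 pairs of 3-dimensional vectors.
u = tf.random.uniform(shape=[100, 3])
v = tf.random.uniform(shape=[100, 3])

# Loop version: one dot product per row, computed one at a time.
dots_loop = tf.stack(
    [tf.tensordot(u[i], v[i], axes=1) for i in range(u.shape[0])]
)

# Vectorized version: a single call handles every row at once.
dots_vectorized = tf.reduce_sum(u * v, axis=1)

# Both approaches agree up to floating-point rounding.
max_diff = float(tf.reduce_max(tf.abs(dots_loop - dots_vectorized)))
```

The results match; the vectorized form simply avoids dispatching one small operation per row.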

Solving a linear system
$$
y = m x
$$
for $x$.

In [15]:
m = tf.random.uniform(shape=[10, 10])

y = tf.random.uniform(shape=[10, 1])

In [16]:
x = tf.linalg.solve(
    matrix=m,
    rhs=y
)

In [17]:
np.allclose(
    tf.linalg.matmul(m, x).numpy(),
    y.numpy()
)

True

The same result can be obtained by inverting $m$ explicitly (if it's invertible!) and multiplying, although `tf.linalg.solve` is generally the numerically preferable route.

In [18]:
if tf.linalg.det(m) != 0:
    minv = tf.linalg.inv(m)
    
    x_alternative = tf.linalg.matmul(minv, y)

In [19]:
np.allclose(
    x_alternative.numpy(),
    x.numpy()
)

True

We can also define sets of tensors by stacking them together along an additional (batch) dimension.

In [20]:
m_stacked = tf.random.uniform(shape=(5, 10, 10))

y_stacked = tf.random.uniform(shape=(5, 10, 1))

In [21]:
# Invert each matrix.
m_stacked_inv = tf.linalg.inv(m_stacked)

m_stacked_inv.shape

TensorShape([5, 10, 10])

In [26]:
x_stacked = tf.linalg.matmul(m_stacked_inv, y_stacked)

x_stacked.shape

TensorShape([5, 10, 1])
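`tf.linalg.solve` accepts the same leading batch dimension, so the explicit inverse is not required for stacked systems either. A minimal sketch, assuming a made-up batch whose matrices are shifted along the diagonal so that they are safely invertible (the names `m_batch` and `y_batch` are only for this example):

```python
import numpy as np
import tensorflow as tf

# Hypothetical batch of 5 systems; adding 10 * identity makes every matrix
# strictly diagonally dominant and therefore invertible.
m_batch = (
    tf.random.uniform(shape=(5, 10, 10), dtype=tf.float64)
    + 10.0 * tf.eye(10, dtype=tf.float64)
)
y_batch = tf.random.uniform(shape=(5, 10, 1), dtype=tf.float64)

# Solve all 5 systems in one call.
x_solve = tf.linalg.solve(matrix=m_batch, rhs=y_batch)

# Same result via the explicit batched inverse.
x_inv = tf.linalg.matmul(tf.linalg.inv(m_batch), y_batch)

same = np.allclose(x_solve.numpy(), x_inv.numpy())
residual_ok = np.allclose(
    tf.linalg.matmul(m_batch, x_solve).numpy(), y_batch.numpy()
)
```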

## Automatic differentiation

Differentiating
$$
b = \frac{1}{2}\,a^2
$$
w.r.t. $a$ and setting $a = 29$.

In [40]:
a = tf.constant(29.)

with tf.GradientTape() as tape:
    tape.watch([a])
    
    b = 0.5 * a**2
    
grad = tape.gradient(b, a)

grad

<tf.Tensor: id=212, shape=(), dtype=float32, numpy=29.0>

Differentiating w.r.t. multiple variables (a proper gradient): consider the function
$$
F(a, b) = a\,\sin^b(b)
$$
evaluated at $a=1$ and $b=\pi/2$.

Result (using $\partial_b \sin^b(b) = \sin^b(b)\,[\ln\sin(b) + b\,\cot(b)]$):
$$
\nabla F(a, b) = \left(\begin{array}{c}
\sin^b(b) \\
a\,\sin^b(b)\,\left[\ln\sin(b) + b\,\cot(b)\right]
\end{array}\right) =
\left(\begin{array}{c}
1 \\
0
\end{array}\right)
$$

In [42]:
a = tf.constant(1.)
b = tf.constant(np.pi/2.)

with tf.GradientTape() as tape:
    tape.watch([a, b])
    
    f = a * (tf.sin(b)) ** b

grad = tape.gradient(f, [a, b])

grad

[<tf.Tensor: id=254, shape=(), dtype=float32, numpy=1.0>,
 <tf.Tensor: id=282, shape=(), dtype=float32, numpy=-6.866169e-08>]
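The $b$-component can be cross-checked without TensorFlow. By the product and chain rules, $\partial_b\,[a\,\sin^b(b)] = a\,\sin^b(b)\,[\ln\sin(b) + b\,\cot(b)]$, which vanishes at $b=\pi/2$. The sketch below compares this expression with a central finite difference; the helper names and the step size `h` are choices made for this example.

```python
import math

def F(a, b):
    # F(a, b) = a * sin(b)^b
    return a * math.sin(b) ** b

def dF_db(a, b):
    # Analytic partial derivative w.r.t. b (valid where sin(b) > 0).
    s = math.sin(b)
    return a * s ** b * (math.log(s) + b * math.cos(b) / s)

def central_difference(a, b, h=1e-6):
    # Numerical approximation of the same partial derivative.
    return (F(a, b + h) - F(a, b - h)) / (2.0 * h)

# The point used in the text: a = 1, b = pi / 2.
analytic_at_point = dF_db(1.0, math.pi / 2)
numeric_at_point = central_difference(1.0, math.pi / 2)

# A second, arbitrary point where the derivative is not zero.
analytic_other = dF_db(2.0, 1.0)
numeric_other = central_difference(2.0, 1.0)
```

Both values at $(1, \pi/2)$ are zero up to rounding, consistent with the tiny `-6.866169e-08` returned by the tape in float32.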

Differentiating w.r.t. vectors: let

$$
\mathbf{a} = \left(\begin{array}{r}
1 \\
2 \\
3
\end{array}\right) \equiv
\left(\begin{array}{r}
a_1 \\
a_2 \\
a_3
\end{array}\right)\in \mathbb{R}^3,\quad
b = \left(\begin{array}{ccc}
1 & 0 & 0\\
0 & -1 & 0 \\
0 & 0 & 1
\end{array}\right) \equiv
\left(\begin{array}{ccc}
b_{11} & b_{12} & b_{13} \\
b_{21} & b_{22} & b_{23} \\
b_{31} & b_{32} & b_{33}
\end{array}\right)\in \text{Mat}_{\mathbb{R}}(3)
$$

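and consider the product $h(\mathbf{a}) = b\,\mathbf{a}$, differentiated w.r.t. (the components of) $\mathbf{a}$ and evaluated at the above value. Since $h$ is vector-valued, `tape.gradient` differentiates the sum of its components, $\sum_i h_i = \sum_{i,j} b_{ij}\,a_j$, which yields the column sums of $b$ (these coincide with the diagonal entries here because $b$ is diagonal):

$$
\nabla \sum_i h_i(\mathbf{a}) = \left(\begin{array}{r}
\partial_{a_1} \sum_i h_i(a_1, a_2, a_3) \\
\partial_{a_2} \sum_i h_i(a_1, a_2, a_3) \\
\partial_{a_3} \sum_i h_i(a_1, a_2, a_3)
\end{array}\right) =
\left(\begin{array}{r}
\sum_i b_{i1} \\
\sum_i b_{i2} \\
\sum_i b_{i3}
\end{array}\right) =
\left(\begin{array}{r}
1 \\
-1 \\
1
\end{array}\right)
$$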

In [71]:
a = tf.constant([[1], [2], [3]], dtype=tf.float32)

b = tf.constant([
    [1, 0, 0],
    [0, -1, 0],
    [0, 0, 1]
], dtype=tf.float32)

with tf.GradientTape() as tape:
    tape.watch(a)
    
    g = tf.linalg.matmul(b, a)
    
grad = tape.gradient(g, a)

grad

<tf.Tensor: id=324, shape=(3, 1), dtype=float32, numpy=
array([[ 1.],
       [-1.],
       [ 1.]], dtype=float32)>
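For a vector-valued target, `tape.gradient` returns the gradient of the sum of the target's components, whereas `tape.jacobian` returns every partial derivative. The sketch below shows the difference using a hypothetical non-diagonal matrix `b_example` (not the matrix from the text), where column sums and diagonal entries no longer coincide.

```python
import numpy as np
import tensorflow as tf

a = tf.constant([[1.0], [2.0], [3.0]])

# Hypothetical non-diagonal matrix, chosen only for illustration.
b_example = tf.constant([
    [1.0, 2.0, 0.0],
    [0.0, -1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# persistent=True allows calling both jacobian and gradient on one tape.
with tf.GradientTape(persistent=True) as tape:
    tape.watch(a)
    h = tf.linalg.matmul(b_example, a)

# Full Jacobian: shape (3, 1, 3, 1), entry [i, 0, j, 0] = d h_i / d a_j.
jac = tape.jacobian(h, a)
jac_matrix = tf.reshape(jac, (3, 3))

# Gradient of sum_i h_i: the column sums of b_example, shape (3, 1).
grad = tape.gradient(h, a)

del tape  # release the resources held by the persistent tape

jacobian_is_b = np.allclose(jac_matrix.numpy(), b_example.numpy())
column_sums = tf.reduce_sum(b_example, axis=0).numpy()
```

Here the Jacobian reproduces `b_example`, while the gradient is `[1, 1, 1]` (the column sums) rather than the diagonal `[1, -1, 1]`.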

Implement a gradient descent algorithm to find a minimum of

$$
F(x) = - x^2 + 2\,x^4,
$$

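whose stationary points follow from $F'(x) = -2\,x + 8\,x^3 = 2\,x\,(4\,x^2 - 1) = 0$: a local maximum at $x_1 = 0$ and two minima at $x_{2, 3} = \pm 1/2$, with $F(\pm 1/2) = -1/8$. Starting from $x=3$, the algorithm converges towards the minimum at $x = 1/2$.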

In [142]:
x = tf.constant(3, dtype=tf.float32)

eps = 0.001

n_iter = 1000

x_values = [x.numpy()]
f_values = []
der_values = []

for i in range(n_iter):
    # print(f"Iteration {i+1}")
    # print("-----------")
    
    with tf.GradientTape() as tape:
        tape.watch(x)

        f = - x ** 2 + 2. * x ** 4
        
        f_values.append(f.numpy())

    grad = tape.gradient(f, x)
    
    der_values.append(grad.numpy())

    # print(f"f'(x={x}): {grad}")

    x = x - grad * eps
    
    x_values.append(x.numpy())

    # print(f"x_new: {x}\n")

f_values.append((- x ** 2 + 2. * x ** 4).numpy())

print("Final values")
print("------------")
print(f"(x, f(x)) = ({x_values[-1]}, {f_values[-1]}), f'(x) = {der_values[-1]}")

Final values
------------
(x, f(x)) = (0.5044261813163757, -0.12496045231819153), f'(x) = 0.01801443099975586
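The result can be cross-checked with the analytic derivative $F'(x) = -2\,x + 8\,x^3$ in plain Python. This is only a sketch mirroring the loop above, with the same step size and iteration count; the function names are chosen for this example.

```python
def f(x):
    # F(x) = -x^2 + 2 x^4
    return -x ** 2 + 2.0 * x ** 4

def f_prime(x):
    # Analytic derivative: F'(x) = -2x + 8x^3 = 2x(4x^2 - 1)
    return -2.0 * x + 8.0 * x ** 3

def gradient_descent(x0, eps=0.001, n_iter=1000):
    # Plain gradient descent with a fixed step size eps.
    x = x0
    for _ in range(n_iter):
        x = x - eps * f_prime(x)
    return x

x_right = gradient_descent(3.0)    # approaches the minimum at +1/2
x_left = gradient_descent(-3.0)    # approaches the minimum at -1/2
x_longer = gradient_descent(3.0, n_iter=5000)

print(x_right, f(x_right))
```

After 1000 steps the iterate is still slightly above $1/2$, matching the TensorFlow output; running more iterations brings it closer to the minimum, where $F = -1/8$.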


In [143]:
trace = go.Scatter(
    x=list(range(len(x_values))),
    y=x_values,
    mode="markers"
)

fig = go.Figure(data=[trace])

iplot(fig)

In [144]:
trace = go.Scatter(
    x=x_values,
    y=f_values,
    mode="markers",
    marker=dict(
        opacity=np.linspace(0.5, 1, len(x_values)),
    ),
)

fig = go.Figure(data=[trace])

iplot(fig)