<a href="https://colab.research.google.com/github/hellocybernetics/TensorFlow2.0_Eager_Execution_Tutorials/blob/master/tutorials/99_others/probabilistic_model_by_tfp_layers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# !pip install -q --upgrade tf-nightly-2.0-preview
# !pip install -q tfp-nightly

## TFP layers basics

In [3]:
import tensorflow as tf
import tensorflow_probability as tfp
import numpy as np
import matplotlib.pyplot as plt

plt.style.use("seaborn")

tfk = tf.keras
tfkl = tf.keras.layers
tfd = tfp.distributions
tfpl = tfp.layers

### simple regression
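This model has four parameters: $w_\mu, w_\sigma, b_\mu, b_\sigma$. Note that `tfpl.IndependentNormal` passes the scale output through a softplus so that it stays positive; the formula below omits that transform for brevity.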

$$
p(y\mid x, w_\mu, w_\sigma, b_\mu, b_\sigma) = {\mathcal N} (y \mid w_\mu x + b_\mu,  w_\sigma x+b_\sigma)
$$
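
As a rough sketch of what the two `Dense` outputs mean: to my understanding, `tfpl.IndependentNormal` splits its input into a location half and a scale half and applies a softplus to the scale half. The plain-Python helpers below (`softplus` and `normal_params_from_dense` are hypothetical names, not part of TFP) mimic that for a single scalar input, with illustrative weight values.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z)); the result is always positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def normal_params_from_dense(x, w_mu, b_mu, w_sigma, b_sigma):
    """Return (loc, scale) of the Normal for a scalar input x.

    Assumption: the scale half of the Dense output goes through softplus,
    mirroring what tfpl.IndependentNormal is understood to do.
    """
    loc = w_mu * x + b_mu
    scale = softplus(w_sigma * x + b_sigma)
    return loc, scale


# Illustrative weights only; a freshly built model starts from random values.
loc, scale = normal_params_from_dense(
    x=3.0, w_mu=0.5, b_mu=0.0, w_sigma=-1.0, b_sigma=0.0
)
print(loc, scale)
```

Even though the raw scale output is negative here ($-3$), the softplus maps it to a small positive standard deviation, which is why the layer can be trained without constraining the `Dense` weights.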

In [4]:
input_dim = 1
output_dim = 1

model = tfk.Sequential([
    tfkl.InputLayer([input_dim]),
    tfkl.Dense(tfpl.IndependentNormal.params_size(output_dim)),
    tfpl.IndependentNormal(output_dim)
])

In [5]:
model.summary() 

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 2)                 4         
_________________________________________________________________
independent_normal (Independ ((None, 1), (None, 1))    0         
Total params: 4
Trainable params: 4
Non-trainable params: 0
_________________________________________________________________


In [6]:
X = tf.constant([[3.],
                 [2.],
                 [-5.]])

In [7]:
model(X)

<tfp.distributions.Independent 'sequential_independent_normal_IndependentNormal_Independentsequential_independent_normal_IndependentNormal_Normal' batch_shape=[3] event_shape=[1] dtype=float32>

In [8]:
model(X).sample()

<tf.Tensor: id=227, shape=(3, 1), dtype=float32, numpy=
array([[-0.43018448],
       [-0.30774704],
       [-7.049762  ]], dtype=float32)>

### multiple regression
If the input dimension is $D$, there are $2D+2 = 2(D+1)$ parameters.

$$
p(y\mid {\bf x, w_\mu, w_\sigma}, b_\mu, b_\sigma)
= \mathcal N(y \mid {\bf w^T_\mu x} + b_\mu, {\bf w^T_\sigma x} + b_\sigma)
$$
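
The parameter count can be checked by hand. The sketch below uses two hypothetical helpers (not TFP functions): one counts the weights and biases of a fully connected layer, the other assumes that an independent normal needs one location and one scale value per event dimension.

```python
def dense_param_count(input_dim, units):
    """Weights plus biases of a fully connected layer."""
    return input_dim * units + units


def independent_normal_params_size(event_size):
    """Assumed: one location and one scale value per event dimension."""
    return 2 * event_size


for d in (1, 3):
    units = independent_normal_params_size(1)
    print(d, dense_param_count(d, units))
# prints "1 4" and then "3 8"
```

These are the totals reported by `model.summary()` for the one-dimensional and three-dimensional inputs in this notebook.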

In [25]:
input_dim = 3
output_dim = 1

model = tfk.Sequential([
    tfkl.InputLayer([input_dim]),
    tfkl.Dense(tfpl.IndependentNormal.params_size(output_dim)),
    tfpl.IndependentNormal(output_dim)
])

In [26]:
model.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_2 (Dense)              (None, 2)                 8         
_________________________________________________________________
independent_normal_3 (Indepe ((None, 1), (None, 1))    0         
Total params: 8
Trainable params: 8
Non-trainable params: 0
_________________________________________________________________


In [27]:
X = tf.constant([[3., -4., 1.],
                 [2., 2., 2.],
                 [-5., 3., 1.]])

In [28]:
model(X)

<tfp.distributions.Independent 'sequential_3_independent_normal_3_IndependentNormal_Independentsequential_3_independent_normal_3_IndependentNormal_Normal' batch_shape=[3] event_shape=[1] dtype=float32>

In [29]:
model(X).sample()

<tf.Tensor: id=2168, shape=(3, 1), dtype=float32, numpy=
array([[-5.4155235],
       [-2.0300615],
       [ 3.971672 ]], dtype=float32)>

### Summary
As you can see, the `tfp.layers` module is written to be used like the `tf.keras.layers` module, together with `tf.keras.Sequential`.

Note that a `tf.keras.layers` layer has a `__call__` method that receives a `tf.Tensor` and returns a `tf.Tensor`, whereas a `tfp.layers` layer returns a `tfp.distributions.Distribution` object. To get a concrete value as a `tf.Tensor`, call `sample()`.

To draw `N` samples, write `object.sample(N)`.

In [14]:
model(X).sample(5)

<tf.Tensor: id=538, shape=(5, 3, 1), dtype=float32, numpy=
array([[[ 1.6277615 ],
        [-0.57028174],
        [ 0.7915349 ]],

       [[ 1.6302956 ],
        [-2.5077786 ],
        [ 1.7426147 ]],

       [[ 1.5174006 ],
        [-3.0163887 ],
        [-0.4108941 ]],

       [[ 1.7809215 ],
        [-2.4185162 ],
        [ 3.8518336 ]],

       [[ 1.8309169 ],
        [-4.1943836 ],
        [-1.7214508 ]]], dtype=float32)>
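
The shape of the sampled tensor follows a simple rule: the sample shape comes first, then the batch shape, then the event shape. A minimal sketch of that rule in plain Python (`sampled_tensor_shape` is a hypothetical helper, not a TFP function):

```python
def sampled_tensor_shape(n_samples, batch_shape, event_shape):
    """Shape of dist.sample(n_samples); n_samples=None stands for dist.sample()."""
    sample_shape = () if n_samples is None else (n_samples,)
    return sample_shape + tuple(batch_shape) + tuple(event_shape)


# The distribution in this notebook has batch_shape=[3] and event_shape=[1].
print(sampled_tensor_shape(None, [3], [1]))  # (3, 1)
print(sampled_tensor_shape(5, [3], [1]))     # (5, 3, 1)
```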

This implementation is useful for computing a Bayesian predictive distribution.

To sample from the predictive distribution

$$
p(y\mid x) = \int _ w p(y \mid w, x)p(w\mid D)dw 
$$

you can use the following Monte Carlo sampling scheme.

$$
\begin{align}
w_i &\sim p(w \mid D)\\
y_i &\sim p(y\mid w_i, x) \\
\end{align}
$$

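This approximates the integral with a sample average. Because the $w_i$ are already drawn from $p(w \mid D)$, the posterior density does not appear as an extra factor inside the sum.

$$
p(y \mid x) \simeq \frac{1}{N} \sum _ {i=1} ^N p(y\mid w_i, x), \quad w_i \sim p(w \mid D)
$$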
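
The Monte Carlo scheme can be sketched in plain Python for a toy case where the answer is known in closed form. Everything here is an assumption made for illustration: a scalar weight with posterior $\mathcal N(1, 0.5^2)$, a likelihood $\mathcal N(y \mid w x, 1)$, and the helper names `normal_pdf` and `mc_predictive_density`. The estimate averages $p(y \mid w_i, x)$ over posterior draws $w_i$.

```python
import math
import random


def normal_pdf(y, mean, std):
    """Density of a univariate normal distribution."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mc_predictive_density(y, x, post_mean, post_std, noise_std, n_samples, rng):
    """Average p(y | w_i, x) over draws w_i from the (toy) posterior."""
    total = 0.0
    for _ in range(n_samples):
        w_i = rng.gauss(post_mean, post_std)        # w_i ~ p(w | D)
        total += normal_pdf(y, w_i * x, noise_std)  # p(y | w_i, x)
    return total / n_samples


rng = random.Random(0)
estimate = mc_predictive_density(
    y=2.5, x=2.0, post_mean=1.0, post_std=0.5, noise_std=1.0,
    n_samples=20000, rng=rng,
)
# For this linear-Gaussian toy model the predictive distribution is
# N(post_mean * x, (post_std * x)^2 + noise_std^2).
exact = normal_pdf(2.5, 1.0 * 2.0, math.sqrt((0.5 * 2.0) ** 2 + 1.0 ** 2))
print(estimate, exact)
```

With a TFP model, the same idea would amount to calling the model several times (each call drawing new weights) and averaging the resulting densities, or simply collecting the samples of $y$.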

### Variational Bayes Multiple regression
Suppose you define the following statistical model,

$$
p(y\mid {\bf x, w_\mu, w_\sigma}, b_\mu, b_\sigma)
= \mathcal N(y \mid {\bf w^T_\mu x} + b_\mu, {\bf w^T_\sigma x} + b_\sigma)
$$

and set the following priors.

$$
\begin{align}
p({\bf w_\mu}) = \mathcal N(0, {\bf \Sigma_{w_\mu}})\\
p({\bf w_\sigma}) = \mathcal N(0, {\bf \Sigma_{w_\sigma}})\\
p(b_\mu) = \mathcal N(0, \sigma_{b_\mu})\\
p(b_\sigma) = \mathcal N(0, \sigma_{b_\sigma})
\end{align}
$$

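In variational inference, you introduce variational distributions $q({\bf w_\mu}), q({\bf w_\sigma}), q(b_\mu), q(b_\sigma)$ to approximate the posterior. By default, `tfpl.DenseReparameterization` uses a factorized normal distribution for the weights (a mean and a scale for each of the $2D$ weights, i.e. $4D$ parameters) and a point estimate for the two biases, so for input dimension $D$ there are $4D + 2$ parameters.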
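
The $4D + 2$ count can be reproduced with a small helper. This is a sketch under the assumption that every weight gets a mean-field normal posterior (one location and one scale parameter) while each bias is a single point estimate, which is my understanding of the layer's defaults; `mean_field_dense_param_count` is a hypothetical name.

```python
def mean_field_dense_param_count(input_dim, units, bayesian_bias=False):
    """Trainable parameters of a dense layer with a mean-field normal posterior.

    Assumption: each weight has a location and a scale parameter; each bias
    has one parameter unless bayesian_bias is True.
    """
    kernel = 2 * input_dim * units
    bias = (2 if bayesian_bias else 1) * units
    return kernel + bias


# D = 3 inputs, 2 output units (location and scale of the normal).
print(mean_field_dense_param_count(3, 2))                      # 14
print(mean_field_dense_param_count(3, 2, bayesian_bias=True))  # 16
```

If the biases were also given normal posteriors, the count would become $4D + 4$ rather than $4D + 2$.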

In [30]:
input_dim = 3
output_dim = 1

model = tfk.Sequential([
    tfkl.InputLayer([input_dim]),
    tfpl.DenseReparameterization(tfpl.IndependentNormal.params_size(output_dim)),
    tfpl.IndependentNormal(output_dim)
])

In [31]:
model.summary()

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_reparameterization_1 ( (None, 2)                 14        
_________________________________________________________________
independent_normal_4 (Indepe ((None, 1), (None, 1))    0         
Total params: 14
Trainable params: 14
Non-trainable params: 0
_________________________________________________________________


In [32]:
X = tf.constant([[3., -4., 1.],
                 [2., 2., 2.],
                 [-5., 3., 1.]])

In [33]:
model(X)

<tfp.distributions.Independent 'sequential_4_independent_normal_4_IndependentNormal_Independentsequential_4_independent_normal_4_IndependentNormal_Normal' batch_shape=[3] event_shape=[1] dtype=float32>

In [34]:
model(X).sample()

<tf.Tensor: id=2694, shape=(3, 1), dtype=float32, numpy=
array([[ 1.1869445 ],
       [-0.29076326],
       [-1.5732886 ]], dtype=float32)>