In [2]:
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions
tfb = tfp.bijectors

In [3]:
normal = tfd.Normal(loc=0, scale=1.)

In [6]:
z = normal.sample(3)
z

<tf.Tensor: shape=(3,), dtype=float32, numpy=array([-0.76491195, -1.2679604 ,  1.3671497 ], dtype=float32)>

We create a standard normal distribution object and draw three samples from it.

In [7]:
scale_and_shift = tfb.Chain([tfb.Shift(1.), tfb.Scale(2.)])
x = scale_and_shift.forward(z)

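Then we create a scale-and-shift bijector and transform the tensor z by passing it to the bijector's `forward` method. Note that `Chain` applies its bijectors in reverse list order, so z is first scaled by 2 and then shifted by 1. The tensor x is therefore a scaled and shifted version of z.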

In [10]:
log_prob_z = normal.log_prob(z)
log_prob_x = (normal.log_prob(z) - scale_and_shift.forward_log_det_jacobian(z, event_ndims=0))
log_prob_x

<tf.Tensor: shape=(3,), dtype=float32, numpy=array([-1.9046309, -2.4159474, -2.546635 ], dtype=float32)>

We can compute the log probability of z directly with the `log_prob` method of the normal distribution. We can also compute the log probability of x using the change of variables formula: it is the log probability of z minus the log of the absolute Jacobian determinant of the bijector transformation evaluated at z.
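The formula can be checked by hand without TensorFlow. The sketch below uses only the standard library, takes the first sampled value of z from the output above, and compares the change of variables result with the density of a Normal(loc=1, scale=2), which is the distribution of x = 2z + 1. The helper function names are made up for this illustration.

```python
import math

def standard_normal_log_prob(z):
    """Log density of N(0, 1) at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def normal_log_prob(x, loc, scale):
    """Log density of N(loc, scale^2) at x."""
    u = (x - loc) / scale
    return standard_normal_log_prob(u) - math.log(scale)

# First sample from the output above; x = 2 * z + 1 is the transformed value.
z = -0.76491195
x = 2.0 * z + 1.0

# For x = 2 * z + 1 the Jacobian is the constant 2, so its log determinant is log(2).
via_change_of_variables = standard_normal_log_prob(z) - math.log(2.0)

# Direct evaluation: x is distributed as Normal(loc=1, scale=2).
direct = normal_log_prob(x, loc=1.0, scale=2.0)

print(via_change_of_variables, direct)
```

Both numbers agree with the first entry of `log_prob_x` shown above, up to float32 rounding.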

In [12]:
log_prob_x = (normal.log_prob(z) + scale_and_shift.inverse_log_det_jacobian(x, event_ndims=0))

We can also rewrite the change of variables formula in terms of the inverse of the bijector transformation. In that case the log probability of x equals the log probability of z plus the log of the absolute Jacobian determinant of the inverse transformation evaluated at x.

In [None]:
log_prob_x = (normal.log_prob(scale_and_shift.inverse(x)) + scale_and_shift.inverse_log_det_jacobian(x, event_ndims=0))

Notice that z is the image of x under the inverse transformation, so we can replace z with the result of the bijector's `inverse` method applied to x. In the first approach we compute the log probability of x using only the tensor z, while in the second we compute it using only the tensor x. In practice we will mostly use the second form of the change of variables formula.
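Written out, the two forms are as follows. The symbols $p_Z$ (base density), $p_X$ (density of x) and $f$ (the bijector's forward transformation, with $x = f(z)$) are notation assumed here for illustration:

$$
\log p_X(x) \;=\; \log p_Z(z) - \log\left|\det \frac{\partial f}{\partial z}(z)\right|
\;=\; \log p_Z\!\left(f^{-1}(x)\right) + \log\left|\det \frac{\partial f^{-1}}{\partial x}(x)\right|
$$

For the scale-and-shift example, $f(z) = 2z + 1$, so the forward log determinant is $\log 2$ and the inverse log determinant is $-\log 2$, and both forms give the same value.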

#### Application

A normalizing flow is a generative model of the data. The model assumes a latent variable z distributed according to some base distribution, typically something simple such as a diagonal Gaussian. The data-generating process first samples z from this base distribution and then transforms it with a function f to produce the data sample x. For a normalizing flow, f is bijective (invertible).

The function f is also parameterized, and we learn its parameters by maximum likelihood. During training we have sample data points x and need to compute the log probability of x under the model, which is precisely what the expression above does. The log probability of x is the quantity we maximize in the training loop, and the bijector object holds the parameters being optimized.

Once the model is trained, we can sample from it by first sampling from the base distribution and then passing that sample through the bijector transformation using the bijector's `forward` method.

By convention, the forward transformation is used for sampling, and the inverse transformation, together with its log Jacobian determinant, is used to compute log probabilities.

### Example

In [13]:
normal = tfd.Normal(loc=0, scale=1.)
z = normal.sample(3)

We choose our base distribution (p0) to be a univariate standard normal. 

In [14]:
exp = tfb.Exp()
x = exp.forward(z)

The transformation is implemented with the exponential bijector. We sample from the base distribution and pass the sample through the bijector's `forward` method, which produces a data sample x.

In [16]:
log_normal = tfd.TransformedDistribution(normal, exp)
log_normal

<tfp.distributions.TransformedDistribution 'expNormal' batch_shape=[] event_shape=[] dtype=float32>

The transformed distribution object lets us define the data distribution (p1) directly as a distribution object. `TransformedDistribution` comes from the distributions module, and its constructor has two required arguments: the base distribution and the bijector. In this case the transformed distribution is a log-normal distribution. A `TransformedDistribution` object has the same methods and properties as regular distributions. It has a batch shape and an event shape, both inherited from the base distribution.

In [17]:
log_normal = exp(normal)
log_normal

<tfp.distributions.TransformedDistribution 'expNormal' batch_shape=[] event_shape=[] dtype=float32>

Another way to create the transformed distribution is to call the bijector on the base distribution. Remember that the call method of a bijector object can be applied to another bijector, in which case it chains the two bijectors together, and it can be called on a tensor-like object, which is the same as calling the `forward` method. Here we see that calling a bijector on a distribution object creates an instance of a transformed distribution.

In [18]:
log_normal.sample()
log_normal.log_prob(x)

<tf.Tensor: shape=(3,), dtype=float32, numpy=array([-0.63450694, -0.4306941 , -0.8849568 ], dtype=float32)>

Now that we have the transformed distribution object, we can use the `sample` and `log_prob` methods as usual. Sampling draws from the base distribution and passes the result through the bijector. Computing the log probability applies the change of variables formula, which uses the bijector's `inverse` and `inverse_log_det_jacobian` methods.

# Shapes

In [35]:
normal = tfd.MultivariateNormalDiag(loc=[0,0], scale_diag=[1,1])
normal

<tfp.distributions.MultivariateNormalDiag 'MultivariateNormalDiag' batch_shape=[] event_shape=[2] dtype=float32>

In [36]:
scale_tril = [[1.,0.], [1., 1.]]
scale = tfb.ScaleMatvecTriL(scale_tril=scale_tril)

In [40]:
mvn = tfd.TransformedDistribution(normal, scale)
mvn

<tfp.distributions.TransformedDistribution 'scale_matvec_trilMultivariateNormalDiag' batch_shape=[] event_shape=[2] dtype=float32>

Here we have a multivariate normal with independent components and an event shape of 2. The bijector is a scale bijector that multiplies its input by a lower triangular 2x2 matrix. We then create the transformed distribution object by passing in the multivariate normal distribution and the scale bijector. Notice that the transformed distribution keeps the same shapes as the base distribution, namely the empty batch shape and the event shape of size 2.

We've created a two dimensional independent normal distribution. The two dimensional random variable is then scaled by the lower triangular matrix that we defined.

In [49]:
normal = tfd.MultivariateNormalDiag(loc=[[0., 0.], [0., 0.]], scale_diag=[[1., 1.], [1., 1.]])

scale_tril=[[[1., 0.], [1.,1.]], [[0.5, 0.], [-1, 0.5]]]
scale = tfb.ScaleMatvecTriL(scale_tril=scale_tril)

mvn = tfd.TransformedDistribution(normal, scale)
mvn

<tfp.distributions.TransformedDistribution 'scale_matvec_trilMultivariateNormalDiag' batch_shape=[2] event_shape=[2] dtype=float32>

In this case, the scaling factor is a rank-3 tensor of shape (2, 2, 2). We create a batched independent standard normal for the base distribution and scale each two-dimensional random variable in the batch by the corresponding lower triangular matrix defined by the bijector. This is equivalent to defining a batched `MultivariateNormalTriL` distribution using the rank-3 tensor for the scaling.

In [50]:
mvn2 = tfd.MultivariateNormalTriL(loc=0, scale_tril=scale_tril)

In [51]:
tfd.Normal(loc=0, scale=[1.,1.])

<tfp.distributions.Normal 'Normal' batch_shape=[2] event_shape=[] dtype=float32>