In [1]:
%matplotlib inline
%reload_ext autoreload
%autoreload 2

In [5]:
import torch
from torch import Tensor

## Random sampling
As seen in part 01, PyTorch can generate tensors filled with random numbers sampled from basic distributions:

* `rand(N, M)` generates an NxM tensor of _uniformly_ distributed numbers between 0 and 1
* `randn(N)` generates a one-dimensional tensor of length N whose elements follow a standard Gaussian (mean 0, standard deviation 1)

In [6]:
torch.rand(10,3)  # Sample 30 numbers from a uniform distribution between 0 and 1

tensor([[0.7989, 0.1386, 0.9296],
        [0.0479, 0.5572, 0.8357],
        [0.2185, 0.9958, 0.6452],
        [0.3999, 0.2476, 0.4539],
        [0.3991, 0.5335, 0.2886],
        [0.1383, 0.9166, 0.1645],
        [0.9382, 0.2281, 0.5662],
        [0.5468, 0.0915, 0.8896],
        [0.4000, 0.7398, 0.2083],
        [0.6401, 0.5134, 0.1580]])

In [7]:
torch.randn(8)  # Sample 8 numbers from a unit-Gaussian

tensor([-0.0558, -0.0268,  0.0302, -0.5548, -0.7238,  0.4541, -1.9935,  0.2738])
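As a minimal sketch of how these two functions can be made reproducible and adapted to other ranges: the seed value and the shift/scale constants below are arbitrary choices for illustration, not part of the original notebook.

```python
import torch

torch.manual_seed(0)  # fix the seed so repeated runs draw the same numbers

u = torch.rand(10, 3)  # uniform samples in [0, 1), shape (10, 3)
g = torch.randn(8)     # standard-normal samples, shape (8,)

print(u.shape, g.shape)
print(bool(u.min() >= 0), bool(u.max() < 1))

# Shifting and scaling gives other uniform ranges or other Gaussians
u_ab = 2.0 + 3.0 * torch.rand(1000)    # roughly uniform between 2 and 5
g_ms = 10.0 + 0.5 * torch.randn(1000)  # mean 10, standard deviation 0.5
print(u_ab.min().item(), u_ab.max().item())
print(g_ms.mean().item(), g_ms.std().item())
```

Shifting and scaling by hand quickly becomes awkward for anything beyond these two cases, which is one motivation for distribution objects.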

## Distribution classes
Sometimes we instead want to sample from <mark>more complex distributions, or to be able to treat distributions as objects with their own parameters and methods</mark>. 
`torch.distributions` contains a variety of such classes.

Most of this example uses the `Normal` distribution; see https://pytorch.org/docs/stable/distributions.html for the full list of available distributions.

In [9]:
from torch import distributions

norm = distributions.Normal(loc=0, scale=1)  # scale here is the standard deviation

Once instantiated, `Distribution`s have a variety of methods, e.g.:

In [10]:
norm.log_prob(Tensor([2]))  # evaluate the log PDF at x=2

tensor([-2.9189])

These methods also accept multi-element tensors, in which case the operation is applied to each element:

In [11]:
norm.log_prob(Tensor([-2,-1,0,1,2]))  # evaluate the log probability at multiple values

tensor([-2.9189, -1.4189, -0.9189, -1.4189, -2.9189])

In [12]:
norm.cdf(Tensor([-2,-1,0,1,2]))  # evaluate the cumulative probability

tensor([0.0228, 0.1587, 0.5000, 0.8413, 0.9772])
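The values above can be cross-checked against the closed-form expressions for a standard Gaussian. The following is a small sketch of that check; the variable names (`manual_log_prob`, `manual_cdf`) are illustrative only.

```python
import math
import torch
from torch import distributions

norm = distributions.Normal(loc=0.0, scale=1.0)
x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])

# log N(x | 0, 1) = -x^2 / 2 - log(sqrt(2 * pi))
manual_log_prob = -0.5 * x**2 - 0.5 * math.log(2 * math.pi)

# Phi(x) = (1 + erf(x / sqrt(2))) / 2
manual_cdf = 0.5 * (1 + torch.erf(x / math.sqrt(2)))

print(torch.allclose(norm.log_prob(x), manual_log_prob))
print(torch.allclose(norm.cdf(x), manual_cdf))

# Exponentiating the log-probability recovers the PDF itself
pdf = norm.log_prob(x).exp()
print(pdf)
```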

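We can also <mark>randomly sample from the distribution by specifying the desired shape of the resulting tensor</mark>. The requested shape is prepended to the shape of the distribution's parameters, so for the scalar `norm` defined above, `norm.sample([3,2])` returns a 3x2 tensor. The output shown below has shape 3x2x2x2 and values far from zero, so it appears to have been produced while `norm` held a `Normal` with 2x2 tensor parameters (note the out-of-order cell number):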

In [24]:
norm.sample([3,2])

tensor([[[[10.1973,  1.5331],
          [20.2071,  3.3548]],

         [[ 9.5750,  0.3920],
          [21.0693,  6.0694]]],


        [[[ 9.1970,  0.5481],
          [21.4307,  4.6483]],

         [[ 9.2024,  0.3354],
          [20.2116,  4.4884]]],


        [[[10.5075,  2.7779],
          [18.3875,  5.0385]],

         [[ 9.2039,  2.2837],
          [18.4709,  3.7309]]]])
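To make the relationship between the requested sample shape and the returned shape explicit, here is a short sketch. The names `scalar_norm` and `batched_norm` are introduced only for this illustration.

```python
import torch
from torch import distributions

# Scalar parameters: the distribution has an empty batch shape
scalar_norm = distributions.Normal(loc=0.0, scale=1.0)
print(scalar_norm.batch_shape)           # torch.Size([])
print(scalar_norm.sample([3, 2]).shape)  # torch.Size([3, 2])

# 2x2 tensor parameters: the sample shape is prepended to the batch shape
batched_norm = distributions.Normal(loc=torch.zeros(2, 2), scale=torch.ones(2, 2))
print(batched_norm.batch_shape)           # torch.Size([2, 2])
print(batched_norm.sample([3, 2]).shape)  # torch.Size([3, 2, 2, 2])

# With no argument, one draw per set of parameters is returned
print(batched_norm.sample().shape)        # torch.Size([2, 2])
```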

## Parameterised distributions
Previously, we created a distribution using floats, but using tensors gives us a bit more flexibility. <mark>You can create a distribution in which each element has a different mean and standard deviation:</mark>

In [34]:
norm = distributions.Normal(loc=Tensor([[0,2],[-1,3]]),scale=Tensor([[1,1.5],[6,2]]))

Effectively, our `norm` now contains 4 different Gaussians arranged in a specific shape (2x2), and methods will now return tensors with that shape.

In [37]:
norm.log_prob(Tensor([2]))  # evaluate the log PDF of all 4 Gaussians at x=2

tensor([[-2.9189, -1.3244],
        [-2.8357, -1.7371]])

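To evaluate all the Gaussians at several points, the tensor of evaluation points must be shaped so that it broadcasts against the shape of the parameters. Here the five points are given shape 5x1x1, which broadcasts against the 2x2 parameters to produce a 5x2x2 result: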

In [38]:
norm.log_prob(Tensor([-2,-1,0,1,2])[:,None,None])  # evaluate the log PDF of all 4 Gaussians at several points

tensor([[[-2.9189, -4.8800],
         [-2.7246, -4.7371]],

        [[-1.4189, -3.3244],
         [-2.7107, -3.6121]],

        [[-0.9189, -2.2133],
         [-2.7246, -2.7371]],

        [[-1.4189, -1.5466],
         [-2.7663, -2.1121]],

        [[-2.9189, -1.3244],
         [-2.8357, -1.7371]]])

In [39]:
norm.log_prob(Tensor([[-2,-1],[1,2]]))  # evaluate the log PDF of each Gaussian at its own specific point

tensor([[-2.9189, -3.3244],
        [-2.7663, -1.7371]])
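The shape rules used in the three evaluations above can be sketched as follows. This assumes standard broadcasting semantics; the exact exception type raised for incompatible shapes may differ between PyTorch versions, so both likely types are caught here.

```python
import torch
from torch import distributions

norm = distributions.Normal(
    loc=torch.tensor([[0.0, 2.0], [-1.0, 3.0]]),
    scale=torch.tensor([[1.0, 1.5], [6.0, 2.0]]),
)
points = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])

# Shape (5, 1, 1) broadcasts against the (2, 2) parameters to give (5, 2, 2)
out = norm.log_prob(points[:, None, None])
print(out.shape)

# Slice i along the first axis equals evaluating all 4 Gaussians at points[i]
print(torch.allclose(out[4], norm.log_prob(torch.tensor(2.0))))

# A bare shape-(5,) tensor cannot be broadcast against (2, 2)
raised = False
try:
    norm.log_prob(points)
except (RuntimeError, ValueError) as err:
    raised = True
    print("incompatible shapes:", type(err).__name__)
```

The same indexing idea extends to any number of evaluation points: add one trailing singleton axis per dimension of the parameter tensors.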

### Parameter updates
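<mark>When a distribution is initialised with tensor parameters, the values of the tensors are not copied; instead, the distribution holds a reference to the same underlying data. This means that if the values of a tensor are changed in-place, the distribution changes accordingly:</mark> (Rebinding the Python name to a new tensor, e.g. `loc = torch.tensor([3.])`, has no effect on the distribution.)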

In [40]:
loc = torch.tensor([0.])    # float tensors, so later in-place updates are not truncated to integers
scale = torch.tensor([1.])

In [41]:
norm = distributions.Normal(loc=loc,scale=scale)

In [42]:
norm.log_prob(Tensor([2]))

tensor([-2.9189])

Now let's change the parameters in-place:

In [43]:
loc[0] = 3

In [44]:
norm.log_prob(Tensor([2]))

tensor([-1.4189])

In [45]:
scale *= 4

In [46]:
norm.log_prob(Tensor([2]))

tensor([-2.3365])
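The difference between an in-place update and rebinding a name can be sketched as below. This relies on the sharing behaviour demonstrated above, which is best treated as an implementation detail of `Normal` rather than a documented guarantee.

```python
import torch
from torch import distributions

loc = torch.tensor([0.0])
scale = torch.tensor([1.0])
norm = distributions.Normal(loc=loc, scale=scale)
x = torch.tensor([2.0])

before = norm.log_prob(x)

# In-place update: the tensor seen by the distribution changes too
loc[0] = 3.0
after_inplace = norm.log_prob(x)

# Rebinding: `loc` now names a brand-new tensor; `norm` still uses the old one
loc = torch.tensor([10.0])
after_rebind = norm.log_prob(x)

print(before, after_inplace, after_rebind)
```

If shared state is not wanted, passing `loc.clone()` and `scale.clone()` at construction time decouples the distribution from later in-place edits.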

## Differentiable distributions
<mark>Most of the methods of a `Distribution` are differentiable</mark>, meaning that if the parameters of the distribution require gradients, the <mark>returned values will carry a gradient function</mark> and can be back-propagated through to those parameters.

In [47]:
loc = torch.tensor([0.], requires_grad=True)
scale = torch.tensor([1.], requires_grad=True)

In [48]:
norm = distributions.Normal(loc=loc,scale=scale)

In [49]:
norm.log_prob(Tensor([2]))

tensor([-2.9189], grad_fn=<SubBackward0>)

In [50]:
norm.cdf(Tensor([0,1,2]))

tensor([0.5000, 0.8413, 0.9772], grad_fn=<MulBackward0>)
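As a sketch of what the gradient function enables, the log-probability can be back-propagated to the parameters and compared with the analytic derivatives of the Gaussian log-density. The names `expected_dloc` and `expected_dscale` are illustrative.

```python
import torch
from torch import distributions

loc = torch.tensor([0.0], requires_grad=True)
scale = torch.tensor([1.0], requires_grad=True)
norm = distributions.Normal(loc=loc, scale=scale)
x = torch.tensor([2.0])

# Back-propagate the log-probability to the distribution parameters
norm.log_prob(x).sum().backward()

# Analytic derivatives of log N(x | loc, scale):
#   d/dloc   = (x - loc) / scale^2
#   d/dscale = -1 / scale + (x - loc)^2 / scale^3
expected_dloc = (x - loc.detach()) / scale.detach() ** 2
expected_dscale = -1 / scale.detach() + (x - loc.detach()) ** 2 / scale.detach() ** 3

print(loc.grad, expected_dloc)
print(scale.grad, expected_dscale)
```

This is the mechanism that lets distribution parameters be fitted by gradient descent, for example by minimising a negative log-likelihood.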

The <mark>exception is the `sample()` method, whose output carries no gradient function:</mark>

In [51]:
norm.sample([2])

tensor([[-0.8584],
        [ 0.3077]])

However, some distributions can be <mark>re-parameterised such that the samples are differentiable</mark>, e.g. Gaussian samples can be drawn as `loc + scale * z`, where `z ~ N(0, 1)`.
The <mark>`rsample` method will return differentiable samples, if that is possible for the distribution</mark>; distributions that support it have their `has_rsample` attribute set to `True`.

In [52]:
norm.rsample([2])

tensor([[0.3847],
        [0.5515]], grad_fn=<AddBackward0>)
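The following sketch contrasts `sample` and `rsample` and shows gradients flowing through reparameterised samples. The seed, the sample count and the use of `Bernoulli` as an example of a distribution without `rsample` are choices made here for illustration.

```python
import torch
from torch import distributions

torch.manual_seed(0)  # arbitrary seed, only for repeatability

loc = torch.tensor([0.0], requires_grad=True)
scale = torch.tensor([1.0], requires_grad=True)
norm = distributions.Normal(loc=loc, scale=scale)

s = norm.sample([1000])   # plain samples: detached from the parameters
r = norm.rsample([1000])  # reparameterised samples: loc + scale * eps
print(s.requires_grad, r.requires_grad)

# Gradients flow from a function of the samples back to the parameters
r.mean().backward()
print(loc.grad)    # d mean(loc + scale * eps) / d loc = 1
print(scale.grad)  # equals mean(eps), close to 0 for many samples

# The same construction written out by hand
eps = torch.randn(1000, 1)
manual = loc + scale * eps
print(manual.requires_grad)

# Not every distribution supports this, e.g. the discrete Bernoulli
bern = distributions.Bernoulli(probs=torch.tensor([0.5]))
print(norm.has_rsample, bern.has_rsample)
```

Checking `has_rsample` before calling `rsample` avoids an error for distributions that cannot be reparameterised.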