# Trying Pyro

[Pyro](https://pyro.ai/) is an open-source probabilistic programming library, originally developed by Uber. It uses PyTorch as its backend.

## Quick introduction to PyTorch

This section is mostly a recreation of PyTorch's [own getting started documentation](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html).

### Tensors

To start off with terminology, PyTorch **tensors** can be thought of as GPU-accelerated ndarrays (from NumPy). For example, we can create a $5 \times 3$ tensor of uniform random values, and another filled with zeros, as follows:

In [1]:
import torch

print(torch.rand(5, 3))
print(torch.zeros(5, 3))

tensor([[0.5290, 0.9945, 0.8214],
        [0.5325, 0.6967, 0.5570],
        [0.2706, 0.3105, 0.7093],
        [0.2892, 0.3908, 0.3652],
        [0.7669, 0.1806, 0.8546]])
tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])


We can also specify the data type using the `dtype` parameter:

In [2]:
torch.zeros(5, 3, dtype=torch.long)

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])
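
As a small sketch of how the `dtype` choice can be inspected and changed afterwards, the snippet below compares the default data type with an explicit one. The variable names are illustrative only, and the `torch.float32` default assumes it has not been changed with `torch.set_default_dtype`.

```python
import torch

# Without dtype, torch.zeros uses the default floating-point type
# (torch.float32 unless the default has been changed).
default_zeros = torch.zeros(5, 3)
long_zeros = torch.zeros(5, 3, dtype=torch.long)

print(default_zeros.dtype)
print(long_zeros.dtype)
print(long_zeros.shape)

# .to(dtype) returns a converted copy; long_zeros itself is unchanged.
as_double = long_zeros.to(torch.float64)
print(as_double.dtype)
```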

Interoperability with NumPy is a major feature. For example, we can convert a NumPy ndarray into a tensor and back:

In [3]:
import numpy as np

# np.int64 is used here because np.long is missing from some NumPy releases
numpy_array = np.ones(3, dtype=np.int64)
pytorch_tensor = torch.from_numpy(numpy_array)
print(numpy_array)
print(pytorch_tensor)
print(pytorch_tensor.numpy())

[1 1 1]
tensor([1, 1, 1])
[1 1 1]


The memory is shared between the two, so in-place operations on one are reflected in the other:

In [4]:
np.add(numpy_array, 2, out=numpy_array)
print(pytorch_tensor)

tensor([3, 3, 3])
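
To make the memory-sharing behaviour above more concrete, here is a minimal sketch contrasting `torch.from_numpy`, which shares the array's buffer, with `torch.tensor`, which copies the data. The names `source`, `shared`, and `copied` are made up for this illustration.

```python
import numpy as np
import torch

source = np.ones(3, dtype=np.int64)

shared = torch.from_numpy(source)  # shares memory with the NumPy array
copied = torch.tensor(source)      # makes an independent copy of the data

# Modify the NumPy array in place.
source += 2

print(shared)  # reflects the change: tensor([3, 3, 3])
print(copied)  # keeps the original values: tensor([1, 1, 1])
```

If an independent tensor is wanted, copying with `torch.tensor` (or calling `.clone()` on the shared tensor) avoids surprising aliasing.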


An important feature of tensors is that they can record the operations applied to them when `requires_grad=True`. Each result of such an operation carries a `grad_fn` that references the function that produced it.

In [5]:
grad_tensor_1 = torch.ones(2, 2, requires_grad=True)
print(grad_tensor_1)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)


In [6]:
grad_tensor_2 = grad_tensor_1 + 2
print(grad_tensor_2)
print(grad_tensor_2.grad_fn)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x7efd60b318b0>


In [7]:
grad_tensor_3 = grad_tensor_2 * grad_tensor_2 * 3
out = grad_tensor_3.mean()
print(grad_tensor_3)
print(out)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>)
tensor(27., grad_fn=<MeanBackward0>)


Now we can backpropagate from `out` by calling `backward()`. The gradient of `out` with respect to the first tensor is then available through that tensor's `grad` attribute.

In [8]:
out.backward()
print(grad_tensor_1.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
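
The value 4.5 can be checked by hand. Since `out` is the mean over four entries of $3(x_i + 2)^2$, we have $\partial\,\text{out} / \partial x_i = \tfrac{1}{4} \cdot 6 (x_i + 2) = 1.5\,(x_i + 2)$, which is 4.5 at $x_i = 1$. The sketch below rebuilds the same computation with a fresh tensor `x` (a name chosen for this example) and compares autograd's result against that formula.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)

# Same computation as above: out = mean(3 * (x + 2)^2)
out = (3 * (x + 2) ** 2).mean()
out.backward()

# Hand-derived gradient: d(out)/dx_i = 1.5 * (x_i + 2)
analytic = 1.5 * (x.detach() + 2)

print(x.grad)
print(torch.allclose(x.grad, analytic))  # True
```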


We can also move tensors to the GPU and back using `.to`:

In [19]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    gpu_tensor = torch.ones_like(grad_tensor_1, device=device)
    print(gpu_tensor)
    gpu_tensor_using_to = grad_tensor_1.to(device)
    print(gpu_tensor_using_to)
    sum_gpu_tensors = gpu_tensor + gpu_tensor_using_to
    print(sum_gpu_tensors)
    print(sum_gpu_tensors.to("cpu", dtype=torch.double))

tensor([[1., 1.],
        [1., 1.]], device='cuda:0')
tensor([[1., 1.],
        [1., 1.]], device='cuda:0', grad_fn=<CopyBackwards>)
tensor([[2., 2.],
        [2., 2.]], device='cuda:0', grad_fn=<AddBackward0>)
tensor([[2., 2.],
        [2., 2.]], dtype=torch.float64, grad_fn=<CopyBackwards>)
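
The cell above only runs its body when CUDA is available. A common alternative, sketched here with illustrative variable names, is to pick the device once and fall back to the CPU, so the same code runs on either kind of machine.

```python
import torch

# Use the GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.ones(2, 2, device=device)  # created directly on the chosen device
b = torch.ones(2, 2).to(device)      # created on the CPU, then moved

# Both operands live on the same device, so they can be added;
# the result is moved back to the CPU at the end.
total = (a + b).to("cpu")
print(total)
```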


### Distributions

The [`torch.distributions`](https://pytorch.org/docs/master/distributions.html) package contains parameterizable probability distributions and sampling functions, and its design follows the TensorFlow Distributions package. The abstract base class for all distributions is:

In [9]:
torch.distributions.Distribution

torch.distributions.distribution.Distribution

For example, we can sample from a Bernoulli distribution with success probability 0.7 as follows:

In [10]:
dist = torch.distributions.Bernoulli(0.7)
dist.sample(sample_shape=[10, 5])

tensor([[0., 1., 1., 0., 0.],
        [1., 1., 0., 1., 0.],
        [1., 0., 1., 0., 1.],
        [1., 1., 0., 1., 1.],
        [1., 1., 0., 1., 1.],
        [0., 1., 1., 1., 1.],
        [1., 0., 0., 0., 1.],
        [1., 0., 0., 0., 1.],
        [0., 1., 0., 1., 1.],
        [1., 1., 1., 1., 0.]])
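
Besides sampling, a distribution object can also score outcomes. As a rough sketch (the seed and sample size are arbitrary choices for this example), the snippet below checks that the empirical mean of many samples is near 0.7 and that `log_prob` recovers the success probability.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

dist = torch.distributions.Bernoulli(0.7)

# With many samples, the fraction of ones should be close to 0.7.
samples = dist.sample(sample_shape=(10_000,))
print(samples.mean())

# log_prob returns the log-probability of an outcome;
# exponentiating the value for outcome 1 gives roughly 0.7.
log_p_one = dist.log_prob(torch.tensor(1.0))
print(log_p_one.exp())

# Closed-form moments: mean p and variance p * (1 - p).
print(dist.mean, dist.variance)
```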

## Basic Pyro ideas

This section is a recreation of the [Pyro tutorials documentation](http://pyro.ai/examples/index.html).

A **stochastic function** (or **model**, in Pyro's terminology) is the basic unit of probabilistic programs. Concretely, a stochastic function can be any Python callable.

**Primitive stochastic functions** (or **distributions**) are stochastic functions for which we can explicitly compute the probability of the outputs, given the inputs. Pyro provides `pyro.distributions`, a thin wrapper around PyTorch's distributions package. For example:

In [11]:
import pyro

pyro.set_rng_seed(101)
normal = pyro.distributions.Normal(0, 1)
normal.sample(sample_shape=[5])

tensor([-1.3905, -0.8152, -0.3204,  0.7377, -1.7534])