# Vectorization and Distribution shapes in Pyro
> This post shows how to vectorize Pyro code
- toc: true 
- badges: true
- comments: true
- categories: [PPL, Pyro, Statistical Inference]
- image: images/ppl-pyro-intro.png


## Introduction

In the previous post we introduced Pyro and its building blocks: stochastic functions, the `sample` and `param` primitives, models, and guides. We also defined a Pyro model and used it to generate data, learn from data, and predict future observations.

In this post, we will learn in detail about inference in Pyro, and how to use Pyro primitives and the effect handling library (`pyro.poutine`) to build custom tools for analysis.

Consider the Poisson regression model from the previous post:

In [15]:
import torch
import pyro
import pyro.distributions as dist
from torch.distributions import constraints
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
pyro.set_rng_seed(101)
torch.manual_seed(101)
%matplotlib inline

In [38]:
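def model_(y):
    # Priors over the latent regression coefficients
    slope = pyro.sample("slope", dist.Normal(0, 0.1))
    intercept = pyro.sample("intercept", dist.Normal(0, 1))
    for t in range(len(y)):
        # Poisson rate at time step t (log link)
        rate = torch.exp(intercept + slope * t)
        # Each site needs a unique name; obs= conditions on the observed count
        pyro.sample("count_{}".format(t), dist.Poisson(rate), obs=y[t])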

## Plate statement

In the model above, `slope` and `intercept` are latent random variables declared with **pyro.sample**. Observations are denoted by the `obs=` keyword argument to `pyro.sample`, which specifies the likelihood function, here a Poisson distribution whose rate is the exponential of a linear predictor. The observations are conditionally independent given the latent random variables `slope` and `intercept`. To mark this explicitly in Pyro, the **plate** statement is used to construct conditionally independent sequences of variables.

```python
with pyro.plate("name", size, subsample_size=subsample_size, device=device) as ind:
    # ...do conditionally independent stuff with ind...
```
Unlike ``range()``, however, each invocation of **plate** requires the user to provide a unique name. The **plate** statement can be used either sequentially as a generator or in parallel as a context manager. A sequential plate is similar to ``range()`` in that it generates a sequence of values.
```python
# This version declares sequential independence and subsamples data:
for i in pyro.plate("data", 100, subsample_size=10):
    if z[i]:  # Control flow in this example prevents vectorization.
        obs = pyro.sample("obs_{}".format(i), dist.Normal(loc, scale), obs=data[i])
```
A vectorized plate is similar to ``torch.arange()`` in that it yields an array of indices by which other tensors can be indexed. However, unlike ``torch.arange()``, **plate** also informs inference algorithms that the variables being indexed are conditionally independent.
```python
# This version declares vectorized independence:
with pyro.plate("data"):
    obs = pyro.sample("obs", dist.Normal(loc, scale), obs=data)
```
Additionally, plate can take advantage of the conditional independence assumptions by subsampling the indices and informing inference algorithms to scale various computed values. This is typically used to subsample minibatches of data:
```python
with pyro.plate("data", len(data), subsample_size=100) as ind:
    batch = data[ind]
    assert len(batch) == 100
```

You can additionally nest plates, e.g. if you have per-pixel independence:

```python
with pyro.plate("x_axis", 320):
    # within this context, batch dimension -1 is independent
    with pyro.plate("y_axis", 200):
        # within this context, batch dimensions -2 and -1 are independent
        noise = pyro.sample("noise", dist.Normal(0., 1.))  # noise.shape == (200, 320)
```
Finally, you can declare multiple plates and use them as reusable context managers, for example to mix and match plates for noise that depends only on x, noise that depends only on y, and noise that depends on both:

```python
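x_axis = pyro.plate("x_axis", 3, dim=-2)
y_axis = pyro.plate("y_axis", 2, dim=-3)
with x_axis:
    # within this context, batch dimension -2 is independent
    x = pyro.sample("x", dist.Normal(0., 1.))  # x.shape == (3, 1)
with y_axis:
    # within this context, batch dimension -3 is independent
    y = pyro.sample("y", dist.Normal(0., 1.))  # y.shape == (2, 1, 1)
with x_axis, y_axis:
    # within this context, batch dimensions -3 and -2 are independent
    xy = pyro.sample("xy", dist.Normal(0., 1.))  # xy.shape == (2, 3, 1)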
```

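Applying a vectorized plate to the regression removes the Python loop. Unlike the loop version above, which uses a Poisson likelihood, this version uses a LogNormal likelihood on the linear predictor rather than log-transforming the data:

In [39]:
def model_(y):
    slope = pyro.sample("slope", dist.Normal(0, 0.1))
    intercept = pyro.sample("intercept", dist.Normal(0, 1))
    # t is a tensor of indices 0..len(y)-1, so the whole series is scored at once
    with pyro.plate("N", len(y)) as t:
        log_y_hat = slope * t.float() + intercept
        pyro.sample("y", dist.LogNormal(log_y_hat, 1.), obs=y)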

## Distribution shapes

Unlike PyTorch tensors, which have a single `.shape` attribute, Pyro distributions have two shape attributes: **batch_shape** and **event_shape**. Together they define the total shape of a sample. The `.batch_shape` indexes conditionally independent random variables, whereas the `.event_shape` indexes dependent random variables (i.e. one draw from a distribution). Because the dependent random variables define a probability together, the `.log_prob()` method produces only a single number for each event of shape `.event_shape`.

In [40]:
d = dist.Bernoulli(0.5)
print(d.batch_shape)
print(d.event_shape)

torch.Size([])
torch.Size([])


In [41]:
x = d.sample()
x.shape

torch.Size([])

Distributions can be batched by passing in batched parameters.

In [42]:
d = dist.Bernoulli(0.5*torch.ones(50))
print(d.batch_shape)
print(d.event_shape)

torch.Size([50])
torch.Size([])


In [43]:
x = d.sample()
x.shape

torch.Size([50])

From the two examples above, we observe that univariate distributions have an empty event shape (because each number is an independent event). Let us also consider a multivariate distribution.

In [44]:
md = dist.MultivariateNormal(torch.zeros(3), torch.eye(3))
print(md.batch_shape)
print(md.event_shape)

torch.Size([])
torch.Size([3])


In [45]:
y = md.sample()
y.shape

torch.Size([3])

We can also create a batched multivariate distribution as follows.

In [46]:
md = dist.MultivariateNormal(torch.zeros(3), torch.eye(3)).expand([50])
print(md.batch_shape)
print(md.event_shape)

torch.Size([50])
torch.Size([3])


In [47]:
y = md.sample()
y.shape

torch.Size([50, 3])

Because multivariate distributions have a nonempty **.event_shape**, the shapes of `.sample()` and `.log_prob(x)` differ:

In [48]:
md.log_prob(y).shape

torch.Size([50])

The **Distribution.sample()** method also takes a `sample_shape` parameter that indexes over independent, identically distributed (iid) random variables, such that:

```python
sample.shape == sample_shape + batch_shape + event_shape
```




In [49]:
y_sample = md.sample([10])
y_sample.shape

torch.Size([10, 50, 3])

### Reshaping distributions

You can treat a univariate distribution as multivariate by calling the ``.to_event(n)`` method, where **n** is the number of batch dimensions (from the right) to declare as dependent.

In [50]:
d = dist.Bernoulli(0.5*torch.ones(50, 3)).to_event(1)
print(d.batch_shape)
print(d.event_shape)

torch.Size([50])
torch.Size([3])


When working with distributions in Pyro, it is essential to note that:
1. Samples have shape ``batch_shape + event_shape``.
2. ``.log_prob(x)`` values have shape ``batch_shape``.
3. You need to ensure that ``batch_shape`` is carefully controlled, either by trimming it down with ``.to_event(n)`` or by declaring dimensions as independent via ``pyro.plate``.


Often in Pyro we declare some dimensions as dependent even though they are in fact independent. This allows us to easily swap in a MultivariateNormal distribution later, and it also simplifies the code because we don't need a plate. Consider the following two snippets:

In [51]:
x = pyro.sample("x", dist.Normal(0, 1).expand([10]).to_event(1))

In [52]:
x.shape

torch.Size([10])

In [53]:
with pyro.plate("y_plate", 10):
    y = pyro.sample("y", dist.Normal(0, 1))  # .expand([10]) is automatic
  

In [54]:
y.shape

torch.Size([10])

Of the two snippets, the second (with plate) tells Pyro that it can use conditional independence information when estimating gradients, whereas in the first Pyro must assume the variables are dependent (even though the normals are in fact conditionally independent).