# SVI Part II

## Marking Conditional Independence in Pyro
### Sequential Plate

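```python
import pyro
import pyro.distributions as dist

# alpha0, beta0: hyperparameters of the Beta prior, assumed to be defined elsewhere
def model(data):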
    # sample f from the beta prior
    f = pyro.sample("latent_fairness", dist.Beta(alpha0, beta0))
    # loop over the observed data using pyro.sample with the obs keyword argument
    for i in range(len(data)):
        # observe datapoint i using the bernoulli likelihood
        pyro.sample('obs_{}'.format(i), dist.Bernoulli(f), obs=data[i])
```

For this model, the observations are conditionally independent given the latent random variable `latent_fairness`. To explicitly mark this in Pyro, we just need to replace the Python builtin `range` with the Pyro construct `plate`:

```python
def model(data):
    f = pyro.sample('latent_fairness', dist.Beta(alpha0, beta0))
    for i in pyro.plate('data_loop', len(data)):
        pyro.sample('obs_{}'.format(i), dist.Bernoulli(f), obs=data[i])
```

`plate` is very similar to `range`, but each invocation of `plate` requires a unique name.

In detail:
* because each observed `pyro.sample` statement occurs within a different execution of the body of the `for` loop, Pyro marks each observation as independent
* this independence is properly a _conditional_ independence _given_ `latent_fairness` because `latent_fairness` is sampled outside the context of `data_loop`
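The conditional independence that `plate` marks is a statement about how the joint density factorizes: given `latent_fairness`, the likelihood splits into one term per observation. A minimal stdlib-only sketch of that factorization for the coin model, assuming hypothetical prior values `alpha0 = beta0 = 10.0`:

```python
import math

def beta_logpdf(f, a, b):
    # log Beta(f; a, b) = (a - 1) log f + (b - 1) log(1 - f) - log B(a, b)
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1) * math.log(f) + (b - 1) * math.log1p(-f) - log_norm

def bernoulli_logpmf(x, f):
    # log p(x | f) for a single coin flip x in {0, 1}
    return math.log(f) if x == 1 else math.log1p(-f)

def log_joint(f, data, alpha0=10.0, beta0=10.0):
    # Given f, the observations are conditionally independent, so the
    # likelihood is a sum of per-datapoint terms (one per plate iteration).
    return beta_logpdf(f, alpha0, beta0) + sum(bernoulli_logpmf(x, f) for x in data)

data = [1] * 6 + [0] * 4  # 6 heads, 4 tails
print(log_joint(0.6, data))
```

Reordering the data leaves the result unchanged, because each term depends only on its own datapoint and on `f`.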

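#### Gotchas

Do not reify the sequential `plate` into a list. The code below is wrong: `list()` runs through the `data_loop` context completely before any `pyro.sample` statement executes, so the observations are never marked as conditionally independent.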
```python
my_reified_list = list(pyro.plate('data_loop', len(data)))
for i in my_reified_list:
    pyro.sample('obs_{}'.format(i), dist.Bernoulli(f), obs=data[i])
```

`pyro.plate` is not appropriate for temporal models, where each iteration of a loop depends on the previous iteration; use `range` or `pyro.markov` instead.

### Vectorized Plate

The vectorized version needs `data` to be a tensor:

```python
import torch

data = torch.zeros(10)
data[0:6] = torch.ones(6)  # 6 heads, 4 tails
```

```python
with pyro.plate('observe_data'):
    pyro.sample('obs', dist.Bernoulli(f), obs=data)
```

* both the sequential and the vectorized `plate` require a unique name
* unlike the sequential version, this snippet introduces only a single observed random variable (`obs`), since the entire tensor is observed at once

### Subsampling

#### Automatic Subsampling with `plate`

This is the simplest case of subsampling: pass `subsample_size` to `plate`.

```python
for i in pyro.plate('data_loop', len(data), subsample_size=5):
    pyro.sample('obs_{}'.format(i), dist.Bernoulli(f), obs=data[i])
```

This will use 5 randomly chosen data points.
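Pyro compensates for subsampling by scaling the log likelihood of the subsampled sites by `len(data) / subsample_size` (here 10 / 5 = 2). A stdlib-only check of why that factor is the right one: averaged over every possible subsample of size 5, the scaled sum equals the full-data log likelihood, while the unscaled sum is off by a factor of 2. The per-datapoint terms below are hypothetical (6 heads and 4 tails evaluated at fairness 0.6):

```python
import itertools
import math

def scaled_subsample_sum(terms, idx):
    # multiply the minibatch sum by N / M, mirroring plate's scaling
    return len(terms) / len(idx) * sum(terms[i] for i in idx)

# hypothetical per-datapoint log likelihoods: 6 heads and 4 tails at fairness 0.6
terms = [math.log(0.6)] * 6 + [math.log(0.4)] * 4
full = sum(terms)

# average over all C(10, 5) possible subsamples of size 5
subsets = list(itertools.combinations(range(len(terms)), 5))
scaled_avg = sum(scaled_subsample_sum(terms, s) for s in subsets) / len(subsets)
unscaled_avg = sum(sum(terms[i] for i in s) for s in subsets) / len(subsets)
print(full, scaled_avg, unscaled_avg)
```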


With the vectorized `plate`:
```python
with pyro.plate('observe_data', size=10, subsample_size=5) as ind:
    pyro.sample('obs', dist.Bernoulli(f),
                obs=data.index_select(0, ind))
```

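Here `plate` is used as a context manager and returns a tensor of subsample indices, `ind`. If `data` is on the GPU, the user must pass a `device` argument to `plate` so that the indices are created on the same device as `data`.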

#### Custom Subsampling strategies with `plate`

With purely random selection, even after many iterations there can be a non-negligible probability that some datapoints have never been selected. The user can take control of subsampling with the `subsample` argument to `plate`.
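One possible custom strategy, sketched here in plain Python with a hypothetical helper `epoch_subsamples`, is to shuffle the indices once per epoch and walk through them in chunks, so that every datapoint is visited exactly once per epoch:

```python
import itertools
import random

def epoch_subsamples(n, subsample_size, seed=0):
    """Hypothetical helper: yield index lists that cover range(n) once per epoch."""
    rng = random.Random(seed)
    while True:
        perm = list(range(n))
        rng.shuffle(perm)  # a new random order for each epoch
        for start in range(0, n, subsample_size):
            yield perm[start:start + subsample_size]

# the first epoch of a 10-point dataset, in subsamples of size 5
first_epoch = list(itertools.islice(epoch_subsamples(10, 5), 2))
print(first_epoch)
```

Each list could then be turned into a tensor with `torch.tensor(batch)` and passed as `pyro.plate('observe_data', size=len(data), subsample=...)`. When `subsample` is given, `subsample_size` is taken from its length, so a shorter final chunk is still scaled consistently.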

#### Subsampling when there are only local random variables

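A model with only local random variables, i.e. one whose joint density factorizes as $p(\mathbf{x}, \mathbf{z}) = \prod_i p(x_i \mid z_i)\, p(z_i)$ (a vanilla VAE, for example), is a special case: subsampling scales every term in the ELBO by the same factor, so the user can pass minibatches directly to the model and guide. `plate` is still used, but `subsample_size` and `subsample` are not.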

#### Subsampling both global and local random variables

Consider a model with a global random variable `beta` and a local random variable `z_i` for each datapoint:

```python
def model(data):
    beta = pyro.sample('beta', ...)
    for i in pyro.plate('locals', len(data)):
        z_i = pyro.sample('z_{}'.format(i), ...)
        # compute the parameter used to define the observation likelihood using the local random variable
        theta_i = compute_something(z_i)
        pyro.sample("obs_{}".format(i), dist.MyDist(theta_i), obs=data[i])
```
* Note that there are `pyro.sample` statements both inside and outside the `plate` loop.

```python
def guide(data):
    beta = pyro.sample('beta', ...) # sample the global RV
    for i in pyro.plate('locals', len(data), subsample_size=5):
        # sample the local RVs
        pyro.sample('z_{}'.format(i), ..., lambda_i)
```
* Note that the indices are subsampled only once, in the guide; the Pyro backend makes sure that the same set of indices is used during the execution of the model. For this reason, `subsample_size` only needs to be specified in the guide.


## Amortization

Let's consider a model with global and local latent random variables and local variational parameters...


Instead of introducing local variational parameters, we're going to learn a single parametric function $f(\cdot)$ and work with a variational distribution of the form:

$$q(\beta) \prod_{i} q(z_i \mid f(x_i))$$

The function $f(\cdot)$, which maps a given observation to a set of variational parameters tailored to that datapoint, will need to be sufficiently rich to capture the posterior accurately, but now we can handle large datasets without having to introduce an obscene number of variational parameters. This approach has other benefits too: for example, during learning $f(\cdot)$ effectively allows us to share statistical power among different datapoints. Note that this is precisely the approach used in the VAE.
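A stdlib-only sketch of the parameter-count argument, with a hypothetical linear `f` whose four weights are illustrative: a non-amortized guide needs one (loc, scale) pair $\lambda_i$ per datapoint, while the amortized $f(\cdot)$ reuses the same four numbers for every $x_i$.

```python
import math

# hypothetical weights of the shared amortization function f
f_params = {"w_loc": 0.5, "b_loc": 0.0, "w_scale": -0.1, "b_scale": 0.0}

def f(x, p=f_params):
    """Map one observation x_i to the parameters (loc_i, scale_i) of q(z_i | f(x_i))."""
    loc = p["w_loc"] * x + p["b_loc"]
    scale = math.exp(p["w_scale"] * x + p["b_scale"])  # exp keeps the scale positive
    return loc, scale

def num_params_local(n):
    return 2 * n  # one (loc, scale) pair lambda_i per datapoint

def num_params_amortized(n):
    return len(f_params)  # shared by all n datapoints, so independent of n

data = [0.3, -1.2, 2.0]
per_point = [f(x) for x in data]  # parameters tailored to each datapoint
print(per_point)
print(num_params_local(1_000_000), num_params_amortized(1_000_000))
```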