# High-dimensional Bayesian workflow

This tutorial describes a workflow for incrementally building pipelines to analyze high-dimensional data in Pyro. This workflow has evolved over a few years of applying Pyro to models with $10^5$ or more latent variables. While the individual components of the pipeline deserve their own tutorials, this tutorial focuses on incrementally combining those components. Workflow efficiency demands that code changes to upstream components do not break existing code in downstream components. Pyro's approaches to this challenge include strategies for variational approximations ([pyro.infer.autoguide](https://docs.pyro.ai/en/stable/infer.autoguide.html)) and strategies for transforming model coordinate systems to improve geometry ([pyro.infer.reparam](https://docs.pyro.ai/en/stable/infer.reparam.html)).

#### Summary

- For simple black-box guides, try using components in [pyro.infer.autoguide](http://docs.pyro.ai/en/stable/infer.autoguide.html).
- For more complex guides, try using components in [pyro.contrib.easyguide](http://docs.pyro.ai/en/stable/contrib.easyguide.html).
- Decorate with `@easy_guide(model)`.
- Select multiple model sites using `group = self.group(match="my_regex")`.
- Guide a group of sites by a single distribution using `group.sample(...)`.
- Inspect concatenated group shape using `group.batch_shape`, `group.event_shape`, etc.
- Use `self.plate(...)` instead of `pyro.plate(...)`.
- To be compatible with subsampling, pass the `event_dim` arg to `pyro.param(...)`.
- To MAP estimate model site "foo", use `foo = self.map_estimate("foo")`.

#### Table of contents

- [Modeling time series data](#Modeling-time-series-data)
- [Writing a guide without EasyGuide](#Writing-a-guide-without-EasyGuide)
- [Using EasyGuide](#Using-EasyGuide)
- [Amortized guides](#Amortized-guides)

## Overview

Consider the problem of sampling from the posterior distribution of a probabilistic model with $10^5$ or more continuous latent variables, but whose data fits entirely in memory.
(For larger datasets, consider [amortized variational inference]().) Inference in such high-dimensional models can be challenging even when posteriors are known to be unimodal or even log-concave, due to strong correlations among latent variables.

To perform inference in such high-dimensional models in Pyro, we have evolved a [workflow](https://arxiv.org/abs/2011.01808) to incrementally build data analysis pipelines combining variational inference, MCMC, reparameterization effects, and ad hoc initialization strategies. Our workflow is summarized as a sequence of steps, where validation after any step might suggest backtracking to change design decisions at a previous step.

1. Clean the data.
2. Create a generative model.
3. Create an initialization heuristic.
4. Sanity check using MAP or mean-field inference.
5. Reparameterize the model, evaluating results under mean-field VI.
6. Customize the variational family (autoguides, easyguides, custom guides).
7. Optionally draw posterior samples via reparameterized, variationally preconditioned MCMC.

The crux of workflow efficiency is to ensure that backtracking doesn't break the pipeline. That is, after building several pipeline stages, deciding through validation that an early stage needs to change, and changing that stage, one would like to minimize the code changes needed in downstream stages. The remainder of this tutorial describes these steps individually, then describes nuances of interactions among stages.

```python
import os
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.contrib.easyguide import easy_guide
from pyro.optim import Adam
from torch.distributions import constraints

smoke_test = ('CI' in os.environ)
assert pyro.__version__.startswith('1.7.0')
```