# Workflow for High-Dimensional Models

This notebook demonstrates a workflow for incrementally building pipelines to analyze high-dimensional data in Pyro. This tutorial assumes the reader is familiar with Pyro [models](http://pyro.ai/examples/intro_part_i.html), [inference](http://pyro.ai/examples/intro_part_ii.html), and [SVI](http://pyro.ai/examples/svi_part_i.html), and is aware of popular Bayesian workflows such as that of [Gelman et al. 2020](https://arxiv.org/abs/2011.01808).

#### Summary

Incrementally build a pipeline that includes some of the following steps:
1. Clean the data.
2. Create a generative model.
3. Create an [initialization](https://docs.pyro.ai/en/dev/infer.autoguide.html#module-pyro.infer.autoguide.initialization) heuristic.
4. Sanity check point estimates using MAP inference ([AutoDelta](https://docs.pyro.ai/en/dev/infer.autoguide.html#autodelta)).
5. Sanity check uncertainty using mean-field inference ([AutoNormal](https://docs.pyro.ai/en/dev/infer.autoguide.html#autonormal)).
6. [Reparameterize](https://docs.pyro.ai/en/dev/infer.reparam.html) the model to improve geometry.
7. Customize the variational family.
8. Draw high-quality posterior samples via variationally preconditioned [NUTS](https://docs.pyro.ai/en/dev/mcmc.html#nuts).

After each step, validate your results. If validation fails, backtrack and modify one of the previous steps.

#### Table of contents

## Overview

Consider the problem of sampling from the posterior of a probabilistic model with 100k or more continuous latent variables, but whose data fits entirely in memory.
Inference in such high-dimensional models can be challenging even when posteriors are known to be unimodal or even log-concave, due to strong correlations among latent variables.

To perform inference in such models in Pyro, we have evolved a workflow ([Gelman et al. 2020](https://arxiv.org/abs/2011.01808)) to incrementally build data analysis pipelines combining variational inference, MCMC, reparameterization effects, and ad-hoc initialization strategies.
Our workflow is summarized as a sequence of steps, where validation after any step might suggest backtracking to change design decisions at a previous step. (This workflow omits explicit steps for validation because in our experience validation is much more problem-specific than the other steps, and because some sort of validation follows each step.)


## Workflow steps

### Clean the data

This first step of the Bayesian workflow is notable in that failures at later steps often indicate errors in data processing, so we must often return here, for example to remove mis-coded data or fix unit errors.
PyTorch is well suited to data cleaning because it is fast, easily handles large tensors, and offers easy serialization via `torch.save()` and `torch.load()`.
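A minimal sketch of this kind of cleaning follows. The raw tensor, the `-999` missing-value sentinel, and the centimeters-to-meters conversion are hypothetical; they stand in for whatever mis-coding and unit problems a real dataset has.

```python
import io
import torch

# Hypothetical raw data: rows are observations, columns are measurements.
raw = torch.tensor([
    [1.0, 200.0],
    [2.0, float("nan")],  # missing entry
    [3.0, 180.0],
    [4.0, -999.0],        # assumed sentinel value meaning "missing"
])

# Keep only rows that are finite and free of the sentinel value.
ok = torch.isfinite(raw).all(dim=-1) & (raw != -999.0).all(dim=-1)
clean = raw[ok]  # boolean indexing returns a copy

# Fix a units error: assume column 1 was recorded in centimeters.
clean[:, 1] = clean[:, 1] / 100.0

# Serialize the cleaned data. An in-memory buffer is used here so the
# example has no side effects; a file path works the same way.
buffer = io.BytesIO()
torch.save({"data": clean, "row_mask": ok}, buffer)
buffer.seek(0)
restored = torch.load(buffer)
print(restored["data"])
```

Saving the row mask alongside the cleaned tensor makes it easier to trace later modeling failures back to specific raw records.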

### Create a generative model

### Create an initialization heuristic

### Sanity check point estimates using MAP inference

### Sanity check uncertainty using mean-field variational inference

### Reparameterize the model

### Customize the variational family

### Draw high-quality samples using NUTS