# API abstraction levels in sbi

`sbi` offers flexibility ranging from simple, high-level workflows to full control over neural networks and sampling. This guide shows:

1. **Four abstraction levels** for controlling the density estimator (common to NPE and NLE)
2. **Additional sampling control** for NLE (4 more levels)

We'll use the same simple example throughout to keep things clear.

## Setup

First, let's define a simple linear Gaussian simulator and generate data we'll use for all examples:

In [None]:
import torch

from sbi.inference import NLE, NPE
from sbi.utils import BoxUniform


# Define a simple linear Gaussian simulator
def simulator(theta):
    """Linear Gaussian simulator with noise."""
    return theta + 1.0 + torch.randn_like(theta) * 0.1

# Define prior over 3 parameters
num_dim = 3
prior = BoxUniform(low=-2 * torch.ones(num_dim), high=2 * torch.ones(num_dim))

# Generate training data (used for all examples)
num_simulations = 2000
theta = prior.sample((num_simulations,))
x = simulator(theta)

# Generate a single observation for inference
theta_o = prior.sample((1,))
x_o = simulator(theta_o)

print(f"Generated {num_simulations} simulations for training")
print(f"Parameter shape: {theta.shape}, Data shape: {x.shape}")

Generated 2000 simulations for training
Parameter shape: torch.Size([2000, 3]), Data shape: torch.Size([2000, 3])
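The simulator is linear with Gaussian noise and the prior is a box, so the exact posterior is known and can serve as a reference. Each parameter independently follows a Gaussian centred at $x_{o,i} - 1$ with standard deviation 0.1, truncated to $[-2, 2]$.

The sketch below draws exact reference samples from that posterior. It uses only the standard library and a hypothetical observation in place of `x_o`. You can use it to sanity-check the samples produced in the sections below.

```python
import random

NOISE_STD = 0.1        # matches `torch.randn_like(theta) * 0.1` in the simulator
SHIFT = 1.0            # matches `theta + 1.0` in the simulator
LOW, HIGH = -2.0, 2.0  # bounds of the BoxUniform prior


def sample_reference_posterior(x_obs, num_samples, rng):
    """Exact posterior samples for the linear Gaussian example.

    Dimensions are independent: each is N(x_obs[i] - SHIFT, NOISE_STD**2)
    truncated to [LOW, HIGH]; truncation is handled by simple rejection.
    """
    samples = []
    while len(samples) < num_samples:
        theta = [rng.gauss(xi - SHIFT, NOISE_STD) for xi in x_obs]
        if all(LOW <= t <= HIGH for t in theta):
            samples.append(theta)
    return samples


rng = random.Random(0)
x_obs = [0.5, 1.2, -0.3]  # hypothetical observation; e.g. use x_o[0].tolist()
reference = sample_reference_posterior(x_obs, 5000, rng)
means = [sum(s[i] for s in reference) / len(reference) for i in range(len(x_obs))]
print([round(m, 2) for m in means])  # approximately [x - 1.0 for x in x_obs]
```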


# Part 1: Density Estimator Abstraction Levels

The following **4 levels apply to both NPE and NLE**. They control how the neural density estimator is specified and constructed. We'll demonstrate with NPE first.

## Level 1: Trainer Classes (Recommended)

**Use case**: Standard workflows, most common approach

Trainer classes such as `NPE` are the recommended interface. You select the density estimator by passing its name as a string.

In [11]:
# Level 1: Simple trainer class with string specification
inference = NPE(prior=prior, density_estimator="nsf")

# Train on the data
inference.append_simulations(theta, x)
posterior_net = inference.train()

# Build posterior and sample
posterior = inference.build_posterior()
samples_lvl1 = posterior.sample((1000,), x=x_o)

print("Level 1 complete - used NSF with default settings")

 Neural network successfully converged after 100 epochs.

1094it [00:00, 50601.21it/s]            

Level 1 complete - used NSF with default settings





**Key features**:
- Simple string specification: `"nsf"`, `"maf"`, `"zuko_nsf"`, `"mdn"`, etc.
- Multi-round inference support
- Automatic handling of training loops
- **Start here** for most use cases

## Level 2: Factory Functions

**Use case**: Need specific architecture hyperparameters

Use factory functions like `posterior_nn()` when you need to tune the network architecture.

In [12]:
from sbi.neural_nets import posterior_nn

# Level 2: Factory function with custom hyperparameters
density_estimator = posterior_nn(
    model="maf",              # Masked Autoregressive Flow
    hidden_features=50,        # Customize hidden layer size
    num_transforms=5,          # Customize number of transform layers
)

# Pass to NPE (rest of workflow is the same)
inference = NPE(prior=prior, density_estimator=density_estimator)
inference.append_simulations(theta, x)
posterior_net = inference.train()

posterior = inference.build_posterior()
samples_lvl2 = posterior.sample((1000,), x=x_o)

print("Level 2 complete - used MAF with custom hyperparameters")

 Neural network successfully converged after 64 epochs.

100%|██████████| 1000/1000 [00:00<00:00, 62516.64it/s]

Level 2 complete - used MAF with custom hyperparameters





**Key features**:
- Fine-grained control over hyperparameters
- Can add embedding networks for high-dimensional data
- Still benefits from trainer conveniences
- For NPE: `posterior_nn()`, for NLE: `likelihood_nn()`, for NRE: `classifier_nn()`
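Conceptually, a factory binds the hyperparameters now and defers building the network until the trainer has seen the training data (in sbi, for instance, to set up z-scoring). In essence, `posterior_nn()` and `likelihood_nn()` differ in which variable is modeled and which is conditioned on.

Below is a hypothetical, library-free sketch of that pattern. It assumes a builder that models its first argument conditioned on its second. None of these names are sbi functions.

```python
# Hypothetical stand-ins, NOT sbi functions: they only illustrate the pattern.
# A builder models `batch_x` conditioned on `batch_y` (assumed argument roles).
def toy_nsf_builder(batch_x, batch_y, hidden_features=50, num_transforms=5):
    return {
        "models": batch_x,
        "given": batch_y,
        "hidden_features": hidden_features,
        "num_transforms": num_transforms,
    }


TOY_BUILDERS = {"nsf": toy_nsf_builder}  # string name -> builder (Level 1 style)


def toy_posterior_nn(model, **hyperparameters):
    builder = TOY_BUILDERS[model]

    def build_fn(batch_theta, batch_x):
        # Posterior estimator: density over theta, conditioned on x.
        return builder(batch_x=batch_theta, batch_y=batch_x, **hyperparameters)

    return build_fn


def toy_likelihood_nn(model, **hyperparameters):
    builder = TOY_BUILDERS[model]

    def build_fn(batch_theta, batch_x):
        # Likelihood estimator: density over x, conditioned on theta.
        return builder(batch_x=batch_x, batch_y=batch_theta, **hyperparameters)

    return build_fn


# The trainer later calls the returned function with its data, theta first.
npe_net = toy_posterior_nn("nsf", hidden_features=64)("theta", "x")
nle_net = toy_likelihood_nn("nsf", hidden_features=64)("theta", "x")
print(npe_net["models"], "|", npe_net["given"])  # theta | x
print(nle_net["models"], "|", nle_net["given"])  # x | theta
```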

## Level 3: Direct Network Builders

**Use case**: Custom neural network architecture with full parameter access

Use direct builder functions like `build_nsf()` for maximum control over network construction.

In [13]:
from functools import partial

from sbi.neural_nets.net_builders.flow import build_nsf

# Level 3: Direct builder with full parameter control
custom_builder = partial(
    build_nsf,
    hidden_features=60,
    num_transforms=3,
    num_bins=8,                # Number of spline bins
    tail_bound=3.0,            # Spline tail bound
)

# Pass to NPE (rest of workflow is the same)
inference = NPE(prior=prior, density_estimator=custom_builder)
inference.append_simulations(theta, x)
posterior_net = inference.train()

posterior = inference.build_posterior()
samples_lvl3 = posterior.sample((1000,), x=x_o)

print("Level 3 complete - used custom NSF configuration")

 Neural network successfully converged after 148 epochs.

1092it [00:00, 72132.23it/s]            

Level 3 complete - used custom NSF configuration





**Key features**:
- Direct access to all builder parameters
- Maximum flexibility for architecture design
- Can implement fully custom architectures by subclassing `DensityEstimator`
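Level 3 relies on `functools.partial`. It stores the builder's keyword arguments now and leaves the data arguments to be filled in when the trainer calls it.

The sketch below uses a hypothetical `toy_builder` in place of `build_nsf()`. It shows what `partial` stores, and that keywords passed at call time override the bound ones.

```python
from functools import partial


# Hypothetical stand-in for a builder such as build_nsf (not the sbi function).
def toy_builder(batch_x, batch_y, hidden_features=50, num_transforms=5, num_bins=10):
    return {
        "hidden_features": hidden_features,
        "num_transforms": num_transforms,
        "num_bins": num_bins,
        "num_examples": len(batch_x),
    }


custom_builder = partial(toy_builder, hidden_features=60, num_transforms=3, num_bins=8)
print(custom_builder.keywords)  # the hyperparameters bound at Level 3

# Later, only the training data is passed positionally (theta, x).
theta_batch = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
x_batch = [[1.1, 1.2, 1.3], [1.4, 1.5, 1.6]]
net = custom_builder(theta_batch, x_batch)

# Keywords given at call time take precedence over the bound ones.
net_more_bins = custom_builder(theta_batch, x_batch, num_bins=16)
```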

## Level 4: Custom Training Loops

**Use case**: Custom training logic, loss functions, research applications

For complete control over the training process, implement custom training loops. This is covered in detail in [advanced tutorial 18](https://sbi.readthedocs.io/en/latest/advanced_tutorials/18_training_interface.html).

At this level, you:
- Manually construct the density estimator
- Define custom loss functions and regularization
- Implement your own training loops with custom data loaders
- Have full control over optimization, early stopping, etc.

**When to use**: Research on new methods, custom loss functions, specialized data augmentation.
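Tutorial 18 documents the actual sbi training interface. To make the ingredients listed above concrete without relying on that API, here is a library-free toy. It:

- fits the mean and log standard deviation of a 1D Gaussian by gradient descent on the negative log-likelihood,
- monitors a held-out validation loss,
- stops early once that loss stops improving.

The model, learning rate, and patience are illustrative assumptions.

```python
import math
import random


def mean_nll(data, mu, log_sigma):
    """Mean negative log-likelihood of `data` under N(mu, exp(log_sigma)**2)."""
    sigma = math.exp(log_sigma)
    return sum(
        0.5 * ((d - mu) / sigma) ** 2 + log_sigma + 0.5 * math.log(2 * math.pi)
        for d in data
    ) / len(data)


def mean_nll_grads(data, mu, log_sigma):
    """Analytic gradients of mean_nll with respect to mu and log_sigma."""
    var = math.exp(2 * log_sigma)
    g_mu = sum(-(d - mu) / var for d in data) / len(data)
    g_log_sigma = sum(1.0 - (d - mu) ** 2 / var for d in data) / len(data)
    return g_mu, g_log_sigma


rng = random.Random(1)
data = [rng.gauss(1.0, 0.5) for _ in range(400)]
train, val = data[:320], data[320:]  # manual train/validation split

mu, log_sigma = 0.0, 0.0  # "network" parameters
lr, patience = 0.1, 20
best_val, best_params, stale_epochs = math.inf, (mu, log_sigma), 0

for epoch in range(1000):
    g_mu, g_log_sigma = mean_nll_grads(train, mu, log_sigma)  # full-batch step
    mu -= lr * g_mu
    log_sigma -= lr * g_log_sigma
    val_loss = mean_nll(val, mu, log_sigma)
    if val_loss < best_val - 1e-6:  # meaningful improvement: reset patience
        best_val, best_params, stale_epochs = val_loss, (mu, log_sigma), 0
    else:
        stale_epochs += 1
        if stale_epochs >= patience:  # early stopping
            break

mu, log_sigma = best_params  # restore the best validation checkpoint
print(f"stopped after {epoch + 1} epochs: mu={mu:.2f}, sigma={math.exp(log_sigma):.2f}")
```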

# Part 2: NLE - Same Levels + Sampling Control

## Understanding the Difference

**NPE** directly approximates the posterior $p(\theta|x)$:
- Sampling is straightforward: just sample from the neural network
- No additional configuration typically needed

**NLE** approximates the likelihood $p(x|\theta)$:
- The learned likelihood must be combined with the prior, via MCMC, VI, rejection sampling, or importance sampling, to obtain posterior samples
- This adds a **second dimension of control**: choosing and configuring the sampling method
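For the linear Gaussian example from the Setup, NLE's samplers work with the unnormalized log posterior, $\log p(x_o|\theta) + \log p(\theta)$. In NLE the first term comes from the trained network. In the plain-Python sketch below it is the exact Gaussian log-likelihood, and the observation is hypothetical.

```python
import math

NOISE_STD, SHIFT = 0.1, 1.0  # simulator: x = theta + 1.0 + 0.1 * noise
LOW, HIGH = -2.0, 2.0        # BoxUniform prior bounds


def log_likelihood(x, theta):
    """Exact log p(x | theta); in NLE a trained network provides this term."""
    return sum(
        -0.5 * ((xi - ti - SHIFT) / NOISE_STD) ** 2
        - math.log(NOISE_STD * math.sqrt(2 * math.pi))
        for xi, ti in zip(x, theta)
    )


def log_prior(theta):
    """Log density of the box prior: constant inside, -inf outside."""
    if all(LOW <= t <= HIGH for t in theta):
        return -len(theta) * math.log(HIGH - LOW)
    return -math.inf


def log_posterior_unnormalized(theta, x):
    """log p(x | theta) + log p(theta): the target for MCMC, VI, or rejection."""
    lp = log_prior(theta)
    if lp == -math.inf:
        return lp  # skip the likelihood outside the prior support
    return lp + log_likelihood(x, theta)


x_obs = [0.5, 1.2, -0.3]  # hypothetical observation
print(log_posterior_unnormalized([-0.5, 0.2, -1.3], x_obs))  # at the mode
print(log_posterior_unnormalized([0.0, 0.0, 0.0], x_obs))    # far lower
print(log_posterior_unnormalized([2.5, 0.0, 0.0], x_obs))    # -inf: outside prior
```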

**Important**: The 4 density estimator levels above apply to NLE in the same way; at Level 2, use `likelihood_nn()` instead of `posterior_nn()`.

## NLE Density Estimator (Same 4 Levels)

Quick example showing NLE uses the same abstraction levels:

In [None]:
# Level 1 with NLE - same pattern as NPE
inference_nle = NLE(prior=prior, density_estimator="nsf")
inference_nle.append_simulations(theta, x)
likelihood_net = inference_nle.train()

# Build posterior (defaults to MCMC)
posterior_nle = inference_nle.build_posterior()
samples_nle = posterior_nle.sample((1000,), x=x_o)

print("NLE Level 1 complete")
print("Default sampling method: MCMC with slice_np_vectorized")

 Neural network successfully converged after 78 epochs.

Generating 20 MCMC inits via resample strategy: 100%|██████████| 20/20 [00:01<00:00, 14.46it/s]
Running vectorized MCMC with 20 chains: 100%|██████████| 6000/6000 [00:19<00:00, 301.30it/s]

NLE Level 1 complete
Default sampling method: MCMC with slice_np_vectorized





**Note**: Levels 2-4 for the density estimator work identically:
- Level 2: Use `likelihood_nn()` instead of `posterior_nn()`
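- Level 3: Use builders such as `build_nsf()` as with NPE. Note that NLE models the data $x$ conditioned on $\theta$ (the reverse of NPE); `likelihood_nn()` handles this ordering automatically.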
- Level 4: Custom training (see tutorial 18)

# Part 3: NLE Sampling Control

NLE provides **additional control over how posterior samples are generated**. This is independent of the density estimator configuration above.

Four levels of sampling control:

## Sampling Level 1: Default

**Use case**: Starting point, works well for most problems

Call `build_posterior()` with no arguments. By default, NLE uses MCMC with vectorized slice sampling.

In [15]:
# Sampling Level 1: Use defaults
posterior = inference_nle.build_posterior()
samples = posterior.sample((1000,), x=x_o)

print("Sampling Level 1: Default MCMC (slice_np_vectorized)")

Generating 20 MCMC inits via resample strategy: 100%|██████████| 20/20 [00:01<00:00, 15.70it/s]
Running vectorized MCMC with 20 chains: 100%|██████████| 6000/6000 [00:19<00:00, 306.88it/s]

Sampling Level 1: Default MCMC (slice_np_vectorized)





**Default behavior**: MCMC with `slice_np_vectorized` method, 200 warmup steps, 20 chains.
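Slice sampling needs no step-size tuning, which makes it a sensible default. To show the idea, here is a textbook single-variable slice sampler with the stepping-out and shrinkage procedures. It is plain Python and runs on a hypothetical 1D target. sbi's `slice_np_vectorized` runs many chains at once, and its internals may differ.

```python
import math
import random


def log_target(theta):
    """Hypothetical 1D target: N(0.2, 0.1**2) truncated to the box [-2, 2]."""
    if not -2.0 <= theta <= 2.0:
        return -math.inf
    return -0.5 * ((theta - 0.2) / 0.1) ** 2


def slice_step(x0, log_f, width, rng, max_step_out=50):
    """One slice-sampling update (stepping out, then shrinkage)."""
    # Vertical step: draw the slice height uniformly under f(x0), in log space.
    log_height = log_f(x0) + math.log(1.0 - rng.random())  # 1 - U avoids log(0)
    # Stepping out: randomly place an interval around x0, widen until outside.
    left = x0 - width * rng.random()
    right = left + width
    for _ in range(max_step_out):
        if log_f(left) < log_height:
            break
        left -= width
    for _ in range(max_step_out):
        if log_f(right) < log_height:
            break
        right += width
    # Shrinkage: sample in [left, right]; shrink toward x0 after each rejection.
    while True:
        x1 = rng.uniform(left, right)
        if log_f(x1) >= log_height:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1


rng = random.Random(0)
theta, chain = 0.0, []
for _ in range(3000):
    theta = slice_step(theta, log_target, width=0.5, rng=rng)
    chain.append(theta)

kept = chain[200:]  # discard 200 warmup steps
mean = sum(kept) / len(kept)
std = math.sqrt(sum((t - mean) ** 2 for t in kept) / len(kept))
print(f"mean={mean:.3f}, std={std:.3f}")  # target: mean 0.2, std 0.1
```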

## Sampling Level 2: Choose Method

**Use case**: Different problem characteristics favor different sampling methods

Use the `sample_with` parameter to choose between MCMC, rejection sampling, VI, or importance sampling.

In [16]:
# Sampling Level 2: Choose sampling method
# Use rejection sampling instead of default MCMC (fast for few parameters)
posterior_rejection = inference_nle.build_posterior(sample_with="rejection")
samples_rejection = posterior_rejection.sample((1000,), x=x_o)

print("Sampling Level 2: Using rejection sampling instead of MCMC")

Drawing 1000 posterior samples: 1004it [00:26, 38.12it/s]                         

Sampling Level 2: Using rejection sampling instead of MCMC





**Available sampling methods**:
- `"mcmc"`: Markov Chain Monte Carlo (default) - accurate but can be slow
- `"rejection"`: Rejection sampling - fast and accurate for few parameters (<3); slows down sharply when the posterior is much narrower than the prior
- `"vi"`: Variational inference - faster for many parameters (>10), may be less accurate
- `"importance"`: Importance sampling - useful for refining VI posteriors

**Usage**: Simply change `sample_with="rejection"` to `sample_with="vi"` or any other method.

**See also**: [how_to_guide/09_sampler_interface.ipynb](https://sbi.readthedocs.io/en/latest/how_to_guide/09_sampler_interface.html) for detailed guidance on choosing sampling algorithms.
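Rejection sampling proposes parameters from the prior and accepts each with probability proportional to its likelihood. Its cost therefore grows with how much smaller the posterior is than the prior.

In the linear Gaussian example, the prior box is 4 wide and the likelihood has width 0.1. Each dimension thus accepts roughly $0.1\sqrt{2\pi}/4 \approx 0.063$ of proposals, and this rate multiplies across dimensions.

The plain-Python sketch below uses the exact likelihood and a hypothetical centre. sbi's implementation details may differ.

```python
import math
import random


def log_lik_unnormalized(theta, center, noise_std=0.1):
    """Gaussian log-likelihood up to a constant; its maximum, 0.0, is at center."""
    return sum(-0.5 * ((t - c) / noise_std) ** 2 for t, c in zip(theta, center))


def rejection_sample(center, num_samples, rng, low=-2.0, high=2.0, max_proposals=10**6):
    """Propose from the box prior, accept with probability L(theta) / max L."""
    accepted, proposals = [], 0
    while len(accepted) < num_samples and proposals < max_proposals:
        theta = [rng.uniform(low, high) for _ in center]
        proposals += 1
        # log U <= 0 and the unnormalized log-likelihood is also <= 0.
        if math.log(1.0 - rng.random()) < log_lik_unnormalized(theta, center):
            accepted.append(theta)
    return accepted, proposals


rng = random.Random(0)
rates = {}
for dim in (1, 2):
    samples, num_proposals = rejection_sample([0.2] * dim, 200, rng)
    rates[dim] = len(samples) / num_proposals
    print(f"{dim} parameter(s): acceptance rate {rates[dim]:.4f}")
# Expect roughly 0.063 for one parameter and 0.063**2 ≈ 0.004 for two.
```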

## Sampling Level 3: Configure Method Specifics

**Use case**: Choose specific algorithms within a sampling method

Use `mcmc_method` or `vi_method` parameters to select specific algorithms.

In [17]:
# Sampling Level 3: Configure method specifics
# Use NUTS (No-U-Turn Sampler) instead of default slice sampling
posterior_nuts = inference_nle.build_posterior(
    sample_with="mcmc",
    mcmc_method="nuts_pyro"
)
samples_nuts = posterior_nuts.sample((1000,), x=x_o)

print("Sampling Level 3: Using NUTS from Pyro")

Generating 20 MCMC inits via resample strategy: 100%|██████████| 20/20 [00:01<00:00, 15.86it/s]
Sample [0]: 100%|██████████| 250/250 [00:14, 17.72it/s, step size=6.63e-01, acc. prob=0.907]
Sample [1]: 100%|██████████| 250/250 [00:14, 17.27it/s, step size=7.33e-01, acc. prob=0.892]
Sample [2]: 100%|██████████| 250/250 [00:12, 19.88it/s, step size=7.17e-01, acc. prob=0.905]
Sample [3]: 100%|██████████| 250/250 [00:13, 17.96it/s, step size=6.11e-01, acc. prob=0.889]
Sample [4]: 100%|██████████| 250/250 [00:14, 17.42it/s, step size=7.32e-01, acc. prob=0.905]
Sample [5]: 100%|██████████| 250/250 [00:13, 18.08it/s, step size=7.03e-01, acc. prob=0.862]
Sample [6]: 100%|██████████| 250/250 [00:15, 15.72it/s, step size=6.64e-01, acc. prob=0.909]
Sample [7]: 100%|██████████| 250/250 [00:12, 19.67it/s, step size=8.07e-01, acc. prob=0.892]

Sampling Level 3: Using NUTS from Pyro





**Available MCMC methods** (use with `mcmc_method=`):
- `"slice_np_vectorized"`: Slice sampling (numpy, vectorized, **default**)
- `"slice_np"`: Slice sampling (numpy, sequential)
- `"nuts_pyro"`: No-U-Turn Sampler (Pyro)
- `"hmc_pyro"`: Hamiltonian Monte Carlo (Pyro)
- `"slice_pymc"`, `"hmc_pymc"`, `"nuts_pymc"`: PyMC samplers

**Available VI methods** (use with `vi_method=`):
- `"rKL"`: Reverse KL divergence (mode-seeking, **default**)
- `"fKL"`: Forward KL divergence (mass-covering)
- `"IW"`: Importance weighted
- `"alpha"`: Alpha divergence

**Usage**: Change `mcmc_method="nuts_pyro"` to any other MCMC method, or use `vi_method="fKL"` when `sample_with="vi"`.
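The mode-seeking versus mass-covering behaviour can be checked numerically. The plain-Python sketch below fits a single Gaussian to a hypothetical bimodal density by brute-force grid search, once per divergence. It illustrates the divergences only; it is not how sbi's VI trains its variational distribution.

```python
import math


def log_normal(x, mean, std):
    """Log density of N(mean, std**2) at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))


def log_target(x):
    """Hypothetical bimodal target: equal mixture of N(-2, 0.5**2) and N(2, 0.5**2)."""
    a = math.log(0.5) + log_normal(x, -2.0, 0.5)
    b = math.log(0.5) + log_normal(x, 2.0, 0.5)
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))  # log-sum-exp


DX = 0.05
GRID = [-6.0 + i * DX for i in range(241)]  # covers [-6, 6]
LOG_P = [log_target(x) for x in GRID]


def kl_pair(mean, std):
    """(forward KL(p || q), reverse KL(q || p)) for q = N(mean, std**2), via a Riemann sum."""
    forward = reverse = 0.0
    for x, log_p in zip(GRID, LOG_P):
        log_q = log_normal(x, mean, std)
        forward += math.exp(log_p) * (log_p - log_q) * DX
        reverse += math.exp(log_q) * (log_q - log_p) * DX
    return forward, reverse


best_f = best_r = (math.inf, None)
for i in range(25):          # means -3.0, -2.75, ..., 3.0
    for k in range(1, 13):   # stds 0.25, 0.5, ..., 3.0
        mean, std = -3.0 + 0.25 * i, 0.25 * k
        forward, reverse = kl_pair(mean, std)
        if forward < best_f[0]:
            best_f = (forward, (mean, std))
        if reverse < best_r[0]:
            best_r = (reverse, (mean, std))

print("forward KL (mass-covering) fit:", best_f[1])
print("reverse KL (mode-seeking) fit:", best_r[1])
```

The forward KL fit should be a broad Gaussian centred between the two modes. The reverse KL fit should be a narrow Gaussian sitting on one of them.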

## Sampling Level 4: Fine-Tune Parameters

**Use case**: Optimize sampling performance, troubleshoot convergence issues

Fine-tune sampling parameters using dictionaries or `PosteriorParameters` dataclasses.

In [18]:
# Sampling Level 4a: Using parameter dictionaries
posterior_tuned = inference_nle.build_posterior(
    sample_with="mcmc",
    mcmc_method="slice_np_vectorized",
    mcmc_parameters={
        "warmup_steps": 100,      # Burn-in samples to discard
        "num_chains": 4,          # Number of parallel chains
        "thin": 2,                # Thinning factor
        "num_workers": 2,         # CPU cores for parallelization
    }
)

print("Sampling Level 4a: Dictionary-based parameter tuning")

Sampling Level 4a: Dictionary-based parameter tuning




In [19]:
# Sampling Level 4b: Using PosteriorParameters (recommended)
from sbi.inference.posteriors import MCMCPosteriorParameters

mcmc_params = MCMCPosteriorParameters(
    method="nuts_pyro",
    warmup_steps=100,
    num_chains=4,
    init_strategy="sir",                               # Sequential Importance Resampling for init
    init_strategy_parameters={"num_candidate_samples": 1000},
    num_workers=2,
    mp_context="spawn"                                 # Multiprocessing context
)

posterior_advanced = inference_nle.build_posterior(
    posterior_parameters=mcmc_params
)

samples_advanced = posterior_advanced.sample((1000,), x=x_o)

print("Sampling Level 4b: PosteriorParameters with validation")

Generating 4 MCMC inits via sir strategy: 100%|██████████| 4/4 [00:04<00:00,  1.09s/it]

Sampling Level 4b: PosteriorParameters with validation
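The `"sir"` init strategy used above works in three steps:

1. Draw many candidate parameters from the proposal.
2. Weight each candidate by its importance weight under the potential.
3. Resample the chains' starting points from those candidates.

Chains therefore begin in high-probability regions.

The plain-Python sketch below makes simplifying assumptions: the uniform prior is the proposal and the potential is hypothetical. `num_candidates` mirrors `num_candidate_samples`. sbi's version batches candidates and may differ in detail.

```python
import math
import random


def potential(theta):
    """Hypothetical unnormalized log posterior: N(0.2, 0.1**2) in each dimension."""
    return sum(-0.5 * ((t - 0.2) / 0.1) ** 2 for t in theta)


def sir_init(num_chains, num_candidates, dim, rng, low=-2.0, high=2.0):
    """Choose chain starting points by importance resampling of prior draws."""
    candidates = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(num_candidates)]
    # The uniform prior has constant log density, so the log importance weight
    # (potential - log proposal density) reduces to the potential plus a constant.
    log_weights = [potential(c) for c in candidates]
    max_log_weight = max(log_weights)
    weights = [math.exp(lw - max_log_weight) for lw in log_weights]  # stable exp
    return rng.choices(candidates, weights=weights, k=num_chains)


rng = random.Random(0)
inits = sir_init(num_chains=4, num_candidates=1000, dim=3, rng=rng)
for init in inits:
    print([round(t, 2) for t in init])  # concentrated near 0.2 in every dimension
```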





**Key tuning parameters**:
- `warmup_steps`: Number of initial samples to discard (default: 200)
- `num_chains`: Number of parallel chains (default: 20)
- `thin`: Thinning factor - keep every nth sample (default: -1, auto)
- `init_strategy`: How to initialize chains (`"proposal"`, `"sir"`, `"resample"`)
- `num_workers`: Number of CPU cores for parallelization
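How `warmup_steps`, `num_chains`, and `thin` interact is simple arithmetic. Each chain discards its warmup, keeps every `thin`-th draw after that, and the chains are then pooled. For example, 1000 samples from 4 chains with `thin=2` and 100 warmup steps need 600 steps per chain.

The sketch below is generic MCMC bookkeeping in plain Python, not sbi's internal code.

```python
import math


def steps_per_chain(num_samples, num_chains, thin, warmup_steps):
    """Steps each chain must run so the pooled, thinned draws reach num_samples."""
    kept_per_chain = math.ceil(num_samples / num_chains)
    return warmup_steps + kept_per_chain * thin


def pool_chains(chains, warmup_steps, thin):
    """Drop warmup from each chain, keep every `thin`-th draw, and pool the rest."""
    pooled = []
    for chain in chains:
        pooled.extend(chain[warmup_steps::thin])
    return pooled


n_steps = steps_per_chain(num_samples=1000, num_chains=4, thin=2, warmup_steps=100)
print(n_steps)  # 600

# Stand-in chains whose "draws" are just step indices, to make the bookkeeping visible.
fake_chains = [list(range(n_steps)) for _ in range(4)]
samples = pool_chains(fake_chains, warmup_steps=100, thin=2)
print(len(samples), samples[:3])  # 1000 [100, 102, 104]
```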

**Advantages of PosteriorParameters**:
- Type checking and validation
- Better IDE autocomplete support
- Clear documentation of available parameters
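The type-checking and validation advantages come from using a dataclass instead of a dict. A plain dict accepts any key, so a misspelled parameter is only caught if the code that consumes it checks. A dataclass constructor rejects unknown keywords at once, and `__post_init__` can validate values.

The `ToyMCMCParameters` class below is a hypothetical stand-in, not sbi's `MCMCPosteriorParameters`. Its defaults mirror the defaults listed above.

```python
from dataclasses import dataclass


@dataclass
class ToyMCMCParameters:
    """Hypothetical stand-in for a PosteriorParameters dataclass (not sbi's class)."""

    method: str = "slice_np_vectorized"
    warmup_steps: int = 200
    num_chains: int = 20

    def __post_init__(self):
        # Value checks run as soon as the object is constructed.
        if self.warmup_steps < 0:
            raise ValueError("warmup_steps must be non-negative")
        if self.num_chains < 1:
            raise ValueError("num_chains must be at least 1")


# A dict happily stores a misspelled key; the intended setting is simply absent.
params_dict = {"warmup_step": 500}  # typo: missing trailing "s"
print("warmup_steps" in params_dict)  # False

# The dataclass rejects the unknown keyword immediately.
try:
    ToyMCMCParameters(warmup_step=500)
except TypeError as err:
    print("TypeError:", err)
```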

**See also**: [how_to_guide/19_posterior_parameters.ipynb](https://sbi.readthedocs.io/en/latest/how_to_guide/19_posterior_parameters.html) for complete details.

# Decision Guides

## Guide 1: Which Density Estimator Level? (NPE and NLE)

| You want to... | Use Level | Example |
|----------------|-----------|----------|
| Standard workflows with good defaults | **1** | `NPE(prior, density_estimator="nsf")` |
| Try different density estimator types | **1** | Switch `"nsf"`, `"maf"`, `"zuko_nsf"` |
| Tune network depth or width | **2** | `posterior_nn(hidden_features=100)` |
| Add embedding networks for images/timeseries | **2** | `posterior_nn(embedding_net=my_cnn)` |
| Access specialized flow parameters | **3** | `build_nsf(num_bins=16, tail_bound=5.0)` |
| Implement custom network architecture | **3** | Subclass `DensityEstimator` |
| Define custom loss functions or training | **4** | See advanced tutorial 18 |

**Rule of thumb**: Start with Level 1. Move to higher levels only when you need specific control.

## Guide 2: Which Sampling Level? (NLE and NRE only)

| Situation | Sampling Level | Example |
|-----------|----------------|----------|
| Starting out, need good defaults | **1** | `build_posterior()` |
| Very few parameters (<3) | **2** | `sample_with="rejection"` |
| Many parameters (>10), speed is critical | **2** | `sample_with="vi"` |
| Want to use NUTS or HMC | **3** | `mcmc_method="nuts_pyro"` |
| MCMC not converging, need more warmup | **4** | `mcmc_parameters={"warmup_steps": 500}` |
| Want type checking and validation | **4** | `MCMCPosteriorParameters(...)` |
| Troubleshooting sampling issues | **4** | Tune `num_chains`, `init_strategy`, etc. |

**Rule of thumb**: 
- Start with default MCMC (Level 1)
- If too slow, try rejection (few params) or VI (many params) at Level 2
- Use Levels 3-4 for optimization or troubleshooting

# Summary

## All Methods (NPE, NLE, NRE)

**4 Abstraction Levels for Density Estimator:**

- **Level 1**: Trainer classes with strings → `NPE(prior, density_estimator="nsf")`
- **Level 2**: Factory functions → `posterior_nn(model="maf", hidden_features=50)`
- **Level 3**: Direct builders → `build_nsf(num_bins=8, tail_bound=3.0)`
- **Level 4**: Custom training → Full control (see tutorial 18)

## NPE Sampling

- Direct sampling from neural network
- No additional configuration needed
- Optionally, MCMC or VI can be used instead for more control

## NLE and NRE Sampling (Additional Dimension)

**4 Sampling Control Levels:**

- **Level 1**: Default → `build_posterior()` (uses `slice_np_vectorized`)
- **Level 2**: Choose method → `sample_with="mcmc"/"vi"/"rejection"/"importance"`
- **Level 3**: Configure algorithm → `mcmc_method="nuts_pyro"`, `vi_method="fKL"`
- **Level 4**: Fine-tune parameters → `mcmc_parameters={...}` or `MCMCPosteriorParameters(...)`

## General Principle

**Start simple, add complexity only when needed:**
1. Begin with Level 1 for density estimator
2. For NLE, begin with default sampling (Level 1)
3. Move to higher levels only when you need specific control or encounter issues
4. Both dimensions are independent - you can use Level 1 density estimator with Level 4 sampling, or vice versa