## Neural Posterior Estimation (NPE)

#### NPE addresses inverse problems: we want to infer the underlying parameters $\mathbf{\theta}$ of a model that generated some observed data $\mathbf{x}_o$, but the likelihood function $p(\mathbf{x} | \mathbf{\theta})$ is computationally intractable, i.e. it has no explicit form we can evaluate. What we do have is a simulator that implements the forward process $\mathbf{\theta} \rightarrow \mathbf{x}$, so we can generate data for any given parameters.

### The NPE algorithm, typically implemented using a class of neural networks called normalizing flows, follows these steps: 

#### 1. Define Priors: Specify a prior distribution $p(\mathbf{\theta})$ over the model parameters.
#### 2. Generate a Training Set: For each of many simulations, sample a parameter vector $\mathbf{\theta}_i$ from the prior $p(\mathbf{\theta})$.

#### 3. Run the simulator with $\mathbf{\theta}_i$ to generate a corresponding data vector $\mathbf{x}_i$. Repeating steps 2 and 3 creates a large training dataset of pairs: $\{ (\mathbf{\theta}_1, \mathbf{x}_1), (\mathbf{\theta}_2, \mathbf{x}_2), \dots \}$.

#### 4. Train a Neural Network: Train a conditional density estimator (e.g., a normalizing flow) $q_{\phi}(\mathbf{\theta} | \mathbf{x})$. The network takes the data $\mathbf{x}$ as input and is trained to produce a probability distribution over $\mathbf{\theta}$ that approximates the true posterior $p(\mathbf{\theta} | \mathbf{x})$.

#### 5. Inference (Amortization): After training, you can perform inference for any new observation $\mathbf{x}_o$ by passing it through the network, without retraining or running further simulations.
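
The training objective in step 4 can be written out explicitly. The expression below is the NPE loss as it is commonly stated, using the notation above plus $N$ for the number of simulated pairs (a symbol introduced here for illustration):

$$
\mathcal{L}(\phi) = \mathbb{E}_{p(\mathbf{\theta})\, p(\mathbf{x} | \mathbf{\theta})}\left[ -\log q_{\phi}(\mathbf{\theta} | \mathbf{x}) \right] \approx -\frac{1}{N} \sum_{i=1}^{N} \log q_{\phi}(\mathbf{\theta}_i | \mathbf{x}_i)
$$

Minimizing $\mathcal{L}(\phi)$ is equivalent to minimizing the KL divergence from the true posterior $p(\mathbf{\theta} | \mathbf{x})$ to $q_{\phi}(\mathbf{\theta} | \mathbf{x})$, averaged over $p(\mathbf{x})$, because the two quantities differ only by a term that does not depend on $\phi$. The estimate on the right needs only samples $(\mathbf{\theta}_i, \mathbf{x}_i)$ from the prior and simulator, never an evaluation of the likelihood.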

In [2]:
# A basic example illustrating the NPE algorithm
import torch
import torch.nn as nn  # needed for nn.Module, nn.Linear, etc. in the estimator below
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
from torch.distributions import Normal, Uniform

torch.manual_seed(42)
np.random.seed(42)

In [3]:

# --- 1. Define the Simulator and Priors ---
def simulator(theta):
    """
    Simple simulator: generates data from a Normal distribution.
    Args:
        theta: True mean (scalar tensor)
    Returns:
        x: Generated data (tensor of shape (n_observations,))
    """
    # We'll fix the number of observations per theta to 100
    n_observations = 100
    # We use a fixed standard deviation of 1.0
    dist = Normal(theta, 1.0)
    return dist.sample((n_observations,))

# Prior over theta: Uniform(-3, 3). It is used below to sample the training parameters.
prior = Uniform(-3.0, 3.0)

In [4]:
# --- 2. Generate Training Data ---
def generate_training_data(n_simulations, simulator, prior):
    """
    Generates pairs of parameters and simulated data.
    """
    theta_train = prior.sample((n_simulations,))
    x_train = torch.stack([simulator(t) for t in theta_train])
    
    return theta_train, x_train

n_simulations = 10000
theta_train, x_train = generate_training_data(n_simulations, simulator, prior)

print(f"Theta shape: {theta_train.shape}")   # (n_simulations,)
print(f"Data shape: {x_train.shape}")        # (n_simulations, n_observations=100)

# --- 3. Define the Neural Network (Conditional Density Estimator) ---
# We will use a simple Mixture Density Network (MDN) with a single Gaussian component.
# It takes data x as input and outputs the parameters of a Gaussian distribution (mean and log_std) for theta.

class PosteriorEstimator(nn.Module):
    def __init__(self, input_dim, hidden_dim=50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        # The final layer outputs two parameters: mean and log(std) of the Gaussian posterior for theta
        self.mean_layer = nn.Linear(hidden_dim, 1)
        self.log_std_layer = nn.Linear(hidden_dim, 1)
        
    def forward(self, x):
        hidden = self.net(x)
        mean = self.mean_layer(hidden)
        log_std = self.log_std_layer(hidden)
        # For stability, clamp the log_std to not get too small or large
        log_std = torch.clamp(log_std, min=-5, max=2)
        return mean, log_std

# The input is a summary statistic of the data: here we just use the empirical mean.
# For a Gaussian with known variance the sample mean is a sufficient statistic for theta,
# so no information is lost. Summary statistics become crucial for high-dimensional data.
input_dim = 1 # We input the mean of the 100 observations
model = PosteriorEstimator(input_dim=input_dim)
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# --- 4. Train the Model ---
# Precompute the summary statistic (mean) for all training data
summary_stat_train = x_train.mean(dim=1).unsqueeze(1) # Shape (n_simulations, 1)

n_epochs = 20001
loss_history = []

for epoch in range(n_epochs):
    optimizer.zero_grad()
    
    # Forward pass: get the Gaussian parameters for each data sample
    mean_pred, log_std_pred = model(summary_stat_train)
    std_pred = torch.exp(log_std_pred)
    
    # Define the predicted posterior distribution q(theta | x)
    pred_posterior = Normal(mean_pred, std_pred)
    
    # Calculate the loss: negative log probability of the true theta under the predicted posterior
    # We want to maximize the probability, so we minimize the negative log prob.
    loss = -pred_posterior.log_prob(theta_train.unsqueeze(1)).mean()
    
    loss.backward()
    optimizer.step()
    loss_history.append(loss.item())
    
    if epoch % 5000 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item():.4f}")

# Plot the training loss
plt.plot(loss_history)
plt.xlabel('Epoch')
plt.ylabel('Loss (Negative Log Probability of θ)')
plt.title('Training Loss')
plt.show()

# --- 5. Perform Amortized Inference on a New Observation ---
# Let's create a new observation x_o whose true mean we want to infer.
true_theta = 1.5
x_observed = simulator(torch.tensor(true_theta))

# Calculate the same summary statistic (mean) for the observed data
summary_stat_observed = x_observed.mean().unsqueeze(0).unsqueeze(1) # Shape (1, 1)

# Pass it through the trained network to get the posterior parameters
model.eval()
with torch.no_grad():
    post_mean, post_log_std = model(summary_stat_observed)
    post_std = torch.exp(post_log_std)

print(f"\nTrue theta: {true_theta}")
print(f"Estimated Posterior: Normal(mean={post_mean.item():.3f}, std={post_std.item():.3f})")

# --- 6. Analyze the Results ---
# For comparison, let's compute the *true* analytical posterior.
# We assume a Gaussian prior: p(theta) ~ Normal(0, sqrt(10)) which approximates our Uniform(-3,3) well.
# The true analytical posterior for a Gaussian prior N(μ_prior, σ_prior) and likelihood N(θ, σ_likelihood) is:
# Posterior ~ N(μ_post, σ_post), where:
# 1/σ_post^2 = 1/σ_prior^2 + n/σ_likelihood^2
# μ_post = (μ_prior/σ_prior^2 + n*x̄/σ_likelihood^2) * σ_post^2

n_obs = len(x_observed)
x̄ = x_observed.mean()
σ_likelihood = 1.0
μ_prior = 0.0
σ_prior = np.sqrt(10) # Approx for Uniform(-3,3)

σ_post = 1 / np.sqrt( (1/σ_prior**2) + (n_obs/σ_likelihood**2) )
μ_post = (μ_prior/σ_prior**2 + n_obs*x̄/σ_likelihood**2) * σ_post**2

print(f"Analytical Posterior: Normal(mean={μ_post:.3f}, std={σ_post:.3f})")

# --- 7. Visualize the Learned vs. True Posterior ---
theta_grid = torch.linspace(true_theta - 1.5, true_theta + 1.5, 1000)

# Get the learned posterior PDF
learned_dist = Normal(post_mean, post_std)
learned_pdf = torch.exp(learned_dist.log_prob(theta_grid.unsqueeze(1))).squeeze()

# Get the true analytical posterior PDF
true_post_dist = Normal(torch.tensor(μ_post), torch.tensor(σ_post))
true_pdf = torch.exp(true_post_dist.log_prob(theta_grid))

plt.figure(figsize=(10, 6))
plt.plot(theta_grid, true_pdf, label='True Analytical Posterior', lw=3, alpha=0.8)
plt.plot(theta_grid, learned_pdf, '--', label='NPE Estimated Posterior', lw=3)
plt.axvline(true_theta, color='k', linestyle=':', label=f'True θ ({true_theta})')
plt.xlabel('θ')
plt.ylabel('Probability Density')
plt.title('Neural Posterior Estimation vs. Analytical Solution')
plt.legend()
plt.show()

### Key Steps Explained:

1. Simulator & Prior: Our simulator generates 100 data points from a Normal(θ, 1). Our prior for θ is Uniform(-3, 3).

2. Generate Data: We run the simulator 10,000 times, each time with a different θ sampled from the prior. This gives us our training set (θ_i, x_i).

3. Network Architecture: We use a simple Mixture Density Network with a single Gaussian component. It takes a summary statistic of the data x (in this case, just the empirical mean) as input. Its output defines a Gaussian distribution q(θ | x) = Normal(mean_net, std_net).

4. Training: The network is trained to maximize the log-probability of the true parameters θ_i under the distribution it predicts, q(θ | x_i). This is the core of NPE.

5. Inference: After training, we generate a new observation x_o from a θ of 1.5. We compute the same summary statistic for x_o and feed it into the trained network, which returns the parameters of the posterior distribution q(θ | x_o) in a single forward pass.

6. Validation: We compare the network's posterior to an analytical reference. The uniform prior is approximated by a Gaussian, which makes the model conjugate (Gaussian likelihood + Gaussian prior) and gives a closed-form posterior. With 100 observations the prior has very little influence, so the network's output should be very close to this reference.
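
To make the validation step concrete, the sketch below evaluates the conjugate Gaussian update for several prior widths. It uses only the standard library; the function name `gaussian_posterior` and the sample mean of 1.5 are assumptions chosen for illustration, not values taken from the run above.

```python
import math

def gaussian_posterior(x_bar, n_obs, sigma_likelihood, mu_prior, sigma_prior):
    """Conjugate update for a Gaussian mean with known likelihood std."""
    # Precisions (inverse variances) of prior and likelihood add up.
    precision = 1.0 / sigma_prior**2 + n_obs / sigma_likelihood**2
    sigma_post = 1.0 / math.sqrt(precision)
    # Posterior mean is the precision-weighted average of prior mean and sample mean.
    mu_post = (mu_prior / sigma_prior**2 + n_obs * x_bar / sigma_likelihood**2) / precision
    return mu_post, sigma_post

# Hypothetical sample mean, chosen only for illustration.
x_bar = 1.5
prior_widths = [("sqrt(3)", math.sqrt(3)), ("sqrt(10)", math.sqrt(10)), ("very wide", 1e6)]

results = {}
for label, sigma_prior in prior_widths:
    results[label] = gaussian_posterior(x_bar, 100, 1.0, 0.0, sigma_prior)
    mu, sigma = results[label]
    print(f"sigma_prior={label:>9}: mean={mu:.4f}, std={sigma:.4f}")
```

Under these assumptions the three posteriors agree to within about 0.005 in the mean and 0.001 in the standard deviation, which is why a Gaussian stand-in for the uniform prior is an acceptable reference in this example.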

### Expected Output:

The estimated posterior (dashed line) should closely overlap the analytical posterior (solid line), indicating that NPE has learned to perform Bayesian inference for this model. The printed values should look similar to the following; exact numbers depend on the random seed and the training run:

    True theta: 1.5
    Estimated Posterior: Normal(mean=1.482, std=0.100)
    Analytical Posterior: Normal(mean=1.482, std=0.100)

This is a simplified example, but it captures the essence of NPE. For more complex models, you would use more powerful networks (like normalizing flows) and more sophisticated summary statistics.
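
As one small step toward a more flexible estimator, the single Gaussian can be replaced by a mixture of Gaussians, which can represent skewed or multimodal posteriors. The sketch below is an assumption-laden illustration, not part of the example above: the class name `MixturePosteriorEstimator`, the choice of three components, and the random placeholder batch are all hypothetical. It relies on `MixtureSameFamily` from `torch.distributions`.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical, MixtureSameFamily, Normal

class MixturePosteriorEstimator(nn.Module):
    """Hypothetical K-component Gaussian mixture head for a scalar theta."""
    def __init__(self, input_dim, hidden_dim=50, n_components=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # One weight logit, mean and log(std) per mixture component.
        self.logits_layer = nn.Linear(hidden_dim, n_components)
        self.mean_layer = nn.Linear(hidden_dim, n_components)
        self.log_std_layer = nn.Linear(hidden_dim, n_components)

    def forward(self, x):
        hidden = self.net(x)
        # Clamp log(std) for numerical stability, as in the single-Gaussian model.
        log_std = torch.clamp(self.log_std_layer(hidden), min=-5, max=2)
        mixture_weights = Categorical(logits=self.logits_layer(hidden))
        components = Normal(self.mean_layer(hidden), torch.exp(log_std))
        # Returns q(theta | x) as a distribution object with batch shape (batch,).
        return MixtureSameFamily(mixture_weights, components)

torch.manual_seed(0)
mixture_model = MixturePosteriorEstimator(input_dim=1)

# Placeholder batch: 8 summary statistics and 8 matching theta values.
summary_stats = torch.randn(8, 1)
theta_batch = torch.randn(8)

posterior = mixture_model(summary_stats)
# Same NPE loss as before: negative log-probability of theta under q(theta | x).
loss = -posterior.log_prob(theta_batch).mean()
loss.backward()
```

The training loop stays the same as in the single-Gaussian case; only the object whose `log_prob` is evaluated changes. Normalizing flows follow the same pattern with a more expressive conditional density.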

