# How to run simulations in parallel

Many simulators are computationally expensive, and Simulation-Based Inference often requires thousands of simulations. Even if a single simulation takes only a fraction of a second, running them all sequentially can be prohibitively slow.

**Vectorization (or batching)** is one strategy to improve speed: the simulator processes multiple parameters at once in a single call. However, this is not always straightforward to implement (e.g., it may require rewriting legacy code), and it can lead to high memory consumption if the batches are too large.
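To make the idea concrete, here is a minimal sketch that contrasts a per-parameter simulator with a vectorized one. The functions `simulate_one` and `simulate_batch` and the formula they compute are hypothetical and chosen only for illustration; they are not part of `sbi`.

```python
import numpy as np


def simulate_one(theta):
    """Hypothetical simulator: one parameter of shape (1,) in, one scalar out."""
    return float(np.sin(theta[0]) + theta[0] ** 2)


def simulate_batch(thetas):
    """Vectorized variant: parameters of shape (n, 1) in, array of shape (n,) out."""
    return np.sin(thetas[:, 0]) + thetas[:, 0] ** 2


rng = np.random.default_rng(0)
example_thetas = rng.uniform(0, 1, size=(1000, 1))

# One Python-level call per parameter set
looped = np.array([simulate_one(theta) for theta in example_thetas])

# A single call that processes the whole batch at once
batched = simulate_batch(example_thetas)

print(np.allclose(looped, batched))
```

Both versions compute the same values, but the batched one holds all inputs and outputs in memory at the same time, which is where the memory cost of large batches comes from.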

**Parallelization** is an alternative that allows us to run multiple simulations at the same time in separate processes, reducing the total wall-clock time without requiring the simulator to be vectorized.

This guide illustrates how to parallelize simulations using `sbi`'s `simulate_from_theta` utility. We will use a toy simulator that returns a file path, a common scenario where the simulator is an external executable that writes results to disk, making standard in-memory batching impossible or impractical.

In [None]:
import time
from typing import Any

import numpy as np


def simulator(theta: Any) -> str:
    """
    A toy simulator that sleeps and returns a dummy file path.
    """
    time.sleep(0.1)  # Simulate expensive computation

    # Format the parameter into a string suitable for a filename
    # We use 4 decimal places and replace the dot with a hyphen
    # e.g., 0.1234 -> 0-1234
    theta_id = f"{theta[0]:.4f}".replace(".", "-")

    return f"/path/to/simulation/output_{theta_id}.npy"

thetas = np.random.uniform(0, 1, size=(100, 1))  # 100 parameter sets

## Naive for loop
First, let's establish a baseline by running the simulations sequentially in a simple for loop. With 100 parameter sets and a simulator that sleeps for 0.1 seconds, this should take about 10 seconds of wall-clock time.

In [25]:
start = time.perf_counter()
x = [simulator(np.array(theta)) for theta in thetas]
end = time.perf_counter()
print(f"Elapsed time: {end - start} seconds")
print(x[:10])

Elapsed time: 10.022978331000104 seconds
['/path/to/simulation/output_0-5192.npy', '/path/to/simulation/output_0-0983.npy', '/path/to/simulation/output_0-6823.npy', '/path/to/simulation/output_0-9948.npy', '/path/to/simulation/output_0-3902.npy', '/path/to/simulation/output_0-4715.npy', '/path/to/simulation/output_0-0382.npy', '/path/to/simulation/output_0-9635.npy', '/path/to/simulation/output_0-8807.npy', '/path/to/simulation/output_0-2155.npy']


## Parallel execution

`sbi` provides the `simulate_from_theta` utility to easily parallelize simulations. It uses `joblib` under the hood.

We can use the `joblib.parallel_config` context manager to specify the number of workers (`n_jobs`).

In [None]:
import joblib

from sbi.utils.simulation_utils import simulate_from_theta

start = time.perf_counter()
# Run simulations in parallel with 10 jobs
with joblib.parallel_config(n_jobs=10):
    thetas, x = simulate_from_theta(simulator, thetas)
end = time.perf_counter()
print(f"Elapsed time: {end - start} seconds")
print(x[:10])

Elapsed time: 4.3353425840000455 seconds
['/path/to/simulation/output_0-5192.npy', '/path/to/simulation/output_0-0983.npy', '/path/to/simulation/output_0-6823.npy', '/path/to/simulation/output_0-9948.npy', '/path/to/simulation/output_0-3902.npy', '/path/to/simulation/output_0-4715.npy', '/path/to/simulation/output_0-0382.npy', '/path/to/simulation/output_0-9635.npy', '/path/to/simulation/output_0-8807.npy', '/path/to/simulation/output_0-2155.npy']
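As a rough sanity check on these timings, the sketch below computes an idealized lower bound on the wall-clock time, assuming every simulation takes exactly 0.1 seconds and ignoring all overhead. The helper `ideal_wall_time` is hypothetical and introduced only for this illustration.

```python
import math


def ideal_wall_time(n_sims, seconds_per_sim, n_workers):
    """Lower bound on wall-clock time, ignoring any scheduling overhead."""
    # Each worker handles one simulation per "round"
    rounds = math.ceil(n_sims / n_workers)
    return rounds * seconds_per_sim


sequential = ideal_wall_time(100, 0.1, 1)
parallel = ideal_wall_time(100, 0.1, 10)
print(f"sequential: {sequential:.1f} s, 10 workers: {parallel:.1f} s")
```

The sequential bound of 10 seconds matches the baseline measured above. The measured parallel time of about 4.3 seconds is well above the 1-second bound; the gap may come from overhead such as starting worker processes and passing data between them.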


## Creating a vectorized simulator

While `simulate_from_theta` is convenient for generating a static dataset, you might sometimes need a **simulator object** that can handle batches of parameters (e.g., to pass into an inference algorithm that runs simulations on-the-fly).

The `parallelize_simulator` utility wraps your simulator and returns a new function that:
1. Accepts a batch of parameters.
2. Splits them into chunks.
3. Runs the chunks in parallel.
4. Re-assembles the results into a batch.

This effectively "vectorizes" your simulator without rewriting its internal logic.
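The four steps above can be sketched with the standard library alone. This is not `sbi`'s implementation: `make_batched`, `toy_simulator`, and the `n_workers` and `chunk_size` parameters are hypothetical names, and the sketch uses threads rather than processes to stay self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batched(simulator, n_workers=4, chunk_size=8):
    """Hypothetical wrapper that turns a per-parameter simulator into a batched one."""

    def run_chunk(chunk):
        # Inside a worker, the chunk is still processed one parameter at a time
        return [simulator(theta) for theta in chunk]

    def batched(thetas):
        thetas = list(thetas)  # 1. accept a batch of parameters
        chunks = [  # 2. split the batch into chunks
            thetas[i : i + chunk_size] for i in range(0, len(thetas), chunk_size)
        ]
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            # 3. run the chunks in parallel; map() preserves input order
            chunk_results = list(pool.map(run_chunk, chunks))
        # 4. re-assemble the per-chunk results into one flat batch
        return [x for chunk in chunk_results for x in chunk]

    return batched


def toy_simulator(theta):
    """Hypothetical simulator that returns a label derived from its parameter."""
    return f"output_{theta[0]:.4f}"


batched_toy = make_batched(toy_simulator)
example_thetas = [[i / 20] for i in range(20)]
results = batched_toy(example_thetas)
print(results[:3])
```

Because the chunks are collected in submission order, the output lines up with the input batch even when the chunks finish at different times.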

In [None]:
from sbi.utils.simulation_utils import parallelize_simulator

# Create a new simulator function that handles batches automatically
batched_simulator = parallelize_simulator(simulator)

start = time.perf_counter()
# We can now call this new simulator with the full batch of parameters.
# The parallel execution is handled internally, governed by the joblib context.
with joblib.parallel_config(n_jobs=10):
    x = batched_simulator(thetas)

end = time.perf_counter()
print(f"Elapsed time: {end - start} seconds")
print(x[:10])

Elapsed time: 1.0284301520000554 seconds
['/path/to/simulation/output_0-5192.npy', '/path/to/simulation/output_0-0983.npy', '/path/to/simulation/output_0-6823.npy', '/path/to/simulation/output_0-9948.npy', '/path/to/simulation/output_0-3902.npy', '/path/to/simulation/output_0-4715.npy', '/path/to/simulation/output_0-0382.npy', '/path/to/simulation/output_0-9635.npy', '/path/to/simulation/output_0-8807.npy', '/path/to/simulation/output_0-2155.npy']
