# Bayesian Regression with BayeSpace Using Simulated Data

This notebook demonstrates the use of **Bayesian regression** with BayeSpace on a series of simulated datasets, increasing in complexity. The goal is to showcase how BayeSpace can infer model parameters under uncertainty, across a range of functional forms.

We will walk through the following examples:

- A simple **line** (1D linear regression)
- A **curve** (non-linear polynomial regression)
- A **plane** (2D linear regression)
- A **non-linear** 2D function

For each example, we generate noisy data, define a generative model with priors, and perform Bayesian inference using BayeSpace to recover the posterior distributions of the parameters.

This notebook serves as both a tutorial and a stress test of BayeSpace across different types of regression problems — from well-behaved to more challenging, real-world-inspired models.


### Import Libraries

In [3]:
from regression_toolbox.model import Model, add_model, delete_model
from regression_toolbox.likelihood import Likelihood
from regression_toolbox.parameter import Parameter
from regression_toolbox.sampler import Sampler

from visualisation_toolbox.domain import Domain
from visualisation_toolbox.visualiser import RegressionVisualiser

from data_processing.sim_data_processor import SimDataProcessor
from data_processing.raw_data_processor import RawDataProcessor

import numpy as np
import pandas as pd
import os
import jax

os.chdir('/PhD_project/')
jax.config.update("jax_enable_x64", True)


## Example 1: Bayesian Regression on a Line

In this first example, we use BayeSpace to perform Bayesian linear regression on simulated data generated from a simple line:  
$$
f(x) = ax + b
$$  
The true values used to generate the data are $ a = 1 $, $ b = 1 $, with Gaussian noise of standard deviation 1 added to simulate measurement uncertainty.

We place Gaussian priors with mean 1 and standard deviation 1 on the parameters $a$ and $b$, and a uniform prior on $[0.0001, 5]$ on the noise term $\sigma$. BayeSpace then samples from the joint posterior distribution of these three parameters.
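The sketch below makes the inference target concrete. It is a minimal NumPy version of the unnormalised log posterior implied by this setup: a Gaussian likelihood for the line, $\mathcal{N}(1, 1)$ priors on $a$ and $b$ (the values used in the cell below), and a uniform prior on $[0.0001, 5]$ for $\sigma$. The function names are hypothetical. The sketch also assumes that the `'linear'` domain produces evenly spaced points, as `np.linspace` does. It shows the quantity a sampler explores, not BayeSpace's internal code.

```python
import numpy as np

def log_prior(a, b, sigma):
    """Log prior: a, b ~ N(1, 1) and sigma ~ U(0.0001, 5) (hypothetical helper)."""
    if not (0.0001 <= sigma <= 5.0):
        return -np.inf  # outside the support of the uniform prior
    log_norm_a = -0.5 * (a - 1.0) ** 2 - 0.5 * np.log(2 * np.pi)
    log_norm_b = -0.5 * (b - 1.0) ** 2 - 0.5 * np.log(2 * np.pi)
    log_unif = -np.log(5.0 - 0.0001)
    return log_norm_a + log_norm_b + log_unif

def log_likelihood(a, b, sigma, x, y):
    """Gaussian log-likelihood of y under the line a*x + b with noise std sigma."""
    resid = y - (a * x + b)
    n = x.size
    return (-0.5 * np.sum(resid ** 2) / sigma ** 2
            - n * np.log(sigma) - 0.5 * n * np.log(2 * np.pi))

def log_posterior(a, b, sigma, x, y):
    """Unnormalised log posterior = log prior + log likelihood."""
    lp = log_prior(a, b, sigma)
    if not np.isfinite(lp):
        return -np.inf
    return lp + log_likelihood(a, b, sigma, x, y)

# Simulated data mirroring the cell below: 50 points on [0, 100], a = b = 1, noise std 1
rng = np.random.default_rng(0)
x = np.linspace(0, 100, 50)
y = 1.0 * x + 1.0 + rng.normal(0.0, 1.0, size=x.size)

print(log_posterior(1.0, 1.0, 1.0, x, y))  # finite value near the truth
print(log_posterior(1.0, 1.0, 6.0, x, y))  # -inf: sigma lies outside U(0.0001, 5)
```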

We visualise the results through traceplots, autocorrelations, prior and posterior distributions, and model predictions over the domain.
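Traceplots and autocorrelation plots are convergence diagnostics. The rough sketch below shows what they summarise. It computes the normalised autocorrelation of a single chain and the classic (non-split) Gelman-Rubin $\hat{R}$ across several chains, working directly from arrays of samples. This is not BayeSpace's implementation, and both function names are hypothetical. Slowly decaying autocorrelation means fewer effectively independent draws. An $\hat{R}$ well above 1 suggests the chains have not converged to the same distribution.

```python
import numpy as np

def autocorrelation(chain, max_lag=50):
    """Normalised sample autocorrelation of a 1D chain for lags 0..max_lag."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = x.size
    denom = np.dot(x, x)
    return np.array([np.dot(x[: n - k], x[k:]) / denom for k in range(max_lag + 1)])

def gelman_rubin(chains):
    """Classic Gelman-Rubin R-hat for an array of shape (n_chains, n_samples)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    W = np.mean(np.var(chains, axis=1, ddof=1))      # mean within-chain variance
    B = n * np.var(np.mean(chains, axis=1), ddof=1)  # between-chain variance
    var_hat = (n - 1) / n * W + B / n                # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(1)

# AR(1) chain with strong memory: its lag-1 autocorrelation should be close to phi
phi, n = 0.9, 50_000
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = phi * ar[t - 1] + rng.normal()
acf = autocorrelation(ar, max_lag=5)
print("lag-1 autocorrelation:", acf[1])

# Three well-mixed chains versus three chains where one sits at a different mean
iid_chains = rng.normal(size=(3, 10_000))
shifted_chains = iid_chains.copy()
shifted_chains[2] += 3.0
print("R-hat (mixed):", gelman_rubin(iid_chains))
print("R-hat (one chain offset):", gelman_rubin(shifted_chains))
```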

This example serves as a baseline to demonstrate BayeSpace’s performance on a well-behaved, linear, and identifiable model.
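Because the model is linear in $a$ and $b$, the sampler can be checked against a closed-form result when $\sigma$ is held fixed. With Gaussian priors and a Gaussian likelihood, the posterior of $(a, b)$ is itself Gaussian; this is the standard conjugate update. The sketch below computes that posterior with NumPy. It assumes a known $\sigma = 1$ and 50 evenly spaced inputs on $[0, 100]$. BayeSpace also samples $\sigma$, so its marginals for $a$ and $b$ can be expected to be close to these values, but not identical.

```python
import numpy as np

# Simulated data mirroring the cell below (assumes evenly spaced inputs)
rng = np.random.default_rng(0)
x = np.linspace(0, 100, 50)
y = 1.0 * x + 1.0 + rng.normal(0.0, 1.0, size=x.size)

sigma = 1.0                                # noise std held fixed (simplifying assumption)
X = np.column_stack([x, np.ones_like(x)])  # design matrix for beta = (a, b)
m0 = np.array([1.0, 1.0])                  # prior means of a and b
S0_inv = np.eye(2) / 1.0 ** 2              # prior precision: N(1, 1) on each

# Conjugate update: posterior precision, covariance and mean of beta
SN_inv = S0_inv + X.T @ X / sigma ** 2
SN = np.linalg.inv(SN_inv)
mN = SN @ (S0_inv @ m0 + X.T @ y / sigma ** 2)

post_sd = np.sqrt(np.diag(SN))
print("posterior mean (a, b):", mN)
print("posterior sd   (a, b):", post_sd)
```

The posterior standard deviation of $a$ comes out far smaller than that of $b$. Inputs reaching 100 pin down the slope much more tightly than the intercept.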


In [4]:
# Add this line if model doesn't exist yet
# add_model('line', 'a*x + b', ['x'], 'y', ['a', 'b'])

# Define the true model for simulation: a line with a = 1, b = 1
sim_model = Model('line').add_fixed_model_param('a', 1).add_fixed_model_param('b', 1)

# Define the input domain for the simulation: 50 points from 0 to 100
sim_domain = Domain(1, 'linear').add_domain_param('min', 0).add_domain_param('max', 100).add_domain_param('n_points', 50)
sim_domain.build_domain()

# Generate noisy data using the model and domain, with Gaussian noise
sim_data_processor = SimDataProcessor('linear_example', sim_model, sim_domain, noise_dist='gaussian', noise_level=1)

# Set up the inference model structure (same form: linear)
model = Model('line')
likelihood = Likelihood('gaussian')

# Define inference parameters with priors
a = Parameter(name='a', prior_select='gaussian').add_prior_param('mu', 1).add_prior_param('sigma', 1)
b = Parameter(name='b', prior_select='gaussian').add_prior_param('mu', 1).add_prior_param('sigma', 1)
sigma = Parameter(name='sigma', prior_select='uniform').add_prior_param('low', 0.0001).add_prior_param('high', 5)

# Package parameters and set up the sampler
inference_params = pd.Series({'a': a, 'b': b, 'sigma': sigma})
sampler = Sampler(inference_params, model, likelihood, sim_data_processor, n_samples=10000, n_chains=3)

# Run Bayesian inference
sampler.sample_all()

# Visualise traceplots and autocorrelations for diagnostics
visualiser = RegressionVisualiser(sampler)
visualiser.get_traceplots()
visualiser.get_autocorrelations()

# Plot prior and posterior distributions for each parameter
visualiser.plot_prior('a', [-2, 2])
visualiser.plot_prior('b', [-2, 2])
visualiser.plot_prior('sigma', [0, 5])

visualiser.plot_posterior('a', [-2, 2])
visualiser.plot_posterior('b', [-2, 2])
visualiser.plot_posterior('sigma', [0, 5])

# Visualise predicted line with posterior uncertainty
vis_domain = Domain(1, 'linear').add_domain_param('min', 0).add_domain_param('max', 100).add_domain_param('n_points', 100)
vis_domain.build_domain()
visualiser.show_predictions(vis_domain, 'predictions', '1D')


Data loaded from /PhD_project/data/processed_sim_data/linear_example
Plot saved at: /PhD_project/data/processed_sim_data/linear_example
Samples loaded from /PhD_project/results/regression_results/linear_example/instance_1


## Example 2: Bayesian Regression on a Polynomial Curve

In this example, we apply BayeSpace to perform Bayesian regression on simulated data generated from a second-degree polynomial of the form:

$$
f(x) = ax^2 + bx + c
$$

The true values used to generate the data are $ a = 1.8 $, $ b = 2.8 $, and $ c = 1.4 $, with Gaussian noise of standard deviation $ \sigma = 1 $. This setup introduces moderate non-linearity while remaining well-behaved enough for effective inference.
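"Well-behaved" has a concrete meaning here. Although $f$ is non-linear in $x$, it is linear in the parameters $(a, b, c)$. Ordinary least squares on the design matrix $[x^2, x, 1]$ therefore already recovers them. The minimal NumPy sketch below assumes 100 evenly spaced inputs on $[-3, 3]$ and the stated true values. It is a useful sanity check against the posterior means:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-3, 3, 100)
a_true, b_true, c_true = 1.8, 2.8, 1.4
y = a_true * x**2 + b_true * x + c_true + rng.normal(0.0, 1.0, size=x.size)

# f(x) is non-linear in x but linear in (a, b, c): y is approximately X @ [a, b, c]
X = np.column_stack([x**2, x, np.ones_like(x)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual-based estimate of the noise std (dividing by n - p degrees of freedom)
resid = y - X @ coef
sigma_hat = np.sqrt(resid @ resid / (x.size - X.shape[1]))
print("least-squares (a, b, c):", coef)
print("estimated noise std:", sigma_hat)
```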

We assign a **joint log-normal prior** to the parameters $ a $ and $ b $. It peaks at their true values $(1.8, 2.8)$ with a diagonal scale of 0.1 and restricts both parameters to positive values. Parameter $ c $ receives a Gaussian prior with mean 1.4 and standard deviation 0.1, and the noise term $ \sigma $ is given a uniform prior between 0.0001 and 3.
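We do not know exactly how BayeSpace's `log_norm` prior maps its `peak` and `scale` arguments. The sketch below assumes that `scale` is the covariance $\Sigma$ of $\log\theta$ and that `peak` is the mode of $\theta$. For $\log\theta \sim \mathcal{N}(\mu, \Sigma)$, the mode of the multivariate log-normal is $\exp(\mu - \Sigma\mathbf{1})$, so a prior peaked at $p$ needs $\mu = \log p + \Sigma\mathbf{1}$. Both helper names are hypothetical.

```python
import numpy as np

def lognormal_mu_from_peak(peak, cov):
    """Location mu of log(theta) such that the log-normal's mode is at `peak`.

    For log(theta) ~ N(mu, cov) the mode of theta is exp(mu - cov @ 1),
    hence mu = log(peak) + cov @ 1.
    """
    peak = np.asarray(peak, dtype=float)
    cov = np.asarray(cov, dtype=float)
    return np.log(peak) + cov @ np.ones(peak.size)

def lognormal_logpdf(theta, mu, cov):
    """Log density of a multivariate log-normal at theta (-inf if any entry <= 0)."""
    theta = np.asarray(theta, dtype=float)
    if np.any(theta <= 0):
        return -np.inf
    u = np.log(theta) - mu
    k = theta.size
    _, logdet = np.linalg.slogdet(cov)
    quad = u @ np.linalg.solve(cov, u)
    # Gaussian log density in log-space plus the Jacobian term -sum(log theta)
    return -0.5 * (quad + logdet + k * np.log(2 * np.pi)) - np.sum(np.log(theta))

peak = np.array([1.8, 2.8])
cov = np.array([[0.1, 0.0], [0.0, 0.1]])  # assumption: 'scale' = covariance of log(theta)
mu = lognormal_mu_from_peak(peak, cov)

# Draws from the prior are positive by construction
rng = np.random.default_rng(3)
samples = np.exp(rng.multivariate_normal(mu, cov, size=5000))
print("all samples positive:", np.all(samples > 0))
```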

After sampling the posterior with three chains of 10,000 samples each, we visualise the results via traceplots, autocorrelations, prior and posterior distributions, and a predictive regression curve over the domain.

This example extends the previous linear regression to a more flexible model that is non-linear in $x$. It demonstrates BayeSpace’s capacity to infer multiple parameters simultaneously, including those governed by multivariate priors.


In [5]:
# Add this line if model doesn't exist yet
# add_model('polynomial', 'a*x**2 + b*x + c', ['x'], 'y', ['a', 'b', 'c'])

# Simulate data from a polynomial model: f(x) = ax^2 + bx + c
sim_model = Model('polynomial').add_fixed_model_param('a', 1.8).add_fixed_model_param('b', 2.8).add_fixed_model_param('c', 1.4)

# Define the input domain for the simulation: 100 points from -3 to 3
sim_domain = Domain(1, 'linear').add_domain_param('min', -3).add_domain_param('max', 3).add_domain_param('n_points', 100)
sim_domain.build_domain()

# Generate noisy data using the model and domain, with Gaussian noise
sim_data_processor = SimDataProcessor('polynomial_example', sim_model, sim_domain, noise_dist='gaussian', noise_level=1)

# Set up the inference model (same structure as used in simulation)
model = Model('polynomial')
likelihood = Likelihood('gaussian')

# Define inference parameters and priors
a_b = Parameter(name=['a','b'], prior_select='log_norm').add_prior_param('peak', [1.8, 2.8]).add_prior_param('scale', [[0.1, 0], [0, 0.1]])
c = Parameter(name='c', prior_select='gaussian').add_prior_param('mu', 1.4).add_prior_param('sigma', 0.1)
epsilon = Parameter(name='sigma', prior_select='uniform').add_prior_param('low', 0.0001).add_prior_param('high', 3)

# Bundle parameters and initialise sampler
inference_params = pd.Series({'a_and_b': a_b, 'c': c, 'sigma': epsilon})
sampler = Sampler(inference_params, model, likelihood, sim_data_processor, n_samples=10000, n_chains=3)

# Run Bayesian inference
sampler.sample_all()

# Visualise diagnostics and priors/posteriors
visualiser = RegressionVisualiser(sampler)
visualiser.get_traceplots()
visualiser.get_autocorrelations()

visualiser.plot_prior('a_and_b', [[0.001, 5], [0.001, 5]])
visualiser.plot_prior('c', [-3, 3])
visualiser.plot_prior('sigma', [0, 5])

visualiser.plot_posterior('a_and_b', [[0.001, 5], [0.001, 5]])
visualiser.plot_posterior('c', [-3, 3])
visualiser.plot_posterior('sigma', [0, 3])

# Visualise the regression curve and uncertainty over the simulation range [-3, 3]
vis_domain = Domain(1, 'linear').add_domain_param('min', -3).add_domain_param('max', 3).add_domain_param('n_points', 100)
vis_domain.build_domain()
visualiser.show_predictions(vis_domain, 'predictions', '1D')


Data loaded from /PhD_project/data/processed_sim_data/polynomial_example
Plot saved at: /PhD_project/data/processed_sim_data/polynomial_example
Samples loaded from /PhD_project/results/regression_results/polynomial_example/instance_2


## Example 3: Bayesian Regression on a Plane

In this example, we extend Bayesian regression to two dimensions by modelling a plane of the form:

$$
f(x, y) = ax + by
$$

The data are simulated over a 2D rectangular grid with 20 points in each dimension, spanning $-3$ to $3$ in both $x$ and $y$. The true parameter values used to generate the data are $ a = 1 $ and $ b = 2 $, and Gaussian noise with a standard deviation of $ \sigma = 1 $ is added to simulate observational uncertainty.
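A 2D rectangular domain is ultimately a list of $(x, y)$ input pairs. The sketch below assumes evenly spaced points along each axis; we do not know the exact point ordering that `Domain(2, 'rectangular')` uses. It builds the $20 \times 20$ grid with `np.meshgrid` and flattens it to 400 points. It then simulates the plane and recovers $(a, b)$ by least squares. Since $f(x, y) = ax + by$ has no intercept, the design matrix is just the two coordinate columns.

```python
import numpy as np

# 20 x 20 rectangular grid on [-3, 3]^2 (assumes evenly spaced points per axis)
xs = np.linspace(-3, 3, 20)
ys = np.linspace(-3, 3, 20)
X_grid, Y_grid = np.meshgrid(xs, ys, indexing="ij")
points = np.column_stack([X_grid.ravel(), Y_grid.ravel()])  # shape (400, 2)

# Simulate the plane a*x + b*y with Gaussian noise of std 1
rng = np.random.default_rng(4)
a_true, b_true = 1.0, 2.0
z = a_true * points[:, 0] + b_true * points[:, 1] + rng.normal(0.0, 1.0, size=len(points))

# No intercept term, so the coordinates themselves form the design matrix
coef, *_ = np.linalg.lstsq(points, z, rcond=None)
print(points.shape, coef)
```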

We assign Gaussian priors to both $ a $ and $ b $, centred on their true values with a standard deviation of 0.1, and a uniform prior on $[0.0001, 1]$ to the noise term $ \sigma $. The model is then inferred using three chains of 10,000 samples each.

We evaluate convergence through traceplots and autocorrelations, examine the prior and posterior distributions, and visualise the inferred regression surface in two dimensions.

This example demonstrates BayeSpace’s ability to handle multivariate input domains and perform parameter inference for spatially extended models.


In [6]:
# Add this line if model doesn't exist yet
# add_model('plane', 'a*x + b*y', ['x', 'y'], 'C', ['a', 'b'])

# Set the true parameters for simulation
sim_model = Model('plane').add_fixed_model_param('a', 1).add_fixed_model_param('b', 2)

# Define the 2D input domain from -3 to 3 in both x and y, with 20 points each
sim_domain = Domain(2, 'rectangular').add_domain_param('min_x', -3)\
                                     .add_domain_param('max_x', 3)\
                                     .add_domain_param('n_points_x', 20)\
                                     .add_domain_param('min_y', -3)\
                                     .add_domain_param('max_y', 3)\
                                     .add_domain_param('n_points_y', 20)
sim_domain.build_domain()

# Generate synthetic data with Gaussian noise
sim_data_processor = SimDataProcessor('multi_dim_example', sim_model, sim_domain, noise_dist='gaussian', noise_level=1)

# Set up the same model structure for inference
model = Model('plane')
likelihood = Likelihood('gaussian')

# Define inference parameters and their priors
a = Parameter(name='a', prior_select='gaussian').add_prior_param('mu', 1).add_prior_param('sigma', 0.1)
b = Parameter(name='b', prior_select='gaussian').add_prior_param('mu', 2).add_prior_param('sigma', 0.1)
sigma = Parameter(name='sigma', prior_select='uniform').add_prior_param('low', 0.0001).add_prior_param('high', 1)

# Pack parameters and initialise the sampler
inference_params = pd.Series({'a': a, 'b': b, 'sigma': sigma})
sampler = Sampler(inference_params, model, likelihood, sim_data_processor, n_samples=10000, n_chains=3)

# Run Bayesian sampling
sampler.sample_all()

# Visual diagnostics and posterior analysis
visualiser = RegressionVisualiser(sampler)
visualiser.get_traceplots()
visualiser.get_autocorrelations()

visualiser.plot_prior('a', [-3, 3])
visualiser.plot_prior('b', [-3, 3])
visualiser.plot_prior('sigma', [0.001, 5])

visualiser.plot_posterior('a', [-3, 3])
visualiser.plot_posterior('b', [-3, 3])
visualiser.plot_posterior('sigma', [0.001, 5])

# Build a high-resolution domain for prediction plotting
vis_domain = Domain(2, 'rectangular').add_domain_param('min_x', -3)\
                                     .add_domain_param('max_x', 3)\
                                     .add_domain_param('n_points_x', 100)\
                                     .add_domain_param('min_y', -3)\
                                     .add_domain_param('max_y', 3)\
                                     .add_domain_param('n_points_y', 100)
vis_domain.build_domain()

# Plot the inferred plane
visualiser.show_predictions(vis_domain, 'predictions', '2D')


Data loaded from /PhD_project/data/processed_sim_data/multi_dim_example
Plot saved at: /PhD_project/data/processed_sim_data/multi_dim_example
Samples loaded from /PhD_project/results/regression_results/multi_dim_example/instance_1


## Example 4: Bayesian Regression on a Non-Linear 2D Function

In this final example, we apply BayeSpace to a highly non-linear two-dimensional function of the form:

$$
f(x, y) = \frac{\sin(x)}{y + a} + \frac{1}{b + x^2}
$$

The parameters used to generate the data are $a = 2$ and $b = 3$, with Gaussian noise added at a standard deviation of 1. The input domain spans $x, y \in [0, 10]$ with a resolution of 40 points in each direction.
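It is worth comparing the size of the signal with the noise. The sketch below assumes an evenly spaced $40 \times 40$ grid on $[0, 10]^2$ and evaluates $f$ at the true parameters $a = 2$, $b = 3$ with NumPy:

```python
import numpy as np

def f(x, y, a=2.0, b=3.0):
    """The non-linear test function sin(x) / (y + a) + 1 / (b + x**2)."""
    return np.sin(x) / (y + a) + 1.0 / (b + x ** 2)

# 40 x 40 grid on [0, 10]^2 (assumes evenly spaced points per axis)
xs = np.linspace(0, 10, 40)
ys = np.linspace(0, 10, 40)
X, Y = np.meshgrid(xs, ys, indexing="ij")
signal = f(X, Y)

noise_std = 1.0
print("signal range:", signal.min(), signal.max())
print("signal std / noise std:", signal.std() / noise_std)
```

On this grid, $|f|$ stays below 1 (roughly 0.7 at most). The noise standard deviation of 1 therefore exceeds the largest value of the signal anywhere in the domain. Each individual observation is dominated by noise, which is one reason this example is hard.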

We place Gaussian priors on both $a$ and $b$, each with mean 1 and standard deviation 1, and use a uniform prior for the noise term $\sigma$ over the interval $[0.0001, 10]$. Sampling is performed using three chains of 10,000 samples each.

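This model introduces substantial non-linearity and potential numerical instability. The first term is singular wherever $y + a = 0$. That can occur inside the domain $y \in [0, 10]$ whenever $a \in [-10, 0]$, a region the $\mathcal{N}(1, 1)$ prior does not exclude. The second term is likewise singular when $b + x^2 = 0$, i.e. for $b \in [-100, 0]$. The example therefore tests the robustness of the inference procedure. We use traceplots and autocorrelations to check convergence and plot the prior and posterior distributions for each parameter. We finish with a 2D visualisation of the predicted surface.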
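The size of the problematic region can be quantified from the priors alone. Under $a \sim \mathcal{N}(1, 1)$, the first term has a pole inside the domain whenever $a \in [-10, 0]$. Under $b \sim \mathcal{N}(1, 1)$, the second term has one whenever $b \in [-100, 0]$. A short standard-library sketch computes these prior probabilities; the helper name `normal_cdf` is hypothetical.

```python
import math

def normal_cdf(z, mu=0.0, sd=1.0):
    """CDF of N(mu, sd^2) at z, via the error function."""
    return 0.5 * (1.0 + math.erf((z - mu) / (sd * math.sqrt(2.0))))

# y + a = 0 for some y in [0, 10]  <=>  a in [-10, 0]
p_a_singular = normal_cdf(0.0, 1.0, 1.0) - normal_cdf(-10.0, 1.0, 1.0)

# b + x**2 = 0 for some x in [0, 10]  <=>  b in [-100, 0]
p_b_singular = normal_cdf(0.0, 1.0, 1.0) - normal_cdf(-100.0, 1.0, 1.0)

print(f"P(a in [-10, 0])  = {p_a_singular:.4f}")
print(f"P(b in [-100, 0]) = {p_b_singular:.4f}")
```

Both probabilities come out at about 0.159. Roughly one in six prior draws of either parameter therefore places a pole inside the domain.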

This example demonstrates the limits of BayeSpace when applied to highly sensitive and non-linear models.


In [None]:
# Add this line if model doesn't exist yet
# add_model('nonlinear_2D', 'sin(x)/(y+a) + 1/(b+x**2)', ['x', 'y'], 'C', ['a', 'b'])

# Set the true parameters for simulation
sim_model = Model('nonlinear_2D').add_fixed_model_param('a', 2).add_fixed_model_param('b', 3)

# Define a 2D rectangular input domain: x and y in [0, 10], with 40x40 grid points
sim_domain = Domain(2, 'rectangular')\
    .add_domain_param('min_x', 0)\
    .add_domain_param('max_x', 10)\
    .add_domain_param('min_y', 0)\
    .add_domain_param('max_y', 10)\
    .add_domain_param('n_points_x', 40)\
    .add_domain_param('n_points_y', 40)
sim_domain.build_domain()

# Generate synthetic data using the model and domain, with Gaussian noise (std = 1)
sim_data_processor = SimDataProcessor('non_linear_example', sim_model, sim_domain, noise_dist='gaussian', noise_level=1)

# Set up the model structure for inference
model = Model('nonlinear_2D')

# Define a Gaussian likelihood (alternatively, could use percentage error variant)
likelihood = Likelihood('gaussian')

# Define inference parameters with priors
# Gaussian priors for a and b (mean = 1, std = 1)
a = Parameter(name='a', prior_select='gaussian').add_prior_param('mu', 1).add_prior_param('sigma', 1)
b = Parameter(name='b', prior_select='gaussian').add_prior_param('mu', 1).add_prior_param('sigma', 1)
# Uniform prior for noise term sigma
sigma = Parameter(name='sigma', prior_select='uniform').add_prior_param('low', 0.0001).add_prior_param('high', 10)

# Bundle parameters into a Series
inference_params = pd.Series({'a': a, 'b': b, 'sigma': sigma})

# Create sampler with 3 chains of 10,000 samples each
sampler = Sampler(inference_params, model, likelihood, sim_data_processor, n_samples=10000, n_chains=3)

# Run Bayesian inference
sampler.sample_all()

# Create visualiser and check traceplots and autocorrelations for convergence
visualiser = RegressionVisualiser(sampler)
visualiser.get_traceplots()
visualiser.get_autocorrelations()

# Plot prior and posterior distributions for each parameter
visualiser.plot_prior('a', [-2, 3])
visualiser.plot_posterior('a', [-2, 3])

visualiser.plot_prior('b', [-2, 3])
visualiser.plot_posterior('b', [-2, 3])

visualiser.plot_prior('sigma', [0.0001, 3])
visualiser.plot_posterior('sigma', [0.0001, 3])

# Create a high-resolution prediction domain (100x100 grid) for 2D surface visualisation
vis_domain = Domain(2, 'rectangular')\
    .add_domain_param('min_x', 0)\
    .add_domain_param('max_x', 10)\
    .add_domain_param('min_y', 0)\
    .add_domain_param('max_y', 10)\
    .add_domain_param('n_points_x', 100)\
    .add_domain_param('n_points_y', 100)
vis_domain.build_domain()

# Plot the predicted surface using posterior samples
visualiser.show_predictions(vis_domain, 'predictions', '2D')


Data loaded from /PhD_project/data/processed_sim_data/non_linear_example
Plot saved at: /PhD_project/data/processed_sim_data/non_linear_example
Samples loaded from /PhD_project/results/regression_results/non_linear_example/instance_5
