# Gaussian Process Regression with BayeSpace Using Simulated Data

This notebook demonstrates **Gaussian Process Regression (GPR)** with BayeSpace on a series of simulated datasets of increasing complexity. The goal is to show how BayeSpace can model complex functions and spatial patterns using GPR, a flexible, non-parametric method that infers function values and their uncertainty without assuming a specific model form.

We will walk through the following examples:

- A simple **line** (1D regression)
- A **curve** (non-linear polynomial)
- A **plane** (2D regression)
- A **non-linear** 2D function

For each example, we generate noisy data, define a kernel structure, and train a Gaussian Process model using BayeSpace. We then visualise the predicted function values and uncertainties over the domain.

This notebook serves as both a tutorial and a showcase of BayeSpace’s GPR functionality across increasingly complex regression problems, from a straight line to a non-linear 2D surface.


### Import Libraries

In [None]:
import os
import jax

os.chdir('/PhD_project/app/')

from regression_toolbox.model import Model, add_model, delete_model

from visualisation_toolbox.domain import Domain
from visualisation_toolbox.visualiser import GPVisualiser

from data_processing.sim_data_processor import SimDataProcessor

from gaussian_process_toolbox.kernel import Kernel
from gaussian_process_toolbox.gaussian_processor import GP
from gaussian_process_toolbox.transformation import Transformation

os.chdir('/PhD_project/')
jax.config.update("jax_enable_x64", True)


## Example 1: Gaussian Process Regression on a Line

In this first example, we use BayeSpace to perform **Gaussian Process Regression (GPR)** on simulated data generated from a simple line:  
$$
f(x) = ax + b
$$

The true values used to generate the data are $ a = 1 $, $ b = 1 $, with Gaussian noise of standard deviation 1 added to simulate measurement uncertainty.
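
The data-generating process described above can be sketched in plain NumPy. This is only an illustration of the statistical setup, not BayeSpace's `SimDataProcessor`; the function name `simulate_line` and the fixed seed are assumptions made for the example.

```python
import numpy as np

def simulate_line(a=1.0, b=1.0, noise_sd=1.0, x_min=0.0, x_max=100.0,
                  n_points=50, seed=0):
    """Return inputs x, noiseless values f and noisy observations y for f(x) = a*x + b."""
    rng = np.random.default_rng(seed)                  # fixed seed for a reproducible draw
    x = np.linspace(x_min, x_max, n_points)            # evenly spaced inputs over the domain
    f = a * x + b                                      # noiseless line
    y = f + rng.normal(0.0, noise_sd, size=n_points)   # additive Gaussian noise
    return x, f, y

x, f, y = simulate_line()
residual_sd = np.std(y - f)   # empirical noise level, of the same order as noise_sd
```

The defaults mirror the values used in this example: 50 points between 0 and 100, with $ a = 1 $, $ b = 1 $ and noise of standard deviation 1.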

Unlike Bayesian Regression, which infers explicit parameter values for $ a $, $ b $, and $ \sigma $, GPR directly models the function $ f(x) $ as a distribution over possible functions, conditioned on the observed data. We use a **Matérn kernel** defined over the 1D input space $ x $. Its smoothness is fixed at $ \nu = 2.5 $, and its length scale starts at 1 and is optimised during training within the bounds $ (0.001, 100) $.
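
For reference, the Matérn kernel with $ \nu = 5/2 $ has the closed form $ k(r) = \left(1 + \sqrt{5}\,r/\ell + 5r^2/(3\ell^2)\right)\exp(-\sqrt{5}\,r/\ell) $, where $ r $ is the distance between two inputs and $ \ell $ is the length scale. The sketch below evaluates it in NumPy. It is a standalone illustration, not BayeSpace's `Kernel` class; the function name `matern52` and the length scale of 10 are assumptions chosen for the example.

```python
import numpy as np

def matern52(x1, x2, length_scale=1.0):
    """Unit-variance Matérn kernel (nu = 5/2) between two sets of 1D inputs."""
    r = np.abs(x1[:, None] - x2[None, :]) / length_scale   # scaled pairwise distances
    s = np.sqrt(5.0) * r
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)

x = np.linspace(0.0, 100.0, 50)
K = matern52(x, x, length_scale=10.0)   # 50 x 50 covariance matrix
```

Each entry of `K` is the prior correlation between function values at two inputs: it equals 1 on the diagonal and decays as the inputs move apart, at a rate set by the length scale.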

We visualise the GP’s predictive mean and uncertainty across the domain, showing how GPR captures both the trend and the confidence of the inferred function, even in a simple linear case.
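
The predictive mean and uncertainty mentioned above follow from the standard GP posterior equations. The sketch below is a minimal NumPy version under stated assumptions: a zero-mean prior applied to centred data, a fixed length scale of 50 and a signal variance of 2500 (both picked by hand, not fitted), and a known noise standard deviation of 1. The helper names `matern52` and `gp_predict` are hypothetical and do not reflect how BayeSpace's `GP` class is implemented.

```python
import numpy as np

def matern52(x1, x2, length_scale):
    """Unit-variance Matérn kernel (nu = 5/2) between two sets of 1D inputs."""
    s = np.sqrt(5.0) * np.abs(x1[:, None] - x2[None, :]) / length_scale
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)

def gp_predict(x_train, y_train, x_test, length_scale, signal_var, noise_sd):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    n = len(x_train)
    K = signal_var * matern52(x_train, x_train, length_scale) + noise_sd**2 * np.eye(n)
    K_s = signal_var * matern52(x_train, x_test, length_scale)
    L = np.linalg.cholesky(K)                                   # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))   # K^{-1} y
    mean = K_s.T @ alpha                                        # posterior mean
    v = np.linalg.solve(L, K_s)
    var = signal_var - np.sum(v**2, axis=0)                     # posterior variance of f
    return mean, np.sqrt(np.maximum(var, 0.0))

rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 100.0, 50)
y_train = x_train + 1.0 + rng.normal(0.0, 1.0, size=50)   # line with a = 1, b = 1
x_test = np.linspace(0.0, 100.0, 100)

y_offset = y_train.mean()   # centre the data so the zero-mean prior is reasonable
mean, std = gp_predict(x_train, y_train - y_offset, x_test,
                       length_scale=50.0, signal_var=2500.0, noise_sd=1.0)
mean = mean + y_offset
lower, upper = mean - 1.96 * std, mean + 1.96 * std   # approximate 95% band for f
```

The Cholesky factorisation avoids forming an explicit matrix inverse, which is the usual choice for numerical stability.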

This example serves as a baseline for understanding BayeSpace’s GPR capability on well-behaved, 1D data.


In [2]:
# Add this line if model doesn't exist yet
# add_model('line', 'a*x + b', ['x'], 'y', ['a', 'b'])

# Define the true model for simulation: a line with a = 1, b = 1
sim_model = Model('line').add_fixed_model_param('a', 1).add_fixed_model_param('b', 1)

# Define the input domain for the simulation: 50 points from 0 to 100
sim_domain = Domain(1, 'linear').add_domain_param('min', 0).add_domain_param('max', 100).add_domain_param('n_points', 50)
sim_domain.build_domain()

# Generate noisy data using the model and domain, with Gaussian noise
sim_data_processor = SimDataProcessor('linear_example', sim_model, sim_domain, noise_dist='gaussian', noise_level=1)

kernel_config = {('matern', 'x'): [0]} 

# Instantiate the kernel with hyperparameters
kernel_obj = Kernel(kernel_config)
kernel_obj.add_kernel_param('matern', 'x', 'length_scale', 1)           # Initial guess
kernel_obj.add_kernel_param('matern', 'x', 'nu', 2.5)                              # Smoothness
kernel_obj.add_kernel_param('matern', 'x', 'length_scale_bounds', (0.001, 100))     # Bounds for optimisation

# Identity transformation: the data are modelled on their original scale
transformation = Transformation('identity')

# Initialise the GP model using the simulated data processor and a constant observational error of 1
gp = GP(sim_data_processor, kernel_obj, transformation=transformation, uncertainty_method='constant', uncertainty_params={'constant_error':1})

# Train the GP — fit hyperparameters and compute posterior
gp_model = gp.train()

# Create a visualiser for the trained GP
visualiser = GPVisualiser(gp)

# Visualise predicted line with posterior uncertainty
vis_domain = Domain(1, 'linear').add_domain_param('min', 0).add_domain_param('max', 100).add_domain_param('n_points', 100)
vis_domain.build_domain()
visualiser.show_predictions(vis_domain, 'predictions', '1D')

Data loaded from /PhD_project/data/processed_sim_data/linear_example
Plot saved at: /PhD_project/data/processed_sim_data/linear_example
Loading existing GP model from /PhD_project/results/gaussian_process_results/linear_example/instance_1/gaussian_process_model.pkl


## Example 2: Gaussian Process Regression on a Polynomial Curve

In this example, we use BayeSpace’s Gaussian Process Regression (GPR) capabilities to model a second-degree polynomial relationship:

$$
f(x) = ax^2 + bx + c
$$

Instead of explicitly parameterising the polynomial, we treat the function as an unknown process and place a **Matérn kernel** over the input space to infer its structure non-parametrically. The simulated dataset is the same as in the Bayesian regression case: true values $ a = 1.8 $, $ b = 2.8 $, and $ c = 1.4 $, with Gaussian noise of standard deviation $ \sigma = 1 $.

The Matérn kernel has a fixed smoothness parameter $ \nu = 2.5 $, which corresponds to twice-differentiable functions: flexible, yet relatively smooth. The length scale is optimised during training within the bounds $ (0.001, 100) $, while a constant observational error of 1 is assumed.
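
A common way to fit a GP length scale is to maximise the log marginal likelihood of the data. Whether BayeSpace's `train()` does exactly this is an assumption; the sketch below only illustrates the idea with a grid search over the same bounds. The helper names, the fixed signal variance (set to the sample variance of the centred data) and the grid resolution are all choices made for this example.

```python
import numpy as np

def matern52(x1, x2, length_scale):
    """Unit-variance Matérn kernel (nu = 5/2) between two sets of 1D inputs."""
    s = np.sqrt(5.0) * np.abs(x1[:, None] - x2[None, :]) / length_scale
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)

def log_marginal_likelihood(x, y, length_scale, signal_var, noise_sd):
    """Log marginal likelihood of a zero-mean GP with Gaussian observation noise."""
    n = len(x)
    K = signal_var * matern52(x, x, length_scale) + noise_sd**2 * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K^{-1} y
    return (-0.5 * y @ alpha                               # data-fit term
            - np.sum(np.log(np.diag(L)))                   # -0.5 * log|K|
            - 0.5 * n * np.log(2.0 * np.pi))               # normalising constant

rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 100)
y = 1.8 * x**2 + 2.8 * x + 1.4 + rng.normal(0.0, 1.0, size=100)
y_centred = y - y.mean()

candidates = np.logspace(-3, 2, 60)   # log-spaced grid spanning the bounds (0.001, 100)
scores = np.array([
    log_marginal_likelihood(x, y_centred, ls, signal_var=y_centred.var(), noise_sd=1.0)
    for ls in candidates
])
best_length_scale = candidates[np.argmax(scores)]
```

A gradient-based optimiser would normally replace the grid search, but the grid makes the shape of the objective easy to inspect.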

This example highlights the flexibility of GPR to model complex functional forms without requiring an explicit equation, making it especially useful when the underlying structure is unknown or difficult to express analytically.


In [3]:
# Add this line if the model doesn't exist yet
# add_model('polynomial', 'a*x**2 + b*x + c', ['x'], 'y', ['a', 'b', 'c'])

# Step 1: Define the true model and generate synthetic data
sim_model = Model('polynomial').add_fixed_model_param('a', 1.8).add_fixed_model_param('b', 2.8).add_fixed_model_param('c', 1.4)

sim_domain = Domain(1, 'linear').add_domain_param('min', -3).add_domain_param('max', 3).add_domain_param('n_points', 100)
sim_domain.build_domain()

sim_data_processor = SimDataProcessor('polynomial_example', sim_model, sim_domain, noise_dist='gaussian', noise_level=1)

# Step 2: Define a Matern kernel for GPR
kernel_config = {('matern', 'x'): [0]}
kernel_obj = Kernel(kernel_config)
kernel_obj.add_kernel_param('matern', 'x', 'length_scale', 1)
kernel_obj.add_kernel_param('matern', 'x', 'nu', 2.5)
kernel_obj.add_kernel_param('matern', 'x', 'length_scale_bounds', (0.001, 100))

# Step 3: Apply identity transformation
transformation = Transformation('identity')

# Step 4: Fit the GP model
gp = GP(sim_data_processor, kernel_obj, transformation=transformation, uncertainty_method='constant', uncertainty_params={'constant_error': 1})
gp_model = gp.train()

# Step 5: Visualise predictions
visualiser = GPVisualiser(gp)

vis_domain = Domain(1, 'linear').add_domain_param('min', -3).add_domain_param('max', 3).add_domain_param('n_points', 100)
vis_domain.build_domain()

visualiser.show_predictions(vis_domain, 'predictions', '1D')


Data loaded from /PhD_project/data/processed_sim_data/polynomial_example
Plot saved at: /PhD_project/data/processed_sim_data/polynomial_example
Loading existing GP model from /PhD_project/results/gaussian_process_results/polynomial_example/instance_1/gaussian_process_model.pkl


## Example 3: Gaussian Process Regression on a Plane

In this example, we use BayeSpace to apply Gaussian Process Regression (GPR) to data simulated from a plane:

$$
f(x, y) = ax + by
$$

The data are generated on a 2D grid from $ -3 $ to $ 3 $ in both $ x $ and $ y $, using true parameter values $ a = 1 $ and $ b = 2 $, with added Gaussian noise of standard deviation $ \sigma = 1 $.

Instead of explicitly parameterising $ a $ and $ b $ as in Bayesian regression, we model the surface as a Gaussian Process with a single **Matérn kernel** over both $ x $ and $ y $, with an initial length scale of 1 in each dimension. The kernel encodes spatial correlation and smoothness, allowing flexible, non-parametric modelling of the plane.
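
With two inputs, a Matérn kernel can use one length scale per dimension by rescaling each coordinate before computing distances. The sketch below shows that construction on the same 20 x 20 grid. It assumes that a two-element `length_scale` is interpreted in this per-dimension way, which is not confirmed for BayeSpace; `matern52_ard` is a hypothetical name used only here.

```python
import numpy as np

def matern52_ard(X1, X2, length_scales):
    """Unit-variance Matérn kernel (nu = 5/2) with one length scale per input dimension."""
    ls = np.asarray(length_scales, dtype=float)
    diff = (X1[:, None, :] - X2[None, :, :]) / ls   # scaled coordinate differences
    r = np.sqrt(np.sum(diff**2, axis=-1))            # scaled Euclidean distance
    s = np.sqrt(5.0) * r
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)

# 20 x 20 grid over [-3, 3] x [-3, 3], flattened to a (400, 2) input matrix
gx, gy = np.meshgrid(np.linspace(-3.0, 3.0, 20), np.linspace(-3.0, 3.0, 20), indexing="ij")
X = np.column_stack([gx.ravel(), gy.ravel()])

K = matern52_ard(X, X, [1.0, 1.0])   # same length scale in x and y

# A very large length scale in y makes the kernel almost ignore differences in y
k_flat_in_y = matern52_ard(X[:2], X[:2], [1.0, 1e6])[0, 1]
```

Letting the two length scales differ allows the fitted surface to vary faster in one direction than in the other.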

After fitting the GP, we visualise the predictive surface with confidence intervals to assess the model’s performance. This example demonstrates BayeSpace’s ability to generalise beyond parametric forms and effectively model multivariate input domains with spatial correlation.


In [4]:
# Add this line if model doesn't exist yet
# add_model('plane', 'a*x + b*y', ['x', 'y'], 'C', ['a', 'b'])

# Step 1: Define the true model and generate synthetic data
sim_model = Model('plane').add_fixed_model_param('a', 1).add_fixed_model_param('b', 2)

sim_domain = Domain(2, 'rectangular').add_domain_param('min_x', -3)\
                                     .add_domain_param('max_x', 3)\
                                     .add_domain_param('n_points_x', 20)\
                                     .add_domain_param('min_y', -3)\
                                     .add_domain_param('max_y', 3)\
                                     .add_domain_param('n_points_y', 20)
sim_domain.build_domain()

sim_data_processor = SimDataProcessor('plane_example', sim_model, sim_domain, noise_dist='gaussian', noise_level=1)

# Step 2: Define a Matern kernel in x and y
kernel_config = {('matern', 'xy'): [0, 1]}
kernel_obj = Kernel(kernel_config)
kernel_obj.add_kernel_param('matern', 'xy', 'length_scale', [1,1])
kernel_obj.add_kernel_param('matern', 'xy', 'nu', 2.5)
kernel_obj.add_kernel_param('matern', 'xy', 'length_scale_bounds', (0.001, 100))

# Step 3: Identity transformation for direct modelling
transformation = Transformation('identity')

# Step 4: Train the GP model
gp = GP(sim_data_processor, kernel_obj, transformation=transformation, uncertainty_method='constant', uncertainty_params={'constant_error': 1})
gp_model = gp.train()

# Step 5: Visualise the predictions
visualiser = GPVisualiser(gp)

vis_domain = Domain(2, 'rectangular').add_domain_param('min_x', -3)\
                                     .add_domain_param('max_x', 3)\
                                     .add_domain_param('n_points_x', 100)\
                                     .add_domain_param('min_y', -3)\
                                     .add_domain_param('max_y', 3)\
                                     .add_domain_param('n_points_y', 100)
vis_domain.build_domain()

visualiser.show_predictions(vis_domain, 'predictions', '2D')


Data loaded from /PhD_project/data/processed_sim_data/plane_example
Plot saved at: /PhD_project/data/processed_sim_data/plane_example
Loading existing GP model from /PhD_project/results/gaussian_process_results/plane_example/instance_1/gaussian_process_model.pkl


## Example 4: Gaussian Process Regression on a Non-Linear 2D Function

In this final example, we use BayeSpace to perform Gaussian Process Regression (GPR) on a complex, non-linear function:

$$
f(x, y) = \frac{\sin(x)}{y + a} + \frac{1}{b + x^2}
$$

The function is strongly non-linear, and the division by $ y + a $ makes it singular at $ y = -a $. The true parameters used to simulate the data are $ a = 2 $ and $ b = 3 $, with Gaussian noise of standard deviation 1. We generate data on a grid over $ x, y \in [0, 10] $ using 40 points in each direction, so $ y + a $ stays between 2 and 12 and the singularity lies outside the simulated domain.
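
To get a feel for the scale of this function before fitting, it can be evaluated directly on the simulation grid. The sketch below is plain NumPy and independent of BayeSpace; `nonlinear_2d` is a hypothetical helper name.

```python
import numpy as np

def nonlinear_2d(x, y, a=2.0, b=3.0):
    """Noiseless f(x, y) = sin(x) / (y + a) + 1 / (b + x^2)."""
    return np.sin(x) / (y + a) + 1.0 / (b + x**2)

# 40 x 40 grid over [0, 10] x [0, 10], matching the simulation domain
gx, gy = np.meshgrid(np.linspace(0.0, 10.0, 40), np.linspace(0.0, 10.0, 40), indexing="ij")
f = nonlinear_2d(gx, gy)

min_denominator = (gy + 2.0).min()   # smallest value of y + a on the grid
max_abs_f = np.abs(f).max()          # largest noiseless magnitude on the grid
```

On this domain $ |f(x, y)| \le 1/2 + 1/3 $, which is below the noise standard deviation of 1. The noisy observations are therefore likely to look much rougher than the underlying surface, and wide predictive uncertainty relative to the signal should not be surprising.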

To model this surface, we use a single **Matérn kernel** over both input dimensions, with an initial length scale of 1 for each. With $ \nu = 2.5 $ the kernel favours smooth surfaces while still allowing local changes in curvature. GPR suits this kind of problem, where the function is non-linear and potentially sensitive to small changes in input.

After training, we visualise the GP’s prediction surface along with uncertainty, highlighting BayeSpace’s capability to model noisy, sensitive systems using non-parametric methods.


In [5]:
# Add this line if model doesn't exist yet
# add_model('nonlinear_2D', 'sin(x)/(y+a) + 1/(b+x**2)', ['x', 'y'], 'C', ['a', 'b'])

# Step 1: Define the true model
sim_model = Model('nonlinear_2D').add_fixed_model_param('a', 2).add_fixed_model_param('b', 3)

# Step 2: Define 2D rectangular domain
sim_domain = Domain(2, 'rectangular')\
    .add_domain_param('min_x', 0)\
    .add_domain_param('max_x', 10)\
    .add_domain_param('min_y', 0)\
    .add_domain_param('max_y', 10)\
    .add_domain_param('n_points_x', 40)\
    .add_domain_param('n_points_y', 40)
sim_domain.build_domain()

# Step 3: Generate noisy data
sim_data_processor = SimDataProcessor('nonlinear_example', sim_model, sim_domain, noise_dist='gaussian', noise_level=1)

# Step 4: Define Matern kernel in x and y
kernel_config = {('matern', 'xy'): [0, 1]}
kernel_obj = Kernel(kernel_config)
kernel_obj.add_kernel_param('matern', 'xy', 'length_scale', [1,1])
kernel_obj.add_kernel_param('matern', 'xy', 'nu', 2.5)
kernel_obj.add_kernel_param('matern', 'xy', 'length_scale_bounds', (0.001, 100))

# Step 5: Use identity transformation
transformation = Transformation('identity')

# Step 6: Train GP model
gp = GP(sim_data_processor, kernel_obj, transformation=transformation, uncertainty_method='constant', uncertainty_params={'constant_error': 1})
gp_model = gp.train()

# Step 7: Create high-res prediction domain for plotting
vis_domain = Domain(2, 'rectangular')\
    .add_domain_param('min_x', 0)\
    .add_domain_param('max_x', 10)\
    .add_domain_param('min_y', 0)\
    .add_domain_param('max_y', 10)\
    .add_domain_param('n_points_x', 100)\
    .add_domain_param('n_points_y', 100)
vis_domain.build_domain()

# Step 8: Visualise predicted surface with uncertainty
visualiser = GPVisualiser(gp)
visualiser.show_predictions(vis_domain, 'predictions', '2D')


Data generated and saved to /PhD_project/data/processed_sim_data/nonlinear_example
Plot saved at: /PhD_project/data/processed_sim_data/nonlinear_example
Fitted new GP model and saving to /PhD_project/results/gaussian_process_results/nonlinear_example/instance_1/gaussian_process_model.pkl
