# Gaussian Process Regression

This notebook gives a simple demonstration of Gaussian process regression (GPR) using
*scikit-learn*, a popular ML & data science toolbox. GPR is a supervised learning technique that
relies on a Bayesian ansatz. In a nutshell, GPR aims to find a (somewhat optimal) Gaussian process
representation of an unknown function, conditioned on the (noisy) input-output data we have for
that function. GPR is
non-parametric and quite flexible, as it can be based on a variety of kernel functions and 
hyperparameters. However, standard Gaussian processes are dense in the sense that every prediction
depends on all training points. Hence, they do not scale well to large training sets, although
extensions like sparse Gaussian processes can mitigate this issue.
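
To make the "dense" statement concrete, it may help to recall the standard posterior formulas for a
zero-mean Gaussian process prior with kernel $k$ and Gaussian noise of variance $\sigma^2$. The
symbols $X$, $y$, $K$, $x_*$, $m$ and $v$ are notation introduced here for illustration only:
$$
    m(x_*) = k(x_*, X)\,\bigl[K + \sigma^2 I\bigr]^{-1} y, \qquad
    v(x_*) = k(x_*, x_*) - k(x_*, X)\,\bigl[K + \sigma^2 I\bigr]^{-1} k(X, x_*),
$$
where $X=(x_1,\ldots,x_n)$ are the training inputs, $y$ the training outputs and
$K_{ij}=k(x_i,x_j)$. Both the posterior mean $m$ and the posterior variance $v$ involve all $n$
training points through $k(x_*, X)$, and a direct solve with the $n\times n$ matrix
$K+\sigma^2 I$ costs $\mathcal{O}(n^3)$ operations.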

In [None]:
import numpy as np
import matplotlib.pyplot as plt

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

%matplotlib widget
plt.close('all')
plt.style.use('bmh')

In this notebook, we want to showcase the regression problem for a 1D function given noisy data.
As a first step, we need to generate a ground truth and perturbed data points.

**Exercise:** Evaluate the function $f(x)=(3x-1.4) \sin (18x)$ on a 1D grid. Choose 10 random 
$x$-values and create data points for these locations by perturbing the exact solution with i.i.d.
Gaussian increments,
$$
    y_i = f(x_i) + \eta_i,\quad \eta_i\sim\mathcal{N}(0,\sigma^2),\ \sigma=0.1,\quad
    \mathrm{for}\ i=1,2,\ldots,10.
$$

In [None]:
noise_std = 0.1
#x_grid = #???
#x_data = #???
#y_data = #???
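
One possible way to fill in the placeholders is sketched below. It is only a sketch under stated assumptions: the interval $[0, 1]$, the grid size of 200 points and the random seed are arbitrary choices that the exercise does not prescribe. The inputs are stored as column arrays of shape `(n_samples, 1)`, since scikit-learn estimators expect 2D input arrays.

```python
import numpy as np


def f(x):
    """Ground-truth function from the exercise."""
    return (3 * x - 1.4) * np.sin(18 * x)


# Seeded generator for reproducibility (the seed value is an arbitrary choice).
rng = np.random.default_rng(seed=0)

noise_std = 0.1
n_data = 10

# Dense grid for evaluation and plotting; the interval [0, 1] is an assumption.
x_grid = np.linspace(0.0, 1.0, 200).reshape(-1, 1)

# Random training locations, stored as a column of shape (n_data, 1).
x_data = rng.uniform(0.0, 1.0, size=(n_data, 1))

# Noisy observations y_i = f(x_i) + eta_i with eta_i ~ N(0, noise_std**2).
y_data = f(x_data).ravel() + rng.normal(0.0, noise_std, size=n_data)
```

Drawing the locations uniformly at random is just one option; any other way of picking 10 distinct $x$-values would serve the exercise equally well.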

Next, we use the training data to construct a Gaussian process regressor with `sklearn`. We choose
a radial basis function kernel with fixed length scale $l$,
$$
    k(x_1, x_2) = \exp\Bigl( -\frac{(x_1-x_2)^2}{2 l^2} \Bigr).
$$
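
As a quick sanity check, the squared-exponential form $\exp\bigl(-(x_1-x_2)^2/(2l^2)\bigr)$ can be evaluated by hand and compared with the kernel matrix returned by `RBF`. The helper `rbf_kernel_manual` and the five test points are hypothetical names and values chosen for illustration.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF


def rbf_kernel_manual(x1, x2, length_scale):
    """Squared-exponential kernel matrix for 1D inputs x1 (n,) and x2 (m,)."""
    diff = x1[:, None] - x2[None, :]  # pairwise differences, shape (n, m)
    return np.exp(-diff**2 / (2.0 * length_scale**2))


x = np.linspace(0.0, 1.0, 5)  # illustrative test points
length_scale = 0.1

K_manual = rbf_kernel_manual(x, x, length_scale)
# Kernel objects are callable and expect inputs of shape (n_samples, n_features).
K_sklearn = RBF(length_scale=length_scale)(x.reshape(-1, 1))

print(np.allclose(K_manual, K_sklearn))
```

Entries of the matrix decay from 1 on the diagonal towards 0 as the distance between two points grows beyond a few multiples of $l$.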

In [None]:
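kernel_length_scale = 0.1
# RBF kernel with a fixed length scale (no hyperparameter optimization during fit)
kernel = RBF(length_scale=kernel_length_scale, length_scale_bounds='fixed')
# alpha is added to the diagonal of the kernel matrix; here it is the noise variance
gaussian_process = GaussianProcessRegressor(kernel=kernel, alpha=noise_std**2)
# scikit-learn expects inputs of shape (n_samples, n_features)
gaussian_process.fit(np.reshape(x_data, (-1, 1)), y_data)
mean_prediction, std_prediction = gaussian_process.predict(
    np.reshape(x_grid, (-1, 1)), return_std=True)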

**Exercise:** Evaluate and plot the posterior mean and 95% confidence interval of the GPR compared
to the exact solution for different values of the kernel length scale $l$. What do you observe and
why? What could be a good strategy to "learn" $l$ from the training data?
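
One common strategy, sketched here as a hint and not as the only valid answer, is to choose $l$ by maximizing the log marginal likelihood of the training data. `GaussianProcessRegressor` does this during `fit` whenever the kernel hyperparameters are not fixed. The data setup (interval $[0, 1]$, seed), the candidate length scales and the search bounds `(1e-2, 1e1)` are assumptions made for this sketch.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF


def f(x):
    """Ground-truth function from the first exercise."""
    return (3 * x - 1.4) * np.sin(18 * x)


# Illustrative data setup (interval and seed are arbitrary choices).
rng = np.random.default_rng(seed=0)
noise_std = 0.1
x_data = rng.uniform(0.0, 1.0, size=(10, 1))
y_data = f(x_data).ravel() + rng.normal(0.0, noise_std, size=10)
x_grid = np.linspace(0.0, 1.0, 200).reshape(-1, 1)

# Fixed length scales: record the log marginal likelihood for each candidate.
lml_fixed = {}
for length_scale in (0.01, 0.1, 1.0):
    kernel = RBF(length_scale=length_scale, length_scale_bounds='fixed')
    gpr = GaussianProcessRegressor(kernel=kernel, alpha=noise_std**2)
    gpr.fit(x_data, y_data)
    mean, std = gpr.predict(x_grid, return_std=True)
    # 95% band under the Gaussian posterior: mean +/- 1.96 standard deviations.
    lower, upper = mean - 1.96 * std, mean + 1.96 * std
    lml_fixed[length_scale] = gpr.log_marginal_likelihood_value_
    print(f"l = {length_scale}: log marginal likelihood = {lml_fixed[length_scale]:.2f}")

# Learned length scale: finite bounds let fit() optimize l, with a few restarts.
kernel = RBF(length_scale=0.1, length_scale_bounds=(1e-2, 1e1))
gpr_opt = GaussianProcessRegressor(
    kernel=kernel, alpha=noise_std**2, n_restarts_optimizer=5, random_state=0
)
gpr_opt.fit(x_data, y_data)
print("learned length scale:", gpr_opt.kernel_.length_scale)
```

The arrays `mean`, `lower` and `upper` can be passed to `plt.plot` and `plt.fill_between` to draw the posterior mean and the 95% band against the exact solution `f(x_grid)`.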