# Gaussian processes

A **Gaussian process** (GP) is a continuously indexed collection of random variables, such that every finite subset thereof follows a **multivariate normal (MVN) distribution**. Let $\boldsymbol{x} \in \mathbb{R}^d$ represent a $d$-dimensional continuous index, e.g. a spatial coordinate. A GP $\{f(\boldsymbol{x}) \mid \boldsymbol{x} \in \mathbb{R}^d\}$ is then defined by a **mean function** $m(\boldsymbol{x})$ and a **covariance function** $k(\boldsymbol{x}, \boldsymbol{x}^\prime)$. One usually writes
$$
f(\boldsymbol{x}) \sim
\mathcal{GP} \left( m(\boldsymbol{x}), k(\boldsymbol{x}, \boldsymbol{x}^\prime) \right).
$$
For every finite collection of indices $\{\boldsymbol{x}_i\}_{i=1}^n$ one has that $\boldsymbol{f} = (f(\boldsymbol{x}_1), \ldots, f(\boldsymbol{x}_n))^\top \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ is jointly normal. Here, the elements of the mean vector and the covariance matrix are respectively given as $\mu_i = m(\boldsymbol{x}_i)$ and $\Sigma_{ij} = k(\boldsymbol{x}_i, \boldsymbol{x}_j)$ for $i,j=1,\ldots,n$.
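
To make the finite-dimensional definition concrete, the following minimal sketch assembles $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ for a handful of index points and draws one joint sample. It uses NumPy. The helper names `mean_fn` and `cov_fn`, the zero mean and the particular covariance function are illustrative assumptions, not part of this notebook's code.

```python
import numpy as np

def mean_fn(X):
    """Hypothetical mean function m(x): identically zero."""
    return np.zeros(X.shape[0])

def cov_fn(X1, X2):
    """Hypothetical covariance function k(x, x') = exp(-||x - x'||^2 / 2)."""
    diff = X1[:, None, :] - X2[None, :, :]
    return np.exp(-0.5 * (diff**2).sum(axis=-1))

# n = 5 index points in d = 2 dimensions
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(5, 2))

mu = mean_fn(X)       # mu_i = m(x_i), shape (n,)
Sigma = cov_fn(X, X)  # Sigma_ij = k(x_i, x_j), shape (n, n)

# One joint draw of (f(x_1), ..., f(x_n)) from N(mu, Sigma)
f = rng.multivariate_normal(mu, Sigma)
```

Any other choice of index points yields another MVN distribution built from the same two functions, which is what makes the GP a consistent distribution over functions.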

In a certain sense, a GP is a generalization of an MVN distribution to infinitely many dimensions. It can be viewed as a distribution over functions. This allows for quantifying the uncertainty of an unknown function in a Bayesian setting.

The properties of a GP depend heavily on the chosen covariance kernel. Some of the most common kernel families are the **squared exponential** (radial basis function), **absolute exponential** (Ornstein-Uhlenbeck) and the **Matérn kernel**. The first two kernels can be written as
$$
\begin{align*}
k_{\mathrm{RBF}}(\boldsymbol{x}, \boldsymbol{x}^\prime) &=
\sigma^2 \exp \left( - \frac{\lVert \boldsymbol{x} - \boldsymbol{x}^\prime \rVert_2^2}{2 \ell^2} \right), \\
k_{\mathrm{OU}}(\boldsymbol{x}, \boldsymbol{x}^\prime) &=
\sigma^2 \exp \left( - \frac{\lVert \boldsymbol{x} - \boldsymbol{x}^\prime \rVert_2}{\ell} \right).
\end{align*}
$$
They are isotropic in the sense that they are a function of the distance $\lVert \boldsymbol{x} - \boldsymbol{x}^\prime \rVert_2$ only. Moreover, they contain a variance $\sigma^2 > 0$ and a lengthscale parameter $\ell > 0$. The influence of the covariance function on the GP is briefly demonstrated in the following by simulating sample paths from GPs with different kernels.
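
The two formulas translate almost literally into code. The sketch below is a NumPy stand-in written for illustration. The function names `pairwise_dist`, `k_rbf` and `k_ou` are hypothetical, and this is not the implementation behind the `SquaredExponential` and `AbsoluteExponential` classes imported later, which is not shown here.

```python
import numpy as np

def pairwise_dist(X1, X2):
    """Euclidean distances between the rows of X1 (n, d) and X2 (m, d)."""
    diff = X1[:, None, :] - X2[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))

def k_rbf(X1, X2, sigma=1.0, length=1.0):
    """Squared exponential kernel: sigma^2 * exp(-r^2 / (2 * length^2))."""
    r = pairwise_dist(X1, X2)
    return sigma**2 * np.exp(-(r**2) / (2 * length**2))

def k_ou(X1, X2, sigma=1.0, length=1.0):
    """Absolute exponential kernel: sigma^2 * exp(-r / length)."""
    r = pairwise_dist(X1, X2)
    return sigma**2 * np.exp(-r / length)

sigma, length = 2.0, 0.5
X = np.array([[0.0], [length]])  # two points exactly one lengthscale apart

K_rbf = k_rbf(X, X, sigma, length)
K_ou = k_ou(X, X, sigma, length)
```

Both matrices carry the variance $\sigma^2$ on the diagonal. At a distance of exactly one lengthscale, the squared exponential kernel has decayed to $\sigma^2 e^{-1/2}$ and the absolute exponential kernel to $\sigma^2 e^{-1}$.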

In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import sys
sys.path.append('..')

In [None]:
import matplotlib.pyplot as plt
import torch
import torch.distributions as dist
import gpytorch

from utils.kernels import (
    SquaredExponential,
    AbsoluteExponential
)

In [None]:
torch.set_default_dtype(torch.float64)

## Setup

In [None]:
lower = -10
upper = 10
num_coords = 101

coords = torch.linspace(lower, upper, num_coords)

In [None]:
sigma = 1 # standard deviation
length = 1 # lengthscale parameter

kernel_rbf = SquaredExponential(sigma, length)
kernel_ou = AbsoluteExponential(sigma, length)

In [None]:
nu = 2.5 # smoothness parameter

kernel_matern = gpytorch.kernels.ScaleKernel(
    gpytorch.kernels.MaternKernel(nu=nu)
)

kernel_matern.outputscale = sigma**2 # outputscale multiplies the kernel, i.e. it is the variance
kernel_matern.base_kernel.lengthscale = length

for p in kernel_matern.parameters():
    p.requires_grad = False
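
For reference, the Matérn kernel with smoothness $\nu = 5/2$ has the closed form $k(r) = \sigma^2 \left( 1 + \frac{\sqrt{5} r}{\ell} + \frac{5 r^2}{3 \ell^2} \right) \exp \left( - \frac{\sqrt{5} r}{\ell} \right)$ with $r = \lVert \boldsymbol{x} - \boldsymbol{x}^\prime \rVert_2$. The NumPy sketch below simply evaluates this expression. The function name `k_matern52` is hypothetical and the code is independent of the `gpytorch` objects configured above.

```python
import numpy as np

def k_matern52(r, sigma=1.0, length=1.0):
    """Matérn covariance with nu = 5/2 as a function of the distance r >= 0."""
    s = np.sqrt(5.0) * np.asarray(r, dtype=float) / length
    return sigma**2 * (1.0 + s + s**2 / 3.0) * np.exp(-s)

r = np.linspace(0.0, 10.0, 101)
values = k_matern52(r, sigma=1.0, length=1.0)
```

The value at $r = 0$ equals the variance $\sigma^2$, and the kernel decreases monotonically with the distance.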

## Kernel functions

In [None]:
x_values = torch.linspace(coords.min(), coords.max(), 1001)

y_values_rbf = kernel_rbf.kernel(x_values)
y_values_ou = kernel_ou.kernel(x_values)
# take the row of the full covariance matrix that belongs to x = 0, i.e. k(0, x)
y_values_matern = kernel_matern(x_values, x_values).evaluate()[x_values.abs().argmin()]

In [None]:
kernel_dict = {
    'RBF': y_values_rbf,
    'OU': y_values_ou,
    'Matern': y_values_matern
}

fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(9, 2.5))
for ax, (key, y_values) in zip(axes.ravel(), kernel_dict.items()):
    ax.plot(x_values.numpy(), y_values.numpy(), alpha=0.7, clip_on=False)
    ax.set(xlabel='distance', ylabel='value')
    ax.set_xlim((x_values.min(), x_values.max()))
    ax.set_ylim((0, y_values.max()))
    ax.grid(visible=True, which='both', color='lightgray', linestyle='-')
    ax.set_axisbelow(True)
    ax.set_title(key)
fig.tight_layout()

## Covariance matrices

In [None]:
cov_rbf = kernel_rbf(coords, coords)
cov_ou = kernel_ou(coords, coords)
cov_matern = kernel_matern(coords, coords).evaluate()
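
A valid covariance matrix is symmetric and positive semi-definite, but on a dense grid it can be numerically close to singular. The following sketch checks this for a squared exponential kernel. It rebuilds the matrix in NumPy on the same grid and with the same parameter values as an assumption-laden stand-in for `cov_rbf`, and the nugget size `1e-7` is chosen only for illustration.

```python
import numpy as np

# Same grid as in the setup: 101 equally spaced points on [-10, 10]
coords = np.linspace(-10.0, 10.0, 101)

# Squared exponential covariance with sigma = 1 and lengthscale 1
diff = coords[:, None] - coords[None, :]
K = np.exp(-0.5 * diff**2)

is_symmetric = np.allclose(K, K.T)
min_eig = np.linalg.eigvalsh(K).min()  # can be tiny or slightly negative due to round-off
cond_raw = np.linalg.cond(K)

# A small diagonal "nugget" bounds the smallest eigenvalue away from zero
nugget = 1e-7
cond_nugget = np.linalg.cond(K + nugget * np.eye(len(coords)))

print(f"symmetric: {is_symmetric}")
print(f"smallest eigenvalue: {min_eig:.3e}")
print(f"condition number without / with nugget: {cond_raw:.3e} / {cond_nugget:.3e}")
```

Smooth kernels such as the squared exponential are particularly prone to this, since neighboring rows of the matrix are almost identical when the grid spacing is small relative to the lengthscale.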

In [None]:
cov_dict = {
    'RBF': cov_rbf,
    'OU': cov_ou,
    'Matern': cov_matern
}

fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(9, 2.5))
imgs = []
for ax, (key, cov) in zip(axes.ravel(), cov_dict.items()):
    img = ax.matshow(cov.numpy(), cmap='viridis', vmin=0)
    ax.set_aspect('equal', adjustable='box')
    ax.set_title(key)
    imgs.append(img)
for ax, img in zip(axes, imgs):
    fig.colorbar(img, ax=ax)
fig.tight_layout()

## Sampling

In [None]:
num_samples = 10

samples_rbf = dist.MultivariateNormal(
    loc=torch.zeros_like(coords),
    covariance_matrix=cov_rbf + 1e-07*torch.eye(num_coords) # add nugget
).sample((num_samples,))

samples_ou = dist.MultivariateNormal(
    loc=torch.zeros_like(coords),
    covariance_matrix=cov_ou + 1e-07*torch.eye(num_coords) # add nugget
).sample((num_samples,))

samples_matern = dist.MultivariateNormal(
    loc=torch.zeros_like(coords),
    covariance_matrix=cov_matern + 1e-07*torch.eye(num_coords) # add nugget
).sample((num_samples,))
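
Sampling from an MVN distribution is commonly done with a Cholesky factor of the covariance matrix, which is also why a small nugget on the diagonal helps: the factorization requires a numerically positive definite matrix. The sketch below carries out this step by hand in NumPy for an absolute exponential kernel on the same grid. It is an illustration under these assumptions, not a description of what `torch.distributions` does internally.

```python
import numpy as np

rng = np.random.default_rng(0)

coords = np.linspace(-10.0, 10.0, 101)
n = len(coords)

# Absolute exponential covariance with sigma = 1 and lengthscale 1
K = np.exp(-np.abs(coords[:, None] - coords[None, :]))
K_nugget = K + 1e-7 * np.eye(n)

# Lower-triangular factor with K_nugget = L @ L.T
L = np.linalg.cholesky(K_nugget)

# If z ~ N(0, I), then mu + L z ~ N(mu, L L^T)
num_samples = 10
mu = np.zeros(n)
z = rng.standard_normal((n, num_samples))
samples = (mu[:, None] + L @ z).T  # shape (num_samples, n)
```

Each row of `samples` is one sample path evaluated on the grid and can be plotted against `coords` in the same way as the samples in this notebook.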

In [None]:
samples_dict = {
    'RBF': samples_rbf,
    'OU': samples_ou,
    'Matern': samples_matern
}

fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(6, 12))
for ax, (key, samples) in zip(axes.ravel(), samples_dict.items()):
    ax.plot(coords.numpy(), samples.T.numpy(), alpha=0.7)
    ax.set(xlabel='coordinate', ylabel='value')
    ax.set_xlim((coords.min(), coords.max()))
    ax.grid(visible=True, which='both', color='lightgray', linestyle='-')
    ax.set_axisbelow(True)
    ax.set_title(key)
fig.tight_layout()