# Gaussian process examples - different kernels

The behavior of a Gaussian process is determined by its mean and covariance functions. 

The covariance function is usually specified as a 'kernel', which is a formula for the covariance between the function values at two input points.
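
To make the idea of a kernel concrete, here is a minimal sketch that evaluates a kernel on every pair of inputs to build a covariance matrix. It assumes the squared exponential form that appears later in this notebook; the helper name `rbf` and the six grid points are illustrative choices, not part of the original exercise.

```python
import numpy as np

def rbf(x_i, x_j, sigma=1.0, length=1.0):
    """Squared exponential covariance between two scalar inputs (illustrative helper)."""
    return sigma**2 * np.exp(-(x_i - x_j) ** 2 / (2.0 * length**2))

# Evaluate the kernel on every pair of inputs to get the covariance matrix.
x = np.linspace(0.0, 5.0, 6)
K = rbf(x[:, np.newaxis], x[np.newaxis, :])

print(K.shape)                                # one row and column per input point
print(np.allclose(K, K.T))                    # a covariance matrix is symmetric
print(np.linalg.eigvalsh(K).min() > -1e-10)   # and positive semi-definite (up to round-off)
```

Any function used as a kernel has to produce a symmetric, positive semi-definite matrix like this one for every set of inputs.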

In this exercise we will examine the behavior of a number of different choices for the kernel.

You can find a good summary here:

http://www.cs.toronto.edu/~duvenaud/cookbook/index.html

In [None]:
import numpy as np

from matplotlib import pyplot as plt

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (RBF, Matern, RationalQuadratic,
                                              ExpSineSquared, DotProduct,
                                              ConstantKernel)
%matplotlib inline

## Function to plot priors

We'll execute this code a few times, so it's a good idea to make a function because:

1) It saves copying the code over

2) It ensures the same code is run each time

3) It makes it easier to make changes if there is only one copy

In [None]:
def prior_plt(fig_index, kernel):
    # Specify Gaussian Process
    gp = GaussianProcessRegressor(kernel=kernel)

    # Plot prior
    plt.figure(fig_index, figsize=(8, 8))
#    plt.subplot(2, 1, 1)
    X_ = np.linspace(0, 5, 100)
    y_mean, y_std = gp.predict(X_[:, np.newaxis], return_std=True)
    plt.plot(X_, y_mean, 'k', lw=3, zorder=9)
    plt.fill_between(X_, y_mean - y_std, y_mean + y_std,
                     alpha=0.5, color='k')
    y_samples = gp.sample_y(X_[:, np.newaxis], 10)
    plt.plot(X_, y_samples, lw=1)
    plt.xlim(0, 5)
    plt.ylim(-3, 3)
    plt.title("Prior (kernel:  %s)" % kernel, fontsize=12)

## Plot posterior 

This function plots the posterior: the distribution over functions after the model (the prior) and the data have been combined.

In [None]:
def post_plt(fig_index, kernel):
    # Build the regressor from the kernel argument rather than relying on a global gp
    gp = GaussianProcessRegressor(kernel=kernel)

    # Generate data and fit GP
    rng = np.random.RandomState(4)
    X = rng.uniform(0, 5, 10)[:, np.newaxis]
    y = np.sin((X[:, 0] - 2.5) ** 2)
    gp.fit(X, y)

    # Plot posterior mean with a band of one standard deviation either side
    plt.figure(fig_index, figsize=(8, 8))
    X_ = np.linspace(0, 5, 100)
    y_mean, y_std = gp.predict(X_[:, np.newaxis], return_std=True)
    plt.plot(X_, y_mean, 'k', lw=3, zorder=9)
    plt.fill_between(X_, y_mean - y_std, y_mean + y_std,
                     alpha=0.5, color='k')

    # Overlay 10 posterior samples and the training points
    y_samples = gp.sample_y(X_[:, np.newaxis], 10)
    plt.plot(X_, y_samples, lw=1)
    plt.scatter(X[:, 0], y, c='r', s=50, zorder=10)
    plt.xlim(0, 5)
    plt.ylim(-3, 3)
    plt.title("Posterior (kernel: %s)\n Log-Likelihood: %.3f"
              % (gp.kernel_, gp.log_marginal_likelihood(gp.kernel_.theta)),
              fontsize=12)
#    plt.tight_layout()
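
The log-likelihood shown in the plot title is the log marginal likelihood of the training data under the fitted kernel. As a rough sketch of what that number is, the standard zero-mean GP formula can be written in plain NumPy as below. The function name and the test vector are illustrative, and this is not claimed to be scikit-learn's exact implementation (which, for instance, adds a small `alpha` to the diagonal by default).

```python
import numpy as np

def log_marginal_likelihood(K, y):
    """log p(y | X) for a zero-mean GP with covariance matrix K (any noise already added)."""
    n = y.shape[0]
    L = np.linalg.cholesky(K)                              # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # K^{-1} y
    return (-0.5 * y @ alpha                               # data-fit term
            - np.log(np.diag(L)).sum()                     # equals -0.5 * log|K|
            - 0.5 * n * np.log(2.0 * np.pi))               # normalizing constant

# Sanity check: with K = I this is the log-density of n independent N(0, 1) values.
y = np.array([0.5, -1.0, 2.0])
lml = log_marginal_likelihood(np.eye(3), y)
expected = -0.5 * y @ y - 1.5 * np.log(2.0 * np.pi)
print(lml, expected)
```

A higher value means the kernel, with its fitted hyperparameters, explains the observed points better.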

## Radial basis function kernel

Also known as the squared exponential kernel.

If l is the length scale and sigma^2 is the variance,

In [None]:
%%latex
\[
  k(x_i,x_j) = \sigma^2\exp\left(-\frac{(x_i-x_j)^2}{2l^2}\right)  
\]
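
To see what the length scale does, the sketch below evaluates this formula for two points one unit apart at a few values of l. The helper name `rbf_kernel` and the chosen length scales are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(x_i, x_j, sigma=1.0, length=1.0):
    """sigma^2 * exp(-(x_i - x_j)^2 / (2 l^2)) for scalar or array inputs."""
    return sigma**2 * np.exp(-(x_i - x_j) ** 2 / (2.0 * length**2))

# Covariance between two points one unit apart, for a short, medium and long length scale.
for length in (0.1, 1.0, 10.0):
    print(length, rbf_kernel(0.0, 1.0, length=length))
```

A short length scale makes the two points almost uncorrelated, so sampled functions wiggle quickly; a long one keeps them almost perfectly correlated, so sampled functions change slowly.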

## Define a kernel object

In [None]:
kernel1 = 1.0 * RBF(length_scale=1.0, length_scale_bounds=(1e-1, 10.0))
print(type(kernel1))

## Create a Gaussian Process Regressor instance using the RBF kernel

In [None]:
gp = GaussianProcessRegressor(kernel=kernel1)
print(type(gp))

## Plot the prior distribution

Each draw from the prior is a random smooth function.

In [None]:
prior_plt(1,kernel1)
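
For intuition about where the sampled curves come from, here is a sketch of drawing prior samples by hand: build the covariance matrix on a grid, factor it, and multiply by standard normal noise. The jitter value and variable names are assumptions made for this illustration, and scikit-learn's `sample_y` may use a different factorization internally.

```python
import numpy as np

rng = np.random.RandomState(0)
x = np.linspace(0.0, 5.0, 100)

# RBF covariance on the grid, with sigma = 1 and l = 1.
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 2.0)

# A small diagonal jitter keeps the Cholesky factorization numerically stable.
jitter = 1e-8 * np.eye(x.size)
L = np.linalg.cholesky(K + jitter)

# Each column is one draw from N(0, K): a random smooth function on the grid.
samples = L @ rng.standard_normal((x.size, 10))
print(samples.shape)
```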

## Plot the posterior distribution

This is the distribution after combining the prior and the likelihood for this kernel.

The result is a smooth function fitted to the data. The shaded band shows one standard deviation either side of the posterior mean.

In [None]:
post_plt(2,kernel1)
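
The posterior mean and band can also be computed directly from the standard GP conditioning formulas. The sketch below does this in NumPy for a fixed RBF kernel and five evenly spaced training points; the points, the noise level and the helper name `rbf` are assumptions chosen for illustration, and no hyperparameters are optimized (unlike `gp.fit`).

```python
import numpy as np

def rbf(a, b, length=1.0):
    """RBF covariance matrix between two 1-D arrays of inputs."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * length**2))

# Illustrative training data drawn from the same target function as post_plt.
X = np.array([0.5, 1.5, 2.5, 3.5, 4.5])
y = np.sin((X - 2.5) ** 2)
X_star = np.linspace(0.0, 5.0, 100)

noise = 1e-10                                   # tiny diagonal term for stability
K = rbf(X, X) + noise * np.eye(X.size)          # train/train covariance
K_s = rbf(X_star, X)                            # test/train covariance
K_ss = rbf(X_star, X_star)                      # test/test covariance

mean = K_s @ np.linalg.solve(K, y)              # posterior mean
cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)    # posterior covariance
std = np.sqrt(np.clip(np.diag(cov), 0.0, None)) # clip tiny negative round-off
```

The posterior mean passes through the training points, and the standard deviation never exceeds the prior value of 1.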

## Rational quadratic kernel

This is equivalent to adding together many RBF kernels with different length scales.

In [None]:
%%latex
\[
  k(x_i,x_j) = \sigma^2\left(1+\frac{(x_i-x_j)^2}{2\alpha l^2}\right)^{-\alpha}
\]
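
The "sum of RBF kernels" reading can be checked numerically. The sketch below uses the standard form of the kernel, with exponent -alpha, and compares it with an average of many RBF kernels whose inverse squared length scales are drawn from a gamma distribution. The sample size, seed and parameter values are arbitrary choices for illustration.

```python
import numpy as np

alpha, length = 2.0, 1.0
r = np.linspace(0.0, 3.0, 7)          # distances |x_i - x_j|

# Closed form of the rational quadratic kernel (sigma = 1).
k_rq = (1.0 + r**2 / (2.0 * alpha * length**2)) ** (-alpha)

# Average of RBF kernels exp(-tau * r^2 / 2), where tau = 1 / l_i^2 is drawn from
# a Gamma distribution with shape alpha and rate alpha * length^2.
rng = np.random.RandomState(0)
tau = rng.gamma(shape=alpha, scale=1.0 / (alpha * length**2), size=200_000)
k_mix = np.exp(-0.5 * tau[:, None] * r[None, :] ** 2).mean(axis=0)

# The two agree up to Monte Carlo error.
print(np.max(np.abs(k_rq - k_mix)))
```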

## Define a kernel object

In [None]:
kernel2 = 1.0 * RationalQuadratic(length_scale=1.0, alpha=0.1)
print(type(kernel2))

## Create a GaussianProcessRegressor object

In [None]:
gp = GaussianProcessRegressor(kernel=kernel2)
print(type(gp))

## Plot the prior

In [None]:
prior_plt(3,kernel2)

## Plot the posterior

In [None]:
post_plt(4,kernel2)

## Same as before?

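My guess is that if this looks just like the previous one, it is because the fit has chosen a large alpha: as alpha grows, the rational quadratic kernel approaches a single squared exponential kernel, and the 'sum' of one squared exponential process is just a squared exponential process.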
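
One way to probe this guess is to compare the two kernel formulas directly. The sketch below assumes the standard rational quadratic form with exponent -alpha and measures the largest gap to the RBF kernel for several values of alpha; the function names and the grid of distances are illustrative.

```python
import numpy as np

def rq(r, alpha, length=1.0):
    """Rational quadratic kernel as a function of distance r (sigma = 1)."""
    return (1.0 + r**2 / (2.0 * alpha * length**2)) ** (-alpha)

def rbf(r, length=1.0):
    """RBF kernel as a function of distance r (sigma = 1)."""
    return np.exp(-r**2 / (2.0 * length**2))

r = np.linspace(0.0, 5.0, 51)
alphas = (0.1, 1.0, 10.0, 1e6)
gaps = [np.max(np.abs(rq(r, a) - rbf(r))) for a in alphas]
for a, g in zip(alphas, gaps):
    print(a, g)
```

The gap shrinks as alpha grows, so a fitted kernel with a large alpha behaves much like an RBF kernel, while alpha = 0.1 (the starting value above) is noticeably different.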

## Periodic kernel (sine squared)

Good for modeling periodic functions

In [None]:
%%latex
\[
  k(x_i,x_j) = \sigma^2\exp\left(-\frac{2\sin^2\left(\frac{\pi|x_i-x_j|}{p}\right)}{l^2}\right)
\]
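
A quick numerical check shows why this kernel suits periodic functions. The sketch below uses the standard form with a negative sign inside the exponential; the helper name `periodic` and the period of 3 (matching the starting value used below) are illustrative choices.

```python
import numpy as np

def periodic(x_i, x_j, sigma=1.0, length=1.0, period=3.0):
    """Exp-sine-squared kernel: sigma^2 * exp(-2 sin^2(pi |x_i - x_j| / p) / l^2)."""
    s = np.sin(np.pi * np.abs(x_i - x_j) / period)
    return sigma**2 * np.exp(-2.0 * s**2 / length**2)

print(periodic(0.0, 0.0))   # zero distance: full covariance
print(periodic(0.0, 3.0))   # one full period apart: full covariance again
print(periodic(0.0, 1.5))   # half a period apart: the smallest covariance
```

Points separated by a whole number of periods are perfectly correlated, which forces sampled functions to repeat.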

## Define a kernel object

In [None]:
kernel3 = 1.0 * ExpSineSquared(length_scale=1.0, periodicity=3.0,
                                length_scale_bounds=(0.1, 10.0),
                                periodicity_bounds=(1.0, 10.0))
print(type(kernel3))

## Create a GaussianProcessRegressor object

In [None]:
gp = GaussianProcessRegressor(kernel=kernel3)

## Plot the prior

In [None]:
prior_plt(5,kernel3)

## Plot the posterior

In [None]:
post_plt(6,kernel3)

## Constant kernel

The constant kernel can be used as a multiplier to scale the magnitude of another kernel, or as part of a sum kernel to modify the mean of the Gaussian process. The example below uses it to scale a squared DotProduct kernel.

In [None]:
%%latex
\[
  k(x_i,x_j) = C  
\]
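
On its own, a constant kernel gives a very simple process. The sketch below builds its covariance matrix and draws samples; the value C = 0.5, the grid and the seed are arbitrary choices for illustration.

```python
import numpy as np

C = 0.5
x = np.linspace(0.0, 5.0, 50)
K = C * np.ones((x.size, x.size))     # k(x_i, x_j) = C for every pair of inputs

print(np.linalg.matrix_rank(K))       # the matrix has rank 1

# Rank one means each draw is a single random offset repeated at every input.
rng = np.random.RandomState(0)
offsets = np.sqrt(C) * rng.standard_normal(5)   # one N(0, C) value per draw
samples = np.outer(np.ones(x.size), offsets)    # shape (50, 5), flat in x
```

This is why adding a constant kernel to another kernel acts like an uncertain vertical offset on the whole function.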

## Define a kernel object

In [None]:
kernel4 = (ConstantKernel(0.1, (0.01, 10.0))
           * (DotProduct(sigma_0=1.0, sigma_0_bounds=(0.0, 10.0)) ** 2))
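
Written out for one-dimensional inputs, this composite kernel is a constant times the square of a dot-product kernel, where the dot-product kernel is sigma_0^2 + x_i * x_j. The sketch below evaluates that expression in NumPy at the starting hyperparameter values; the helper name `kernel4_value` is hypothetical and the three inputs are chosen for illustration.

```python
import numpy as np

def kernel4_value(x_i, x_j, c=0.1, sigma_0=1.0):
    """c * (sigma_0^2 + x_i * x_j)^2 for scalar (1-D) inputs."""
    return c * (sigma_0**2 + x_i * x_j) ** 2

x = np.array([0.0, 1.0, 2.0])
K = kernel4_value(x[:, None], x[None, :])
print(K)
# Diagonal entries (the prior variances) are 0.1, 0.4 and 2.5.
```

Unlike the other kernels in this notebook, the variance here depends on where x is, not just on the distance between points, so the prior band widens as x moves away from zero.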

## Create a GaussianProcessRegressor object

In [None]:
gp = GaussianProcessRegressor(kernel=kernel4)

## Plot the prior

In [None]:
prior_plt(7,kernel4)

## Plot the posterior

In [None]:
post_plt(8,kernel4)

## Matern kernel

Good for modeling spatial data

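If the distance between two points x_i and x_j is d, rho is the length scale, nu controls the smoothness, Gamma is the gamma function and K_nu is the modified Bessel function of the second kind,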

In [None]:
%%latex
\[
  k(x_i,x_j) = \sigma^2\frac{2^{1-\nu}}{\Gamma(\nu)}\left(\sqrt{2\nu}\frac{d}{\rho}\right)^\nu K_{\nu}\left(\sqrt{2\nu}\frac{d}{\rho}\right) 
\]
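
For half-integer values of nu the Bessel function drops out and the kernel has a simple closed form. The sketch below implements the well-known cases nu = 0.5, 1.5 and 2.5; the function name `matern` and the restriction to these three values are choices made for this illustration.

```python
import numpy as np

def matern(d, nu, rho=1.0, sigma=1.0):
    """Matern covariance at distance d, closed forms for nu in {0.5, 1.5, 2.5}."""
    s = np.sqrt(2.0 * nu) * np.asarray(d, dtype=float) / rho
    if nu == 0.5:
        poly = 1.0                      # plain exponential kernel
    elif nu == 1.5:
        poly = 1.0 + s
    elif nu == 2.5:
        poly = 1.0 + s + s**2 / 3.0
    else:
        raise ValueError("only nu = 0.5, 1.5, 2.5 are handled in this sketch")
    return sigma**2 * poly * np.exp(-s)

# Covariance at a short distance for increasing smoothness.
for nu in (0.5, 1.5, 2.5):
    print(nu, matern(0.3, nu))
```

Larger nu keeps nearby points more strongly correlated, which gives smoother sampled functions; as nu grows further the kernel approaches the RBF kernel.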

## Define a kernel object

In [None]:
kernel5 = 1.0 * Matern(length_scale=1.0, length_scale_bounds=(1e-1, 10.0), nu=1.5)

## Create a GaussianProcessRegressor object

In [None]:
gp = GaussianProcessRegressor(kernel=kernel5)

## Plot the prior

In [None]:
prior_plt(9,kernel5)

## Plot the posterior

In [None]:
post_plt(10,kernel5)