This repository contains code necessary to not only reproduce the experiments presented in Scalable Gaussian Process Regression with Derivatives @ NIPS 2018, but also for any wishing to use our methods.
Directory Structure: our root directory contains the
demos/ folders, which contain the research code written for
our paper and the demo code to reproduce our results, respectively.
demos/ is contains a number of subdirectories, each
corresponding to an experiment run in our paper, and we will detail these later on in the readme.
code/ has the following subdirectories:
bayesopt/ contains code for our Bayesian optimization experiments.
estimator/ contains code for stochastic trace estimation detailed
in this paper.
gp/ is the primary GPML-like interface for training a Gaussian process.
contains implementations of various kernels, including the D-SKI and D-SKIP kernels we developed.
lik/ contains standard marginal log likelihood
mvm/ contains fast matrix-vector multiplication subroutines.
testfunctions/ contains standard global optimization test functions
used to test our Bayesian optimization routines. Lastly,
utils/ contains various other subroutines we use.
We use Matlab's new element-wise multiplication operators, and thus require Matlab R2017A at a minimum.
To start, clone the repository and mex
key routine extracting the diagonal of the D-SKI matrix needed when preconditioning. Mexing allows one to run c code
from matlab, but mexing
ski_diag.c is not necessary if you do not plan on using D-SKI.
Before attempting any experiments, make sure to run
simply adds all necessary directories and subdirectories to Matlab's path.
Training a GP
To train a basic GP with no gradients, run:
[mu, K] = gp(X, Y)
where the pair
X,Y represent the training points and values respectively.
mu is a function handle predicting on new points e.g.
y_predict = mu(x_predict), and
K is the dense
kernel matrix. The same idea holds when gradients are available
[mu, K] = gp_grad(X, Y, dY)
dY contains the respective gradients of points in
X. Similar interfaces are available for (D-)SKI and (D-)SKIP kernels, except
a function handle performing matrix-vector multiplication instead of a dense matrix.
Spectrum Comparison Experiment
Any approximate kernel's spectrum should closely match the spectrum of the original kernel. This experiment
shows that our approximate kernels D-SKI and D-SKIP indeed possess this property. Run
/Demos/Spectrum/ to generate the following plot (figure 2 of section 4).
MVM Scaling Experiment
The cost of an matrix-vector multiply with a dense kernel matrix is quadratic, whereas the cost of a matrix-vector multiply
with either D-SKI or D-SKIP is linear, which is verified in this experiment. Run
/Demos/Scaling/ to generate the following plot (figure 3 of section 4).
Dimensionality Reduction Experiment
Possessing gradients allows one to perform dimensionality reduction via active subspace sampling. Run
/Demos/ActiveSubspace to generate the following plot (figure 4 and table 2 of section 4). For quick runtimes, the
demo code uses far fewer points than the experiment presented in the paper. The conclusion remains the same; D-SKIP with active
subspace sampling performs far better than the dense code.
Preconditioning is key to increasing convergence of iterative methods; we found that pivoted Cholesky is an effective preconditioner.
In general, convergence of iterative methods depends on the spectrum of the kernel matrix, and if there is spectral clustering or a large spectral
gap, we expect fast convergence. In this experiment, we consider the SE Kernel and plot the effectiveness of pivoted Cholesky in terms of both
lengthscale and regularization parameters. Run
PrecondCompare.m located in
/Demos/Precond to recreate the following plot (figure 8 in the appendix).
Rough Terrain Reconstruction Experiments
We apply our method to rough terrain reconsutrction of Mount St. Helens from lidar
MountStHelens.m located in
/Demos/Maps to recreate the following plot (figure 5 in section 4).
Implicit Surface Reconstruction Experiment
We apply our method to implicit surface reconstruction. We reconstruct the Stanford bunny
from noisy surface normals (i.e. gradients), which is difficult to do using traditional spline methods. Note that, as we are using roughly 40 thousand
points and gradients, reconstruction will take a bit of time. Run
bunny.m located in
/Demos/ImplicitBunny to recreate the following plot (figure 8 in the appendix).
Bayesian Optimization Experiment
Dimensionality reduction with gradient information is possible estimating the dominant active subspace. We imbed two popular test functions for global
optimization, the Ackley and Rastragin functions, in higher dimensions and run scalable Bayesian optimization with dimensionality reduction and see
promising results. Run
boComparison.m located in
/Demos/BayesOpt to recreate the following plot (figure 8 in the appendix).