Scalable Gaussian Process Regression with Derivatives
Switch branches/tags
Nothing to show
Clone or download
Latest commit 83839d6 Nov 8, 2018
Type Name Latest commit message Commit time
Failed to load latest commit information.
code added missing files Oct 4, 2018
demos added test function Oct 1, 2018
images added images to readme Sep 28, 2018 Update Nov 8, 2018
startup.m initial full commit after cleaning up a bit of code Sep 28, 2018


This repository contains code necessary to not only reproduce the experiments presented in Scalable Gaussian Process Regression with Derivatives @ NIPS 2018, but also for any wishing to use our methods.

Directory Structure: our root directory contains the code/ and demos/ folders, which contain the research code written for our paper and the demo code to reproduce our results, respectively. demos/ is contains a number of subdirectories, each corresponding to an experiment run in our paper, and we will detail these later on in the readme. code/ has the following subdirectories:

  • bayesopt/
  • estimator/
  • gp/
  • kernels/
  • lik/
  • mvm/
  • testfunctions/
  • utils/

bayesopt/ contains code for our Bayesian optimization experiments. estimator/ contains code for stochastic trace estimation detailed in this paper. gp/ is the primary GPML-like interface for training a Gaussian process. kernels/ contains implementations of various kernels, including the D-SKI and D-SKIP kernels we developed. lik/ contains standard marginal log likelihood functions and mvm/ contains fast matrix-vector multiplication subroutines. testfunctions/ contains standard global optimization test functions used to test our Bayesian optimization routines. Lastly, utils/ contains various other subroutines we use.


We use Matlab's new element-wise multiplication operators, and thus require Matlab R2017A at a minimum. To start, clone the repository and mex ski_diag.c, a key routine extracting the diagonal of the D-SKI matrix needed when preconditioning. Mexing allows one to run c code from matlab, but mexing ski_diag.c is not necessary if you do not plan on using D-SKI. Before attempting any experiments, make sure to run startup.m, which simply adds all necessary directories and subdirectories to Matlab's path.

Training a GP

To train a basic GP with no gradients, run:

[mu, K] = gp(X, Y)

where the pair X,Y represent the training points and values respectively. mu is a function handle predicting on new points e.g. y_predict = mu(x_predict), and K is the dense kernel matrix. The same idea holds when gradients are available

[mu, K] = gp_grad(X, Y, dY)

where dY contains the respective gradients of points in X. Similar interfaces are available for (D-)SKI and (D-)SKIP kernels, except K is a function handle performing matrix-vector multiplication instead of a dense matrix.

Spectrum Comparison Experiment

Any approximate kernel's spectrum should closely match the spectrum of the original kernel. This experiment shows that our approximate kernels D-SKI and D-SKIP indeed possess this property. Run SpectrumError.m located in /Demos/Spectrum/ to generate the following plot (figure 2 of section 4).

MVM Scaling Experiment

The cost of an matrix-vector multiply with a dense kernel matrix is quadratic, whereas the cost of a matrix-vector multiply with either D-SKI or D-SKIP is linear, which is verified in this experiment. Run ScalingComparison.m located in /Demos/Scaling/ to generate the following plot (figure 3 of section 4).

Dimensionality Reduction Experiment

Possessing gradients allows one to perform dimensionality reduction via active subspace sampling. Run DimReduction.m located in /Demos/ActiveSubspace to generate the following plot (figure 4 and table 2 of section 4). For quick runtimes, the demo code uses far fewer points than the experiment presented in the paper. The conclusion remains the same; D-SKIP with active subspace sampling performs far better than the dense code.

Preconditioning Experiment

Preconditioning is key to increasing convergence of iterative methods; we found that pivoted Cholesky is an effective preconditioner. In general, convergence of iterative methods depends on the spectrum of the kernel matrix, and if there is spectral clustering or a large spectral gap, we expect fast convergence. In this experiment, we consider the SE Kernel and plot the effectiveness of pivoted Cholesky in terms of both lengthscale and regularization parameters. Run PrecondCompare.m located in /Demos/Precond to recreate the following plot (figure 8 in the appendix).

Rough Terrain Reconstruction Experiments

We apply our method to rough terrain reconsutrction of Mount St. Helens from lidar data. Run MountStHelens.m located in /Demos/Maps to recreate the following plot (figure 5 in section 4).

Implicit Surface Reconstruction Experiment

We apply our method to implicit surface reconstruction. We reconstruct the Stanford bunny from noisy surface normals (i.e. gradients), which is difficult to do using traditional spline methods. Note that, as we are using roughly 40 thousand points and gradients, reconstruction will take a bit of time. Run bunny.m located in /Demos/ImplicitBunny to recreate the following plot (figure 8 in the appendix).

Bayesian Optimization Experiment

Dimensionality reduction with gradient information is possible estimating the dominant active subspace. We imbed two popular test functions for global optimization, the Ackley and Rastragin functions, in higher dimensions and run scalable Bayesian optimization with dimensionality reduction and see promising results. Run boComparison.m located in /Demos/BayesOpt to recreate the following plot (figure 8 in the appendix).