# Scalable Exact QEP Posterior Sampling using Contour Integral Quadrature

This notebook demonstrates the simplest usage of contour integral quadrature (CIQ) with msMINRES, as described [here](https://arxiv.org/pdf/2006.11267.pdf), to sample from the predictive distribution of an exact QEP.

Note that to achieve results where Cholesky would run the GPU out of memory, you'll need to have KeOps installed (see our KeOps tutorial in this same folder). Even so, on this relatively simple example with 1000 training points, where we seek to sample at 5000 test points in 1D (1000 without KeOps), CIQ can achieve significant speed-ups over Cholesky.
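
To make the memory concern concrete, here is a rough back-of-the-envelope sketch. It assumes a dense covariance matrix stored in 32-bit floats (4 bytes per entry); the helper name `dense_covar_gib` is hypothetical and not part of QPyTorch. Peak memory for a Cholesky factorization is typically higher than this, since the factor has to be stored as well.

```python
def dense_covar_gib(n, bytes_per_entry=4):
    """Memory in GiB for one dense n x n matrix (assumes float32 by default)."""
    return n * n * bytes_per_entry / 2**30

# Memory grows quadratically with the number of points.
for n in (1000, 5000, 20000, 100000):
    print(f"n = {n:>6}: {dense_covar_gib(n):8.3f} GiB")
```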

In [1]:
import math
import torch
import qpytorch
from matplotlib import pyplot as plt

import warnings
warnings.simplefilter("ignore", qpytorch.utils.warnings.NumericalWarning)

%matplotlib inline
%load_ext autoreload
%autoreload 2


In [2]:
# Training data is 1000 points in [0,1] inclusive regularly spaced
train_x = torch.linspace(0, 1, 1000)
# True function is sin(2*pi*x) with Gaussian noise
train_y = torch.sin(train_x * (2 * math.pi)) + torch.randn(train_x.size()) * 0.2

### Are we running with KeOps?

If you have KeOps, set the flag below to `True` to run with a significantly larger test set.

In [3]:
HAVE_KEOPS = True
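
Rather than setting the flag by hand, one option is to detect the package at runtime. The snippet below is a minimal sketch using only the standard library; the variable name `keops_installed` is hypothetical. It only checks that the `pykeops` package can be found on the import path. It does not verify that KeOps can actually compile its kernels on your machine.

```python
import importlib.util

# Hypothetical flag (not part of the original notebook):
# True only if the `pykeops` package can be located on the import path.
keops_installed = importlib.util.find_spec("pykeops") is not None
print("pykeops found:", keops_installed)
```

The result could then be assigned to `HAVE_KEOPS` in place of the hard-coded value.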

### Define an Exact QEP Model and train

In [6]:
POWER = 1.0
class ExactQEPModel(qpytorch.models.ExactQEP):
    def __init__(self, train_x, train_y, likelihood):
        super(ExactQEPModel, self).__init__(train_x, train_y, likelihood)
        self.power = torch.tensor(POWER)
        self.mean_module = qpytorch.means.ConstantMean()
        
        if HAVE_KEOPS:
            self.covar_module = qpytorch.kernels.ScaleKernel(qpytorch.kernels.keops.RBFKernel())
        else:
            self.covar_module = qpytorch.kernels.ScaleKernel(qpytorch.kernels.RBFKernel())
    
    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return qpytorch.distributions.MultivariateQExponential(mean_x, covar_x, power=self.power)

# initialize likelihood and model
likelihood = qpytorch.likelihoods.QExponentialLikelihood(power=torch.tensor(POWER))
model = ExactQEPModel(train_x, train_y, likelihood)

In [7]:
if torch.cuda.is_available():
    train_x = train_x.cuda()
    train_y = train_y.cuda()
    model = model.cuda()
    likelihood = likelihood.cuda()

In [8]:
# this is for running the notebook in our testing framework
import os
smoke_test = ('CI' in os.environ)
training_iter = 2 if smoke_test else 50

# Find optimal model hyperparameters
model.train()
likelihood.train()

# Use the Adam optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)  # Includes QExponentialLikelihood parameters

# "Loss" for QEPs - the marginal log likelihood
mll = qpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

for i in range(training_iter):
    # Zero gradients from previous iteration
    optimizer.zero_grad()
    # Output from model
    output = model(train_x)
    # Calc loss and backprop gradients
    loss = -mll(output, train_y)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f   lengthscale: %.3f   noise: %.3f' % (
        i + 1, training_iter, loss.item(),
        model.covar_module.base_kernel.lengthscale.item(),
        model.likelihood.noise.item()
    ))
    optimizer.step()

[KeOps] Generating code for Sum_Reduction reduction (with parameters 0) of formula Exp(-1/2*(a-b)**2)*c with a=Var(0,1,0), b=Var(1,1,1), c=Var(2,11,1) ... OK
[pyKeOps] Compiling pykeops cpp 96bffbb672 module ... 

OK
[KeOps] Generating code for Sum_Reduction reduction (with parameters 0) of formula -(((d|c)*(a-b))*Exp(-1/2*(a-b)**2)) with a=Var(0,1,0), b=Var(1,1,1), c=Var(2,11,1), d=Var(3,11,0) ... OK
[pyKeOps] Compiling pykeops cpp 4c8081cd84 module ... 

OK
[KeOps] Generating code for Sum_Reduction reduction (with parameters 1) of formula ((d|c)*(a-b))*Exp(-1/2*(a-b)**2) with a=Var(0,1,0), b=Var(1,1,1), c=Var(2,11,1), d=Var(3,11,0) ... OK
[pyKeOps] Compiling pykeops cpp 5c10e3617c module ... 

OK
Iter 1/50 - Loss: 2.125   lengthscale: 0.693   noise: 0.693
Iter 2/50 - Loss: 2.067   lengthscale: 0.644   noise: 0.644
Iter 3/50 - Loss: 2.001   lengthscale: 0.598   noise: 0.598
Iter 4/50 - Loss: 1.925   lengthscale: 0.554   noise: 0.554
Iter 5/50 - Loss: 1.847   lengthscale: 0.513   noise

### Define test set

If we have KeOps installed, we'll test on 5000 points instead of 1000.

In [9]:
if HAVE_KEOPS:
    test_n = 5000
else:
    test_n = 1000

test_x = torch.linspace(0, 1, test_n)
if torch.cuda.is_available():
    test_x = test_x.cuda()
print(test_x.shape)

torch.Size([5000])


### Draw a sample with CIQ

To do this, we just wrap the `rsample` call in the `ciq_samples` setting. We additionally demonstrate all relevant settings for controlling Contour Integral Quadrature:

- The `ciq_samples` setting determines whether or not to use CIQ.
- The `num_contour_quadrature` setting controls the number of quadrature sites (Q in the paper).
- The `minres_tolerance` setting controls the error we tolerate from minres (here, <0.01%).

Note that, of these settings, increasing `num_contour_quadrature` is unlikely to improve performance. As Theorem 1 from the paper demonstrates, virtually all of the error in this method is controlled by `minres_tolerance`. Here, we use quite a tight tolerance for minres.
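
To build some intuition for the role of the quadrature sites, the sketch below approximates `lam ** -0.5` for a scalar `lam` by a weighted sum of Q shifted inverses `w_q / (t_q + lam)`. This mirrors the form CIQ applies to matrices, where each term becomes a shifted linear solve. It is a simplified illustration and not the paper's method: it assumes a plain midpoint rule applied to the identity λ^(-1/2) = (2/π) ∫₀^∞ (t² + λ)^(-1) dt after substituting t = tan θ, whereas the paper uses a more refined quadrature rule. The function name `inv_sqrt_quadrature` is hypothetical.

```python
import math

def inv_sqrt_quadrature(lam, num_sites):
    """Approximate lam**-0.5 as a weighted sum of shifted inverses 1 / (t_q + lam)."""
    total = 0.0
    h = (math.pi / 2) / num_sites          # width of each midpoint cell in theta
    for q in range(num_sites):
        theta = (q + 0.5) * h              # midpoint node in (0, pi/2)
        shift = math.tan(theta) ** 2       # t_q: the shift added to lam
        weight = (2 / math.pi) * h / math.cos(theta) ** 2   # w_q
        total += weight / (shift + lam)
    return total

# Worst relative error over a few well-conditioned "eigenvalues".
for num_sites in (2, 5, 10, 20):
    worst = max(
        abs(inv_sqrt_quadrature(lam, num_sites) * math.sqrt(lam) - 1.0)
        for lam in (0.5, 1.0, 2.0)
    )
    print(f"Q = {num_sites:2d}: worst relative error = {worst:.2e}")
```

In this toy setting the error falls off rapidly as Q grows, so a handful of sites already suffices. That is in line with the remark that `minres_tolerance`, rather than the number of quadrature sites, tends to dominate the overall error.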

In [10]:
import time

model.train()
likelihood.train()

# Get into evaluation (predictive posterior) mode
model.eval()
likelihood.eval()

# Test points are regularly spaced along [0,1]
# Make predictions by feeding model through likelihood

test_x.requires_grad_(True)

with torch.no_grad():
    observed_pred = likelihood(model(test_x))
    
    # All relevant settings for using CIQ.
    #   ciq_samples(True) - Use CIQ for sampling
    #   num_contour_quadrature(10) -- Use 10 quadrature sites (Q in the paper)
    #   minres_tolerance -- error tolerance from minres (here, <0.01%).
    print("Running with CIQ")
    with qpytorch.settings.ciq_samples(True), qpytorch.settings.num_contour_quadrature(10), qpytorch.settings.minres_tolerance(1e-4):
        %time y_samples = observed_pred.rsample()
    
    print("Running with Cholesky")
    # Make sure we use Cholesky
    with qpytorch.settings.fast_computations(covar_root_decomposition=False):
        %time y_samples = observed_pred.rsample()

[KeOps] Generating code for Sum_Reduction reduction (with parameters 0) of formula c*Exp(-1/2*(a-b)**2) with a=Var(0,1,0), b=Var(1,1,1), c=Var(2,1,1) ... OK
[pyKeOps] Compiling pykeops cpp f3866a6969 module ... 

OK
[KeOps] Generating code for Sum_Reduction reduction (with parameters 0) of formula Exp(-1/2*(a-b)**2)*c with a=Var(0,1,0), b=Var(1,1,1), c=Var(2,1000,1) ... OK
[pyKeOps] Compiling pykeops cpp 093ce4c10c module ... 

OK
[KeOps] Generating code for Sum_Reduction reduction (with parameters 0) of formula Exp(-1/2*(a-b)**2)*c with a=Var(0,1,0), b=Var(1,1,1), c=Var(2,5000,1) ... OK
[pyKeOps] Compiling pykeops cpp 26f0a724f7 module ... 

OK
Running with CIQ
[KeOps] Generating code for Sum_Reduction reduction (with parameters 0) of formula c*Exp(-1/2*(a-b)**2) with a=Var(0,1,0), b=Var(1,1,1), c=Var(2,1,1) ... OK
[pyKeOps] Compiling pykeops cpp 4e43c3b37d module ... 
In file included from /Users/shiweilan/.cache/keops2.3/Darwin_SHIWEIs-iMac.local_24.5.0_p3.10.18/pykeops_cpp_4e43c3b3

ModuleNotFoundError: No module named 'pykeops_cpp_4e43c3b37d'

Running with Cholesky
CPU times: user 2min 36s, sys: 3.99 s, total: 2min 40s
Wall time: 27.4 s
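
The `%time` magic used above only works inside IPython or Jupyter. When running the same comparison as a plain script, a small timing helper along the following lines could be used instead. This is a sketch: `timed` is a hypothetical helper, and the toy workload stands in for the sampling call. On a GPU, kernels are launched asynchronously, so calling `torch.cuda.synchronize()` before reading the clock would give more reliable numbers.

```python
import time

def timed(fn, label="call"):
    """Run fn() once, print the elapsed wall-clock time, and return (result, seconds)."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f} s wall time")
    return result, elapsed

# Stand-in workload so the sketch runs on its own; in the notebook one might
# instead pass `lambda: observed_pred.rsample()` inside the settings context.
total, seconds = timed(lambda: sum(i * i for i in range(100_000)), label="toy workload")
```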
