[performance issue] model.fantasize() is significantly slower on GPU #492

Closed · saitcakmak opened this issue Jul 23, 2020 · 5 comments
@saitcakmak (Contributor)

Issue description

Generating fantasy models using model.fantasize() takes significantly longer on GPU than on CPU. The example below is extracted from the evaluation of raw_samples while optimizing qKnowledgeGradient. Running the code below, I get ~60 ms on CPU and ~10000 ms on GPU. I traced the issue down to line 220 of gpytorch/models/exact_prediction_strategies.py, Q, R = torch.qr(new_root). That line appears to be the bottleneck; however, I do not know what is happening beyond there.
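
As a sanity check, torch.qr can also be timed in isolation, outside of fantasize(). The matrix shape below is only a placeholder guess, not the actual shape new_root takes in this example; the snippet just shows how to time the op on GPU.

import torch

# sketch: time torch.qr in isolation on a batched tall-skinny matrix;
# the shape here is a placeholder, not the actual shape of new_root
new_root = torch.randn(2000, 70, 7, device='cuda')

torch.cuda.synchronize()
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
Q, R = torch.qr(new_root)
end.record()
torch.cuda.synchronize()
print('torch.qr time (ms): %f' % start.elapsed_time(end))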

Code example

Run the code below with device = torch.device('cuda') and device = torch.device('cpu').

import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_model
from botorch.models.transforms import Standardize
from gpytorch.mlls import ExactMarginalLogLikelihood
from botorch.sampling.samplers import SobolQMCNormalSampler
from time import time

# set the device, 'cuda' or 'cpu'
device = torch.device('cuda')
print('Using device:', device)

# train data, obtained from Branin function projected to unit hypercube
train_X = torch.tensor([[0.5200, 0.0661],
                        [0.9702, 0.8459],
                        [0.0119, 0.3492],
                        [0.2245, 0.9323],
                        [0.7585, 0.4347],
                        [0.7473, 0.6529]],
                       device=device
                       )
train_Y = torch.tensor([[-2.4734],
                        [-102.1244],
                        [-139.2399],
                        [-34.9917],
                        [-48.7695],
                        [-94.5974]],
                       device=device
                       )
# initialize and fit the gp model
model = SingleTaskGP(train_X, train_Y, outcome_transform=Standardize(m=1))
mll = ExactMarginalLogLikelihood(model.likelihood, model)
fit_gpytorch_model(mll)

# QMC sampler drawing 64 fantasy samples
sampler = SobolQMCNormalSampler(64)

# 2000 candidate points (q=1, d=2) at which to generate fantasy models
X = torch.rand((2000, 1, 2), device=device)

# time the fantasize call: CUDA events on GPU, wall-clock time on CPU
if device == torch.device('cuda'):
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
else:
    start = time()

model.fantasize(X, sampler)

if device == torch.device('cuda'):
    end.record()
    torch.cuda.synchronize()
    print("time elapsed (ms): %f" % start.elapsed_time(end))
else:
    print("time elapsed (ms): %f" % (1000 * (time() - start)))

System Info

  • BoTorch Version 0.3.0
  • GPyTorch Version 1.1.1
  • PyTorch Version 1.5.1 with torchvision 0.6.1 and cudatoolkit 10.2.89
  • Python 3.8.3 on Anaconda
  • Computer OS: Ubuntu 20.04
@Balandat (Contributor)

Thanks for raising this; this is an upstream issue that we are aware of: cornellius-gp/gpytorch#1157

Really, it is a PyTorch issue with qr being slow here. We aim to find a workaround on the gpytorch end, since the PyTorch fix will likely take a while.

@saitcakmak (Contributor, Author)

Thanks for the quick response, Max!
It is a dirty fix, but replacing line 220 of gpytorch/models/exact_prediction_strategies.py with

        device = new_root.device
        Q, R = torch.qr(new_root.cpu())
        Q = Q.to(device)
        R = R.to(device)

fixes the issue. It reduced the runtime from ~10000 ms to ~33 ms.
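
For reference, the same workaround can be written so that it only round-trips through the CPU when the tensor actually lives on a CUDA device (just a sketch of the idea, not the actual gpytorch change):

        # fall back to CPU QR only for CUDA tensors
        if new_root.is_cuda:
            Q, R = torch.qr(new_root.cpu())
            Q, R = Q.to(new_root.device), R.to(new_root.device)
        else:
            Q, R = torch.qr(new_root)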

@jacobrgardner

@Balandat new_root should always be reasonably skinny in cases where we ought to be doing QR here. Maybe we should indeed do it on the CPU upstream for now? That would be a reasonably quick fix that would maintain the numerical stability of using QR instead of Woodbury.

@Balandat (Contributor)

Yeah, that makes sense to me. One thing I do want to do once #1102 goes in is to check whether L is a TriangularLazyTensor (e.g., always when using Cholesky); in that case we just need to do two successive triangular solves. We can use the CPU fix whenever L is not a TriangularLazyTensor.
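
Not the gpytorch change itself, just an illustration of the building block: when L is a lower-triangular (Cholesky) factor, a system L L^T x = b can be solved with two successive triangular solves instead of a QR or an explicit inverse. The tensor names below are made up for the example.

import torch

# stand-in SPD matrix playing the role of a kernel matrix
n = 6
A = torch.randn(n, n)
K = A @ A.t() + n * torch.eye(n)
L = torch.cholesky(K)  # lower-triangular factor (PyTorch 1.5 API)
b = torch.randn(n, 1)

# solve L y = b, then L^T x = y, using only triangular solves
y, _ = torch.triangular_solve(b, L, upper=False)
x, _ = torch.triangular_solve(y, L, upper=False, transpose=True)

print(torch.allclose(K @ x, b, atol=1e-4))  # expect True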

@Balandat (Contributor)

cornellius-gp/gpytorch#1224
