This notebook runs a simple test of whether fitting LCEGP is faster on CUDA or on CPU.

CUDA behaves unexpectedly here. With few alternatives it is slower than CPU.
As the number of alternatives grows it gets faster, both relative to CPU and in absolute run time (about 39 s at 10 alternatives versus about 12 s at 25 and 40 in the runs below).

In [7]:
import torch
from gpytorch import ExactMarginalLogLikelihood
from torch.distributions import MultivariateNormal
from contextual_rs.custom_fit import custom_fit_gpytorch_model
from contextual_rs.lce_gp import LCEGP


def main_run(
    num_alternatives: int,
    num_iterations: int,
    num_train: int,
    device: str,
    rho: float = 0.5,
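) -> None:
    """Repeatedly refit LCEGP on a growing synthetic data set on the given device."""
    ckwargs = {"device": device}
    K = num_alternatives
    # True means are evenly spaced on [0, 1]; the covariance is rho ** |i - j|.
    true_mean = torch.linspace(0, 1, K, **ckwargs)
    true_cov = torch.zeros(K, K, **ckwargs)
    for i in range(K):
        for j in range(K):
            true_cov[i, j] = torch.tensor(rho, **ckwargs).pow(abs(i - j))
    sampling_post = MultivariateNormal(true_mean, true_cov)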

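    # One joint draw over all alternatives for each training round and each iteration.
    all_Ys = sampling_post.rsample(
        torch.Size([num_train + num_iterations])
    ).detach()
    # Initial design: every alternative is observed num_train times.
    train_X = torch.tensor(
        range(num_alternatives), dtype=torch.float, **ckwargs
    ).repeat(num_train).view(-1, 1)
    train_Y = all_Ys[:num_train].view(-1, 1)

    # One randomly chosen alternative per iteration, with its sampled observation.
    random_X = torch.randint(0, num_alternatives, (num_iterations, 1), **ckwargs)
    random_Y = all_Ys[num_train:].gather(dim=-1, index=random_X)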

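    for i in range(num_iterations):
        # Build and fit a fresh model on all data collected so far.
        model = LCEGP(train_X, train_Y, [0])
        mll = ExactMarginalLogLikelihood(model.likelihood, model)
        custom_fit_gpytorch_model(mll)

        # Append the next observation; cast the integer index to train_X's dtype.
        train_X = torch.cat([train_X, random_X[i].view(-1, 1).to(train_X)], dim=0)
        train_Y = torch.cat([train_Y, random_Y[i].view(-1, 1)], dim=0)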
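
The double loop in main_run fills true_cov one entry at a time, which on cuda means K * K small tensor operations before any model fitting starts. A minimal sketch of a vectorized alternative is below. The helper name ar1_covariance is hypothetical and not part of contextual_rs; it is meant to produce the same rho ** |i - j| matrix, and we have not measured how much of the cuda run time the loop actually accounts for.

```python
import torch


def ar1_covariance(K: int, rho: float = 0.5, device: str = "cpu") -> torch.Tensor:
    """Return the K x K matrix with entries rho ** |i - j| (hypothetical helper)."""
    idx = torch.arange(K, device=device)
    # Pairwise |i - j| via broadcasting: (K, 1) - (1, K) -> (K, K).
    dist = (idx.view(-1, 1) - idx.view(1, -1)).abs()
    # Raise the scalar rho to each distance; cast to float so the result is floating point.
    return torch.pow(rho, dist.to(torch.get_default_dtype()))
```

Inside main_run this could stand in for the zeros-plus-loop construction, e.g. true_cov = ar1_covariance(K, rho, device).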


Timing results follow. Each cell runs one setting, first on CPU and then on CUDA. The arguments to main_run are num_alternatives, num_iterations, num_train, and device.
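
CUDA kernels are launched asynchronously, so a wall-clock timer can stop before queued GPU work has finished. Model fitting most likely forces synchronization on its own, so this may not change the numbers here, but a timing helper that synchronizes explicitly removes the doubt. The sketch below is an assumption about how one might time a single call; the helper name timed is hypothetical and it is not a replacement for the repeated runs that %timeit performs.

```python
import time
from typing import Callable

import torch


def timed(fn: Callable[[], None], device: str) -> float:
    """Return wall-clock seconds for one call to fn, waiting for queued GPU work."""
    use_cuda = device.startswith("cuda") and torch.cuda.is_available()
    if use_cuda:
        # Drain any GPU work queued before the timer starts.
        torch.cuda.synchronize()
    start = time.perf_counter()
    fn()
    if use_cuda:
        # Wait for kernels launched by fn to finish before stopping the timer.
        torch.cuda.synchronize()
    return time.perf_counter() - start
```

Usage would look like timed(lambda: main_run(5, 20, 3, "cuda"), "cuda").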

In [12]:
%timeit main_run(5, 20, 3, "cpu")
%timeit main_run(5, 20, 3, "cuda")

3.35 s ± 836 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
11.5 s ± 1.61 s per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [13]:
%timeit main_run(10, 50, 10, "cpu")
%timeit main_run(10, 50, 10, "cuda")


27.2 s ± 4.87 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
38.9 s ± 2.95 s per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [10]:
%timeit main_run(25, 50, 10, "cpu")
%timeit main_run(25, 50, 10, "cuda")

1min 30s ± 16.8 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
11.8 s ± 316 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [11]:
%timeit main_run(40, 50, 10, "cpu")
%timeit main_run(40, 50, 10, "cuda")


3min 19s ± 20.7 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
12.4 s ± 917 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [14]:
%timeit main_run(5, 100, 3, "cpu")
%timeit main_run(5, 100, 3, "cuda")

20.1 s ± 1.96 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
1min 3s ± 8.63 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
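
To compare the settings at a glance, the mean times reported above can be turned into CPU-to-CUDA ratios. The snippet below is a small sketch that only re-uses the %timeit means from this notebook (converted to seconds); the names results and speedups are ours. A ratio above 1 means cuda was faster.

```python
# Mean wall-clock seconds copied from the %timeit outputs above.
# Columns: num_alternatives, num_iterations, num_train, cpu_s, cuda_s
results = [
    (5, 20, 3, 3.35, 11.5),
    (10, 50, 10, 27.2, 38.9),
    (25, 50, 10, 90.0, 11.8),   # 1min 30s on cpu
    (40, 50, 10, 199.0, 12.4),  # 3min 19s on cpu
    (5, 100, 3, 20.1, 63.0),    # 1min 3s on cuda
]


def speedups(rows):
    """Return (num_alternatives, num_iterations, num_train, cpu_s / cuda_s) per row."""
    return [(k, n_iter, n_train, cpu / cuda) for k, n_iter, n_train, cpu, cuda in rows]


for k, n_iter, n_train, ratio in speedups(results):
    print(f"K={k:>2} iters={n_iter:>3} train={n_train:>2} cpu/cuda={ratio:.2f}")
```

The ratio only crosses 1 for the 25 and 40 alternative runs, which matches the observation at the top of the notebook.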
