This notebook explores `ModelListGP` to understand how it can be used to model each
arm independently.

In [1]:
from time import time

import torch
from botorch import fit_gpytorch_model
from botorch.models import ModelListGP, SingleTaskGP
from gpytorch import ExactMarginalLogLikelihood

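# Five independent single-output GPs, each with its own random training set
# (different numbers of points, all with 2-dimensional inputs).
mlgp = ModelListGP(
    SingleTaskGP(torch.rand(10, 2), torch.randn(10, 1)),
    SingleTaskGP(torch.rand(20, 2), torch.randn(20, 1)),
    SingleTaskGP(torch.rand(30, 2), torch.randn(30, 1)),
    SingleTaskGP(torch.rand(15, 2), torch.randn(15, 1)),
    SingleTaskGP(torch.rand(50, 2), torch.randn(50, 1)),
)
# Fit each sub-model separately by maximizing its own marginal log likelihood.
for model in mlgp.models:
    mll = ExactMarginalLogLikelihood(model.likelihood, model)
    fit_gpytorch_model(mll)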

In [2]:
test_X = torch.rand(4, 2)

posterior = mlgp.posterior(test_X)

In [3]:
posterior.mean

tensor([[-0.2621, -0.0446,  0.0125, -0.9178, -0.1116],
        [-0.5756, -0.0809, -0.1007,  1.3870,  0.0771],
        [-0.5253, -0.0190, -0.0712,  1.3117,  0.1402],
        [ 0.6169, -0.2835, -0.1900,  0.9450, -0.1993]],
       grad_fn=<CopyBackwards>)

In [4]:
posterior.mvn.covariance_matrix

tensor([[ 0.2991,  0.1191,  0.1332,  0.0010,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.1191,  0.2156,  0.1735,  0.0110,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.1332,  0.1735,  0.1948,  0.0027,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.0010,  0.0110,  0.0027,  0.1538,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  0.0000,  0.1597,  0.0477,  0.0582,  0.0031,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
        ... (output truncated)

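So, it essentially stitches together the outputs from multiple models. The mean has
one column per model (4 test points x 5 models), and the joint covariance is 20 x 20
and block-diagonal: each 4 x 4 block on the diagonal is one model's posterior
covariance over the 4 test points, and every entry between different models is zero.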
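
To make the layout concrete, here is a minimal sketch in plain `torch` (no BoTorch) of the same block-diagonal structure. The per-model covariances are random stand-ins, not real GP output, and `model_block` is a hypothetical helper. The indexing assumes the ordering seen in the output above (rows 0-3 for the first model, rows 4-7 for the second) continues for the remaining models.

```python
import torch

# Hypothetical stand-ins for five per-model posterior covariances over
# 4 test points each (random positive-definite matrices, not real GP output).
n_models, n_test = 5, 4
blocks = []
for _ in range(n_models):
    a = torch.randn(n_test, n_test)
    blocks.append(a @ a.T + 1e-3 * torch.eye(n_test))

# Stitch them into one joint covariance with zeros between models.
joint = torch.block_diag(*blocks)  # shape (20, 20)


def model_block(cov, k, n):
    """Return the n x n diagonal block that belongs to model k."""
    return cov[k * n:(k + 1) * n, k * n:(k + 1) * n]


# Every diagonal block matches the corresponding per-model covariance ...
recovered = [model_block(joint, k, n_test) for k in range(n_models)]
blocks_match = all(torch.equal(r, b) for r, b in zip(recovered, blocks))

# ... and subtracting the diagonal blocks leaves only zeros.
off_diag = joint - torch.block_diag(*recovered)
off_diag_is_zero = bool((off_diag == 0).all())

print(joint.shape, blocks_match, off_diag_is_zero)
```

If the real posterior follows the same layout, slicing `posterior.mvn.covariance_matrix` with the same indices should give each arm's own covariance.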

An important question here is "Does this stitching together come with a performance
penalty?"

Let's try to time this quickly.

In [5]:
start = time()

# Joint path: one posterior call on the ModelListGP covers all five models.
for _ in range(10):
    test_X = torch.rand(1000, 2)
    posterior = mlgp.posterior(test_X)
    samples = posterior.rsample(torch.Size([1000]))

print(f"MLGP sampling took {time()-start}")



MLGP sampling took 2.7713494300842285


In [6]:
start = time()

# Baseline: query and sample from each sub-model separately on the same points.
for _ in range(10):
    test_X = torch.rand(1000, 2)
    for model in mlgp.models:
        posterior = model.posterior(test_X)
        samples = posterior.rsample(torch.Size([1000]))

print(f"Individual sampling took {time()-start}")



Individual sampling took 2.908242702484131


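Everything looks OK: in this run, sampling through the `ModelListGP` took about 2.77 s
versus about 2.91 s for the individual models, so there doesn't seem to be any penalty
for using it directly.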
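
A single wall-clock measurement can be noisy, so a slightly more careful comparison would repeat each measurement a few times. The sketch below uses only the standard library. `time_repeated` and `dummy_workload` are hypothetical names introduced for illustration, and the dummy workload is a placeholder for one of the sampling loops above.

```python
from time import perf_counter


def time_repeated(fn, repeats=5):
    """Call fn() `repeats` times and return each run's wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = perf_counter()
        fn()
        durations.append(perf_counter() - start)
    return durations


def dummy_workload():
    # Placeholder; in the notebook this would be one of the sampling loops.
    return sum(i * i for i in range(100_000))


runs = time_repeated(dummy_workload, repeats=5)
print(f"min {min(runs):.4f}s, max {max(runs):.4f}s over {len(runs)} runs")
```

Comparing the minimum (or the spread) across repeats for both approaches gives a better sense of whether a gap of this size is real or just run-to-run variation.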

The next step is to make IKG work with MLGP. That work is being done in the `finite_ikg` code.