# BoTorch notes

These notes present several experiments with BoTorch, aimed at understanding how its components are used and how they behave.

## Preparation of toy data

In [3]:
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from gpytorch.mlls import ExactMarginalLogLikelihood
from botorch.acquisition import UpperConfidenceBound, qUpperConfidenceBound
from botorch.optim import optimize_acqf

from autobl.steering.measurement import *
from autobl.util import *

Define a measurement simulator that returns values of the function $-(x - 0.6)^2 + \epsilon$, where $\epsilon \sim \mathcal{N}(0, 0.001^2)$ is additive Gaussian noise. The negative sign places the maximum at $x = 0.6$.

In [4]:
def measurement_func(x: torch.Tensor):
    y = -(x - 0.6) ** 2
    y = y + 0.001 * torch.randn_like(y)
    return y

measurement = SimulatedMeasurement(f=measurement_func)

train_x = torch.rand(10, 1).double()
train_y = measurement.measure(train_x)

Define the GP model and fit its hyperparameters by maximizing the exact marginal log-likelihood.

In [5]:
gp = SingleTaskGP(train_x, train_y)
mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
fit_gpytorch_mll(mll)



ExactMarginalLogLikelihood(
  (likelihood): GaussianLikelihood(
    (noise_covar): HomoskedasticNoise(
      (noise_prior): GammaPrior()
      (raw_noise_constraint): GreaterThan(1.000E-04)
    )
  )
  (model): SingleTaskGP(
    (likelihood): GaussianLikelihood(
      (noise_covar): HomoskedasticNoise(
        (noise_prior): GammaPrior()
        (raw_noise_constraint): GreaterThan(1.000E-04)
      )
    )
    (mean_module): ConstantMean()
    (covar_module): ScaleKernel(
      (base_kernel): MaternKernel(
        (lengthscale_prior): GammaPrior()
        (raw_lengthscale_constraint): Positive()
      )
      (outputscale_prior): GammaPrior()
      (raw_outputscale_constraint): Positive()
    )
  )
)

## Monte-Carlo acquisition functions (q-acquisition functions)

BoTorch has both analytical acquisition functions and Monte-Carlo acquisition functions. 

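Analytical functions expect an input of shape `[b, 1, d]`, where `b` is the batch size (the "t-batch" size in the official documentation, called the "b-batch" size in these notes) and `d` is the input dimension. The `1` is the "q-batch" size, which must be 1 for analytical functions. In current BoTorch versions, a 2-D input is interpreted as `[q, d]`, so a 2-D input of shape `[1, d]` is also accepted and treated as a single q-batch containing one point.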

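The q-batch size can be greater than 1 only when a Monte-Carlo acquisition function is used. For the points in a q-batch, an MC function such as q-UCB draws joint posterior samples at all `q` points, takes the maximum over the `q` points within each sample, and averages the result over the samples. It returns one value per q-batch.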

In [6]:
UCB = UpperConfidenceBound(gp, beta=0.1)
x = torch.tensor([[[0.6]], [[0.4]], [[0.8]]])
print(x.shape)
print(UCB(x))

torch.Size([3, 1, 1])
tensor([ 0.0028, -0.0315, -0.0392], dtype=torch.float64,
       grad_fn=<AddBackward0>)
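
For reference, the analytical UCB evaluated above is the posterior mean plus a multiple of the posterior standard deviation. Writing $\mu(x)$ and $\sigma(x)$ for the GP posterior mean and standard deviation (symbols introduced here, not in the notebook), BoTorch's `UpperConfidenceBound` computes

$$
\mathrm{UCB}(x) = \mu(x) + \sqrt{\beta}\,\sigma(x),
$$

so `beta=0.1` weights the uncertainty term by $\sqrt{0.1} \approx 0.32$.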


Below is an example of evaluating the q-UCB acquisition function. The input `x` has a b-batch size of 3 and a q-batch size of 4. The acquisition function returns 3 values, one per b-batch element. Each value is the sample average of the maximum taken over the 4 points in that q-batch.

In [9]:
qUCB = qUpperConfidenceBound(gp, beta=0.1)
x = torch.tensor([
    [[0.6], [0.61], [0.59], [0.58]],
    [[0.4], [0.39], [0.41], [0.42]],
    [[0.8], [0.79], [0.81], [0.82]],
])
print(x.shape)
print(qUCB(x))

torch.Size([3, 4, 1])
tensor([ 0.0030, -0.0243, -0.0351], dtype=torch.float64,
       grad_fn=<MeanBackward1>)
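
The reduction over a q-batch can be sketched in plain Python. This is a minimal illustration, not BoTorch's implementation: it assumes independent Gaussian posteriors at the `q` points (a real GP posterior is correlated), the means and standard deviations are made up, and the helper name `qucb_mc` is hypothetical. The per-sample transform follows the form documented for `qUpperConfidenceBound`.

```python
import math
import random


def qucb_mc(means, stds, beta, num_samples=20000, seed=0):
    """Monte-Carlo estimate of q-UCB for a single q-batch.

    Assumption: the posterior at the q points is modeled as independent
    Gaussians, which ignores the correlations of a real GP posterior.
    """
    rng = random.Random(seed)
    scale = math.sqrt(beta * math.pi / 2.0)
    total = 0.0
    for _ in range(num_samples):
        # Draw one posterior sample per point and transform it.
        per_point = []
        for mu, sd in zip(means, stds):
            sample = rng.gauss(mu, sd)
            per_point.append(mu + scale * abs(sample - mu))
        # Reduce over the q-batch with a max.
        total += max(per_point)
    # Reduce over the samples with a mean.
    return total / num_samples


beta = 0.1
# Made-up posterior statistics, for illustration only.
single = qucb_mc([0.0], [0.05], beta)
batch = qucb_mc([0.0, -0.01, -0.01, -0.02], [0.05, 0.05, 0.05, 0.05], beta)
analytic = 0.0 + math.sqrt(beta) * 0.05  # analytical UCB at the first point

print(single, analytic, batch)
```

In this sketch, the single-point estimate lands close to the analytical UCB, while adding more points to the q-batch raises the value, because the maximum is taken inside the expectation.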


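When the q-batch size is 1, MC acquisition functions agree with their analytical counterparts up to Monte-Carlo sampling error. Here the q-UCB values match the analytical UCB output above to four decimal places.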

In [10]:
x = torch.tensor([[[0.6]], [[0.4]], [[0.8]]])
print(x.shape)
print(qUCB(x))

torch.Size([3, 1, 1])
tensor([ 0.0028, -0.0315, -0.0392], dtype=torch.float64,
       grad_fn=<MeanBackward1>)
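
A short derivation suggests why the two outputs coincide. Assume the q-UCB estimator has the form documented for BoTorch's `qUpperConfidenceBound`, with $\mu$ and $\sigma$ denoting the posterior mean and standard deviation at the single point and $\tilde{y} \sim \mathcal{N}(\mu, \sigma^2)$ a posterior sample (notation introduced here, not in the notebook):

$$
\mathbb{E}\left[\mu + \sqrt{\tfrac{\beta\pi}{2}}\,\lvert \tilde{y} - \mu \rvert\right]
= \mu + \sqrt{\tfrac{\beta\pi}{2}} \cdot \sigma\sqrt{\tfrac{2}{\pi}}
= \mu + \sqrt{\beta}\,\sigma .
$$

The middle step uses $\mathbb{E}\lvert Z \rvert = \sigma\sqrt{2/\pi}$ for $Z \sim \mathcal{N}(0, \sigma^2)$. The right-hand side is the analytical UCB, so for a q-batch size of 1 the MC estimate differs from it only by sampling error.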


### Getting multiple candidates through `optimize_acqf` using MC acquisition functions

When using an MC acquisition function, `optimize_acqf` returns `q` jointly optimized candidates per call, together with the acquisition value of that batch.

In [12]:
bounds = torch.stack([torch.zeros(1), torch.ones(1)])
candidate, acq_value = optimize_acqf(
    qUCB, bounds=bounds, q=3, num_restarts=5, raw_samples=20,
)
candidate

tensor([[0.5799],
        [0.8673],
        [0.6048]])
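
The roles of `raw_samples` and `num_restarts` can be illustrated with a simplified multi-start optimizer. This is a sketch under stated assumptions, not what `optimize_acqf` actually runs: BoTorch chooses starting points with its own initialization heuristic and refines them with a gradient-based optimizer over all `q` points jointly. The hypothetical `multi_start_maximize` below handles a single scalar variable, keeps the best raw samples as starting points, and refines them with projected gradient ascent using finite-difference gradients.

```python
import random


def multi_start_maximize(f, lower, upper, raw_samples=20, num_restarts=5,
                         steps=200, lr=0.05, seed=0):
    """Maximize a scalar function on [lower, upper] with several restarts."""
    rng = random.Random(seed)
    h = 1e-5  # finite-difference step

    # 1) Evaluate f at raw_samples random points; keep the best num_restarts.
    raw = [rng.uniform(lower, upper) for _ in range(raw_samples)]
    starts = sorted(raw, key=f, reverse=True)[:num_restarts]

    best_x, best_val = None, float("-inf")
    for x in starts:
        # 2) Refine each start with projected gradient ascent.
        for _ in range(steps):
            grad = (f(x + h) - f(x - h)) / (2 * h)
            x = min(upper, max(lower, x + lr * grad))
        # 3) Keep the best refined point across all restarts.
        val = f(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


# Noise-free version of the toy objective used in these notes.
x_best, val_best = multi_start_maximize(lambda x: -(x - 0.6) ** 2, 0.0, 1.0)
print(x_best, val_best)
```

On this noise-free toy objective, every restart converges to the same maximizer near `x = 0.6`. Restarts matter more when the acquisition surface has several local maxima.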