# Gradient Descent GP Selection

A Python notebook on Gaussian processes, based primarily on two pre-prints: *Computation-Aware Gaussian Processes* and *Approximation-Aware Bayesian Optimization*.

In [2]:
import math
import matplotlib.pyplot as plt
import numpy as np
import torch

try:
    import jaxtyping
except ImportError:
    %pip install jaxtyping

from typing import Optional
# Type hints are strictly optional, but personally I find that they make code easier to reason about

from jaxtyping import Float, Integer
# This package allows type annotations that include the size of torch Tensors/numpy arrays
# It's not necessary, but it helps with understanding what each function does

from torch import Tensor

%matplotlib inline

# Set DTYPE and DEVICE variables for torch tensors
DTYPE = torch.float32
DEVICE = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

# Set a seed (for reproducibility)
torch.manual_seed(2024)

<torch._C.Generator at 0x2353f7e3030>

Initially, we have some existing dataset $\mathcal{D}_{0} = \{(x_{i}, y_{i})\}_{i=1}^{n}$, with $x_{i} \in \mathbb{R}^{d}, y_{i} \in \mathbb{R}$. Equivalently, we let $\mathcal{D}_{0} = (\mathbf{X}, \mathbf{y})$, with $\mathbf{X} \in \mathbb{R}^{n \times d}, \mathbf{y} \in \mathbb{R}^{n}$. 

We want to use Gaussian process regression to perform Bayesian optimization, i.e. to find $x^{*} = \arg\max_{x \in \mathbb{R}^{d}} f(x)$ for an unknown objective function $f: \mathbb{R}^{d} \to \mathbb{R}$.

Unfortunately, the exact GP inference behind the standard Bayesian optimization formulation has $\mathcal{O}(n^{3})$ time complexity, because the "proper" mathematical formulation requires inverting (or factorizing) the $n \times n$ kernel matrix. To reduce the computational cost, we introduce an "action matrix" $\mathbf{S} \in \mathbb{R}^{n \times k}$ with $k \ll n$, which yields $\mathcal{O}(n^{2}k)$ time complexity.
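
As a rough illustration of where the $\mathcal{O}(n^{2}k)$ cost comes from, the sketch below computes a GP posterior mean through an action matrix. Everything in it is an assumption made for illustration rather than code taken from either pre-print: the RBF kernel, the noise level, the zero prior mean, the random choice of $\mathbf{S}$, and the helper names `rbf_kernel` and `projected_posterior_mean`. The only work that touches the full $n \times n$ matrix is the product $\hat{\mathbf{K}}\mathbf{S}$; the linear solve itself is only $k \times k$.

```python
import torch
from torch import Tensor


def rbf_kernel(X1: Tensor, X2: Tensor, lengthscale: float = 1.0) -> Tensor:
    """Squared-exponential kernel (an assumed choice; any PSD kernel would do)."""
    return torch.exp(-0.5 * torch.cdist(X1, X2).pow(2) / lengthscale**2)


def projected_posterior_mean(
    X: Tensor, y: Tensor, X_test: Tensor, S: Tensor, noise: float = 1e-2
) -> Tensor:
    """Zero-prior-mean GP posterior mean restricted to the column space of S.

    Computes k(X_test, X) S (S^T K_hat S)^{-1} S^T y with K_hat = K + noise * I.
    """
    n = X.shape[0]
    K_hat = rbf_kernel(X, X) + noise * torch.eye(n, dtype=X.dtype, device=X.device)
    KS = K_hat @ S                        # (n, k): the O(n^2 k) step
    gram = S.mT @ KS                      # (k, k): O(n k^2)
    chol = torch.linalg.cholesky(gram)    # O(k^3); assumes S has full column rank
    coeffs = torch.cholesky_solve((S.mT @ y).unsqueeze(-1), chol)  # (k, 1)
    weights = (S @ coeffs).squeeze(-1)    # (n,) approximate representer weights
    return rbf_kernel(X_test, X) @ weights


torch.manual_seed(0)
n, d, k = 200, 2, 10
X = torch.rand(n, d, dtype=torch.float64)
y = torch.sin(3.0 * X).sum(-1)              # toy targets, purely illustrative
X_test = torch.rand(5, d, dtype=torch.float64)
S = torch.randn(n, k, dtype=torch.float64)  # random actions, purely illustrative
approx_mean = projected_posterior_mean(X, y, X_test, S)
print(approx_mean.shape)
```

With $\mathbf{S}$ set to the $n \times n$ identity (so $k = n$), this expression reduces to the exact GP posterior mean, which makes for a convenient sanity check.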

In [None]:
def mu(X: Float[Tensor, "N D"]) -> Float[Tensor, "N"]:
    r"""
    Computes the (very lame) zero mean function mu(X) = 0
    """

    return torch.zeros(*X.shape[:-1], dtype=X.dtype, device=X.device)

    # This return statement might seem like it's a pedantic way just to return the number 0 :)
    # It's not:
    # - if we want to compute a batch of GPs, the batch size of the returned zero
    #   tensor will match the batch size of X
    # - if X is a float64 tensor rather than float32, the returned zero tensor will match the correct dtype
    # - if X is on the GPU rather than the CPU, the returned zero tensor will also be on the same device

    # You don't always have to be this pedantic, but it's not a bad habit to get into
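
To make the points in those comments concrete, here is a small standalone sketch. It uses a hypothetical stand-in named `zero_mean`, written with `Tensor.new_zeros` (which inherits dtype and device from its input), and prints the output shape and dtype for a single dataset and for a batch of datasets.

```python
import torch
from torch import Tensor


def zero_mean(X: Tensor) -> Tensor:
    """Zero mean function; new_zeros inherits dtype and device from X."""
    return X.new_zeros(X.shape[:-1])


single = torch.rand(5, 6)                            # one dataset: N=5, D=6
batched = torch.rand(3, 5, 6, dtype=torch.float64)   # batch of 3 datasets

# The trailing feature dimension is dropped; batch shape and dtype follow the input
print(zero_mean(single).shape, zero_mean(single).dtype)
print(zero_mean(batched).shape, zero_mean(batched).dtype)
```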

In [105]:
def hartmann_six(X: Float[Tensor, "N 6"]) -> Float[Tensor, "N"]:
    r"""
    Computes the value of the Hartmann six-dimensional test function on N rows of input data
    More info on this test function at: https://www.sfu.ca/~ssurjano/hart6.html
    """
    # Constants from the reference page above, created with X's dtype and device
    # so that float64 or GPU inputs work without a dtype/device mismatch
    alpha = torch.tensor([1.0, 1.2, 3.0, 3.2], dtype=X.dtype, device=X.device)
    A = torch.tensor([[10, 3, 17, 3.5, 1.7, 8],
                      [0.05, 10, 17, 0.1, 8, 14],
                      [3, 3.5, 1.7, 10, 17, 8],
                      [17, 8, 0.05, 10, 0.1, 14]],
                     dtype=X.dtype, device=X.device)
    P = 1e-4 * torch.tensor([[1312, 1696, 5569, 124, 8283, 5886],
                             [2329, 4135, 8307, 3736, 1004, 9991],
                             [2348, 1451, 3522, 2883, 3047, 6650],
                             [4047, 8828, 8732, 5743, 1091, 381]],
                            dtype=X.dtype, device=X.device)

    # Calculate "inner sums" over the 6 input dimensions: sum_j A_ij * (x_j - P_ij)^2
    inner_sums: Float[Tensor, "N 4"] = torch.sum(A * (X.unsqueeze(-2) - P).pow(2), -1)

    # Exponentiate and compute "outer sums": the alpha-weighted sum over the 4 terms
    outer_sums: Float[Tensor, "N"] = -alpha @ torch.exp(-inner_sums).mT

    return outer_sums

# Quick shape check: two input rows should give two outputs
# (the second row is the global minimizer listed on the page linked above)
test_vecs = torch.tensor([[0., 0., 0., 0., 0., 0.],
                          [0.20169, 0.150011, 0.476874, 0.275332, 0.311652, 0.657300]])
print(hartmann_six(test_vecs).size())

torch.Size([2])
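
The shape check above does not confirm the values themselves. As a hedged cross-check, here is a deliberately slow, loop-based version of the same formula, $f(x) = -\sum_{i=1}^{4} \alpha_{i} \exp\left(-\sum_{j=1}^{6} A_{ij}(x_{j} - P_{ij})^{2}\right)$, written in plain Python so that it is independent of the vectorized code. The name `hartmann_six_reference` is hypothetical, and the value to compare against (a global minimum of about $-3.32237$) is the one listed on the test-function page linked in the docstring.

```python
import math

ALPHA = [1.0, 1.2, 3.0, 3.2]
A = [[10, 3, 17, 3.5, 1.7, 8],
     [0.05, 10, 17, 0.1, 8, 14],
     [3, 3.5, 1.7, 10, 17, 8],
     [17, 8, 0.05, 10, 0.1, 14]]
# Stored as integers; scaled by 1e-4 when used, as in the vectorized version
P = [[1312, 1696, 5569, 124, 8283, 5886],
     [2329, 4135, 8307, 3736, 1004, 9991],
     [2348, 1451, 3522, 2883, 3047, 6650],
     [4047, 8828, 8732, 5743, 1091, 381]]


def hartmann_six_reference(x: list[float]) -> float:
    """Loop-based Hartmann-6 value for a single 6-dimensional point."""
    total = 0.0
    for i in range(4):
        inner = sum(A[i][j] * (x[j] - 1e-4 * P[i][j]) ** 2 for j in range(6))
        total -= ALPHA[i] * math.exp(-inner)
    return total


# Global minimizer as listed on the reference page
x_star = [0.20169, 0.150011, 0.476874, 0.275332, 0.311652, 0.6573]
print(round(hartmann_six_reference(x_star), 4))
print(round(hartmann_six_reference([0.0] * 6), 4))
```

Comparing these numbers against `hartmann_six(test_vecs)` (up to float32 rounding) is one way to confirm that the broadcasting in the vectorized version does what is intended.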


In [98]:
# r"""
# alpha <- c(1.0, 1.2, 3.0, 3.2)
#   A <- c(10, 3, 17, 3.5, 1.7, 8,
#          0.05, 10, 17, 0.1, 8, 14,
#          3, 3.5, 1.7, 10, 17, 8,
#          17, 8, 0.05, 10, 0.1, 14)
#   A <- matrix(A, 4, 6, byrow=TRUE)
#   P <- 10^(-4) * c(1312, 1696, 5569, 124, 8283, 5886,
#                    2329, 4135, 8307, 3736, 1004, 9991,
#                    2348, 1451, 3522, 2883, 3047, 6650,
#                    4047, 8828, 8732, 5743, 1091, 381)
#   P <- matrix(P, 4, 6, byrow=TRUE)
# """
# P: Float[Tensor, "4 6"] = 1e-4 * torch.tensor([[1312, 1696, 5569, 124, 8283, 5886],
#                                                    [2329, 4135, 8307, 3736, 1004, 9991],
#                                                    [2348, 1451, 3522, 2883, 3047, 6650],
#                                                    [4047, 8828, 8732, 5743, 1091, 381]])

# test1 = 1e-2 * torch.tensor([[1, 2, 3, 4, 5, 6], [1, 2, 4, 8, 16, 32], [20.169, 15.0011, 47.6874, 27.5332, 31.1652, 65.7300]])
# test2 = (torch.unsqueeze(test1, -2)).pow(2)
# test2.sum(-1)
# hartmann_six(test1)
