# Function 3 - Drug Discovery

### First Inspection

According to the function description provided, the black-box function outputs negative values that quantify the bad side effects of a drug. The dosage of each compound in the drug is represented by the features $x_1$, $x_2$ and $x_3$. We want to maximise the black-box function, which is equivalent to minimising the negative side effects of the drug. Let us visualise the known evaluations:

In [26]:
# Dependencies,
import numpy as np
import matplotlib.pyplot as plt
import plotly.graph_objects as go
%matplotlib notebook

# SKOPT imports,
from skopt import gp_minimize
from skopt.space import Real
from skopt.learning import GaussianProcessRegressor
from skopt.learning.gaussian_process.kernels import Matern, ConstantKernel, WhiteKernel
from skopt import Optimizer
from joblib import dump, load

# Loading known evaluations,
X, y = np.load("initial_inputs.npy"), np.load("initial_outputs.npy")

# Visualising,
fig = go.Figure(
    data=go.Scatter3d(
        x=X.T[0],
        y=X.T[1],
        z=X.T[2],
        mode="markers",
        marker=dict(
            size=4,
            color=y,
            colorscale="Hot",
            colorbar=dict(title="y")
        )
    )
)

fig.update_layout(
    scene=dict(
        xaxis_title="$x_1$",
        yaxis_title="$x_2$",
        zaxis_title="$x_3$"
    )
)

fig.show()

Despite the sparsity of data points, we can see that the black-box function is likely to have multiple local maxima. Similar to all the previous functions, its domain is $[0, 1]^3$.

### Optimiser Configuration

From the description of the black-box function, we know:

- $f(\mathbf{x})$ has many local minima.
- The black-box function is not noisy (the problem description does not mention noise).

However, we do NOT know,

- Whether the black-box function $f(\mathbf{x})$ is very smooth, moderately smooth or rough.

Since we do not know how smooth $f(\mathbf{x})$ is, we opt for the Matérn 5/2 kernel (with ARD) as a safe option. Selecting LCB as our acquisition function (we are minimising $-f(\mathbf{x})$) is appropriate since we want a notable level of exploration so as not to get stuck in shallow local minima.

$$
\text{LCB}(\mathbf{x}) = \mu(\mathbf{x}) - \kappa \sigma(\mathbf{x})
$$
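As a minimal sketch of how this acquisition ranks candidates when minimising, the snippet below subtracts $\kappa$ posterior standard deviations from the posterior mean (the convention scikit-optimize uses for LCB) and picks the smallest value. The `mu` and `sigma` arrays are hypothetical numbers chosen for illustration, not values from our surrogate, and $\kappa = 2$ is used as an example setting. The last line computes the Gaussian probability mass within $\pm 2\sigma$.

```python
import math
import numpy as np


def lower_confidence_bound(mu, sigma, kappa=2.0):
    """LCB for a minimisation problem: smaller values are more promising."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return mu - kappa * sigma


# Hypothetical posterior means and standard deviations at three candidates,
mu = np.array([0.10, 0.30, 0.20])
sigma = np.array([0.05, 0.25, 0.01])

lcb = lower_confidence_bound(mu, sigma, kappa=2.0)

# Index of the candidate the acquisition function would select,
best = int(np.argmin(lcb))

# Probability mass of a Gaussian within +/- 2 standard deviations,
coverage = math.erf(2.0 / math.sqrt(2.0))

print(f"LCB values: {lcb}, selected candidate: {best}, coverage: {coverage:.4f}")
```

Note that the second candidate wins despite having the worst mean, because its large uncertainty dominates: this is the exploratory behaviour a larger $\kappa$ encourages.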

We use $\kappa = 2$, which corresponds to a confidence interval of roughly 95%, for a balance between exploration and exploitation [1]. Given that we have decided not to account for noise, the elements of our kernel matrix $\mathbf{K}$ are given by

$$
K(\mathbf{x}_i, \mathbf{x}_j)
=
\sigma^2
\frac{2^{1-\nu}}{\Gamma(\nu)}
\left(
\sqrt{2\nu}\, r_{ij}
\right)^{\nu}
K_{\nu}
\left(
\sqrt{2\nu}\, r_{ij}
\right),
$$

$$
r_{ij}
=
\sqrt{
\frac{(x_{i1} - x_{j1})^2}{\ell_1^2}
+
\frac{(x_{i2} - x_{j2})^2}{\ell_2^2}
+
\frac{(x_{i3} - x_{j3})^2}{\ell_3^2}
},
$$

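where we have set the initial lengthscales heuristically as $\ell_1 = \ell_2 = \ell_3 = 0.1$ and used $\nu = 5/2$ as well as $\sigma = 1$. These are starting values: the kernel hyperparameters can be adjusted within their bounds when the Gaussian process is fitted.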
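For $\nu = 5/2$ the Bessel-function expression reduces to the closed form $K = \sigma^2 \left(1 + \sqrt{5}\, r + \tfrac{5}{3} r^2\right) e^{-\sqrt{5}\, r}$. The following is a minimal NumPy sketch of that closed form with one lengthscale per dimension. The function name `matern52_ard` and the demo points `X_demo` are illustrative assumptions, not part of the notebook's pipeline, which relies on the `Matern` kernel class instead.

```python
import numpy as np


def matern52_ard(X1, X2, lengthscales, sigma=1.0):
    """Matern 5/2 kernel matrix with a separate lengthscale per dimension (ARD)."""
    lengthscales = np.asarray(lengthscales, dtype=float)

    # Rescale each dimension by its own lengthscale,
    A = np.atleast_2d(X1) / lengthscales
    B = np.atleast_2d(X2) / lengthscales

    # Pairwise scaled Euclidean distances r_ij,
    diff = A[:, None, :] - B[None, :, :]
    r = np.sqrt(np.sum(diff ** 2, axis=-1))

    # Closed form for nu = 5/2,
    s = np.sqrt(5.0) * r
    return sigma ** 2 * (1.0 + s + s ** 2 / 3.0) * np.exp(-s)


# Hypothetical points in the unit cube,
X_demo = np.array([
    [0.2, 0.5, 0.5],
    [0.3, 0.5, 0.5],
    [0.9, 0.1, 0.4],
])

K = matern52_ard(X_demo, X_demo, lengthscales=(0.1, 0.1, 0.1), sigma=1.0)
print(np.round(K, 4))
```

With lengthscales of 0.1, two points that differ by 0.1 in a single coordinate sit at $r = 1$ and remain moderately correlated, while points far apart in the cube are almost uncorrelated. This is why short lengthscales leave much of the domain with high posterior uncertainty.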


### Optimiser Initialisation

In [27]:
"""INITIALISING THE OPTIMISATION MODEL."""

# Inputting the given evaluations provided by the problem,
X_supplied = X.tolist()
y_supplied = y.tolist()

"""OPTIMISER SETTINGS."""

# We define the domain of the black-box function (or the range of the parameter values we want to consider),
space = [Real(0, 1, name="x1"),
         Real(0, 1, name="x2"),
         Real(0, 1, name="x3")
         ]

# Creating the kernel for the GPR,
kernel = ConstantKernel(1.0) * Matern(
    length_scale=(0.1, 0.1, 0.1),
    length_scale_bounds=(1e-2, 1.0),
    nu = 5/2)

# GPR settings,
gpr = GaussianProcessRegressor(
    kernel=kernel,
    normalize_y=True,
    n_restarts_optimizer=10
)

# Creating optimiser,
opt = Optimizer(
    dimensions=space,
    base_estimator=gpr,
    acq_func="LCB",
    acq_func_kwargs={"kappa": 2.0},
    random_state=0
)

"""CREATING INITIAL OPTIMISER STATE."""

# Supplying given points to optimiser,
opt.tell(X_supplied, (-np.array(y_supplied)).tolist()) # <-- We flip the values since we are trying to maximise the black-box function.

# Asking for the next point to evaluate the black-box function,
point_query = opt.ask()

# Saving optimiser state (zero-th iteration),
dump(opt, "bayes_opt_state_iter0.joblib")

# Printing point query,
print(f"Point Query: {point_query}")

Point Query: [1.0, 0.0, 0.7577049265429194]


### Next Query 

In [28]:
current_query = 1

# Input the new evaluation,
X_new = [[1.000000, 0.000000, 0.757704]]
y_new = [-0.18261050571644452]

# Loading the previous state of the optimiser,
opt = load(f"bayes_opt_state_iter{current_query - 1}.joblib")

# Supplying the new query to the optimiser,
opt.tell(X_new, (-np.array(y_new)).tolist()) # <-- We flip the values since we are trying to maximise the black-box function.

# Asking for the next point to evaluate the black-box function,
point_query = opt.ask()

# Saving optimiser state,
dump(opt, f"bayes_opt_state_iter{current_query}.joblib")

# Printing point query,
print(f"Point Query {current_query + 1}: {point_query}")

Point Query 2: [1.0, 1.0, 0.4161200816250763]


### Visualisation

In [1]:
# Dependencies,
import numpy as np
import matplotlib.pyplot as plt
import plotly.graph_objects as go
%matplotlib notebook

# SKOPT imports,
from skopt import gp_minimize
from skopt.space import Real
from skopt.learning import GaussianProcessRegressor
from skopt.learning.gaussian_process.kernels import Matern, ConstantKernel, WhiteKernel
from skopt import Optimizer
from joblib import dump, load

# Loading known evaluations,
X, y = np.load("initial_inputs.npy"), np.load("initial_outputs.npy")

# New evaluations,
X_new = [[1.000000, 0.000000, 0.757704]]
y_new = [-0.18261050571644452]

# Concatenating,
X = np.concatenate((X, X_new), axis=0)
y = np.concatenate((y, y_new), axis=0)

# Visualising,
fig = go.Figure(
    data=go.Scatter3d(
        x=X.T[0],
        y=X.T[1],
        z=X.T[2],
        mode="markers",
        marker=dict(
            size=4,
            color=y,
            colorscale="Hot",
            colorbar=dict(title="y")
        )
    )
)

fig.update_layout(
    scene=dict(
        xaxis_title="$x_1$",
        yaxis_title="$x_2$",
        zaxis_title="$x_3$"
    )
)

fig.add_trace(
    go.Scatter3d(
        x=[x[0] for x in X_new],
        y=[x[1] for x in X_new],
        z=[x[2] for x in X_new],
        mode="markers+text",
        text=[str(i+1) for i in range(len(X_new))],
        textposition="top center",
        marker=dict(
            size=6,
            color=y_new
        ),
        showlegend=False
    )
)

fig.show()

### Updates

Week 1: The optimiser sampled a point on the edge of the feature space. When supplied with this evaluation result, it again selected an edge point to sample. Because the known points are sparse, the optimiser appears to heavily prioritise exploration. If the third suggested sample point is also near the edge of the feature space, we may consider employing a decaying $\kappa$ parameter.
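One possible form for a decaying $\kappa$ is an exponential schedule that starts at the current value of 2 and settles towards a smaller floor. This is only a sketch: the function `decayed_kappa`, the floor `kappa_min` and the rate `decay_rate` are hypothetical choices, not settings that have been tuned or adopted here.

```python
import numpy as np


def decayed_kappa(iteration, kappa_start=2.0, kappa_min=0.5, decay_rate=0.3):
    """Exponentially decay kappa from kappa_start towards kappa_min."""
    return kappa_min + (kappa_start - kappa_min) * np.exp(-decay_rate * iteration)


# Kappa for the first five weekly queries (iteration 0 is the current setting),
schedule = [float(decayed_kappa(t)) for t in range(5)]

for week, kappa in enumerate(schedule):
    print(f"Query {week}: kappa = {kappa:.3f}")
```

To use such a schedule, the value for the current query would be passed to the optimiser through its `kappa` acquisition argument before asking for the next point. How the exploration weight of an already created `Optimizer` is updated should be checked against the installed scikit-optimize version.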

Week 2:

### References

[1] https://pyro.ai/examples/bo.html#:~:text=A%20large%20value%20of%20%CE%BA%20means%20that,We%20will%20use%20%CE%BA%20=%202%20.