# Conditional features in Bayesian optimization

When optimizing chemical processes, some inputs often depend on others.
For example, the `catalyst_concentration` input feature is only relevant when another
feature satisfies `use_catalyst==True`. While it may seem that `use_catalyst==False` is
equivalent to setting `catalyst_concentration==0`, this approach has some limitations:
- If a catalyst is used, there may be some minimum amount required. It is difficult to model
  the resulting disjoint bounds (zero, or anything above the minimum) with a single continuous feature.
- Even a tiny amount of catalyst may enable a side reaction that completely changes the
  reaction. The response then has a step change at 0 and smoother behaviour everywhere else
  in the domain, which Gaussian process surrogates cannot model well.
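
The two limitations above can be illustrated with a toy sketch. Everything here is hypothetical and chosen only for illustration: the function names `is_valid_concentration` and `toy_yield`, the minimum amount of 0.1 mol/L, and the yield values.

```python
def is_valid_concentration(
    conc: float, minimum: float = 0.1, maximum: float = 1.0
) -> bool:
    """Valid values are 0 (no catalyst) or anything in [minimum, maximum].

    The gap between 0 and `minimum` cannot be described by a single
    pair of lower/upper bounds.
    """
    return conc == 0.0 or minimum <= conc <= maximum


def toy_yield(conc: float) -> float:
    """Hypothetical yield with a step change at zero concentration."""
    if conc == 0.0:
        return 0.2  # uncatalysed baseline (assumed value)
    # Catalysed pathway: smooth in conc, but offset from the baseline.
    return 0.6 + 0.3 * conc


print(is_valid_concentration(0.05))        # inside the gap
print(toy_yield(0.0), toy_yield(1e-9))     # values on either side of the step
```

In this sketch an arbitrarily small positive concentration already moves the yield far from the baseline, which is the kind of discontinuity a smooth surrogate struggles with.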

For some examples of literature on these problems, see [Swersky2014Arc] and [Horn2019Wedge].

[Swersky2014Arc] Swersky et al. 2014, "Raiders of the Lost Architecture: Kernels for Bayesian Optimization in Conditional Parameter Spaces", arXiv

[Horn2019Wedge] Horn et al. 2019, "Surrogates for hierarchical search spaces: the wedge-kernel and an automated analysis", GECCO

We consider a test problem as described above, where we wish to optimize the yield of
a reaction by controlling the temperature and catalyst concentration.

In [None]:
from bofire.data_models.constraints.api import SelectionCondition
from bofire.data_models.domain.api import Domain
from bofire.data_models.features.api import (
    ConditionalContinuousInput,
    ContinuousInput,
    ContinuousOutput,
    DiscreteInput,
)


domain = Domain.from_lists(
    inputs=[
        ContinuousInput(key="temperature", unit="°C", bounds=(50, 100)),
        DiscreteInput(key="use_catalyst", values=[0, 1]),
        ConditionalContinuousInput(
            key="catalyst_concentration",
            unit="mol/L",
            bounds=(0.0, 1.0),
            indicator_feature="use_catalyst",
            indicator_condition=SelectionCondition(selection=[1]),
        ),
    ],
    outputs=[ContinuousOutput(key="yield")],
)

After defining the domain, we can then build the wedge kernel for our GP surrogate.

In [None]:
from bofire.data_models.kernels.api import RBFKernel, WedgeKernel


conditions = [
    (feat.key, feat.indicator_feature, feat.indicator_condition)
    for feat in domain.inputs.get(includes=ConditionalContinuousInput)
    if isinstance(feat, ConditionalContinuousInput)  # included for type hint
]

wedge_kernel_data_model = WedgeKernel(base_kernel=RBFKernel(), conditions=conditions)

In [None]:
import torch

import bofire.kernels.api as kernels


def features_to_idx_mapper(feats: list[str]) -> list[int]:
    return domain.inputs.get_feature_indices({}, feats)


wedge_kernel = kernels.map(
    wedge_kernel_data_model,
    batch_shape=torch.Size([]),
    active_dims=list(range(3)),
    features_to_idx_mapper=features_to_idx_mapper,
)

In [None]:
wedge_kernel

WedgeKernel(
  (raw_lengthscale_constraint): Positive()
  (raw_angle_constraint): Interval(1.000E-04, 9.999E-01)
  (raw_radius_constraint): Positive()
  (base_kernel): RBFKernel(
    (raw_lengthscale_constraint): Positive()
  )
)
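
For intuition about how a kernel can respect a conditional feature, here is a minimal sketch of an embedding-based approach. It is not BoFire's `WedgeKernel` implementation: the names `embed`, `rbf` and `conditional_kernel`, the choice of segment endpoints, and the unit lengthscale are all assumptions made for illustration, and temperature is left out to keep the example small.

```python
import math


def embed(conc: float, use_catalyst: int) -> tuple[float, float]:
    """Map (concentration, indicator) to a point in 2D.

    Inactive: every concentration collapses onto the single point (0, 0).
    Active: the concentration in [0, 1] moves along a line segment from
    (1, 0) to (0, 1), which does not pass through (0, 0).
    """
    if use_catalyst == 0:
        return (0.0, 0.0)
    return (1.0 - conc, conc)


def rbf(a: tuple[float, float], b: tuple[float, float], lengthscale: float = 1.0) -> float:
    """Standard RBF kernel on the embedded points."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-0.5 * sq_dist / lengthscale**2)


def conditional_kernel(x: tuple[float, int], y: tuple[float, int]) -> float:
    """Apply the base kernel after embedding both inputs."""
    return rbf(embed(*x), embed(*y))


# (catalyst_concentration, use_catalyst)
x1, x2 = (0.0, 0), (0.8, 0)  # no catalyst, different concentrations
x3, x4 = (0.0, 1), (0.8, 1)  # catalyst used, different concentrations

# Concentration is ignored when the catalyst is not used ...
assert conditional_kernel(x1, x2) == conditional_kernel(x1, x1)
# ... but matters when it is used.
assert conditional_kernel(x3, x4) < conditional_kernel(x3, x3)
# For mixed pairs, only the active point's concentration matters.
assert conditional_kernel(x1, x3) == conditional_kernel(x2, x3)
assert conditional_kernel(x1, x4) == conditional_kernel(x2, x4)
```

Because inactive points all share one embedded location, the base kernel never sees their concentration values, while active points remain distinguishable along the segment.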

Now that we have defined our kernel, we can evaluate it at a few design points.
Note that we are not yet training the kernel (because it doesn't interact with
`SingleTaskGP` yet), so it uses its default hyperparameter values.

Inspecting the kernel matrix, with `x1` to `x4` denoting the rows of `X`, shows some
desirable properties:
- `K(x1, x2) == K(x1, x1)`. That is, changing the catalyst concentration has no effect
  on the kernel when not using a catalyst.
- `K(x3, x4) < K(x3, x3)`. Changing the catalyst concentration *does* have an effect
  when using a catalyst.
- `K(x1, x3) == K(x2, x3) != K(x2, x4) == K(x1, x4)`. When comparing two experiments where
  one uses a catalyst and the other does not, the kernel only takes into account the
  concentration for the experiment with a catalyst.

In [None]:
# check the order of dimensions in X
assert features_to_idx_mapper(
    ["catalyst_concentration", "temperature", "use_catalyst"]
) == [0, 1, 2]
X = torch.tensor(
    [
        [0.0, 0.0, 0.0],
        [0.8, 0.0, 0.0],
        [0.0, 0.0, 1.0],
        [0.8, 0.0, 1.0],
    ]
)

wedge_kernel(X, X).to_dense()

tensor([[1.0000, 1.0000, 0.6063, 0.7161],
        [1.0000, 1.0000, 0.6063, 0.7161],
        [0.6063, 0.6063, 1.0000, 0.5137],
        [0.7161, 0.7161, 0.5137, 1.0000]], grad_fn=<ExpBackward0>)
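
The listed properties can be checked directly against the printed matrix. The sketch below copies the rounded values from the output above into a plain nested list, so it runs without `torch`; the dictionary `checks` and its labels are just illustrative names.

```python
# Kernel matrix as printed above (rounded to 4 decimals).
# Rows and columns 0..3 correspond to x1..x4.
K = [
    [1.0000, 1.0000, 0.6063, 0.7161],
    [1.0000, 1.0000, 0.6063, 0.7161],
    [0.6063, 0.6063, 1.0000, 0.5137],
    [0.7161, 0.7161, 0.5137, 1.0000],
]

checks = {
    # K(x1, x2) == K(x1, x1)
    "inactive concentration is ignored": K[0][1] == K[0][0],
    # K(x3, x4) < K(x3, x3)
    "active concentration matters": K[2][3] < K[2][2],
    # K(x1, x3) == K(x2, x3) != K(x2, x4) == K(x1, x4)
    "mixed pairs depend only on the active point": (
        K[0][2] == K[1][2] and K[1][2] != K[1][3] and K[1][3] == K[0][3]
    ),
    # A valid kernel matrix is symmetric.
    "matrix is symmetric": all(
        K[i][j] == K[j][i] for i in range(4) for j in range(4)
    ),
}

for label, passed in checks.items():
    print(f"{label}: {passed}")
```

When working with the live tensor rather than rounded printed values, a tolerance-based comparison such as `torch.allclose` is likely a safer choice than exact equality.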