# Classification Surrogate Tests

We want to test whether a surrogate model can correctly identify unknown constraints from binary feasible/infeasible labels. This requires new models that produce a `CategoricalOutput` rather than a continuous output. Mathematically, instead of multiplying the objective by $\sigma(x)\in(0,1)$, we multiply by the indicator $I(x)$, which is 1 if $x\in X$ (the feasible set) and 0 otherwise. Since BoTorch does not currently support discrete feasibility constraints (see [botorch/utils/objective.py](https://github.com/pytorch/botorch/blob/main/botorch/utils/objective.py#L122)), we instead always multiply the objective directly by the feasibility value.

In our toy example, the feasible points satisfy $x_0 + x_1 \le 1.0$, where $x_0$ and $x_1$ are the first two inputs.
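As a rough illustration of the difference between the two weightings, the sketch below compares a smooth weight $\sigma(x)$ with the hard indicator $I(x)$ on the toy constraint for the first two inputs. It uses only the standard library. The `soft_feasibility` function and its `temperature` value are assumptions made for illustration and are not part of BoFire or BoTorch.

```python
import math


def objective(x):
    """Toy objective used in this notebook: cosine of the sum of all inputs."""
    return math.cos(sum(x))


def indicator(x):
    """Hard feasibility I(x): 1.0 if the first two inputs sum to at most 1.0, else 0.0."""
    return 1.0 if x[0] + x[1] <= 1.0 else 0.0


def soft_feasibility(x, temperature=0.05):
    """Smooth stand-in for sigma(x) in (0, 1); the temperature is an assumed value."""
    return 1.0 / (1.0 + math.exp((x[0] + x[1] - 1.0) / temperature))


def weighted(x, weight_fn):
    """Objective multiplied by the chosen feasibility weight."""
    return objective(x) * weight_fn(x)


feasible_point = [0.2, 0.3, 0.0, 0.0, 0.0]
infeasible_point = [0.9, 0.8, 0.0, 0.0, 0.0]

for name, point in [("feasible", feasible_point), ("infeasible", infeasible_point)]:
    print(name, weighted(point, indicator), weighted(point, soft_feasibility))
```

With the indicator, an infeasible point contributes exactly zero, whereas the smooth weight only shrinks its contribution towards zero.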

In [1]:
# Import packages
import bofire.strategies.api as strategies
from bofire.data_models.api import Domain, Outputs, Inputs
from bofire.data_models.features.api import ContinuousInput, ContinuousOutput, CategoricalOutput, CategoricalInput
from bofire.data_models.objectives.api import MinimizeObjective
import numpy as np



## Manual setup of the optimization domain

The following cell shows how to set up the optimization problem manually in BoFire, for didactic purposes. As an example, we design a feasible set and an output that encodes the feasibility constraint.

In [2]:
# Set up the inputs and outputs; the categorical input is included just as an example
input_features = Inputs(
    features=[ContinuousInput(key=f"x_{i}", bounds=(0, 1)) for i in range(5)]
    + [CategoricalInput(key="x_5", categories=(0.5, 0.0))]
)

# The minimize objective is used here; to maximize, use the maximize objective instead.
output_features = Outputs(features=[
        ContinuousOutput(key="f_0", objective=MinimizeObjective(w=1.)),
        # This output is used to learn feasibility/infeasibility
        CategoricalOutput(key="f_1", categories=["infeasible", "feasible"], objective=[0, 1]),
    ]
)

# Create domain
domain1 = Domain(inputs=input_features, outputs=output_features)

# Sample random points
sample_df = domain1.inputs.sample(20).astype(float) # Sample x's

# Compute one continuous output and one categorical output from the sampled inputs
# Here, feasible points are those whose first two components sum to at most 1.0; in real experiments, this rule would not be known
sample_df["f_0"] = np.cos(sample_df.values.sum(1))
sample_df["f_1"] = "infeasible"
sample_df.loc[sample_df["x_0"]+sample_df["x_1"] <= 1.0, "f_1"] = "feasible"

sample_df.head(5)

|   | x_0 | x_1 | x_2 | x_3 | x_4 | x_5 | f_0 | f_1 |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.934601 | 0.924862 | 0.223239 | 0.658141 | 0.394967 | 0.0 | -0.999983 | infeasible |
| 1 | 0.705769 | 0.14025 | 0.989253 | 0.156419 | 0.347286 | 0.0 | -0.694829 | feasible |
| 2 | 0.528549 | 0.967869 | 0.653419 | 0.40121 | 0.822478 | 0.0 | -0.973224 | infeasible |
| 3 | 0.539549 | 0.005963 | 0.673214 | 0.911884 | 0.672387 | 0.0 | -0.943222 | feasible |
| 4 | 0.40403 | 0.046633 | 0.628572 | 0.763645 | 0.952251 | 0.5 | -0.988236 | feasible |


In [3]:
sample_df.round()

Unnamed: 0,x_0,x_1,x_2,x_3,x_4,x_5,f_0,f_1
0,1.0,1.0,0.0,1.0,0.0,0.0,-1.0,infeasible
1,1.0,0.0,1.0,0.0,0.0,0.0,-1.0,feasible
2,1.0,1.0,1.0,0.0,1.0,0.0,-1.0,infeasible
3,1.0,0.0,1.0,1.0,1.0,0.0,-1.0,feasible
4,0.0,0.0,1.0,1.0,1.0,0.0,-1.0,feasible
5,0.0,0.0,1.0,1.0,0.0,0.0,-1.0,feasible
6,0.0,1.0,1.0,0.0,1.0,0.0,-1.0,feasible
7,0.0,1.0,0.0,0.0,0.0,0.0,-0.0,infeasible
8,1.0,1.0,1.0,1.0,0.0,0.0,-1.0,infeasible
9,1.0,1.0,1.0,0.0,1.0,0.0,-1.0,infeasible


## Setup of the strategy and asking for candidates



In [4]:
from bofire.data_models.acquisition_functions.api import qNEI, qUCB, qSR, qEI
from bofire.data_models.strategies.api import QparegoStrategy, MultiplicativeSoboStrategy, SoboStrategy
from bofire.data_models.surrogates.api import BotorchSurrogates, MLPEnsemble
from bofire.data_models.domain.api import Outputs

strategy_data = SoboStrategy(domain=domain1, 
                             acquisition_function=qEI(), 
                             surrogate_specs=BotorchSurrogates(surrogates=
                                    [
                                        MLPEnsemble(inputs=domain1.inputs, outputs=Outputs(features=[domain1.outputs[1]]), lr=1.0, n_epochs=100, hidden_layer_sizes=(20,))
                                    ]
                                )
                            )

strategy = strategies.map(strategy_data)

strategy.tell(sample_df)
candidates = strategy.ask(1)

candidates



CategoricalMethodEnum.EXHAUSTIVE




ValueError: values present are not in ('infeasible', 'feasible')

In [None]:
import torch
import pandas as pd
# strategy.surrogates.surrogates[0].model.posterior(torch.tensor(domain1.inputs.sample(20).astype(float).values))
t = torch.tensor(domain1.inputs.transform(domain1.inputs.sample(1), strategy.surrogates.surrogates[0].input_preprocessing_specs).values)
strategy.surrogates.surrogates[0].model.posterior(t).mean
# domain1.outputs[1](pd.Series([1.0]))
# domain1.inputs.transform(domain1.inputs.sample(20), domain1.inputs.Config)
# strategy.surrogates.surrogates[0].input_preprocessing_specs



tensor([[-19.4762]], dtype=torch.float64, grad_fn=<MeanBackward1>)

# Add Classification Models for Surrogates

This update extends the surrogates to allow classification of output values (e.g. 'feasible' or 'infeasible').

### Housekeeping changes

1. Update the categorical inputs/outputs (`bofire/data_models/features/categorical.py`) so that the `categories` attribute is always returned as a tuple instead of a list (to prevent mutation)
    - The associated tests are changed in `tests/bofire/data_models/specs/features.py`
2. 

### Classification Models

Initially, we are only interested in checking whether or not certain points are feasible or infeasible, hence this is a binary classification problem. 
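
A minimal sketch of how the output of such a binary classifier could be reduced to a label and a feasibility weight, assuming the model returns one probability per category. The helper names `predicted_label` and `expected_feasibility` are hypothetical and this is not BoFire's implementation. The category order and the `[0, 1]` objective values mirror the `CategoricalOutput` defined earlier in this notebook.

```python
# Category order and objective values as in the CategoricalOutput "f_1" above
categories = ("infeasible", "feasible")
objective_values = (0.0, 1.0)


def predicted_label(probs):
    """Return the category with the highest predicted probability (hypothetical helper)."""
    best_index = max(range(len(probs)), key=lambda i: probs[i])
    return categories[best_index]


def expected_feasibility(probs):
    """Probability-weighted objective value, i.e. the expected feasibility in [0, 1]."""
    return sum(p * v for p, v in zip(probs, objective_values))


example_probs = (0.25, 0.75)  # assumed classifier output for one candidate
print(predicted_label(example_probs), expected_feasibility(example_probs))
```

Thresholding via `predicted_label` corresponds to the hard indicator, while `expected_feasibility` gives a soft weight that could multiply the objective instead.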


### Questions

1. Should we force `allowed` to be a tuple for the categorical inputs/outputs? If so, we need to refactor the indexing of pandas DataFrames.
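
To make the indexing concern in question 1 concrete, the sketch below uses a small hypothetical DataFrame. The column names and the `allowed_cols` variable are illustrative and not taken from BoFire. With a flat column index, pandas interprets a tuple passed to `df[...]` as a single column label, so selecting several columns with a tuple raises a `KeyError` unless the tuple is first converted to a list.

```python
import pandas as pd

# Hypothetical frame; the column names are illustrative only
df = pd.DataFrame({"x_5_a": [1, 0], "x_5_b": [0, 1], "f_0": [0.3, 0.7]})

# A tuple, as an immutable attribute would return it
allowed_cols = ("x_5_a", "x_5_b")

# Converting to a list selects both columns
selected = df[list(allowed_cols)]

# Passing the tuple directly is treated as a lookup of one (missing) column label
try:
    df[allowed_cols]
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

print(list(selected.columns), tuple_lookup_failed)
```

One option under this assumption is to keep the attributes as tuples and convert with `list(...)` at each DataFrame indexing site.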