# Classification Surrogate Tests

We want to test whether a surrogate model can correctly identify unknown constraints from binary feasible/infeasible labels. This requires new models that produce `CategoricalOutput`s rather than continuous outputs. Mathematically, instead of multiplying the objective by $\sigma(x)\in(0,1)$, we multiply by the indicator $I(x)$, which is 1 if $x$ lies in the feasible set $X$ and 0 otherwise. Because BoTorch does not currently support discrete feasibility constraints (see [`botorch/utils/objective.py`](https://github.com/pytorch/botorch/blob/main/botorch/utils/objective.py#L122)), we instead always multiply the objective directly by the feasibility value.

In our toy example, the feasible points satisfy $x_0 + x_1 \le 1.0$, where `x_0` and `x_1` are the first two inputs.
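
To make the difference between the two weightings concrete, here is a minimal sketch in plain Python. The function names, the `steepness` value, and the objective value of 2.0 are illustrative assumptions and not part of BoFire or BoTorch; the constraint is the toy rule on the first two inputs, rewritten as $g(x) \le 0$.

```python
import math

def constraint_slack(first, second):
    """Toy constraint written as g(x) <= 0, i.e. first + second - 1 <= 0."""
    return first + second - 1.0

def sigmoid_weight(g, steepness=10.0):
    """Smooth feasibility weight in (0, 1); close to 1 when g is well below 0."""
    return 1.0 / (1.0 + math.exp(steepness * g))

def indicator_weight(g):
    """Hard feasibility weight: 1 if the constraint holds, otherwise 0."""
    return 1.0 if g <= 0.0 else 0.0

objective_value = 2.0  # hypothetical objective value, identical at every point
for first, second in [(0.2, 0.3), (0.5, 0.5), (0.7, 0.6)]:
    g = constraint_slack(first, second)
    smooth = objective_value * sigmoid_weight(g)
    hard = objective_value * indicator_weight(g)
    print(f"g={g:+.2f}  sigmoid-weighted={smooth:.3f}  indicator-weighted={hard:.3f}")
```

The sigmoid weight changes gradually across the boundary, while the indicator switches the objective off entirely as soon as the constraint is violated.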

In [1]:
# Import packages
import bofire.strategies.api as strategies
from bofire.data_models.api import Domain, Outputs, Inputs
from bofire.data_models.features.api import ContinuousInput, ContinuousOutput, CategoricalOutput, CategoricalInput
from bofire.data_models.objectives.api import MinimizeObjective, MinimizeSigmoidObjective
import numpy as np



## Manual setup of the optimization domain

The following cell shows how to set up the optimization problem in BoFire manually, for didactic purposes. As an example, we design a feasible set and an output that encodes the constraint.

In [2]:
# Set up the inputs and outputs; the categorical input is included only as an example
input_features = Inputs(features=[ContinuousInput(key=f"x_{i}", bounds=(0, 1)) for i in range(5)] + [CategoricalInput(key="x_5", categories=(0.5, 0.0))])

# The minimize objective is used here; to maximize, use MaximizeObjective instead.
output_features = Outputs(features=[
        ContinuousOutput(key=f"f_{0}", objective=MinimizeObjective(w=1.)),
        # ContinuousOutput(key=f"f_{2}", objective=MinimizeSigmoidObjective(w=1., tp=0.0, steepness=0.5)),
        CategoricalOutput(key=f"f_{1}", categories=["infeasible", "feasible"], objective=[0, 1]) # This output is used to learn feasibility/infeasibility
    ]
)

# Create domain
domain1 = Domain(inputs=input_features, outputs=output_features)

# Sample random points
sample_df = domain1.inputs.sample(20).astype(float) # Sample x's

# Write a function which outputs one continuous variable and another discrete based on some logic
# Here, feasible points are those whose first two components sum to at most 1.0; in real experiments, this rule would not be known
sample_df["f_0"] = np.cos(sample_df.values.sum(1))
sample_df["f_1"] = "infeasible"
sample_df.loc[sample_df["x_0"]+sample_df["x_1"] <= 1.0, "f_1"] = "feasible"
# sample_df["f_2"] = np.random.uniform(size=(len(sample_df),))

sample_df.head(5)

|   | x_0 | x_1 | x_2 | x_3 | x_4 | x_5 | f_0 | f_1 |
|---|-----|-----|-----|-----|-----|-----|-----|-----|
| 0 | 0.502015 | 0.556355 | 0.184407 | 0.135966 | 0.19129 | 0.0 | 0.000763 | infeasible |
| 1 | 0.400085 | 0.351817 | 0.848917 | 0.924594 | 0.657947 | 0.5 | -0.856798 | feasible |
| 2 | 0.694559 | 0.481801 | 0.239983 | 0.528414 | 0.642616 | 0.0 | -0.850312 | infeasible |
| 3 | 0.678207 | 0.840033 | 0.298988 | 0.925851 | 0.740847 | 0.5 | -0.665724 | infeasible |
| 4 | 0.006785 | 0.787913 | 0.778813 | 0.861638 | 0.290225 | 0.5 | -0.996492 | feasible |
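
As a quick sanity check, the two output columns can be recomputed from the inputs shown above using only the standard library. This is a sketch: `toy_outputs` is a hypothetical helper that restates the logic of the cell (cosine of the sum of all six inputs, and the sum rule on the first two), and the tolerance allows for the six-decimal rounding of the displayed values.

```python
import math

# Rows copied from the sample output above: six inputs, then f_0 and f_1.
rows = [
    ((0.502015, 0.556355, 0.184407, 0.135966, 0.19129, 0.0), 0.000763, "infeasible"),
    ((0.400085, 0.351817, 0.848917, 0.924594, 0.657947, 0.5), -0.856798, "feasible"),
    ((0.694559, 0.481801, 0.239983, 0.528414, 0.642616, 0.0), -0.850312, "infeasible"),
    ((0.678207, 0.840033, 0.298988, 0.925851, 0.740847, 0.5), -0.665724, "infeasible"),
    ((0.006785, 0.787913, 0.778813, 0.861638, 0.290225, 0.5), -0.996492, "feasible"),
]

def toy_outputs(x):
    """Recompute f_0 (cosine of the input sum) and f_1 (feasibility label)."""
    f_0 = math.cos(sum(x))
    f_1 = "feasible" if x[0] + x[1] <= 1.0 else "infeasible"
    return f_0, f_1

for x, shown_f_0, shown_f_1 in rows:
    f_0, f_1 = toy_outputs(x)
    # Displayed values are rounded, so compare with a small tolerance.
    assert abs(f_0 - shown_f_0) < 1e-4
    assert f_1 == shown_f_1
print("all displayed rows are consistent with the toy rules")
```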


## Set up the Strategy and ask for Candidates



In [3]:
from bofire.data_models.acquisition_functions.api import qNEI, qUCB, qSR, qEI
from bofire.data_models.strategies.api import QparegoStrategy, MultiplicativeSoboStrategy, SoboStrategy
from bofire.data_models.surrogates.api import BotorchSurrogates, MLPClassifierEnsemble, MixedSingleTaskGPSurrogate
from bofire.data_models.domain.api import Outputs

strategy_data = SoboStrategy(domain=domain1, 
                             acquisition_function=qEI(), 
                             surrogate_specs=BotorchSurrogates(surrogates=
                                    [
                                        MLPClassifierEnsemble(inputs=domain1.inputs, outputs=Outputs(features=[domain1.outputs[-1]]), lr=1.0, n_epochs=100, hidden_layer_sizes=(20,)),
                                        # MixedSingleTaskGPSurrogate(inputs=domain1.inputs, outputs=Outputs(features=[domain1.outputs[1]]))
                                    ]
                                )
                            )

strategy = strategies.map(strategy_data)

strategy.tell(sample_df)

In [4]:
candidates = strategy.ask(2)

candidates



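|   | x_0 | x_1 | x_2 | x_3 | x_4 | x_5 | f_0_pred | f_1_pred | f_0_sd | f_1_sd | f_0_des | f_1_des |
|---|-----|-----|-----|-----|-----|-----|----------|----------|--------|--------|---------|---------|
| 0 | 0.723341 | 1.0 | 0.703635 | 0.426167 | 0.328014 | 0.5 | -1.267539 | feasible | 0.042117 | 0.547723 | 1.267539 | 1.0 |
| 1 | 0.0 | 1.0 | 0.312203 | 0.450019 | 0.388926 | 0.5 | -1.250402 | feasible | 0.042963 | 0.547723 | 1.250402 | 1.0 |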
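
Because the true feasibility rule is known in this toy problem, the predicted labels can be compared against it. The sketch below only restates the candidate values displayed above; `true_label` is a hypothetical helper, and the comparison says nothing about the classifier beyond these two points.

```python
# Candidate inputs and predicted labels copied from the output above.
candidates = [
    {"x_0": 0.723341, "x_1": 1.0, "f_1_pred": "feasible"},
    {"x_0": 0.0, "x_1": 1.0, "f_1_pred": "feasible"},
]

def true_label(x_0, x_1):
    """Ground-truth rule of the toy problem (unknown in a real experiment)."""
    return "feasible" if x_0 + x_1 <= 1.0 else "infeasible"

results = []
for index, row in enumerate(candidates):
    actual = true_label(row["x_0"], row["x_1"])
    results.append((index, row["f_1_pred"], actual, actual == row["f_1_pred"]))
    print(f"candidate {index}: predicted={row['f_1_pred']}, actual={actual}")
```

With the displayed values, candidate 0 has $x_0 + x_1 \approx 1.72$ and is therefore infeasible under the true rule although it is predicted feasible, while candidate 1 sits exactly on the boundary and is feasible.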


# Add Classification Models for Surrogates

This change updates the surrogates to allow classification of output values (i.e. 'feasible' or 'infeasible').

### Housekeeping changes

1. Update the categorical inputs/outputs ('bofire/data_models/features/categorical.py') so that the `categories` attribute is always a tuple instead of a list (to prevent mutation)
    - The associated tests are changed in 'tests/bofire/data_models/specs/features.py'
2. 

### Classification Models

Initially, we are only interested in whether points are feasible or infeasible, so this is a binary classification problem.
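
To illustrate what binary feasibility classification means on the toy rule, here is a minimal sketch using logistic regression trained by plain gradient descent. This is a deliberately simple stand-in, not the `MLPClassifierEnsemble` used above; the data, the learning rate, and the iteration count are all assumptions chosen for illustration.

```python
import math
import random

random.seed(0)

# Toy data: two inputs in [0, 1]; label 1 ("feasible") when their sum is at most 1.
points = [(random.random(), random.random()) for _ in range(200)]
labels = [1.0 if a + b <= 1.0 else 0.0 for a, b in points]

def predict_proba(weights, point):
    """Probability of the 'feasible' class under a logistic model."""
    z = weights[0] * point[0] + weights[1] * point[1] + weights[2]
    return 1.0 / (1.0 + math.exp(-z))

weights = [0.0, 0.0, 0.0]  # two slopes and a bias
learning_rate = 1.0
for _ in range(2000):
    # Full-batch gradient of the mean cross-entropy loss.
    grad = [0.0, 0.0, 0.0]
    for point, label in zip(points, labels):
        error = predict_proba(weights, point) - label
        grad[0] += error * point[0]
        grad[1] += error * point[1]
        grad[2] += error
    weights = [w - learning_rate * g / len(points) for w, g in zip(weights, grad)]

predicted = [1.0 if predict_proba(weights, p) >= 0.5 else 0.0 for p in points]
accuracy = sum(p == y for p, y in zip(predicted, labels)) / len(labels)
print(f"training accuracy: {accuracy:.2f}")
```

The predicted class probability plays the role of the feasibility value; thresholding it at 0.5 gives the hard feasible/infeasible label.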


### Questions

1. Should we force `allowed` to be a tuple for the categorical inputs/outputs? If so, we would need to refactor the indexing of pandas DataFrames.
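
The trade-off behind this question can be sketched in plain Python. The values below are illustrative and do not use BoFire's classes: a tuple blocks in-place mutation, but it does not compare equal to a list, and code that expects a list-like (pandas, for instance, treats a tuple passed to `df[...]` as a single label rather than a list of labels) may need an explicit conversion at the boundary.

```python
categories = ("infeasible", "feasible")
allowed = (True, False)  # hypothetical mask, one flag per category

# In-place mutation is rejected for tuples.
try:
    allowed[0] = False
    mutation_blocked = False
except TypeError:
    mutation_blocked = True

# A tuple never compares equal to a list, even with identical elements.
same_as_list = allowed == [True, False]

# Convert explicitly where a list is required, e.g. before indexing.
allowed_categories = [c for c, ok in zip(categories, allowed) if ok]

print(mutation_blocked, same_as_list, allowed_categories)
```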