# Classification Surrogate Tests

We want to test whether a classification surrogate can correctly identify unknown constraints that are defined by categorical criteria. This covers scenarios where specialists look at a set of experiments and label the outcomes as 'acceptable', 'unacceptable', 'ideal', etc.

This involves new models that produce `CategoricalOutput`s rather than continuous outputs. Mathematically, let $g_{\theta}:\mathbb{R}^d\to[0,1]^c$ be a function with learnable parameters $\theta$ that outputs a probability vector over $c$ potential classes (i.e. for input $x\in\mathbb{R}^d$, $g_{\theta}(x)^\top\mathbf{1}=1$, where $\mathbf{1}$ is the vector of all 1's), and let $a\in[0,1]^c$ hold the acceptability criteria for the corresponding classes. We can then compute the scalar $g_{\theta}(x)^\top a\in[0,1]$ and pass it in as the objective value of a constrained function.
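
As a minimal sketch of the scalarization $g_{\theta}(x)^\top a$ described above, the snippet below uses NumPy only. The helper name `expected_desirability` and the probability values are hypothetical and chosen for illustration; they are not part of BoFire.

```python
import numpy as np


def expected_desirability(probs, desirability):
    """Return g(x)^T a for each row of class probabilities (hypothetical helper)."""
    probs = np.asarray(probs, dtype=float)
    desirability = np.asarray(desirability, dtype=float)
    # Each row must be a probability vector over the c classes.
    if not np.allclose(probs.sum(axis=-1), 1.0):
        raise ValueError("each row of probs must sum to 1")
    return probs @ desirability


# Hypothetical class probabilities for two inputs; the columns follow the
# order ("unacceptable", "acceptable", "ideal").
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
])
a = np.array([0.0, 0.5, 1.0])  # acceptability of each class

scores = expected_desirability(probs, a)
print(scores)
```

Because every entry of $a$ lies in $[0,1]$ and each row of probabilities sums to one, the resulting scores also lie in $[0,1]$.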

In this script, we look at a modified and constrained version of the optimization problem associated with the [Levy function](https://www.sfu.ca/~ssurjano/levy.html), which has a global minimum at $x^*=\mathbf{1}$. We define three constraint classes, 'acceptable', 'unacceptable', and 'ideal', based on how close we are to the optimal decision variable. This value is unknown in a real-world setting, but it serves as a reasonable example here.
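
For reference, here is a sketch of the unmodified Levy function as defined on the linked page, with $w_i = 1 + (x_i - 1)/4$. The function name `levy` is our own, and this is the textbook form rather than the modified version used later in this notebook.

```python
import numpy as np


def levy(x):
    """Standard Levy function for a 1-D input vector x (textbook form)."""
    x = np.asarray(x, dtype=float)
    w = 1.0 + (x - 1.0) / 4.0
    # Leading term uses only the first scaled coordinate.
    first = np.sin(np.pi * w[0]) ** 2
    # Sum over all coordinates except the last one.
    middle = np.sum(
        (w[:-1] - 1.0) ** 2 * (1.0 + 10.0 * np.sin(np.pi * w[:-1] + 1.0) ** 2)
    )
    # Final term uses only the last scaled coordinate.
    last = (w[-1] - 1.0) ** 2 * (1.0 + np.sin(2.0 * np.pi * w[-1]) ** 2)
    return float(first + middle + last)


# The global minimum sits at x* = (1, ..., 1), where the value is zero
# up to floating-point error.
print(levy(np.ones(6)))
print(levy(np.zeros(6)))
```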

In [1]:
# Import packages
import bofire.strategies.api as strategies
from bofire.data_models.api import Domain, Outputs, Inputs
from bofire.data_models.features.api import ContinuousInput, ContinuousOutput, CategoricalOutput, CategoricalInput
from bofire.data_models.objectives.api import MinimizeObjective, MinimizeSigmoidObjective, CategoricalObjective
import numpy as np
import pandas as pd



## Manual setup of the optimization domain

The following cells show how to manually set up the optimization problem in BoFire for didactic purposes.

In [2]:
# Write a function which scales the inputs according to the Levy function - i.e. computes $w_i$
def scale_inputs(x: pd.Series) -> pd.Series:
    return 1 + (x - 1) / 4

In [3]:
# Set-up the inputs and outputs, use categorical domain just as an example
input_features = Inputs(features=[ContinuousInput(key=f"x_{i}", bounds=(-2, 2)) for i in range(5)] + [CategoricalInput(key="x_5", categories=(0.0, 1.0))])

# The minimize objective is used here; to maximize, use the maximize objective instead.
output_features = Outputs(features=[
        ContinuousOutput(key=f"f_{0}", objective=MinimizeObjective(w=1.)),
        CategoricalOutput(key=f"f_{1}", categories=("unacceptable", "acceptable", "ideal"), objective=CategoricalObjective(desirability=(0, 0.5, 1))), # This function will be associated with learning the categories
        ContinuousOutput(key=f"f_{2}", objective=MinimizeSigmoidObjective(w=1., tp=0.0, steepness=0.5)),
    ]
)

# Create domain
domain1 = Domain(inputs=input_features, outputs=output_features)

# Sample random points
sample_df = domain1.inputs.sample(50).astype(float) # Sample x's

# Write a function which outputs one continuous variable and another discrete based on some logic
# f_0 is a Levy-like function of the scaled inputs. Note: `ind < len(sample_df.columns)` holds for every
# column, so the `else` branch (the Levy final-dimension term) is never evaluated here.
sample_df["f_0"] = np.sin(np.pi * scale_inputs(sample_df["x_0"])) ** 2 + sum([(scale_inputs(sample_df[col]) - 1) ** 2 * (1 + 10 * np.sin(np.pi * scale_inputs(sample_df[col]) + 1) ** 2 if ind < len(sample_df.columns) else 1 + np.sin(2 * np.pi * scale_inputs(sample_df[col])) ** 2) for ind, col in enumerate(sample_df.columns)])
sample_df["f_1"] = "unacceptable"
sample_df.loc[sample_df[input_features.get_keys()].sum(1) >= 1.0, "f_1"] = "acceptable"
sample_df.loc[sample_df[input_features.get_keys()].sum(1) >= 2.0, "f_1"] = "ideal"
sample_df["f_2"] = sample_df["x_0"] + 1e-2 * np.random.uniform(size=(len(sample_df),))

sample_df.head(20)

| | x_0 | x_1 | x_2 | x_3 | x_4 | x_5 | f_0 | f_1 | f_2 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -1.607257 | 0.066965 | 1.956921 | -1.206965 | -1.726151 | 0.0 | 11.168473 | unacceptable | -1.599521 |
| 1 | 0.576852 | -0.826301 | 0.188427 | 0.228371 | -0.548498 | 0.0 | 1.232865 | unacceptable | 0.583328 |
| 2 | 0.426233 | -0.353617 | -1.501839 | -1.336984 | 1.611887 | 1.0 | 5.894178 | unacceptable | 0.432236 |
| 3 | 0.791177 | 0.517637 | 1.05484 | 1.702119 | -0.133859 | 1.0 | 0.538519 | ideal | 0.794129 |
| 4 | -0.212858 | 1.738505 | 0.047741 | 0.785163 | -0.386461 | 1.0 | 1.373196 | ideal | -0.204777 |
| 5 | -1.34381 | 0.174728 | -0.742157 | 0.828275 | -0.383426 | 0.0 | 3.939393 | unacceptable | -1.341086 |
| 6 | 0.427134 | -1.622157 | 1.390718 | -0.325319 | -0.767477 | 0.0 | 4.739526 | unacceptable | 0.431781 |
| 7 | 0.314917 | 1.36796 | 0.926581 | 1.710552 | -0.843306 | 0.0 | 1.48758 | ideal | 0.316096 |
| 8 | -1.891133 | -1.125407 | 1.468634 | -0.906353 | 0.456227 | 1.0 | 8.206412 | unacceptable | -1.888861 |
| 9 | -0.281061 | -0.167345 | 0.589614 | 1.681104 | 1.800321 | 0.0 | 1.807627 | ideal | -0.272137 |
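
The thresholding that produces `f_1` (row sum of all inputs at least 1.0 for 'acceptable', at least 2.0 for 'ideal') can be sketched as a standalone helper, which also makes it easy to check the class balance before training a classifier. The helper name `label_by_sum` is hypothetical; the first two rows are taken from the sample output above, and the third row is invented to show the 'acceptable' case.

```python
import numpy as np


def label_by_sum(x, acceptable_at=1.0, ideal_at=2.0):
    """Label each row by the sum of its inputs (hypothetical helper)."""
    totals = np.asarray(x, dtype=float).sum(axis=1)
    # Check the stricter threshold first so 'ideal' overrides 'acceptable'.
    return np.where(
        totals >= ideal_at,
        "ideal",
        np.where(totals >= acceptable_at, "acceptable", "unacceptable"),
    )


rows = np.array([
    [0.576852, -0.826301, 0.188427, 0.228371, -0.548498, 0.0],  # sample row 1
    [0.791177, 0.517637, 1.05484, 1.702119, -0.133859, 1.0],    # sample row 3
    [0.5, 0.5, 0.0, 0.0, 0.5, 0.0],                             # invented row
])

labels = label_by_sum(rows)
print(labels)

# Class counts; a heavily imbalanced sample may make the classes hard to learn.
classes, counts = np.unique(labels, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))
```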


## Set up the strategy and ask for candidates



In [4]:
from bofire.data_models.acquisition_functions.api import qEI
from bofire.data_models.strategies.api import SoboStrategy
from bofire.data_models.surrogates.api import BotorchSurrogates, MLPClassifierEnsemble, MixedSingleTaskGPSurrogate
from bofire.data_models.domain.api import Outputs

strategy_data = SoboStrategy(domain=domain1, 
                             acquisition_function=qEI(), 
                             surrogate_specs=BotorchSurrogates(surrogates=
                                    [
                                        MLPClassifierEnsemble(inputs=domain1.inputs, outputs=Outputs(features=[domain1.outputs.get_by_key("f_1")]), lr=1.0, n_epochs=50, hidden_layer_sizes=(20,)),
                                        MixedSingleTaskGPSurrogate(inputs=domain1.inputs, outputs=Outputs(features=[domain1.outputs.get_by_key("f_2")]))
                                    ]
                                )
                            )

strategy = strategies.map(strategy_data)

strategy.tell(sample_df)

In [5]:
candidates = strategy.ask(2)
candidates



| | x_0 | x_1 | x_2 | x_3 | x_4 | x_5 | f_0_pred | f_2_pred | f_1_pred | f_1_pred_unacceptable | f_1_pred_acceptable | f_1_pred_ideal | f_0_sd | f_2_sd | f_1_sd_unacceptable | f_1_sd_acceptable | f_1_sd_ideal | f_0_des | f_2_des | f_1_des |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.287768 | -0.159834 | 1.117793 | -0.432811 | 0.891773 | 1.0 | -1.219115 | 0.292788 | unacceptable | 0.458888 | 0.11366 | 0.427452 | 0.473508 | 0.003124 | 0.101463 | 0.037833 | 0.098314 | 1.219115 | 0.463467 | 0.484282 |
| 1 | -1.768118 | 0.077809 | 2.0 | -1.189829 | -2.0 | 0.0 | 12.989883 | -1.764363 | unacceptable | 0.536629 | 0.119498 | 0.343873 | 0.177809 | 0.003669 | 0.214828 | 0.036302 | 0.215846 | -12.989883 | 0.707274 | 0.403622 |
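
As a sanity check on how to read the candidate table, the sketch below recomputes the `_des` columns from the predictions shown above. Two things here are assumptions rather than documented behaviour: that `f_1_pred` is the class with the highest predicted probability, and that the sigmoid desirability takes the form $1 - 1/(1 + e^{-s(y - tp)})$ with $s$ the steepness. Both are consistent with the two rows shown, which is all this check establishes.

```python
import numpy as np

categories = np.array(["unacceptable", "acceptable", "ideal"])
desirability = np.array([0.0, 0.5, 1.0])  # from CategoricalObjective above

# Class probabilities copied from the two candidate rows.
probs = np.array([
    [0.458888, 0.11366, 0.427452],
    [0.536629, 0.119498, 0.343873],
])

# f_1_des as the probability-weighted desirability, i.e. g(x)^T a.
f_1_des = probs @ desirability
# Assumed: the predicted label is the most probable class.
f_1_pred = categories[probs.argmax(axis=1)]

# f_2_des under the assumed decreasing-sigmoid form with tp=0.0, steepness=0.5.
f_2_pred = np.array([0.292788, -1.764363])
steepness, tp = 0.5, 0.0
f_2_des = 1.0 - 1.0 / (1.0 + np.exp(-steepness * (f_2_pred - tp)))

# f_0_des for the minimize objective with w=1 is the negated prediction.
f_0_pred = np.array([-1.219115, 12.989883])
f_0_des = -f_0_pred

print(f_1_pred, f_1_des, f_2_des, f_0_des)
```

Note that both candidates are labelled 'unacceptable' even though the 'ideal' probability is close behind in the first row, so the scalar `f_1_des` carries more information than the label alone.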
