# Classification Surrogate Tests

We want to test whether a classification surrogate can correctly identify unknown constraints that are defined by categorical criteria. This covers scenarios where specialists look at a set of experiments and label each outcome as 'acceptable', 'unacceptable', 'ideal', etc.

This requires models that produce `CategoricalOutput`s rather than continuous outputs. Mathematically, let $g_{\theta}:\mathbb{R}^d\to[0,1]^c$ be a function with learnable parameters $\theta$ that outputs a probability vector over $c$ potential classes (i.e. for any input $x\in\mathbb{R}^d$, $g_{\theta}(x)^\top\mathbf{1}=1$, where $\mathbf{1}$ is the vector of all 1's), and let $a\in\{0,1\}^c$ encode the acceptability criteria for the corresponding classes. The scalar $g_{\theta}(x)^\top a\in[0,1]$ is then the expected value of acceptance, i.e. the predicted probability that $x$ falls into an acceptable class, and it can be passed to the optimizer as a constraint function.
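As a minimal sketch of the scalar $g_{\theta}(x)^\top a$, the snippet below computes the expected acceptance for a single input in plain Python. The class scores are hypothetical values chosen for illustration, and `softmax` and `expected_acceptance` are helper names introduced here, not BoFire functions.

```python
import math


def softmax(logits):
    """Turn raw class scores into a probability vector that sums to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_acceptance(probs, acceptable):
    """Compute g(x)^T a: the probability mass on the acceptable classes."""
    return sum(p * a for p, a in zip(probs, acceptable))


# Hypothetical class scores for one input x, ordered as
# ["unacceptable", "acceptable", "ideal"]
probs = softmax([0.0, 1.0, 0.5])

# Acceptability vector a: only "unacceptable" is rejected
a = [0, 1, 1]

acceptance = expected_acceptance(probs, a)
print(f"class probabilities: {probs}")
print(f"expected acceptance: {acceptance:.4f}")
```

Because the probabilities sum to one, the expected acceptance here equals one minus the probability of the 'unacceptable' class.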

In this script, we look at a modified and constrained version of the optimization problem associated with the [Levy function](https://www.sfu.ca/~ssurjano/levy.html), which has a global minimum at $x^*=\mathbf{1}$. We define three classes, 'acceptable', 'unacceptable', and 'ideal', based on how close a point is to the optimal decision variable. This value is of course unknown in a real-world setting, but it serves as a reasonable example.
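For reference, here is a sketch of the standard (unmodified) Levy function in plain Python, using the usual rescaling $w_i = 1 + (x_i - 1)/4$. It is meant only to illustrate the benchmark and its minimum at $x^*=\mathbf{1}$; the script itself uses a modified variant, and the function name `levy` is introduced here for illustration.

```python
import math


def levy(x):
    """Standard d-dimensional Levy function (not the modified variant used in this script)."""
    w = [1 + (xi - 1) / 4 for xi in x]
    first = math.sin(math.pi * w[0]) ** 2
    # Middle terms cover the first d-1 coordinates
    middle = sum(
        (wi - 1) ** 2 * (1 + 10 * math.sin(math.pi * wi + 1) ** 2) for wi in w[:-1]
    )
    # The final term involves only the last coordinate
    last = (w[-1] - 1) ** 2 * (1 + math.sin(2 * math.pi * w[-1]) ** 2)
    return first + middle + last


value_at_optimum = levy([1.0] * 5)  # numerically zero at x* = (1, ..., 1)
value_elsewhere = levy([0.0] * 5)   # strictly positive away from the optimum
print(value_at_optimum, value_elsewhere)
```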

The script first shows an example of training only the classification surrogate on the generated data.

In [1]:
# Import packages
import bofire.strategies.api as strategies
from bofire.data_models.api import Domain, Outputs, Inputs
from bofire.data_models.features.api import ContinuousInput, ContinuousOutput, CategoricalOutput, CategoricalInput
from bofire.data_models.objectives.api import MinimizeObjective, MinimizeSigmoidObjective, ConstrainedCategoricalObjective
import numpy as np
import pandas as pd



## Manual setup of the optimization domain

The following cells show how to manually set up the optimization problem in BoFire for didactic purposes.

In [2]:
# Write a function which scales the inputs according to the Levy function - i.e. computes $w_i$
def scale_inputs(x: pd.Series) -> pd.Series:
    return 1 + (x - 1) / 4

In [3]:
# Set up the inputs and outputs; the categorical input is included just as an example
input_features = Inputs(features=[ContinuousInput(key=f"x_{i}", bounds=(-2, 2)) for i in range(5)] + [CategoricalInput(key="x_5", categories=["0", "1"], allowed=[True, True])])

# The minimize objective is used here; to maximize instead, use the maximize objective.
output_features = Outputs(features=[
        ContinuousOutput(key="f_0", objective=MinimizeObjective(w=1.)),
        # f_1 is the categorical output that the classification surrogate learns
        CategoricalOutput(key="f_1", categories=["unacceptable", "acceptable", "ideal"], objective=ConstrainedCategoricalObjective(categories=["unacceptable", "acceptable", "ideal"], desirability=[False, True, True])),
        ContinuousOutput(key="f_2", objective=MinimizeSigmoidObjective(w=1., tp=0.0, steepness=0.5)),
    ]
)

# Create domain
domain1 = Domain(inputs=input_features, outputs=output_features)

# Sample random points
sample_df = domain1.inputs.sample(50)

# Generate one continuous output (f_0), one categorical output (f_1), and a second continuous output (f_2)
# f_0: modified Levy function of the continuous inputs. The check `ind < len(sample_df.columns)`
# is always true here, so the `else` branch (the standard Levy final term) is never used.
sample_df["f_0"] = np.sin(np.pi * scale_inputs(sample_df["x_0"])) ** 2 + sum(
    [
        (scale_inputs(sample_df[col]) - 1) ** 2
        * (
            1 + 10 * np.sin(np.pi * scale_inputs(sample_df[col]) + 1) ** 2
            if ind < len(sample_df.columns)
            else 1 + np.sin(2 * np.pi * scale_inputs(sample_df[col])) ** 2
        )
        for ind, col in enumerate(sample_df.columns)
        if not sample_df[col].dtype == "O"
    ]
)
# f_1: class label based on the sum of absolute values of the continuous inputs
cont_keys = input_features.get_keys(includes=ContinuousInput, excludes=CategoricalInput)
abs_sum = sample_df[cont_keys].abs().sum(1)
sample_df["f_1"] = "unacceptable"
sample_df.loc[(abs_sum >= 3.5) & (abs_sum <= 6.5), "f_1"] = "acceptable"
sample_df.loc[(abs_sum >= 4.5) & (abs_sum <= 5.5), "f_1"] = "ideal"
# f_2: x_0 plus a small amount of uniform noise
sample_df["f_2"] = sample_df["x_0"] + 1e-2 * np.random.uniform(size=(len(sample_df),))

sample_df.head(10)

| | x_0 | x_1 | x_2 | x_3 | x_4 | x_5 | f_0 | f_1 | f_2 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.267947 | 1.021106 | 1.62306 | 0.627026 | 1.206148 | 0 | 1.1407 | ideal | -0.26567 |
| 1 | -0.988891 | -0.747924 | -0.07738 | -1.366909 | 0.122713 | 0 | 4.933869 | unacceptable | -0.981222 |
| 2 | 0.171629 | -0.592879 | 0.188483 | -0.706858 | -0.710738 | 0 | 1.585117 | unacceptable | 0.173409 |
| 3 | -1.745543 | 1.718025 | -0.866392 | -0.236673 | 1.936513 | 1 | 6.809027 | unacceptable | -1.740944 |
| 4 | 0.46121 | -0.087926 | 0.883033 | -0.319482 | -1.813779 | 1 | 5.272989 | acceptable | 0.461377 |
| 5 | 1.386029 | -0.098083 | 0.174276 | 0.804043 | 0.321967 | 0 | 0.470671 | unacceptable | 1.391563 |
| 6 | -1.946305 | -0.055122 | 1.468864 | -1.514258 | -1.641203 | 0 | 13.303294 | unacceptable | -1.938876 |
| 7 | 0.381642 | 1.842479 | -1.500184 | 0.35371 | 0.508349 | 0 | 3.959427 | ideal | 0.390396 |
| 8 | -0.428826 | -1.958059 | -0.221027 | -1.372467 | 1.275357 | 0 | 9.170777 | ideal | -0.419537 |
| 9 | -0.47331 | -0.402426 | -0.608333 | -0.40341 | 1.495413 | 1 | 1.713585 | unacceptable | -0.465823 |
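The labelling rule for `f_1` depends only on the sum of absolute values of the five continuous inputs: values in [3.5, 6.5] are 'acceptable', and the narrower band [4.5, 5.5] is 'ideal'. The sketch below restates that rule as a standalone plain-Python function and checks it against three rows of the sample output. The function name `label_point` is introduced here for illustration and is not part of the script.

```python
def label_point(continuous_inputs):
    """Label a point from the sum of absolute values of its continuous inputs."""
    abs_sum = sum(abs(v) for v in continuous_inputs)
    if 4.5 <= abs_sum <= 5.5:
        return "ideal"  # innermost band takes precedence
    if 3.5 <= abs_sum <= 6.5:
        return "acceptable"
    return "unacceptable"


# Rows 0, 1, and 4 of the sampled data (x_0 ... x_4 only)
row_0 = [-0.267947, 1.021106, 1.62306, 0.627026, 1.206148]
row_1 = [-0.988891, -0.747924, -0.07738, -1.366909, 0.122713]
row_4 = [0.46121, -0.087926, 0.883033, -0.319482, -1.813779]

labels = [label_point(r) for r in (row_0, row_1, row_4)]
print(labels)
```

Note that $x^*=\mathbf{1}$ has an absolute-value sum of 5 over the five continuous inputs, so the optimum sits in the middle of the 'ideal' band.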


## Evaluate the classification model performance (outside of the optimization procedure)

In [4]:
# Import packages
import bofire.surrogates.api as surrogates
from bofire.data_models.surrogates.api import ClassificationMLPEnsemble
from bofire.surrogates.diagnostics import ClassificationMetricsEnum

# Instantiate the surrogate model 
model = ClassificationMLPEnsemble(inputs=domain1.inputs, outputs=Outputs(features=[domain1.outputs.get_by_key("f_1")]), lr=0.01, n_epochs=100, hidden_layer_sizes=(20,10,))
surrogate = surrogates.map(model)

# Cross-validate the model on the classification data (5 folds)
cv_df = sample_df.drop(["f_0", "f_2"], axis=1)
cv_df["valid_f_1"] = 1  # mark every f_1 observation as valid
cv = surrogate.cross_validate(cv_df, folds=5)




In [5]:
# Print results
cv[0].get_metrics(metrics=ClassificationMetricsEnum, combine_folds=True) # print training set performance

| | ACCURACY | F1 |
|---|---|---|
| 0 | 0.745 | 0.745 |


In [6]:
cv[1].get_metrics(metrics=ClassificationMetricsEnum, combine_folds=True) # print test set performance

| | ACCURACY | F1 |
|---|---|---|
| 0 | 0.32 | 0.32 |
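In both the training and test results, ACCURACY and F1 are identical. One possible explanation, which is an assumption and not something this script confirms, is that the F1 score is micro-averaged: for single-label multi-class problems, micro-averaged F1 always equals accuracy. The sketch below demonstrates that identity in plain Python on hypothetical labels; the helper names are introduced here for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over all classes before computing F1."""
    tp = fp = fn = 0
    for c in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == c and t == c:
                tp += 1
            elif p == c and t != c:
                fp += 1
            elif p != c and t == c:
                fn += 1
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical labels, not taken from the cross-validation above
y_true = ["unacceptable", "acceptable", "ideal", "acceptable", "unacceptable"]
y_pred = ["unacceptable", "acceptable", "acceptable", "ideal", "unacceptable"]

acc = accuracy(y_true, y_pred)
f1 = micro_f1(y_true, y_pred)
print(acc, f1)
```

Each misclassified sample contributes exactly one false positive and one false negative, which is why the two metrics coincide.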


## Setup strategy and ask for candidates



In [7]:
from bofire.data_models.acquisition_functions.api import qEI
from bofire.data_models.strategies.api import SoboStrategy
from bofire.data_models.surrogates.api import BotorchSurrogates, ClassificationMLPEnsemble, MixedSingleTaskGPSurrogate
from bofire.data_models.domain.api import Outputs

strategy_data = SoboStrategy(domain=domain1, 
                             acquisition_function=qEI(), 
                             surrogate_specs=BotorchSurrogates(surrogates=
                                    [
                                        ClassificationMLPEnsemble(inputs=domain1.inputs, outputs=Outputs(features=[domain1.outputs.get_by_key("f_1")]), lr=0.01, n_epochs=100, hidden_layer_sizes=(20,10,)),
                                        MixedSingleTaskGPSurrogate(inputs=domain1.inputs, outputs=Outputs(features=[domain1.outputs.get_by_key("f_2")]))
                                    ]
                                )
                            )

strategy = strategies.map(strategy_data)

strategy.tell(sample_df)

In [8]:
candidates = strategy.ask(10)
candidates



| | x_0 | x_1 | x_2 | x_3 | x_4 | x_5 | f_1_pred | f_1_sd | f_0_pred | f_2_pred | ... | f_1_acceptable_prob | f_1_ideal_prob | f_0_sd | f_2_sd | f_1_unacceptable_sd | f_1_acceptable_sd | f_1_ideal_sd | f_0_des | f_2_des | f_1_des |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.744661 | -0.014262 | -0.451975 | 2.0 | 0.264316 | 1 | acceptable | 0.0 | -0.794483 | 0.751355 | ... | 0.781416 | 0.197786 | 0.74973 | 0.003478 | 0.011921 | 0.436942 | 0.438615 | 0.794483 | 0.40717 | 0.979202 |
| 1 | -0.025971 | 0.538385 | 0.445088 | -0.029734 | 0.207444 | 1 | acceptable | 0.0 | -0.061202 | -0.021137 | ... | 0.878746 | 0.119988 | 0.471483 | 0.003391 | 0.001982 | 0.270151 | 0.268299 | 0.061202 | 0.502642 | 0.998734 |
| 2 | 0.643 | 0.170813 | 2.0 | 2.0 | 0.250742 | 1 | acceptable | 0.0 | -0.536501 | 0.649062 | ... | 0.94775 | 0.050431 | 1.06992 | 0.003621 | 0.004066 | 0.11679 | 0.112724 | 0.536501 | 0.419572 | 0.998181 |
| 3 | 1.046492 | 1.009127 | 1.693894 | 2.0 | 2.0 | 1 | acceptable | 0.0 | 1.483338 | 1.05434 | ... | 0.79997 | 0.199863 | 1.958267 | 0.003926 | 0.000285 | 0.447182 | 0.446908 | -1.483338 | 0.371177 | 0.999833 |
| 4 | 0.249543 | 0.448522 | 2.0 | -0.021518 | 0.143814 | 1 | acceptable | 0.0 | -0.015632 | 0.253437 | ... | 0.997194 | 0.002804 | 0.752782 | 0.003477 | 4e-06 | 0.006273 | 0.006269 | 0.015632 | 0.468363 | 0.999998 |
| 5 | 0.854194 | 0.464335 | 1.579664 | 0.782255 | 1.068275 | 1 | acceptable | 0.0 | 0.071247 | 0.860109 | ... | 0.801522 | 0.197997 | 0.378974 | 0.003551 | 0.001074 | 0.443808 | 0.442734 | -0.071247 | 0.394113 | 0.999519 |
| 6 | 0.884042 | 0.406672 | 1.494174 | 2.0 | 0.157301 | 1 | acceptable | 0.0 | -0.616233 | 0.891074 | ... | 0.860195 | 0.136536 | 0.916442 | 0.003627 | 0.007275 | 0.312579 | 0.305304 | 0.616233 | 0.390422 | 0.996731 |
| 7 | 0.143312 | 0.679986 | -0.116402 | 0.081383 | 0.247956 | 1 | acceptable | 0.0 | -0.067516 | 0.147784 | ... | 0.795419 | 0.189967 | 0.4702 | 0.003365 | 0.009657 | 0.426349 | 0.424642 | 0.067516 | 0.481535 | 0.985387 |
| 8 | -0.293373 | 0.876231 | 0.524355 | 0.052708 | 0.143248 | 1 | acceptable | 0.0 | 0.195204 | -0.286897 | ... | 0.914762 | 0.084724 | 0.575528 | 0.003441 | 0.000992 | 0.189307 | 0.189448 | -0.195204 | 0.535801 | 0.999486 |
| 9 | 0.395178 | 0.358173 | 1.4053 | -0.040535 | 0.22336 | 1 | acceptable | 0.0 | -0.125187 | 0.39909 | ... | 0.973413 | 0.026521 | 0.356968 | 0.003403 | 0.000143 | 0.059446 | 0.059303 | 0.125187 | 0.450279 | 0.999934 |
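The desirability columns (`*_des`) in the candidate output can be cross-checked by hand. The sketch below recomputes them for candidate 0 in plain Python, under the assumption that `f_1_des` is the sum of the probabilities of the desirable classes ($g_{\theta}(x)^\top a$), that `f_2_des` follows a decreasing sigmoid $1 - 1/(1+e^{-s(x - tp)})$ with `tp=0.0` and `steepness=0.5`, and that `f_0_des` is the negated prediction. These forms are inferred from the printed values rather than taken from the BoFire source, and the function name `minimize_sigmoid` is hypothetical.

```python
import math


def minimize_sigmoid(x, tp, steepness):
    """Decreasing sigmoid: close to 1 well below tp, close to 0 well above it (assumed form)."""
    return 1.0 - 1.0 / (1.0 + math.exp(-steepness * (x - tp)))


# Values copied from candidate 0 in the output above
f_1_acceptable_prob = 0.781416
f_1_ideal_prob = 0.197786
f_2_pred = 0.751355
f_0_pred = -0.794483

# f_1: probability mass on the desirable classes ("acceptable" and "ideal")
f_1_des_check = f_1_acceptable_prob + f_1_ideal_prob

# f_2: sigmoid objective with tp=0.0 and steepness=0.5, as set in the domain
f_2_des_check = minimize_sigmoid(f_2_pred, tp=0.0, steepness=0.5)

# f_0: minimization with w=1 appears as the negated prediction
f_0_des_check = -f_0_pred

print(f_1_des_check, f_2_des_check, f_0_des_check)
```

The recomputed values agree with the printed `f_1_des`, `f_2_des`, and `f_0_des` for this candidate up to rounding.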


## Check classification of proposed candidates

We reuse the labelling logic from above to compute the true class of each proposed candidate and compare it with the surrogate's prediction.

In [9]:
# Append the true labels to the candidates, using the same rule as for the training data
cont_keys = input_features.get_keys(includes=ContinuousInput, excludes=CategoricalInput)
cand_abs_sum = candidates[cont_keys].abs().sum(1)
candidates["f_1_true"] = "unacceptable"
candidates.loc[(cand_abs_sum >= 3.5) & (cand_abs_sum <= 6.5), "f_1_true"] = "acceptable"
candidates.loc[(cand_abs_sum >= 4.5) & (cand_abs_sum <= 5.5), "f_1_true"] = "ideal"

In [10]:
# Print results
candidates[["f_1_pred", "f_1_true"]]

| | f_1_pred | f_1_true |
|---|---|---|
| 0 | acceptable | unacceptable |
| 1 | acceptable | unacceptable |
| 2 | acceptable | ideal |
| 3 | acceptable | unacceptable |
| 4 | acceptable | unacceptable |
| 5 | acceptable | ideal |
| 6 | acceptable | ideal |
| 7 | acceptable | unacceptable |
| 8 | acceptable | unacceptable |
| 9 | acceptable | unacceptable |
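To summarize the comparison, the sketch below tallies the predicted and true labels from the output above in plain Python. It reports both the exact label agreement and how many candidates are truly in a desirable class ('acceptable' or 'ideal'), which is what the constraint objective cares about. The variable names are introduced here for illustration.

```python
from collections import Counter

# Labels copied from the comparison output above (candidates 0 through 9)
f_1_pred = ["acceptable"] * 10
f_1_true = [
    "unacceptable", "unacceptable", "ideal", "unacceptable", "unacceptable",
    "ideal", "ideal", "unacceptable", "unacceptable", "unacceptable",
]

# Exact agreement between predicted and true class labels
exact_match_rate = sum(p == t for p, t in zip(f_1_pred, f_1_true)) / len(f_1_true)

# Count how many candidates truly fall into a desirable class
desirable = {"acceptable", "ideal"}
truly_desirable = sum(t in desirable for t in f_1_true)

true_counts = Counter(f_1_true)
print(exact_match_rate, truly_desirable, dict(true_counts))
```

For this run, none of the predicted labels match exactly, and 3 of the 10 candidates are truly in a desirable class, which is in line with the low test-set accuracy reported by the cross-validation.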
