# Classification Surrogate Tests

We are interested in testing whether or not a surrogate model can correctly identify unknown constraints based on categorical criteria with classification surrogates. Essentially, we want to account for scenarios where specialists can look at a set of experiments and label outcomes as 'acceptable', 'unacceptable', 'ideal', etc. 

This involves new models that produce `CategoricalOutput`'s rather than continuous outputs. Mathematically, if $g_{\theta}:\mathbb{R}^d\to[0,1]^c$ represents the function governed by learnable parameters $\theta$ which outputs a probability vector over $c$ potential classes (i.e. for input $x\in\mathbb{R}^d$, $g_{\theta}(x)^\top\mathbf{1}=1$ where $\mathbf{1}$ is the vector of all 1's) and we have acceptibility criteria for the corresponding classes given by $a\in[0,1]^c$, we can compute a scalar output as $g_{\theta}(x)^\top a\in[0,1]$ as an objective value to be passed in as a constrained function.

In this script, we look at a modified and constrained version of the optimization problem associated with the [Levy function](https://www.sfu.ca/~ssurjano/levy.html), which has a global minima at $x^*=\mathbf{1}$. We classify constraints for three classes: 'acceptable', 'unacceptable', and 'ideal' based on how close we are to the optimal decision variable; obviously, this value is unknown in a real-world setting, but this serves as a reasonable example.

In [1]:
# Import packages
import bofire.strategies.api as strategies
from bofire.data_models.api import Domain, Outputs, Inputs
from bofire.data_models.features.api import ContinuousInput, ContinuousOutput, CategoricalOutput, CategoricalInput
from bofire.data_models.objectives.api import MinimizeObjective, MinimizeSigmoidObjective, CategoricalObjective
import numpy as np
import pandas as pd

  from .autonotebook import tqdm as notebook_tqdm


## Manual setup of the optimization domain

The following cells show how to manually setup the optimization problem in BoFire for didactic purposes.

In [2]:
# Write a function which scales the inputs according to the Levy function - i.e. computes $w_i$
def scale_inputs(x: pd.Series) -> pd.Series:
    return 1 + (x - 1) / 4

In [3]:
# Set-up the inputs and outputs, use categorical domain just as an example
input_features = Inputs(features=[ContinuousInput(key=f"x_{i}", bounds=(-2, 2)) for i in range(5)] + [CategoricalInput(key=f"x_5", categories=(0.0, 1.0))])

# here the minimize objective is used, if you want to maximize you have to use the maximize objective.
output_features = Outputs(features=[
        ContinuousOutput(key=f"f_{0}", objective=MinimizeObjective(w=1.)),
        CategoricalOutput(key=f"f_{1}", categories=("unacceptable", "acceptable", "ideal"), objective=CategoricalObjective(desirability=(0, 0.5, 1))), # This function will be associated with learning the categories
        ContinuousOutput(key=f"f_{2}", objective=MinimizeSigmoidObjective(w=1., tp=0.0, steepness=0.5)),
    ]
)

# Create domain
domain1 = Domain(inputs=input_features, outputs=output_features)

# Sample random points
sample_df = domain1.inputs.sample(50).astype(float) # Sample x's

# Write a function which outputs one continuous variable and another discrete based on some logic
sample_df["f_0"] = np.sin(np.pi * scale_inputs(sample_df["x_0"])) ** 2 + sum([(scale_inputs(sample_df[col]) - 1) ** 2 * (1 + 10 * np.sin(np.pi * scale_inputs(sample_df[col]) + 1) ** 2 if ind < len(sample_df.columns) else 1 + np.sin(2 * np.pi * scale_inputs(sample_df[col])) ** 2) for ind, col in enumerate(sample_df.columns)])
sample_df["f_1"] = "unacceptable"
sample_df.loc[sample_df[input_features.get_keys()].sum(1) >= 1.0, "f_1"] = "acceptable"
sample_df.loc[sample_df[input_features.get_keys()].sum(1) >= 2.0, "f_1"] = "ideal"
sample_df["f_2"] = sample_df["x_0"] + 1e-2 * np.random.uniform(size=(len(sample_df),))

sample_df.head(20)

Unnamed: 0,x_0,x_1,x_2,x_3,x_4,x_5,f_0,f_1,f_2
0,0.435454,-1.788437,-1.988452,-0.421175,-1.708184,0.0,15.230583,unacceptable,0.437269
1,-0.839497,-1.317481,-0.621791,1.02946,-0.029063,1.0,4.09289,unacceptable,-0.829668
2,0.457487,-1.63237,-0.219897,-0.589791,-0.543485,0.0,4.653823,unacceptable,0.466307
3,-0.545863,1.068268,-0.840059,-0.156614,0.532427,0.0,1.944284,unacceptable,-0.544115
4,1.294465,-1.086366,-1.44469,-1.005573,-0.743576,1.0,5.516022,unacceptable,1.303366
5,-0.826168,-1.615002,1.624282,0.041478,0.413251,1.0,5.652505,unacceptable,-0.8198
6,0.213634,-0.469763,0.950611,1.945084,-0.78382,1.0,1.69562,ideal,0.214467
7,-0.570017,0.019903,-1.137774,-0.417314,1.457074,1.0,2.90937,unacceptable,-0.567218
8,-0.75636,-0.905112,1.53684,-0.081425,0.350125,0.0,2.621606,unacceptable,-0.74713
9,1.6159,-0.951538,1.450884,0.62333,0.232692,0.0,1.690015,ideal,1.622131


## Setup of the Strategy and ask for Candidates



In [4]:
from bofire.data_models.acquisition_functions.api import qEI
from bofire.data_models.strategies.api import SoboStrategy
from bofire.data_models.surrogates.api import BotorchSurrogates, MLPClassifierEnsemble, MixedSingleTaskGPSurrogate, GPClassifier
from bofire.data_models.domain.api import Outputs

# strategy_data = SoboStrategy(domain=domain1, 
#                              acquisition_function=qEI(), 
#                              surrogate_specs=BotorchSurrogates(surrogates=
#                                     [
#                                         MLPClassifierEnsemble(inputs=domain1.inputs, outputs=Outputs(features=[domain1.outputs.get_by_key("f_1")]), lr=1.0, n_epochs=50, hidden_layer_sizes=(20,)),
#                                         MixedSingleTaskGPSurrogate(inputs=domain1.inputs, outputs=Outputs(features=[domain1.outputs.get_by_key("f_2")]))
#                                     ]
#                                 )
#                             )
strategy_data = SoboStrategy(domain=domain1, 
                             acquisition_function=qEI(), 
                             surrogate_specs=BotorchSurrogates(surrogates=
                                    [
                                        GPClassifier(inputs=domain1.inputs, outputs=Outputs(features=[domain1.outputs.get_by_key("f_1")])),
                                        MixedSingleTaskGPSurrogate(inputs=domain1.inputs, outputs=Outputs(features=[domain1.outputs.get_by_key("f_2")]))
                                    ]
                                )
                            )

strategy = strategies.map(strategy_data)

strategy.tell(sample_df)

TypeError: __init__() missing 1 required positional argument: 'task_feature'

In [None]:
candidates = strategy.ask(2)
candidates