## The Ax Benchmarking Suite

Ax provides benchmarking tools for evaluating the performance of Bayesian optimization methods on synthetic problems. This notebook illustrates how the benchmark suite can be used to easily test new methods on custom problems.

#### Defining a problem

The first step is to define the benchmark problem. Ax includes a collection of useful built-in benchmark problems, such as this constrained and noisy version of the Hartmann 6 problem used in Letham et al. (2018):

In [1]:
from ax.benchmark.benchmark_problem import hartmann6_constrained

Custom problems can be defined by creating a [`BenchmarkProblem`](link to API ref) object, as is done here for the constrained problem from Gramacy et al. (2016):

In [2]:
import numpy as np

from ax.benchmark.benchmark_problem import BenchmarkProblem
from ax.core.objective import Objective
from ax.core.optimization_config import OptimizationConfig
from ax.core.outcome_constraint import ComparisonOp, OutcomeConstraint
from ax.core.parameter import ParameterType, RangeParameter
from ax.core.search_space import SearchSpace
from ax.metrics.noisy_function import NoisyFunctionMetric

# Create a Metric object for each function used in the problem
class GramacyObjective(NoisyFunctionMetric):
    def f(self, x: np.ndarray) -> float:
        return x.sum()

class GramacyConstraint1(NoisyFunctionMetric):
    def f(self, x: np.ndarray) -> float:
        return 1.5 - x[0] - 2 * x[1] - 0.5 * np.sin(2 * np.pi * (x[0] ** 2 - 2 * x[1]))

class GramacyConstraint2(NoisyFunctionMetric):
    def f(self, x: np.ndarray) -> float:
        return x[0] ** 2 + x[1] ** 2 - 1.5

# Create the search space and optimization config
search_space = SearchSpace(
    parameters=[
        RangeParameter(name="x1", parameter_type=ParameterType.FLOAT, lower=0.0, upper=1.0),
        RangeParameter(name="x2", parameter_type=ParameterType.FLOAT, lower=0.0, upper=1.0),
    ]
)

optimization_config = OptimizationConfig(
    objective=Objective(
        metric=GramacyObjective(
            name="objective", param_names=["x1", "x2"], noise_sd=0.2
        ),
        minimize=True,
    ),
    outcome_constraints=[
        OutcomeConstraint(
            metric=GramacyConstraint1(name="constraint_1", param_names=["x1", "x2"], noise_sd=0.2),
            op=ComparisonOp.LEQ,
            bound=0,
            relative=False,
        ),
        OutcomeConstraint(
            metric=GramacyConstraint2(name="constraint_2", param_names=["x1", "x2"], noise_sd=0.2),
            op=ComparisonOp.LEQ,
            bound=0,
            relative=False,
        ),
    ],
)

# Create a BenchmarkProblem object
gramacy_problem = BenchmarkProblem(
    name="Gramacy",
    fbest=0.5998,
    optimization_config=optimization_config,
    search_space=search_space,
)
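As a sanity check on the problem definition, the noiseless functions can be evaluated directly with NumPy. The sketch below is not part of Ax: it copies the three formulas from the metric classes above into plain functions (the names `objective`, `constraint_1`, `constraint_2`, and `is_feasible` are made up for this illustration), tests feasibility under the two `LEQ 0` constraints, and runs a brute-force grid search whose result can be compared against the `fbest=0.5998` value passed to `BenchmarkProblem`.

```python
import numpy as np

def objective(x):
    # Same value as GramacyObjective.f for a 2-d point, without noise
    return x[0] + x[1]

def constraint_1(x):
    # Same formula as GramacyConstraint1.f
    return 1.5 - x[0] - 2 * x[1] - 0.5 * np.sin(2 * np.pi * (x[0] ** 2 - 2 * x[1]))

def constraint_2(x):
    # Same formula as GramacyConstraint2.f
    return x[0] ** 2 + x[1] ** 2 - 1.5

def is_feasible(x):
    # Both outcome constraints use ComparisonOp.LEQ with bound 0
    return bool(constraint_1(x) <= 0 and constraint_2(x) <= 0)

# Brute-force search on a regular grid over the unit square
grid = np.linspace(0.0, 1.0, 1001)
x1, x2 = np.meshgrid(grid, grid)
x = np.stack([x1, x2])  # shape (2, 1001, 1001)
feasible = (constraint_1(x) <= 0) & (constraint_2(x) <= 0)
best_on_grid = objective(x)[feasible].min()
print(f"best feasible value on the grid: {best_on_grid:.4f}")
```

Because the grid only contains a finite set of points, the value it finds is an upper bound on the true constrained minimum and should land slightly above `fbest`.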

#### Defining optimization methods

The Bayesian optimization methods to be used in benchmark runs are defined as a [`GenerationStrategy`](link to API ref), which is essentially a list of model factory functions together with a specification of how many iterations to use each model for.

A `GenerationStrategy` can be defined using the built-in factory functions. The strategy here begins with 5 points from a (non-scrambled) Sobol sequence and then switches to Bayesian optimization with a GP and EI for an additional 15 iterations, matching `arms_per_model=[5, 15]`:

In [3]:
from ax.modelbridge.factory import get_sobol, get_GPEI
from ax.modelbridge.generation_strategy import GenerationStrategy

def unscrambled_sobol(search_space):
    return get_sobol(search_space, scramble=False)

strategy1 = GenerationStrategy(
    model_factories=[unscrambled_sobol, get_GPEI],
    arms_per_model=[5, 15],
)
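To make the role of `arms_per_model` concrete, the following sketch maps an arm number to the position of the factory that would generate it. This is only an illustration of the schedule described above: `model_index_for_arm` is a hypothetical helper, not an Ax function, and it makes no claim about how Ax handles batches internally.

```python
from typing import List

def model_index_for_arm(arms_per_model: List[int], arm_number: int) -> int:
    """Return the index into model_factories used for the given arm (0-indexed)."""
    cumulative = 0
    for index, n_arms in enumerate(arms_per_model):
        cumulative += n_arms
        if arm_number < cumulative:
            return index
    raise ValueError("arm_number exceeds the total number of arms in the strategy")

# With arms_per_model=[5, 15]: arms 0-4 come from the first factory,
# arms 5-19 from the second
schedule = [model_index_for_arm([5, 15], i) for i in range(20)]
print(schedule)
```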

We can also easily create purely (quasi-)random strategies for comparison:

In [4]:
strategy2 = GenerationStrategy(
    model_factories=[unscrambled_sobol, get_sobol],
    arms_per_model=[5, 15],
)

We can benchmark custom methods by creating a factory that returns a `ModelBridge` object for the custom model ([see documentation here](link to "Deeper Dive" section on model documentation)). Here we create a custom model factory function that uses BoTorch's implementation of EI with a plug-in estimate for the incumbent best (rather than the noisy EI used by default in `get_GPEI`).

In [5]:
from ax.modelbridge.torch import TorchModelBridge
from ax.modelbridge.transforms.unit_x import UnitX
from ax.modelbridge.transforms.standardize_y import StandardizeY
from ax.models.torch.botorch import BotorchModel

def get_plugin_EI(experiment, data, search_space):
    botorch_model = BotorchModel(acquisition_function_name="qEI")  # This can be any implementation of TorchModel
    return TorchModelBridge(
        experiment=experiment,
        search_space=search_space,
        data=data,
        model=botorch_model,
        transforms=[UnitX, StandardizeY],
    )

strategy3 = GenerationStrategy(
    model_factories=[unscrambled_sobol, get_plugin_EI],
    arms_per_model=[5, 15],
)
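For intuition about what "plug-in" means here, the sketch below computes the standard closed-form expected improvement at a single point for a minimization problem, where the incumbent `best` is treated as a fixed, known number. This is an illustration only and an assumption about the setup: the `qEI` acquisition function named above is evaluated differently inside BoTorch, and `expected_improvement` is a hypothetical helper rather than part of Ax or BoTorch.

```python
import math

def expected_improvement(mean: float, std: float, best: float) -> float:
    """Analytic EI for minimization, given a Gaussian posterior N(mean, std^2)
    at one point and a fixed (plug-in) incumbent value `best`."""
    if std <= 0.0:
        # No posterior uncertainty: improvement is deterministic
        return max(best - mean, 0.0)
    z = (best - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    return (best - mean) * cdf + std * pdf

# A point predicted to be better than the incumbent has higher EI than one
# predicted to be worse, at equal uncertainty
print(expected_improvement(mean=0.5, std=1.0, best=1.0))
print(expected_improvement(mean=1.5, std=1.0, best=1.0))
```

With noisy observations the true incumbent is itself uncertain, which is the situation the noisy EI default is meant to address; the plug-in variant ignores that uncertainty by fixing `best`.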

#### Running the benchmarks

We now run the benchmarks. The `BOBenchmarkingSuite` object runs each of the supplied methods on each of the supplied problems. Note that this runs a real set of benchmarks and so will take about XX minutes to complete:

In [6]:
from ax.benchmark.benchmark_suite import BOBenchmarkingSuite

b = BOBenchmarkingSuite()

b.run(
    num_trials=10,  # Each benchmark task is repeated this many times
    total_iterations=20,  # The total number of iterations in each optimization
    batch_size=5,  # Number of synchronous parallel evaluations
    #bo_strategies=[strategy1, strategy2, strategy3], # TODO include all methods once botorch is faster
    bo_strategies=[strategy2],
    bo_problems=[hartmann6_constrained, gramacy_problem],
)

[INFO 03-26 10:05:23] ax.benchmark.benchmark_runner: Testing unscrambled_sobol+sobol on hartmann6:
[INFO 03-26 10:05:23] ax.benchmark.benchmark_runner: Run 0
[INFO 03-26 10:05:23] ax.benchmark.benchmark_runner: Run 1
[INFO 03-26 10:05:23] ax.benchmark.benchmark_runner: Run 2
[INFO 03-26 10:05:24] ax.benchmark.benchmark_runner: Run 3
[INFO 03-26 10:05:24] ax.benchmark.benchmark_runner: Run 4
[INFO 03-26 10:05:24] ax.benchmark.benchmark_runner: Run 5
[INFO 03-26 10:05:24] ax.benchmark.benchmark_runner: Run 6
[INFO 03-26 10:05:25] ax.benchmark.benchmark_runner: Run 7
[INFO 03-26 10:05:25] ax.benchmark.benchmark_runner: Run 8
[INFO 03-26 10:05:25] ax.benchmark.benchmark_runner: Run 9
[INFO 03-26 10:05:25] ax.benchmark.benchmark_runner: Testing unscrambled_sobol+sobol on Gramacy:
[INFO 03-26 10:05:25] ax.benchmark.benchmark_runner: Run 0
[INFO 03-26 10:05:26] ax.benchmark.benchmark_runner: Run 1
[INFO 03-26 10:05:26] ax.benchmark.benchmark_runner: Run 2
[INFO 03-26 10:05:26] ax.benchmark.be

<ax.benchmark.benchmark_runner.BOBenchmarkRunner at 0x7f7819be8780>
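The structure of such a run (every method on every problem, repeated `num_trials` times) can be sketched without Ax. The toy example below uses made-up problems and a plain random-search method; none of the names are Ax APIs, and it does not model batching or constraints. It only shows the nested loop that the log output above reflects.

```python
import numpy as np

def random_search(f, dim, n_iterations, rng):
    """Toy optimization method: evaluate f at uniform random points in [0, 1]^dim."""
    points = rng.uniform(0.0, 1.0, size=(n_iterations, dim))
    return np.array([f(p) for p in points])

# Hypothetical toy problems: name -> (function, dimension)
problems = {
    "sum": (lambda x: float(x.sum()), 2),
    "sphere": (lambda x: float((x ** 2).sum()), 3),
}
methods = {"random_search": random_search}
num_trials, total_iterations = 10, 20

rng = np.random.default_rng(0)
results = {}
for problem_name, (f, dim) in problems.items():
    for method_name, method in methods.items():
        # Repeat each (method, problem) pair num_trials times
        runs = [method(f, dim, total_iterations, rng) for _ in range(num_trials)]
        results[(method_name, problem_name)] = np.stack(runs)

for key, values in results.items():
    print(key, values.shape)
```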

Once the benchmark has finished running, we can generate a report that shows the optimization performance of each method, as well as the wall time each method spent on model fitting and candidate generation.

In [None]:
# TODO: Is it possible to shrink the width of the report below so you don't have to scroll to see the whole
# thing?

In [9]:
from IPython.core.display import HTML

report = b.generate_report(include_individual=False)
HTML(report)
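A common way to summarize optimization performance is the best feasible objective value found so far at each iteration. The sketch below computes that trace with NumPy for a minimization problem. It is an assumption about what such a summary might look like, not a description of what `generate_report` computes, and `best_feasible_trace` is a hypothetical helper with made-up input data.

```python
import numpy as np

def best_feasible_trace(objective_values, feasible):
    """Running minimum of the objective over feasible iterations.

    Infeasible iterations are treated as +inf, so the trace stays at inf
    until the first feasible point is observed.
    """
    values = np.where(feasible, objective_values, np.inf)
    return np.minimum.accumulate(values, axis=-1)

# Two made-up runs of four iterations each
objective_values = np.array([[3.0, 1.0, 2.0, 0.5], [2.0, 2.5, 0.2, 1.0]])
feasible = np.array([[True, True, True, False], [False, True, True, True]])
traces = best_feasible_trace(objective_values, feasible)
print(traces)
```

Traces like these, one row per run, can then be aggregated across repeated runs to compare methods.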

In [1]:
# TODO add reference section