[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/google/vizier/blob/main/docs/guides/benchmarks/running_benchmarks.ipynb)

# Running Benchmarks
We will demonstrate below how to use our benchmark runner pipeline.

## Installation and reference imports

In [None]:
!pip install google-vizier

In [None]:
from vizier import algorithms as vza
from vizier import benchmarks
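from vizier._src.algorithms.designers import grid
# Needed for the subroutines used in the "Example usage" section below.
from vizier._src.benchmarks.runners import benchmark_runner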

We first define an example experimenter and designer factory, which we will use later.

In [None]:
experimenter = benchmarks.NumpyExperimenter(
    benchmarks.bbob.Sphere, benchmarks.bbob.DefaultBBOBProblemStatement(5))

designer_factory = grid.GridSearchDesigner.from_problem

## Algorithms and Experimenters
Every study can be seen conceptually as a simple loop between an algorithm and an objective. In code, the algorithm corresponds to a `Designer` or `Policy`, and the objective to an `Experimenter`.

Below is a simple sequential loop.

In [None]:
designer = designer_factory(experimenter.problem_statement)

for _ in range(100):
  suggestion = designer.suggest()[0]
  trial = suggestion.to_trial()
  experimenter.evaluate([trial])
  completed_trials = vza.CompletedTrials([trial])
  designer.update(completed_trials)

The loop above suggests and evaluates trials one at a time. One natural
modification is to use variable batch sizes instead. More generally, several
implementation questions arise:

*   How many parallel suggestions should the algorithm generate?
*   How many suggestions can be evaluated at once?
*   Should we use early stopping on certain unpromising trials?
*   Should we use a custom stopping condition instead of a fixed for-loop?
*   Can we swap in a different algorithm mid-loop?
*   Can we swap in a different objective mid-loop?
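
Two of the questions above (batch size and a custom stopping condition) can be illustrated without Vizier at all. The following is a minimal sketch using only the standard library. `ToyRandomDesigner`, `sphere`, and `run_batched` are hypothetical stand-ins rather than Vizier classes, and random search is used only to keep the example short.

```python
import random


def sphere(point):
  """Toy objective: sum of squares, minimized at the origin."""
  return sum(x * x for x in point)


class ToyRandomDesigner:
  """Hypothetical stand-in for a designer; not part of Vizier."""

  def __init__(self, dimension, seed=0):
    self._dimension = dimension
    self._rng = random.Random(seed)
    self.history = []  # (point, value) pairs reported back so far.

  def suggest(self, count):
    """Returns `count` random points in [-5, 5]^dimension."""
    return [[self._rng.uniform(-5.0, 5.0) for _ in range(self._dimension)]
            for _ in range(count)]

  def update(self, completed):
    """Records evaluated (point, value) pairs."""
    self.history.extend(completed)


def run_batched(designer, batch_size, max_trials, target):
  """Suggests and evaluates in batches until the target or budget is reached."""
  best = float("inf")
  # Custom stopping condition instead of a fixed for-loop.
  while len(designer.history) < max_trials and best > target:
    batch = designer.suggest(batch_size)
    completed = [(point, sphere(point)) for point in batch]
    designer.update(completed)
    best = min(best, min(value for _, value in completed))
  return best


designer = ToyRandomDesigner(dimension=2)
best_value = run_batched(designer, batch_size=4, max_trials=40, target=0.5)
print(len(designer.history), best_value)
```

Even in this small sketch, the batch size, the budget, and the stopping rule are tangled into a single function, which is the kind of coupling the API below is meant to separate.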

## API
The code flexibility needed to simulate these real-life scenarios may cause
complications as the evaluation benchmark may no longer be stateless. In order
to broadly cover such scenarios, our [API](https://github.com/google/vizier/blob/main/vizier/benchmarks/__init__.py) introduces the `BenchmarkSubroutine`:

In [None]:
class BenchmarkSubroutine(Protocol):
  """Abstraction for core benchmark routines.

  Benchmark protocols are modular alterations of BenchmarkState by reference.
  """

  def run(self, state: BenchmarkState) -> None:
    """Abstraction to alter BenchmarkState by reference."""

All routines use and potentially modify a `BenchmarkState`, which holds information about the objective via an `Experimenter` and the algorithm itself wrapped by an `AlgorithmRunnerProtocol`.

In [None]:
class BenchmarkState:
  """State of a benchmark run. It is altered via benchmark protocols."""

  experimenter: Experimenter
  algorithm: runner_protocol.AlgorithmRunnerProtocol

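  @classmethod
  def from_designer_factory(cls, designer_factory: DesignerFactory,
                            experimenter: Experimenter) -> 'BenchmarkState':
    """Builds a state from a designer factory and an experimenter."""

  @classmethod
  def from_policy_factory(cls, policy_factory: PolicyFactory,
                          experimenter: Experimenter) -> 'BenchmarkState':
    """Builds a state from a policy factory and an experimenter."""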
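
To make the "alter by reference" contract concrete, here is a minimal standalone sketch. `ToyState`, `ToySubroutine`, `AppendValue`, and `apply` are hypothetical names invented for illustration; they only mirror the shape of `BenchmarkState` and `BenchmarkSubroutine` and do not use Vizier.

```python
import dataclasses
from typing import List, Protocol


@dataclasses.dataclass
class ToyState:
  """Hypothetical stand-in for BenchmarkState; holds a list of values."""
  values: List[float] = dataclasses.field(default_factory=list)


class ToySubroutine(Protocol):
  """Mirrors the shape of BenchmarkSubroutine for the toy state."""

  def run(self, state: ToyState) -> None:
    ...


class AppendValue:
  """Satisfies ToySubroutine structurally; no inheritance is required."""

  def __init__(self, value: float):
    self._value = value

  def run(self, state: ToyState) -> None:
    state.values.append(self._value)  # Mutates the caller's state in place.


def apply(subroutine: ToySubroutine, state: ToyState) -> None:
  """Runs one subroutine; the effect is visible only through `state`."""
  subroutine.run(state)


state = ToyState()
result = apply(AppendValue(1.5), state)
print(result, state.values)
```

Because `run` returns `None`, the only way to observe what a subroutine did is to inspect the state afterwards, which is why the state object has to carry both the objective and the algorithm.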

To wrap multiple `BenchmarkSubroutine`s together, we can use the `BenchmarkRunner`:

In [None]:
class BenchmarkRunner(BenchmarkSubroutine):
  """Run a sequence of subroutines, all repeated for a few iterations."""

  # A sequence of benchmark subroutines that alter BenchmarkState.
  benchmark_subroutines: Sequence[BenchmarkSubroutine]
  # Number of times to repeat applying benchmark_subroutines.
  num_repeats: int

  def run(self, state: BenchmarkState) -> None:
    """Run algorithm with benchmark subroutines with repetitions."""

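Since `BenchmarkRunner` is itself a `BenchmarkSubroutine`, runners can in principle be nested. The sketch below imitates that idea with toy classes, assuming the whole sequence is applied in order on each repeat. `LogStep`, `ToyRunner`, and the `"checkpoint"` step are hypothetical and exist only to make the execution order visible.

```python
import dataclasses
from typing import List, Sequence


class LogStep:
  """Toy subroutine that records its label in the shared state."""

  def __init__(self, label: str):
    self._label = label

  def run(self, state: List[str]) -> None:
    state.append(self._label)


@dataclasses.dataclass
class ToyRunner:
  """Assumed semantics: apply the sequence, in order, num_repeats times."""
  subroutines: Sequence
  num_repeats: int

  def run(self, state: List[str]) -> None:
    for _ in range(self.num_repeats):
      for subroutine in self.subroutines:
        subroutine.run(state)


# The inner runner is used as a single step of the outer runner.
inner = ToyRunner([LogStep("suggest"), LogStep("evaluate")], num_repeats=2)
outer = ToyRunner([inner, LogStep("checkpoint")], num_repeats=2)

log: List[str] = []
outer.run(log)
print(log)
```

Each outer repeat runs two suggest/evaluate rounds followed by one hypothetical checkpoint step, so nesting gives a simple way to express schedules with different frequencies.
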
## Example usage
Below is a typical example of simple suggestion and evaluation:

In [None]:
runner = benchmarks.BenchmarkRunner(
    benchmark_subroutines=[
        benchmark_runner.GenerateSuggestions(),
        benchmark_runner.EvaluateActiveTrials()
    ],
    num_repeats=100)

benchmark_state = benchmarks.BenchmarkState.from_designer_factory(
    designer_factory=designer_factory, experimenter=experimenter)

runner.run(benchmark_state)

We can retrieve the evaluated trials from `benchmark_state`, whose `algorithm`
field exposes a `PolicySupporter`:

In [None]:
all_trials = benchmark_state.algorithm.supporter.GetTrials()
print(all_trials)

Note that this design is maximally informative about everything that has
happened so far in the study. For instance, we may also query incomplete or
unused suggestions using the `PolicySupporter`.

## References
*   Runner Protocols can be found in [`runner_protocol.py`](https://github.com/google/vizier/blob/main/vizier/_src/benchmarks/runners/runner_protocol.py).
*   Benchmark Runners can be found in [`benchmark_runner.py`](https://github.com/google/vizier/blob/main/vizier/_src/benchmarks/runners/benchmark_runner.py).

