# Using YAHPO Gym: A quick introduction

With YAHPO Gym, we can benchmark a new hyperparameter optimization method on a large number of problems in a very short time frame.

This tutorial walks us through the core concepts and functionality of `yahpo_gym` and showcases a practical example.

YAHPO Gym consists of several collections of `instances`, so-called `scenarios`.

The `instances` within a `scenario` correspond to different datasets on which hyperparameter optimization is performed, but they share the same hyperparameter optimization task. Thus, they share the same search space and the same targets.

To provide a more concrete example, the collection of all instances in `"lcbench"` is a `scenario`,
while a single task (e.g. task `"3945"`) is called an `instance`.

An `instance` thus defines a single HPO problem for a given ML algorithm and task.

## Core functionality: BenchmarkSet
A `BenchmarkSet` can be instantiated using a `scenario` and an `instance`. 

It contains all logic required to evaluate the surrogate model for a given hyperparameter configuration.


In [1]:
from yahpo_gym import *
b = BenchmarkSet(scenario="lcbench")

This allows us to query several important properties of the benchmark problem:

- scenario: The scenario of the configuration
- y_names: The names of the target variables included in the surrogate model
- hp_names: The names of all hyperparameters
- cat_names: The names of categorical hyperparameters
- cont_names: The names of continuous hyperparameters
- fidelity_params: The name of the fidelity parameter(s)
- instance_names: The column pertaining to the available instances in a dataset
- runtime_name: The name of the parameter measuring the runtime of the model
- data: A `pandas` `DataFrame` containing the data used to train the surrogates. Only available if the data was downloaded.

In [5]:
# We can for example query the target outputs of our surrogate:
b.targets

['time',
 'val_accuracy',
 'val_cross_entropy',
 'val_balanced_accuracy',
 'test_cross_entropy',
 'test_balanced_accuracy']

In [2]:
# Or the available instances:
b.instances

['3945',
 '7593',
 '34539',
 '126025',
 '126026',
 '126029',
 '146212',
 '167083',
 '167104',
 '167149',
 '167152',
 '167161',
 '167168',
 '167181',
 '167184',
 '167185',
 '167190',
 '167200',
 '167201',
 '168329',
 '168330',
 '168331',
 '168335',
 '168868',
 '168908',
 '168910',
 '189354',
 '189862',
 '189865',
 '189866',
 '189873',
 '189905',
 '189906',
 '189908',
 '189909']

A list of all available scenarios can be obtained using `list_scenarios()`:

In [3]:
list_scenarios()

['lcbench',
 'fcnet',
 'nb301',
 'rbv2_svm',
 'rbv2_ranger',
 'rbv2_rpart',
 'rbv2_glmnet',
 'rbv2_xgboost',
 'rbv2_aknn',
 'rbv2_super',
 'iaml_ranger',
 'iaml_rpart',
 'iaml_glmnet',
 'iaml_xgboost',
 'iaml_super']

We can now set an instance, which defines the concrete dataset to be evaluated.

We can furthermore obtain the search space (a `ConfigSpace`) using `get_opt_space()`.

We can then sample a concrete configuration and evaluate it using `objective_function`.

In [4]:
# Set an instance
b.set_instance("3945")
# Sample a point from the configspace
xs = b.get_opt_space().sample_configuration(1)
# Evaluate the configuration
b.objective_function(xs)

[{'time': 10.814257,
  'val_accuracy': 78.32861,
  'val_cross_entropy': 0.62682784,
  'val_balanced_accuracy': 0.59657735,
  'test_cross_entropy': 0.5863595,
  'test_balanced_accuracy': 0.59338015}]
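
The return value is a list with one dictionary per evaluated configuration, keyed by target name. As a minimal sketch of how such results might be post-processed, the hypothetical helper `best_result` below (not part of `yahpo_gym`) picks the entry with the best value for one target. It works on plain dictionaries, so it runs without the surrogates; the numbers are taken from outputs in this tutorial.

```python
from typing import Dict, List


def best_result(
    results: List[Dict[str, float]], target: str, maximize: bool = True
) -> Dict[str, float]:
    """Return the result dictionary with the best value for `target`.

    Hypothetical helper for illustration; not part of yahpo_gym.
    """
    if not results:
        raise ValueError("results must contain at least one entry")
    pick = max if maximize else min
    return pick(results, key=lambda r: r[target])


# Two result dictionaries in the list-of-dicts format returned by
# objective_function (subset of targets, values from this tutorial).
results = [
    {"time": 10.814257, "val_accuracy": 78.32861, "val_cross_entropy": 0.62682784},
    {"time": 529.00366, "val_accuracy": 95.132614, "val_cross_entropy": 0.49781695},
]

most_accurate = best_result(results, "val_accuracy")
fastest = best_result(results, "time", maximize=False)
print(most_accurate["val_accuracy"], fastest["time"])
```

Whether a target should be maximized or minimized depends on the target (accuracy versus cross-entropy or runtime), which is why the direction is an explicit argument here.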

The input to `objective_function` can be a Python dictionary (`dict`) or a `ConfigSpace.Configuration`:

In [5]:
xs

Configuration:
  OpenML_task_id, Constant: '3945'
  batch_size, Value: 171
  epoch, Value: 17
  learning_rate, Value: 0.00048312752361389624
  max_dropout, Value: 0.7456459659315888
  max_units, Value: 157.55779409778975
  momentum, Value: 0.11712091337258465
  num_layers, Value: 2
  weight_decay, Value: 0.03826906311299458

## Setup (One Time)

Before first use, `yahpo_gym` requires a simple one-time setup step to
download all metadata required for prediction using surrogates.

This **metadata** can be downloaded (or cloned) from GitHub:
(https://github.com/slds-lmu/yahpo_data)

Once downloaded, we can run the chunk below to set up the path to the downloaded metadata.
The following chunk assumes that you downloaded the **metadata** to the "~/yahpo_data" directory.

In [17]:
# Initialize the local config & set path for surrogates and metadata
from yahpo_gym import local_config
local_config.init_config()
local_config.set_data_path("~/yahpo_data")
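
A missing or mistyped data directory tends to surface only later, when a surrogate is loaded. As a small sketch, the hypothetical helper `resolve_data_path` below (not part of `yahpo_gym`) expands `~` and checks that the directory exists before the path is handed to `set_data_path`. It uses only the standard library.

```python
from pathlib import Path


def resolve_data_path(path: str) -> Path:
    """Expand '~' and verify that the directory exists.

    Hypothetical helper for illustration; not part of yahpo_gym.
    """
    resolved = Path(path).expanduser()
    if not resolved.is_dir():
        raise FileNotFoundError(
            f"{resolved} does not exist; download or clone yahpo_data there first."
        )
    return resolved


# The current working directory always exists, so this call succeeds.
print(resolve_data_path("."))
```

With the metadata in place, the result could be passed on as `local_config.set_data_path(str(resolve_data_path("~/yahpo_data")))`.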

## Separate Fidelity Space
For some scenarios, we require the search space / configuration without
the *fidelity parameters*.
This can be achieved using `drop_fidelity_params`:

In [10]:
b = BenchmarkSet("lcbench", instance="3945")
# Sample a point from the configspace without the fidelity parameters
xs = b.get_opt_space(drop_fidelity_params=True).sample_configuration(1)

In [11]:
# The fidelity parameter 'epoch' is no longer part of the configuration:
xs

Configuration:
  OpenML_task_id, Constant: '3945'
  batch_size, Value: 33
  learning_rate, Value: 0.08676686454385556
  max_dropout, Value: 0.6256452001406675
  max_units, Value: 644.4361804106277
  momentum, Value: 0.5468432946724163
  num_layers, Value: 1
  weight_decay, Value: 0.08029711547491447

In [12]:
# Convert to dictionary and add epoch
xs = xs.get_dictionary()
xs.update({'epoch': 52})
xs

{'OpenML_task_id': '3945',
 'batch_size': 33,
 'learning_rate': 0.08676686454385556,
 'max_dropout': 0.6256452001406675,
 'max_units': 644.4361804106277,
 'momentum': 0.5468432946724163,
 'num_layers': 1,
 'weight_decay': 0.08029711547491447,
 'epoch': 52}

In [13]:
b.objective_function(xs)

[{'time': 529.00366,
  'val_accuracy': 95.132614,
  'val_cross_entropy': 0.49781695,
  'val_balanced_accuracy': 0.8299375,
  'test_cross_entropy': 0.50857186,
  'test_balanced_accuracy': 0.82421905}]
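
Putting the pieces together, benchmarking an optimizer amounts to repeatedly sampling a configuration, adding the fidelity parameter, and evaluating it. The sketch below implements plain random search at a fixed fidelity. The function `random_search` and the toy sampler and objective are hypothetical stand-ins, not part of `yahpo_gym`, so the example runs without the surrogates; the toy objective only mimics the list-of-dicts return format seen above.

```python
import random
from typing import Any, Callable, Dict, List, Tuple

Config = Dict[str, Any]


def random_search(
    sample: Callable[[], Config],
    evaluate: Callable[[Config], List[Dict[str, float]]],
    target: str,
    fidelity: Dict[str, int],
    n_trials: int = 20,
) -> Tuple[Config, float]:
    """Maximize `target` by random search at a fixed fidelity."""
    best_config: Config = {}
    best_value = float("-inf")
    for _ in range(n_trials):
        config = dict(sample())   # copy so the sampled configuration is not mutated
        config.update(fidelity)   # add the fidelity parameter(s), e.g. {"epoch": 52}
        value = evaluate(config)[0][target]
        if value > best_value:
            best_config, best_value = config, value
    return best_config, best_value


# Stand-in sampler and objective (assumptions for illustration only).
rng = random.Random(0)


def toy_sample() -> Config:
    return {"learning_rate": rng.uniform(1e-4, 1e-1)}


def toy_objective(config: Config) -> List[Dict[str, float]]:
    # Peaks at learning_rate = 0.01 and ignores the fidelity parameter.
    return [{"val_accuracy": 100.0 - 1000.0 * abs(config["learning_rate"] - 0.01)}]


best_config, best_value = random_search(
    toy_sample, toy_objective, "val_accuracy", {"epoch": 52}
)
print(best_config, best_value)
```

With YAHPO Gym, `sample` could be `lambda: b.get_opt_space(drop_fidelity_params=True).sample_configuration(1).get_dictionary()` and `evaluate` could be `b.objective_function`, using only the calls shown in this tutorial.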