# Using YAHPO Gym: A quick introduction

Using YAHPO Gym, we can benchmark a new hyperparameter optimization (HPO) method on a large number of problems in a very short time frame.
This tutorial walks through the core concepts and functionality of `yahpo_gym` and showcases a practical example.

YAHPO Gym consists of several collections of `instances`, so-called `scenarios`.

All `instances` within a `scenario` share the same search space and targets.

To provide a more concrete example, the collection of all tasks in `"lcbench"` is a `scenario`,
while a single task (e.g. task `"3945"`) is called an `instance`.

An `instance` thus defines a single HPO problem for a given ML algorithm and task.

## Core functionality: BenchmarkSet
A `BenchmarkSet` can be instantiated using a `scenario` and, optionally, an `instance`.

It contains all logic required to evaluate the surrogate model for a given hyperparameter configuration.


In [3]:
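# Import only the names used in this tutorial
from yahpo_gym import BenchmarkSet, list_scenarios
# Instantiate the benchmark for the "lcbench" scenario
b = BenchmarkSet(scenario="lcbench")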

This allows us to query several important properties of the benchmark problem:

- `scenario`: The scenario of the configuration.
- `y_names`: The names of the target variables included in the surrogate model.
- `hp_names`: The names of all hyperparameters.
- `cat_names`: The names of the categorical hyperparameters.
- `cont_names`: The names of the continuous hyperparameters.
- `fidelity_params`: The name(s) of the fidelity parameter(s).
- `instance_names`: The column pertaining to the available instances in a dataset.
- `runtime_name`: The name of the parameter measuring the runtime of the model.
- `data`: A `pandas` `DataFrame` containing the data used to train the surrogates. Only available if the data was downloaded.

In [18]:
# We can for example query the target outputs of our surrogate:
b.targets

['time',
 'val_accuracy',
 'val_cross_entropy',
 'val_balanced_accuracy',
 'test_cross_entropy',
 'test_balanced_accuracy']

In [19]:
# Or the available instances:
b.instances

['3945',
 '7593',
 '34539',
 '126025',
 '126026',
 '126029',
 '146212',
 '167083',
 '167104',
 '167149',
 '167152',
 '167161',
 '167168',
 '167181',
 '167184',
 '167185',
 '167190',
 '167200',
 '167201',
 '168329',
 '168330',
 '168331',
 '168335',
 '168868',
 '168908',
 '168910',
 '189354',
 '189862',
 '189865',
 '189866',
 '189873',
 '189905',
 '189906',
 '189908',
 '189909']

A list of all available scenarios can be obtained using `list_scenarios()`:

In [20]:
list_scenarios()

['lcbench',
 'fcnet',
 'nb301',
 'rbv2_svm',
 'rbv2_ranger',
 'rbv2_rpart',
 'rbv2_glmnet',
 'rbv2_xgboost',
 'rbv2_aknn',
 'rbv2_super',
 'iaml_ranger',
 'iaml_rpart',
 'iaml_glmnet',
 'iaml_xgboost',
 'iaml_super']
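
Since every scenario exposes its instances, the two listings above can be combined to enumerate every available HPO problem. The following is a minimal sketch rather than part of `yahpo_gym`: `iter_problems` is a hypothetical helper that relies only on `list_scenarios()`, `BenchmarkSet(scenario)` and the `instances` property shown above. The constructor is passed in as an argument, so the helper itself runs without the downloaded metadata.

```python
from typing import Callable, Iterable, Iterator, Tuple


def iter_problems(
    scenarios: Iterable[str],
    make_benchmark: Callable[[str], object],
) -> Iterator[Tuple[str, str]]:
    """Yield every (scenario, instance) pair.

    `make_benchmark` is assumed to behave like `BenchmarkSet`: called with a
    scenario name, it returns an object with an `instances` attribute.
    """
    for scenario in scenarios:
        bench = make_benchmark(scenario)
        for instance in bench.instances:
            yield scenario, instance


# Intended usage (requires yahpo_gym and the downloaded metadata):
#   from yahpo_gym import BenchmarkSet, list_scenarios
#   problems = list(iter_problems(list_scenarios(), BenchmarkSet))
```

Building one `BenchmarkSet` per scenario may load a surrogate each time, so in practice it can be worth restricting `scenarios` to the ones of interest.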

We can now set an instance, which defines the concrete dataset to be evaluated.

We can furthermore obtain the search space (a `ConfigSpace`) using `get_opt_space()`.

Finally, we can sample a concrete configuration and evaluate it using `objective_function`.

In [29]:
# Set an instance
b.set_instance("3945")
# Sample a point from the configspace
xs = b.get_opt_space().sample_configuration(1)
# Evaluate the configuration
b.objective_function(xs)

[{'time': 1.017689,
  'val_accuracy': 85.35962,
  'val_cross_entropy': 0.406623,
  'val_balanced_accuracy': 0.7048986,
  'test_cross_entropy': 0.5105626,
  'test_balanced_accuracy': 0.69233507}]

The input to `objective_function` can be a dictionary (`dict`) or a `ConfigSpace.Configuration`:

In [30]:
xs

Configuration:
  OpenML_task_id, Constant: '3945'
  batch_size, Value: 56
  epoch, Value: 44
  learning_rate, Value: 0.004042817033852988
  max_dropout, Value: 0.6580329526348683
  max_units, Value: 528.9344115500733
  momentum, Value: 0.7264631114797273
  num_layers, Value: 1
  weight_decay, Value: 0.047354860575576266
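
The sample-and-evaluate steps above are already enough to build a simple HPO baseline. The following is a minimal random-search sketch, not a method shipped with `yahpo_gym`: `random_search` and its parameters are hypothetical names, and it assumes that `sample_configuration()` called without arguments returns a single configuration and that `objective_function` returns a list with one result dictionary per configuration, as in the output shown above.

```python
from typing import Any, Tuple


def random_search(bench: Any, target: str, n_trials: int = 20,
                  maximize: bool = True) -> Tuple[Any, float]:
    """Return the best sampled configuration and its value for `target`.

    `bench` is assumed to behave like a `BenchmarkSet` with an instance set.
    """
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    space = bench.get_opt_space()
    best_config, best_value = None, None
    for _ in range(n_trials):
        # Assumption: without a size argument, a single configuration is returned
        config = space.sample_configuration()
        # objective_function returns a list; take the result for our one configuration
        value = bench.objective_function(config)[0][target]
        if best_value is None or (value > best_value if maximize else value < best_value):
            best_config, best_value = config, value
    return best_config, best_value


# Intended usage with the benchmark from above:
#   best_config, best_value = random_search(b, target="val_accuracy", n_trials=50)
```

For targets that should be minimized, such as `val_cross_entropy`, pass `maximize=False`.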

## Setup (One Time)

Before first use, `yahpo_gym` requires a simple one-time setup step to
download all metadata required for prediction using surrogates.

This **metadata** can be downloaded (or cloned) from GitHub:
https://github.com/slds-lmu/yahpo_data

Once downloaded, we can run the chunk below to set up the path to the downloaded metadata.
The following chunk assumes that you downloaded the **metadata** to the "~/yahpo_data" directory.

In [None]:
# Initialize the local config & set path for surrogates and metadata
from yahpo_gym import local_config
local_config.init_config()
local_config.set_data_path("~/yahpo_data")

## Separate Fidelity Space
In some settings, we need the search space (and configurations sampled from it) without
the *fidelity parameters*.
This can be achieved by passing `drop_fidelity_params=True` to `get_opt_space()`:

In [23]:
b = BenchmarkSet("lcbench", instance="3945")
# Sample a point from the configspace
xs = b.get_opt_space(drop_fidelity_params=True).sample_configuration(1)

In [24]:
# The fidelity param 'epoch' is no longer part of the configuration:
xs

Configuration:
  OpenML_task_id, Constant: '3945'
  batch_size, Value: 76
  learning_rate, Value: 0.006285128775917201
  max_dropout, Value: 0.2810167283785543
  max_units, Value: 454.16949608235694
  momentum, Value: 0.44264088364496557
  num_layers, Value: 1
  weight_decay, Value: 0.027351429383097828

In [25]:
# Convert to dictionary and add epoch
xs = xs.get_dictionary()
xs.update({'epoch': 52})
xs

{'OpenML_task_id': '3945',
 'batch_size': 76,
 'learning_rate': 0.006285128775917201,
 'max_dropout': 0.2810167283785543,
 'max_units': 454.16949608235694,
 'momentum': 0.44264088364496557,
 'num_layers': 1,
 'weight_decay': 0.027351429383097828,
 'epoch': 52}

In [26]:
b.objective_function(xs)

[{'time': 1.017689,
  'val_accuracy': 83.04666,
  'val_cross_entropy': 0.40995848,
  'val_balanced_accuracy': 0.7308816,
  'test_cross_entropy': 0.52149105,
  'test_balanced_accuracy': 0.7200676}]
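
Keeping the fidelity parameter separate makes it easy to evaluate one configuration at several fidelity levels, for example to trace a learning curve over epochs. The following is a minimal sketch under the same assumptions as above (dictionary input, list-of-dictionaries output); `fidelity_curve` and its parameters are hypothetical names, not part of `yahpo_gym`.

```python
from typing import Any, Iterable, List, Mapping, Tuple


def fidelity_curve(bench: Any, config: Mapping[str, Any], fidelity_name: str,
                   fidelity_values: Iterable[Any], target: str) -> List[Tuple[Any, float]]:
    """Evaluate `config` once per fidelity value and collect (fidelity, target) pairs."""
    curve = []
    for value in fidelity_values:
        full_config = dict(config)          # copy, so the caller's config is not modified
        full_config[fidelity_name] = value  # add or override the fidelity parameter
        result = bench.objective_function(full_config)[0]
        curve.append((value, result[target]))
    return curve


# Intended usage with the dictionary `xs` from above:
#   curve = fidelity_curve(b, xs, "epoch", [10, 25, 52], "val_accuracy")
```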