# Introduction
This notebook includes toy examples to demonstrate how to tune User Defined Functions with `flaml.tune`.

FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the `notebook` option:
```bash
pip install flaml[notebook]
```

In [None]:
%pip install flaml[notebook]
# from v0.6.6, catboost is made an optional dependency to build conda package.
# to install catboost without installing the notebook option, you can run:
# %pip install flaml[catboost]

# Basic tuning procedure
## 1. A basic tuning example

In [1]:
'''Set a search space'''
from flaml import tune
config_search_space = {
    "x": tune.lograndint(lower=1, upper=100000),
    "y": tune.randint(lower=1, upper=100000)
}  

In [2]:
'''Write a evaluation function'''
import time
def evaluate_config(config: dict):
    """evaluate a hyperparameter configuration"""
    score = (config["x"] - 85000) ** 2 - config["x"] / config["y"]
    # usually the evaluation takes an non-neglible cost
    # and the cost could be related to certain hyperparameters
    # here we simulate this cost by calling the time.sleep() function
    # here we assume the cost is proportional to x
    faked_evaluation_cost = config["x"] / 100000
    time.sleep(faked_evaluation_cost)
    # we can return a single float as a score on the input config:
    # return score
    # or, we can return a dictionary that maps metric name to metric value:
    return {"score": score, "evaluation_cost": faked_evaluation_cost, "constraint_metric": config["x"] * config["y"]}

In [3]:
''''Performs tuning'''
# require: pip install flaml[blendsearch]
analysis = tune.run(
    evaluate_config,  # the function to evaluate a config
    config=config_search_space,  # the search space defined
    metric="score",
    mode="min",  # the optimization mode, "min" or "max"
    num_samples=-1,  # the maximal number of configs to try, -1 means infinite
    time_budget_s=10,  # the time budget in seconds
)

You passed a `space` parameter to OptunaSearch that contained unresolved search space definitions. OptunaSearch should however be instantiated with fully configured search spaces only. To use Ray Tune's automatic search space conversion, pass the space definition as part of the `config` argument to `tune.run()` instead.
[32m[I 2022-08-15 01:50:56,441][0m A new study created in memory with name: optuna[0m
[flaml.tune.tune: 08-15 01:50:56] {506} INFO - trial 1 config: {'x': 3, 'y': 13184}
[flaml.tune.tune: 08-15 01:50:56] {506} INFO - trial 2 config: {'x': 6134, 'y': 2076}
[flaml.tune.tune: 08-15 01:50:56] {506} INFO - trial 3 config: {'x': 1143, 'y': 74880}
[flaml.tune.tune: 08-15 01:50:56] {506} INFO - trial 4 config: {'x': 5539, 'y': 1}
[flaml.tune.tune: 08-15 01:50:56] {506} INFO - trial 5 config: {'x': 6793, 'y': 16190}
[flaml.tune.tune: 08-15 01:50:56] {506} INFO - trial 6 config: {'x': 220, 'y': 22480}
[flaml.tune.tune: 08-15 01:50:56] {506} INFO - trial 7 config: {'x': 6, 'y':

In [4]:
'''Investigate results'''
print(analysis.best_result)

{'score': 138344643.26761267, 'evaluation_cost': 0.73238, 'constraint_metric': 7323726762, 'training_iteration': 0, 'config': {'x': 73238, 'y': 99999}, 'config/x': 73238, 'config/y': 99999, 'experiment_tag': 'exp', 'time_total_s': 0.7344884872436523}


print(analysis.result)

## Hierarchical search space 
Hierarchical search space is supported.

In [5]:
'''Set a hierarchical search space'''
from flaml import tune
gbtree_hp_space = {
        "booster": "gbtree",
        "n_estimators": tune.lograndint(lower=4, upper=64),
        "max_leaves": tune.lograndint(lower=4, upper=64),
        "learning_rate": tune.loguniform(lower=1 / 1024, upper=1.0),
    }
gblinear_hp_space = {
    "booster": "gblinear",
    "lambda": tune.uniform(0, 1),
    "alpha": tune.loguniform(0.0001, 1),
}

full_space = {
    "xgb_config": tune.choice([gbtree_hp_space, gblinear_hp_space]),
}

In [6]:
'''Write a evaluation function'''
import xgboost as xgb
def xgb_obj(X_train, X_test, y_train, y_test, config):
    config = config["xgb_config"]
    params = config2params(config)
    dtrain = xgb.DMatrix(X_train, label=y_train)
    booster_type = config.get("booster")

    if booster_type == "gblinear":
        model = xgb.train(
            params,
            dtrain,
        )
    else:
        _n_estimators = params.pop("n_estimators")
        model = xgb.train(params, dtrain, _n_estimators)

    # get validation loss
    from sklearn.metrics import r2_score

    dtest = xgb.DMatrix(X_test)
    y_test_predict = model.predict(dtest)
    test_loss = 1.0 - r2_score(y_test, y_test_predict)
    return {"loss": test_loss}

def config2params(config: dict) -> dict:
    params = config.copy()
    max_depth = params["max_depth"] = params.get("max_depth", 0)
    if max_depth == 0:
        params["grow_policy"] = params.get("grow_policy", "lossguide")
        params["tree_method"] = params.get("tree_method", "hist")
    # params["booster"] = params.get("booster", "gbtree")
    params["use_label_encoder"] = params.get("use_label_encoder", False)
    if "n_jobs" in config:
        params["nthread"] = params.pop("n_jobs")
    return params

In [7]:
'''Tune xgb_obj with configs from the hierarchical search space'''
from flaml.data import load_openml_dataset
from functools import partial

X_train, X_test, y_train, y_test = load_openml_dataset(
    dataset_id=537, data_dir="./"
)
analysis = tune.run(
    partial(xgb_obj, X_train, X_test, y_train, y_test),
    config=full_space,
    metric="loss",
    mode="min",
    num_samples=5,
)
print("analysis", analysis.results)

[32m[I 2022-08-15 01:51:08,271][0m A new study created in memory with name: optuna[0m
[flaml.tune.tune: 08-15 01:51:08] {506} INFO - trial 1 config: {'xgb_config': {'booster': 'gblinear', 'lambda': 0.6472660813321921, 'alpha': 0.0028264214081400044}}
[flaml.tune.tune: 08-15 01:51:08] {506} INFO - trial 2 config: {'xgb_config': {'n_estimators': 22, 'max_leaves': 31, 'learning_rate': 0.0309282737630552, 'booster': 'gbtree'}}
[flaml.tune.tune: 08-15 01:51:08] {506} INFO - trial 3 config: {'xgb_config': {'n_estimators': 32, 'max_leaves': 6, 'learning_rate': 0.0018014797394283806, 'booster': 'gbtree'}}


load dataset from ./openml_ds537.pkl
Dataset name: houses
X_train.shape: (15480, 8), y_train.shape: (15480,);
X_test.shape: (5160, 8), y_test.shape: (5160,)
Parameters: { grow_policy, max_depth, tree_method, use_label_encoder } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


Parameters: { use_label_encoder } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


Parameters: { use_label_encoder } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip thro

[flaml.tune.tune: 08-15 01:51:08] {506} INFO - trial 4 config: {'xgb_config': {'lambda': 0.003948266327914451, 'alpha': 0.011188427539040417, 'booster': 'gblinear'}}
[flaml.tune.tune: 08-15 01:51:08] {506} INFO - trial 5 config: {'xgb_config': {'n_estimators': 28, 'max_leaves': 8, 'learning_rate': 0.5655557791092936, 'booster': 'gbtree'}}



Parameters: { grow_policy, max_depth, tree_method, use_label_encoder } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


Parameters: { use_label_encoder } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


analysis {'bb341b44': {'loss': 0.573919882488896, 'training_iteration': 0, 'config': {'xgb_config': {'booster': 'gblinear', 'lambda': 0.6472660813321921, 'alpha': 0.0028264214081400044}}, 'config/xgb_config': {'booster': 'gblinear', 'lambda': 0.6472660813321921, 'alpha': 0.0028264214081400044}, 'experiment_tag': 'exp', 'time_total_s': 0.025920391082763672}, 'bb387e14': {'lo

# Advanced Tuning Options

## 1. Constraints on the tuning

1. A user can specify constraints on the configurations to be satisfied via the argument `config_constraints`.

In [8]:
from flaml import tune
def area(config):
    return config["width"] * config["length"]

cube_search_space = {
    "width": tune.lograndint(lower=1, upper=1000),
    "height": tune.randint(lower=1, upper=1000),
    "length": tune.randint(lower=1, upper=1000),
}

def cube_volume(config: dict):
    """evaluate a hyperparameter configuration"""
    score = config["width"] * config["height"] * config["length"]
    return {"volume": score}

analysis = tune.run(evaluation_function=cube_volume,
         mode="min",
         metric="volume",
         config=cube_search_space,
         config_constraints=[(area, "<=", 1000)],
         num_samples=5,
        )
print(analysis.best_result)

You passed a `space` parameter to OptunaSearch that contained unresolved search space definitions. OptunaSearch should however be instantiated with fully configured search spaces only. To use Ray Tune's automatic search space conversion, pass the space definition as part of the `config` argument to `tune.run()` instead.
[32m[I 2022-08-15 01:51:09,993][0m A new study created in memory with name: optuna[0m
[flaml.tune.tune: 08-15 01:51:10] {506} INFO - trial 1 config: {'width': 1, 'height': 132, 'length': 647}
[flaml.tune.tune: 08-15 01:51:10] {506} INFO - trial 2 config: {'width': 2, 'height': 760, 'length': 169}
[flaml.tune.tune: 08-15 01:51:10] {506} INFO - trial 3 config: {'width': 1, 'height': 685, 'length': 953}
[flaml.tune.tune: 08-15 01:51:10] {506} INFO - trial 4 config: {'width': 1, 'height': 512, 'length': 812}
[flaml.tune.tune: 08-15 01:51:10] {506} INFO - trial 5 config: {'width': 1, 'height': 373, 'length': 674}


{'volume': 85404, 'training_iteration': 0, 'config': {'width': 1, 'height': 132, 'length': 647}, 'config/width': 1, 'config/height': 132, 'config/length': 647, 'experiment_tag': 'exp', 'time_total_s': 0.006061077117919922}


2.  You can also specify a list of metric constraints to be satisfied via the argument `metric_constraints`.

In [9]:
from flaml import tune
tune.run(evaluation_function=evaluate_config,
         mode="min",
         metric="score",
         config=config_search_space,
         metric_constraints=[("evaluation_cost", "<=", 0.1)],
         num_samples=5,)

You passed a `space` parameter to OptunaSearch that contained unresolved search space definitions. OptunaSearch should however be instantiated with fully configured search spaces only. To use Ray Tune's automatic search space conversion, pass the space definition as part of the `config` argument to `tune.run()` instead.
[32m[I 2022-08-15 01:51:11,386][0m A new study created in memory with name: optuna[0m
[flaml.tune.tune: 08-15 01:51:11] {506} INFO - trial 1 config: {'x': 3, 'y': 13184}
[flaml.tune.tune: 08-15 01:51:11] {506} INFO - trial 2 config: {'x': 6134, 'y': 2076}
[flaml.tune.tune: 08-15 01:51:11] {506} INFO - trial 3 config: {'x': 1143, 'y': 74880}
[flaml.tune.tune: 08-15 01:51:11] {506} INFO - trial 4 config: {'x': 5539, 'y': 1}
[flaml.tune.tune: 08-15 01:51:11] {506} INFO - trial 5 config: {'x': 6793, 'y': 16190}


<flaml.tune.tune.ExperimentAnalysis at 0x7f066a190a00>

### config_constraints vs metric_constraints

In [10]:
'''Write a evaluation function'''
import xgboost as xgb
from flaml import tune
import time
def xgb_simple_obj(X_train, X_test, y_train, y_test, config):
    params = config2params(config)
    dtrain = xgb.DMatrix(X_train, label=y_train)
    start_time = time.time()
    _n_estimators = params.pop("n_estimators")
    model = xgb.train(params, dtrain, _n_estimators)
    end_time = time.time()
    # get validation loss
    from sklearn.metrics import r2_score
    dtest = xgb.DMatrix(X_test)
    y_test_predict = model.predict(dtest)
    test_loss = 1.0 - r2_score(y_test, y_test_predict)
    return {"loss": test_loss, "training_cost": end_time-start_time}

def config2params(config: dict) -> dict:
    params = config.copy()
    max_depth = params["max_depth"] = params.get("max_depth", 0)
    if max_depth == 0:
        params["grow_policy"] = params.get("grow_policy", "lossguide")
        params["tree_method"] = params.get("tree_method", "hist")
    # params["booster"] = params.get("booster", "gbtree")
    params["use_label_encoder"] = params.get("use_label_encoder", False)
    if "n_jobs" in config:
        params["nthread"] = params.pop("n_jobs")
    return params

def my_model_size(config):
    return config["n_estimators"] * config["max_leaves"]

'''Tune xgb_obj with configs from the hierarchical search space'''
from flaml.data import load_openml_dataset
from functools import partial

xgb_space = {
     "n_estimators": tune.randint(lower=4, upper=64),
      "max_leaves": tune.randint(lower=4, upper=64),
      "learning_rate": tune.loguniform(lower=1 / 1024, upper=1.0),
}
X_train, X_test, y_train, y_test = load_openml_dataset(
    dataset_id=537, data_dir="./"
)
analysis = tune.run(
    partial(xgb_simple_obj, X_train, X_test, y_train, y_test),
    config=xgb_space,
    metric="loss",
    mode="min",
    config_constraints = [(my_model_size, "<=", 40)],
    metric_constraints = [("training_cost", "<=", 1)],
    num_samples=20,
)
print("analysis", analysis.results)

You passed a `space` parameter to OptunaSearch that contained unresolved search space definitions. OptunaSearch should however be instantiated with fully configured search spaces only. To use Ray Tune's automatic search space conversion, pass the space definition as part of the `config` argument to `tune.run()` instead.
[32m[I 2022-08-15 01:51:12,028][0m A new study created in memory with name: optuna[0m
[flaml.tune.tune: 08-15 01:51:12] {506} INFO - trial 1 config: {'n_estimators': 39, 'max_leaves': 9, 'learning_rate': 0.08672915197219133}


load dataset from ./openml_ds537.pkl
Dataset name: houses
X_train.shape: (15480, 8), y_train.shape: (15480,);
X_test.shape: (5160, 8), y_test.shape: (5160,)
Parameters: { use_label_encoder } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.




[flaml.tune.tune: 08-15 01:51:12] {506} INFO - trial 2 config: {'n_estimators': 7, 'max_leaves': 5, 'learning_rate': 0.6111947006764871}
Received additional result for trial bda97d2e, but it already finished. Result: {'loss': 0.3552178044511305, 'training_cost': 0.029860258102416992, 'training_iteration': 0, 'config': {'n_estimators': 7, 'max_leaves': 5, 'learning_rate': 0.6111947006764871}, 'config/n_estimators': 7, 'config/max_leaves': 5, 'config/learning_rate': 0.6111947006764871, 'experiment_tag': 'exp', 'time_total_s': 0.037056922912597656, 'loss_lagrange': 0.3552178044511305}
Received additional completion for trial bda97d2e, but it already finished. Result: {'loss': 0.3552178044511305, 'training_cost': 0.029860258102416992, 'training_iteration': 0, 'config': {'n_estimators': 7, 'max_leaves': 5, 'learning_rate': 0.6111947006764871}, 'config/n_estimators': 7, 'config/max_leaves': 5, 'config/learning_rate': 0.6111947006764871, 'experiment_tag': 'exp', 'time_total_s': 0.037056922912

Parameters: { use_label_encoder } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.




[flaml.tune.tune: 08-15 01:51:13] {506} INFO - trial 3 config: {'n_estimators': 6, 'max_leaves': 5, 'learning_rate': 0.18074443349590638}
Received additional result for trial be41014e, but it already finished. Result: {'loss': 0.8238569381974518, 'training_cost': 0.028520584106445312, 'training_iteration': 0, 'config': {'n_estimators': 6, 'max_leaves': 5, 'learning_rate': 0.18074443349590638}, 'config/n_estimators': 6, 'config/max_leaves': 5, 'config/learning_rate': 0.18074443349590638, 'experiment_tag': 'exp', 'time_total_s': 0.03594350814819336, 'loss_lagrange': 0.8238569381974518}
Received additional completion for trial be41014e, but it already finished. Result: {'loss': 0.8238569381974518, 'training_cost': 0.028520584106445312, 'training_iteration': 0, 'config': {'n_estimators': 6, 'max_leaves': 5, 'learning_rate': 0.18074443349590638}, 'config/n_estimators': 6, 'config/max_leaves': 5, 'config/learning_rate': 0.18074443349590638, 'experiment_tag': 'exp', 'time_total_s': 0.03594350

Parameters: { use_label_encoder } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


Parameters: { use_label_encoder } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.




[flaml.tune.tune: 08-15 01:51:13] {506} INFO - trial 5 config: {'n_estimators': 4, 'max_leaves': 8, 'learning_rate': 0.24184523333348865}
Received additional result for trial be74d082, but it already finished. Result: {'loss': 0.8420746064984069, 'training_cost': 0.02660083770751953, 'training_iteration': 0, 'config': {'n_estimators': 4, 'max_leaves': 8, 'learning_rate': 0.24184523333348865}, 'config/n_estimators': 4, 'config/max_leaves': 8, 'config/learning_rate': 0.24184523333348865, 'experiment_tag': 'exp', 'time_total_s': 0.033454179763793945, 'loss_lagrange': 0.8420746064984069}
Received additional completion for trial be74d082, but it already finished. Result: {'loss': 0.8420746064984069, 'training_cost': 0.02660083770751953, 'training_iteration': 0, 'config': {'n_estimators': 4, 'max_leaves': 8, 'learning_rate': 0.24184523333348865}, 'config/n_estimators': 4, 'config/max_leaves': 8, 'config/learning_rate': 0.24184523333348865, 'experiment_tag': 'exp', 'time_total_s': 0.033454179

Parameters: { use_label_encoder } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.




[flaml.tune.tune: 08-15 01:51:14] {506} INFO - trial 6 config: {'n_estimators': 8, 'max_leaves': 4, 'learning_rate': 0.6006787986201269}
[flaml.tune.tune: 08-15 01:51:14] {506} INFO - trial 7 config: {'n_estimators': 8, 'max_leaves': 4, 'learning_rate': 0.5793237833265791}
[flaml.tune.tune: 08-15 01:51:14] {506} INFO - trial 8 config: {'n_estimators': 8, 'max_leaves': 5, 'learning_rate': 0.6139350165706452}
[flaml.tune.tune: 08-15 01:51:14] {506} INFO - trial 9 config: {'n_estimators': 8, 'max_leaves': 4, 'learning_rate': 0.5823407285827188}


Parameters: { use_label_encoder } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


Parameters: { use_label_encoder } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


Parameters: { use_label_encoder } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


Parameters: { use_label_encoder } might not be used.

  This may not be accurate due to some parameters are only used in 

[flaml.tune.tune: 08-15 01:51:14] {506} INFO - trial 10 config: {'n_estimators': 7, 'max_leaves': 4, 'learning_rate': 0.6153425666105765}
[flaml.tune.tune: 08-15 01:51:14] {506} INFO - trial 11 config: {'n_estimators': 10, 'max_leaves': 4, 'learning_rate': 0.0058671903833274665}


Parameters: { use_label_encoder } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


Parameters: { use_label_encoder } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.




[flaml.tune.tune: 08-15 01:51:14] {506} INFO - trial 12 config: {'n_estimators': 10, 'max_leaves': 4, 'learning_rate': 0.005472055643063294}
[flaml.tune.tune: 08-15 01:51:14] {506} INFO - trial 13 config: {'n_estimators': 10, 'max_leaves': 4, 'learning_rate': 0.005482932727232923}
[flaml.tune.tune: 08-15 01:51:14] {506} INFO - trial 14 config: {'n_estimators': 10, 'max_leaves': 4, 'learning_rate': 0.00451003340688015}
[flaml.tune.tune: 08-15 01:51:14] {506} INFO - trial 15 config: {'n_estimators': 8, 'max_leaves': 4, 'learning_rate': 0.004559264656132888}


Parameters: { use_label_encoder } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


Parameters: { use_label_encoder } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


Parameters: { use_label_encoder } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


Parameters: { use_label_encoder } might not be used.

  This may not be accurate due to some parameters are only used in 

[flaml.tune.tune: 08-15 01:51:14] {506} INFO - trial 16 config: {'n_estimators': 8, 'max_leaves': 4, 'learning_rate': 0.005491692419583242}
[flaml.tune.tune: 08-15 01:51:14] {506} INFO - trial 17 config: {'n_estimators': 8, 'max_leaves': 4, 'learning_rate': 0.005400819560603306}
[flaml.tune.tune: 08-15 01:51:15] {506} INFO - trial 18 config: {'n_estimators': 8, 'max_leaves': 4, 'learning_rate': 0.0056842152617478025}
[flaml.tune.tune: 08-15 01:51:15] {506} INFO - trial 19 config: {'n_estimators': 8, 'max_leaves': 4, 'learning_rate': 0.005370232296413302}


Parameters: { use_label_encoder } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


Parameters: { use_label_encoder } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


Parameters: { use_label_encoder } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


Parameters: { use_label_encoder } might not be used.

  This may not be accurate due to some parameters are only used in 

[flaml.tune.tune: 08-15 01:51:15] {506} INFO - trial 20 config: {'n_estimators': 8, 'max_leaves': 4, 'learning_rate': 0.005273310814672741}


Parameters: { use_label_encoder } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


analysis {'bd7170b4': {'loss': 0.2932019932272869, 'training_cost': 0.30817413330078125, 'training_iteration': 0, 'config': {'n_estimators': 39, 'max_leaves': 9, 'learning_rate': 0.08672915197219133}, 'config/n_estimators': 39, 'config/max_leaves': 9, 'config/learning_rate': 0.08672915197219133, 'experiment_tag': 'exp', 'time_total_s': 0.32521629333496094, 'loss_lagrange': 0.2932019932272869}, 'bda97d2e': {'loss': 0.3552178044511305, 'training_cost': 0.029860258102416992, 'training_iteration': 0, 'config': {'n_estimators': 7, 'max_leaves': 5, 'learning_rate': 0.6111947006764871}, 'config/n_estimators': 7, 'config/max_leaves': 5, 'config/learning_rate': 0.6111947006764871, 'experiment_tag': 'exp', '

## Parallel tuning

In [11]:
%pip install flaml[ray]

Note: you may need to restart the kernel to use updated packages.


In [12]:
# require: pip install flaml[ray]
analysis = tune.run(
    evaluate_config,  # the function to evaluate a config
    config=config_search_space,  # the search space defined
    metric="score",
    mode="min",  # the optimization mode, "min" or "max"
    num_samples=-1,  # the maximal number of configs to try, -1 means infinite
    time_budget_s=10,  # the time budget in seconds
    use_ray=True,
    resources_per_trial={"cpu": 2}  # limit resources allocated per trial
)
print(analysis.best_trial.last_result)  # the best trial's result
print(analysis.best_config)  # the best config

You passed a `space` parameter to OptunaSearch that contained unresolved search space definitions. OptunaSearch should however be instantiated with fully configured search spaces only. To use Ray Tune's automatic search space conversion, pass the space definition as part of the `config` argument to `tune.run()` instead.
[32m[I 2022-08-15 01:51:18,917][0m A new study created in memory with name: optuna[0m
2022-08-15 01:51:20,536	INFO services.py:1470 -- View the Ray dashboard at [1m[32mhttp://127.0.0.1:8265[39m[22m


Trial evaluate_config_c536df28 reported score=7224490009.00 with parameters={'x': 3, 'y': 13184}.
Trial evaluate_config_c536df28 completed. Last result: score=7224490008.999772,evaluation_cost=3e-05,constraint_metric=39552
Trial evaluate_config_c65a5d62 reported score=6219845953.05 with parameters={'x': 6134, 'y': 2076}.
Trial evaluate_config_c65a5d62 completed. Last result: score=6219845953.0452795,evaluation_cost=0.06134,constraint_metric=12734184
Trial evaluate_config_c65dd12c reported score=7031996448.98 with parameters={'x': 1143, 'y': 74880}.
Trial evaluate_config_c65dd12c completed. Last result: score=7031996448.9847355,evaluation_cost=0.01143,constraint_metric=85587840
Trial evaluate_config_c669f7d6 reported score=7187648399.99 with parameters={'x': 220, 'y': 22480}.
Trial evaluate_config_c669f7d6 completed. Last result: score=7187648399.990213,evaluation_cost=0.0022,constraint_metric=4945600
Trial evaluate_config_c66cf5f8 reported score=6314044982.00 with parameters={'x': 5539

Trial evaluate_config_c7bdcc34 completed. Last result: score=1980784035.353379,evaluation_cost=0.40494,constraint_metric=2535896256
Trial evaluate_config_c858d602 reported score=594628223.34 with parameters={'x': 60615, 'y': 36580}.
Trial evaluate_config_c858d602 completed. Last result: score=594628223.342947,evaluation_cost=0.60615,constraint_metric=2217296700
Trial evaluate_config_c89b5202 reported score=224969999.43 with parameters={'x': 99999, 'y': 63494}.
Trial evaluate_config_c89b5202 completed. Last result: score=224969999.4250638,evaluation_cost=0.99999,constraint_metric=6349336506
Trial evaluate_config_c8fa5338 reported score=2230483983.38 with parameters={'x': 37772, 'y': 61372}.
Trial evaluate_config_c8fa5338 completed. Last result: score=2230483983.38454,evaluation_cost=0.37772,constraint_metric=2318143184
Trial evaluate_config_c9957e08 reported score=224969998.42 with parameters={'x': 99999, 'y': 38702}.
Trial evaluate_config_c9957e08 completed. Last result: score=22496999

2022-08-15 01:51:35,841	INFO stopper.py:363 -- Reached timeout of 10 seconds. Stopping all trials.


Trial evaluate_config_caa8a734 reported score=224969997.73 with parameters={'x': 99999, 'y': 30598}.


Trial name,status,loc,x,y,iter,total time (s),score,evaluation_cost,constraint_metric
evaluate_config_c536df28,TERMINATED,172.31.46.142:32472,3,13184,1.0,0.00030303,7224490000.0,3e-05,39552.0
evaluate_config_c65a5d62,TERMINATED,172.31.46.142:32472,6134,2076,1.0,0.0616226,6219850000.0,0.06134,12734184.0
evaluate_config_c65dd12c,TERMINATED,172.31.46.142:32472,1143,74880,1.0,0.0119874,7032000000.0,0.01143,85587840.0
evaluate_config_c669f7d6,TERMINATED,172.31.46.142:32472,220,22480,1.0,0.00262713,7187650000.0,0.0022,4945600.0
evaluate_config_c66cf5f8,TERMINATED,172.31.46.142:32472,5539,1,1.0,0.0560231,6314040000.0,0.05539,5539.0
evaluate_config_c66f1acc,TERMINATED,172.31.46.142:32472,6,76053,1.0,0.00024724,7223980000.0,6e-05,456318.0
evaluate_config_c679a672,TERMINATED,172.31.46.142:32472,4,8834,1.0,0.000227928,7224320000.0,4e-05,35336.0
evaluate_config_c67c90c6,TERMINATED,172.31.46.142:32472,6793,16190,1.0,0.0681667,6116330000.0,0.06793,109978670.0
evaluate_config_c67eddea,TERMINATED,172.31.46.142:32472,2148,95339,1.0,0.021688,6864450000.0,0.02148,204788172.0
evaluate_config_c68b508e,TERMINATED,172.31.46.142:32472,1,51219,1.0,0.000186205,7224830000.0,1e-05,51219.0


2022-08-15 01:51:36,123	INFO tune.py:747 -- Total run time: 13.49 seconds (10.84 seconds for the tuning loop).


{'score': 224969997.73184523, 'evaluation_cost': 0.99999, 'constraint_metric': 3059769402, 'time_this_iter_s': 1.0003881454467773, 'done': False, 'timesteps_total': None, 'episodes_total': None, 'training_iteration': 1, 'trial_id': 'caa8a734', 'experiment_id': '6b71e8bc54c0420c8c26b0c528a6d9b6', 'date': '2022-08-15_01-51-35', 'timestamp': 1660528295, 'time_total_s': 1.0003881454467773, 'pid': 32472, 'hostname': 'ip-172-31-46-142.us-east-2.compute.internal', 'node_ip': '172.31.46.142', 'config': {'x': 99999, 'y': 30598}, 'time_since_restore': 1.0003881454467773, 'timesteps_since_restore': 0, 'iterations_since_restore': 1, 'warmup_time': 0.0036840438842773438, 'experiment_tag': '25_x=99999,y=30598'}
{'x': 99999, 'y': 30598}


## Warm start

In [13]:
from flaml import tune

config_search_space = {
    "a": tune.uniform(lower=0, upper=0.99),
    "b": tune.uniform(lower=0, upper=3),
}

def simple_obj(config):
    return config["a"] + config["b"]

points_to_evaluate = [
    {"b": .99, "a": 3},
    {"b": .99, "a": 2},
    {"b": .80, "a": 3},
    {"b": .80, "a": 2},
]
evaluated_rewards = [3.99, 2.99]

analysis = tune.run(
    simple_obj,
    config=config_search_space,
    mode="max",
    points_to_evaluate=points_to_evaluate,
    evaluated_rewards=evaluated_rewards,
    num_samples=10,
)

[32m[I 2022-08-15 01:51:37,168][0m A new study created in memory with name: optuna[0m
  trial = ot.trial.create_trial(
[32m[I 2022-08-15 01:51:37,173][0m A new study created in memory with name: optuna[0m
[flaml.tune.tune: 08-15 01:51:37] {506} INFO - trial 1 config: {'b': 0.8, 'a': 3.0}
[flaml.tune.tune: 08-15 01:51:37] {506} INFO - trial 2 config: {'b': 0.8, 'a': 2.0}
[flaml.tune.tune: 08-15 01:51:37] {506} INFO - trial 3 config: {'a': 0.7636074368340785, 'b': 0.0622558480782045}
[flaml.tune.tune: 08-15 01:51:37] {506} INFO - trial 4 config: {'a': 0.6273117525770127, 'b': 2.246411647615836}
[flaml.tune.tune: 08-15 01:51:37] {506} INFO - trial 5 config: {'a': 0.4935219421795645, 'b': 0.674389936592543}
[flaml.tune.tune: 08-15 01:51:37] {506} INFO - trial 6 config: {'a': 0.19608223611202774, 'b': 2.2815921365968763}
[flaml.tune.tune: 08-15 01:51:37] {506} INFO - trial 7 config: {'a': 0.1674197281969101, 'b': 0.2650194425220308}
[flaml.tune.tune: 08-15 01:51:37] {506} INFO - trial

## Trial scheduling

###  An authentic scheduler implemented in FLAML (`scheduler='flaml'`).

In [14]:
search_space = {
        "n_estimators": tune.lograndint(lower=4, upper=32768),
        "max_leaves": tune.lograndint(lower=4, upper=32768),
        "learning_rate": tune.loguniform(lower=1 / 1024, upper=1.0),
    }

In [15]:
'''Set a evaluation function with resource dimension'''
def obj_from_resource_attr(resource_attr, X_train, X_test, y_train, y_test, config):
    from lightgbm import LGBMClassifier
    from sklearn.metrics import accuracy_score

    # in this example sample size is our resource dimension
    resource = int(config[resource_attr])
    sampled_X_train = X_train.iloc[:resource]
    sampled_y_train = y_train[:resource]

    # construct a LGBM model from the config
    # note that you need to first remove the resource_attr field
    # from the config as it is not part of the original search space
    model_config = config.copy()
    del model_config[resource_attr]
    model = LGBMClassifier(**model_config)

    model.fit(sampled_X_train, sampled_y_train)
    y_test_predict = model.predict(X_test)
    test_loss = 1.0 - accuracy_score(y_test, y_test_predict)
    return {"loss": test_loss}

In [16]:
from flaml import tune
from functools import partial
from flaml.data import load_openml_task
    
X_train, X_test, y_train, y_test = load_openml_task(task_id=7592, data_dir="")
max_resource = len(y_train)
resource_attr = "sample_size"
min_resource = 1000
analysis = tune.run(
    partial(
        obj_from_resource_attr, resource_attr, X_train, X_test, y_train, y_test
    ),
    config=search_space,
    metric="loss",
    mode="min",
    resource_attr=resource_attr,
    scheduler="flaml",
    max_resource=max_resource,
    min_resource=min_resource,
    reduction_factor=2,
    time_budget_s=10,
    num_samples=-1,
)
print("best result w/ flaml scheduler (in 10s): ", analysis.best_result)

[32m[I 2022-08-15 01:51:39,900][0m A new study created in memory with name: optuna[0m
[flaml.tune.tune: 08-15 01:51:39] {506} INFO - trial 1 config: {'n_estimators': 9, 'max_leaves': 1364, 'learning_rate': 0.012074374674294664, 'sample_size': 1000}


download dataset from openml
X_train.shape: (43957, 14), y_train.shape: (43957,),
X_test.shape: (4885, 14), y_test.shape: (4885,)


Error: Canceled future for execute_request message before replies were done

###  ASHA scheduler (`scheduler='asha'`) or a custom scheduler of the  [`TrialScheduler`](https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#tune-schedulers) class from `ray.tune`.

In [None]:
def obj_w_intermediate_report(
        resource_attr,
        X_train,
        X_test,
        y_train,
        y_test,
        min_resource,
        max_resource,
        config,
    ):
    from lightgbm import LGBMClassifier
    from sklearn.metrics import accuracy_score

    # a customized schedule to perform the evaluation
    eval_schedule = [res for res in range(min_resource, max_resource, 5000)] + [
        max_resource
    ]
    for resource in eval_schedule:
        sampled_X_train = X_train.iloc[:resource]
        sampled_y_train = y_train[:resource]

        # construct a LGBM model from the config
        model = LGBMClassifier(**config)

        model.fit(sampled_X_train, sampled_y_train)
        y_test_predict = model.predict(X_test)
        test_loss = 1.0 - accuracy_score(y_test, y_test_predict)
        # need to report the resource attribute used and the corresponding intermediate results
        try:
            tune.report(sample_size=resource, loss=test_loss)
        except StopIteration:
            return

In [None]:
X_train, X_test, y_train, y_test = load_openml_task(task_id=7592, data_dir="")
resource_attr = "sample_size"
min_resource = 1000
max_resource = len(y_train)
analysis = tune.run(
    partial(
        obj_w_intermediate_report,
        resource_attr,
        X_train,
        X_test,
        y_train,
        y_test,
        min_resource,
        max_resource,
    ),
    config=search_space,
    metric="loss",
    mode="min",
    resource_attr=resource_attr,
    scheduler="asha",
    max_resource=max_resource,
    min_resource=min_resource,
    reduction_factor=2,
    time_budget_s=10,
    num_samples=-1,
)
print("best result w/ asha scheduler (in 10s): ", analysis.best_result)

[32m[I 2022-07-07 04:19:32,321][0m A new study created in memory with name: optuna[0m
[flaml.tune.tune: 07-07 04:19:32] {506} INFO - trial 1 config: {'n_estimators': 9, 'max_leaves': 1364, 'learning_rate': 0.012074374674294664}


load dataset from openml_task7592.pkl
X_train.shape: (43957, 14), y_train.shape: (43957,),
X_test.shape: (4885, 14), y_test.shape: (4885,)


[flaml.tune.tune: 07-07 04:19:33] {506} INFO - trial 2 config: {'n_estimators': 4048, 'max_leaves': 4, 'learning_rate': 0.07891713267442702}


best result w/ asha scheduler (in 10s):  {'sample_size': 43957, 'loss': 0.13920163766632554, 'training_iteration': 9, 'config': {'n_estimators': 4048, 'max_leaves': 4, 'learning_rate': 0.07891713267442702}, 'config/n_estimators': 4048, 'config/max_leaves': 4, 'config/learning_rate': 0.07891713267442702, 'experiment_tag': 'exp', 'time_total_s': 66.68751931190491}
