# Micro tutorial on how to run and scale hyperparameter optimization with LightGBM and Tune
<img src="https://docs.ray.io/en/latest/_images/tune_overview.png" alt="Tune and integrations" width="500">

Aug 2022. San Francisco, CA

## Part 1: single LightGBM training session
<img src="https://lightgbm.readthedocs.io/en/latest/_images/LightGBM_logo_black_text.svg" alt="LightGBM Logo" width="500">

[LightGBM](https://lightgbm.readthedocs.io) is a gradient boosting framework that uses tree-based learning algorithms. It has a Python API for model training and evaluation. A trained model can be inspected in several ways, including visualizations such as feature importance and tree plots.

### Preliminaries

In [1]:
# Imports
import lightgbm as lgb
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

In [2]:
# Prepare dataset
X, y = load_digits(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=7707)

train_data = lgb.Dataset(data=X_train, label=y_train, free_raw_data=False)

Here, we use the [digits dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) (a classification task) and create a LightGBM `Dataset` object that will be used for training.

### Set training parameters for single training run

In [3]:
training_parameters = {
    "objective": "multiclass",
    "metric": "multi_logloss",
    "num_class": 10,
    "num_leaves": 5,          # max number of leaves in one tree
    "learning_rate": 0.001,   # boosting learning rate
    "feature_fraction": 0.5,  # fraction of features on each iteration
    "bagging_fraction": 0.5,  # like "feature_fraction", but randomly selects part of the data without resampling
    "bagging_freq": 50,       # perform bagging every 50 iterations
    "max_depth": 2,           # max depth of the tree
    "verbose": -1,
}

### Initialize and train LightGBM model

In [4]:
# Initialize booster
gbm = lgb.Booster(params=training_parameters, train_set=train_data)

# Train booster for 200 iterations
for i in range(200):
    gbm = lgb.train(
        params=training_parameters,
        train_set=train_data,
        num_boost_round=1,
        init_model=gbm,
        keep_training_booster=True,
    )

### Report accuracy on validation data

In [5]:
y_pred = np.argmax(gbm.predict(X_valid), axis=1)
acc = accuracy_score(y_true=y_valid, y_pred=y_pred)
print(f"Accuracy on valid set: {acc:.4f}, after {gbm.current_iteration()} iterations.")

Accuracy on valid set: 0.7694, after 200 iterations.
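
The prediction step above can be unpacked without any libraries. The sketch below uses hand-written probability rows (not real model output) to show what `np.argmax(..., axis=1)` and `accuracy_score` compute. The helper names `argmax_rows` and `accuracy` are hypothetical and exist only for this illustration.

```python
# Illustrative only: hand-written probabilities, not output from the model above.
def argmax_rows(prob_rows):
    """Return the index of the largest value in each row (like np.argmax(..., axis=1))."""
    return [max(range(len(row)), key=row.__getitem__) for row in prob_rows]


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


probs = [
    [0.7, 0.2, 0.1],  # predicted class 0
    [0.1, 0.3, 0.6],  # predicted class 2
    [0.4, 0.5, 0.1],  # predicted class 1
    [0.2, 0.2, 0.6],  # predicted class 2
]
labels = [0, 2, 0, 2]

preds = argmax_rows(probs)
print(preds)                    # [0, 2, 1, 2]
print(accuracy(labels, preds))  # 0.75
```

With `test_size=0.2`, the validation set should hold 360 of the 1,797 digit images, so accuracy moves in steps of 1/360. The reported 0.7694 then corresponds to 277 correct predictions.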


### Summary
We just ran a single LightGBM training session. To do that, we prepared a dataset and a set of training hyperparameters.

#### Next
Let's have a closer look at Tune.

## Part 2: Tune quickstart
<img src="https://docs.ray.io/en/latest/_images/tune.png" alt="Tune logo" width="500">

### Introduction to Tune
#### Key concepts

<img src="https://docs.ray.io/en/latest/_images/tune_flow.png" alt="Tune key concepts" width="800">

Learn more on the [Key concepts](https://docs.ray.io/en/latest/tune/key-concepts.html) docs page.

#### Scaling of the tuning jobs

<img src="https://miro.medium.com/max/700/0*EZKV8RTgDt0NfL49" alt="scaling" width="600">

Learn more from the [paper](https://arxiv.org/abs/1807.05118) by Richard Liaw et al. that introduced Tune.

### Initialize Ray cluster

In [6]:
import ray

if ray.is_initialized():
    ray.shutdown()
cluster_info = ray.init(num_cpus=8)
cluster_info.address_info

2022-08-08 12:29:01,753	INFO services.py:1470 -- View the Ray dashboard at http://127.0.0.1:8265


{'node_ip_address': '127.0.0.1',
 'raylet_ip_address': '127.0.0.1',
 'redis_address': None,
 'object_store_address': '/tmp/ray/session_2022-08-08_12-28-59_999398_3126/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2022-08-08_12-28-59_999398_3126/sockets/raylet',
 'webui_url': '127.0.0.1:8265',
 'session_dir': '/tmp/ray/session_2022-08-08_12-28-59_999398_3126',
 'metrics_export_port': 61988,
 'gcs_address': '127.0.0.1:52563',
 'address': '127.0.0.1:52563',
 'node_id': '3c651efebd6f7425d1b5019fbb27b67fe6aaec433c28d62b62ebc459'}

* `ray.init()` starts the Ray runtime on a single machine. By default it uses all cores available on the machine; here we limit it with `num_cpus=8`.
* See the [configuring Ray](https://docs.ray.io/en/latest/ray-core/configure.html#configuring-ray) page for a more in-depth look at the available options.
* This runtime will be used for all tuning jobs.

### Import Tune

In [7]:
from ray import tune

### Define search space

In [8]:
search_space = {
    "objective": "multiclass",
    "metric": "multi_logloss",
    "num_class": 10,
    "num_leaves": tune.choice([2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 100]),
    "learning_rate": tune.loguniform(1e-4, 1e-1),
    "feature_fraction": tune.uniform(0.5, 0.999),
    "bagging_fraction": 0.5,
    "bagging_freq": tune.randint(1, 50),
    "max_depth": tune.randint(1, 11),
    "verbose": -1,
}

* Notice that you can freely mix Tune sampling functions (e.g. `tune.randint(1, 11)`) with fixed values (e.g. `"num_class": 10`).
* The [Search space API](https://docs.ray.io/en/latest/tune/api_docs/search_space.html) has a variety of functions for defining a search space that suits your needs. The functions used above are just a few examples.
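
To make the sampling semantics concrete, here is a plain-Python sketch that draws one configuration from a space shaped like `search_space`. It uses the standard `random` module as a stand-in and is not how Tune implements sampling. `SPACE`, `loguniform` and `sample_config` are hypothetical names. The sketch assumes that `tune.randint(low, high)` excludes `high` and that `tune.loguniform` is uniform in log space, as the Tune documentation describes.

```python
import math
import random


def loguniform(rng, low, high):
    """Sample so that log(value) is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


# Fixed values are passed through; callables are sampled once per trial.
SPACE = {
    "num_class": 10,
    "num_leaves": lambda rng: rng.choice([2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 100]),
    "learning_rate": lambda rng: loguniform(rng, 1e-4, 1e-1),
    "feature_fraction": lambda rng: rng.uniform(0.5, 0.999),
    "bagging_freq": lambda rng: rng.randrange(1, 50),  # 1..49, upper bound excluded
    "max_depth": lambda rng: rng.randrange(1, 11),     # 1..10, upper bound excluded
}


def sample_config(space, rng):
    """Draw one concrete configuration (one trial's hyperparameters)."""
    return {key: value(rng) if callable(value) else value for key, value in space.items()}


rng = random.Random(0)  # seeded only to make the sketch repeatable
config = sample_config(SPACE, rng)
print(config)
```

Sampling the learning rate in log space gives each order of magnitude (1e-4 to 1e-3, 1e-3 to 1e-2, and so on) roughly equal weight, which a plain uniform distribution would not.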

### Define trainable

In [9]:
def train_lgbm(training_params, checkpoint_dir=None):
    train_data = lgb.Dataset(data=X_train, label=y_train, free_raw_data=False)

    # Initialize booster
    gbm = lgb.Booster(params=training_params, train_set=train_data)

    # Train booster for 200 iterations
    for i in range(200):
        gbm = lgb.train(
            params=training_params,
            train_set=train_data,
            num_boost_round=1,
            init_model=gbm,
            keep_training_booster=True,
        )

        y_pred = np.argmax(gbm.predict(X_valid), axis=1)
        acc = accuracy_score(y_true=y_valid, y_pred=y_pred)

        # Send accuracy back to Tune
        tune.report(valid_acc=acc)

* The trainable (`train_lgbm`) is a function that will be evaluated multiple times during tuning.
* The LightGBM training logic is the same as in the "vanilla" example above.
* It is executed in a separate Ray actor (process), so we need to communicate the model's performance back to Tune (which runs in the main Python process). This is where `tune.report()` comes in: it sends the metric, here `acc` under the name `valid_acc`, back to Tune.
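
The reporting pattern can be illustrated in isolation. In the toy sketch below, a plain callback stands in for `tune.report()` and a made-up score stands in for validation accuracy. `toy_trainable` and `collect` are hypothetical, and nothing here talks to Ray.

```python
def toy_trainable(config, report, num_iterations=5):
    """Mimic the shape of train_lgbm: one report call per boosting iteration."""
    score = 0.0
    for _ in range(num_iterations):
        # Stand-in for one boosting round: move the score toward 1.0.
        score += config["learning_rate"] * (1.0 - score)
        report(valid_acc=score)


history = []


def collect(**metrics):
    """Stand-in for tune.report(): record the metrics of each iteration."""
    history.append(metrics)


toy_trainable({"learning_rate": 0.5}, report=collect)
print(len(history))              # 5
print(history[-1]["valid_acc"])  # 0.96875
```

Because the trainable reports once per iteration, the caller sees a full learning curve per trial rather than a single final number.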

### Run hyperparameter tuning, single trial

In [10]:
analysis = tune.run(train_lgbm, config=search_space)

Trial name,status,loc,bagging_freq,feature_fraction,learning_rate,max_depth,num_leaves
train_lgbm_5c1d5_00000,RUNNING,127.0.0.1:3160,7,0.887462,0.00176246,7,20


Result for train_lgbm_5c1d5_00000:
  date: 2022-08-08_12-29-04
  done: false
  experiment_id: d8826c2afcf148dc93bc8a79a0075820
  hostname: MacBook.local
  iterations_since_restore: 1
  node_ip: 127.0.0.1
  pid: 3160
  time_since_restore: 0.010824918746948242
  time_this_iter_s: 0.010824918746948242
  time_total_s: 0.010824918746948242
  timestamp: 1659986944
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 5c1d5_00000
  valid_acc: 0.825
  warmup_time: 0.0021009445190429688
  


Trial name,status,loc,bagging_freq,feature_fraction,learning_rate,max_depth,num_leaves,iter,total time (s),valid_acc
train_lgbm_5c1d5_00000,RUNNING,127.0.0.1:3160,7,0.887462,0.00176246,7,20,99,4.01442,0.886111


Result for train_lgbm_5c1d5_00000:
  date: 2022-08-08_12-29-09
  done: false
  experiment_id: d8826c2afcf148dc93bc8a79a0075820
  hostname: MacBook.local
  iterations_since_restore: 112
  node_ip: 127.0.0.1
  pid: 3160
  time_since_restore: 5.012823820114136
  time_this_iter_s: 0.08013081550598145
  time_total_s: 5.012823820114136
  timestamp: 1659986949
  timesteps_since_restore: 0
  training_iteration: 112
  trial_id: 5c1d5_00000
  valid_acc: 0.8888888888888888
  warmup_time: 0.0021009445190429688
  


Trial name,status,loc,bagging_freq,feature_fraction,learning_rate,max_depth,num_leaves,iter,total time (s),valid_acc
train_lgbm_5c1d5_00000,RUNNING,127.0.0.1:3160,7,0.887462,0.00176246,7,20,154,9.03635,0.883333


Result for train_lgbm_5c1d5_00000:
  date: 2022-08-08_12-29-14
  done: false
  experiment_id: d8826c2afcf148dc93bc8a79a0075820
  hostname: MacBook.local
  iterations_since_restore: 163
  node_ip: 127.0.0.1
  pid: 3160
  time_since_restore: 10.048327922821045
  time_this_iter_s: 0.11434602737426758
  time_total_s: 10.048327922821045
  timestamp: 1659986954
  timesteps_since_restore: 0
  training_iteration: 163
  trial_id: 5c1d5_00000
  valid_acc: 0.8833333333333333
  warmup_time: 0.0021009445190429688
  


Trial name,status,loc,bagging_freq,feature_fraction,learning_rate,max_depth,num_leaves,iter,total time (s),valid_acc
train_lgbm_5c1d5_00000,RUNNING,127.0.0.1:3160,7,0.887462,0.00176246,7,20,195,14.0686,0.888889


Result for train_lgbm_5c1d5_00000:
  date: 2022-08-08_12-29-18
  done: true
  experiment_id: d8826c2afcf148dc93bc8a79a0075820
  experiment_tag: 0_bagging_freq=7,feature_fraction=0.8875,learning_rate=0.0018,max_depth=7,num_leaves=20
  hostname: MacBook.local
  iterations_since_restore: 200
  node_ip: 127.0.0.1
  pid: 3160
  time_since_restore: 14.761805772781372
  time_this_iter_s: 0.13914799690246582
  time_total_s: 14.761805772781372
  timestamp: 1659986958
  timesteps_since_restore: 0
  training_iteration: 200
  trial_id: 5c1d5_00000
  valid_acc: 0.8888888888888888
  warmup_time: 0.0021009445190429688
  


Trial name,status,loc,bagging_freq,feature_fraction,learning_rate,max_depth,num_leaves,iter,total time (s),valid_acc
train_lgbm_5c1d5_00000,TERMINATED,127.0.0.1:3160,7,0.887462,0.00176246,7,20,200,14.7618,0.888889


2022-08-08 12:29:18,949	INFO tune.py:747 -- Total run time: 16.10 seconds (15.97 seconds for the tuning loop).


* When you call `tune.run()`, the trainable (`train_lgbm`) is evaluated with hyperparameters sampled from the search space (`search_space`).
* Tune handles sampling and executing the trainable.

### Display info about this trial

In [11]:
df = analysis.dataframe(metric="valid_acc")
df

Unnamed: 0,valid_acc,time_this_iter_s,done,timesteps_total,episodes_total,training_iteration,trial_id,experiment_id,date,timestamp,...,config/bagging_freq,config/feature_fraction,config/learning_rate,config/max_depth,config/metric,config/num_class,config/num_leaves,config/objective,config/verbose,logdir
0,0.888889,0.139148,False,,,200,5c1d5_00000,d8826c2afcf148dc93bc8a79a0075820,2022-08-08_12-29-18,1659986958,...,7,0.887462,0.001762,7,multi_logloss,10,20,multiclass,-1,/Users/kamil/ray_results/train_lgbm_2022-08-08...


### Summary
We just ran a tuning job with Tune 🚀.

#### Key concepts in this section
* Search space
* Trainable
* Trial

#### Key API elements in this section
* `ray.init()` -> start the Ray runtime.
* `tune.report()` -> log the performance values. Called in the trainable function.
* `tune.run()` -> execute tuning.

#### Next
We will modify `tune.run()` in order to run tuning with 100 trials.

## Part 3: Execute 100 tuning runs with Tune

### Run hyperparameter tuning

In [12]:
analysis = tune.run(
    train_lgbm,
    config=search_space,
    num_samples=100,
    metric="valid_acc",
    resources_per_trial={"cpu": 1},
    verbose=1,
)

2022-08-08 12:31:28,637	INFO tune.py:747 -- Total run time: 129.66 seconds (129.54 seconds for the tuning loop).


* When `tune.run()` is called, the trainable (`train_lgbm`) is evaluated `num_samples` times (100 trials), in parallel where compute resources allow.
* Each trial has hyperparameters sampled from the search space (`search_space`).
* Tune handles parallel execution, sampling from the search space and collecting the results.
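
As a back-of-the-envelope sketch of what the resource limit means here, assume the 8 CPUs passed to `ray.init()` and one CPU per trial. The calculation ignores scheduling overhead and uneven trial durations, and `max_concurrent_trials` is a hypothetical helper.

```python
import math


def max_concurrent_trials(total_cpus, cpus_per_trial):
    """Upper bound on trials that can hold their CPU reservation at the same time."""
    return total_cpus // cpus_per_trial


concurrent = max_concurrent_trials(total_cpus=8, cpus_per_trial=1)
waves = math.ceil(100 / concurrent)  # lower bound on sequential "rounds" for 100 trials

print(concurrent)  # 8
print(waves)       # 13
```

Raising the per-trial CPU request in `resources_per_trial` lowers the number of trials that can run side by side.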

### Display info about best trials

In [13]:
df = analysis.dataframe(metric="valid_acc")
df.sort_values(by=["valid_acc"], ascending=False).head(n=5)

Unnamed: 0,valid_acc,time_this_iter_s,done,timesteps_total,episodes_total,training_iteration,trial_id,experiment_id,date,timestamp,...,config/bagging_freq,config/feature_fraction,config/learning_rate,config/max_depth,config/metric,config/num_class,config/num_leaves,config/objective,config/verbose,logdir
2,0.963889,0.09463,False,,,200,65b96_00002,c398ad3d7dfe445cbb36b9080edec8c9,2022-08-08_12-29-33,1659986973,...,4,0.618751,0.089884,7,multi_logloss,10,9,multiclass,-1,/Users/kamil/ray_results/train_lgbm_2022-08-08...
49,0.963889,0.162196,False,,,200,65b96_00049,c597dea2c42e466f9fbff1db67b6cc7b,2022-08-08_12-30-36,1659987036,...,31,0.965862,0.050306,6,multi_logloss,10,15,multiclass,-1,/Users/kamil/ray_results/train_lgbm_2022-08-08...
33,0.961111,0.132065,False,,,200,65b96_00033,76b7af995a5f48abaec7b3460afb045f,2022-08-08_12-30-11,1659987011,...,29,0.623361,0.049349,5,multi_logloss,10,8,multiclass,-1,/Users/kamil/ray_results/train_lgbm_2022-08-08...
20,0.958333,0.126503,False,,,200,65b96_00020,2b3318b14da9471f829ded3e19198ac8,2022-08-08_12-29-52,1659986992,...,38,0.789032,0.051331,4,multi_logloss,10,10,multiclass,-1,/Users/kamil/ray_results/train_lgbm_2022-08-08...
15,0.958333,0.11344,False,,,200,65b96_00015,c398ad3d7dfe445cbb36b9080edec8c9,2022-08-08_12-29-45,1659986985,...,46,0.957914,0.049396,9,multi_logloss,10,6,multiclass,-1,/Users/kamil/ray_results/train_lgbm_2022-08-08...


Optionally, you can use a parallel coordinates plot to visualize the results from all tuning runs, for example with [Plotly](https://plotly.com/python/parallel-coordinates-plot/) or [HiPlot](https://github.com/facebookresearch/hiplot).

### Summary
We optimized hyperparameters by executing 100 tuning trials.

#### Key API elements in this section
* `tune.run(num_samples=...)` -> specify number of trials.

#### Next
We will introduce a `scheduler` to stop unpromising trials early and, as a result, save compute time.

## Part 4: ASHA with Tune

### Introduction to ASHA (Asynchronous Successive Halving Algorithm)
<img src="https://lh4.googleusercontent.com/E6KJ-5KQgfYVleJEXxaldICsEXm-dRUlsiD9AFbckXov0uaYfnIBKskLT6z1eLfptdKjxTCF05LBAz0W9evXbyWAViA5qYFGOaIYCuoz-h9n8rluHkl3ZOj-0IPKrdA4ES34Ybpo" alt="synchronous promotions" width="1000">

<img src="https://lh6.googleusercontent.com/ncYQXlFoVzhEsun2I-0LfTySEySc-uwEAd2vdPXGHvwprwXApuHuU4o17uJ1ITgHw9_sxId0995xOdfs-r7K3lWB4QQ7v9s33GnBs-EZ7cECIqj9Cq_eDQapJSAEG6P6A0oLZxm6" alt="asynchronous promotions" width="1000">

* Promote configurations whenever possible, so that resources stay utilized.
* Asynchronous SHA uses resources efficiently: when no configuration can be promoted to a higher rung, workers stay busy by adding new configurations to the base rung.
* Read more about ASHA in the CMU ML [blogpost](https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/).

_(Visualization is from the same [blogpost](https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/). Date accessed: 2022.08.04)_
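
The asynchronous decision rule can be sketched in a few lines. This is a simplified, stopping-style variant written for illustration. It is neither the exact algorithm from the paper nor Ray's implementation, and the `Rung` class is hypothetical. Each result is judged the moment it arrives, against only the results recorded at that rung so far.

```python
class Rung:
    """One rung: decides, as each result arrives, whether that trial continues."""

    def __init__(self, reduction_factor=4):
        self.reduction_factor = reduction_factor
        self.recorded = []  # metric values reported at this rung so far

    def should_continue(self, value):
        self.recorded.append(value)
        # Keep roughly the top 1/reduction_factor of results seen so far (at least one).
        keep = max(1, len(self.recorded) // self.reduction_factor)
        cutoff = sorted(self.recorded, reverse=True)[keep - 1]
        return value >= cutoff


rung = Rung(reduction_factor=4)
arrivals = [0.80, 0.70, 0.90, 0.85, 0.95, 0.60, 0.88, 0.97]
decisions = [rung.should_continue(v) for v in arrivals]
print(decisions)
# [True, False, True, False, True, False, False, True]
```

No trial waits for the rest of its cohort, which is what keeps workers busy. The trade-off is that early arrivals (such as 0.80 above) can continue on limited evidence.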

### Import ASHA from Tune schedulers

In [14]:
from ray.tune.schedulers import ASHAScheduler

### Create ASHA scheduler

In [15]:
asha = ASHAScheduler(
    time_attr="training_iteration",
    mode="max",
    grace_period=50,
)
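
`grace_period=50` means that no trial is stopped before it has reported 50 iterations. The sketch below lists where a geometric rung schedule would place its decision points. It assumes `max_t=100` and `reduction_factor=4`, which we believe are the scheduler's defaults (verify against your installed Ray version). `rung_milestones` is a hypothetical helper, not a Ray function.

```python
def rung_milestones(grace_period, max_t, reduction_factor):
    """Iterations at which trials are compared: grace_period * reduction_factor**k, up to max_t."""
    milestones = []
    t = grace_period
    while t <= max_t:
        milestones.append(t)
        t *= reduction_factor
    return milestones


# Assumed defaults: max_t=100, reduction_factor=4.
print(rung_milestones(grace_period=50, max_t=100, reduction_factor=4))  # [50]
print(rung_milestones(grace_period=1, max_t=100, reduction_factor=4))   # [1, 4, 16, 64]
```

Under these assumptions there is a single decision point at iteration 50, and surviving trials run until `max_t`. That is consistent with the `training_iteration` value of 100 in the results table of this part. Passing `max_t=200` to `ASHAScheduler` would let survivors train for as long as the trials in Part 3.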

### Run hyperparameter tuning with ASHA scheduler

In [16]:
analysis = tune.run(
    train_lgbm,
    config=search_space,
    num_samples=100,
    metric="valid_acc",
    resources_per_trial={"cpu": 1},
    scheduler=asha,
    verbose=1,
)

2022-08-08 12:31:51,279	INFO tune.py:747 -- Total run time: 22.49 seconds (22.24 seconds for the tuning loop).


### Display info about best trials

In [17]:
df = analysis.dataframe(metric="valid_acc")
df.sort_values(by=["valid_acc"], ascending=False).head(n=5)

Unnamed: 0,valid_acc,time_this_iter_s,done,timesteps_total,episodes_total,training_iteration,trial_id,experiment_id,date,timestamp,...,config/bagging_freq,config/feature_fraction,config/learning_rate,config/max_depth,config/metric,config/num_class,config/num_leaves,config/objective,config/verbose,logdir
20,0.963889,0.056654,True,,,100,b3189_00020,2cb8b1d634134602aa1daeaf09e0073f,2022-08-08_12-31-38,1659987098,...,40,0.724555,0.087994,4,multi_logloss,10,9,multiclass,-1,/Users/kamil/ray_results/train_lgbm_2022-08-08...
44,0.961111,0.059068,True,,,100,b3189_00044,2cb8b1d634134602aa1daeaf09e0073f,2022-08-08_12-31-43,1659987103,...,4,0.721172,0.090719,4,multi_logloss,10,9,multiclass,-1,/Users/kamil/ray_results/train_lgbm_2022-08-08...
59,0.961111,0.075981,True,,,100,b3189_00059,34e6c26a02284636901e639767d25b36,2022-08-08_12-31-47,1659987107,...,17,0.659131,0.086681,10,multi_logloss,10,10,multiclass,-1,/Users/kamil/ray_results/train_lgbm_2022-08-08...
3,0.955556,0.069032,True,,,100,b3189_00003,34e6c26a02284636901e639767d25b36,2022-08-08_12-31-35,1659987095,...,36,0.71807,0.096898,9,multi_logloss,10,8,multiclass,-1,/Users/kamil/ray_results/train_lgbm_2022-08-08...
17,0.95,0.03568,True,,,100,b3189_00017,059e756bb1f7418793145d52c9e3766c,2022-08-08_12-31-37,1659987097,...,35,0.935216,0.033308,3,multi_logloss,10,9,multiclass,-1,/Users/kamil/ray_results/train_lgbm_2022-08-08...


### Summary
We ran hyperparameter tuning with 100 trials. The ASHA scheduler terminated unpromising trials early, saving compute resources.
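
The saving can be quantified from the two "Total run time" log lines in Parts 3 and 4. The short calculation below uses only those two numbers. The timings come from a single run on one machine, so treat the ratio as indicative rather than as a benchmark.

```python
baseline_s = 129.66  # Part 3: 100 trials without a scheduler
asha_s = 22.49       # Part 4: 100 trials with ASHA

speedup = baseline_s / asha_s
saved_fraction = 1 - asha_s / baseline_s

print(f"{speedup:.2f}x faster, {saved_fraction:.0%} less wall-clock time")
# 5.77x faster, 83% less wall-clock time
```

The best `valid_acc` is 0.963889 in both runs. Note that the top Part 4 trials report a `training_iteration` of 100 rather than 200, so part of the saving comes from shorter runs for the surviving trials, not only from stopping weak ones.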

#### Key concepts in this section
* Scheduler
* Early stopping (of the unpromising trials)

#### Key API elements in this section
* `ASHAScheduler` -> [Async Successive Halving](https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#asha-tune-schedulers-ashascheduler) scheduler.
* `tune.run(scheduler=...)` -> specify scheduler to use for tuning.

## Shutdown Ray runtime

In [18]:
ray.shutdown()

This disconnects the worker and terminates the processes started by `ray.init()`.

## Where to go next?

Congrats!

You just finished the micro tutorial on how to run and scale hyperparameter optimization with LightGBM and Tune.

Now, please go to the [micro tutorial README](https://github.com/kamil-kaczmarek/ray-tune-micro-tutorial/blob/kk/dev/README.md) to learn more about next steps and about ways to reach out and connect with the community.