# Micro tutorial on how to run and scale hyperparameter optimization with LightGBM and Tune
<img src="https://docs.ray.io/en/latest/_images/tune_overview.png" alt="Tune and integrations" width="500">

Aug 2022. San Francisco, CA

## Part 1: single LightGBM training session
<img src="https://lightgbm.readthedocs.io/en/latest/_images/LightGBM_logo_black_text.svg" alt="LightGBM Logo" width="500">

[LightGBM](https://lightgbm.readthedocs.io) is a gradient boosting framework that uses tree-based learning algorithms. It has a Python API for model training and evaluation. A trained model can be inspected in several ways, including visualizations such as feature importance and tree plots.

### Preliminaries

In [None]:
# Imports
import lightgbm as lgb
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

In [None]:
# Prepare dataset
X, y = load_digits(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=7707)

train_data = lgb.Dataset(data=X_train, label=y_train, free_raw_data=False)

Here, we use the [digits dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) (a 10-class classification task) and create a LightGBM `Dataset` object that will be used for training.

### Set training parameters for single training run

In [None]:
training_parameters = {
    "objective": "multiclass",
    "metric": "multi_logloss",
    "num_class": 10,
    "num_leaves": 5,          # max number of leaves in one tree
    "learning_rate": 0.001,   # boosting learning rate
    "feature_fraction": 0.5,  # fraction of features on each iteration
    "bagging_fraction": 0.5,  # like "feature_fraction", but this will randomly select part of data without resampling
    "bagging_freq": 50,       # frequency for bagging
    "max_depth": 2,           # max depth of the tree
    "verbose": -1,
}

### Initialize and train LightGBM model

In [None]:
# Initialize booster
gbm = lgb.Booster(params=training_parameters, train_set=train_data)

# Train booster for 200 iterations
for i in range(200):
    gbm = lgb.train(
        params=training_parameters,
        train_set=train_data,
        num_boost_round=1,
        init_model=gbm,
        keep_training_booster=True,
    )

### Report accuracy on validation data

In [None]:
y_pred = np.argmax(gbm.predict(X_valid), axis=1)
acc = accuracy_score(y_true=y_valid, y_pred=y_pred)
print(f"Accuracy on valid set: {acc:.4f}, after {gbm.current_iteration()} iterations.")

### Summary
We just ran a single LightGBM training session. To do that, we prepared a dataset and a set of training hyperparameters.

#### Next
Let's have a closer look at Tune.

## Part 2: Tune quickstart
<img src="https://docs.ray.io/en/latest/_images/tune.png" alt="Tune logo" width="500">

### Introduction to Tune
#### Key concepts

<img src="https://docs.ray.io/en/latest/_images/tune_flow.png" alt="Tune key concepts" width="800">

Learn more about it from the [Key concepts](https://docs.ray.io/en/latest/tune/key-concepts.html) docs page.

#### Scaling of the tuning jobs

<img src="https://miro.medium.com/max/700/0*EZKV8RTgDt0NfL49" alt="scaling" width="600">

Learn more from the [paper](https://arxiv.org/abs/1807.05118) by Richard Liaw et al. that introduced Tune.

### Initialize Ray cluster

In [None]:
import ray

if ray.is_initialized():  # call the function; the bare function object is always truthy
    ray.shutdown()
cluster_info = ray.init(num_cpus=8)
cluster_info.address_info

* `ray.init()` starts the Ray runtime on a single machine. By default, it uses all cores available on the machine. Here, we limit it to 8 CPUs with `num_cpus=8`.
* See the [configuring Ray](https://docs.ray.io/en/latest/ray-core/configure.html#configuring-ray) page for a more in-depth look at the available options.
* This runtime will be used for all tuning jobs.

### Import Tune

In [None]:
from ray import tune

### Define search space

In [None]:
search_space = {
    "objective": "multiclass",
    "metric": "multi_logloss",
    "num_class": 10,
    "num_leaves": tune.choice([2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 100]),
    "learning_rate": tune.loguniform(1e-4, 1e-1),
    "feature_fraction": tune.uniform(0.5, 0.999),
    "bagging_fraction": 0.5,
    "bagging_freq": tune.randint(1, 50),
    "max_depth": tune.randint(1, 11),
    "verbose": -1,
}

* Notice that you can freely mix Tune sampling functions (e.g. `tune.randint(1, 11)`) with fixed values (e.g. `"num_class": 10`) when defining the search space.
* The [Search space API](https://docs.ray.io/en/latest/tune/api_docs/search_space.html) offers a variety of functions for defining a search space that suits your needs. The functions used above are just a few examples.
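To make the sampling concrete, here is a minimal standard-library sketch of what drawing one configuration from a space like the one above could look like. It is not how Tune implements sampling; `sample_loguniform` and `sample_config` are hypothetical helpers written for illustration, and only a subset of the keys is shown.

```python
import math
import random


def sample_loguniform(rng, low, high):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_config(rng):
    """Hypothetical helper: draw one configuration by hand, mirroring the space above."""
    return {
        "num_class": 10,  # fixed value, identical in every sample
        "num_leaves": rng.choice([2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 100]),
        "learning_rate": sample_loguniform(rng, 1e-4, 1e-1),
        "feature_fraction": rng.uniform(0.5, 0.999),
        "bagging_freq": rng.randrange(1, 50),  # upper bound excluded
        "max_depth": rng.randrange(1, 11),     # upper bound excluded, so 1..10
    }


rng = random.Random(0)
configs = [sample_config(rng) for _ in range(3)]
for config in configs:
    print(config)
```

Two details are worth noticing: the log-uniform draw spreads learning rates evenly across orders of magnitude rather than clustering them near `1e-1`, and `tune.randint(1, 11)` excludes its upper bound, so `max_depth` ranges from 1 to 10.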

### Define trainable

In [None]:
def train_lgbm(training_params, checkpoint_dir=None):
    train_data = lgb.Dataset(data=X_train, label=y_train, free_raw_data=False)

    # Initialize booster
    gbm = lgb.Booster(params=training_params, train_set=train_data)

    # Train booster for 200 iterations
    for i in range(200):
        gbm = lgb.train(
            params=training_params,
            train_set=train_data,
            num_boost_round=1,
            init_model=gbm,
            keep_training_booster=True,
        )

        y_pred = np.argmax(gbm.predict(X_valid), axis=1)
        acc = accuracy_score(y_true=y_valid, y_pred=y_pred)

        # Send accuracy back to Tune
        tune.report(valid_acc=acc)

* The trainable (`train_lgbm`) is a function that will be evaluated multiple times during tuning.
* The LightGBM training logic is the same as in the "vanilla" example above.
* The trainable is executed in a separate Ray Actor (process), so we need to communicate the performance of the model back to Tune (which runs in the main Python process). This is where `tune.report()` comes into play: it sends the performance value, in this case `acc`, back to Tune after every boosting iteration.
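The reporting pattern can be sketched without Ray. The toy below is an in-process stand-in, not how Tune passes results between processes: `toy_trainable` and `run_trial` are hypothetical names, the "training" is a made-up formula, and the `report` callback plays the role of `tune.report()`.

```python
def toy_trainable(config, report):
    """Hypothetical trainable: 'trains' for 5 iterations and reports a made-up score."""
    score = 0.0
    for _ in range(5):
        # Stand-in for one boosting round: the score creeps towards 1.0.
        score += config["learning_rate"] * (1.0 - score)
        report(valid_acc=score)


def run_trial(trainable, config):
    """Hypothetical driver: collect every reported result, tagged with its iteration."""
    results = []

    def report(**metrics):
        # Each report call counts as one more training iteration.
        results.append({"training_iteration": len(results) + 1, **metrics})

    trainable(config, report)
    return results


history = run_trial(toy_trainable, {"learning_rate": 0.5})
print(history[-1])
```

The driver ends up with one record per report call, which is what lets a caller compare trials while they are still running instead of only after they finish.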

### Run hyperparameter tuning, single trial

In [None]:
analysis = tune.run(train_lgbm, config=search_space)

* When you call `tune.run()`, the trainable (`train_lgbm`) is evaluated with hyperparameters sampled from the search space (`search_space`).
* Tune handles sampling and executing the trainable.

### Display info about this trial

In [None]:
df = analysis.dataframe(metric="valid_acc")
df

### Summary
We just ran a tuning job with Tune 🚀.

#### Key concepts in this section
* Search space
* Trainable
* Trial

#### Key API elements in this section
* `ray.init()` -> start the Ray runtime.
* `tune.report()` -> log the performance values. Called in the trainable function.
* `tune.run()` -> execute tuning.

#### Next
We will modify `tune.run()` in order to run tuning with 100 trials.

## Part 3: Execute 100 tuning runs with Tune

### Run hyperparameter tuning

In [None]:
analysis = tune.run(
    train_lgbm,
    config=search_space,
    num_samples=100,
    metric="valid_acc",
    resources_per_trial={"cpu": 1},
    verbose=1,
)

* When `tune.run()` is called, the trainable (`train_lgbm`) is evaluated `num_samples` times (100 trials). Trials run in parallel, subject to the available compute resources.
* Each trial uses hyperparameters sampled from the search space (`search_space`).
* Tune handles parallel execution, sampling from the search space, and collecting the results.
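The three responsibilities listed above (sampling, parallel execution, collecting results) can be illustrated with a small standard-library sketch of random search. This is only an analogy under stated assumptions: `toy_objective` is a made-up score rather than a trained model, `random_search` is a hypothetical helper, and a thread pool stands in for Ray's worker processes.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def toy_objective(config):
    """Made-up score that peaks at learning_rate=0.01 and max_depth=5 (not a real model)."""
    return 1.0 - abs(config["learning_rate"] - 0.01) * 5 - abs(config["max_depth"] - 5) * 0.02


def random_search(num_samples, max_workers, seed=0):
    rng = random.Random(seed)
    # Sample every configuration up front...
    configs = [
        {"learning_rate": rng.uniform(0.0001, 0.1), "max_depth": rng.randrange(1, 11)}
        for _ in range(num_samples)
    ]
    # ...then evaluate them concurrently, at most `max_workers` at a time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(toy_objective, configs))
    # Collect the results and rank them, best first.
    return sorted(zip(scores, configs), key=lambda pair: pair[0], reverse=True)


results = random_search(num_samples=100, max_workers=8)
best_score, best_config = results[0]
print(f"best score {best_score:.4f} with {best_config}")
```

Here `max_workers=8` mirrors the tutorial's setup, where a runtime started with `num_cpus=8` and `resources_per_trial={"cpu": 1}` leaves room for roughly eight trials at a time.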

### Display info about best trials

In [None]:
df = analysis.dataframe(metric="valid_acc")
df.sort_values(by=["valid_acc"], ascending=False).head(n=5)

Optionally, you can use a parallel coordinates plot to visualize the results from all tuning runs, for example with [Plotly](https://plotly.com/python/parallel-coordinates-plot/) or [HiPlot](https://github.com/facebookresearch/hiplot).

### Summary
We optimized hyperparameters by executing 100 tuning trials.

#### Key API elements in this section
* `tune.run(num_samples=...)` -> specify number of trials.

#### Next
We will introduce a `scheduler` to stop unpromising trials early and, as a result, save compute time.

## Part 4: ASHA with Tune

### Introduction to ASHA (Asynchronous Successive Halving Algorithm)
<img src="https://lh4.googleusercontent.com/E6KJ-5KQgfYVleJEXxaldICsEXm-dRUlsiD9AFbckXov0uaYfnIBKskLT6z1eLfptdKjxTCF05LBAz0W9evXbyWAViA5qYFGOaIYCuoz-h9n8rluHkl3ZOj-0IPKrdA4ES34Ybpo" alt="synchronous promotions" width="1000">

<img src="https://lh6.googleusercontent.com/ncYQXlFoVzhEsun2I-0LfTySEySc-uwEAd2vdPXGHvwprwXApuHuU4o17uJ1ITgHw9_sxId0995xOdfs-r7K3lWB4QQ7v9s33GnBs-EZ7cECIqj9Cq_eDQapJSAEG6P6A0oLZxm6" alt="asynchronous promotions" width="1000">

* ASHA promotes configurations to the next rung whenever possible instead of waiting for a rung to fill up, so workers do not sit idle.
* When no configuration can be promoted to a higher rung, a free worker adds a new configuration to the base rung, so workers are always busy.
* Read more about ASHA in the CMU ML [blogpost](https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/).

_(Visualization is from the same [blogpost](https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/). Date accessed: 2022.08.04)_
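The idea of rungs and promotions can be sketched in a few lines of plain Python. This is a simplified illustration, not Tune's `ASHAScheduler` implementation: `rung_milestones` and `keeps_running` are hypothetical helpers, the reduction factor of 4 is an assumed value, and the decision rule (stay in the top `1/reduction_factor` of the scores seen so far at a rung) is a simplification of the algorithm described in the blogpost.

```python
def rung_milestones(grace_period, max_t, reduction_factor):
    """Iterations at which trials are compared: grace_period * reduction_factor**k, up to max_t."""
    milestones = []
    t = grace_period
    while t <= max_t:
        milestones.append(t)
        t *= reduction_factor
    return milestones


def keeps_running(score, scores_at_rung, reduction_factor):
    """Simplified rule: continue only if `score` ranks in the top 1/reduction_factor
    of everything recorded at this rung so far (higher is better)."""
    ranked = sorted(scores_at_rung + [score], reverse=True)
    n_keep = max(1, len(ranked) // reduction_factor)
    return score >= ranked[n_keep - 1]


print(rung_milestones(grace_period=50, max_t=200, reduction_factor=4))
print(keeps_running(0.95, [0.70, 0.80, 0.90], reduction_factor=4))
print(keeps_running(0.80, [0.70, 0.90, 0.95], reduction_factor=4))
```

The decision uses only the scores already recorded at the rung, which is the asynchronous part: a trial never waits for the rest of its cohort to arrive. With a grace period of 50 and an assumed `max_t` of 200, the rungs sit at iterations 50 and 200. The value 200 is chosen here to match the 200 boosting rounds in `train_lgbm`; a scheduler created without an explicit `max_t` uses the default of your Ray version, which is worth checking.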

### Import ASHA from Tune schedulers

In [None]:
from ray.tune.schedulers import ASHAScheduler

### Create ASHA scheduler

In [None]:
asha = ASHAScheduler(
    time_attr="training_iteration",
    mode="max",
    grace_period=50,
)

### Run hyperparameter tuning with ASHA scheduler

In [None]:
analysis = tune.run(
    train_lgbm,
    config=search_space,
    num_samples=100,
    metric="valid_acc",
    resources_per_trial={"cpu": 1},
    scheduler=asha,
    verbose=1,
)

### Display info about best trials

In [None]:
df = analysis.dataframe(metric="valid_acc")
df.sort_values(by=["valid_acc"], ascending=False).head(n=5)

### Summary
We ran hyperparameter tuning with 100 trials. The ASHA scheduler terminated unpromising trials early, saving compute resources.

#### Key concepts in this section
* Scheduler
* Early stopping (of the unpromising trials)

#### Key API elements in this section
* `ASHAScheduler` -> [Async Successive Halving](https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#asha-tune-schedulers-ashascheduler) scheduler.
* `tune.run(scheduler=...)` -> specify scheduler to use for tuning.

## Shutdown Ray runtime

In [None]:
ray.shutdown()

This disconnects the worker and terminates the processes started by `ray.init()`.

## Where to go next?

Congrats!

You just finished the micro tutorial on how to run and scale hyperparameter optimization with LightGBM and Tune.

Now, please go to the [micro tutorial README](https://github.com/kamil-kaczmarek/ray-tune-micro-tutorial/blob/kk/dev/README.md) to learn about next steps and ways to reach out and connect with the community.