# Micro tutorial on how to run and scale hyperparameter optimization with LightGBM and Tune
<img src="https://docs.ray.io/en/latest/_images/tune_overview.png" alt="Tune and integrations" width="500">

Aug 2022. San Francisco, CA

## Part 1: single LightGBM training session
<img src="https://lightgbm.readthedocs.io/en/latest/_images/LightGBM_logo_black_text.svg" alt="LightGBM Logo" width="500">

### Preliminaries

In [None]:
# Imports
import lightgbm as lgb
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

In [None]:
# Prepare dataset
X, y = load_digits(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=7707)

train_data = lgb.Dataset(data=X_train, label=y_train, free_raw_data=False)

Here, we use the [digits dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) (a 10-class classification task) and create a LightGBM `Dataset` object that will be used for training. The held-out 20% (`X_valid`, `y_valid`) is used later to measure accuracy.

### Set training parameters for single training run

In [None]:
training_parameters = {
    "objective": "multiclass",
    "metric": "multi_logloss",
    "num_class": 10,
    "num_leaves": 5,
    "learning_rate": 0.001,
    "feature_fraction": 0.5,
    "bagging_fraction": 0.5,
    "bagging_freq": 50,
    "max_depth": 2,
    "verbose": -1,
}

### Initialize and train LightGBM model

In [None]:
# Initialize booster
gbm = lgb.Booster(params=training_parameters, train_set=train_data)

# Train booster for 200 iterations
for i in range(200):
    gbm = lgb.train(
        params=training_parameters,
        train_set=train_data,
        num_boost_round=1,
        init_model=gbm,
        keep_training_booster=True,
    )

### Report accuracy on validation data

In [None]:
y_pred = np.argmax(gbm.predict(X_valid), axis=1)
acc = accuracy_score(y_true=y_valid, y_pred=y_pred)
print(f"Accuracy on valid set: {acc:.4f}, after {gbm.current_iteration()} iterations.")
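
For the `multiclass` objective, `gbm.predict` returns one row of class probabilities per sample, so `np.argmax(..., axis=1)` picks the most likely class for each row. The following is a minimal pure-Python sketch of that step on made-up numbers; `argmax_rows` and `accuracy` are hypothetical helpers written only for illustration, not LightGBM or scikit-learn functions.

```python
def argmax_rows(probabilities):
    """For each row, return the index of its largest value (like np.argmax(..., axis=1))."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Toy "predictions": 3 samples, 4 classes (made-up values, not model output)
probs = [
    [0.1, 0.7, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.2, 0.2, 0.5, 0.1],
]
labels = [1, 0, 3]

pred = argmax_rows(probs)
print(pred, accuracy(labels, pred))
```

Here two of the three toy samples are classified correctly, which is the same computation `accuracy_score` performs on the real validation set.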

### Summary
* We just ran a single LightGBM training session. To do that, we prepared the dataset and a fixed set of training hyperparameters.
* Next, let's take a closer look at Tune.

## Part 2: Tune quickstart
<img src="https://docs.ray.io/en/latest/_images/tune.png" alt="Tune logo" width="500">

### Introduction to Tune
There are a few components that we should look at first:

<img src="https://docs.ray.io/en/latest/_images/tune_flow.png" alt="Tune key concepts" width="800">

Learn more about them on the [Key concepts](https://docs.ray.io/en/latest/tune/key-concepts.html) docs page.

### Initialize Ray cluster

In [None]:
import ray

if ray.is_initialized():
    ray.shutdown()
cluster_info = ray.init()
cluster_info.address_info

This cluster will be used for all tuning jobs.

### Import Tune

In [None]:
from ray import tune

### Define search space

In [None]:
search_space = {
    "objective": "multiclass",
    "metric": "multi_logloss",
    "num_class": 10,
    "num_leaves": tune.choice([2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 100]),
    "learning_rate": tune.loguniform(1e-4, 1e-1),
    "feature_fraction": tune.uniform(0.5, 0.999),
    "bagging_fraction": 0.5,
    "bagging_freq": tune.randint(1, 50),
    "max_depth": tune.randint(1, 11),
    "verbose": -1,
}
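
To build intuition for what each sampling primitive in the search space means, here is a rough standard-library sketch that draws configurations from the same ranges. It only mimics the sampling semantics as we understand them (in particular, that `tune.randint` excludes its upper bound and that `tune.loguniform` is uniform in log space); `sample_config` is a hypothetical helper and not how Tune samples internally.

```python
import math
import random


def sample_config(rng):
    """Draw one configuration from the same ranges as `search_space`."""
    return {
        # tune.choice: pick one of the listed values uniformly at random
        "num_leaves": rng.choice([2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 100]),
        # tune.loguniform: uniform in log space, so each decade is equally likely
        "learning_rate": math.exp(rng.uniform(math.log(1e-4), math.log(1e-1))),
        # tune.uniform: uniform float within the range
        "feature_fraction": rng.uniform(0.5, 0.999),
        # tune.randint: integer with an exclusive upper bound
        "bagging_freq": rng.randrange(1, 50),
        "max_depth": rng.randrange(1, 11),
    }


rng = random.Random(7707)  # fixed seed so the sketch is reproducible
for _ in range(3):
    print(sample_config(rng))
```

Entries given as plain values in `search_space` (for example `"bagging_fraction": 0.5`) are not sampled; every trial receives them unchanged.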

### Define trainable

In [None]:
def train_lgbm(training_params, checkpoint_dir=None):
    train_data = lgb.Dataset(data=X_train, label=y_train, free_raw_data=False)

    # Initialize booster
    gbm = lgb.Booster(params=training_params, train_set=train_data)

    # Train booster for 200 iterations
    for i in range(200):
        gbm = lgb.train(
            params=training_params,
            train_set=train_data,
            num_boost_round=1,
            init_model=gbm,
            keep_training_booster=True,
        )

        y_pred = np.argmax(gbm.predict(X_valid), axis=1)
        acc = accuracy_score(y_true=y_valid, y_pred=y_pred)

        # Send accuracy back to Tune
        tune.report(valid_acc=acc)

### Run hyperparameter tuning, single trial

In [None]:
analysis = tune.run(train_lgbm, config=search_space)

### Display info about this trial

In [None]:
df = analysis.dataframe(metric="valid_acc")
df

### Summary
* We just ran our first trial using Tune. Because `num_samples` was left unset, Tune sampled one configuration from the search space.
* Next, we will modify `tune.run()` in order to run tuning with 100 trials.

## Part 3: Execute 100 tuning runs with Tune

### Run hyperparameter tuning

In [None]:
analysis = tune.run(
    train_lgbm,
    config=search_space,
    num_samples=100,
    metric="valid_acc",
    resources_per_trial={"cpu": 1},
    verbose=1,
)
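
With `resources_per_trial={"cpu": 1}`, the number of trials that can run at the same time is bounded by the CPUs available in the Ray cluster. A back-of-the-envelope sketch of that arithmetic follows; both helpers are hypothetical, the 8-CPU figure is an assumed example, and the estimate ignores GPUs, memory, and scheduling overhead.

```python
import math


def max_concurrent_trials(cluster_cpus, cpus_per_trial):
    """Upper bound on trials running in parallel, looking at CPUs only."""
    return int(cluster_cpus // cpus_per_trial)


def estimated_waves(num_samples, cluster_cpus, cpus_per_trial):
    """Rough number of sequential 'waves' needed to finish all trials."""
    return math.ceil(num_samples / max_concurrent_trials(cluster_cpus, cpus_per_trial))


# Example: 100 trials on an assumed 8-CPU machine, 1 CPU per trial
print(max_concurrent_trials(8, 1))  # trials in parallel
print(estimated_waves(100, 8, 1))   # approximate number of waves
```

Adding CPUs to the cluster raises the first number and lowers the second, which is what scaling the tuning job amounts to here.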

### Display info about best trials

In [None]:
df = analysis.dataframe(metric="valid_acc")
df.sort_values(by=["valid_acc"], ascending=False).head(n=5)

### Summary
* We just optimized hyperparameters by executing 100 tuning trials.
* Next, we will introduce a `scheduler` that stops unpromising trials early and, as a result, saves compute time.

## Part 4: ASHA with Tune

### Introduction to ASHA (Asynchronous Successive Halving Algorithm)
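
ASHA compares trials at fixed checkpoints ("rungs") and lets only the best fraction continue, deciding with whatever results have been recorded so far instead of waiting for every trial to reach the rung. The sketch below illustrates that idea in plain Python for `mode="max"`. It is a simplified illustration, not Ray's implementation: `rung_milestones` and `should_continue` are hypothetical helpers, and `max_t=100` and `reduction_factor=4` are what we believe the `ASHAScheduler` defaults to be, so please check the docs for your Ray version.

```python
def rung_milestones(grace_period, max_t, reduction_factor):
    """Iterations at which trials are compared: grace_period * rf**k, capped at max_t."""
    milestones = []
    t = grace_period
    while t <= max_t:
        milestones.append(t)
        t *= reduction_factor
    return milestones


def should_continue(scores_at_rung, new_score, reduction_factor=4):
    """Keep a trial only if it ranks in the top 1/reduction_factor at this rung."""
    all_scores = sorted(scores_at_rung + [new_score], reverse=True)
    keep = max(1, len(all_scores) // reduction_factor)
    return new_score >= all_scores[keep - 1]


# Rungs for the grace_period=50 used in this tutorial (assumed defaults otherwise)
print(rung_milestones(grace_period=50, max_t=100, reduction_factor=4))

# Accuracies already recorded at a rung by earlier trials (made-up values)
recorded = [0.90, 0.85, 0.80, 0.60, 0.55, 0.50, 0.40]
print(should_continue(recorded, 0.92))  # strong trial
print(should_continue(recorded, 0.70))  # mediocre trial
```

Under these assumptions, no trial is stopped before iteration 50, and a trial that reaches that rung keeps training only if its `valid_acc` is among the best seen there so far.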

### Import ASHA from Tune schedulers

In [None]:
from ray.tune.schedulers import ASHAScheduler

### Create ASHA scheduler

In [None]:
asha = ASHAScheduler(
    time_attr="training_iteration",
    mode="max",
    grace_period=50,
)

### Run hyperparameter tuning with ASHA scheduler

In [None]:
analysis = tune.run(
    train_lgbm,
    config=search_space,
    num_samples=100,
    metric="valid_acc",
    resources_per_trial={"cpu": 1},
    scheduler=asha,
    verbose=1,
)

### Display info about best trials

In [None]:
df = analysis.dataframe(metric="valid_acc")
df.sort_values(by=["valid_acc"], ascending=False).head(n=5)

### Summary
* We ran hyperparameter tuning with 100 trials. The ASHA scheduler terminated unpromising trials early, which saved compute resources.

## Shutdown Ray cluster
Shut down the Ray cluster at the end of the tutorial.

In [None]:
ray.shutdown()

## Where to go next?

Congrats!

You just finished the micro tutorial on how to run and scale hyperparameter optimization with LightGBM and Tune.

Now, please go to the [micro tutorial README](https://github.com/kamil-kaczmarek/ray-tune-micro-tutorial/blob/kk/dev/README.md) to learn about next steps and ways to reach out and connect with the community.