Ray Tune custom callback based on model structure #44

Closed
Chrisjb opened this issue Mar 27, 2023 · 2 comments

Chrisjb commented Mar 27, 2023

I have some code that uses a callback to stop a Ray Tune trial if the complexity of the model (the total number of leaves across all trees) exceeds a given threshold. This works fine with a normal lightgbm model but fails when I use a lightgbm_ray model.

In the code below, use_distributed can be toggled to True to reproduce the error.

I presume the error is because the correct way of passing metrics back to Tune is with the TuneReportCheckpointCallback() from ray.tune.integration.lightgbm. I've played around with this, but it seems I can only report the metrics that lightgbm itself evaluates. I can't add total_leaves as a metric that way, because computing it requires access to the model itself, not just the data and predictions.

Is it possible to report total_leaves to ray tune with lightgbm_ray?
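
For reference, this is roughly how I've been attaching the stock callback (a sketch, not the full repro; the 'train-rmse' key assumes lightgbm's usual '<eval_name>-<metric>' result naming):

# sketch only: mod_ray, train_set and ray_params are defined in the
# full reproduction below
from ray.tune.integration.lightgbm import TuneReportCheckpointCallback

mod_ray.fit(train_set,
            y='target',
            eval_set=[(train_set, 'target')],
            eval_names=["train"],
            ray_params=ray_params,
            callbacks=[TuneReportCheckpointCallback(
                {'rmse_train': 'train-rmse'})])

Full reproduction: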

#%%
# set up and load boston data
import numpy as np
import pandas as pd
import os
import lightgbm
from lightgbm_ray import RayLGBMRegressor, RayParams, RayDMatrix
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import ray
from ray.air import session
from ray import tune
from ray.tune.search.optuna import OptunaSearch


ray.shutdown()
## Initialise ray:
if not ray.is_initialized():
    service_host = os.environ['RAY_HEAD_SERVICE_HOST']
    service_port = os.environ['RAY_HEAD_SERVICE_PORT']
    ray.init(f'ray://{service_host}:{service_port}')

use_distributed = False
out_dir = '/path/to/output_folder'

boston = load_boston()
x, y = boston.data, boston.target
df = pd.DataFrame(x, columns=boston.feature_names)

# make into dmatrix
if use_distributed:
    actors = 2
    ray_params = RayParams(
        num_actors=actors,
        cpus_per_actor=2,
    )

    train_df_with_target = df.copy()
    train_df_with_target['target'] = y

    train_set = RayDMatrix(
        data=train_df_with_target,
        label='target',
    )
else:
    actors = 1

# set lightgbm params and the tune search space
params = {
    'boosting_type': 'goss',
    'objective': 'regression',
    'metric': 'rmse',
    'n_estimators': 100,
    'num_leaves': 6,
    'max_depth': 3,
    'learning_rate': tune.quniform(0.05, 0.1, 0.01),
    'verbose': 1
}

#%% define function to count total leaves in model
def leaves_callback(env):
    # env is lightgbm's CallbackEnv: dump the current model and sum
    # the leaf counts across all trees built so far
    model = env.model

    mod_dump = model.dump_model()
    tree_info = mod_dump['tree_info']
    num_leaves = 0
    num_iterations = 0
    for tree in tree_info:
        num_leaves += tree['num_leaves']
        num_iterations += 1

    # report back to tune; each evaluation_result_list entry is
    # (dataset_name, metric_name, value, is_higher_better)
    session.report({'total_leaves': num_leaves,
                    'rmse_train': env.evaluation_result_list[0][2],
                    'num_iterations': num_iterations})

# define trainable
def trainable(params):
    if use_distributed:
        mod_ray = RayLGBMRegressor(
            random_state=100,
            **params
        )

        mod_ray.fit(train_set,
                    y='target',
                    eval_set=[(train_set, 'target')],
                    eval_names=["train"],
                    ray_params=ray_params,
                    callbacks=[leaves_callback])
    else:
        mod = lightgbm.LGBMRegressor(
            random_state=100,
            **params
        )

        mod.fit(X=x,
                y=y,
                eval_set=[(x, y)],
                eval_names=["train"],
                callbacks=[leaves_callback])


#%% RUN TUNING

# placement group request: (actors + 1) bundles of 2 CPUs, plus one 1-CPU bundle
resources = [{'CPU': 2.0} for x in range(actors + 1)] + [{'CPU': 1.0}]

analysis = tune.Tuner(
    tune.with_resources(
        trainable,
        tune.PlacementGroupFactory(
            resources,
            strategy='PACK')
    ),
    tune_config=tune.TuneConfig(
        metric="rmse_train",
        mode="min",
        search_alg=OptunaSearch(),
        num_samples=5),
    run_config=ray.air.RunConfig(local_dir=out_dir,
                                 name='test_callback',
                                 stop={'total_leaves': 300}),
    param_space=params,
)


results = analysis.fit()

If I toggle use_distributed to True, the callback fails: session.report is being called inside the remote training actors, where no Tune session exists, so _get_session() returns None:

(_RemoteRayLightGBMActor pid=585, ip=10.99.15.76) File "/opt/conda/lib/python3.9/site-packages/ray/air/session.py", line 61, in report
(_RemoteRayLightGBMActor pid=585, ip=10.99.15.76) _get_session().report(metrics, checkpoint=checkpoint)
(_RemoteRayLightGBMActor pid=585, ip=10.99.15.76) AttributeError: 'NoneType' object has no attribute 'report'

If I toggle use_distributed to False, I get the expected result:

(TunerInternal pid=2096)
+--------------------+------------+-----------------+-----------------+--------+------------------+----------------+--------------+------------------+
| Trial name         | status     | loc             |   learning_rate |   iter |   total time (s) |   total_leaves |   rmse_train |   num_iterations |
|--------------------+------------+-----------------+-----------------+--------+------------------+----------------+--------------+------------------|
| trainable_a895fa72 | TERMINATED | 10.99.5.8:2131  |            0.05 |     56 |         0.24444  |            300 |      3.44845 |               56 |
| trainable_aa0be088 | TERMINATED | 10.99.5.8:2131  |            0.1  |     61 |         0.296924 |            302 |      2.82896 |               61 |
| trainable_aa3ab41c | TERMINATED | 10.99.5.8:2131  |            0.08 |     60 |         0.354107 |            301 |      2.89081 |               60 |
| trainable_aa6d4d32 | TERMINATED | 10.99.15.76:749 |            0.07 |     59 |         0.310418 |            300 |      2.99355 |               59 |
| trainable_aa89c7a0 | TERMINATED | 10.99.5.8:2131  |            0.05 |     56 |         0.265122 |            300 |      3.44845 |               56 |
+--------------------+------------+-----------------+-----------------+--------+------------------+----------------+--------------+------------------+

@chrisjb-dlg

I managed to solve my particular issue by subclassing TuneReportCallback:

# Imports as best I can tell for the versions I'm on: the rank-0 check
# and put_queue come from lightgbm_ray's tune integration (which is
# built on xgboost_ray)
from lightgbm.callback import CallbackEnv
from lightgbm_ray.tune import TuneReportCallback
from ray import tune
from xgboost_ray.session import put_queue


class CustomTuneReportCallback(TuneReportCallback):
    def __init__(self, metrics):
        super(CustomTuneReportCallback, self).__init__(metrics)

    def __call__(self, env: CallbackEnv) -> None:
        # only the rank-0 training actor reports back to Tune
        if not self.is_rank_0:
            return
        eval_result = self._get_eval_result(env)
        report_dict = self._get_report_dict(eval_result)

        # same leaf count as before, computed from the live model
        model = env.model
        mod_dump = model.dump_model()
        tree_info = mod_dump['tree_info']
        num_leaves = 0
        num_iterations = 0
        for tree in tree_info:
            num_leaves += tree['num_leaves']
            num_iterations += 1
        report_dict.update({'total_leaves': num_leaves,
                            'rmse_train': env.evaluation_result_list[0][2],
                            'num_iterations': num_iterations})

        # route the report through the actor's queue so it reaches Tune
        put_queue(lambda: tune.report(**report_dict))
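
Swapping it in for leaves_callback in the distributed branch then looks something like this (a sketch; the 'train-rmse' key assumes lightgbm's usual '<eval_name>-<metric>' result naming):

# sketch: attach the custom callback in the distributed fit
mod_ray.fit(train_set,
            y='target',
            eval_set=[(train_set, 'target')],
            eval_names=["train"],
            ray_params=ray_params,
            callbacks=[CustomTuneReportCallback(
                metrics={'rmse_train': 'train-rmse'})])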

@Yard1 (Member) commented Mar 30, 2023

Love it, that's the intended way to do it! Feel free to close the issue unless you have more questions, thanks for posting!

Chrisjb closed this as completed Mar 31, 2023