
[python-package] reorganize early stopping callback #6114

Merged: 8 commits into master on Oct 5, 2023

Conversation

@jameslamb (Collaborator) commented Sep 27, 2023

Contributes to #3756.
Contributes to #3867.

Fixes the following errors from mypy:

callback.py:232: error: Too few arguments  [call-arg]
callback.py:308: error: Argument "train_name" to "_is_train_set" of "_EarlyStoppingCallback" has incompatible type "str | Callable[[Any, Any], list[Any]]"; expected "str"  [arg-type]
callback.py:398: error: Argument 3 to "_is_train_set" of "_EarlyStoppingCallback" has incompatible type "str | Callable[[Any, Any], list[Any]]"; expected "str"  [arg-type]

Those all come from the fact that when env.model in CallbackEnv is a CVBooster, any attribute not explicitly defined on the CVBooster class returns a handler function that, when called, invokes the corresponding method on each of the Booster objects in .boosters:

def __getattr__(self, name: str) -> Callable[[Any, Any], List[Any]]:
    """Redirect methods call of CVBooster."""
    def handler_function(*args: Any, **kwargs: Any) -> List[Any]:
        """Call methods with each booster, and concatenate their results."""
        ret = []
        for booster in self.boosters:
            ret.append(getattr(booster, name)(*args, **kwargs))
        return ret
    return handler_function

For example:

import lightgbm as lgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=10_000)
dtrain = lgb.Dataset(X, y)
results = lgb.cv(
    train_set=dtrain,
    params={"objective": "regression"},
    num_boost_round=7,
    nfold=3,
    stratified=False,
    return_cvbooster=True
)

cv_booster = results["cvbooster"]

cv_booster.num_trees()
# [7, 7, 7]

Given that, mypy is rightly complaining that it isn't safe to treat env.model._train_data_name as if it were a string... since for a CVBooster, it won't be.

cv_booster._train_data_name
# <function CVBooster.__getattr__.<locals>.handler_function at 0x120c11bc0>
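The delegation behavior that trips mypy up can be reproduced without LightGBM at all. The stand-in classes below (hypothetical names, mimicking the `CVBooster.__getattr__` shown above) are a minimal sketch of why plain attribute access on such an object yields a function rather than a string:

```python
from typing import Any, Callable, List


class FakeBooster:
    """Stand-in for lightgbm.Booster (illustrative only)."""

    def __init__(self, n_trees: int) -> None:
        self._n_trees = n_trees

    def num_trees(self) -> int:
        return self._n_trees


class FakeCVBooster:
    """Stand-in mimicking CVBooster's method-delegating __getattr__."""

    def __init__(self, boosters: List[FakeBooster]) -> None:
        self.boosters = boosters

    def __getattr__(self, name: str) -> Callable[..., List[Any]]:
        # __getattr__ is only called when normal attribute lookup fails,
        # so *any* unknown attribute (including something like
        # _train_data_name) comes back as this handler function,
        # not as a plain value.
        def handler_function(*args: Any, **kwargs: Any) -> List[Any]:
            return [getattr(b, name)(*args, **kwargs) for b in self.boosters]

        return handler_function


cv = FakeCVBooster([FakeBooster(7), FakeBooster(7), FakeBooster(7)])
print(cv.num_trees())                 # [7, 7, 7]
print(callable(cv._train_data_name))  # True: a function, not a str
```

This is why a static type checker cannot treat `env.model._train_data_name` as `str` when `env.model` might be a CVBooster.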

This PR proposes some changes to avoid those cases in the early stopping callback.

It also proposes some other changes to make the code in that callback a bit easier to understand (for example, calling _EarlyStoppingCallback._is_train_set() with keyword arguments).

@jameslamb changed the title from "WIP: [python-package] reorganize early stopping callback" to "[python-package] reorganize early stopping callback" Sep 27, 2023
@jameslamb jameslamb marked this pull request as ready for review September 27, 2023 04:20
Review comment on python-package/lightgbm/callback.py, lines 319 to 320:
if self.stopping_rounds <= 0:
    raise ValueError(f"stopping_rounds should be greater than zero. got: {self.stopping_rounds}")
Collaborator:

Since this doesn't need the env we could move it to __init__, WDYT?

jameslamb (Author):

I agree! Having it be a loud error right when the callback is created, instead of deferred all the way til the first iteration of training, seems useful. And I'd be surprised to learn that there are other libraries or user code depending on initializing lgb.early_stopping() with a negative value of this and then somehow updating the value before the first time it's called.

Moved into __init__() in bd3366a.

jameslamb (Author):

Doing this broke this test:

def test_train_raises_informative_error_for_params_of_wrong_type():
    X, y = make_synthetic_regression()
    params = {"early_stopping_round": "too-many"}
    dtrain = lgb.Dataset(X, label=y)
    with pytest.raises(lgb.basic.LightGBMError, match="Parameter early_stopping_round should be of type int, got \"too-many\""):
        lgb.train(params, dtrain)

Now the error from the early stopping callback gets thrown before this one from the C++ side:

Log::Fatal("Parameter %s should be of type int, got \"%s\"", key.c_str(), candidate);

So I pushed 7a98d82, which:

  • switches that test to use a different parameter, to keep covering that C++-side validation
  • adds a test in test_callback.py on this specific error from lgb.early_stopping()
  • adds an isinstance() check in the condition guarding that error in lgb.early_stopping(), so you get an informative error instead of something like TypeError: '<=' not supported between instances of 'str' and 'int'
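A sketch of that stricter validation (the error message matches the test shown later in this thread; treat the exact wording and placement as an assumption, not the merged code):

```python
def early_stopping(stopping_rounds):
    """Sketch of the validation in lgb.early_stopping() (not the full API)."""
    if not isinstance(stopping_rounds, int) or stopping_rounds <= 0:
        # The isinstance() check runs first and short-circuits, so a str
        # input never reaches `stopping_rounds <= 0` and never triggers
        # an unhelpful TypeError.
        raise ValueError(
            f"stopping_rounds should be an integer and greater than 0. got: {stopping_rounds}"
        )
    # ... construct and return the callback ...
```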

Given all those changes, @jmoralez could you re-review? I don't want to sneak those in on your previous approval.

Review comment on python-package/lightgbm/callback.py:
lgb.early_stopping(stopping_rounds=-1)

with pytest.raises(ValueError, match="stopping_rounds should be an integer and greater than 0. got: neverrrr"):
    lgb.early_stopping(stopping_rounds="neverrrr")
Collaborator:

Love this

jameslamb (Author):

haha thank you, thank you 😂

@jameslamb jameslamb merged commit d45dca7 into master Oct 5, 2023
41 checks passed
@jameslamb jameslamb deleted the python/mypy-early-stopping branch October 5, 2023 17:45
Ten0 pushed a commit to Ten0/LightGBM that referenced this pull request Jan 12, 2024