Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R-package] prefer params to keyword argument in lgb.train() #5007

Merged
merged 5 commits into from
Feb 18, 2022

Conversation

jameslamb
Copy link
Collaborator

From #4976 (comment).

Relates to #4913.

Today, when using lightgbm::lgb.train(), if you you pass different values to keyword argument obj and parameter "objective" in the params list, the value passed to the keyword argument will be preferred.

That is inconsistent with the way that the rest of the R package and the Python package works. Across LightGBM language-specific wrapper packages, it's expected that the values from params will take precedence over keyword arguments.

This PR proposes changing the R package to ensure that if objective or its aliases are passed through params, that value takes precedence over the keyword argument.

Why make this breaking change now?

Normally I'd propose that a change like this go through a deprecation cycle. However, it's been almost 4 months since the previous release (v3.3.1, not counting v3.3.2 since that was just a small patch to satisfy CRAN), and given the current state of open large PRs like #3234, #4630, and #4827 I expect it will be at least 2 months until the next, 4.0.0 release.

R-package/tests/testthat/test_basic.R Outdated Show resolved Hide resolved
R-package/tests/testthat/test_basic.R Outdated Show resolved Hide resolved
@jameslamb
Copy link
Collaborator Author

jameslamb commented Feb 16, 2022

@shiyu1994 can you please help us diagnose why this license/cla check is stuck in pending?

Screen Shot 2022-02-16 at 8 33 55 AM

#4985 and #5009 are also blocked by it. I believe the automation that checks whether contributors have signed the Contributor License Agreement is a Microsoft-wide configuration, not something specific to this repo, and I suspect something about it has changed.

if (is.character(params$objective)) {

# If the objective is a character, check if it is a known objective
if (!(params$objective %in% OBJECTIVES)) {

stop("lgb.check.obj: objective name error should be one of (", paste0(OBJECTIVES, collapse = ", "), ")")
Copy link
Collaborator

@StrikerRUS StrikerRUS Feb 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you remember why is this line needed? Or more generally: why is lgb.check.obj() function needed?
Looks like there is a duplication with a similar check at cpp side. For example, there is no similar logic in the Python-package, but users can pass only registered objective names. Please consider the following example:

import numpy as np
import lightgbm as lgb

X = np.random.random((30, 5))
y = np.random.random((30))

lgb.LGBMRegressor(objective='no_such_objective').fit(X, y)

and the output:

---------------------------------------------------------------------------
LightGBMError                             Traceback (most recent call last)
<ipython-input-7-adb0d6a9439c> in <module>
----> 1 lgb.LGBMRegressor(objective='no_such_objective').fit(X, y)

D:\Miniconda3\lib\site-packages\lightgbm\sklearn.py in fit(self, X, y, sample_weight, init_score, eval_set, eval_names, eval_sample_weight, eval_init_score, eval_metric, early_stopping_rounds, verbose, feature_name, categorical_feature, callbacks, init_model)
    893             callbacks=None, init_model=None):
    894         """Docstring is inherited from the LGBMModel."""
--> 895         super().fit(X, y, sample_weight=sample_weight, init_score=init_score,
    896                     eval_set=eval_set, eval_names=eval_names, eval_sample_weight=eval_sample_weight,
    897                     eval_init_score=eval_init_score, eval_metric=eval_metric,

D:\Miniconda3\lib\site-packages\lightgbm\sklearn.py in fit(self, X, y, sample_weight, init_score, group, eval_set, eval_names, eval_sample_weight, eval_class_weight, eval_init_score, eval_group, eval_metric, early_stopping_rounds, verbose, feature_name, categorical_feature, callbacks, init_model)
    746         callbacks.append(record_evaluation(evals_result))
    747 
--> 748         self._Booster = train(
    749             params=params,
    750             train_set=train_set,

D:\Miniconda3\lib\site-packages\lightgbm\engine.py in train(params, train_set, num_boost_round, valid_sets, valid_names, fobj, feval, init_model, feature_name, categorical_feature, early_stopping_rounds, evals_result, verbose_eval, learning_rates, keep_training_booster, callbacks)
    269     # construct booster
    270     try:
--> 271         booster = Booster(params=params, train_set=train_set)
    272         if is_valid_contain_train:
    273             booster.set_train_data_name(train_data_name)

D:\Miniconda3\lib\site-packages\lightgbm\basic.py in __init__(self, params, train_set, model_file, model_str, silent)
   2608             params_str = param_dict_to_str(params)
   2609             self.handle = ctypes.c_void_p()
-> 2610             _safe_call(_LIB.LGBM_BoosterCreate(
   2611                 train_set.handle,
   2612                 c_str(params_str),

D:\Miniconda3\lib\site-packages\lightgbm\basic.py in _safe_call(ret)
    123     """
    124     if ret != 0:
--> 125         raise LightGBMError(_LIB.LGBM_GetLastError().decode('utf-8'))
    126 
    127 

LightGBMError: Unknown objective type name: no_such_objective

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks to me like lgb.check.obj() has been a part of the R package since the very first PR introducing the R package, back in 2017.

https://github.com/microsoft/LightGBM/blame/9259a5318cd03951f2eefe378d9fb8db74457ee3/R-package/R/utils.R#L120

#168

I agree that in the current version of the R package, this isn't really necessary any more. And I'd support removing it, to remove the need to maintain a hard-coded list of objective names in the R package.

library(lightgbm)

data("EuStockMarkets")
lightgbm:::Booster$new(
    train_set = lightgbm::lgb.Dataset(
        data = EuStockMarkets[, c("SMI", "CAC", "FTSE")]
        , label = EuStockMarkets[, "DAX"]
    )
    , params = list(
        objective = "no_such_objective"
    )
)

Raises the following exception:

[LightGBM] [Fatal] Unknown objective type name: no_such_objective
Error in try({ : Unknown objective type name: no_such_objective
Error in initialize(...) : lgb.Booster: cannot create Booster handle

I'll open a separate PR tonight factoring it out.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks!

R-package/tests/testthat/test_basic.R Outdated Show resolved Hide resolved
R-package/tests/testthat/test_basic.R Outdated Show resolved Hide resolved
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
@shiyu1994
Copy link
Collaborator

shiyu1994 commented Feb 17, 2022

@shiyu1994 can you please help us diagnose why this license/cla check is stuck in pending?

Screen Shot 2022-02-16 at 8 33 55 AM

#4985 and #5009 are also blocked by it. I believe the automation that checks whether contributors have signed the Contributor License Agreement is a Microsoft-wide configuration, not something specific to this repo, and I suspect something about it has changed.

Seems that has recovered now.

I think that would be related to the Microsoft CLA GitHub App.

@jameslamb
Copy link
Collaborator Author

Seems that has recovered now.

ah great, glad just closing and re-opening things fixed it. Thanks for checking @shiyu1994 and @StrikerRUS

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@StrikerRUS StrikerRUS merged commit 9e73cee into master Feb 18, 2022
@StrikerRUS StrikerRUS deleted the fix/prefer-objective-param branch February 18, 2022 01:10
@jameslamb jameslamb mentioned this pull request Oct 7, 2022
40 tasks
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants