Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dask] use more specific method names on _DaskLGBMModel #4004

Merged
merged 1 commit into from
Feb 20, 2021

Conversation

jameslamb
Copy link
Collaborator

@jameslamb jameslamb commented Feb 19, 2021

The Dask sklearn estimators in lightgbm.dask inherit from both their lightgbm.sklearn equivalent and a mixin called _DaskLGBMModel.

class DaskLGBMClassifier(LGBMClassifier, _DaskLGBMModel):

Those equivalents from lightgbm.sklearn inherit from a mix of LightGBM internal classes and some mixins brought in from sklearn.

from lightgbm.sklearn import DaskLGBMClassifier
for c in DaskLGBMClassifier.__mro__:
    print(c)

# <class 'lightgbm.dask.DaskLGBMClassifier'>
# <class 'lightgbm.sklearn.LGBMClassifier'>
# <class 'lightgbm.sklearn.LGBMModel'>
# <class 'sklearn.base.BaseEstimator'>
# <class 'sklearn.base.ClassifierMixin'>
# <class 'lightgbm.dask._DaskLGBMModel'>
# <class 'object'>

Since LGBMClassifier is the first parent class in the method resolution order ("MRO") for DaskLGBMClassifier, if it or any of the parent classes from sklearn introduce a method that conflicts with one of the method names in _DaskLGBMModel, the _DaskLGBMModel method won't be used, which could break DaskLGBMClassifier.

To protect against this happening, this PR proposes changing the method names on _DaskLGBMModel to names that are less likely to conflict with sklearn in the future. For example, it proposes changing _fit() to _lgb_dask_fit(), so that if an sklearn.base.BaseEstimator._fit() is introduced in some future version of sklearn, DaskLGBMClassifier will still use the correct method.

All of the methods changed here are internal, so this should not have any user-facing impact.

Notes for reviewers

I also considered the pattern below for absolute certainty:

class DaskLGBMClassifier(LGBMClassifier, _DaskLGBMModel)
    def fit(self, ...):
        _DaskLGBMModel._fit(self, ...)

But I don't understand the full implications of doing that, so I decided against it.

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! Thank you!

@StrikerRUS StrikerRUS merged commit 646267d into microsoft:master Feb 20, 2021
@jameslamb jameslamb deleted the misc/mro branch February 20, 2021 17:27
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants