[dask][docs] initial setup for Dask docs #3822

Merged
merged 5 commits into from
Jan 25, 2021

Conversation

StrikerRUS
Collaborator

Towards #3814.

Maybe we can ask to archive old dask-lightgbm repo?

Live demo: https://lightgbm.readthedocs.io/en/dask_docs/Python-API.html#dask-api.

Why don't we have the most general class DaskLGBMModel?

@jameslamb
Collaborator

> Maybe we can ask to archive old dask-lightgbm repo?

After LightGBM 3.2.0 is released, I'd like to make a pull request in dask-lightgbm that adds a warning to all public functions and constructors saying "dask-lightgbm will not receive further updates. Please use lightgbm instead ('pip install lightgbm[dask]')". If they agree, then after that release we can ask for that repo to be archived.
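
A minimal sketch of how such a warning could be attached to public callables, using only the standard library. The decorator name `warn_unmaintained` and the example function `train_stub` are hypothetical and not part of dask-lightgbm; the message text is the one quoted above.

```python
import functools
import warnings

# Message text taken from the comment above.
DEPRECATION_MSG = (
    "dask-lightgbm will not receive further updates. "
    "Please use lightgbm instead ('pip install lightgbm[dask]')"
)


def warn_unmaintained(func):
    """Hypothetical decorator: emit a FutureWarning on every call to ``func``."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # stacklevel=2 points the warning at the caller's line, not at this wrapper.
        warnings.warn(DEPRECATION_MSG, FutureWarning, stacklevel=2)
        return func(*args, **kwargs)

    return wrapper


@warn_unmaintained
def train_stub(x):
    """Stand-in for a public function; it just doubles its input."""
    return 2 * x
```

`FutureWarning` is used here because Python shows it to end users by default, whereas `DeprecationWarning` is hidden by default outside `__main__`.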

> Why don't we have the most general class DaskLGBMModel?

What is the value of LGBMModel? I don't understand why someone would use that instead of one of the task-specific classes (DaskLGBMRanker, DaskLGBMRegressor, DaskLGBMClassifier). Could you help me understand that?

Collaborator

@jameslamb jameslamb left a comment

thank you for this! I left some small suggestions, but I agree with these changes.

@@ -384,6 +385,9 @@ def _predict(model, data, raw_score=False, pred_proba=False, pred_leaf=False, pr


class _LGBMModel:
    def __init__(self):
        if not all((DASK_INSTALLED, PANDAS_INSTALLED, SKLEARN_INSTALLED)):
            raise LightGBMError('Dask, Pandas and Scikit-learn are required for this module')
Collaborator

Suggested change
-            raise LightGBMError('Dask, Pandas and Scikit-learn are required for this module')
+            raise LightGBMError('dask, pandas and scikit-learn are required for lightgbm.dask')

Instead of "this module", could you use the specific name? I think that makes the log message a little more useful standalone. It can be helpful for cases where people don't have direct access to the stack trace, which is required to understand what "this module" refers to.

For example, user code or other frameworks might write things like this:

try:
    dask_reg = DaskLGBMClassifier()
except LightGBMError as err:
    log.fatal(err)
    raise SomeOtherException("LightGBM training failed")

I also think packages should be referenced by their exact package names, not capitalized names.
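
To illustrate the point about standalone messages, here is a small sketch of the situation described above, where only the exception text reaches the log. `LightGBMError` is redefined locally as a stand-in so the snippet runs without lightgbm installed, and `make_estimator`, `logged_text`, and the `deps_installed` flag are hypothetical.

```python
import logging


class LightGBMError(Exception):
    """Local stand-in for lightgbm's exception class, so the sketch is self-contained."""


def make_estimator(deps_installed, message):
    """Hypothetical constructor guard: raise if optional dependencies are missing."""
    if not deps_installed:
        raise LightGBMError(message)
    return object()


def logged_text(message):
    """Return what a caller sees when it logs only the exception, without a traceback."""
    try:
        make_estimator(deps_installed=False, message=message)
    except LightGBMError as err:
        logging.getLogger(__name__).error(err)
        return str(err)
    return ""


vague = logged_text("Dask, Pandas and Scikit-learn are required for this module")
specific = logged_text("dask, pandas and scikit-learn are required for lightgbm.dask")
```

With the second message, the log line alone names `lightgbm.dask`, so a reader does not need the traceback to find the source.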

Collaborator Author

Thank you! Addressed in acac78f.

@@ -344,7 +344,7 @@ def run(self):
     extras_require={
         'dask': [
             'dask[array]>=2.0.0',
-            'dask[dataframe]>=2.0.0'
+            'dask[dataframe]>=2.0.0',
Collaborator

oh wow, thank you!

Collaborator Author

TBH, I first noticed that on the LGTM site:
https://lgtm.com/projects/g/microsoft/LightGBM?mode=tree
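
For context, the missing comma matters because Python concatenates adjacent string literals. A minimal sketch of the effect follows; the third entry, `'pandas'`, is an assumed placeholder for whatever followed in the real list, which the diff hunk above does not show.

```python
# Without the comma, the two adjacent literals are joined into one string.
broken = [
    'dask[array]>=2.0.0',
    'dask[dataframe]>=2.0.0'
    'pandas',  # assumed next entry, for illustration only
]

# With the comma, each requirement is its own list element.
fixed = [
    'dask[array]>=2.0.0',
    'dask[dataframe]>=2.0.0',
    'pandas',  # assumed next entry, for illustration only
]

print(len(broken), broken[1])
print(len(fixed), fixed[1])
```

No syntax error is raised in the broken case, which is why a static-analysis tool is a likely place to spot it first.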

@StrikerRUS
Collaborator Author

@jameslamb

> After LightGBM 3.2.0 is released, I'd like to make a pull request in

Great!

> What is the value of LGBMModel?

I believe this class is needed for extending scikit-learn features by supporting more objectives. For example, LightGBM currently supports the cross-entropy application, but LGBMClassifier cannot be used with this objective because scikit-learn checks that the targets are ints:

_LGBMCheckClassificationTargets(y)

One can work around this with LGBMRegressor and the cross_entropy objective, but I don't think this is semantically correct.

Other checks could also be added to RegressorMixin and ClassifierMixin on the scikit-learn side that would prevent using potential new LightGBM features.

So in general LGBMModel is for everything beyond classification and regression.
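
A short sketch of the target check being discussed. It calls scikit-learn's `check_classification_targets` directly, on the assumption that `_LGBMCheckClassificationTargets` is an alias for it; the helper `passes_classifier_check` and the sample arrays are hypothetical.

```python
import numpy as np
from sklearn.utils.multiclass import check_classification_targets, type_of_target


def passes_classifier_check(y):
    """Hypothetical helper: True if scikit-learn accepts ``y`` as classification targets."""
    try:
        check_classification_targets(y)
    except ValueError:
        return False
    return True


hard_labels = np.array([0, 1, 1, 0])            # discrete class labels
soft_labels = np.array([0.1, 0.8, 0.65, 0.3])   # labels in [0, 1], as cross-entropy allows

print(type_of_target(hard_labels), passes_classifier_check(hard_labels))
print(type_of_target(soft_labels), passes_classifier_check(soft_labels))
```

Fractional labels are classified as a continuous target and rejected, which is the case where a classifier wrapper gets in the way of the cross-entropy objective.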

@jameslamb jameslamb mentioned this pull request Jan 25, 2021
Collaborator

@jameslamb jameslamb left a comment

these changes look good, thanks very much!

@jameslamb
Collaborator

> So in general LGBMModel is for everything beyond classification and regression.

Ok, thanks for the explanation! I've created #3845 for the feature request.

@github-actions

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023