Group, init_score, and sample_weight cannot be type Dask DataFrame #4375

Closed
ffineis opened this issue Jun 14, 2021 · 2 comments · Fixed by #4558

@ffineis
Contributor

ffineis commented Jun 14, 2021

Description

The group, sample_weight, and init_score parameters of the Dask estimators cannot be Dask DataFrames, yet they are currently documented as accepting "Dask Array, Dask DataFrame, Dask Series..." (e.g. here). Serial LightGBM estimators do not accept DataFrame values for these parameters (they should be array-like and not 2-D), so the distributed versions of these parameters cannot be 2-D either.

In python-package/lightgbm/dask.py, the type hints for these arguments need to reflect this: add a _DaskArrayLike constant equal to Union[dask_Array, dask_Series], then update the affected docstrings so they no longer mention "Dask DataFrame" as an accepted type.

Reproducible example

import dask.dataframe as dd
from dask.distributed import Client, LocalCluster
import lightgbm as lgb
import numpy as np
import pandas as pd

X = pd.DataFrame(np.random.normal(size = (100, 5)))
y = pd.Series(np.random.choice([0, 1], size = 100))
init_score = pd.DataFrame(np.ones(100))

# -- get TypeError because init_score, group are 2-D.
clf = lgb.LGBMClassifier()
clf.fit(X, y, init_score=init_score)

g = pd.DataFrame([10] * 10, columns = ['group'])
rnk = lgb.LGBMRanker()
rnk.fit(X, y, group=g)

# -- similarly, in the distributed case...
client = Client(LocalCluster())

dX = dd.from_pandas(X, chunksize=10)
dy = dd.from_pandas(y, chunksize=10)
dinit_score = dd.from_pandas(pd.DataFrame(np.ones(100)), chunksize=10)

# -- still TypeError because distributed init_score is 2-D.
clf = lgb.DaskLGBMClassifier()
clf.fit(dX, dy, init_score=dinit_score)

Environment info

LightGBM version or commit hash: 3.2.0

Command(s) you used to install LightGBM

pip install lightgbm

Additional Comments

First raised here: #4101 (comment)

@jameslamb
Collaborator

Thanks for writing this up! I'm adding the literal text of the error message here, so people will be able to find this issue from search engines if they encounter it.

Traceback (most recent call last):
File "", line 1, in
File "/Users/jlamb/miniconda3/lib/python3.8/site-packages/lightgbm/sklearn.py", line 1000, in fit
super().fit(X, y, sample_weight=sample_weight, init_score=init_score, group=group,
File "/Users/jlamb/miniconda3/lib/python3.8/site-packages/lightgbm/sklearn.py", line 694, in fit
self._Booster = train(params, train_set,
File "/Users/jlamb/miniconda3/lib/python3.8/site-packages/lightgbm/engine.py", line 233, in train
booster = Booster(params=params, train_set=train_set)
File "/Users/jlamb/miniconda3/lib/python3.8/site-packages/lightgbm/basic.py", line 2228, in __init__
train_set.construct()
File "/Users/jlamb/miniconda3/lib/python3.8/site-packages/lightgbm/basic.py", line 1462, in construct
self._lazy_init(self.data, label=self.label,
File "/Users/jlamb/miniconda3/lib/python3.8/site-packages/lightgbm/basic.py", line 1282, in _lazy_init
self.set_group(group)
File "/Users/jlamb/miniconda3/lib/python3.8/site-packages/lightgbm/basic.py", line 1860, in set_group
group = list_to_1d_numpy(group, np.int32, name='group')
File "/Users/jlamb/miniconda3/lib/python3.8/site-packages/lightgbm/basic.py", line 164, in list_to_1d_numpy
raise TypeError(f"Wrong type({type(data).__name__}) for {name}.\n"
TypeError: Wrong type(DataFrame) for group.
It should be list, numpy 1-D array or pandas Series
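
For anyone who lands here from that error message: one possible workaround is to pass a 1-D object instead of a single-column DataFrame. The sketch below uses pandas only and assumes the DataFrame has exactly one column. The helper name to_series is hypothetical and not part of LightGBM. For a Dask DataFrame, selecting the single column by name should similarly yield a Dask Series, though that is not shown here.

```python
import numpy as np
import pandas as pd


def to_series(values):
    """Return a 1-D Series from a Series or a single-column DataFrame.

    Hypothetical helper for illustration; not part of LightGBM.
    """
    if isinstance(values, pd.DataFrame):
        if values.shape[1] != 1:
            raise ValueError(f"expected exactly one column, got {values.shape[1]}")
        return values.iloc[:, 0]  # select the only column as a Series
    return values


# Same 2-D inputs as in the reproducible example above.
init_score = pd.DataFrame(np.ones(100))
g = pd.DataFrame([10] * 10, columns=["group"])

init_score_1d = to_series(init_score)
group_1d = to_series(g)

print(type(init_score_1d).__name__, init_score_1d.ndim)
print(type(group_1d).__name__, group_1d.ndim)
```

The resulting Series objects are among the types the error message lists as accepted (list, numpy 1-D array, or pandas Series).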

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023