OutputCodeClassifier does not work with sparse input data #17218
Comments
From what I can see, it seems that we could delegate the check to the underlying classifier. That way we can easily add a non-regression test without relying on XGBoost. @zoj613 do you want to make a PR with the fix and the non-regression test?
It seems that another change is also needed. The call at scikit-learn/sklearn/multiclass.py, line 853 in 8abe0a8, also needs accept_sparse=True to work.
Also, I've just noticed that the code on the master branch has changed. The fit method now calls the code at scikit-learn/sklearn/multiclass.py, line 805 in 8abe0a8, and it seems that _validate_data allows **check_params keywords to be passed. Those keyword arguments are then passed on to check_array, which accepts accept_sparse=True.
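A minimal sketch of the behaviour discussed above, using the public validation helpers check_X_y and check_array rather than the private _validate_data method. The toy matrix and labels are assumptions made for illustration and do not come from the thread.

```python
import numpy as np
from scipy import sparse
from sklearn.utils.validation import check_array, check_X_y

# Toy data (assumed for illustration): a 4x4 sparse identity matrix.
X = sparse.csr_matrix(np.eye(4))
y = np.array([0, 1, 0, 1])

# With the default accept_sparse=False, sparse input is rejected.
try:
    check_X_y(X, y)
    default_rejects_sparse = False
except TypeError:
    default_rejects_sparse = True

# Passing accept_sparse=True lets the sparse matrix through unchanged in kind.
X_checked, y_checked = check_X_y(X, y, accept_sparse=True)

# The same keyword applies to check_array, e.g. for input validated at predict time.
X_pred = check_array(X, accept_sparse=True)

print(default_rejects_sparse, sparse.issparse(X_checked), sparse.issparse(X_pred))
```

Keeping the data sparse through validation leaves it to the underlying estimator to decide whether it can handle sparse input.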
Do you think adding an extra keyword argument to |
No I think that it is more elegant to just accept sparse in both |
Okay, no problem. I'm on it.
NVM, i figured it out. made a stupid error of not using |
Describe the bug
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
is thrown when passing a sparse matrix to the fit method.
Steps/Code to Reproduce
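A minimal sketch of a reproduction, assuming a random sparse matrix and LogisticRegression as the base estimator (neither comes from the original report). On scikit-learn versions affected by this bug the fit call is expected to raise the TypeError quoted above; on versions that include the fix it should succeed, so the sketch reports either outcome.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier

# Assumed toy data: 60 samples, 20 features, 3 classes.
rng = np.random.RandomState(0)
X = sparse.random(60, 20, density=0.2, format="csr", random_state=rng)
y = rng.randint(0, 3, size=60)

clf = OutputCodeClassifier(LogisticRegression(), random_state=0)

try:
    clf.fit(X, y)  # X is a scipy sparse matrix, not a dense array
    outcome = "fit succeeded"
except TypeError as exc:
    outcome = f"TypeError: {exc}"

print(outcome)
```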
Expected Results
No error thrown, successful fitting
Actual Results
It appears that the check_X_y function raises the exception because it is not set to allow sparse matrices. This is especially bad when using this classifier in a pipeline where the previous step outputs a sparse matrix. The easy workaround in this case was to create an intermediate transformer to convert the sparse matrix to a dense one.
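The kind of intermediate transformer described here could look like the sketch below. It assumes FunctionTransformer with a hypothetical to_dense helper and toy data; the reporter's actual transformer and pipeline are not shown in the issue.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def to_dense(X):
    """Hypothetical helper: convert sparse input to a dense ndarray."""
    return X.toarray() if sparse.issparse(X) else np.asarray(X)


# Densify the data right before the classifier.
densify = FunctionTransformer(to_dense, accept_sparse=True)
pipe = make_pipeline(
    densify,
    OutputCodeClassifier(LogisticRegression(), random_state=0),
)

# Assumed toy data standing in for the sparse output of an earlier step.
X = sparse.random(60, 20, density=0.2, format="csr", random_state=0)
y = np.arange(60) % 3
pipe.fit(X, y)

# Memory cost of densifying: every zero is stored explicitly.
dense_bytes = to_dense(X).nbytes
sparse_bytes = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes
print(dense_bytes, sparse_bytes)
```

The dense copy grows with n_samples * n_features regardless of how many entries are non-zero, which is why this approach can exhaust memory on large, very sparse inputs.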
Unfortunately, this causes everything to crash because RAM fills up with the huge dense matrix. Simply adding the keyword argument accept_sparse=True to the check_X_y function fixes this bug.
Versions