stacking_cv_regression.py should use X, y = check_X_y(X, y, accept_sparse=['csc', 'csr'], dtype=None) #562
Comments
Thanks for the note! I think the current implementation should work fine though:
from sklearn.utils import check_X_y
import numpy as np
X = np.random.random((10, 2)).astype(str)
y = np.ones(10)
check_X_y(X, y, accept_sparse=['csc', 'csr']) returns
However, based on the FutureWarning:
/Users/sebastian/miniconda3/lib/python3.7/site-packages/sklearn/utils/validation.py:532: FutureWarning: Beginning in version 0.22, arrays of bytes/strings will be converted to decimal numbers if dtype='numeric'. It is recommended that you convert the array to a float dtype before using it in scikit-learn, for example by using your_array = your_array.astype(np.float64).
It may stop working in the next release. What would your recommendation be for handling that type of input where you have object-type output? One could simply get rid of check_X_y, but maybe you have an alternative in mind? EDIT: Oops, I overlooked your suggestion ... I thought dtype= |
Shouldn't it be |
Thanks for the note. Yes, the idea is that the models should take care of that. One might argue that this is also true for "np.inf". So, we could change it to |
Good point, best not to discard any kind of data a priori. There's another issue, somehow Also, the |
Yeah, I think it may not be necessary to recast the inputs to NumPy arrays ... depending on the behavior of the estimators used within the stacking classifier. Regarding the inconsistency between the StackingCVClassifier and the StackingClassifier, that's probably because they were implemented independently and have been modified inconsistently with respect to each other. Some time ago, we tried to unify them in terms of (Python) class inheritance but haven't done so yet. |
I ran into this issue in Are you open to a PR for this? (Just curious: the recent |
Yes, I am open to it. I think I already addressed that via #606 though, but you are welcome to take another look if there are still issues. |
@rasbt Thanks for pointing me to that issue; I hadn't seen it, and it does resolve my error. Unfortunately, in my firewalled env I'm restricted to using official pip releases. The MR from #606 came after the current v0.17 on PyPI. Any idea when v0.18 will be released, or is there a possibility of releasing v0.17.1 with the various changes since? That would be awesome. @tjhgit As for this issue, it can probably be closed, unless @tjhgit wants to re-introduce the |
just catching up with notifications: Made a 0.17.1 version ~2 weeks back that should have that fix :). |
Currently, stacking_cv_regression.py uses X, y = check_X_y(X, y, accept_sparse=['csc', 'csr']), which checks for numeric input.
However, if the first-layer models perform a categorical encoding, it is perfectly fine for the input to be non-numeric, i.e. an np.array of type object.
So please change the check to:
X, y = check_X_y(X, y, accept_sparse=['csc', 'csr'], dtype=None)
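For illustration, here is a minimal sketch of the difference between the two calls. The toy array below is hypothetical (not taken from mlxtend), and the behavior of the default numeric check may vary somewhat between scikit-learn versions, so the second call is wrapped in a try/except rather than assumed to fail in one particular way.

```python
import numpy as np
from sklearn.utils import check_X_y

# Hypothetical toy data: one categorical and one numeric column,
# stored together in an object-dtype array.
X = np.array([["a", 1.0], ["b", 2.0], ["a", 3.0], ["c", 4.0]], dtype=object)
y = np.array([0.5, 1.5, 2.5, 3.5])

# dtype=None asks check_X_y to keep the dtype of X as it is,
# so the object array passes validation unchanged.
X_checked, y_checked = check_X_y(
    X, y, accept_sparse=["csc", "csr"], dtype=None
)
print(X_checked.dtype)  # object

# The default dtype="numeric" tries to convert object arrays to float,
# which cannot work for the string column.
try:
    check_X_y(X, y, accept_sparse=["csc", "csr"])
    numeric_check_passed = True
except (ValueError, TypeError):
    numeric_check_passed = False
print(numeric_check_passed)
```

With dtype=None, the shape and consistency checks on X and y still run; only the numeric conversion is skipped, which leaves the encoding to the first-layer models.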