input validation in predict of DummyRegressor is too strict #9832
I think this may be fixed in 0.19. Please check.
On 26 September 2017 at 20:00, David Catteeuw wrote:
Description
sklearn.dummy.DummyRegressor requires X to be numeric and to have at least 1
feature in predict. IMO, this is too strict.
Also, the same validation is not done in fit.
Steps/Code to Reproduce
import numpy as np
import pandas as pd
import sklearn.dummy
cls = sklearn.dummy.DummyRegressor(strategy='mean')
df = pd.DataFrame(data={'A': ['foo', 'bar', 'baz']})
X = df.loc[:, ['A']].values # X's data is of type object
y = [1, 2, 3]
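cls.fit(X, y)    # succeeds: fit does not apply this validation to X
cls.predict(X)   # raises ValueError: predict tries to convert X to float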
Expected Result
predict returns [2, 2, 2] and raises no ValueError.
Actual Result
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File ".../python3.6/site-packages/sklearn/dummy.py", line 468, in predict
    X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
  File ".../python3.6/site-packages/sklearn/utils/validation.py", line 382, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: could not convert string to float: 'baz'
Versions
Python 3.6.1 | packaged by conda-forge | (default, Mar 23 2017, 21:57:00)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
NumPy 1.12.1
SciPy 0.19.0
Scikit-Learn 0.18.1
Proposal
In the predict method of sklearn.dummy.DummyRegressor, call check_array with
the keyword arguments dtype=None and ensure_min_features=0.
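To illustrate why the relaxed validation would be enough, here is a minimal pure-Python sketch. MeanRegressor is a hypothetical stand-in written for this example, not scikit-learn's implementation; it only assumes that a 'mean' strategy ignores the feature values and needs nothing from X in predict except the number of rows.

```python
from statistics import mean


class MeanRegressor:
    """Hypothetical stand-in for a 'mean' dummy regressor (not scikit-learn code)."""

    def fit(self, X, y):
        # The features in X are ignored; only the targets are used.
        self.constant_ = mean(y)
        return self

    def predict(self, X):
        # Only the number of rows matters, so X may hold strings,
        # mixed types, or rows with zero features.
        return [self.constant_] * len(X)


X = [["foo"], ["bar"], ["baz"]]  # non-numeric features, as in the report
y = [1, 2, 3]
predictions = MeanRegressor().fit(X, y).predict(X)
```

Under that assumption, neither a numeric dtype nor a minimum feature count is needed to produce the predictions.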
Good point. I forgot to mention there was a similar issue that is fixed in 0.19: #8916. In 0.19, check_array is called with force_all_finite=False. This means you can now pass data with missing values, but X must still be numeric and have at least 1 feature.
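The three checks mentioned here are independent of each other, which is why relaxing one leaves the other two in place. A rough pure-Python sketch of that idea follows; validate_rows and its parameters are hypothetical names invented for this example and only loosely mirror the check_array options named in this thread.

```python
import math


def validate_rows(X, allow_nan=False, require_numeric=True, min_features=1):
    """Hypothetical validator (not scikit-learn's check_array)."""
    for row in X:
        # Check 1: minimum number of features per row.
        if len(row) < min_features:
            raise ValueError("too few features")
        for value in row:
            is_number = isinstance(value, (int, float))
            # Check 2: values must be numeric unless this is relaxed.
            if require_numeric and not is_number:
                raise ValueError(f"could not convert to float: {value!r}")
            # Check 3: NaN is rejected unless explicitly allowed.
            if is_number and math.isnan(value) and not allow_nan:
                raise ValueError("input contains NaN")
    return X


# Allowing NaN lets missing values through...
validate_rows([[1.0], [float("nan")]], allow_nan=True)

# ...but string features are still rejected by the numeric check.
try:
    validate_rows([["foo"]], allow_nan=True)
except ValueError as exc:
    message = str(exc)
```

In this sketch, only relaxing the numeric check and the feature-count check as well (the analogue of the proposal above) lets the string data through.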
Okay
Do we need it to be an array?