Fix buffer dtype mismatch in isotonic regression #14902

Merged
merged 6 commits into from Sep 24, 2019

lostcoaster commented Sep 6, 2019

Fixes #15004

Steps to reproduce:
```
from sklearn import isotonic
import numpy as np

m = isotonic.IsotonicRegression()
m.fit(np.zeros((10,), dtype='float32'), np.zeros((10,), dtype='int64'))
```
This gives:
```
File "sklearn/_isotonic.pyx", line 66, in sklearn._isotonic._make_unique
ValueError: Buffer dtype mismatch, expected 'float' but got 'double'
```
Tested with `scikit-learn==0.21.3` and `numpy==0.17.0`.

A more realistic scenario is creating an `XGBClassifier` model (with `xgboost==0.90`) and running `sklearn.calibration.CalibratedClassifierCV` on it. The same error occurs.

I am not good at Cython, but I think the cause is that `check_array` converts `X` and `y` to different dtypes. Here is a simple fix that makes the code work, though there may be a better way to fix it in the Cython code.
thomasjpfan left a comment

Please add a test similar to your code snippet to verify that the fix works.

@@ -325,7 +325,7 @@ def fit(self, X, y, sample_weight=None):
             new input data.
         """
         check_params = dict(accept_sparse=False, ensure_2d=False,
-                            dtype=[np.float64, np.float32])
+                            dtype=[np.float64])
         X = check_array(X, **check_params)
         y = check_array(y, **check_params)

thomasjpfan Sep 6, 2019

We need to make sure y has the same dtype as X:

X = check_array(X, dtype=[np.float64, np.float32], accept_sparse=False, ensure_2d=False)
y = check_array(y, dtype=X.dtype, accept_sparse=False, ensure_2d=False)
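A minimal sketch of why this works, assuming the documented behaviour of `check_array` when `dtype` is a list (the input keeps its dtype if it is already in the list, otherwise it is converted to the first entry). The arrays and the names `X_before`, `y_before`, `X_after` and `y_after` are only illustrative:

```python
import numpy as np
from sklearn.utils import check_array

# Same inputs as in the report: float32 X, int64 y.
X = np.zeros(10, dtype=np.float32)
y = np.zeros(10, dtype=np.int64)

check_params = dict(accept_sparse=False, ensure_2d=False)
float_dtypes = [np.float64, np.float32]

# Before: both arrays are checked against the same dtype list.
# X is already float32 and is left alone; y is int64, which is not in the
# list, so it is converted to the first entry (float64).
X_before = check_array(X, dtype=float_dtypes, **check_params)
y_before = check_array(y, dtype=float_dtypes, **check_params)
print(X_before.dtype, y_before.dtype)

# After: y is converted to whatever dtype X ended up with.
X_after = check_array(X, dtype=float_dtypes, **check_params)
y_after = check_array(y, dtype=X_after.dtype, **check_params)
print(X_after.dtype, y_after.dtype)
```

With the second form both arrays share one dtype, so the Cython routine no longer receives a mix of `float` and `double` buffers.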

lostcoaster Sep 9, 2019

Thanks for the review, I fixed the code and added a test case.

rth approved these changes Sep 18, 2019
rth left a comment

LGTM otherwise.

jnothman left a comment

Otherwise LGTM

@@ -385,19 +385,19 @@ def test_isotonic_ymin_ymax():
                   -0.896, -0.377, -1.327, 0.180])
     y = isotonic_regression(x, y_min=0., y_max=0.1)

-    assert(np.all(y >= 0))
-    assert(np.all(y <= 0.1))
+    assert (np.all(y >= 0))

jnothman Sep 18, 2019

If you're going to do a cosmetic clean-up here, it would be best to remove the parentheses.

@@ -485,6 +485,16 @@ def test_isotonic_dtype():
             assert res.dtype == expected_dtype


+def test_isotonic_mismatched_dtype():

jnothman Sep 18, 2019

This would actually be a great test for all regressors, I reckon. We should include it in the common tests (sklearn/utils/estimator_checks.py). Would you like to contribute that in a later PR?

lostcoaster Sep 18, 2019

I can do that. :)

BTW, the space was added by PyCharm, which is why the parentheses were not removed. I don't have my work PC at hand, so I cannot change it now. I will fix it later if the PR has not been merged by then.

narendramukherjee commented Sep 18, 2019

Is this PR going to be merged soon? I was running into the same issue #15004

Co-Authored-By: Roman Yurchak <rth.yurchak@gmail.com>

rth commented Sep 18, 2019

Please add an entry to the change log at `doc/whats_new/v0.22.rst`. Like the other entries there, please reference this pull request with `:pr:` and credit yourself with `:user:`.

rth commented Sep 18, 2019

Never mind, I was wrong: IsotonicRegression currently seems to take only 1D arrays as input. IMO that is a bug (#15012). However, please revert my suggestion for now so that this PR can be merged. Sorry about that. :)

lostcoaster commented Sep 18, 2019

No worries, I will revert the change and apply other suggestions tomorrow. :)

lucas added 2 commits Sep 19, 2019
lostcoaster force-pushed the lostcoaster:patch-1 branch from fa9b1ef to 11e1ebe Sep 19, 2019
jnothman commented Sep 19, 2019

Relevant test failure: Arrays are not almost equal to 2 decimals ACTUAL: 0.99485082169712713 DESIRED: 1

lostcoaster commented Sep 19, 2019

That is odd: the previous commit passed all tests, and the new commit only modified an rst file. How could it trigger a failure?

thomasjpfan commented Sep 19, 2019

I think test_fastica_simple is irrelevant to this PR and fails randomly.

glemaitre self-requested a review Sep 24, 2019
glemaitre commented Sep 24, 2019

Since there are already 2 approvals, I just pushed a commit parametrizing the test.
I will merge if the CIs turn green.
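The commit itself is not shown in this conversation. As a rough sketch only, a parametrized version of `test_isotonic_mismatched_dtype` could look like the following; the list of dtypes, the sample data and the final assertions are assumptions for illustration, not the code that was merged:

```python
import numpy as np
import pytest
from sklearn.isotonic import IsotonicRegression


# Hypothetical parametrization: y takes several dtypes while X stays float32.
@pytest.mark.parametrize(
    "y_dtype", [np.int32, np.int64, np.float32, np.float64]
)
def test_isotonic_mismatched_dtype(y_dtype):
    # Regression test sketch for #15004: fitting must not raise a
    # "Buffer dtype mismatch" error when X and y have different dtypes.
    reg = IsotonicRegression()
    y = np.array([2, 1, 4, 3, 5], dtype=y_dtype)
    X = np.arange(len(y), dtype=np.float32)
    reg.fit(X, y)

    pred = reg.predict(X)
    # One prediction per sample, and a non-decreasing fit
    # (the default is increasing=True).
    assert pred.shape == X.shape
    assert np.all(np.diff(pred) >= 0)
```

Parametrizing over `y_dtype` runs the same check once per dtype, so a failure report names the exact combination that broke.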

glemaitre added this to TO BE MERGED in Guillaume's pet Sep 24, 2019
glemaitre merged commit 93c628e into scikit-learn:master Sep 24, 2019
19 checks passed
glemaitre moved this from TO BE MERGED to MERGED in Guillaume's pet Sep 24, 2019