[MRG + 2] FIX dtypes to conform to the stricter type cast rules of numpy 1.10 #5398
X_test = _check_copy_and_writeable(X_test, copy).astype(np.float64,
                                                        copy=False)
y_test = _check_copy_and_writeable(y_test, copy).astype(np.float64,
                                                        copy=False)
Did I do this correctly? (@MechCoder?) Will this work properly when copy is False?
I think you need as_float_array here too.
FYI, .astype(np.float64, copy=False) is not supported on old numpy; we have a backport in sklearn.utils.fixes.
@GaelVaroquaux's suggestion to use as_float_array is better because it avoids a systematic upcast of 32-bit floats to 64-bit floats.
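To make the upcast concern concrete, here is a minimal sketch using only numpy. The helper to_float_keep_precision is hypothetical and only imitates the behaviour described in this comment (float input keeps its dtype, everything else becomes float64); it is not scikit-learn's as_float_array implementation.

```python
import numpy as np


def to_float_keep_precision(X):
    """Hypothetical stand-in for the as_float_array idea discussed above.

    Floating-point input keeps its dtype; anything else becomes float64.
    """
    X = np.asarray(X)
    if np.issubdtype(X.dtype, np.floating):
        return X  # already float: no upcast and no copy
    return X.astype(np.float64)


X32 = np.ones((2, 3), dtype=np.float32)
X_int = np.arange(6).reshape(2, 3)

upcast = X32.astype(np.float64)              # systematic upcast to 64 bit
kept = to_float_keep_precision(X32)          # stays float32
converted = to_float_keep_precision(X_int)   # integers become float64
```

With a plain astype(np.float64), float32 input always doubles its memory footprint; the dtype-preserving variant only converts when the input is not floating point at all.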
Sorry about that, it was me in the previous comment.
That's awesome... do we have a bot? :D
and okay :)
But the input checks on dtypes should probably be done outside, in the public function or in the fit method.
We do the _check_copy_and_writeable check in this function because of specific constraints caused by the use of multiprocessing with memmap data.
Is the error happening in a test that calls _lars_path_residues directly? If so, it's better to change the test to pass float data only.
d922bfe to 1f20d75
1f20d75 to 0c0de34
beta,
(1. - error_vect) * self.learning_rate)
sample_weight *= as_float_array(np.power(
    beta, (1. - error_vect) * self.learning_rate), copy=False)
Not sure which fix is preferable, but I had a slightly different fix for the AdaBoostRegressor failure, see this
that actually seems better
Okay! I'll take @lesteve's fix here!
2 questions -
I've explicitly passed float64 in the test as a workaround. Is that correct, or should binarizer raise an error (or warning)?
@@ -563,8 +565,7 @@ def inverse_transform(self, X, copy=True):
"""
check_is_fitted(self, 'mixing_')

if copy:
Why is that not check_array here?
And why would we want to copy? fast_dot doesn't modify its input in place, right? That wouldn't make any sense.
Was there an error here?
Yes, X += self.mean_ made numpy 1.10 raise an error since X is not explicitly float but mean_ was... (Is this fix okay, or should I rather do check_array(X, dtype=FLOAT_DTYPES)?)
So even if self.mixing_ is float, fast_dot(X, self.mixing_.T) is not? That is somewhat surprising to me.
fast_dot does casting according to my quick check on numpy 1.10
Oh! So I must've done this pre-emptively I think... I'll check against master once to confirm!
Have replaced with check_array! And yes, I did this pre-emptively since there was a failure w.r.t. FastICA here -
======================================================================
ERROR: sklearn.tests.test_common.test_non_meta_estimators('FastICA', <class 'sklearn.decomposition.fastica_.FastICA'>)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/scikit-learn/sklearn/utils/testing.py", line 317, in wrapper
    return fn(*args, **kwargs)
  File "/scikit-learn/sklearn/utils/estimator_checks.py", line 669, in check_estimators_dtypes
    estimator.fit(X_train, y)
  File "/scikit-learn/sklearn/decomposition/fastica_.py", line 522, in fit
    self._fit(X, compute_sources=False)
  File "/scikit-learn/sklearn/decomposition/fastica_.py", line 478, in _fit
    compute_sources=compute_sources, return_n_iter=True)
  File "/scikit-learn/sklearn/decomposition/fastica_.py", line 300, in fastica
    X -= X_mean[:, np.newaxis]
TypeError: Cannot cast ufunc subtract output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
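The failure in this traceback can be reproduced outside scikit-learn. The following is a minimal sketch, assuming numpy 1.10 or later and made-up integer data; the variable names mirror the traceback but the arrays are illustrative only.

```python
import numpy as np

X = np.arange(6).reshape(2, 3)   # integer dtype, like the data in the failing check
X_mean = X.mean(axis=-1)         # float64

try:
    # In-place subtraction: the float64 result would have to be written
    # back into an integer array, which 'same_kind' casting forbids.
    X -= X_mean[:, np.newaxis]
    raised = False
except TypeError:
    raised = True

# Converting to float before the in-place operation avoids the error.
X_float = X.astype(np.float64)
X_float -= X_mean[:, np.newaxis]
```

The out-of-place form X = X - X_mean[:, np.newaxis] would also work, because it allocates a new float64 array instead of writing into the integer one.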
ok
Should we set copy=copy and self.whiten? There is no need to trigger a copy when self.whiten is False.
If there needs to be a type conversion, then copy=False shouldn't do anything. Why does it fail?
Not if the initial datatype is of a lower type like int and we want it converted to float, right? So that overrides the copy=False (ref: this comment in one Stack Overflow answer)
ec73540 to 6c960e8
Exactly, that overwrites copy=False, i.e. copy=False has no effect, as I said. Is the test to see whether copy=False works? I only see "is not" tests, not "is" tests. Which test are we talking about?
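The point that copy=False has no effect when a dtype conversion is required can be checked directly with numpy's astype. This is a small sketch with made-up arrays, using an "is" check for the case where no conversion is needed and a memory check for the case where it is.

```python
import numpy as np

X_int = np.array([[1, 2], [3, 4]])
X_float = np.array([[1.0, 2.0], [3.0, 4.0]])

# dtype already matches: copy=False hands back the input array itself.
same = X_float.astype(np.float64, copy=False)

# dtype differs: a new array has to be allocated, whatever copy says.
converted = X_int.astype(np.float64, copy=False)

same_is_input = same is X_float
converted_shares_memory = np.shares_memory(converted, X_int)
```

So an "is" test only makes sense for float input; for integer input the only thing one can assert is that the result is a different array.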
6c960e8 to c02ad8b
Oh :P Anyway I had meant for this commit, which fixes the previously failing check which tests if the
Ah, ok. Yeah the fix is fine.
@amueller One more question, do we need a
- DISTRIB="conda" PYTHON_VERSION="3.4" INSTALL_MKL="true"
  NUMPY_VERSION="1.8.1" SCIPY_VERSION="0.14.0"
- DISTRIB="conda" PYTHON_VERSION="3.5" INSTALL_MKL="true"
  NUMPY_VERSION="1.10.1" SCIPY_VERSION="0.16.0"
c02ad8b to 4fd9e8f
This failure is from python3.5 rather than due to the new version of numpy / scipy I think --
It looks to me like some numerics changed, which is more likely to be a numpy/scipy issue. Also appveyor already runs 3.5
See numpy/numpy#6462
Okay! So this waits until that gets fixed?
We could put in a temporary fix? Why is there the median of an empty array in the test?
e0370d0 to 01c36de
def safe_mean(arr, *args, **kwargs):
    # np.mean([]) raises a RuntimeWarning for numpy >= 1.10.1
and before, I think. But the statement is not wrong, I guess ^^
def safe_mean(arr, *args, **kwargs):
    # np.mean([]) raises a RuntimeWarning for numpy >= 1.10.1
    length = arr.size if hasattr(arr, 'size') else len(arr)
the line is missing here now.
lol sorry :P done :)
LGTM once the missing line is added back in and the tests pass.
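For readers following along, a rough sketch of how such a helper could look once complete. The signature and the length line come from the diff above; the return line is an assumption (the thread never shows it), chosen here to yield NaN for empty input and defer to np.mean otherwise.

```python
import warnings

import numpy as np


def safe_mean(arr, *args, **kwargs):
    # np.mean([]) emits a RuntimeWarning, so skip the call for empty input.
    length = arr.size if hasattr(arr, 'size') else len(arr)
    # Assumed return line, not taken from the PR: NaN for empty input.
    return np.nan if length == 0 else np.mean(arr, *args, **kwargs)


with warnings.catch_warnings():
    warnings.simplefilter("error")   # turn any warning into an exception
    empty_result = safe_mean([])
    normal_result = safe_mean(np.array([1.0, 2.0, 3.0]))
```

Running the calls under simplefilter("error") shows that the empty case no longer reaches np.mean, since any RuntimeWarning would surface as an exception.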
01c36de to 77b774a
Right, we don't, let us move that to another PR. Just my one comment about the binarizer remains, i.e. (https://github.com/scikit-learn/scikit-learn/pull/5398/files#diff-5ebddebc20987b6125fffc893f5abc4cR1336), if this is necessary.
@@ -1243,10 +1244,20 @@ def test_binarizer():
assert_equal(np.sum(X_bin == 0), 2)
assert_equal(np.sum(X_bin == 1), 4)

# dtype of X is int* and binarizer will require the X to be of type
# float and hence inplace computation (without a copy will fail)
This first test doesn't really make sense, right? Was that your comment, @MechCoder?
@@ -1333,7 +1333,8 @@ def binarize(X, threshold=0.0, copy=True):
using the ``Transformer`` API (e.g. as part of a preprocessing
:class:`sklearn.pipeline.Pipeline`)
"""
X = check_array(X, accept_sparse=['csr', 'csc'], copy=copy)
X = check_array(X, accept_sparse=['csr', 'csc'], copy=copy,
so why this change here?
This raised no errors... This was done just to fix the input dtype to float (preemptively)... Is this unnecessary?
yup, seems so please remove it :)
And I assume you haven't made any such changes pre emptively elsewhere :P
lol :P wait I'll confirm in a minute!
The inplace modification produces an error only when you do something like
Exactly, in the first two you are explicitly casting the dtype of X, so it triggers a copy. In the second the dtype of X remains the same, and 3.0 is cast to an int, i.e. the dtype of X.
Okay, so your question is the same as Andy's, asking why I made that change at binarize? (I assumed you asked why binarize was copying it after that change :p) I'll revert that in a bit :) And I think here too - https://github.com/scikit-learn/scikit-learn/pull/5398/files#r42003035 ;)
FIX set copy to (copy & whiten). FIX/DOC Use float outputs for doctest
MAINT 3rd travis build should be run on python3.5
080d758 to 47f19e9
LGTM as well
Thanks for the reviews :)
[MRG + 2] FIX dtypes to conform to the stricter type cast rules of numpy 1.10
Thanks! 🍷 🍷 🍷
Getting more classy here. Though if you drink three glasses at once, maybe not that classy any more?
I like it when people underestimate my abilities ;)
Fixes #5397
Explicitly specifies the dtypes to avoid failures in numpy 1.10...
@amueller