
[MRG + 2] FIX dtypes to conform to the stricter type cast rules of numpy 1.10 #5398

Merged Oct 15, 2015 (5 commits)

Conversation

raghavrv
Member

Fixes #5397

Explicitly specifies the dtypes to avoid failures under numpy 1.10...

@amueller

X_test = _check_copy_and_writeable(X_test, copy).astype(np.float64,
                                                        copy=False)
y_test = _check_copy_and_writeable(y_test, copy).astype(np.float64,
                                                        copy=False)
Member Author

Did I do this correctly?? (@MechCoder(?)) Will this work properly when copy is False?

Member

I think you need as_float_array here too.

Contributor

FYI .astype(np.float64, copy=False) is not supported on old numpy. We have a backport in sklearn.utils.fixes.

@GaelVaroquaux's suggestion to use as_float_array is better to avoid a systematic upcast of 32 bit float to 64 bit float.
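To illustrate the trade-off being discussed, here is a minimal numpy-only sketch. `as_float_array_sketch` is a hypothetical stand-in for the dtype logic of `sklearn.utils.as_float_array`, not the real implementation:

```python
import numpy as np

def as_float_array_sketch(X, copy=True):
    # Hypothetical stand-in for sklearn.utils.as_float_array's dtype logic:
    # leave float32/float64 input alone, upcast everything else to float64.
    if X.dtype in (np.float32, np.float64):
        return X.copy() if copy else X
    return X.astype(np.float64)

X32 = np.ones((2, 2), dtype=np.float32)
X_int = np.arange(4).reshape(2, 2)

# astype(np.float64) always yields float64, even for float32 input ...
assert X32.astype(np.float64).dtype == np.float64
# ... whereas the as_float_array approach keeps float32 as float32
assert as_float_array_sketch(X32, copy=False).dtype == np.float32
assert as_float_array_sketch(X_int).dtype == np.float64
```

This is why a blanket `.astype(np.float64)` systematically upcasts 32-bit float data, doubling memory, while `as_float_array` does not.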

Member

Sorry, that was me in the previous comment.

Member Author

that's awesome... are we having a bot?? :D

Member Author

and okay :)

Member

But the input checks on dtypes should probably be done outside, in the public function or in the fit method.

We do the _check_copy_and_writeable check in this function for specific constraints caused by the use of multiprocessing with memmap data.

Member

Is the error happening in a test that calls _lars_path_residues directly? If so, it's better to change the test to pass float data only.

@lesteve
Member

lesteve commented Oct 14, 2015

Maybe it would be worth adding an additional distribution with Python 3.5 and latest numpy + scipy versions to .travis.yml ? (as noted in #5397)

You can cherry-pick or take inspiration from this commit.

beta,
(1. - error_vect) * self.learning_rate)
sample_weight *= as_float_array(np.power(
beta, (1. - error_vect) * self.learning_rate), copy=False)
Member

Not sure which fix is preferable, but I had a slightly different fix for the AdaBoostRegressor failure, see this

Member

that actually seems better

Member Author

Okay! I'll take @lesteve's fix here...!!

@raghavrv
Member Author

2 questions -

  1. Do we need a DataConversionWarning at any of these places?
  2. One of the Binarizer tests fails as copy is set to False but the typecasting (from int to float) can't be done in place.

I've explicitly passed float64 in the test as a workaround... is that correct? Or should binarizer raise an error (or warning)?
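The Binarizer situation can be reproduced with plain numpy. `binarize_sketch` below is a hypothetical illustration of the in-place question, not sklearn's actual `binarize`: an int-to-float conversion necessarily allocates a new array, so the no-copy promise cannot be kept for integer input.

```python
import numpy as np

def binarize_sketch(X, threshold=0.0):
    # Hypothetical illustration (not sklearn's binarize):
    # np.asarray copies only when a dtype conversion is needed
    X_float = np.asarray(X, dtype=np.float64)
    X_float[:] = X_float > threshold  # threshold in place on the float array
    return X_float

X_float = np.array([[1.0, 0.0, 5.0], [2.0, 3.0, -1.0]])
X_int = np.array([[1, 0, 5], [2, 3, -1]])

# float input: no conversion, so the work really happens in place
assert binarize_sketch(X_float) is X_float

# int input: the int->float conversion forces a copy, so the original
# stays untouched -- hence passing float64 data in the test
out = binarize_sketch(X_int)
assert out is not X_int
assert X_int[0, 0] == 1
assert out.tolist() == [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
```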

@@ -563,8 +565,7 @@ def inverse_transform(self, X, copy=True):
"""
check_is_fitted(self, 'mixing_')

if copy:
Member

Why is that not check_array here?

Member

and why would we want to copy? fast_dot doesn't change inplace, right? That wouldn't make any sense.

Member

Was there an error here?

Member Author

Yes, X += self.mean_ made numpy 1.10 raise an error since X is not explicitly float but mean_ was... (is this fix okay, or should I rather do check_array(X, dtype=FLOAT_DTYPES)?)

Member

so even if self.mixing_ is float, fast_dot(X, self.mixing_.T) is not? That is somewhat surprising to me.

Member

fast_dot does casting according to my quick check on numpy 1.10

Member Author

Oh! So I must've done this pre-emptively I think... I'll check against master once to confirm!

Member Author

Have replaced with check_array...! And yes, I did this pre-emptively since there was a failure w.r.t. FastICA here -

======================================================================
ERROR: sklearn.tests.test_common.test_non_meta_estimators('FastICA', <class 'sklearn.decomposition.fastica_.FastICA'>)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/scikit-learn/sklearn/utils/testing.py", line 317, in wrapper
    return fn(*args, **kwargs)
  File "/scikit-learn/sklearn/utils/estimator_checks.py", line 669, in check_estimators_dtypes
    estimator.fit(X_train, y)
  File "/scikit-learn/sklearn/decomposition/fastica_.py", line 522, in fit
    self._fit(X, compute_sources=False)
  File "/scikit-learn/sklearn/decomposition/fastica_.py", line 478, in _fit
    compute_sources=compute_sources, return_n_iter=True)
  File "/scikit-learn/sklearn/decomposition/fastica_.py", line 300, in fastica
    X -= X_mean[:, np.newaxis]
TypeError: Cannot cast ufunc subtract output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
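The traceback above reproduces outside sklearn in a couple of lines (assuming numpy >= 1.10, where in-place ufuncs use the default `casting='same_kind'` rule):

```python
import numpy as np

X = np.arange(6, dtype=np.int64).reshape(2, 3)
X_mean = X.mean(axis=1)  # float64

# int64 -= float64 cannot be cast back under the 'same_kind' rule,
# so numpy >= 1.10 raises a TypeError here
try:
    X -= X_mean[:, np.newaxis]
    raised = False
except TypeError:
    raised = True
assert raised

# The fix: make X float before the in-place subtraction
X_f = X.astype(np.float64)
X_f -= X_mean[:, np.newaxis]
assert X_f.tolist() == [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]
```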

Member

ok

Member

Should we set copy=copy and self.whiten? There is no need to trigger a copy when self.whiten is False.

@amueller
Member

If there needs to be a type conversion, then copy=False shouldn't do anything. Why does it fail?

@raghavrv
Member Author

Not if the initial datatype is a lower type like int and we want it converted to float, right? So that overrides the copy=False (ref: this comment in a Stack Overflow answer)
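That behaviour is easy to verify directly: `copy=False` is only a hint, and a required dtype conversion forces a fresh allocation anyway.

```python
import numpy as np

X = np.array([[1, 0, 5], [2, 3, -1]])  # int dtype

# dtype conversion required -> copy=False is silently overridden
X64 = X.astype(np.float64, copy=False)
assert X64 is not X
assert X64.dtype == np.float64

# no conversion required -> copy=False really avoids the copy
same = X.astype(X.dtype, copy=False)
assert same is X
```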

@amueller
Member

Exactly, that overrides copy=False, i.e. copy=False has no effect, as I said. Is the test to see whether copy=False works? I only see "is not" tests, not "is" tests. Which test are we talking about?

@raghavrv
Member Author

Oh :P Anyway I had meant for this commit, which fixes the previously failing check that tests whether X is X_bin when copy is set to False... Is that commit the correct thing to do?

@amueller
Member

Ah, ok. Yeah the fix is fine.

@raghavrv
Member Author

@amueller One more question: do we need a DataConversionWarning or some kind of warning to notify the user that the conversion is taking place?

- DISTRIB="conda" PYTHON_VERSION="3.4" INSTALL_MKL="true"
NUMPY_VERSION="1.8.1" SCIPY_VERSION="0.14.0"
- DISTRIB="conda" PYTHON_VERSION="3.5" INSTALL_MKL="true"
NUMPY_VERSION="1.10.1" SCIPY_VERSION="0.16.0"
Member Author

@lesteve @ogrisel Is this better than * since we know which version we are testing against?

@raghavrv
Member Author

This failure is from Python 3.5 rather than from the new versions of numpy / scipy, I think --

======================================================================
ERROR: sklearn.preprocessing.tests.test_imputation.test_imputation_mean_median
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/testenv/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/preprocessing/tests/test_imputation.py", line 162, in test_imputation_mean_median
    true_statistics[j] = true_value_fun(z, v, p)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/preprocessing/tests/test_imputation.py", line 139, in <lambda>
    ("median", "NaN", lambda z, v, p: np.median(np.hstack((z, v)))),
  File "/home/travis/miniconda/envs/testenv/lib/python3.5/site-packages/numpy/lib/function_base.py", line 3084, in median
    overwrite_input=overwrite_input)
  File "/home/travis/miniconda/envs/testenv/lib/python3.5/site-packages/numpy/lib/function_base.py", line 2997, in _ureduce
    r = func(a, **kwargs)
  File "/home/travis/miniconda/envs/testenv/lib/python3.5/site-packages/numpy/lib/function_base.py", line 3138, in _median
    n = np.isnan(part[..., -1])
IndexError: index -1 is out of bounds for axis 0 with size 0
----------------------------------------------------------------------

@amueller
Member

It looks to me like some numerics changed, which is more likely to be a numpy/scipy issue. Also, AppVeyor already runs 3.5.

@amueller
Member

see numpy/numpy#6462

@raghavrv
Member Author

Okay!! So this waits until that gets fixed?

@amueller
Member

we could put in a temporary fix? Why is there the median of an empty array in the test?



def safe_mean(arr, *args, **kwargs):
# np.mean([]) raises a RuntimeWarning for numpy >= 1.10.1
Member

and before, I think. But the statement is not wrong, I guess ^^

@amueller amueller changed the title [MRG] FIX dtypes to conform to the stricter type cast rules of numpy 1.10 [MRG + 1] FIX dtypes to conform to the stricter type cast rules of numpy 1.10 Oct 15, 2015

def safe_mean(arr, *args, **kwargs):
# np.mean([]) raises a RuntimeWarning for numpy >= 1.10.1
length = arr.size if hasattr(arr, 'size') else len(arr)
Member

the line is missing here now.

Member Author

lol sorry :P done :)
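For reference, a plausible reconstruction of the full helper. The guard value `np.nan` for empty input is my assumption; the actual line restored in the PR may differ:

```python
import numpy as np

def safe_mean(arr, *args, **kwargs):
    # np.mean([]) raises a RuntimeWarning for numpy >= 1.10.1,
    # so guard against empty input before delegating to np.mean
    length = arr.size if hasattr(arr, 'size') else len(arr)
    if length == 0:
        return np.nan  # assumed sentinel for empty input
    return np.mean(arr, *args, **kwargs)

assert np.isnan(safe_mean(np.array([])))
assert safe_mean(np.array([1.0, 3.0])) == 2.0
```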

@amueller
Member

LGTM once the missing line is added back in and the tests pass.

@MechCoder
Member

why do we need copy here at all?

Right, we don't, let us move that to another PR.

Just my one comment about the binarizer remains, i.e (https://github.com/scikit-learn/scikit-learn/pull/5398/files#diff-5ebddebc20987b6125fffc893f5abc4cR1336) if this is necessary.

@@ -1243,10 +1244,20 @@ def test_binarizer():
assert_equal(np.sum(X_bin == 0), 2)
assert_equal(np.sum(X_bin == 1), 4)

# dtype of X is int* and binarizer requires X to be of type
# float, hence in-place computation (without a copy) will fail
Member

This first test doesn't really make sense, right? was that your comment @MechCoder ?

@raghavrv
Member Author

@amueller no, the question was: when some dtype conversions can happen in place, why is binarize unable to do so...?
It gets changed in check_array itself! I am looking into it :)

@@ -1333,7 +1333,8 @@ def binarize(X, threshold=0.0, copy=True):
using the ``Transformer`` API (e.g. as part of a preprocessing
:class:`sklearn.pipeline.Pipeline`)
"""
X = check_array(X, accept_sparse=['csr', 'csc'], copy=copy)
X = check_array(X, accept_sparse=['csr', 'csc'], copy=copy,
Member

so why this change here?

Member Author

This raised no errors... This was done just to fix the input dtype to float (pre-emptively)... Is this unnecessary?

Member

yup, seems so, please remove it :)

Member

And I assume you haven't made any such changes pre-emptively elsewhere :P

Member Author

lol :P wait I'll confirm in a minute!

@MechCoder
Member

The in-place modification produces an error only when you do something like +=, -=, etc.

@raghavrv
Member Author

>>> X = np.array([[1, 0, 5], [2, 3, -1]])
>>> print(X is check_array(X, copy=False, dtype=np.float64))
False
>>> print(X is np.array(X, dtype=np.float64, copy=False)) # This is done by check_array
False
>>> X[X>3]=3.0
>>> print(X is X)
True

@MechCoder
Member

Exactly, in the first two you are explicitly casting the dtype of X, so it triggers a copy. In the third, the dtype of X remains the same, and 3.0 is cast to an int, i.e. the dtype of X

@raghavrv
Member Author

Okay, so your question is the same as Andy's, asking why I made that change in binarize? (I assumed you asked why binarize was copying it after that change :p)

I'll revert that in a bit :)

And I think here too - https://github.com/scikit-learn/scikit-learn/pull/5398/files#r42003035 ;)

@MechCoder
Member

LGTM as well

@MechCoder MechCoder changed the title [MRG + 1] FIX dtypes to conform to the stricter type cast rules of numpy 1.10 [MRG + 2] FIX dtypes to conform to the stricter type cast rules of numpy 1.10 Oct 15, 2015
@raghavrv
Member Author

Thanks for the reviews :)

amueller added a commit that referenced this pull request Oct 15, 2015
[MRG + 2] FIX dtypes to conform to the stricter type cast rules of numpy 1.10
@amueller amueller merged commit 1b9e791 into scikit-learn:master Oct 15, 2015
@MechCoder
Member

Thanks! 🍷 🍷 🍷

@amueller
Member

getting more classy here. Though if you drink three glasses at once, maybe not that classy any more?

@MechCoder
Member

I like it when people underestimate my abilities ;)
