Unstable test_common.test_transformers under Windows with Python 32-bit for some estimators #3255

Closed
kastnerkyle opened this Issue Jun 6, 2014 · 6 comments

kastnerkyle commented Jun 6, 2014

I am seeing failing tests with both Python 2.7 and Python 3.4 on Windows, in the following environment:

scipy 0.14
numpy 1.8.1
all 32 bit

CCA, LLE, and KernelPCA seem to be the primary culprits. Here is a sample traceback:

======================================================================
FAIL: sklearn.tests.test_common.test_transformers('KernelPCA', <class 'sklearn.decomposition.kernel_pca.KernelPCA
ay([[ 2.51189522,  2.6430893 ,  2.54847718],
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "C:\Python27\lib\site-packages\sklearn\tests\test_common.py", line 269, in check_transformer
    % Transformer)
  File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 811, in assert_array_almost_equal
    header=('Arrays are not almost equal to %d decimals' % decimal))
  File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 599, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Arrays are not almost equal to 2 decimals
consecutive fit_transform outcomes not consistent in <class 'sklearn.decomposition.kernel_pca.KernelPCA'>
(shapes (30, 15), (30, 14) mismatch)
 x: array([[  1.87664949e+00,   8.57398986e-02,   4.20312700e-02,
          3.31837404e-08,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   2.24505178e-08,   0.00000000e+00,...
 y: array([[  1.87664949e+00,   8.57398986e-02,   4.20312700e-02,
          3.31844220e-08,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   2.24490761e-08,   0.00000000e+00,...

======================================================================
FAIL: sklearn.tests.test_common.test_transformers('LocallyLinearEmbedding', <class 'sklearn.manifold.locally_line
llyLinearEmbedding'>, array([[ 2.51189522,  2.6430893 ,  2.54847718],
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "C:\Python27\lib\site-packages\sklearn\tests\test_common.py", line 269, in check_transformer
    % Transformer)
  File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 811, in assert_array_almost_equal
    header=('Arrays are not almost equal to %d decimals' % decimal))
  File "C:\Python27\lib\site-packages\numpy\testing\utils.py", line 644, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Arrays are not almost equal to 2 decimals
consecutive fit_transform outcomes not consistent in <class 'sklearn.manifold.locally_linear.LocallyLinearEmbeddi
(mismatch 25.0%)
 x: array([[ -2.27507872e-01,   2.98382398e-01],
       [  1.22093549e-01,  -1.92026395e-11],
       [  1.22093549e-01,  -2.04742612e-11],...
 y: array([[ -2.35941411e-01,   2.98382398e-01],
       [  1.04872863e-01,  -1.23895338e-12],
       [  1.04872863e-01,   1.95896077e-12],...

----------------------------------------------------------------------
Ran 3257 tests in 279.742s

Names of estimators that cause the failure:

  • KernelPCA
  • LocallyLinearEmbedding
  • CCA
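
For readers unfamiliar with the failing check, here is a minimal sketch of what "consecutive fit_transform outcomes not consistent" is testing. It is not the actual code from test_common.py: `ToyProjector` and `check_consecutive_fit_transform` are hypothetical names, and the toy transformer is an SVD projection standing in for the real estimators. Only NumPy is assumed.

```python
import numpy as np
from numpy.testing import assert_array_almost_equal


class ToyProjector:
    """Hypothetical stand-in transformer: projects centered data onto
    its leading singular directions."""

    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit_transform(self, X):
        Xc = X - X.mean(axis=0)
        U, s, _ = np.linalg.svd(Xc, full_matrices=False)
        return U[:, :self.n_components] * s[:self.n_components]


def check_consecutive_fit_transform(transformer, X, decimal=2):
    """Fit the same transformer twice on the same data and require the
    two outputs to agree to `decimal` decimals."""
    first = transformer.fit_transform(X)
    second = transformer.fit_transform(X)
    assert_array_almost_equal(
        first, second, decimal,
        err_msg="consecutive fit_transform outcomes not consistent in %s"
                % type(transformer).__name__)


rng = np.random.RandomState(0)
X = rng.uniform(size=(30, 3))
check_consecutive_fit_transform(ToyProjector(), X)  # passes silently
```

Note that `assert_array_almost_equal` also raises an AssertionError when the two arrays have different shapes, which is the "shapes (30, 15), (30, 14) mismatch" case in the KernelPCA traceback above.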

ogrisel commented Jun 6, 2014

Note that those failures are random. I saw them both with the NumPy + ATLAS package from numpy.org and with the NumPy + MKL package from Christoph Gohlke.

I tried changing the data of the test_common:test_transformers test to use a well-conditioned input matrix, but that does not seem to stabilize the test.

I am not sure whether this is a NumPy bug under Windows or a real stability bug in those algorithms that is only triggered under Windows for some reason.

@ogrisel ogrisel added the Bug label Jun 6, 2014

@ogrisel ogrisel added this to the 0.15 milestone Jun 6, 2014

kastnerkyle commented Jul 4, 2014

Depending on luck, a different subset of (it appears) 3 possible tests fails. Most interestingly, the shape of the transformed output differs between the two consecutive runs! I have only investigated the KernelPCA case so far.

The lines that make me most suspicious here are:

    if self.remove_zero_eig or self.n_components is None:
        self.alphas_ = self.alphas_[:, self.lambdas_ > 0]
        self.lambdas_ = self.lambdas_[self.lambdas_ > 0]

If the values returned for self.lambdas_ (the eigenvalues) sit at the edge of numerical precision and flip between slightly positive and slightly negative from one fit to the next, the `> 0` mask would keep a different number of columns each time and shrink the returned array (I think). The eigenvalues come from either ARPACK eigsh or linalg.eigh; unless the solver is set explicitly, the choice depends on a heuristic: if K.shape[0] > 200 and n_components < 10:.
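
To make the suspected mechanism concrete, here is a small sketch that applies the same `> 0` masking to two made-up eigenvalue spectra. The values in `run_a` and `run_b` are hypothetical (they are not taken from the failing test), `keep_positive` is an illustrative helper rather than a scikit-learn function, and the relative tolerance at the end is only one possible mitigation, not a fix confirmed in this thread.

```python
import numpy as np


def keep_positive(lambdas, alphas, tol=0.0):
    """Mirror the masking in the snippet above: keep only the
    eigenpairs whose eigenvalue is strictly greater than `tol`."""
    mask = lambdas > tol
    return lambdas[mask], alphas[:, mask]


# Hypothetical spectra from two consecutive fits: three meaningful
# eigenvalues followed by round-off noise whose sign differs per run.
run_a = np.array([3.5, 7.3e-3, 1.7e-3, 3.3e-16, 1.1e-17, -2.0e-17])
run_b = np.array([3.5, 7.3e-3, 1.7e-3, 3.3e-16, -1.1e-17, -2.0e-17])
alphas = np.ones((30, run_a.size))  # placeholder eigenvectors

_, alphas_a = keep_positive(run_a, alphas)
_, alphas_b = keep_positive(run_b, alphas)
print(alphas_a.shape, alphas_b.shape)  # (30, 5) (30, 4)

# With a tolerance relative to the largest eigenvalue (an assumption,
# not the library's behaviour), both runs keep the same columns.
tol = 1e-12 * run_a.max()
_, stable_a = keep_positive(run_a, alphas, tol)
_, stable_b = keep_positive(run_b, alphas, tol)
print(stable_a.shape, stable_b.shape)  # (30, 3) (30, 3)
```

A single eigenvalue changing sign at the 1e-17 level is enough to change the number of output columns, which matches the shape mismatch reported by the test.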

Also, I see fit_transform for KernelPCA has a **params argument ... should this be changed?

The error in question:

AssertionError:
Arrays are not almost equal to 2 decimals
consecutive fit_transform outcomes not consistent in <class 'sklearn.decomposition.kernel_pca.KernelPCA'>
(shapes (30, 17), (30, 13) mismatch)
 x: array([[  1.87664949e+00,   8.57398986e-02,   4.20312700e-02,
          3.13413026e-08,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,...
 y: array([[  1.87664949e+00,   8.57398986e-02,   4.20312700e-02,
          3.26217481e-08,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,...
>>  raise AssertionError("\nArrays are not almost equal to 2 decimals\nconsecutive fit_transform outcomes not consistent
 in <class 'sklearn.decomposition.kernel_pca.KernelPCA'>\n(shapes (30, 17), (30, 13) mismatch)\n x: array([[  1.87664949
e+00,   8.57398986e-02,   4.20312700e-02,\n          3.13413026e-08,   0.00000000e+00,   0.00000000e+00,\n          0.00
000000e+00,   0.00000000e+00,   0.00000000e+00,...\n y: array([[  1.87664949e+00,   8.57398986e-02,   4.20312700e-02,\n
         3.26217481e-08,   0.00000000e+00,   0.00000000e+00,\n          0.00000000e+00,   0.00000000e+00,   0.00000000e+
00,...")

GaelVaroquaux commented Jul 4, 2014

> Also, I see fit_transform for KernelPCA has a **params argument ... should this be changed?

Probably, yes.

ogrisel commented Jul 7, 2014

I investigated further and can confirm that this only happens on 32-bit Python. All tests pass on 64-bit Python. I will work on a PR to skip those tests when they are run on 32-bit Python.
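
As an illustration of the approach described above, here is a minimal sketch of detecting a 32-bit interpreter and skipping a test on it. This is not the code from the PR: `is_32bit_python` and `skip_on_32bit` are hypothetical names, and the sketch relies only on the standard library (`struct.calcsize("P")` gives the pointer size in bytes, and raising `unittest.SkipTest` marks a test as skipped).

```python
import functools
import struct
import unittest


def is_32bit_python():
    """Return True when the running interpreter uses 32-bit pointers."""
    return struct.calcsize("P") * 8 == 32


def skip_on_32bit(func):
    """Hypothetical decorator: skip the wrapped test on 32-bit Python."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if is_32bit_python():
            raise unittest.SkipTest("Unstable on 32-bit Python")
        return func(*args, **kwargs)
    return wrapper


@skip_on_32bit
def test_numerically_sensitive():
    # Placeholder body standing in for a numerically unstable check.
    return "ran"
```

Skipping hides the instability rather than fixing it, so the skip message should make it clear why the test did not run.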

ogrisel added a commit to ogrisel/scikit-learn that referenced this issue Jul 9, 2014

ogrisel added a commit to ogrisel/scikit-learn that referenced this issue Jul 9, 2014

ogrisel added a commit to ogrisel/scikit-learn that referenced this issue Jul 9, 2014

ogrisel added a commit to ogrisel/scikit-learn that referenced this issue Jul 10, 2014

ogrisel added a commit to ogrisel/scikit-learn that referenced this issue Jul 10, 2014

ogrisel added a commit to ogrisel/scikit-learn that referenced this issue Jul 10, 2014

@ogrisel ogrisel changed the title from Failing tests in Windows to Unstable test_common.test_transformers under Windows for some estimators Jul 13, 2014

@ogrisel ogrisel changed the title from Unstable test_common.test_transformers under Windows for some estimators to Unstable test_common.test_transformers under Windows with Python 32-bit for some estimators Jul 13, 2014

@amueller amueller modified the milestones: 0.15.1, 0.15 Jul 18, 2014

amueller commented Jul 18, 2014

@ogrisel should we close this one?

ogrisel commented Jul 18, 2014

Yes. Closing it.
