MRG: Update ARPACK Backport #984

Closed · wants to merge 3 commits

jakevdp (Member) commented Jul 26, 2012

This updates scikit-learn's arpack backport with bug fixes from scipy v0.11, and adds a copy of scipy's arpack test suite to scikit-learn.

For reference, the main bugs fixed in this new version are reported here:
http://projects.scipy.org/scipy/ticket/1515

Note that in scipy 0.10 there were some minor updates to the ARPACK Fortran source bundled with scipy, so a few rare cases will fail under older scipy versions. Short of bundling ARPACK with scikit-learn, I don't think we have any means of addressing this.

sklearn/utils/arpack.py
-
+This contains a copy of the ARPACK module introduced in scipy 0.10,
+and updated in scipy 0.11 to fix rare non-convergence errors.
+Note that ARPACK support existed in scipy 0.09, but that version of the

ogrisel (Owner) commented on the diff Jul 26, 2012

0.09 => 0.9

ogrisel (Owner) commented Jul 26, 2012

I get plenty of the same error when running make clean inplace test-coverage on your branch:

======================================================================
ERROR: sklearn.utils.tests.test_arpack.test_symmetric_modes(True, <std-symmetric>, 'f', 2, 'LM', None, None, <class 'scipy.sparse.csr.csr_matrix'>, None, 'normal')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Python/2.6/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/Users/oliviergrisel/coding/scikit-learn/sklearn/utils/tests/test_arpack.py", line 235, in eval_evec
    eval, evec = eigs_func(ac, k, **kwargs)
  File "/Users/oliviergrisel/coding/scikit-learn/sklearn/utils/arpack.py", line 1565, in _eigsh
    params.iterate()
  File "/Users/oliviergrisel/coding/scikit-learn/sklearn/utils/arpack.py", line 550, in iterate
    self.ipntr, self.workd, self.workl, self.info)
ValueError: need more than 6 values to unpack

ogrisel (Owner) commented Jul 26, 2012

I am running python 2.6, numpy 1.6.2, scipy 0.10.1

jakevdp (Member) commented Jul 26, 2012

Thanks Olivier - I did some digging, and the source of this difference is here:
scipy/scipy@201f709
The code in this PR uses the ARPACK wrapper version from scipy 0.11 (a silly thing for a backport!). I think I need to instead copy the implementation from scipy 0.10, and note that the backported code will fail on scipy 0.11+.

This time I'll test with scipy 0.9, 0.10, and 0.11 before re-submitting.
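
For reference, the version dispatch that downstream code gets from a backport module like this looks roughly as follows. This is a minimal sketch only: the LooseVersion comparison is an assumption about how the check is written, and _eigsh is the backported name seen in the traceback above.

import scipy
from distutils.version import LooseVersion

if LooseVersion(scipy.version.version) >= LooseVersion('0.10'):
    # the installed scipy already ships the fixed ARPACK wrappers
    from scipy.sparse.linalg import eigsh
else:
    # fall back to the bundled copy of scipy's wrappers
    from sklearn.utils.arpack import _eigsh as eigsh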

jakevdp (Member) commented Jul 26, 2012

This should now be compatible with both old and new versions of scipy.

ogrisel (Owner) commented Jul 26, 2012

I confirm the tests pass now.

jakevdp (Member) commented Jul 26, 2012

One question on this: in the tests, I forced the test module to load the back-ported versions of the functions, even if the newer scipy version is available. On the one hand, this makes sense because we don't want our test suite to simply test the scipy functions. On the other hand, this may not make sense because the backports aren't necessary when newer scipy versions are available. Any thoughts on that?

ogrisel (Owner) commented Jul 26, 2012

I am not sure either...

amueller (Owner) commented Jul 27, 2012

@jakevdp If the scipy versions are available, I think we should test that these are indeed used.

jakevdp (Member) commented Jul 27, 2012

@jakevdp If the scipy versions are available, I think we should test that these are indeed used

I'm not sure I understand. Do you mean having a test that's something like:

import scipy
from distutils.version import LooseVersion

from sklearn.utils.arpack import eigsh  # the dispatched name under test

def test_version():
    # parse the versions: as plain strings, '0.9' sorts after '0.10'
    if LooseVersion(scipy.version.version) >= LooseVersion('0.10'):
        from scipy.sparse.linalg import eigsh as sp_eigsh
        assert eigsh == sp_eigsh
    else:
        from sklearn.utils.arpack import _eigsh
        assert eigsh == _eigsh

amueller (Owner) commented Jul 27, 2012

I haven't really thought about how to do it, but something like that, yes. Or do you think that doesn't help?

Testing the scipy functions like you do now would, on the other hand, ensure that they are the same as the backport.

GaelVaroquaux (Owner) commented Jul 30, 2012

I actually think that it is good to always test our versions, because we only want to test our code. In addition, for the 'broken window effect' we do want to test our code for coverage reasons.

However, if at some point it leads to problems (i.e. test failures for functions that we don't actually use), I think that we should skip the corresponding tests.
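
A minimal sketch of that skip pattern, assuming nose (the runner in use here) and that the fragile cases surface as ArpackNoConvergence; eval_evec_checked is a hypothetical helper, not a function from this PR:

from nose import SkipTest
from scipy.sparse.linalg import ArpackNoConvergence

def eval_evec_checked(eigs_func, A, k, **kwargs):
    # turn platform-dependent non-convergence into a skip rather than
    # a hard failure, so the suite stays green
    try:
        return eigs_func(A, k, **kwargs)
    except ArpackNoConvergence:
        raise SkipTest("ARPACK failed to converge; platform-dependent")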

GaelVaroquaux (Owner) commented Jul 30, 2012

Using the latest scipy, I get a bunch of failing tests on this branch on my work computer ( http://pastebin.com/JgTJzzXW ). Granted, they fail in my scipy checkout too, so it's probably a problem of scipy, not of the scikit, but it seems to me that these tests are fragile.

jakevdp (Member) commented Jul 30, 2012

I know for certain that these tests are fragile. We had a lot of trouble with this in scipy. We ended up completely deprecating single-precision routines, and this seemed to fix all the platform-dependent convergence problems. You should report these errors to scipy - maybe someone else has a better idea of what could be causing them.

jakevdp (Member) commented Aug 1, 2012

As I see it, we have a few options:

  1. test our backport regardless of whether newer scipy is available. This is how the PR is currently written, and has the advantage that we don't waste test cycles unit-testing code which is not in our own code-base.
  2. test scipy's versions when available. This has the advantage that it tests the actual code used by the user of scikit-learn when scipy is available.
  3. remove all tests. All the code is duplicated from scipy anyway, and the routines which use the backport will be tested, which gives coverage for the parts of the backport that scikit-learn uses.

I vote for number 1, which is how the PR is currently structured. The disadvantage is the instability mentioned by Gael -- this is an issue in scipy, not in scikit-learn, and I don't see a way around this short of removing the tests entirely.
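
For reference, option 1 amounts to binding the test module to the backported implementations directly, something like the sketch below (assuming the backported eigs counterpart is named _eigs, by analogy with the _eigsh seen in the traceback above):

# at the top of sklearn/utils/tests/test_arpack.py: always exercise the
# bundled backport, even when the installed scipy would normally be used
from sklearn.utils.arpack import _eigs as eigs, _eigsh as eigsh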

GaelVaroquaux (Owner) commented Aug 1, 2012

On Wed, Aug 01, 2012 at 10:16:36AM -0700, Jacob Vanderplas wrote:

I vote for number 1, which is how the PR is currently structured. The disadvantage is the instability mentioned by Gael -- this is an issue in scipy, not in scikit-learn, and I don't see a way around this short of removing the tests entirely.

I really would like to shoot for no failing tests. I believe that failing tests are extremely bad psychologically. One of the reasons that I don't report failing tests for scipy is that I have always had failing tests for scipy, and on this particular box, dozens and dozens.

I would vote for cherry-picking robust tests and keeping these. I can give you a hand with that if you want.

jakevdp (Member) commented Aug 2, 2012

If we're cherry-picking robust tests, then why not just go with option 3? Unit tests for parts of scikit-learn which use ARPACK will give test coverage for the pieces of the backport we need to perform robustly.

amueller (Owner) commented Aug 2, 2012

I am also strongly against failing / non-robust tests.

GaelVaroquaux (Owner) commented Aug 2, 2012

If we're cherry-picking robust tests, then why not just go with option 3?

Because that will not give us test coverage of the backport on recent versions of scipy.

I am willing to help with the cherry-picking.

amueller (Owner) commented Aug 5, 2012

Is there no way to make the tests more robust?

amueller (Owner) commented Aug 5, 2012

On my box, all tests pass btw

jakevdp (Member) commented Aug 6, 2012

I suspect that the lack of robustness is due to ARPACK itself. Looking at the problems Gael is having, they seem to be similar to the single-precision issues we dealt with in scipy 0.10. Our solution there was to disable the single-precision ARPACK interface, and upgrade all problems to double. It seems now that the double routines aren't robust either, though to my knowledge, that's never been reported before to scipy.

With that in mind, I don't think there's much we can do to make the tests more robust.
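
The single-to-double upgrade described above can be sketched as follows (robust_eigsh is a hypothetical wrapper for illustration, not scipy's actual mechanism):

import numpy as np
from scipy.sparse.linalg import eigsh

def robust_eigsh(A, k=6, **kwargs):
    # upcast single-precision input to double before calling ARPACK,
    # mirroring scipy's removal of the single-precision interface
    if A.dtype == np.float32:
        A = A.astype(np.float64)
    elif A.dtype == np.complex64:
        A = A.astype(np.complex128)
    return eigsh(A, k=k, **kwargs)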

jakevdp (Member) commented Aug 6, 2012

@GaelVaroquaux - what scipy version are you having issues with? There was a fix to the ARPACK library bundled with scipy in version 0.10.1.

jakevdp referenced this pull request Aug 17, 2012: Unstable LLE test #1031 (Closed)

vene (Owner) commented Aug 29, 2012

I get failing tests. They look like the "standard" failures to me, but posting the log just in case: http://pastebin.com/AsS7NE3f

amueller (Owner) commented Aug 25, 2015

This PR is 3 years old. Is it still relevant?
Also, when do we bump the scipy requirement? [see #3282]
The main benefit of dropping old scipy is that we could remove the backports, right? As we already have them, that doesn't seem like a huge win.

jakevdp (Member) commented Aug 25, 2015

I think it's still relevant if we want full unit test coverage of back-ported code.

A benefit of removing backports is that we won't have to keep on top of bug fixes as they are applied to scipy. Also our test coverage metric would go way up 😄

amueller (Owner) commented Aug 26, 2015

I'm still -1 on adding unit test coverage to the backport, but this also updates the wrappers themselves, right?

jakevdp (Member) commented Aug 27, 2015

Right – forgot about that. I haven't really kept up with the arpack stuff in the last year or two, so I'm not sure about the current status.

giorgiop (Contributor) commented Oct 23, 2015

(I haven't read the part of the discussion from 2012.) FYI, I just did a backport of arpack in #5299.
If you are satisfied with that one, we could close this issue.

jakevdp (Member) commented Oct 23, 2015

Closing, as the issues discussed here are made obsolete by #3282 and #5299.

jakevdp closed this Oct 23, 2015
