[MRG] Bugfix for not-yet-confirmed issue #12863: arpack returns singular values in ascending order, while the opposite was assumed in sklearn #12898
Conversation
Please add a test.
I'm working on it!
I have added a test checking that the labels implied by the singular values computed by the 'arpack' and 'randomized' methods are equal (the singular values are not available on the BaseSpectral objects, so the order of the singular values cannot be compared directly). I also verified the test itself: it passes with my update to BaseSpectral._svd and fails without it, so the test matrix seems to be a good choice to drive the testing. However, one CI job fails: the one related to OSX. Interestingly, a different svd computation fails there; the numbers do not match the expectations in the 12th digit. To my best understanding this has nothing to do with my bugfix; the mismatch comes from an svd call made directly to scipy at line 163 of decomposition/truncated_svd.py. I'm wondering how to proceed. Shall I perhaps decrease the required precision in that test to 11 digits?
I've not looked into the failure, but I think it must be due to your change, even if your change is right. |
Yes, I think it should be okay to increase the tolerance. |
(The rpca in that test code is a bit weird... both are using 'arpack', not 'randomized') |
I did some experimentation, let me summarize the results:
E AssertionError:
My conclusion is that "test_singular_values" needs to be fixed; however, I find changing the precision to 0 digits too heavy an intervention, although this difference in precision is what I would expect when a randomized method is involved. Would it make sense to play around with the parameters (like the size of the sample matrix) and try to find a combination which ensures at least 3-4 matching digits?
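For context on what "digits of precision" means in these tests, here is a small hedged illustration of numpy's decimal-based assertion (the arrays below are made up for illustration, not the actual test data):

```python
import numpy as np

a = np.array([1.0, 2.0])
b = a + 4e-12          # differs from a in roughly the 12th decimal place

# assert_array_almost_equal(x, y, decimal=d) checks
#   abs(x - y) < 1.5 * 10 ** (-d)
np.testing.assert_array_almost_equal(a, b, decimal=11)   # passes

failed_at_12 = False
try:
    np.testing.assert_array_almost_equal(a, b, decimal=12)
except AssertionError:
    failed_at_12 = True
print(failed_at_12)   # True: the arrays differ at 12 decimal digits
```

So dropping from 12 to 11 required digits only widens the allowed absolute difference from 1.5e-12 to 1.5e-11.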
Are you sure it's the change to the test, not the change to the implementation, that resulted in the failure? |
I suspect that when that test comparing rpca to apca was adapted from test_pca, it was seen as inappropriate to compare apca and rpca, and somehow randomized was changed to arpack without thinking it through... not sure though. |
So I don't think we should be testing randomized there, at least not in this PR. |
I did some further checks, let me summarize briefly:
Now, I have the bugfix in the code, my test is added, the required precision of the failing (arpack vs arpack) test is decreased to 11 digits, and everything seems to work: all tests pass. Maybe it would be worth reporting the failure of the arpack-vs-arpack comparison in the truncated SVD test (they should match to any precision) as a separate issue. Any comments are welcome!
Thanks. I've confirmed the test failure on master.
Please add an entry to the change log at doc/whats_new/v0.21.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:
The arpack state issue might be worth raising separately.
Done! Thank you for the guidance! I'll raise the arpack state issue and also try to take it on soon.
Hi @gykovacs, if you are still interested in working on that, do you mind fixing conflicts? Thanks for your work and your patience!
Yeah, let me look into it! |
Reference Issues/PRs
Fixes not-yet-confirmed issue #12863
What does this implement/fix? Explain your changes.
In the sklearn code it was assumed that the scipy wrapper for ARPACK singular value decomposition returns singular values (and corresponding vectors) in descending order. This is not the case: the ARPACK documentation (https://www.caam.rice.edu/software/ARPACK/UG/node136.html#SECTION001210000000000000000) clearly states that the singular values are returned in ascending order.
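A minimal sketch (not part of the PR itself) illustrating the ordering discrepancy with scipy's `svds`; the matrix `A` and the reordering lines below are illustrative assumptions, not the actual sklearn code:

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(42)
A = rng.standard_normal((20, 10))

# ARPACK-based svds computes the k largest singular values, but at the
# time of this PR it returned them in ascending order (smallest first).
u, s, vt = svds(A, k=4)

# np.linalg.svd uses the descending convention assumed in sklearn.
s_dense = np.linalg.svd(A, compute_uv=False)

# Reordering the arpack output (the values AND the matching vectors)
# to descending order recovers the expected convention; argsort is
# used here instead of a plain flip to stay robust to the output order.
order = np.argsort(s)[::-1]
s_desc, u_desc, vt_desc = s[order], u[:, order], vt[order]

print(np.allclose(s_desc, s_dense[:4]))   # prints True
```

Note that forgetting to reorder `u` and `vt` along with `s` would silently pair singular values with the wrong singular vectors.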
The bug manifests in differing results depending on the solver: the 'arpack' and 'randomized' solvers give different results for the same input.
This bugfix makes the arpack call return the same result as the randomized-SVD-based solution.
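To illustrate the intended outcome, here is a hedged, self-contained sketch (not scikit-learn's implementation) comparing the reordered ARPACK result against a basic randomized SVD; the helper `rand_svd` and the constructed test matrix are assumptions made for this example:

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)

# A test matrix with a known, quickly decaying spectrum, so that both
# solvers can recover the leading singular values to high accuracy.
m, n, k = 60, 30, 5
U0, _ = np.linalg.qr(rng.standard_normal((m, n)))
V0, _ = np.linalg.qr(rng.standard_normal((n, n)))
s_true = 2.0 ** -np.arange(n)          # 1, 0.5, 0.25, ...
A = U0 @ np.diag(s_true) @ V0.T

def rand_svd(A, k, n_oversamples=10, n_iter=2):
    """Basic randomized SVD sketch (hypothetical helper, Halko-et-al. style)."""
    Y = A @ rng.standard_normal((A.shape[1], k + n_oversamples))
    for _ in range(n_iter):            # power iterations sharpen the basis
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)             # orthonormal basis for the range of A
    Ub, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

# ARPACK path, reordered to the descending convention:
_, s_arpack, _ = svds(A, k=k)
s_arpack = np.sort(s_arpack)[::-1]

# Randomized path (descending by construction, via np.linalg.svd):
_, s_rand, _ = rand_svd(A, k)

print(np.allclose(s_arpack, s_rand, rtol=1e-6))
```

With the reordering in place both paths agree on the leading singular values, which is the behavior the fix restores inside sklearn.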
Any other comments?