Fixes safe handling of small singular values in svds. #11829
Conversation
The currently adopted way of safeguarding the computation of singular vectors in the presence of small singular values carries a substantial risk of discarding valid (and valuable) singular values and vectors. This commit implements an alternative approach, adopted by scikit-learn's and raleigh's PCA and truncated SVD solvers, whereby all singular values (no matter how small) and the corresponding singular vectors are computed in a stable and accurate manner.
@mdhaber Here is a good use-case for
Selecting a tolerance is a nontrivial task for the users, so it should have a sensible default value. That often depends on the infinity norm of the matrix, but in this case it is a bit more involved. I will try to check the issue a bit later.
@ilayn Do you have time to review this sufficiently before we branch in ~6 days?
Hello, I have no opinion on the use of svd(AV), however I would be very happy with the removal of the cutoff (see #12629). A simple optimization can be made when
The update of
To write it completely while I'm on it: a solution could be from Line 1874
Had to remove 'const' in the definitions of xx, vxx and vrr in spatial/ckdtree.pyx (it would not build otherwise). Added myself to THANKS.txt.
@Gauthe: thanks a lot, implemented your suggestions!
I am a bit puzzled why you kept the increasing order convention. Note that you may completely remove the functions
@Gauthe:
@tylerjereddy I think we will need to bump this again
@evgueni-ovtchinnikov the codes here and at #16240 are basically identical now. If you could please update the summary here similarly to that in #16240, my new PR will become redundant and could be closed.
@lobpcg will do later today
@evgueni-ovtchinnikov the tentative deadlines are May 29/2022: rc1. Apparently delayed by a few days. If we miss this release, I can no longer help with this PR - life is too short.
@lobpcg I was waiting for your comments on the updated summary. If you are ok with it, then we are done, are we not? Should I then ask ilayn et al. for approval?
I had not noticed the update, sorry. Let me have a look. Yes, please ask @ilayn. With the new summary it should be clearer what this PR does.
@evgueni-ovtchinnikov the summary should follow the format that I use in #16240, since it is automatically processed. Since the unit tests here and in #16240 are the same, you could repeat that there are "new unit tests specifically for very small singular values that fail on the main; see https://github.com/scipy/scipy/runs/6538373727?check_suite_focus=true", which specifically demonstrates that this is a bug-fix PR rather than an enhancement. You can put most of the current text in the Additional information section, but maybe you want to cut it and refer to https://en.wikipedia.org/wiki/Rayleigh%E2%80%93Ritz_method#Using_the_normal_matrix instead. Please add the following info in the section "What does this implement/fix?"
@lobpcg thanks for reminding me about the format - the last time I read about it was two years ago, when this PR was created. The summary is updated.
I had a look at it. The PR rationale seems clearer to me now. I am somewhat puzzled why the QR is needed on line 329. I had a look at
@lobpcg could you add a comment explaining why you added the QR, please? My guess is that in lobpcg the Rayleigh-Ritz procedure uses non-orthogonal basis vectors in the trial subspace, which may cause the eigenvectors' deviation from orthogonality - could that be the reason?
```python
# solver lobpcg does not guarantee exactly orthonormal eigenvectors
if solver == 'lobpcg':
    eigvec, _ = np.linalg.qr(eigvec)
```
@ogauthe great point, thanks! It was my feeling too that only lobpcg, not arpack, may output not fully orthonormal eigenvectors and thus require this extra QR for all unit tests to pass. I have made the corresponding change in the code and keep my fingers crossed that all tests will still be green. Could you perhaps perform a formal review of this PR? I hope that it might help to get it accepted.
@evgueni-ovtchinnikov The possible inexact orthogonality in lobpcg comes from the fact that the RR matrices are computed implicitly to avoid extra matmuls during iterations, and the postprocessing that would run one final exact RR and compute exact residuals, as @ogauthe notices above, is still not implemented. I have edited the code to reflect this. I have made #16319 as a reminder to have this finally done in lobpcg internally one day.
If the QR only applies to the lobpcg case then I think it would be more clear to put this after line 295.
On balance, I'm going to squash merge this in before branching, since it looks like domain experts think it merits merging, the CI is passing, and there are regression tests added that claim to fix a bug. The patch diff coverage is also pretty solid (though this is, of course, not a guarantee of correctness). That said, the discussion was extremely hard to follow and there seem to be some concerns from @ilayn, which worries me a little. It looks like a unit test has also been temporarily disabled, which makes me a bit nervous, but I understand that several additional/unrelated issues are confounding the purpose of this PR. I guess we'll see how this fares during the release candidate stage of testing. I'm prepared to deal with the burden of, e.g., an additional release candidate if this ultimately causes too many issues, but hopefully it will work out. I did notice some frustration in the discussion above, and I do think we want this kind of deep expertise engaged in our specialized linalg codes, so hopefully future adjustments will be smoother for all involved.
I don't think this PR does the right thing at all, but I don't have any capacity to review for a while because of my work, so I'll leave it to the sparse residents. Just to put it on record: it is concerning that the sparse library is becoming dense linalg with a mild sparse flavor, and none of the tools are effectively tested with sparse problems.
@ilayn It is difficult, if at all possible, to effectively test with sparse problems in unit tests. It would surely benefit the community if someone wrote solid benchmarking for the sparse library, including the eigensolvers and svds, and integrated it into the scipy documentation. Creating such an issue could be a good first step in this direction.
@tylerjereddy thanks for the merge! There are a few small issues with svds, e.g., #15152, that this merge now opens an opportunity to quickly fix.
No, they are not difficult. As we discussed above, they can be conveniently added to the test suite. We just don't do it, and I still don't see a valid reason why we don't. A very large sparse matrix with small nnz is not a problem to incorporate. Most sparse features are tested with 3x3 examples. That is not something we can continue if we want this library to be relevant to sparse users. Otherwise it will share the same fate that parts of scipy.signal are having.
@ilayn it is difficult for me to write it from scratch. Since it is easy for you, maybe you could add one example for everybody to follow? E.g., I would be glad to reuse such a unit test as a sample, if a working unit test were already merged.
There is already one I've put above as a test case showing why this PR is not working as intended. We should probably first get you to build SciPy locally, so that you can test things instead of relying on CI/CD.
@ilayn to my recollection, your example was not a unit test, so unfortunately I could not just add it to the set of existing unit tests. It would be great if you opened a PR specifically to add at least one unit test that you like, for any of the sparse solvers (svds, eigenvalues or linear systems), that could later be used as a sample for other solvers after it gets merged.
Reference issue
Allows closing #12629
Partially closes #11406
What does this implement/fix?
The way `svds` computes a few singular values and vectors of a matrix A is to compute eigenvectors of either A' A or A A', whichever is smaller in size. Assuming some eigenvalues λ_i and corresponding eigenvectors v_i of A' A are computed, the original implementation of `svds` computed singular values and vectors as follows. Observing that the computed eigenvectors v_i are the right singular vectors of A, and the square roots of the corresponding eigenvalues λ_i are the corresponding singular values σ_i, the corresponding left singular vectors u_i were computed by dividing A v_i by σ_i, so that u_i and v_i satisfy A v_i = σ_i u_i and A' u_i = σ_i⁻¹ A' A v_i = σ_i v_i.

This way was clearly unsafe: computed λ_i can be negative because of round-off errors, σ_i can be zero if A is not of full rank, and if computed σ_i are very small then u_i may lose orthogonality. In an attempt to fix this, at some point the current approach was implemented, whereby eigenvalues λ_i below a certain cut-off value were replaced by zeros, and the corresponding left singular vectors by random vectors orthogonal to the computed singular vectors.

Such an approach does make sense if one employs `svds` for computing a low-rank approximation of A. Assuming that the singular values are in descending order, the norm of the difference between A and the sum of σ_i u_i v'_i, i = 1, ..., k, is equal to σ_{k+1}. Therefore, if σ_{k+1} falls below the required accuracy of approximation, the values σ_i, i = k + 1, ..., can be zeroed, and the respective singular vectors no longer play any role. However, in this kind of `svds` application, it is the user who is to decide how small σ_{k+1} should be for this and further singular values and vectors to become of no importance, and this is what issue #12629 advocates.

But the real disaster may happen if the user is after a few smallest singular values or needs to find the nullspace of A (the subspace of vectors v with A v = 0), as has been discovered by #11406. The algorithm employed by `svds` for computing singular vectors corresponding to singular values below the cut-off requires the availability of all singular vectors corresponding to singular values above the cut-off. The reason for the failure reported by #11406 is that, in the case of small singular values computation, only a few such singular vectors are available, and computing the rest of them would essentially require a full SVD.
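To illustrate the original, unsafe postprocessing described above, here is a minimal NumPy sketch (not the actual scipy code; all variable names are illustrative). It forms left singular vectors as u_i = A v_i / σ_i, which works for a well-conditioned matrix but breaks down when σ_i is zero or tiny:

```python
import numpy as np

# Sketch of the original svds postprocessing: eigenvectors v_i of A'A are
# the right singular vectors; left singular vectors are obtained by the
# unsafe division u_i = A v_i / sigma_i.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
k = 3

# eigen-decomposition of the (small) normal matrix A'A
lam, V = np.linalg.eigh(A.T @ A)          # ascending eigenvalues
lam, V = lam[::-1][:k], V[:, ::-1][:, :k] # take the k largest eigenpairs

sigma = np.sqrt(np.maximum(lam, 0))  # computed lam may be slightly negative
U = (A @ V) / sigma                  # unsafe: fails or loses accuracy if sigma ~ 0

# for this well-conditioned A the result matches the top singular values
print(np.allclose(sigma, np.linalg.svd(A, compute_uv=False)[:k]))
```

For a rank-deficient or nearly rank-deficient A, the division above produces infinities or badly non-orthogonal columns of U, which is precisely the failure mode this PR addresses.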
This PR suggests an alternative fix for the small singular values problem, adopted by some other software implementing truncated SVD and PCA (e.g. `scikit-learn` and `raleigh`). Assuming A is 'tall' (the number of rows not less than the number of columns) and having computed V = [v_1, ..., v_k], `svds` should compute the SVD of A V:

A V = U Σ Q',

where U is the matrix whose columns are the left singular vectors of A V, the square matrix Σ is diagonal with the corresponding singular values of A V on the diagonal, and Q is the matrix whose columns are the corresponding right singular vectors of A V. After this, V is updated as V := V Q, and we end up with

A V = U Σ,

and hence V' A' A V = Σ U' U Σ = Σ² = Λ, i.e. the columns of V are eigenvectors of A' A.
Note that the singular values are guaranteed to be non-negative, singular vectors are orthonormal to machine accuracy, and zero (or very small) singular values are no longer a problem at all.
The suggested fix eliminates the need for the eigenvalue thresholding requested by #12629, allowing that issue to be closed.
It also fixes the problem discovered by #11406 for matrices with the number of rows not less than the number of columns.
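The postprocessing step described above can be sketched in a few lines of NumPy (an illustration of the idea, not the PR's actual code; names are hypothetical). Note that it remains well defined even when some singular values are exactly zero:

```python
import numpy as np

# Sketch of the PR's postprocessing for a tall A: given approximate
# eigenvectors V of A'A, take the SVD of the tall-skinny matrix A V
# and rotate V accordingly.
rng = np.random.default_rng(1)
n, m, k = 50, 20, 4
A = rng.standard_normal((n, m))
A[:, -1] = 0.0                       # make A rank deficient on purpose

_, V = np.linalg.eigh(A.T @ A)       # ascending eigenvalues
V = V[:, :k]                         # k smallest eigenpairs (incl. the nullspace)

U, sigma, Qh = np.linalg.svd(A @ V, full_matrices=False)  # A V = U diag(sigma) Q'
V = V @ Qh.T                         # update V := V Q

# sigma is non-negative by construction, U and V have orthonormal columns,
# and A V = U diag(sigma) holds even for (near) zero singular values
print(np.allclose(A @ V, U * sigma))
```

Here deliberately selecting the smallest eigenpairs, including the nullspace direction, shows that zero singular values cause no breakdown: no division by σ_i ever occurs.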
Additional information
Regarding the extra cost of the suggested post-processing step, one needs to bear in mind that both the arpack and lobpcg solvers employed by svds for computing eigenvectors of A' A or A A' involve the Rayleigh-Ritz procedure in subspaces of dimension not less than 2k, which takes O(k³) flops. Therefore, using these solvers only makes sense for k << min(n, m), where (n, m) = A.shape.
Assuming n > m >> k, the SVD of the n-by-k matrix A V involves two steps: a QR factorization of this matrix, and a full SVD of the k-by-k factor R. The first step takes fewer than (n + 1)k(k + 1) multiplications, which is much less than the 2nmk needed for applying A' A to k vector iterates, and negligible extra memory. The second step takes O(k³) flops, which is still less, and 5k² extra memory at worst (if the divide-and-conquer algorithm is used).
Thus, the cost of the suggested post-processing step is going to be a small fraction of the overall computational costs.
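The two-step SVD of a tall-skinny matrix mentioned above can be sketched as follows (an illustration with hypothetical names, not code from this PR):

```python
import numpy as np

# Two-step SVD of a tall n-by-k matrix B (standing in for A V):
# QR factorization first, then a dense SVD of the small k-by-k factor R.
rng = np.random.default_rng(2)
n, k = 1000, 5
B = rng.standard_normal((n, k))

Q1, R = np.linalg.qr(B)           # O(n k^2) work, reduced QR: Q1 is n-by-k
Ur, sigma, Qh = np.linalg.svd(R)  # O(k^3) work on the k-by-k factor
U = Q1 @ Ur                       # left singular vectors of B

# B = U diag(sigma) Q' is recovered from the two factorizations
print(np.allclose(B, (U * sigma) @ Qh))
```

The dominant cost is the QR step, which is linear in n, consistent with the flop counts quoted above.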