
Fixes safe handling of small singular values in svds. #11829

Merged
109 commits merged into scipy:main on May 30, 2022

Conversation

@evgueni-ovtchinnikov (Contributor) commented Apr 9, 2020

Reference issue

Allows closing #12629
Partially closes #11406

What does this implement/fix?

svds computes a few singular values and vectors of a matrix A by computing eigenvectors of either A' A or A A', whichever is smaller in size.

Assuming some eigenvalues λi and corresponding eigenvectors vi of A' A have been computed, the original implementation of svds computed singular values and vectors as follows. Observing that the computed eigenvectors vi are the right singular vectors of A, and that the square roots of the corresponding eigenvalues λi are the corresponding singular values σi, the corresponding left singular vectors ui were computed by dividing A vi by σi, so that ui and vi satisfy A vi = σi ui and A' ui = σi⁻¹ A' A vi = σi vi.
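For illustration, a minimal dense NumPy sketch of this original recovery step (hypothetical code, not the actual svds implementation, which works with a LinearOperator and the arpack/lobpcg solvers):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((8, 5))

    # eigenpairs of A' A (eigh returns eigenvalues in ascending order)
    lams, V = np.linalg.eigh(A.T @ A)

    # singular values sigma_i = sqrt(lambda_i); round-off can make small
    # lambda_i slightly negative, hence the clipping
    sigmas = np.sqrt(np.maximum(lams, 0))

    # the original recovery u_i = A v_i / sigma_i: unsafe, since it divides
    # by (near-)zero sigma_i when A is (nearly) rank-deficient
    U = (A @ V) / sigmas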

This way was clearly unsafe: the computed λi can be negative because of round-off errors, σi can be zero if A is not of full rank, and if the computed σi are very small then the ui may lose orthogonality. In an attempt to fix this, at some point the current approach was implemented, whereby eigenvalues λi below a certain cut-off value were replaced by zeros, and the corresponding left singular vectors by random vectors orthogonal to the computed singular vectors. Such an approach does make sense if one employs svds for computing a low-rank approximation of A: assuming that the singular values are in descending order, the norm of the difference between A and the sum of σi ui vi', i = 1, ..., k, is equal to σk+1, so if σk+1 falls below the required accuracy of approximation, the σi, i = k + 1, ..., can be zeroed, and the respective singular vectors no longer play any role. However, in this kind of svds application it is the user who should decide how small σk+1 must be for this and further singular values and vectors to become unimportant, and this is what issue #12629 advocates.

But a real disaster may happen if the user is after a few of the smallest singular values or needs to find the nullspace of A (the subspace of vectors v with A v = 0), as was discovered in #11406. The algorithm employed by svds for computing singular vectors corresponding to singular values below the cut-off requires the availability of all singular vectors corresponding to singular values above the cut-off. The reason for the failure reported in #11406 is that when small singular values are being computed, only a few such singular vectors are available, and computing the rest of them would essentially require a full SVD.

This PR suggests an alternative fix for the small-singular-values problem, adopted by some other software implementing truncated SVD and PCA (e.g. scikit-learn and raleigh).

Assuming A is 'tall' (the number of rows is not less than the number of columns) and V = [v1, ..., vk] has been computed, svds should compute the SVD of A V:

A V = U Σ Q',

where U is the matrix whose columns are the left singular vectors of A V, the square matrix Σ is diagonal with the corresponding singular values of A V on the diagonal, and Q is the matrix whose columns are the corresponding right singular vectors of A V. V is then updated as V := V Q, and we end up with

A V = U Σ

and hence V' A' A V = Σ U' U Σ = Σ² = Λ, i.e. the columns of V are eigenvectors of A' A.

Note that the singular values are now guaranteed to be non-negative, the singular vectors are orthonormal to machine accuracy, and zero (or very small) singular values are no longer a problem at all.
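A minimal NumPy sketch of this post-processing step (an illustration of the idea under the 'tall A' assumption; postprocess is a hypothetical name, not the actual svds code):

    import numpy as np

    def postprocess(A, V):
        # SVD of A V: A V = U Sigma Q'
        U, s, Qh = np.linalg.svd(A @ V, full_matrices=False)
        # update V := V Q, after which A V = U Sigma
        V = V @ Qh.conj().T
        return U, s, V

    rng = np.random.default_rng(1)
    A = rng.standard_normal((9, 6))
    A[:, -1] = 0.0                      # make A rank-deficient
    _, V = np.linalg.eigh(A.T @ A)      # eigenvalues in ascending order
    U, s, V = postprocess(A, V[:, :3])  # 3 smallest eigenpairs, one sigma = 0

    # even with a zero singular value, s is non-negative and the
    # singular vectors are orthonormal to machine accuracy
    assert (s >= 0).all()
    assert np.allclose(U.T @ U, np.eye(3))
    assert np.allclose(V.T @ V, np.eye(3))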

The suggested fix eliminates the need for the eigenvalue thresholding requested by #12629, allowing that issue to be closed.

It also fixes the problem discovered in #11406 for matrices with the number of rows not less than the number of columns.

Additional information

Regarding the extra cost of the suggested post-processing step, one needs to bear in mind that both the arpack and lobpcg solvers employed by svds for computing eigenvectors of A' A or A A' involve the Rayleigh-Ritz procedure in subspaces of dimension not less than 2k, which takes O(k³) flops. Therefore, using these solvers only makes sense for k << min(n, m), where (n, m) = A.shape.

Assuming n > m >> k, the SVD of the n-by-k matrix A V involves two steps: a QR factorization of this matrix, and a full SVD of the k-by-k factor R. The first step takes fewer than (n + 1)k(k + 1) multiplications, which is much less than the 2nmk multiplications needed for applying A' A to k vector iterates, and negligible extra memory. The second step takes O(k³) flops, which is less still, and at worst 5k² extra memory (if the divide-and-conquer algorithm is used).

Thus, the cost of the suggested post-processing step is going to be a small fraction of the overall computational costs.
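A small NumPy sketch of this two-step scheme (illustrative only; the sizes below are hypothetical):

    import numpy as np

    rng = np.random.default_rng(2)
    n, k = 10_000, 5
    AV = rng.standard_normal((n, k))      # stands in for the tall matrix A V

    # step 1: thin QR of the n-by-k matrix, < (n + 1)k(k + 1) multiplications
    Q1, R = np.linalg.qr(AV, mode='reduced')

    # step 2: full SVD of the small k-by-k factor R, O(k^3) flops
    Ur, s, Qh = np.linalg.svd(R)

    U = Q1 @ Ur                           # left singular vectors of A V
    assert np.allclose((U * s) @ Qh, AV)  # A V = U Sigma Q'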

evgueni-ovtchinnikov and others added 3 commits April 5, 2020 16:57
The currently adopted way to safeguard the computation of singular
vectors in the presence of small singular values carries substantial
risks of discarding valid (and valuable) singular values and vectors.

This commit implements an alternative way, adopted by scikit-learn's and
raleigh's PCA and truncated SVD solvers, whereby all singular values
(no matter how small) and singular vectors are computed in a stable and
accurate manner.
…alg.svds.

@miladsade96 added the maintenance and scipy.sparse.linalg labels Apr 9, 2020
@ilayn (Member) commented Apr 9, 2020

@mdhaber Here is a good use-case for ?gejsv. Thoughts?

@ilayn (Member) commented Apr 9, 2020

Selecting a tolerance is a nontrivial task for users, so it should have a sensible default value. That often depends on the infinity norm of the matrix, but in this case it is a bit more involved. I will try to check the issue a bit later.

@pv added this to the 1.5.0 milestone Apr 13, 2020
@tylerjereddy (Contributor) commented:

> I will try to check the issue a bit later.

@ilayn Do you have time to review this sufficiently before we branch in ~6 days?

@tylerjereddy modified the milestones: 1.5.0, 1.6.0 May 25, 2020
@ogauthe mentioned this pull request Jul 29, 2020
@ogauthe (Contributor) commented Jul 29, 2020

Hello,

I have no opinion on the use of svd(AV); however, I would be very happy with the removal of the cutoff (see #12629).

A simple optimization can be made when return_singular_vectors is False, by avoiding the computation of the singular vectors and the vh dot product: lines 1875-1882 could be replaced with

    u = X_matmat(eigvec)  # A V
    if not return_singular_vectors:
        s = svd(u, compute_uv=False)  # singular values only
        return s
    u, s, vh = svd(u, full_matrices=False)  # A V = U Sigma Q'
    vh = vh @ _herm(eigvec)  # (V Q)': updated right singular vectors

The update of vh can be further restricted to the cases where it is actually returned, depending on transpose and return_singular_vectors. I also kept the standard decreasing-order convention for singular values (see #12627). I can make a separate PR if this is more convenient than modifying this one.

@ogauthe (Contributor) commented Jul 29, 2020

To write it out completely while I'm at it: a solution, starting from line 1874, could be

    u = X_matmat(eigvec)
    if not return_singular_vectors:
        s = svd(u, compute_uv=False)
        return s

    # compute the right singular vectors of X and update the left ones accordingly
    u, s, vh = svd(u, full_matrices=False)
    return_u = (return_singular_vectors == 'u')
    return_vh = (return_singular_vectors == 'vh')
    if not transpose:
        if return_vh:
            u = None
        if return_u:
            vh = None
        else:
            vh = vh @ _herm(eigvec)
        return u, s, vh
    else:
        if return_u:
            u = eigvec @ _herm(vh)
            return u, s, None
        if return_vh:
            return None, s, _herm(u)
        u, vh = eigvec @ _herm(vh), _herm(u)
        return u, s, vh

Had to remove 'const' in definitions of xx, vxx and vrr in
spatial/ckdtree.pyx (would not build otherwise).

Added myself to THANKS.txt.
@evgueni-ovtchinnikov (Contributor, Author) commented:

@ogauthe: thanks a lot, implemented your suggestions!

@ogauthe (Contributor) commented Aug 19, 2020

I am a bit puzzled why you kept the increasing-order convention s[::-1] at line 1877 when singular vectors are not computed, but used the decreasing-order convention otherwise. I think you should remove the [::-1].

Note that you may completely remove the functions _augmented_orthonormal_cols and _augmented_orthonormal_rows; they are no longer used in the file.

@evgueni-ovtchinnikov (Contributor, Author) commented:

@ogauthe: svd returns singular values and vectors in descending order of the values; svds returns them in the opposite order.

@tylerjereddy (Contributor) commented:

@ilayn @pmla Is this close enough to your expertise to give a review before we branch in about two weeks?

@ilayn (Member) commented Nov 12, 2020

@tylerjereddy I think we will need to bump this again ☹️

@tylerjereddy modified the milestones: 1.6.0, 1.7.0 Nov 13, 2020
@lobpcg (Contributor) commented May 23, 2022

@evgueni-ovtchinnikov the codes here and at #16240 are basically identical now. If you could please update the summary here to be similar to that in #16240, my new PR will become redundant and could be closed.

@evgueni-ovtchinnikov (Contributor, Author) commented:

@lobpcg will do later today

@lobpcg (Contributor) commented May 26, 2022

@evgueni-ovtchinnikov the tentative deadlines are:

May 26, 2022: branch 1.9.x

May 29, 2022: rc1

Apparently delayed by a few days.

If we miss this release, I will no longer be able to help with this PR - life is too short.

@evgueni-ovtchinnikov (Contributor, Author) commented:

@lobpcg I was waiting for your comments on the updated summary

if you are ok with it, then we are done, are we not?

should I then ask ilayn et al for approval?

@lobpcg (Contributor) commented May 26, 2022

> @lobpcg I was waiting for your comments on the updated summary
>
> if you are ok with it, then we are done, are we not?
>
> should I then ask ilayn et al for approval?

I had not noticed the update, sorry. Let me have a look.

Yes, please ask @ilayn. With the new summary it should be clearer what this PR does.

@lobpcg (Contributor) commented May 26, 2022

@evgueni-ovtchinnikov the summary should follow the format that I use in #16240, since it is automatically processed.

Since the unit tests here and in #16240 are the same, you could repeat that "the new unit tests, specifically for very small singular values, fail on main; see https://github.com/scipy/scipy/runs/6538373727?check_suite_focus=true", which specifically demonstrates that this is a bug-fix rather than an enhancement PR.

You can put most of the current text in the Additional information section, but maybe you want to cut it and refer to https://en.wikipedia.org/wiki/Rayleigh%E2%80%93Ritz_method#Using_the_normal_matrix instead.

Please add the following info in the section "What does this implement/fix?":

  1. why it only partially closes #11406 ("scipy.sparse.linalg.svds (v1.4.1) on singular matrix does not give vh vectors in null space")
  2. the extra cost of postprocessing

@evgueni-ovtchinnikov (Contributor, Author) commented:

@lobpcg thanks for the reminder about the format - the last time I read about it was two years ago, when this PR was created.

The summary is updated.

@evgueni-ovtchinnikov (Contributor, Author) commented:

@ilayn @mdhaber @tupui @tylerjereddy @ogauthe @pv @lobpcg
The rationale for this PR has been re-defined as a bugfix for #11406 - could someone please have a look and formally review it to give it a chance to be merged before 1.9.x is released?

@ogauthe (Contributor) commented May 30, 2022

I had a look at it. The PR rationale seems clearer to me now.

I am somewhat puzzled why the QR is needed on line 329. I had a look at the `# Future work: Need to add Postprocessing here:` comment in lobpcg, and I understand it as meaning that the eigenvectors may not be fully orthogonal. Is this something specific to lobpcg? I expect the restarted Lanczos method from ARPACK to always return orthogonal eigenvectors, but I may be too optimistic. I think adding a comment explaining its use would be useful.

@evgueni-ovtchinnikov (Contributor, Author) commented:

@lobpcg could you add a comment explaining why you added the QR, please? My guess is that in lobpcg the Rayleigh-Ritz procedure uses non-orthogonal basis vectors in the trial subspace, which may cause the eigenvectors to deviate from orthogonality - could that be the reason?

    # solver lobpcg does not guarantee exactly orthonormal eigenvectors
    if solver == 'lobpcg':
        eigvec, _ = np.linalg.qr(eigvec)
@lobpcg (Contributor) commented May 30, 2022

> I had a look at it. The PR rationale seems clearer to me now.
>
> I am somewhat puzzled why the QR is needed on line 329. I had a look at the `# Future work: Need to add Postprocessing here:` comment in lobpcg, and I understand it as meaning that the eigenvectors may not be fully orthogonal. Is this something specific to lobpcg? I expect the restarted Lanczos method from ARPACK to always return orthogonal eigenvectors, but I may be too optimistic. I think adding a comment explaining its use would be useful.

@ogauthe great point, thanks! It was my feeling too that only lobpcg, not arpack, may output not fully orthonormal eigenvectors and thus require this extra QR for all unit tests to pass. I've made the corresponding change in the code and keep my fingers crossed that all tests will still be green.

Could you please perhaps perform a formal review of this PR? It might help to have it accepted.

@lobpcg (Contributor) commented May 30, 2022

> @lobpcg could you add a comment explaining why you added the QR, please? My guess is that in lobpcg the Rayleigh-Ritz procedure uses non-orthogonal basis vectors in the trial subspace, which may cause the eigenvectors to deviate from orthogonality - could that be the reason?

@evgueni-ovtchinnikov The possible inexact orthogonality in lobpcg comes from the fact that the RR matrices are computed implicitly, to avoid extra matmuls during iterations, and that the postprocessing that would run one final exact RR and compute exact residuals is still not implemented, as @ogauthe notices above. I have edited the code to reflect this.

I have made #16319 as a reminder to have this finally done in lobpcg internally one day.
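For reference, a tiny standalone sketch of the orthonormality issue and the QR remedy used in this PR (a hypothetical example; eigvec stands in for the lobpcg output):

    import numpy as np

    rng = np.random.default_rng(3)
    k = 4
    # simulate lobpcg output: orthonormal vectors with a small drift
    eigvec = np.linalg.qr(rng.standard_normal((100, k)))[0]
    eigvec += 1e-8 * rng.standard_normal(eigvec.shape)

    # measure the deviation from orthonormality...
    dev = np.linalg.norm(eigvec.T @ eigvec - np.eye(k))
    print(f"deviation before QR: {dev:.1e}")

    # ...and restore it with a thin QR, as svds now does for lobpcg
    eigvec, _ = np.linalg.qr(eigvec)
    assert np.linalg.norm(eigvec.T @ eigvec - np.eye(k)) < 1e-12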

@ogauthe (Contributor) left a review comment:

If the QR only applies to the lobpcg case, then I think it would be clearer to put it after line 295.

@tylerjereddy (Contributor) commented:

On balance, I'm going to squash-merge this before branching, since it looks like domain experts think it merits merging, the CI is passing, and there are regression tests added that claim to fix a bug. The patch diff coverage is also pretty solid (though that is, of course, not a guarantee of correctness).

That said, the discussion was extremely hard to follow, and there seem to be some concerns from @ilayn, which worries me a little. It looks like a unit test has also been temporarily disabled, which makes me a bit nervous, but I understand that several additional/unrelated issues are confounding the purpose of this PR.

I guess we'll see how this fares during the release candidate stage of testing. I'm prepared to deal with the burden of, e.g., an additional release candidate if this ultimately causes too many issues, but hopefully it will work out. I did notice some frustration in the discussion above, and I do think we want this kind of deep expertise engaged in our specialized linalg codes, so hopefully future adjustments will be smoother for all involved.

@tylerjereddy merged commit 01621e0 into scipy:main May 30, 2022
@ilayn (Member) commented May 30, 2022

I don't think this PR does the right thing at all, but I don't have any capacity to review for a while because of my work, so I'll leave it to the sparse residents. Just to put it on record: it is concerning that the sparse library is becoming dense linalg with a mild sparse flavor, and that none of the tools are effectively tested with sparse problems.

@lobpcg (Contributor) commented May 30, 2022

> I don't think this PR does the right thing at all, but I don't have any capacity to review for a while because of my work, so I'll leave it to the sparse residents. Just to put it on record: it is concerning that the sparse library is becoming dense linalg with a mild sparse flavor, and that none of the tools are effectively tested with sparse problems.

@ilayn Unit tests are difficult, if at all possible, to run effectively with truly sparse problems. It would surely benefit the community if someone wrote solid benchmarking for the sparse library, including the eigensolvers and svds, and integrated it into the scipy documentation. Creating such an issue could be a good first step in this direction.

@lobpcg (Contributor) commented May 30, 2022

@tylerjereddy thanks for the merge! There are a few small issues with svds, e.g. #15152, that this merge now opens an opportunity to fix quickly.

@ilayn (Member) commented May 30, 2022

No, they are not difficult. As we discussed above, they can be conveniently added to the test suite. We just don't do it, and I still don't see a valid reason why we don't. A very large sparse matrix with small nnz is not a problem to incorporate. Most sparse features are tested with 3x3 examples. That is not something we can continue if we want this library to be relevant to sparse users; otherwise it will share the same fate that parts of scipy.signal are suffering.

@lobpcg (Contributor) commented May 30, 2022

@ilayn it is difficult for me to write it from scratch. Since it is easy for you, maybe you could add one example for everybody to follow? E.g., I would be glad to reuse such a unit test as a sample, if a working one were already merged.

@ilayn (Member) commented Jun 2, 2022

There is already one I've put above, as a test case showing why this PR is not working as intended.

We should probably first get you to build SciPy locally, so that you can test things instead of relying on CI/CD.

@lobpcg (Contributor) commented Jun 4, 2022

> There is already one I've put above, as a test case showing why this PR is not working as intended.

@ilayn to my recollection, your example was not a unit test, so unfortunately I could not just add it to the set of existing unit tests. It would be great if you opened a PR specifically to add at least one unit test that you like, for any of the sparse solvers (svds, eigenvalues, or linear systems), that could later be used as a sample for other solvers after it gets merged.
