Skip to content

Commit

Permalink
Fixes safe handling of small singular values in svds. (#11829)
Browse files Browse the repository at this point in the history
* Fixes safe handling of small singular values in svds.

The way to safeguard the computation of singular vectors in the presence
of small singular values currently adopted carries substantial risks of
discarding valid (and valuable) singular values and vectors.

This commit implements alternative way, adopted by scikit-learn's and
raleigh's PCA and truncated SVD solvers, whereby all singular values
(no matter how small) and singular vectors are computed in a stable and
accurate manner.

* BUG: Fixes safe handling of small singular values in scipy.sparse.linalg.svds.

The way to safeguard the computation of singular vectors in the presence
of small singular values currently adopted carries substantial risks of
discarding valid (and valuable) singular values and vectors.

This commit implements alternative way, adopted by scikit-learn's and
raleigh's PCA and truncated SVD solvers, whereby all singular values
(no matter how small) and singular vectors are computed in a stable and
accurate manner.

* added missing partial return handling

* BUG: fixed two bugs in test_arpack.py

* BUG: fixed a bug in test_svd_LM_ones_matrix()

* BUG: another attempt to fix a bug in test_svd_LM_ones_matrix()

* BUG: fixed E261 in arpack.py

* Implemented code optimizations suggested by Olivier Gauthe.

Had to remove 'const' in definitions of xx, vxx and vrr in
spatial/ckdtree.pyx (would not build otherwise).

Added myself to THANKS.txt.

* restored spatial/ckdtree.pyx (spatial/tests/test_kdtree.py failed)

* restored THANKS.txt to avoid conflicts

* corrected the order of singular values and vectors

* updated _svds.py

* added missing imports in _svds.py

* added missing imports in _sdvs.py

* reverted the change in eigensolvers tolerance

* trying to fix this PR mess

* added missing import svd

* added forgotten definition of transpose in _svds.py

* corrected test_svd_LM_ones_matrix and test_svd_return_singular_vectors

* corrected test_svd_LM_ones_matrix

* covered the case return_singular_vectors=True in svds

* corrected a small mistake in _svds.py

* corrected the case return_singular_vectors=True in svds

* added a test that would fail on master but should work on fix-svds

* made small amendments to eliminate checks errors

* restored accidentally deleted line in test_small_sigma

* test_small_sigma to skip propack if matrix is complex

* simplified post-processing of singular vectors

* STY: sparse.linalg: PEP8 newline at end of file

[skip ci]

* renamed unused eigenval variables

* Apply suggestions from code review

Co-authored-by: Pamphile Roy <roy.pamphile@gmail.com>

* error fixed

* swap the ij order as proposed in review

* docstring update and error fix

adding the reference https://bit.ly/3LprcUa

* lint fix

* added a link to the PR in the test

* changes requested by reviewer

* added unit test sparse small singular values

* temp remove complex dtypes

* filtering out some unrelated failures

* type fix

* typo fix

* multiple edits

* error fix

* disable new tests temporary

* remove back which="SM"

* trying to test vectors

* drop the tol to 1e-6 for vectors

* typo fix, comment out singular vector check temporary

* increase maxiter for lobpcg to converge

* LOBPCG needs larger atol and rtol to pass

* remove test for unsupported dtypes in ARPACK

* docsting updated with info input dtypes

* the new test goes back

* np.abs(s) in sorting singular values

* Revert "np.abs(s) in sorting singular values"

This reverts commit 87bbc40.

* run qr on eigenvectors before RR

* qr mistake fixed

* another qr error fixed

* lint fixes

* new _check_svds_n testing

* ty[ps fixed

* typo fixed

* error fix

* errors fixed

* error fixed

* tol adjustments

* lint fix

* remove ARPACK from check_svd

ARPACK often misses nearly multiple singular values

* lint fix + trying PROPACK

* PROPACK removed from tests as unrelated to PR

* lint fix

* test overwrite_a=True in svd

* another test added

#11406

* errors fixed

* tol adjustment + ignore warning

* to adjust

* tol adjust

* comment out 1 failing assert

* atol and rtol adjusted to pass np.float32

* comment in a modified assert

#11829 (comment)

* trying to fix test_small_sigma_sparse

* implemented singular vectors refining necessary for very small singular values

* minor edit just to run the tests

* fixed most test failures

* fixed res_tol and final RR procedure in _svds.py

* fixed a bug in computing singular vectors as eigenvectors of X_XH

* added missing import randn

* dropped singular vectors refinement as arpack and lobpcg fail to deal with X_XH

* 2 lint fixes

* Update scipy/sparse/linalg/_eigen/_svds.py

Co-authored-by: Pamphile Roy <roy.pamphile@gmail.com>

* unused removed as suggested + formatting in 1 line

* removed no longer used functions

* extra empty line removed

* run extra qr for lobpcg only + comment

    # solver lobpcg does not guarantee exactly orthonormal eigenvectors
    if solver == 'lobpcg':
        eigvec, _ = np.linalg.qr(eigvec)

* move qr to the right place as proposed

Co-authored-by: Evgueni Ovtchinnikov <eeo@CCPPETMR>
Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
Co-authored-by: Andrew Knyazev <andrew.knyazev@ucdenver.edu>
Co-authored-by: Pamphile Roy <roy.pamphile@gmail.com>
Co-authored-by: lobpcg <42650045+lobpcg@users.noreply.github.com>
  • Loading branch information
6 people committed May 30, 2022
1 parent e9b29a9 commit 01621e0
Show file tree
Hide file tree
Showing 2 changed files with 194 additions and 86 deletions.
119 changes: 44 additions & 75 deletions scipy/sparse/linalg/_eigen/_svds.py
Expand Up @@ -7,39 +7,12 @@
from scipy.sparse.linalg._interface import LinearOperator, aslinearoperator
from scipy.sparse.linalg._eigen.lobpcg import lobpcg # type: ignore[no-redef]
from scipy.sparse.linalg._svdp import _svdp
from scipy.linalg import svd

arpack_int = _arpack.timing.nbx.dtype
__all__ = ['svds']


def _augmented_orthonormal_cols(x, k, random_state):
# extract the shape of the x array
n, m = x.shape
# create the expanded array and copy x into it
y = np.empty((n, m+k), dtype=x.dtype)
y[:, :m] = x
# do some modified gram schmidt to add k random orthonormal vectors
for i in range(k):
# sample a random initial vector
v = random_state.standard_normal(size=n)
if np.iscomplexobj(x):
v = v + 1j*random_state.standard_normal(size=n)
# subtract projections onto the existing unit length vectors
for j in range(m+i):
u = y[:, j]
v -= (np.dot(v, u.conj()) / np.dot(u, u.conj())) * u
# normalize v
v /= np.sqrt(np.dot(v, v.conj()))
# add v into the output array
y[:, m+i] = v
# return the expanded array
return y


def _augmented_orthonormal_rows(x, k, random_state):
return _augmented_orthonormal_cols(x.T, k, random_state).T


def _herm(x):
return x.T.conj()

Expand Down Expand Up @@ -137,8 +110,8 @@ def svds(A, k=6, ncv=None, tol=0, which='LM', v0=None,
Parameters
----------
A : sparse matrix or LinearOperator
Matrix to decompose.
A : ndarray, sparse matrix, or LinearOperator
Matrix to decompose of a floating point numeric dtype.
k : int, default: 6
Number of singular values and singular vectors to compute.
Must satisfy ``1 <= k <= kmax``, where ``kmax=min(M, N)`` for
Expand Down Expand Up @@ -213,7 +186,17 @@ def svds(A, k=6, ncv=None, tol=0, which='LM', v0=None,
-----
This is a naive implementation using ARPACK or LOBPCG as an eigensolver
on ``A.conj().T @ A`` or ``A @ A.conj().T``, depending on which one is more
efficient.
efficient, followed by the Rayleigh-Ritz method as postprocessing; see
https://w.wiki/4zms
Alternatively, the PROPACK solver can be called. ``form="array"``
Choices of the input matrix ``A`` numeric dtype may be limited.
Only ``solver="lobpcg"`` supports all floating point dtypes
real: 'np.single', 'np.double', 'np.longdouble' and
complex: 'np.csingle', 'np.cdouble', 'np.clongdouble'.
The ``solver="arpack"`` supports only
'np.single', 'np.double', and 'np.cdouble'.
Examples
--------
Expand Down Expand Up @@ -274,11 +257,13 @@ def svds(A, k=6, ncv=None, tol=0, which='LM', v0=None,
X_matmat = A.matmat
XH_dot = A.rmatvec
XH_mat = A.rmatmat
transpose = False
else:
X_dot = A.rmatvec
X_matmat = A.rmatmat
XH_dot = A.matvec
XH_mat = A.matmat
transpose = True

dtype = getattr(A, 'dtype', None)
if dtype is None:
Expand Down Expand Up @@ -306,8 +291,10 @@ def matmat_XH_X(x):
else:
X = random_state.uniform(size=(min(A.shape), k))

eigvals, eigvec = lobpcg(XH_X, X, tol=tol ** 2, maxiter=maxiter,
largest=largest, )
_, eigvec = lobpcg(XH_X, X, tol=tol ** 2, maxiter=maxiter,
largest=largest)
# lobpcg does not guarantee exactly orthonormal eigenvectors
eigvec, _ = np.linalg.qr(eigvec)

elif solver == 'propack':
jobu = return_singular_vectors in {True, 'u'}
Expand Down Expand Up @@ -338,49 +325,31 @@ def matmat_XH_X(x):
elif solver == 'arpack' or solver is None:
if v0 is None and not rs_was_None:
v0 = random_state.uniform(size=(min(A.shape),))
eigvals, eigvec = eigsh(XH_X, k=k, tol=tol ** 2, maxiter=maxiter,
ncv=ncv, which=which, v0=v0)

# Gramian matrices have real non-negative eigenvalues.
eigvals = np.maximum(eigvals.real, 0)

# Use the sophisticated detection of small eigenvalues from pinvh.
t = eigvec.dtype.char.lower()
factor = {'f': 1E3, 'd': 1E6}
cond = factor[t] * np.finfo(t).eps
cutoff = cond * np.max(eigvals)

# Get a mask indicating which eigenpairs are not degenerately tiny,
# and create the re-ordered array of thresholded singular values.
above_cutoff = (eigvals > cutoff)
nlarge = above_cutoff.sum()
nsmall = k - nlarge
slarge = np.sqrt(eigvals[above_cutoff])
s = np.zeros_like(eigvals)
s[:nlarge] = slarge
if not return_singular_vectors:
return np.sort(s)
_, eigvec = eigsh(XH_X, k=k, tol=tol ** 2, maxiter=maxiter,
ncv=ncv, which=which, v0=v0)

if n > m:
vlarge = eigvec[:, above_cutoff]
ularge = (X_matmat(vlarge) / slarge
if return_singular_vectors != 'vh' else None)
vhlarge = _herm(vlarge)
Av = X_matmat(eigvec)
if not return_singular_vectors:
s = svd(Av, compute_uv=False, overwrite_a=True)
return s[::-1]

# compute the left singular vectors of X and update the right ones
# accordingly
u, s, vh = svd(Av, full_matrices=False, overwrite_a=True)
u = u[:, ::-1]
s = s[::-1]
vh = vh[::-1]

jobu = return_singular_vectors in {True, 'u'}
jobv = return_singular_vectors in {True, 'vh'}

if transpose:
u_tmp = eigvec @ _herm(vh) if jobu else None
vh = _herm(u) if jobv else None
u = u_tmp
else:
ularge = eigvec[:, above_cutoff]
vhlarge = (_herm(X_matmat(ularge) / slarge)
if return_singular_vectors != 'u' else None)

u = (_augmented_orthonormal_cols(ularge, nsmall, random_state)
if ularge is not None else None)
vh = (_augmented_orthonormal_rows(vhlarge, nsmall, random_state)
if vhlarge is not None else None)

indexes_sorted = np.argsort(s)
s = s[indexes_sorted]
if u is not None:
u = u[:, indexes_sorted]
if vh is not None:
vh = vh[indexes_sorted]
if not jobu:
u = None
vh = vh @ _herm(eigvec) if jobv else None

return u, s, vh

0 comments on commit 01621e0

Please sign in to comment.