AMG spectral clustering fails after just a few iterations of LOBPCG with "leading minor of the array is not positive definite" #13393
Comments
Fixed in PR #12316.
When will this make it into an official release?
Hopefully in the next release, but the patch awaits another review.
I get a similar error using:

```python
import numpy as np
from sklearn.manifold.spectral_embedding_ import SpectralEmbedding
from sklearn.datasets import make_blobs

seed = 1
centers = np.array([
    [0.0, 5.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 5.0, 1.0],
])
n_samples = 1000
n_clusters, n_features = centers.shape
S, true_labels = make_blobs(n_samples=n_samples, centers=centers,
                            cluster_std=1., random_state=42)

se_lobpcg = SpectralEmbedding(n_components=2, affinity="nearest_neighbors",
                              eigen_solver="lobpcg", n_neighbors=5,
                              random_state=np.random.RandomState(seed))
embed_lobpcg = se_lobpcg.fit_transform(S)
```
My latest version, scipy/scipy#10621, fixes this stability issue - just tested it myself. @amueller Could you please also give it a try?
It's not merged, so I would need to build scipy, right? Might try that later ;)
No build needed - the lobpcg code is pure Python, so you can just copy/paste it in place of the old version
and we have the code in external which can be imported from fixes |
or at least we should update it if there is something new |
scipy/scipy#10621 is tagged for the upcoming 1.4.0 release. Yes, lobpcg is in external and is already updated in #12319 (comment) to be in sync with scipy 1.3.0.
I merged the AMG fix in master and, as expected, it fixed the original issue but not #13393 (comment), which will be tackled by scipy/scipy#10621 and maybe a backport in scikit-learn.
Description
AMG spectral clustering fails after just a few iterations of LOBPCG; see also #10715 and #11965.
I have suggested a fix in #6489 (comment), which was tested by @dfilan in #6489 (comment).
Steps/Code to Reproduce
Expected Results
No error is thrown. The code works fine.
Actual Results
Error like:

```
numpy.linalg.LinAlgError: 3-th leading minor of the array is not positive definite
```
The problem
sklearn\manifold\spectral_embedding_.py sets up the AMG preconditioning for LOBPCG in line
The sklearn AMG implementation follows, without citing any reference, the AMG preconditioning for the graph Laplacian first proposed and tested in
Andrew Knyazev, Multiscale Spectral Graph Partitioning and Image Segmentation. Workshop on Algorithms for Modern Massive Data Sets; Stanford University and Yahoo! Research June 21–24, 2006
http://math.ucdenver.edu/~aknyazev/research/conf/image_segment_talk_UCDavis06/image_segment_talk_Stanford06.pdf (slide 13)
But the Laplacian matrix is always singular: it has at least one zero eigenvalue, corresponding to the trivial eigenvector, which is constant. Using a singular matrix for preconditioning may be expected to result in seemingly random failures in LOBPCG and is not supported by the existing theory; see
https://doi.org/10.1007/s10208-015-9297-1
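A minimal numpy/scipy illustration (not scikit-learn code) of why this matters: a graph Laplacian always has a zero eigenvalue whose eigenvector is the constant vector, so the matrix is singular.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import laplacian

# Adjacency matrix of a small connected graph (a 4-cycle)
A = csr_matrix(np.array([[0., 1., 0., 1.],
                         [1., 0., 1., 0.],
                         [0., 1., 0., 1.],
                         [1., 0., 1., 0.]]))
L = laplacian(A).toarray()

w, _ = np.linalg.eigh(L)
print(w[0])                            # numerically zero
print(np.allclose(L @ np.ones(4), 0))  # True: the constant vector is in the null space
```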
The fix
Although undocumented in the original reference given above, we used a simple fix in
https://bitbucket.org/joseroman/blopex/wiki/HypreforImageSegmentation.md
which is just to shift the Laplacian's diagonal by a positive scalar, alpha, before solving for its eigenvalues. The scalar used was alpha=1e-5, and the matrix was shifted with the MATLAB command:

```matlab
Mat = Mat + alpha*speye(n);
```
A similar approach, with alpha=1, is used in line 323 of https://github.com/mmp2/megaman/blob/master/megaman/utils/eigendecomp.py
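The MATLAB shift has a direct scipy.sparse equivalent; a minimal sketch (the variable names here are illustrative):

```python
import scipy.sparse as sparse
from scipy.sparse.csgraph import laplacian

alpha = 1e-5
A = sparse.csr_matrix([[0., 1., 1.],
                       [1., 0., 1.],
                       [1., 1., 0.]])   # adjacency of a triangle graph
Mat = laplacian(A)                      # singular: row sums are zero
n = Mat.shape[0]
Mat = Mat + alpha * sparse.eye(n)       # MATLAB: Mat = Mat + alpha*speye(n);
print(Mat.diagonal())                   # each entry is now 2 + 1e-5
```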
In #6489 (comment), @dfilan successfully tested the following in double precision: changing line 293 of sklearn\manifold\spectral_embedding_.py from
to
but this creates a second matrix, used only for AMG preconditioning. It may be possible to just shift the whole Laplacian:
The choice of `alpha=1e-10` here should be OK in double precision, but I would advocate `alpha=1e-5` to also be safe in single precision. Choosing an increasingly positive alpha may be expected to slow down the LOBPCG convergence, e.g., `alpha=1` is probably excessively large, while the change from `alpha=1e-10` to `alpha=1e-5` would probably be unnoticeable.

If the Laplacian is not normalized, i.e. its diagonal is not all ones, its shift changes the eigenpairs, so the results of the spectral embedding and clustering also change depending on the value of the shift if the whole Laplacian is shifted. If the shift is small, the changes may be unnoticeable. The safe choice is using the separate Laplacian `laplacian4AMG`, shifted only inside the smoothed_aggregation_solver; then the shift value may only affect the convergence speed, but not the results of the spectral embedding and clustering.

Versions
All, including scikit-learn 0.21.dev0
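For reference, the "separate shifted Laplacian" idea described above can be sketched with scipy alone: the eigenproblem is solved on the original (singular) Laplacian, while the preconditioner is built from a shifted copy. A sparse LU factorization stands in here for pyamg's smoothed_aggregation_solver; the name laplacian4AMG follows the text above, everything else is illustrative.

```python
import numpy as np
import scipy.sparse as sparse
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import LinearOperator, lobpcg, splu

# Laplacian of a cycle graph on n vertices: symmetric, PSD, singular
n = 50
A = np.zeros((n, n))
idx = np.arange(n)
A[idx, (idx + 1) % n] = A[(idx + 1) % n, idx] = 1.0
L = sparse.csr_matrix(laplacian(sparse.csr_matrix(A)))

alpha = 1e-5
laplacian4AMG = (L + alpha * sparse.eye(n)).tocsc()  # shifted copy, now SPD
lu = splu(laplacian4AMG)                             # stand-in for an AMG solver
M = LinearOperator((n, n), matvec=lu.solve)          # preconditioner for LOBPCG

rng = np.random.default_rng(0)
X = rng.standard_normal((n, 3))
# Eigenpairs are computed for the *unshifted* L; only M uses the shifted copy,
# so the shift affects convergence speed but not the computed embedding.
w, V = lobpcg(L, X, M=M, largest=False, tol=1e-8, maxiter=200)
print(np.sort(w))   # smallest eigenvalues of L; the first is numerically zero
```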