Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AMG spectral clustering fails just after a few iterations of LOBPCG with " leading minor of the array is not positive definite" #13393

Closed
lobpcg opened this issue Mar 5, 2019 · 11 comments · Fixed by #13707

Comments

@lobpcg
Copy link

@lobpcg lobpcg commented Mar 5, 2019

Description

AMG spectral clustering fails just after a few iterations of LOBPCG; see also #10715 and #11965.

I have suggested the fix is in #6489 (comment) which is tested by @dfilan in #6489 (comment)

Steps/Code to Reproduce

import sklearn.cluster
import scipy.sparse as sparse
import random

random.seed(1337)

rand_int = random.randrange(1000)
num_nodes = 1000
X = sparse.rand(num_nodes, num_nodes, density = 0.1, random_state = rand_int)
upper = sparse.triu(X) - sparse.diags(X.diagonal())
sym_matrix = upper + upper.T
clustering = sklearn.cluster.SpectralClustering(n_clusters = 3,
eigen_solver = 'amg',
random_state = rand_int,
affinity = 'precomputed')
labels = clustering.fit(sym_matrix)

Expected Results

No error is thrown. The code works fine.

Actual Results

Error like:
numpy.linalg.LinAlgError: 3-th leading minor of the array is not positive definite

The problem

sklearn\manifold\spectral_embedding_.py sets up the AMG preconditioning for LOBPCG in line

ml = smoothed_aggregation_solver(check_array(laplacian, 'csr'))

sklearn AMG implementation follows, without any reference, AMG preconditioning for graph Laplacian first proposed and tested in

Andrew Knyazev, Multiscale Spectral Graph Partitioning and Image Segmentation. Workshop on Algorithms for Modern Massive Data Sets; Stanford University and Yahoo! Research June 21–24, 2006
http://math.ucdenver.edu/~aknyazev/research/conf/image_segment_talk_UCDavis06/image_segment_talk_Stanford06.pdf (slide 13)

But the Laplacian matrix is always singular, having at least one zero eigenvalue, corresponding to the trivial eigenvector, which is a constant. Using a singular matrix for preconditioning may be expected to result in often random failures in LOBPCG and is not supported by the existing theory; see
https://doi.org/10.1007/s10208-015-9297-1

The fix

Although undocumented in the original reference given above, we used a simple fix in
https://bitbucket.org/joseroman/blopex/wiki/HypreforImageSegmentation.md
which is just to shift the Laplacian's diagonal by a positive scalar, alpha, before solving for its eigenvalues. The scalar used was alpha=1e-5, and the matrix was shifted with the MATLAB command:
Mat=Mat+alpha*speye(n);

A similar approach, with alpha=1, is used, in line 323 of https://github.com/mmp2/megaman/blob/master/megaman/utils/eigendecomp.py

In #6489 (comment) @dfilan successfully tested the following in double-precision: Changing line 293 to sklearn\manifold\spectral_embedding_.py from

ml = smoothed_aggregation_solver(check_array(laplacian, 'csr'))

to

        laplacian4AMG = laplacian + 1e-10 * sparse.eye(laplacian.shape[0])
        ml = smoothed_aggregation_solver(check_array(laplacian4AMG, 'csr'))

but this creates a second matrix, to be used just for AMG preconditioning. It may be possible to just shift the whole Laplacian:

        laplacian = laplacian + 1e-10 * sparse.eye(laplacian.shape[0])
        ml = smoothed_aggregation_solver(check_array(laplacian, 'csr'))

The choice of alpha=1e-10 here should be OK in double-precision, but I would advocate alpha=1e-5 to be also safe in single-precision. Choosing an increasingly positive alpha may be expected to slow down the LOBPCG convergence, e.g., alpha=1 is probably excessively large, while the change from alpha=1e-10 to alpha=1e-5 would probably be unnoticeable.

If the Laplacian is not normalized, i.e. its diagonal is not all ones, its shift changes the eigenpairs, so the results of the spectral embedding and clustering also change depending on the value of the shift, if the whole Laplacian is shifted. If the shift is small, the changes may be unnoticeable. The safe choice is using the separate Laplacian laplacian4AMG shifted only inside the smoothed_aggregation_solver, then the shift value may only affect the convergence speed, but not the results of the spectral embedding and clustering.

Versions

All, including scikit-learn 0.21.dev0

@lobpcg

This comment has been minimized.

Copy link
Author

@lobpcg lobpcg commented Mar 6, 2019

Fixed in PR #12316, see
#12316 (comment)

@sleighsoft

This comment has been minimized.

Copy link

@sleighsoft sleighsoft commented Aug 5, 2019

When will this make it into an official release?

@jnothman

This comment has been minimized.

Copy link
Member

@jnothman jnothman commented Aug 5, 2019

Hopefully the next release, but the patch awaits another review.

@amueller

This comment has been minimized.

Copy link
Member

@amueller amueller commented Aug 14, 2019

I get a similar error using lobpcg directly:

import pytest

import numpy as np

from sklearn.manifold.spectral_embedding_ import SpectralEmbedding
from sklearn.datasets import make_blobs

seed = 1

centers = np.array([
    [0.0, 5.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 5.0, 1.0],
])
n_samples = 1000
n_clusters, n_features = centers.shape
S, true_labels = make_blobs(n_samples=n_samples, centers=centers,
                            cluster_std=1., random_state=42)

se_lobpcg = SpectralEmbedding(n_components=2, affinity="nearest_neighbors",
                              eigen_solver="lobpcg", n_neighbors=5,
                              random_state=np.random.RandomState(seed))
embed_lobpcg = se_lobpcg.fit_transform(S)

LinAlgError: 2-th leading minor of the array is not positive definite

@lobpcg

This comment has been minimized.

Copy link
Author

@lobpcg lobpcg commented Aug 14, 2019

I get a similar error using lobpcg directly:

import pytest

import numpy as np

from sklearn.manifold.spectral_embedding_ import SpectralEmbedding
from sklearn.datasets import make_blobs

seed = 1

centers = np.array([
    [0.0, 5.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 5.0, 1.0],
])
n_samples = 1000
n_clusters, n_features = centers.shape
S, true_labels = make_blobs(n_samples=n_samples, centers=centers,
                            cluster_std=1., random_state=42)

se_lobpcg = SpectralEmbedding(n_components=2, affinity="nearest_neighbors",
                              eigen_solver="lobpcg", n_neighbors=5,
                              random_state=np.random.RandomState(seed))
embed_lobpcg = se_lobpcg.fit_transform(S)

LinAlgError: 2-th leading minor of the array is not positive definite

My latest version scipy/scipy#10621 fixes this stability issue - just tested myself. @amueller Could you please also give it a try?

@amueller

This comment has been minimized.

Copy link
Member

@amueller amueller commented Aug 14, 2019

It's not merged so I would need to build scipy, right? might try that later ;)

@lobpcg

This comment has been minimized.

Copy link
Author

@lobpcg lobpcg commented Aug 14, 2019

It's not merged so I would need to build scipy, right? might try that later ;)

No built - the lobpcg code is pure Python, so you can just copy/paste it in place of the old version

@glemaitre

This comment has been minimized.

Copy link
Contributor

@glemaitre glemaitre commented Aug 14, 2019

and we have the code in external which can be imported from fixes

@glemaitre

This comment has been minimized.

Copy link
Contributor

@glemaitre glemaitre commented Aug 14, 2019

or at least we should update it if there is something new

@lobpcg

This comment has been minimized.

Copy link
Author

@lobpcg lobpcg commented Aug 14, 2019

scipy/scipy#10621 is tagged for upcoming 1.4.0

Yes, lobpcg is in external, and is already updated in #12319 (comment) to be in sync with scipy 1.3.0

@ogrisel

This comment has been minimized.

Copy link
Member

@ogrisel ogrisel commented Aug 29, 2019

I merged the AMG fix in master and as expected it fixed the original issue but not #13393 (comment) which will be tackled by scipy/scipy#10621 and maybe a backport in scikit-learn.

ogrisel added a commit to ogrisel/scikit-learn that referenced this issue Jan 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.