Fix for spectral clustering error when using 'amg' solver #13707

Merged 27 commits on Aug 29, 2019
Changes from 3 commits
a0f1f7b
change AMG tolerance default & laplacian shift (fixes #13393)
whitews Apr 24, 2019
6e5ecf6
add spectral clustering test for AMG solver
whitews Apr 24, 2019
d61cf3b
update docs with edits from Andrew Knyazev (& some fixed)
whitews Apr 24, 2019
b0c4356
revert tolerance value changes, not needed for AMG solver fix
whitews Apr 24, 2019
0c8390b
update v0.21 changelog noting #13393 fix
whitews Apr 24, 2019
f64decb
simplify diag correction in spectral_embedding
whitews Apr 24, 2019
cf126b2
revert the reversion: increased tolerances are required
whitews Apr 24, 2019
d9fc5ee
use importorskip instead of try/except clause for availability of pyamg
whitews Apr 25, 2019
1f544ea
reference issue in amg solver failure test
whitews Apr 25, 2019
5a3a058
clarify random seed change for spectral embedding amg failure test
whitews Apr 25, 2019
bba21b0
Merge branch 'master' into spec-clust-amg-fix
whitews Apr 25, 2019
346cff0
leave original tolerance for 'lobpcg' eigen solver
whitews Apr 25, 2019
98b17ec
implement original shift code from lobpcg, add comment
whitews May 23, 2019
302850c
Merge branch 'master' into spec-clust-amg-fix
whitews May 23, 2019
2eed15e
fix long line
whitews May 23, 2019
783e6e4
only shift laplacian for the solver, then un-shift back to original
whitews May 23, 2019
65bf8e9
Merge branch 'master' into spec-clust-amg-fix
jnothman Jul 25, 2019
4a2e3df
Update sklearn/manifold/spectral_embedding_.py
whitews Aug 2, 2019
0362c39
remove noinspection comment
whitews Aug 2, 2019
f46f91b
removing spectral clustering bug text
whitews Aug 2, 2019
61d54de
add spectral clustering fix contribution
whitews Aug 2, 2019
c48fbe0
fix markup in last commit
whitews Aug 2, 2019
95f2a99
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
whitews Aug 2, 2019
a452c95
mention SpectralEmbedding & SpectralClustering classes in release notes
whitews Aug 4, 2019
e601a8c
oMerge remote-tracking branch 'origin/master' into spec-clust-amg-fix
ogrisel Aug 29, 2019
de645ba
Update AMG docstring and improve codestyle
ogrisel Aug 29, 2019
0501603
Stricter check in pyamg test
ogrisel Aug 29, 2019
50 changes: 28 additions & 22 deletions doc/modules/clustering.rst
@@ -428,21 +428,24 @@ given sample.
Spectral clustering
===================

-:class:`SpectralClustering` does a low-dimension embedding of the
-affinity matrix between samples, followed by a KMeans in the low
-dimensional space. It is especially efficient if the affinity matrix is
-sparse and the `pyamg <https://github.com/pyamg/pyamg>`_ module is installed.
-SpectralClustering requires the number of clusters to be specified. It
-works well for a small number of clusters but is not advised when using
-many clusters.
-
-For two clusters, it solves a convex relaxation of the `normalised
-cuts <https://people.eecs.berkeley.edu/~malik/papers/SM-ncut.pdf>`_ problem on
-the similarity graph: cutting the graph in two so that the weight of the
-edges cut is small compared to the weights of the edges inside each
-cluster. This criteria is especially interesting when working on images:
-graph vertices are pixels, and edges of the similarity graph are a
-function of the gradient of the image.
+:class:`SpectralClustering` performs a low-dimension embedding of the
+affinity matrix between samples, followed by clustering, e.g., by KMeans,
+of the components of the eigenvectors in the low dimensional space.
+It is especially computationally efficient if the affinity matrix is sparse
+and the `amg` solver is used for the eigenvalue problem (Note, the `amg` solver
+requires that the `pyamg <https://github.com/pyamg/pyamg>`_ module is installed.)
+
+The present version of SpectralClustering requires the number of clusters
+to be specified in advance. It works well for a small number of clusters,
+but is not advised for many clusters.
+
+For two clusters, SpectralClustering solves a convex relaxation of the
+`normalised cuts <https://people.eecs.berkeley.edu/~malik/papers/SM-ncut.pdf>`_
+problem on the similarity graph: cutting the graph in two so that the weight of
+the edges cut is small compared to the weights of the edges inside each
+cluster. This criteria is especially interesting when working on images, where
+graph vertices are pixels, and weights of the edges of the similarity graph are
+computed using a function of a gradient of the image.


.. |noisy_img| image:: ../auto_examples/cluster/images/sphx_glr_plot_segmentation_toy_001.png
@@ -489,12 +492,11 @@ Different label assignment strategies

Different label assignment strategies can be used, corresponding to the
``assign_labels`` parameter of :class:`SpectralClustering`.
-The ``"kmeans"`` strategy can match finer details of the data, but it can be
-more unstable. In particular, unless you control the ``random_state``, it
-may not be reproducible from run-to-run, as it depends on a random
-initialization. On the other hand, the ``"discretize"`` strategy is 100%
-reproducible, but it tends to create parcels of fairly even and
-geometrical shape.
+``"kmeans"`` strategy can match finer details, but can be unstable.
+In particular, unless you control the ``random_state``, it may not be
+reproducible from run-to-run, as it depends on random initialization.
+The alternative ``"discretize"`` strategy is 100% reproducible, but tends
+to create parcels of fairly even and geometrical shape.

===================================== =====================================
``assign_labels="kmeans"`` ``assign_labels="discretize"``
@@ -505,7 +507,7 @@ geometrical shape.
Spectral Clustering Graphs
--------------------------

-Spectral Clustering can also be used to cluster graphs by their spectral
+Spectral Clustering can also be used to partition graphs via their spectral
embeddings. In this case, the affinity matrix is the adjacency matrix of the
graph, and SpectralClustering is initialized with `affinity='precomputed'`::

@@ -532,6 +534,10 @@ graph, and SpectralClustering is initialized with `affinity='precomputed'`::
<http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.8100>`_
Andrew Y. Ng, Michael I. Jordan, Yair Weiss, 2001

+* `"Preconditioned Spectral Clustering for Stochastic
+Block Partition Streaming Graph Challenge"
+<https://arxiv.org/abs/1708.07481>`_
+David Zhuzhunashvili, Andrew Knyazev

.. _hierarchical_clustering:
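
The two-cluster case described in the clustering.rst changes above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not scikit-learn's implementation: the helper name `spectral_bipartition` and the toy graph are hypothetical, and the sketch splits on the sign of the second eigenvector, whereas :class:`SpectralClustering` applies KMeans or discretization to the embedding.

```python
import numpy as np


def spectral_bipartition(affinity):
    """Split a connected graph in two by the sign of the second eigenvector.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    affinity = np.asarray(affinity, dtype=float)
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}
    laplacian = (np.eye(affinity.shape[0])
                 - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :])
    # eigh returns eigenvalues in ascending order, so column 1 belongs to
    # the second-smallest eigenvalue.
    _, eigenvectors = np.linalg.eigh(laplacian)
    # Rescale by D^{-1/2} to recover the relaxed normalized-cut indicator.
    indicator = eigenvectors[:, 1] * d_inv_sqrt
    return (indicator > 0).astype(int)


# Toy graph: two triangles (nodes 0-2 and 3-5) joined by one weak edge 2-3.
affinity = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    affinity[i, j] = affinity[j, i] = 1.0
affinity[2, 3] = affinity[3, 2] = 0.1

labels = spectral_bipartition(affinity)
print(labels)  # the two triangles get different labels; which one is 0 is arbitrary
```

The weak edge 2-3 is the cheapest cut relative to the weight inside each triangle, so the sign pattern separates the triangles. Because an eigenvector is only defined up to sign, the label values themselves can swap between runs or platforms.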

5 changes: 3 additions & 2 deletions sklearn/manifold/spectral_embedding_.py
@@ -288,11 +288,12 @@ def spectral_embedding(adjacency, n_components=8, eigen_solver=None,
         laplacian = check_array(laplacian, dtype=np.float64,
                                 accept_sparse=True)
         laplacian = _set_diag(laplacian, 1, norm_laplacian)
+        laplacian = laplacian + 1e-5 * sparse.eye(laplacian.shape[0])
         ml = smoothed_aggregation_solver(check_array(laplacian, 'csr'))
         M = ml.aspreconditioner()
         X = random_state.rand(laplacian.shape[0], n_components + 1)
         X[:, 0] = dd.ravel()
-        lambdas, diffusion_map = lobpcg(laplacian, X, M=M, tol=1.e-12,
+        lambdas, diffusion_map = lobpcg(laplacian, X, M=M, tol=1.e-5,
                                         largest=False)
         embedding = diffusion_map.T
         if norm_laplacian:
@@ -320,7 +321,7 @@ def spectral_embedding(adjacency, n_components=8, eigen_solver=None,
             # doesn't behave well in low dimension
             X = random_state.rand(laplacian.shape[0], n_components + 1)
             X[:, 0] = dd.ravel()
-            lambdas, diffusion_map = lobpcg(laplacian, X, tol=1e-15,
+            lambdas, diffusion_map = lobpcg(laplacian, X, tol=1e-5,
                                             largest=False, maxiter=2000)
             embedding = diffusion_map.T[:n_components]
             if norm_laplacian:
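
A note on the `1e-5 * sparse.eye(...)` shift in the `amg` hunk above. The diff does not state the motivation, so the following is one plausible reading rather than the authors' explanation: a graph Laplacian always has at least one zero eigenvalue (one per connected component), so it is singular, and adding a small multiple of the identity makes it strictly positive definite while leaving the eigenvectors unchanged. The sketch below checks this with NumPy on a toy graph. It uses the unnormalized Laplacian for simplicity (the PR works with the normalized one), and the variable names are assumptions made for the example.

```python
import numpy as np

# Adjacency matrix of a graph with two connected components:
# one edge 0-1 and one edge 2-3.
adjacency = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
# Unnormalized Laplacian L = D - A (an assumption of this sketch).
laplacian = np.diag(adjacency.sum(axis=1)) - adjacency

shift = 1e-5  # same magnitude as the shift in the diff
shifted = laplacian + shift * np.eye(laplacian.shape[0])

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
eigvals = np.linalg.eigvalsh(laplacian)
shifted_eigvals = np.linalg.eigvalsh(shifted)

print("original:", eigvals)          # two (numerically) zero eigenvalues
print("shifted: ", shifted_eigvals)  # every eigenvalue moved up by `shift`
```

Since the shift moves every eigenvalue by the same constant, the eigenvalues returned by the solver are offset by `1e-5` in the version shown here. Commit 783e6e4 in the list above ("only shift laplacian for the solver, then un-shift back to original") appears to address that.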
26 changes: 26 additions & 0 deletions sklearn/manifold/tests/test_spectral_embedding.py
@@ -181,6 +181,32 @@ def test_spectral_embedding_amg_solver(seed=36):
     assert _check_with_col_sign_flipping(embed_amg, embed_arpack, 0.05)


+def test_spectral_embedding_amg_solver_failure(seed=36):
+    # Test spectral embedding with amg solver failure
+    try:
+        from pyamg import smoothed_aggregation_solver  # noqa
+    except ImportError:
+        raise SkipTest("pyamg not available.")
+
+    # The generated graph below is NOT fully connected if n_neighbors=3
+    n_samples = 200
+    n_clusters = 3
+    n_features = 3
+    centers = np.eye(n_clusters, n_features)
+    S, true_labels = make_blobs(n_samples=n_samples, centers=centers,
+                                cluster_std=1., random_state=42)
+

Review comment (Contributor): Maybe check `norm_laplacian=False` and `norm_laplacian=True` separately. The latter is the default and is the only option currently checked.

+    se_amg0 = SpectralEmbedding(n_components=3, affinity="nearest_neighbors",
+                                eigen_solver="amg", n_neighbors=3,
+                                random_state=np.random.RandomState(seed))
+    se_amg1 = SpectralEmbedding(n_components=3, affinity="nearest_neighbors",
+                                eigen_solver="amg", n_neighbors=3,
+                                random_state=np.random.RandomState(seed+1))
+    embed_amg0 = se_amg0.fit_transform(S)
+    embed_amg1 = se_amg1.fit_transform(S)
+    assert _check_with_col_sign_flipping(embed_amg0, embed_amg1, 0.05)


@pytest.mark.filterwarnings("ignore:the behavior of nmi will "
"change in version 0.22")
def test_pipeline_spectral_clustering(seed=36):
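
The new test above compares two embeddings with `_check_with_col_sign_flipping` because an eigenvector is only determined up to sign, so two correct solver runs may return columns that differ by a factor of -1. A minimal sketch of such a sign-insensitive comparison follows. The function name `equal_up_to_column_sign` and its RMS criterion are assumptions for illustration and are not claimed to match the scikit-learn helper line for line.

```python
import numpy as np


def equal_up_to_column_sign(a, b, tol=0.0):
    """Return True if every column of `a` matches the same column of `b`
    or its negation, within an RMS difference of `tol`.

    Hypothetical helper for illustration only.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Column-wise RMS of the difference (same sign) and of the sum (flipped sign).
    same = np.sqrt(((a - b) ** 2).mean(axis=0)) <= tol
    flipped = np.sqrt(((a + b) ** 2).mean(axis=0)) <= tol
    return bool(np.all(same | flipped))


embedding = np.array([[1.0, 2.0],
                      [3.0, 4.0]])
# Same embedding with the second column negated.
flipped_copy = embedding * np.array([1.0, -1.0])

print(equal_up_to_column_sign(embedding, flipped_copy, tol=0.05))
print(equal_up_to_column_sign(embedding, embedding + 1.0, tol=0.05))
```

The first comparison differs only by a column sign and passes; the second differs by a genuine offset and fails regardless of sign.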