[MRG] Fix n_components kwarg missing in SpectralClustering. See #13698 #13726

Merged
merged 19 commits into from May 27, 2019
Changes from 9 commits
17 changes: 11 additions & 6 deletions sklearn/cluster/spectral.py
@@ -307,6 +307,9 @@ class SpectralClustering(BaseEstimator, ClusterMixin):
to be installed. It can be faster on very large, sparse problems,
but may also lead to instabilities.

n_components : integer, optional, default is n_clusters
Member

Suggested change
- n_components : integer, optional, default is n_clusters
+ n_components : integer, optional, default=n_clusters

Contributor Author

Sure. Should I add the test as def test_n_components in test_spectral.py?

Member

Yes, please!

Contributor Author

To test that n_components defaults to n_clusters, I would need to resolve the value in __init__, as self.n_components = n_clusters if n_components is None else n_components.

However, this raises an AssertionError at the last line of check_parameters_default_constructible in sklearn/utils/estimator_checks.py, assert param_value == init_param.default, init_param.name. When n_components is None, param_value is n_clusters but init_param.default is None, and since n_clusters != None the assertion fails.

As thomasjpfan suggested, I "Set self.n_components = n_components and pass self.n_components directly to spectral_clustering without modifying any of the parameters in __init__. " But this way I cannot test that n_components is equivalent to n_clusters by default, since self.n_components = None and self.n_clusters = n_clusters.

Any suggestions on this? Thank you.
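
The constraint described in this comment can be illustrated with a small stand-alone sketch. It is not scikit-learn code: ToyClusterer, EagerClusterer, defaults_match_attributes, and the n_components_used_ attribute are hypothetical names, and the helper only imitates the final assertion of check_parameters_default_constructible. It shows why resolving the default in __init__ fails such a check, while resolving it at fit time does not.

```python
import inspect


class ToyClusterer:
    """Hypothetical stand-in for an estimator; not scikit-learn code."""

    def __init__(self, n_clusters=8, n_components=None):
        # Store arguments exactly as passed, so stored attributes
        # stay equal to the defaults declared in the signature.
        self.n_clusters = n_clusters
        self.n_components = n_components

    def fit(self):
        # Resolve the None default at fit time, leaving self.n_components alone.
        n_components = (self.n_clusters if self.n_components is None
                        else self.n_components)
        self.n_components_used_ = n_components  # hypothetical attribute
        return self


class EagerClusterer(ToyClusterer):
    """Resolves the default in __init__, which the check rejects."""

    def __init__(self, n_clusters=8, n_components=None):
        self.n_clusters = n_clusters
        self.n_components = (n_clusters if n_components is None
                             else n_components)


def defaults_match_attributes(estimator_cls):
    """Simplified imitation of the default-constructibility assertion."""
    est = estimator_cls()
    params = inspect.signature(estimator_cls.__init__).parameters
    return all(getattr(est, name) == p.default
               for name, p in params.items() if name != "self")
```

Here defaults_match_attributes(ToyClusterer) holds, while EagerClusterer fails it because its stored n_components is 8 rather than None. The default can then be tested through fitted behavior instead of through the constructor attribute.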

Number of eigen vectors to use for the spectral embedding

random_state : int, RandomState instance or None (default)
A pseudo random number generator used for the initialization of the
lobpcg eigen vectors decomposition when ``eigen_solver='amg'`` and by
@@ -387,8 +390,8 @@ class SpectralClustering(BaseEstimator, ClusterMixin):
>>> clustering # doctest: +NORMALIZE_WHITESPACE
SpectralClustering(affinity='rbf', assign_labels='discretize', coef0=1,
degree=3, eigen_solver=None, eigen_tol=0.0, gamma=1.0,
- kernel_params=None, n_clusters=2, n_init=10, n_jobs=None,
- n_neighbors=10, random_state=0)
+ kernel_params=None, n_clusters=2, n_components=None, n_init=10,
+ n_jobs=None, n_neighbors=10, random_state=0)

Notes
-----
@@ -425,12 +428,13 @@ class SpectralClustering(BaseEstimator, ClusterMixin):
https://www1.icsi.berkeley.edu/~stellayu/publication/doc/2003kwayICCV.pdf
"""

- def __init__(self, n_clusters=8, eigen_solver=None, random_state=None,
- n_init=10, gamma=1., affinity='rbf', n_neighbors=10,
- eigen_tol=0.0, assign_labels='kmeans', degree=3, coef0=1,
- kernel_params=None, n_jobs=None):
+ def __init__(self, n_clusters=8, eigen_solver=None, n_components=None,
+ random_state=None, n_init=10, gamma=1., affinity='rbf',
+ n_neighbors=10, eigen_tol=0.0, assign_labels='kmeans',
+ degree=3, coef0=1, kernel_params=None, n_jobs=None):
self.n_clusters = n_clusters
self.eigen_solver = eigen_solver
+ self.n_components = n_components
self.random_state = random_state
self.n_init = n_init
self.gamma = gamma
@@ -486,6 +490,7 @@ def fit(self, X, y=None):
random_state = check_random_state(self.random_state)
self.labels_ = spectral_clustering(self.affinity_matrix_,
n_clusters=self.n_clusters,
+ n_components=self.n_components,
eigen_solver=self.eigen_solver,
random_state=random_state,
n_init=self.n_init,
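
With n_components forwarded to spectral_clustering as in the hunk above, one way to check the default without inspecting constructor attributes is to compare fitted labels. The following is a sketch only, assuming a scikit-learn release that includes this change; the function names are hypothetical and this is not necessarily the test that was merged. The data setup mirrors the make_blobs call in test_affinities.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs


def fit_labels(n_components):
    """Fit on two tight blobs and return the cluster labels (hypothetical helper)."""
    X, _ = make_blobs(n_samples=20, random_state=0,
                      centers=[[1, 1], [-1, -1]], cluster_std=0.01)
    model = SpectralClustering(n_clusters=2, n_components=n_components,
                               random_state=0)
    return model.fit(X).labels_


def test_n_components_default():
    # Leaving n_components as None should behave like n_components=n_clusters.
    labels_default = fit_labels(n_components=None)
    labels_explicit = fit_labels(n_components=2)
    assert np.array_equal(labels_default, labels_explicit)
```

Both fits use the same data and the same integer random_state, so the comparison is between two otherwise identical runs.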
3 changes: 1 addition & 2 deletions sklearn/cluster/tests/test_spectral.py
@@ -107,8 +107,7 @@ def test_affinities():
# a dataset that yields a stable eigen decomposition both when built
# on OSX and Linux
X, y = make_blobs(n_samples=20, random_state=0,
- centers=[[1, 1], [-1, -1]], cluster_std=0.01
- )
+ centers=[[1, 1], [-1, -1]], cluster_std=0.01)
# nearest neighbors affinity
sp = SpectralClustering(n_clusters=2, affinity='nearest_neighbors',
random_state=0)