Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Fix n_components kwarg missing in SpectralClustering. See #13698 #13726

Merged
merged 19 commits into from May 27, 2019

Conversation

@fdas3213
Copy link
Contributor

fdas3213 commented Apr 25, 2019

Reference Issues/PRs

Fixes #13698

What does this implement/fix? Explain your changes.

The SpectralClustering class does not have an instance variable n_components; that's why n_components cannot be accessed by an instance of SpectralClustering class. My temporary fix is to add n_components as an instance variable of SpectralClustering class, and add it as an argument of spectral_clustering in fit.
If this is a reasonable fix of the problem, I will then add a test function def test_n_components in test_spectral.py. Otherwise please let me know what is the best way to fix this.

Any other comments?

@fdas3213

This comment has been minimized.

Copy link
Contributor Author

fdas3213 commented Apr 30, 2019

Assertion error happens at the last line assert param_value == init_param.default, init_param.name in check_parameters_default_constructible in sklearn/utils/estimator_checks.py because param_value = n_clusters if n_components is None, but init_param.default = None, and because n_clusters != None, this error occurs.

I tried to fix this by setting self.n_components = n_components in the constructor of SpectralClustering class and then change the value of self.n_components to self.n_cluster in fit function, however by doing this, I got another error Estimator SpectralClustering should not change or mutate the parameter n_components from None to 8 during fit, which basically says that I cannot change the value of instance variable outside the constructor.

These two errors might be the reason why n_components is not set an instance variable for SpectralClustering class.

So does anyone have any idea on how to handle this issue?

self.n_clusters = n_clusters
self.eigen_solver = eigen_solver
self.n_components = n_clusters if n_components \

This comment has been minimized.

Copy link
@thomasjpfan

thomasjpfan May 6, 2019

Member

Set self.n_components = n_components and pass self.n_components directly to spectral_clustering without modifying any of the parameters in __init__. spectral_clustering already handles the case when n_components is defined:

n_components = n_clusters if n_components is None else n_components

This comment has been minimized.

Copy link
@fdas3213

fdas3213 May 7, 2019

Author Contributor

Nice idea! thanks for help

@fdas3213

This comment has been minimized.

Copy link
Contributor Author

fdas3213 commented May 7, 2019

@thomasjpfan When I run pytest locally, there is no DocTestFailure; however when running checks after submitting new commit, I got 2 DocTestFailure. Do you know why this happens and how should I fix this problem? Thank you

@thomasjpfan

This comment has been minimized.

Copy link
Member

thomasjpfan commented May 7, 2019

Merge with master

@fdas3213 fdas3213 closed this May 8, 2019
@fdas3213 fdas3213 reopened this May 8, 2019
@fdas3213

This comment has been minimized.

Copy link
Contributor Author

fdas3213 commented May 8, 2019

Merge with master

I think the branch is already up to date with the master. Could you please be more specific on how should I do this? Thanks

@thomasjpfan

This comment has been minimized.

Copy link
Member

thomasjpfan commented May 8, 2019

git checkout LOCAL_BRANCH_NAME
git remote add upstream https://github.com/scikit-learn/scikit-learn.git
git fetch upstream
git merge upstream/master

We updated how we build docs in circleci, you need to merge with master to build the docs without error.

@fdas3213

This comment has been minimized.

Copy link
Contributor Author

fdas3213 commented May 8, 2019

git checkout LOCAL_BRANCH_NAME
git remote add upstream https://github.com/scikit-learn/scikit-learn.git
git fetch upstream
git merge upstream/master

We updated how we build docs in circleci, you need to merge with master to build the docs without error.

thanks so much for help

@thomasjpfan
Seems that DocTestFailure still exists after merging with master.
I fixed this error locally by changing
array([1, 1, 1, 0, 0, 0])
to
array([1, 1, 1, 0, 0, 0], dtype=int64) and everything looks good locally when tested with pytest.
But even after merging with master, this DocTestFailure still exists. Any help is appreciated!

@fdas3213

This comment has been minimized.

Copy link
Contributor Author

fdas3213 commented May 12, 2019

@thomasjpfan There is still that doc error after merging with master. Do you have any hint about this? Or is there anything that I did not do right? thank you

@@ -383,12 +386,12 @@ class SpectralClustering(BaseEstimator, ClusterMixin):
... assign_labels="discretize",
... random_state=0).fit(X)
>>> clustering.labels_
array([1, 1, 1, 0, 0, 0])
array([1, 1, 1, 0, 0, 0], dtype=int64)

This comment has been minimized.

Copy link
@thomasjpfan

thomasjpfan May 13, 2019

Member

Revert this change

fdas3213 added 2 commits May 13, 2019
@fdas3213 fdas3213 changed the title [WIP] Fix n_components kwarg missing in SpectralClustering. See #13698 [MRG] Fix n_components kwarg missing in SpectralClustering. See #13698 May 13, 2019
@fdas3213

This comment has been minimized.

Copy link
Contributor Author

fdas3213 commented May 14, 2019

@thomasjpfan I think this PR is ready to be reviewed. Thank you

Copy link
Member

jnothman left a comment

Please add a brief test to check that n_components affects the result, and that it is equivalent to n_clusters by default.

@@ -307,6 +307,9 @@ class SpectralClustering(BaseEstimator, ClusterMixin):
to be installed. It can be faster on very large, sparse problems,
but may also lead to instabilities.
n_components : integer, optional, default is n_clusters

This comment has been minimized.

Copy link
@jnothman

jnothman May 21, 2019

Member
Suggested change
n_components : integer, optional, default is n_clusters
n_components : integer, optional, default=n_clusters

This comment has been minimized.

Copy link
@fdas3213

fdas3213 May 21, 2019

Author Contributor

Sure. Should I add the test as def test_n_components in test_spectral.py?

This comment has been minimized.

Copy link
@jnothman

jnothman May 22, 2019

Member

Yes, please!

This comment has been minimized.

Copy link
@fdas3213

fdas3213 May 22, 2019

Author Contributor

To test that n_components is equivalent to n_clusters by default, I need to initialize the value of n_components in __init__ as self.n_components = n_clusters if n_components is None else n_components.

However, this would lead to Assertion error that happens at the last line assert param_value == init_param.default, init_param.name in check_parameters_default_constructible in sklearn/utils/estimator_checks.py because param_value = n_clusters if n_components is None, but init_param.default = None, and because n_clusters != None, this error occurs.

As thomasjpfan suggested, I "Set self.n_components = n_components and pass self.n_components directly to spectral_clustering without modifying any of the parameters in __init__. " But in this way, I cannot test n_components is equivalent to n_clusters by default, since self.n_components = None and self.n_clusters = n_clusters.

Any suggestion on this? Thank you

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented May 22, 2019

@fdas3213

This comment has been minimized.

Copy link
Contributor Author

fdas3213 commented May 23, 2019

I just mean that you can test that the result of the model is the same as if you set n_components=n_clusters. At this point you should only be modifying test files, no?

Got it. Thanks for the clarification. Yes, only test files will be modified.

Copy link
Member

jnothman left a comment

Good test, thanks!!

@fdas3213

This comment has been minimized.

Copy link
Contributor Author

fdas3213 commented May 24, 2019

Good test, thanks!!

No problem, glad it gets approved. There were still 2 checks failing, do I have to do something?

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented May 25, 2019

Copy link
Contributor

NicolasHug left a comment

LGTM,

Please add an entry to the change log at doc/whats_new/v0.22.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

:mod:`sklearn.cluster`
..................

- |Fix| :class:`cluster.SpectralClustering` now accepts a ``n_components``

This comment has been minimized.

Copy link
@jnothman

jnothman May 26, 2019

Member

I'd call this an enhancement. Fix indicates that someone's code was previously not functioning as documented, ordinarily. Rather, you provide a way to access existing capabilities

This comment has been minimized.

Copy link
@fdas3213

fdas3213 May 27, 2019

Author Contributor

Thanks for letting me know. Just made the change

fdas3213 and others added 2 commits May 27, 2019
@jnothman jnothman merged commit db48ebc into scikit-learn:master May 27, 2019
8 of 13 checks passed
8 of 13 checks passed
LGTM analysis: C/C++ Fetching git commits
Details
LGTM analysis: JavaScript Fetching git commits
Details
LGTM analysis: Python Fetching git commits
Details
ci/circleci: doc CircleCI is running your tests
Details
ci/circleci: doc-min-dependencies CircleCI is running your tests
Details
ci/circleci: lint Your tests passed on CircleCI!
Details
scikit-learn.scikit-learn Build #20190527.13 succeeded
Details
scikit-learn.scikit-learn (Linux py35_conda_openblas) Linux py35_conda_openblas succeeded
Details
scikit-learn.scikit-learn (Linux py35_np_atlas) Linux py35_np_atlas succeeded
Details
scikit-learn.scikit-learn (Linux pylatest_conda) Linux pylatest_conda succeeded
Details
scikit-learn.scikit-learn (Windows py35_32) Windows py35_32 succeeded
Details
scikit-learn.scikit-learn (Windows py37_64) Windows py37_64 succeeded
Details
scikit-learn.scikit-learn (macOS pylatest_conda) macOS pylatest_conda succeeded
Details
koenvandevelde added a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.