[MRG] Fix n_components kwarg missing in SpectralClustering. See #13698 #13726
Assertion error happens at the last line. I tried to fix this by setting [...]. These two errors might be the reason why [...]. So does anyone have any idea how to handle this issue?
sklearn/cluster/spectral.py
Outdated
self.n_clusters = n_clusters
self.eigen_solver = eigen_solver
self.n_components = n_clusters if n_components \
Set self.n_components = n_components and pass self.n_components directly to spectral_clustering without modifying any of the parameters in __init__. spectral_clustering already handles the case when n_components is None:
scikit-learn/sklearn/cluster/spectral.py
Line 256 in 8d3b4ff
n_components = n_clusters if n_components is None else n_components
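The suggestion above can be sketched as follows. This is a minimal stand-alone illustration, not the actual scikit-learn code: `ToySpectral` and `toy_spectral_clustering` are hypothetical stand-ins for SpectralClustering and spectral_clustering, and `resolved_n_components_` is an invented attribute used only to make the resolved value visible.

```python
def toy_spectral_clustering(n_clusters=8, n_components=None):
    """Hypothetical stand-in for the function-level API."""
    # The function, not the estimator, resolves the None default.
    n_components = n_clusters if n_components is None else n_components
    return n_components


class ToySpectral:
    """Hypothetical stand-in for the estimator class."""

    def __init__(self, n_clusters=8, n_components=None):
        # Store constructor arguments verbatim; no defaults are resolved here.
        self.n_clusters = n_clusters
        self.n_components = n_components

    def fit(self):
        # Pass the stored value straight through to the function.
        self.resolved_n_components_ = toy_spectral_clustering(
            n_clusters=self.n_clusters, n_components=self.n_components)
        return self
```

With this layout the attribute keeps the value the user passed (possibly None), while the effective value is only computed at fit time.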
Nice idea! thanks for help
@thomasjpfan When I run pytest locally, there is no DocTestFailure; however, when the checks run after I submit a new commit, I get 2 DocTestFailures. Do you know why this happens and how I should fix it? Thank you
Merge with master
I think the branch is already up to date with master. Could you please be more specific about how I should do this? Thanks
git checkout LOCAL_BRANCH_NAME
git remote add upstream https://github.com/scikit-learn/scikit-learn.git
git fetch upstream
git merge upstream/master
We updated how we build docs in CircleCI; you need to merge with master to build the docs without error.
Thanks so much for the help @thomasjpfan
@thomasjpfan There is still that doc error after merging with master. Do you have any hints about this? Or is there anything I did not do right? Thank you
sklearn/cluster/spectral.py
Outdated
@@ -383,12 +386,12 @@ class SpectralClustering(BaseEstimator, ClusterMixin):
... assign_labels="discretize",
... random_state=0).fit(X)
>>> clustering.labels_
array([1, 1, 1, 0, 0, 0])
array([1, 1, 1, 0, 0, 0], dtype=int64)
Revert this change
@thomasjpfan I think this PR is ready to be reviewed. Thank you
Please add a brief test to check that n_components affects the result, and that it is equivalent to n_clusters by default.
sklearn/cluster/spectral.py
Outdated
@@ -307,6 +307,9 @@ class SpectralClustering(BaseEstimator, ClusterMixin):
to be installed. It can be faster on very large, sparse problems,
but may also lead to instabilities.
n_components : integer, optional, default is n_clusters
n_components : integer, optional, default is n_clusters
n_components : integer, optional, default=n_clusters
Sure. Should I add the test as def test_n_components in test_spectral.py?
Yes, please!
To test that n_components is equivalent to n_clusters by default, I would need to initialize n_components in __init__ as self.n_components = n_clusters if n_components is None else n_components.
However, this leads to an assertion error at the last line of check_parameters_default_constructible in sklearn/utils/estimator_checks.py, assert param_value == init_param.default, init_param.name: param_value is n_clusters when n_components is None, but init_param.default is None, and since n_clusters != None, the assertion fails.
As thomasjpfan suggested, I "set self.n_components = n_components and pass self.n_components directly to spectral_clustering without modifying any of the parameters in __init__." But this way I cannot test that n_components is equivalent to n_clusters by default, since self.n_components = None and self.n_clusters = n_clusters.
Any suggestions on this? Thank you
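The assertion failure described above can be reproduced in isolation. The sketch below is a simplified imitation of the idea behind check_parameters_default_constructible, not its actual code; `StoresVerbatim`, `RewritesInInit`, and `stored_defaults_match` are hypothetical names introduced only for illustration.

```python
import inspect


class StoresVerbatim:
    def __init__(self, n_clusters=8, n_components=None):
        self.n_clusters = n_clusters
        self.n_components = n_components


class RewritesInInit:
    def __init__(self, n_clusters=8, n_components=None):
        self.n_clusters = n_clusters
        # Replacing None here makes the stored value differ from the default.
        self.n_components = n_clusters if n_components is None else n_components


def stored_defaults_match(cls):
    """Return True if every stored attribute equals its __init__ default."""
    estimator = cls()
    params = inspect.signature(cls.__init__).parameters
    return all(
        getattr(estimator, name) == param.default
        for name, param in params.items()
        if name != "self"
    )
```

Here `stored_defaults_match(StoresVerbatim)` holds, while `RewritesInInit` stores 8 for a parameter whose declared default is None, which is the mismatch the estimator check reports.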
I just mean that you can test that the result of the model is the same as if you set n_components=n_clusters. At this point you should only be modifying test files, no?
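A behavioural test of this kind compares fitted results rather than attributes. The sketch below uses a hypothetical `ToyClustering` class with an invented labelling rule so that it runs without scikit-learn; a real test in test_spectral.py would fit SpectralClustering on actual data instead.

```python
class ToyClustering:
    """Hypothetical estimator; not scikit-learn's SpectralClustering."""

    def __init__(self, n_clusters=2, n_components=None):
        self.n_clusters = n_clusters
        self.n_components = n_components

    def fit(self, X):
        # Resolve the default at fit time, leaving the attribute untouched.
        k = self.n_clusters if self.n_components is None else self.n_components
        # Toy rule (an assumption): label each row by the sum of its first k features.
        self.labels_ = [sum(row[:k]) % self.n_clusters for row in X]
        return self


def test_n_components():
    X = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
    # The default must behave exactly like n_components=n_clusters.
    default = ToyClustering(n_clusters=2).fit(X).labels_
    explicit = ToyClustering(n_clusters=2, n_components=2).fit(X).labels_
    assert default == explicit
    # A different n_components must change the result.
    other = ToyClustering(n_clusters=2, n_components=3).fit(X).labels_
    assert other != default
```

Because the comparison is made on labels_, the test does not care that the n_components attribute itself stays None.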
Got it. Thanks for the clarification. Yes, only test files will be modified.
No problem, glad it got approved. There are still 2 checks failing; do I have to do something?
Those failures are happening on all pull requests. They will be fixed by us.
LGTM,
Please add an entry to the change log at doc/whats_new/v0.22.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.
doc/whats_new/v0.22.rst
Outdated
:mod:`sklearn.cluster`
..................

- |Fix| :class:`cluster.SpectralClustering` now accepts a ``n_components``
I'd call this an enhancement. Fix ordinarily indicates that someone's code was previously not functioning as documented. Rather, you provide a way to access existing capabilities.
Thanks for letting me know. Just made the change
Reference Issues/PRs
Fixes #13698
What does this implement/fix? Explain your changes.
The SpectralClustering class does not have an instance variable n_components; that's why n_components cannot be accessed by an instance of the SpectralClustering class. My temporary fix is to add n_components as an instance variable of the SpectralClustering class, and to pass it as an argument to spectral_clustering in fit.
If this is a reasonable fix for the problem, I will then add a test function def test_n_components in test_spectral.py. Otherwise, please let me know the best way to fix this.
Any other comments?