Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Changed name n_components to n_connected_components in AgglomerativeClustering base class #13427

Conversation

scouvreur
Copy link
Contributor

@scouvreur scouvreur commented Mar 10, 2019

Reference Issues/PRs

This PR is a fix for issue #13357

What does this implement/fix? Explain your changes.

As explained in issue #13357, this fix renames n_components to n_connected_components as n_components refers to something different in the rest of the codebase, but here refers to the attribute of connected components in the graph.

Checked in the following scripts in the codebase for breaking changes:

  • sklearn/cluster/tests/test_hierarchical.py
  • sklearn/cluster/tests/test_birch.py
  • sklearn/cluster/init.py
  • sklearn/cluster/birch.py
  • examples/cluster/plot_inductive_clustering.py
  • examples/cluster/plot_digits_linkage.py
  • examples/cluster/plot_linkage_comparison.py
  • examples/cluster/plot_cluster_comparison.py
  • examples/cluster/plot_agglomerative_clustering_metrics.py
  • examples/cluster/plot_ward_structured_vs_unstructured.py
  • examples/cluster/plot_agglomerative_clustering.py
  • examples/cluster/plot_coin_ward_segmentation.py
  • benchmarks/bench_plot_ward.py

and found none.

@jnothman
Copy link
Member

To avoid breaking backwards compatibility we need to support users accessing n_components_ but provide a deprecation warning.. The easiest way to do this is to define n_components_ as a property which retrieves n_connected_components_ while raising a warning. I've not yet looked at your test failures.

@scouvreur
Copy link
Contributor Author

Thanks for your feedback and guidance Joel - I will try and incorporate that idea by adding commits to this PR

@scouvreur
Copy link
Contributor Author

scouvreur commented Mar 12, 2019

Hi Joel - I just added the deprecation warning. Let me know if you agree on:

  • the scikit-learn versions in the warning (0.20.3 and 0.24 for deprecation)
  • the handling of the case where neither n_components nor n_connected_components are passed

@scouvreur
Copy link
Contributor Author

To avoid breaking backwards compatibility we need to support users accessing n_components_ but provide a deprecation warning.. The easiest way to do this is to define n_components_ as a property which retrieves n_connected_components_ while raising a warning. I've not yet looked at your test failures.

Concerning the test failures, I believe they are not related to this issue however

@jnothman
Copy link
Member

jnothman commented Mar 12, 2019 via email

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deprecation should be on the attribute retrieval, I.e. in the class:

@property
def n_components_(self):
    Warn...
    return self.n_connected_components_

@scouvreur
Copy link
Contributor Author

scouvreur commented Mar 15, 2019

Hi @jnothman - it seems the patch code coverage has decreased, do you know if I should update any of the tests ? Or are the tests not related to this change ?

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can get code coverage if you add a test that does something like:

with pytest.warns(DeprecationWarning):
    n = my_agglomerative_clusterer.n_components_
assert n == my_agglomerative_clusterer.n_connected_components_

sklearn/cluster/hierarchical.py Outdated Show resolved Hide resolved
sklearn/cluster/hierarchical.py Outdated Show resolved Hide resolved
sklearn/cluster/tests/test_hierarchical.py Outdated Show resolved Hide resolved
sklearn/cluster/hierarchical.py Show resolved Hide resolved
@scouvreur
Copy link
Contributor Author

Apologies @NicolasHug, I am having some issues with resolving a merge conflict I will look at this tomorrow

@scouvreur scouvreur force-pushed the AgglomerativeClustering-n_components_-Renaming branch from 5d58054 to 5d30766 Compare March 20, 2019 23:29
@scouvreur
Copy link
Contributor Author

@NicolasHug just added the whatsnew in commit #31354c3.

Thanks to you @NicolasHug and @jnothman for your feedback and assistance ! I learned like crazy working on this issue.

doc/whats_new/v0.21.rst Outdated Show resolved Hide resolved
doc/whats_new/_contributors.rst Outdated Show resolved Hide resolved
doc/whats_new/v0.21.rst Outdated Show resolved Hide resolved
sklearn/cluster/hierarchical.py Outdated Show resolved Hide resolved
sklearn/cluster/hierarchical.py Outdated Show resolved Hide resolved
sklearn/cluster/hierarchical.py Outdated Show resolved Hide resolved
@scouvreur scouvreur force-pushed the AgglomerativeClustering-n_components_-Renaming branch from 57b1d0d to 4cbe2c1 Compare March 22, 2019 13:57
@scouvreur scouvreur force-pushed the AgglomerativeClustering-n_components_-Renaming branch 5 times, most recently from 0d44e5b to 7642f64 Compare March 24, 2019 01:22
…ion and AgglomerativeClustering class dosctrings
@scouvreur
Copy link
Contributor Author

Apologies @NicolasHug @jnothman - after fixing indentation changes in commit #e48f5cc you requested I get a test failure on test_affinity_passed_to_fix_connectivity() in sklearn/cluster/tests/test_hierarchical.py but can't seem to understand why that is the case. Do you think this could be conflicted changes from master I should merge into this branch ?

=================================== FAILURES ============
___________________ test_affinity_passed_to_fix_connectivity ___________________

    def test_affinity_passed_to_fix_connectivity():
        # Test that the affinity parameter is actually passed to the pairwise
        # function
    
        size = 2
        rng = np.random.RandomState(0)
        X = rng.randn(size, size)
        mask = np.array([True, False, False, True])
    
        connectivity = grid_to_graph(n_x=size, n_y=size,
                                     mask=mask, return_as=np.ndarray)
    
        class FakeAffinity:
            def __init__(self):
                self.counter = 0
    
            def increment(self, *args, **kwargs):
                self.counter += 1
                return self.counter
    
        fa = FakeAffinity()
    
        linkage_tree(X, connectivity=connectivity, affinity=fa.increment)
    
>       assert_equal(fa.counter, 3)

FakeAffinity = <class 'sklearn.cluster.tests.test_hierarchical.test_affinity_passed_to_fix_connectivity.<locals>.FakeAffinity'>
X          = array([[ 1.76405235,  0.40015721],
       [ 0.97873798,  2.2408932 ]])
connectivity = array([[1, 0],
       [0, 1]])
fa         = <sklearn.cluster.tests.test_hierarchical.test_affinity_passed_to_fix_connectivity.<locals>.FakeAffinity object at 0x7f5e46075dd8>
mask       = array([ True, False, False,  True], dtype=bool)
rng        = <mtrand.RandomState object at 0x7f5e45f3aee8>
size       = 2

../1/s/sklearn/cluster/tests/test_hierarchical.py:600: 

@NicolasHug
Copy link
Member

NicolasHug commented Mar 24, 2019

line 471 in hierarchical should be affinity=affinity

And yes when in doubt, merge master, it never hurts.

@scouvreur scouvreur force-pushed the AgglomerativeClustering-n_components_-Renaming branch from e48f5cc to 6c76d7c Compare March 24, 2019 23:15
@scouvreur
Copy link
Contributor Author

Thanks alot @NicolasHug, my mistake

@jnothman
Copy link
Member

Good work @scouvreur.. Glad you've learnt a lot on the way!

@scouvreur scouvreur changed the title [AgglomerativeClustering] Changed name n_components to n_connected_components in base class [MRG] Changed name n_components to n_connected_components in AgglomerativeClustering base class Apr 3, 2019
@agramfort agramfort merged commit 93e09aa into scikit-learn:master Apr 4, 2019
@agramfort
Copy link
Member

thx @scouvreur

@scouvreur scouvreur deleted the AgglomerativeClustering-n_components_-Renaming branch April 5, 2019 05:22
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Apr 25, 2019
…ativeClustering base class (scikit-learn#13427)

* Changed name n_components to n_connected_components in base class

* Fixed line which exceeded PEP8 max of 79 chars

* Fixed line 818 which exceeded PEP8 max of 79 chars

* Added try and except to provide deprecation warning if  passed

* Updated deprecation and removal version numbers

* Added deprecation of n_components using @Property generator

* Makes FeatureAgglomeration class inherit n_connected_components_ attribute from AgglomerativeClustering class

* Added test for DeprecationWarning when trying to access n_components

* Removed @Property generator causing linting error

* Fixed typo in test

* Fixed flake8 error due to single line between 2 functions

* Test fix attempt

* Edited test function docstring

* Corrected n_components deprecation test docstring

* Fixed line continuation issue in AgglomerativeClustering base class

* Added deprecation message as part of the @deprecated decorator

* Added  attribute deprecation information in the Attributes section of the AgglomerativeClustering base class docstring

* Added test for deprecation warning message

* Added  attribute deprecation information in the Attributes section of the FeatureAgglomeration base class docstring

* Fixed test issue and added longer match string

* Edited n_components_ deprecation message to add double backticks

* Fixed match string to reflect deprecation message change in test

* Added name to list of contributors

* Documented information in v0.21 changelog

* Added cluster parent folder to documentation in v0.21 changelog

* Removed myself from list of core contributors

* Moved |API| subsection to the end of the list, and changed reference to Github username

* Removed n_components deprecation documentation from FeatureAgglomeration and AgglomerativeClustering class dosctrings

* Fix indentation on _fix_connectivity function call
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
…ativeClustering base class (scikit-learn#13427)

* Changed name n_components to n_connected_components in base class

* Fixed line which exceeded PEP8 max of 79 chars

* Fixed line 818 which exceeded PEP8 max of 79 chars

* Added try and except to provide deprecation warning if  passed

* Updated deprecation and removal version numbers

* Added deprecation of n_components using @Property generator

* Makes FeatureAgglomeration class inherit n_connected_components_ attribute from AgglomerativeClustering class

* Added test for DeprecationWarning when trying to access n_components

* Removed @Property generator causing linting error

* Fixed typo in test

* Fixed flake8 error due to single line between 2 functions

* Test fix attempt

* Edited test function docstring

* Corrected n_components deprecation test docstring

* Fixed line continuation issue in AgglomerativeClustering base class

* Added deprecation message as part of the @deprecated decorator

* Added  attribute deprecation information in the Attributes section of the AgglomerativeClustering base class docstring

* Added test for deprecation warning message

* Added  attribute deprecation information in the Attributes section of the FeatureAgglomeration base class docstring

* Fixed test issue and added longer match string

* Edited n_components_ deprecation message to add double backticks

* Fixed match string to reflect deprecation message change in test

* Added name to list of contributors

* Documented information in v0.21 changelog

* Added cluster parent folder to documentation in v0.21 changelog

* Removed myself from list of core contributors

* Moved |API| subsection to the end of the list, and changed reference to Github username

* Removed n_components deprecation documentation from FeatureAgglomeration and AgglomerativeClustering class dosctrings

* Fix indentation on _fix_connectivity function call
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
…AgglomerativeClustering base class (scikit-learn#13427)"

This reverts commit 2d94859.
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
…AgglomerativeClustering base class (scikit-learn#13427)"

This reverts commit 2d94859.
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
…ativeClustering base class (scikit-learn#13427)

* Changed name n_components to n_connected_components in base class

* Fixed line which exceeded PEP8 max of 79 chars

* Fixed line 818 which exceeded PEP8 max of 79 chars

* Added try and except to provide deprecation warning if  passed

* Updated deprecation and removal version numbers

* Added deprecation of n_components using @Property generator

* Makes FeatureAgglomeration class inherit n_connected_components_ attribute from AgglomerativeClustering class

* Added test for DeprecationWarning when trying to access n_components

* Removed @Property generator causing linting error

* Fixed typo in test

* Fixed flake8 error due to single line between 2 functions

* Test fix attempt

* Edited test function docstring

* Corrected n_components deprecation test docstring

* Fixed line continuation issue in AgglomerativeClustering base class

* Added deprecation message as part of the @deprecated decorator

* Added  attribute deprecation information in the Attributes section of the AgglomerativeClustering base class docstring

* Added test for deprecation warning message

* Added  attribute deprecation information in the Attributes section of the FeatureAgglomeration base class docstring

* Fixed test issue and added longer match string

* Edited n_components_ deprecation message to add double backticks

* Fixed match string to reflect deprecation message change in test

* Added name to list of contributors

* Documented information in v0.21 changelog

* Added cluster parent folder to documentation in v0.21 changelog

* Removed myself from list of core contributors

* Moved |API| subsection to the end of the list, and changed reference to Github username

* Removed n_components deprecation documentation from FeatureAgglomeration and AgglomerativeClustering class dosctrings

* Fix indentation on _fix_connectivity function call
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants