New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Improves class design for AgglomerativeClustering and FeatureAgglomeration #9875

Merged
merged 23 commits into from Oct 24, 2017

Conversation

Projects
None yet
4 participants
@thechargedneutron
Contributor

thechargedneutron commented Oct 5, 2017

Reference Issue

Fixes #9846

What does this implement/fix? Explain your changes.

Improves the class design of AgglomerativeClustering and FeatureAgglomeration

Any other comments?

Changes will be made if not green.

@thechargedneutron thechargedneutron changed the title from [WIP] Improve class design for AgglomerativeClustering and FeatureAgglomeration to [WIP] Improves class design for AgglomerativeClustering and FeatureAgglomeration Oct 5, 2017

thechargedneutron added some commits Oct 5, 2017

@thechargedneutron

This comment has been minimized.

Show comment
Hide comment
@thechargedneutron

thechargedneutron Oct 5, 2017

Contributor

Need help in what needs to be done to pass the tests in test_non_meta_estimators.

Contributor

thechargedneutron commented Oct 5, 2017

Need help in what needs to be done to pass the tests in test_non_meta_estimators.

@agramfort

This comment has been minimized.

Show comment
Hide comment
@agramfort

agramfort Oct 6, 2017

Member

"Improves the class design" is what sense? what is the design issue you identified?

Member

agramfort commented Oct 6, 2017

"Improves the class design" is what sense? what is the design issue you identified?

@thechargedneutron

This comment has been minimized.

Show comment
Hide comment
@thechargedneutron

thechargedneutron Oct 13, 2017

Contributor

I am not sure if I am on right track but I tried using a _BaseClass, from which the two classes Agglomerative Clustering and Feature Extraction are derived and having pooling_func only in FeatureAgglomeration. I am not sure.

Contributor

thechargedneutron commented Oct 13, 2017

I am not sure if I am on right track but I tried using a _BaseClass, from which the two classes Agglomerative Clustering and Feature Extraction are derived and having pooling_func only in FeatureAgglomeration. I am not sure.

@thechargedneutron

This comment has been minimized.

Show comment
Hide comment
@thechargedneutron

thechargedneutron Oct 17, 2017

Contributor

@lesteve I guess this is what you were trying to convey. I have made a base class which is used by both AgglomerativeClustering and FeatureAgglomeration. The later one uses pooling_func, not the former.

from sklearn import metrics
from sklearn.datasets.samples_generator import make_blobs
from sklearn.cluster import AgglomerativeClustering


centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=300, centers=centers, cluster_std=0.5,
                            random_state=0)

model = AgglomerativeClustering(linkage='complete',
                                connectivity=None,
                                affinity = 'cosine',
                                pooling_func = "test_error",
                                n_clusters=3)
model.fit(X)

The above patch now raises an error:

TypeError: __init__() got an unexpected keyword argument 'pooling_func'
Contributor

thechargedneutron commented Oct 17, 2017

@lesteve I guess this is what you were trying to convey. I have made a base class which is used by both AgglomerativeClustering and FeatureAgglomeration. The later one uses pooling_func, not the former.

from sklearn import metrics
from sklearn.datasets.samples_generator import make_blobs
from sklearn.cluster import AgglomerativeClustering


centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=300, centers=centers, cluster_std=0.5,
                            random_state=0)

model = AgglomerativeClustering(linkage='complete',
                                connectivity=None,
                                affinity = 'cosine',
                                pooling_func = "test_error",
                                n_clusters=3)
model.fit(X)

The above patch now raises an error:

TypeError: __init__() got an unexpected keyword argument 'pooling_func'
@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Oct 17, 2017

Member

I think this is not quite right. You don't need a new base class. FeatureAgglomeration strictly extends from AgglomerativeClustering already, and that is fine. What is a problem is that pooling_func should not be available in AgglomerativeClustering only in descendants of AgglomerationTransform.

We want to remove pooling_func from AgglomerativeClustering and retain it in FeatureAgglomeration. But we cannot merely remove it from AgglomerativeClustering as this may, in an awkward case, break code. So we should deprecate it.

I see the current PR's implementation, however, as more wrong than right. FeatureAgglomeration should define a new init which stores pooling_func and calls AgglomerativeClustering.__init__ with the remaining params. It should also have its own docstring

Member

jnothman commented Oct 17, 2017

I think this is not quite right. You don't need a new base class. FeatureAgglomeration strictly extends from AgglomerativeClustering already, and that is fine. What is a problem is that pooling_func should not be available in AgglomerativeClustering only in descendants of AgglomerationTransform.

We want to remove pooling_func from AgglomerativeClustering and retain it in FeatureAgglomeration. But we cannot merely remove it from AgglomerativeClustering as this may, in an awkward case, break code. So we should deprecate it.

I see the current PR's implementation, however, as more wrong than right. FeatureAgglomeration should define a new init which stores pooling_func and calls AgglomerativeClustering.__init__ with the remaining params. It should also have its own docstring

@thechargedneutron

This comment has been minimized.

Show comment
Hide comment
@thechargedneutron

thechargedneutron Oct 17, 2017

Contributor

So, what I can understand from your concern is that, at present, I only need to add a deprecation warning which will raise a warning if a user tries to explicitly call pooling_func. The warning may be raised during fit (or init ??). The rest if fine for atleast two releases. Correct me if I am wrong.

Contributor

thechargedneutron commented Oct 17, 2017

So, what I can understand from your concern is that, at present, I only need to add a deprecation warning which will raise a warning if a user tries to explicitly call pooling_func. The warning may be raised during fit (or init ??). The rest if fine for atleast two releases. Correct me if I am wrong.

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Oct 17, 2017

Member
Member

jnothman commented Oct 17, 2017

@jnothman

Please mark the parameter as deprecated in the docstring.

Show outdated Hide outdated sklearn/cluster/hierarchical.py

thechargedneutron added some commits Oct 18, 2017

Show outdated Hide outdated sklearn/cluster/hierarchical.py

@thechargedneutron thechargedneutron changed the title from [WIP] Improves class design for AgglomerativeClustering and FeatureAgglomeration to [MRG] Improves class design for AgglomerativeClustering and FeatureAgglomeration Oct 18, 2017

@thechargedneutron

This comment has been minimized.

Show comment
Hide comment
@thechargedneutron

thechargedneutron Oct 19, 2017

Contributor

@jnothman Review please.

Contributor

thechargedneutron commented Oct 19, 2017

@jnothman Review please.

@jnothman jnothman changed the title from [MRG] Improves class design for AgglomerativeClustering and FeatureAgglomeration to [MRG+1] Improves class design for AgglomerativeClustering and FeatureAgglomeration Oct 19, 2017

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Oct 19, 2017

Member
Member

jnothman commented Oct 19, 2017

@lesteve

The main problem is that with your changes as it is FeatureAgglomeration with default parameter has a warning:

from sklearn import metrics
from sklearn.datasets.samples_generator import make_blobs
from sklearn.cluster import AgglomerativeClustering, FeatureAgglomeration


centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=300, centers=centers, cluster_std=0.5,
                            random_state=0)

model = FeatureAgglomeration()
model.fit(X)

Some other comments too.

Show outdated Hide outdated sklearn/cluster/hierarchical.py
Show outdated Hide outdated sklearn/cluster/hierarchical.py
Show outdated Hide outdated sklearn/cluster/hierarchical.py
@thechargedneutron

This comment has been minimized.

Show comment
Hide comment
@thechargedneutron

thechargedneutron Oct 21, 2017

Contributor

@lesteve Added the changes. I am still not able to figure out the warnings raised by codecov as pointed out by @jnothman . Also, there's a PEP8 failure "line too long". Here:

super(FeatureAgglomeration, self).__init__(n_clusters=n_clusters,
                                                   memory=memory,
                                                   connectivity=connectivity,
                                                   compute_full_tree=compute_full_tree,
                                                   linkage=linkage, affinity=affinity)

Can I modify it in any way?

Contributor

thechargedneutron commented Oct 21, 2017

@lesteve Added the changes. I am still not able to figure out the warnings raised by codecov as pointed out by @jnothman . Also, there's a PEP8 failure "line too long". Here:

super(FeatureAgglomeration, self).__init__(n_clusters=n_clusters,
                                                   memory=memory,
                                                   connectivity=connectivity,
                                                   compute_full_tree=compute_full_tree,
                                                   linkage=linkage, affinity=affinity)

Can I modify it in any way?

@thechargedneutron

This comment has been minimized.

Show comment
Hide comment
@thechargedneutron

thechargedneutron Oct 23, 2017

Contributor

Is this what you were suggesting? @jnothman

Contributor

thechargedneutron commented Oct 23, 2017

Is this what you were suggesting? @jnothman

@jnothman

Yes, that's alright.

lesteve added some commits Oct 24, 2017

@lesteve

This comment has been minimized.

Show comment
Hide comment
@lesteve

lesteve Oct 24, 2017

Member

I added a whats_new entry. I'll look at the generated doc and then merge if it looks fine.

Member

lesteve commented Oct 24, 2017

I added a whats_new entry. I'll look at the generated doc and then merge if it looks fine.

@lesteve

This comment has been minimized.

Show comment
Hide comment
@lesteve

lesteve Oct 24, 2017

Member

Merging, thanks a lot!

Member

lesteve commented Oct 24, 2017

Merging, thanks a lot!

@lesteve lesteve merged commit b4561f0 into scikit-learn:master Oct 24, 2017

6 checks passed

ci/circleci Your tests passed on CircleCI!
Details
codecov/patch 100% of diff hit (target 96.17%)
Details
codecov/project 96.17% (+<.01%) compared to 933f953
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
lgtm analysis: Python No alert changes
Details

@thechargedneutron thechargedneutron deleted the thechargedneutron:aggcluster branch Oct 25, 2017

donigian added a commit to donigian/scikit-learn that referenced this pull request Oct 28, 2017

Merge branch 'master' of github.com:scikit-learn/scikit-learn into do…
…cs/donigian-update-contribution-guidelines

* 'master' of github.com:scikit-learn/scikit-learn: (23 commits)
  fixes #10031: fix attribute name and shape in documentation (#10033)
  [MRG+1] add changelog entry for fixed and merged PR #10005 issue #9633 (#10025)
  [MRG] Fix LogisticRegression see also should include LogisticRegressionCV(#9995) (#10022)
  [MRG + 1] Labels of clustering should start at 0 or -1 if noise (#10015)
  MAINT Remove redundancy in #9552 (#9573)
  [MRG+1] correct comparison in GaussianNB for 'priors' (#10005)
  [MRG + 1] ENH add check_inverse in FunctionTransformer (#9399)
  [MRG] FIX bug in nested set_params usage (#9999)
  [MRG+1] Fix LOF and Isolation benchmarks (#9798)
  [MRG + 1] Fix negative inputs checking in mean_squared_log_error (#9968)
  DOC Fix typo (#9996)
  DOC Fix typo: x axis -> y axis (#9985)
  improve example plot_forest_iris.py (#9989)
  [MRG+1] Deprecate pooling_func unused parameter in AgglomerativeClustering (#9875)
  DOC update news
  DOC Fix three typos in manifold documentation (#9990)
  DOC add missing dot in docstring
  DOC Add what's new for 0.19.1 (#9983)
  Improve readability of outlier detection example. (#9973)
  DOC: Fixed typo (#9977)
  ...

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment