Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC Ensures that AgglomerativeClustering passes numpydoc validation #20544

Merged
merged 45 commits into from Aug 31, 2021

Conversation

reshamas
Copy link
Member

Reference Issues/PRs

Addresses #20308

What does this implement/fix? Explain your changes.

Fixes numpydoc errors on AgglomerativeClustering functions

Any other comments?

#DataUmbrella LATAM sprint

@reshamas
Copy link
Member Author

reshamas commented Jul 16, 2021

I had a couple of questions:

  1. I wasn't sure what to put for "See Also" section
  2. line 180 is:
    return_distance : bool, default=None
        If True, return the distance between the clusters.
  1. line 412 is:
    return_distance : bool, default=False
        whether or not to return the distances between the clusters.

For line 180: If it is of type bool, should that be True or False? Should the default be False, or does None imply something else?

sklearn/cluster/_agglomerative.py Outdated Show resolved Hide resolved
sklearn/cluster/_agglomerative.py Outdated Show resolved Hide resolved
sklearn/cluster/_agglomerative.py Outdated Show resolved Hide resolved
sklearn/cluster/_agglomerative.py Outdated Show resolved Hide resolved
sklearn/cluster/_agglomerative.py Outdated Show resolved Hide resolved
maint_tools/test_docstrings.py Show resolved Hide resolved
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just marking as "Request changes" to see that we review the PR.

@glemaitre
Copy link
Member

I resolved the conflict but I think that there is still a couple of failures associated with the validation.

@reshamas
Copy link
Member Author

I resolved the conflict but I think that there is still a couple of failures associated with the validation.

This is the error, and I'm not sure where to resolve it:

E           # Errors
E           
E            - GL08: The object does not have a docstring

@ogrisel
Copy link
Member

ogrisel commented Jul 30, 2021

The full error message is:

E           ValueError: 
E           
E           None
E           
E           FeatureAgglomeration.fit_predict
E           Parsing of the method signature failed, possibly because this is a property.
E           
E           Fit the hierarchical clustering from features, or distance matrix.
E           
E           Parameters
E           ----------
E           X : array-like of shape (n_samples, n_features) or                 (n_samples, n_samples)
E               Training instances to cluster, or distances between instances if
E               ``affinity='precomputed'``.
E           
E           y : Ignored
E               Not used, present here for API consistency by convention.
E           
E           Returns
E           -------
E           labels : ndarray of shape (n_samples,)
E               Cluster labels.
E           
E           # Errors
E           
E            - GL08: The object does not have a docstring

So the problem is related to FeatureAgglomeration.fit_predict but thie GL08 error message is not consistent because this method has a docstring and is just a plain Python method (without any complex decorator or so).

@glemaitre any idea?

@ogrisel
Copy link
Member

ogrisel commented Jul 30, 2021

The None at the beginning of the error message is fishy...

@@ -984,8 +1017,7 @@ def fit(self, X, y=None):
return self

def fit_predict(self, X, y=None):
"""Fit the hierarchical clustering from features or distance matrix,
and return cluster labels.
"""Fit the hierarchical clustering from features, or distance matrix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a short paragraph to make the difference method between the fit_predict and the fit methods explicit, for instance: "In addition to fitting, this method also return the result of the clustering assignment for each sample in the training set."

@reshamas
Copy link
Member Author

reshamas commented Aug 2, 2021

This is odd, right? Instead of (n_samples, n_samples), I should have (n_samples,) ?

(not that I think that is causing the error.)

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features) or \
                (n_samples, n_samples)
            Training instances to cluster, or distances between instances if
            ``affinity='precomputed'``.

@glemaitre
Copy link
Member

glemaitre commented Aug 2, 2021

This is odd, right? Instead of (n_samples, n_samples), I should have (n_samples,) ?

Nop. (n_samples, n_samples) is when we pass a distance matrix where we computed the distance between all samples and the parameter affinity="precomputed". It avoids recomputing this distance matrix internally.

@reshamas
Copy link
Member Author

reshamas commented Aug 2, 2021

This is odd, right? Instead of (n_samples, n_samples), I should have (n_samples,) ?

Nop. (n_samples, n_samples) is when we pass a distance matrix where we computed the distance between all samples and the parameter affinity="precomputed". It avoids recomputing this distance matrix internally.

Ok, it's been added back. Still not sure what is causing the error. ?


Returns
-------
self
self : object
Returns the transformer.
"""
X = self._validate_data(
X,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error comes from the undocumented property in line 1230:

@property
    def fit_predict(self):
        """Raise `AttributeError`."""
        raise AttributeError

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I pushed a couple of small changes. Let see if the CIs are happy.
I see that we should have a subsequent PR because FeatureAgglomeration should not accept a **params parameter at fit.

@glemaitre
Copy link
Member

The CI for the numpydoc validation passed. So I will merge it now.

@glemaitre glemaitre merged commit 75bdc29 into scikit-learn:main Aug 31, 2021
@glemaitre
Copy link
Member

Thanks @reshamas

samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants