DOC: update the pandas.core.groupby.GroupBy.max docstring #20073

shivam6294 · 2018-03-09T11:29:33Z

Added extendible dictionary to do the same for other generic numeric operations in module pandas.core.groupby.

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

PR title is "DOC: update the pandas.core.groupby.GroupBy.max docstring"
The validation script passes: scripts/validate_docstrings.py pandas.core.groupby.GroupBy.max
The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
The html version looks good: python doc/make.py --single pandas.core.groupby.GroupBy.max
It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
################# Docstring (pandas.core.groupby.GroupBy.max)  #################
################################################################################

Compute the maximum of each group.

For multiple groupers, the result index will be a
:class:`~pandas.MultiIndex`.

Parameters
----------
kwargs : dict
    Optional keyword arguments to pass to `max`.

    `numeric_only` : bool
        Include only float, int, boolean columns.
    `min_count` : int
        The required number of valid values to perform the operation.
        If fewer than `min_count` non-NA values are present the result
        will be NA.

Returns
-------
Series or DataFrame

Examples
--------
>>> df = pd.DataFrame(
...     {'type': ['apple', 'apple', 'apple', 'orange', 'orange'],
...      'variety': ['gala', 'fuji', 'fuji', 'valencia', 'navel'],
...      'quantity': [2, 4, 8, 3, 1],
...      'price': [0.8, 1.25, 2.5, 1.25, 1.0],
...     },
...     columns=['type', 'variety', 'quantity', 'price']
... )
>>> df
     type   variety  quantity  price
0   apple      gala         2   0.80
1   apple      fuji         4   1.25
2   apple      fuji         8   2.50
3  orange  valencia         3   1.25
4  orange     navel         1   1.00

>>> g = df.groupby('type')
>>> g.max()
         variety  quantity  price
type
apple       gala         8   2.50
orange  valencia         3   1.25

By default, the `max` operation is performed on columns of all dtypes
(including the 'variety' column, which is of type `str`).

To keep only the numeric columns ('quantity' and 'price'),
use the `numeric_only` keyword argument:

>>> g.max(numeric_only=True)
        quantity  price
type
apple          8   2.50
orange         3   1.25

Grouping by more than one column results in a :class:`~pandas.DataFrame`
with a :class:`~pandas.MultiIndex`.

>>> g = df.groupby(['type', 'variety'])
>>> g.max()
                 quantity  price
type   variety
apple  fuji             8   2.50
       gala             2   0.80
orange navel            1   1.00
       valencia         3   1.25

See Also
    --------
    pandas.Series.max: compute max of values
    pandas.DataFrame.max: compute max of values
    pandas.Series.groupby: groupby method of Series
    pandas.DataFrame.groupby: groupby method of DataFrame
    pandas.Panel.groupby: groupby method of Panel

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.core.groupby.GroupBy.max" correct. :)

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.


shivam6294 · 2018-03-09T11:33:49Z

pandas/core/groupby.py

+    """
+
+_numeric_operations_examples = dict(
+    max="""Examples


Reviewers: If you think this design pattern is okay, please let me know if I can add examples for the other generic numeric operators in this module (i.e. sum, prod, min, first, last) in the same PR


jreback · 2018-03-09T11:37:30Z

for groupby examples they should reference the Series (or DataFrame) method of the same name in See Also

@jorisvandenbossche @datapythonista generic point.

jorisvandenbossche

Nice docstring, thanks!

Added extendible dictionary to do the same for other generic numeric operations in module pandas.core.groupby.

That seems a good idea

If you think this design pattern is okay, please let me know if I can add examples for the other generic numeric operators in this module

I would leave that for a separate PR.

jorisvandenbossche · 2018-03-09T12:30:48Z

pandas/core/groupby.py

    """)

+_numeric_operations_doc_template = """
+    Compute %(f)s of group values.


We could maybe make a dict with "full" names (max -> the maximum, prod -> the product, ..), to give the first sentence a nicer read

Also wondering, would "of each group" be clearer than "of group values" ?
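A minimal sketch of how such a mapping could feed the template, using the "of each group" wording proposed above. The dictionary `_full_names` and the helper `summary_line` are hypothetical names introduced for illustration; only the `%(f)s` placeholder and `_numeric_operations_doc_template` come from the diff.

```python
# Hypothetical mapping from method name to a readable phrase.
_full_names = {
    'max': 'the maximum',
    'min': 'the minimum',
    'sum': 'the sum',
    'prod': 'the product',
}

# Shortened template; %(f)s is the placeholder name used in the diff.
_numeric_operations_doc_template = "Compute %(f)s of each group."


def summary_line(name):
    """Return the first docstring sentence for a groupby operation."""
    return _numeric_operations_doc_template % {'f': _full_names[name]}


print(summary_line('max'))
print(summary_line('prod'))
```

With this approach the method name (`max`) can still be substituted separately wherever the literal identifier is needed, for instance in the See Also entries.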

jorisvandenbossche · 2018-03-09T12:34:23Z

pandas/core/groupby.py

+    pandas.Series.%(name)s: groupby method of Series
+    pandas.DataFrame.%(name)s: groupby method of DataFrame
+    pandas.Panel.%(name)s: groupby method of Panel"""
+


as @jreback mentioned, can you add here pandas.Series/DataFrame.max as well? (I think it will be using %(f)s)
Maybe put those first

jorisvandenbossche · 2018-03-09T12:39:09Z

pandas/core/groupby.py

+    Parameters
+    ----------
+    kwargs : dict
+        Optional keyword arguments to pass to `%(f)s`.


Ideally we would list the valid keywords here. Looking at where min/max/sum etc. are created, the only keywords are min_count and numeric_only.

codecov · 2018-03-09T18:05:37Z

Codecov Report

Merging #20073 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #20073      +/-   ##
==========================================
+ Coverage   91.79%    91.8%   +<.01%     
==========================================
  Files         152      152              
  Lines       49205    49208       +3     
==========================================
+ Hits        45169    45174       +5     
+ Misses       4036     4034       -2

Flag	Coverage Δ
#multiple	`90.18% <100%> (ø)`	⬆️
#single	`41.85% <100%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/groupby.py	`92.15% <100%> (+0.01%)`	⬆️
pandas/core/internals.py	`95.53% <0%> (ø)`	⬆️
pandas/util/testing.py	`83.95% <0%> (+0.2%)`	⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 670c2e4...38059c9.

jreback · 2018-03-10T12:49:05Z

pandas/core/groupby.py

+_numeric_operations_doc_template = """
+    Compute %(f)s of group values.
+
+    For multiple groupings, the result index will be a


groupings -> groupers

jreback · 2018-03-10T12:50:23Z

pandas/core/groupby.py

+    """
+
+_numeric_operations_examples = dict(
+    max="""Examples


To make this easier to edit, create _numeric_operations_examples = {}
and then add each entry like

_numeric_operations_examples['max'] = dedent( ..... )

cmdelatorre · 2018-03-10T18:50:10Z

pandas/core/groupby.py

+    --------
+    Grouping by one column.
+
+    >>> df = pd.DataFrame({'A': 'a b b b'.split(), 'B': [1,2,2,3], 'C': [4,5,6,7]})


The conventions for the examples in the guidelines state:

For more complex examples (grouping, for example), avoid using data without interpretation, like a matrix of random numbers with columns A, B, C, D… And instead use a meaningful example

Therefore, these examples should be improved.

jorisvandenbossche · 2018-03-15T14:16:11Z

@shivam6294 do you have time to update the PR based on the feedback?

shivam6294 · 2018-03-18T10:45:44Z

Hey @jorisvandenbossche. I'm working on this right now


shivam6294 · 2018-03-18T11:39:06Z

pandas/core/groupby.py

    kwargs : dict
        Optional keyword arguments to pass to `%(f)s`.

+        `numeric_only` : bool


Couldn't find an example in the docs where kwargs were listed out, so I've used the same convention used for listing out normal parameters.

jorisvandenbossche

Nice updates! Thanks

Added a few more comments

jorisvandenbossche · 2018-03-19T20:48:18Z

pandas/core/groupby.py

+        Optional keyword arguments to pass to `%(f)s`.
+
+        `numeric_only` : bool
+            Include only float, int, boolean columns.


I think it would be fine to actually list them as normal parameters (not inside the description of the 'kwargs').

I am only not sure if min_count is used for all. I think it is only used for sum and prod.
So not sure what the best way is here. Ideally this would also be substituted into the template, but that might get complicated.

(given that this can get complicated, it is also fine to leave this for a separate issue/PR, and not solve it here directly)

jorisvandenbossche · 2018-03-19T20:50:29Z

pandas/core/groupby.py

+)
+
+_numeric_operations_see_also = dedent(
+    """See Also


You need to have the "See Also" on the next line, otherwise dedent does not work.
But if this gives one blank line too many, I think you can do:

    """\
    See Also
    ...
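The behaviour described here can be checked with a small standalone sketch. The See Also text is shortened for illustration; `textwrap.dedent` only strips whitespace that is common to every non-blank line, so an unindented first line prevents any dedenting:

```python
from textwrap import dedent

# "See Also" sits on the opening line with no indentation, so the common
# leading whitespace is empty and dedent leaves the text unchanged.
not_dedented = dedent(
    """See Also
    --------
    pandas.Series.max : compute max of values""")

# The backslash suppresses the newline after the opening quotes, so every
# line shares the same indentation and no leading blank line is added.
dedented = dedent(
    """\
    See Also
    --------
    pandas.Series.max : compute max of values""")

print(not_dedented)
print(dedented)
```

The first variant reproduces the layout seen in the validation output above, where the underline and entries stay indented beneath "See Also".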

jorisvandenbossche · 2018-03-19T20:50:42Z

pandas/core/groupby.py

+    pandas.DataFrame.%(f)s: compute %(f)s of values
+    pandas.Series.%(name)s: groupby method of Series
+    pandas.DataFrame.%(name)s: groupby method of DataFrame
+    pandas.Panel.%(name)s: groupby method of Panel"""


you can leave out Panel (it is deprecated)

shivam6294 · 2018-03-20T16:04:13Z

Thanks for the review @jorisvandenbossche 👍 I'll create a new issue for the complicated template substitution. I had a question about the Parameters - listing the parameters like this:

    Parameters
    ----------
    kwargs : dict
        For compatibility with other groupby methods.
    numeric_only : bool
        Include only float, int, boolean columns.
    min_count : int
        The required number of valid values to perform the operation.
        If fewer than `min_count` non-NA values are present the result
        will be NA.

results in errors when I run the scripts/validate_docstrings.py script:

Errors found:
        Errors in parameters section
                Unknown parameters {'numeric_only', 'min_count'}

Is this alright? Also, should I still leave kwargs in there even though all the kwargs have been listed?
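One possible reading of this error, sketched under the assumption that the validation script compares the names documented in the Parameters section with the names in the function signature (the actual check in scripts/validate_docstrings.py may differ). A method that only accepts `**kwargs` exposes no parameter called `numeric_only` or `min_count`. The function `group_max` and the helper `unknown_parameters` are hypothetical stand-ins:

```python
import inspect


def group_max(self, **kwargs):
    """Stand-in for a groupby method that only accepts **kwargs."""


def unknown_parameters(func, documented):
    """Return documented names that are absent from func's signature."""
    in_signature = set(inspect.signature(func).parameters) - {'self'}
    return set(documented) - in_signature


documented = ['kwargs', 'numeric_only', 'min_count']
print(unknown_parameters(group_max, documented))
```

Under that assumption, the keywords would stop being reported as unknown only if they appeared explicitly in the signature, which ties in with the earlier remark that substituting them into the template could get complicated.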

jreback · 2018-11-01T01:34:39Z

closing as stale. if you'd like to continue pls ping.

Commit bf9d984: DOC: Improved docstring of pandas.core.groupby.GroupBy.max. Added extendible dictionary to do the same for other generic numeric operations in module pandas.core.groupby.

jreback added Docs Groupby labels Mar 9, 2018

jorisvandenbossche reviewed Mar 9, 2018

jreback requested changes Mar 10, 2018

cmdelatorre suggested changes Mar 10, 2018

shivam6294 added 2 commits March 18, 2018 19:19

Commit bac69c6: Merge remote-tracking branch 'upstream/master' into docstring_groupby_max

Commit 6a775e7: Post code review changes. Improved examples, detailed kwargs, made description more readable. Used dedent where applicable.

Commit 38059c9: Minor change in examples section.

jorisvandenbossche reviewed Mar 19, 2018

jreback closed this Nov 1, 2018
