Skip to content

Conversation

@shivam6294
Copy link

@shivam6294 shivam6294 commented Mar 9, 2018

Added extendible dictionary to do the same for other generic numeric operations in module pandas.core.groupby.

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the pandas.core.groupby.GroupBy.max docstring"
  • The validation script passes: scripts/validate_docstrings.py pandas.core.groupby.GroupBy.max
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single pandas.core.groupby.GroupBy.max
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
################# Docstring (pandas.core.groupby.GroupBy.max)  #################
################################################################################

Compute the maximum of each group.

For multiple groupers, the result index will be a
:class:`~pandas.MultiIndex`.

Parameters
----------
kwargs : dict
    Optional keyword arguments to pass to `max`.

    `numeric_only` : bool
        Include only float, int, boolean columns.
    `min_count` : int
        The required number of valid values to perform the operation.
        If fewer than `min_count` non-NA values are present the result
        will be NA.

Returns
-------
Series or DataFrame

Examples
--------
>>> df = pd.DataFrame(
...     {'type': ['apple', 'apple', 'apple', 'orange', 'orange'],
...      'variety': ['gala', 'fuji', 'fuji', 'valencia', 'navel'],
...      'quantity': [2, 4, 8, 3, 1],
...      'price': [0.8, 1.25, 2.5, 1.25, 1.0],
...     },
...     columns=['type', 'variety', 'quantity', 'price']
... )
>>> df
     type   variety  quantity  price
0   apple      gala         2   0.80
1   apple      fuji         4   1.25
2   apple      fuji         8   2.50
3  orange  valencia         3   1.25
4  orange     navel         1   1.00

>>> g = df.groupby('type')
>>> g.max()
         variety  quantity  price
type
apple       gala         8   2.50
orange  valencia         3   1.25

By default, the `max` operation is performed on columns of all dtypes
(including the 'variety' columns which is of type `str`).

In order to only keep only the numeric columns ('quantity' and 'price'),
the `numeric_only` keyword argument can be used:

>>> g.max(numeric_only=True)
        quantity  price
type
apple          8   2.50
orange         3   1.25

Grouping by more than one column results in :class:`~pandas.DataFrame` with
a :class:`~pandas.MultiIndex`.

>>> g = df.groupby(['type', 'variety'])
>>> g.max()
                 quantity  price
type   variety
apple  fuji             8   2.50
       gala             2   0.80
orange navel            1   1.00
       valencia         3   1.25

See Also
    --------
    pandas.Series.max: compute max of values
    pandas.DataFrame.max: compute max of values
    pandas.Series.groupby: groupby method of Series
    pandas.DataFrame.groupby: groupby method of DataFrame
    pandas.Panel.groupby: goupby method of Panel

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.core.groupby.GroupBy.max" correct. :)

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

…endible dictionary to do the same for other generic numeric operations in module pandas.core.groupby.
"""

_numeric_operations_examples = dict(
max="""Examples
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewers: If you think this design pattern is okay, please let me know if I can add examples for the other generic numeric operators in this module (i.e. sum, prod, min, first, last) in the same PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to make this easier to edit, create the _numeric_operations_examples = {}
then each entry like

_numeric_operations['max'] = dedent(
.....
)

@jreback
Copy link
Contributor

jreback commented Mar 9, 2018

for groupby examples they should reference the Series (or DataFrame) method of the same name in See Also

@jorisvandenbossche @datapythonista generic point.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice docstring, thanks!

Added extendible dictionary to do the same for other generic numeric operations in module pandas.core.groupby.

That seems a good idea

If you think this design pattern is okay, please let me know if I can add examples for the other generic numeric operators in this module

I would leave that for a separate PR.

""")

_numeric_operations_doc_template = """
Compute %(f)s of group values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could maybe make a dict with "full" names (max -> the maximum, prod -> the product, ..), to give the first sentence a nicer read

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also wondering, would "of each group" be clearer than "of group values" ?

pandas.Series.%(name)s: groupby method of Series
pandas.DataFrame.%(name)s: groupby method of DataFrame
pandas.Panel.%(name)s: groupby method of Panel"""

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as @jreback mentioned, can you add here pandas.Series/DataFrame.max as well? (I think it will be using %(f)s)
Maybe put those first

Parameters
----------
kwargs : dict
Optional keyword arguments to pass to `%(f)s`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we would list the valid keywords here .. Looking at where min/max/sum etc are create, it's only either min_count or either numeric_only

@codecov
Copy link

codecov bot commented Mar 9, 2018

Codecov Report

Merging #20073 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20073      +/-   ##
==========================================
+ Coverage   91.79%    91.8%   +<.01%     
==========================================
  Files         152      152              
  Lines       49205    49208       +3     
==========================================
+ Hits        45169    45174       +5     
+ Misses       4036     4034       -2
Flag Coverage Δ
#multiple 90.18% <100%> (ø) ⬆️
#single 41.85% <100%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/groupby.py 92.15% <100%> (+0.01%) ⬆️
pandas/core/internals.py 95.53% <0%> (ø) ⬆️
pandas/util/testing.py 83.95% <0%> (+0.2%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 670c2e4...38059c9. Read the comment docs.

_numeric_operations_doc_template = """
Compute %(f)s of group values.
For multiple groupings, the result index will be a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

groupings -> groupers

"""

_numeric_operations_examples = dict(
max="""Examples
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to make this easier to edit, create the _numeric_operations_examples = {}
then each entry like

_numeric_operations['max'] = dedent(
.....
)

--------
Grouping by one column.
>>> df = pd.DataFrame({'A': 'a b b b'.split(), 'B': [1,2,2,3], 'C': [4,5,6,7]})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Conventions for the examples in the guidelines states:

For more complex examples (groupping for example), avoid using data without interpretation, like a matrix of random numbers with columns A, B, C, D… And instead use a meaningful example

Therefore, this examples should be improved.

@jorisvandenbossche
Copy link
Member

@shivam6294 do you have time to update the PR based on the feedback?

@shivam6294
Copy link
Author

Hey @jorisvandenbossche. I'm working on this right now

kwargs : dict
Optional keyword arguments to pass to `%(f)s`.
`numeric_only` : bool
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't find an example in the docs where kwargs were listed out, so I've used the same convention used for listing out normal parameters.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice updates! Thanks

Added few more comments

Optional keyword arguments to pass to `%(f)s`.
`numeric_only` : bool
Include only float, int, boolean columns.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be fine to actually list them as normal parameters (not inside the description of the 'kwargs').

I am only not sure if min_count is used for all. I think it is only used for sum and prod.
So not sure what the best way is here. Ideally this would also be substituted into the template, but that might get complicated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(given that this can get complicated, it is also fine to leave this for a separate issue/PR, and not solve it here directly)

)

_numeric_operations_see_also = dedent(
"""See Also
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to have the "See also" on the next line, otherwise dedent does not work.
But if this gives a blank line too much, I think you can do:

    """\
    See Also
    ...

pandas.DataFrame.%(f)s: compute %(f)s of values
pandas.Series.%(name)s: groupby method of Series
pandas.DataFrame.%(name)s: groupby method of DataFrame
pandas.Panel.%(name)s: groupby method of Panel"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can leave out Panel (it is deprecated)

@shivam6294
Copy link
Author

shivam6294 commented Mar 20, 2018

Thanks for the review @jorisvandenbossche 👍 I'll create a new issue for the complicated template substitution. I had a question about the Parameters - listing the parameters like this:

    Parameters
    ----------
    kwargs : dict
        For compatibility with other groupby methods.
    numeric_only : bool
        Include only float, int, boolean columns.
    min_count : int
        The required number of valid values to perform the operation.
        If fewer than `min_count` non-NA values are present the result
        will be NA.

results in errors when I run the scripts/validate_docstrings.py script:

Errors found:
        Errors in parameters section
                Unknown parameters {'numeric_only', 'min_count'}

Is this alright? Also, should I still leave kwargs in there even though all the kwargs have been listed?

@jreback
Copy link
Contributor

jreback commented Nov 1, 2018

closing as stale. if you'd like to continue pls ping.

@jreback jreback closed this Nov 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants