Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: update the pandas.DataFrame.cummax docstring #20336

Merged
merged 23 commits into from
Mar 17, 2018

Conversation

arminv
Copy link
Contributor

@arminv arminv commented Mar 13, 2018

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
##################### Docstring (pandas.DataFrame.cummax)  #####################
################################################################################

Return cumulative maximum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative
maximum.

Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
    The index or the name of the axis. 0 is equivalent to None or 'index'.
skipna : boolean, default True
    Exclude NA/null values. If an entire row/column is NA, the result
    will be NA.
*args, **kwargs :
    Additional keywords have no effect but might be accepted for
    compatibility with NumPy.

Returns
-------
cummax : Series or DataFrame

Examples
--------
**Series**

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cummax()
0    2.0
1    NaN
2    5.0
3    5.0
4    5.0
dtype: float64

To include NA values in the operation, use ``skipna=False``

>>> s.cummax(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

**DataFrame**

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                    columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the maximum
in each column. This is equivalent to ``axis=None`` or ``axis='index'``.

>>> df.cummax()
     A    B
0  2.0  1.0
1  3.0  NaN
2  3.0  1.0

To iterate over columns and find the maximum in each row,
use ``axis=1``

>>> df.cummax(axis=1)
     A    B
0  2.0  2.0
1  3.0  NaN
2  1.0  1.0

See also
--------
pandas.core.window.Expanding.max : Similar functionality
    but ignores ``NaN`` values.
DataFrame.max : Return the maximum over
    DataFrame axis.
DataFrame.cummax : Return cumulative maximum over DataFrame axis.
DataFrame.cummin : Return cumulative minimum over DataFrame axis.
DataFrame.cumsum : Return cumulative sum over DataFrame axis.
DataFrame.cumprod : Return cumulative product over DataFrame axis.

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Errors in parameters section
		Parameters {'kwargs', 'args'} not documented
		Unknown parameters {'*args, **kwargs :'}
		Parameter "*args, **kwargs :" has no type


################################################################################
##################### Docstring (pandas.DataFrame.cummin)  #####################
################################################################################

Return cumulative minimum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative
minimum.

Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
    The index or the name of the axis. 0 is equivalent to None or 'index'.
skipna : boolean, default True
    Exclude NA/null values. If an entire row/column is NA, the result
    will be NA.
*args, **kwargs :
    Additional keywords have no effect but might be accepted for
    compatibility with NumPy.

Returns
-------
cummin : Series or DataFrame

Examples
--------
**Series**

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cummin()
0    2.0
1    NaN
2    2.0
3   -1.0
4   -1.0
dtype: float64

To include NA values in the operation, use ``skipna=False``

>>> s.cummin(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

**DataFrame**

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                    columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the minimum
in each column. This is equivalent to ``axis=None`` or ``axis='index'``.

>>> df.cummin()
     A    B
0  2.0  1.0
1  2.0  NaN
2  1.0  0.0

To iterate over columns and find the minimum in each row,
use ``axis=1``

>>> df.cummin(axis=1)
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

See also
--------
pandas.core.window.Expanding.min : Similar functionality
    but ignores ``NaN`` values.
DataFrame.min : Return the minimum over
    DataFrame axis.
DataFrame.cummax : Return cumulative maximum over DataFrame axis.
DataFrame.cummin : Return cumulative minimum over DataFrame axis.
DataFrame.cumsum : Return cumulative sum over DataFrame axis.
DataFrame.cumprod : Return cumulative product over DataFrame axis.

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Errors in parameters section
		Parameters {'args', 'kwargs'} not documented
		Unknown parameters {'*args, **kwargs :'}
		Parameter "*args, **kwargs :" has no type


################################################################################
##################### Docstring (pandas.DataFrame.cumsum)  #####################
################################################################################

Return cumulative sum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative
sum.

Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
    The index or the name of the axis. 0 is equivalent to None or 'index'.
skipna : boolean, default True
    Exclude NA/null values. If an entire row/column is NA, the result
    will be NA.
*args, **kwargs :
    Additional keywords have no effect but might be accepted for
    compatibility with NumPy.

Returns
-------
cumsum : Series or DataFrame

Examples
--------
**Series**

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cumsum()
0    2.0
1    NaN
2    7.0
3    6.0
4    6.0
dtype: float64

To include NA values in the operation, use ``skipna=False``

>>> s.cumsum(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

**DataFrame**

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                    columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the sum
in each column. This is equivalent to ``axis=None`` or ``axis='index'``.

>>> df.cumsum()
     A    B
0  2.0  1.0
1  5.0  NaN
2  6.0  1.0

To iterate over columns and find the sum in each row,
use ``axis=1``

>>> df.cumsum(axis=1)
     A    B
0  2.0  3.0
1  3.0  NaN
2  1.0  1.0

See also
--------
pandas.core.window.Expanding.sum : Similar functionality
    but ignores ``NaN`` values.
DataFrame.sum : Return the sum over
    DataFrame axis.
DataFrame.cummax : Return cumulative maximum over DataFrame axis.
DataFrame.cummin : Return cumulative minimum over DataFrame axis.
DataFrame.cumsum : Return cumulative sum over DataFrame axis.
DataFrame.cumprod : Return cumulative product over DataFrame axis.

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Errors in parameters section
		Parameters {'args', 'kwargs'} not documented
		Unknown parameters {'*args, **kwargs :'}
		Parameter "*args, **kwargs :" has no type

################################################################################
##################### Docstring (pandas.DataFrame.cumprod) #####################
################################################################################

Return cumulative product over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative
product.

Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
    The index or the name of the axis. 0 is equivalent to None or 'index'.
skipna : boolean, default True
    Exclude NA/null values. If an entire row/column is NA, the result
    will be NA.
*args, **kwargs :
    Additional keywords have no effect but might be accepted for
    compatibility with NumPy.

Returns
-------
cumprod : Series or DataFrame

Examples
--------
**Series**

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cumprod()
0     2.0
1     NaN
2    10.0
3   -10.0
4    -0.0
dtype: float64

To include NA values in the operation, use ``skipna=False``

>>> s.cumprod(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

**DataFrame**

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                    columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the product
in each column. This is equivalent to ``axis=None`` or ``axis='index'``.

>>> df.cumprod()
     A    B
0  2.0  1.0
1  6.0  NaN
2  6.0  0.0

To iterate over columns and find the product in each row,
use ``axis=1``

>>> df.cumprod(axis=1)
     A    B
0  2.0  2.0
1  3.0  NaN
2  1.0  0.0

See also
--------
pandas.core.window.Expanding.prod : Similar functionality
    but ignores ``NaN`` values.
DataFrame.prod : Return the product over
    DataFrame axis.
DataFrame.cummax : Return cumulative maximum over DataFrame axis.
DataFrame.cummin : Return cumulative minimum over DataFrame axis.
DataFrame.cumsum : Return cumulative sum over DataFrame axis.
DataFrame.cumprod : Return cumulative product over DataFrame axis.

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Errors in parameters section
		Parameters {'args', 'kwargs'} not documented
		Unknown parameters {'*args, **kwargs :'}
		Parameter "*args, **kwargs :" has no type

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

The *args/**kwargs error is a bug in the parameter evaluation

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can do better with the examples, but good work.

will be NA
will be NA.
*args : any, default None
**kwargs : any, default None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you have *args, **kwargs without the type

pandas.DataFrame.cummax : Return cumulative maximum over DataFrame axis.
pandas.DataFrame.cummin : Return cumulative minimum over DataFrame axis.
pandas.DataFrame.cumsum : Return cumulative sum over DataFrame axis.
pandas.DataFrame.cumprod : Return cumulative product over DataFrame axis.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you get rid of the pandas. prefix

@@ -8327,24 +8327,95 @@ def _doc_parms(cls):
"""

_cnum_doc = """
Return %(desc)s over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the %(desc)s.

Parameters
----------
axis : %(axis_descr)s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure were axis_descr is defined, but the format it {0 or 'index', 1 or 'columns'} if I'm not wrong. You can check recent merge PRs to be sure.

--------
**DataFrame**

Create a DataFrame:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably can get rid of this, I'm sure users will find out :)

0 9 7 9 7
1 9 7 9 7
2 9 7 9 7
3 9 7 9 7
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this docstring is reused, and we want to keep it this way, I think we should add examples for all methods.

... [7, 5, 2, 7],
... [3, 5, 2, 2],
... [8, 0, 9, 0]],
... columns=list('ABCD'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use a much smaller dataset, probably 2 columns and 3 or 4 rows. Smaller numbers would make it easier for users to see what's being added, specially if we reuse same dataframe for cumsum...

Also, you can add a NaN, so you can show an example with skipna.

@codecov
Copy link

codecov bot commented Mar 14, 2018

Codecov Report

Merging #20336 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20336      +/-   ##
==========================================
+ Coverage    91.8%    91.8%   +<.01%     
==========================================
  Files         152      152              
  Lines       49201    49205       +4     
==========================================
+ Hits        45167    45171       +4     
  Misses       4034     4034
Flag Coverage Δ
#multiple 90.18% <100%> (ø) ⬆️
#single 41.85% <100%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/generic.py 95.85% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ad50b1d...1147a0d. Read the comment docs.

@arminv
Copy link
Contributor Author

arminv commented Mar 14, 2018

@datapythonista Thank you for the helpful comments :) Please let me know if I need to change anything else.

@jreback jreback added Docs Numeric Operations Arithmetic, Comparison, and Logical operations labels Mar 14, 2018
Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking better. Just one comment I think can be useful... When you do more changes, generate the html version of one of the pages, and imagine that you're a user that never used these methods and wants to know what are they about. Is the documentation useful? May be too long? May be a bit confusing having all the methods shown together? Or does it help?

If as a user you think this documentation is really helpful and presents things in the most efficient and clear way, the PR should be mostly all right. The point is that, it's usually a bad practice to follow comments in the review blindly, feel free to disagree, and make sure you're happy with your changes and the final result.


Parameters
----------
axis : %(axis_descr)s
axis : {0 or 'index', 1 or 'columns'}, default 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's not technically right that default is 0, I think it's None, which I guess it's equivalent to 0.

Can you double check, and and change it if that's right. Something like {0 or 'index', 1 or 'columns'} or None, default None would probably be the most standard way if that's right. And a description about the axis would be useful (pointing out that None means index if that's the case).

If you check recent PRs there are some with a an axis parameter that you can check for reference.

Copy link
Contributor Author

@arminv arminv Mar 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is right, cum_func (i.e. function corresponding to all cumulative methods) is defined with axis=None as default argument.

I also found this regarding the correct format of axis parameter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it is technically None, in practice it is 0 for Series/DataFrame, so I would keep the documentation like this.
The technical reason is because for Panel it is 1, but Panel is deprecated and I think we should not care about them in the documentation.

will be NA
will be NA.
*args : default None
**kwargs : default None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As *args and **kwargs are used in the standard Python way, no type or default value is needed, the user will understand. They can share a single line, simply:
*args, **kwargs

**axis**

axis=None : Iterates over rows and finds the cumulative value in each column.
If value is different from the previous one, updates it:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd personally simply say something like "By default, cumulative functions work on the index axis, meaning that row each row, they accumulate the values from the previous".

The second comment is only right for cummax. Now that you've got the other examples, you probably want to get rid of it.

A B
0 7 1
1 21 4
2 168 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

axis=None is the default, I'd simply use df.cummax()... and at in the explanation before that this is equivalent to axis=0 and axis='index'


>>> df = pd.DataFrame([[7, 1],
... [3, 4],
... [8, 0]],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use smaller values. When illustrating cumprod it should be fast and easy for the user to understand that 2 * 3 = 6 and 6 * 1 = 6, while to know 7 * 3 * 8 they'll probably need a calculator to check if they understood it right.

Also, if you use a nan in one of the columns, you can reuse this example for the next section.

2 NaN NaN

**Series**

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with all the functions this is getting very long, I'd probably avoid having examples for Series, Or may be just one.

If you keep them, personally I'd have Series first, and start the example from the simplest to the more complext

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to have separate string examples for each method, we can keep the examples for Series.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to have separate string examples for each method, we can keep the examples for Series.

+ 1

Another suggestion would be to start with Series to just illustrate the concept of "cumulative max", as this will make the examples a little bit easier, and show the effect of NaNs. And then show DataFrame, saying that by default the same happens for each column of the DataFrame, and optionally use axis=1 to take cumulative max for each row.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. I agree, it is definitely easier to see what is going on with NaNs if we use a Series example instead of DataFrame. I will change it this way.

A B
0 7.0 NaN
1 NaN 4.0
2 56.0 0.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skipna=True is the default. So, if you add the NaN to the initial example, you can use df.cummax()... to illustrate skipna=True and go directly to show how it changes with skipna=False.

_cnum_doc)
axis_descr=axis_descr, accum_func_name=accum_func_name,
examples=examples)
@Appender(_cnum_doc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's all right like this, but may be it'd be simpler to leave this as it was, and have the examples in _cnum_doc, instead of in a separate variable. As they're the same for all methods, there is not much value in having them separate.

Another option would be to have a different string for each method example, in that case, something similar to this would make more sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having separate string examples for each method makes everything clearer, especially when showing examples for use of skipna & axis. It also helps with keeping the docstring concise. For instance, now we can have a Series example for each method.

The disadvantage is user will only see examples for the method they’re checking, but I think this is ok because we are referencing all methods in the ‘See also’ section, which comes before 'Examples'.

In these PRs #20216 and #20217 examples for DataFrame.all and DataFrame.any are separate even though they are similar methods.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am also in favor of splitting up the examples.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are currently only adding the example to the cummax method, but your example section has examples for both cummin/cumsum/cumprod/cummax.
So we need to make a choice here.

I would personally only include in the docstring of cummax examples of cummax. But no need to throw them away, what you could do is split your existing _cummax_examples in multiple strings, with the examples separated for each method.


Parameters
----------
axis : %(axis_descr)s
axis : {0 or 'index', 1 or 'columns'}, default 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it is technically None, in practice it is 0 for Series/DataFrame, so I would keep the documentation like this.
The technical reason is because for Panel it is 1, but Panel is deprecated and I think we should not care about them in the documentation.

See also
--------
pandas.core.window.Expanding.%(accum_func_name)s : Similar functionality
core.window.Expanding.%(accum_func_name)s : Similar functionality
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry to go back and forth, but for this one we want to keep the pandas. (but for the others below the change is perfect!)

2 NaN NaN

**Series**

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to have separate string examples for each method, we can keep the examples for Series.

+ 1

Another suggestion would be to start with Series to just illustrate the concept of "cumulative max", as this will make the examples a little bit easier, and show the effect of NaNs. And then show DataFrame, saying that by default the same happens for each column of the DataFrame, and optionally use axis=1 to take cumulative max for each row.

_cnum_doc)
axis_descr=axis_descr, accum_func_name=accum_func_name,
examples=examples)
@Appender(_cnum_doc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am also in favor of splitting up the examples.

but ignores ``NaN`` values.
Series.%(outname)s : Return %(desc)s over Series axis.
DataFrame.cummax : Return cumulative maximum over DataFrame axis.
Copy link
Contributor Author

@arminv arminv Mar 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@datapythonista @jorisvandenbossche Is it a good idea to also add: DataFrame.sum as a relevant method in the 'See also' section?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that might be a good idea, if you can automatically make it link in DataFrame.cumsum to DataFrame.sum, in DataFrame.cummin to DataFrame.min, etc (using the templating)

@pep8speaks
Copy link

pep8speaks commented Mar 16, 2018

Hello @arminv! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on March 17, 2018 at 16:19 Hours UTC

--------
**Series**

>>> s = pd.Series([2,np.nan,5,-1,0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP8, spaces after ,

4 0.0
dtype: float64

skipna=True : Default value, ignores NaN values during operation:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we mostly use prose for the intermittent text in examples. Perhaps, say

By default, NA values are ignored.

>>> s.cummin()
...

To include NA values in the operation, use ``skipna=False``

>>> s.cummin(skipna=False) 

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, what you have is great and if the others are OK with it I'm +1 as well. Maybe wait to hear feedback from @datapythonista and @jorisvandenbossche before updating.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think indeed your prose alternative reads a bit nicer.

1 3.0 NaN
2 1.0 0.0

skipna : Works in the same way as for Series.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can probably be removed.


skipna : Works in the same way as for Series.

axis=0 : Default value, equivalent to axis=None or axis='index'.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about the prose.

See also
--------
pandas.core.window.Expanding.%(accum_func_name)s : Similar functionality
but ignores ``NaN`` values.
Series.%(outname)s : Return %(desc)s over Series axis.
DataFrame.%(accum_func_name)s : Return the %(accum_func_name)s over
DataFrame axis.
Copy link
Contributor Author

@arminv arminv Mar 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche I used templating and %(accum_func_name)s to add DataFrame.prod to 'See also' section of DataFrame.cumprod, etc. All methods look good except DataFrame.prod where it reads: 'Return the prod over DataFrame axis' instead of 'Return the product over...'. Any suggestions on this? Should we even worry about this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One relatively easy solution would be to change the desc param to _make_cum_function from "cumulative produce" to "product", because we can put the "cumulative" (which is the same for each of the 4 functions" in the template itself. Then you can use that name in the see also description.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, you are doing really great work here!

skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA
will be NA.
*args, **kwargs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add here an explanation for args/kwargs: "Additional arguments have no effect but might be accepted for compatibility with NumPy."

See also
--------
pandas.core.window.Expanding.%(accum_func_name)s : Similar functionality
but ignores ``NaN`` values.
Series.%(outname)s : Return %(desc)s over Series axis.
DataFrame.%(accum_func_name)s : Return the %(accum_func_name)s over
DataFrame axis.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One relatively easy solution would be to change the desc param to _make_cum_function from "cumulative produce" to "product", because we can put the "cumulative" (which is the same for each of the 4 functions" in the template itself. Then you can use that name in the see also description.

4 0.0
dtype: float64

skipna=True : Default value, ignores NaN values during operation:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think indeed your prose alternative reads a bit nicer.

@arminv
Copy link
Contributor Author

arminv commented Mar 17, 2018

I also checked HTML rendering of the corresponding Series cumulative methods (pandas.Series.cummax ,etc.) and they appear to be ok. There is a duplication in their 'See also' section but again I couldn't think of a simple way to fix this through templating.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!
Small comment about the "see also"

See also
--------
pandas.core.window.Expanding.%(accum_func_name)s : Similar functionality
but ignores ``NaN`` values.
Series.%(outname)s : Return cumulative %(desc)s over Series axis.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one can be left out, as it will already be included in one of the 4 last ones.

Series.%(outname)s : Return cumulative %(desc)s over Series axis.
%(name2)s.%(accum_func_name)s : Return the %(desc)s over
%(name2)s axis.
DataFrame.cummax : Return cumulative maximum over DataFrame axis.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also %(name2)s like below ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I implement these changes, we won't get a reference to any corresponding Series in 'See also' of DataFrame methods and vice versa. That's why I left them like this (at the expense of repeating one of them).

I agree that we don't really need them because we have examples of both Series & DataFrame in all docstrings and this should be informative enough.

Just out of curiosity, is there a simple way to determine whether a Series or DataFrame method is being accessed at any one time? For example, if we were generating the doc for Series.cummax, we would determine it is a Series method. Based on this, we know that we need to show cummax but for a DataFrame.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is that the docstring for both Series and DataFrame (of the same method) will be exactly the same (apart from some of the links here in see also). So linking to the same method but on the other object, is not that important I think, as the other one does not give you more information

Just out of curiosity, is there a simple way to determine whether a Series or DataFrame method is being accessed at any one time? For example, if we were generating the doc for Series.cummax, we would determine it is a Series method. Based on this, we know that we need to show cummax but for a DataFrame.

Yes, the cls that is passed to _make_cum_function is either the Series or DataFrame class. So you can pass cls.__name__ as a variable to the template, but I think this is already done as name2 ? Based on that you can know the other.

@jorisvandenbossche jorisvandenbossche merged commit 699a48b into pandas-dev:master Mar 17, 2018
@jorisvandenbossche
Copy link
Member

@arminv Thanks a lot for this PR!

@arminv
Copy link
Contributor Author

arminv commented Mar 17, 2018

Thank you guys for being so helpful. I look forward to contributing again.

@arminv arminv deleted the docstring_cummax branch March 17, 2018 19:17
nehiljain added a commit to nehiljain/pandas that referenced this pull request Mar 21, 2018
…ame_describe

* upstream/master: (158 commits)
  Add link to "Craft Minimal Bug Report" blogpost (pandas-dev#20431)
  BUG: fixed json_normalize for subrecords with NoneTypes (pandas-dev#20030) (pandas-dev#20399)
  BUG: ExtensionArray.fillna for scalar values (pandas-dev#20412)
  DOC" update the Pandas core window rolling count docstring" (pandas-dev#20264)
  DOC: update the pandas.DataFrame.plot.hist docstring (pandas-dev#20155)
  DOC: Only use ~ in class links to hide prefixes. (pandas-dev#20402)
  Bug: Allow np.timedelta64 objects to index TimedeltaIndex (pandas-dev#20408)
  DOC: add disallowing of Series construction of len-1 list with index to whatsnew (pandas-dev#20392)
  MAINT: Remove weird pd file
  DOC: update the Index.isin docstring (pandas-dev#20249)
  BUG: Handle all-NA blocks in concat (pandas-dev#20382)
  DOC: update the pandas.core.resample.Resampler.fillna docstring (pandas-dev#20379)
  BUG: Don't raise exceptions splitting a blank string (pandas-dev#20067)
  DOC: update the pandas.DataFrame.cummax docstring (pandas-dev#20336)
  DOC: update the pandas.core.window.x.mean docstring (pandas-dev#20265)
  DOC: update the api.types.is_number docstring (pandas-dev#20196)
  Fix linter (pandas-dev#20389)
  DOC: Improved the docstring of pandas.Series.dt.to_pytimedelta (pandas-dev#20142)
  DOC: update the pandas.Series.dt.is_month_end docstring (pandas-dev#20181)
  DOC: update the window.Rolling.min docstring (pandas-dev#20263)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants