DOC: Updated kurt docstring (for pandas sprint) #19999

Merged
merged 6 commits into from
Mar 7, 2018
Merged
Changes from 3 commits
pandas/core/window.py: 42 changes (37 additions, 5 deletions)

@@ -899,11 +899,45 @@ def skew(self, **kwargs):
         return self._apply('roll_skew', 'skew',
                            check_minp=_require_min_periods(3), **kwargs)
 
-    _shared_docs['kurt'] = """Unbiased %(name)s kurtosis"""
+    _shared_docs['kurt'] = dedent("""Calculate unbiased %(name)s kurtosis.
 
-    def kurt(self, **kwargs):
+    This function uses Fisher's definition of kurtosis (kurtosis of normal
+    == 0.0) without bias.
+
+    Returns
+    -------
+    same type as input
Member

This is a somewhat arbitrary comment, but to me it would look better as something like:
Series or DataFrame (same as the input) : some description

Or is the %(name)s above filled in with Series or DataFrame, depending on which method is being used? If so, we could also use it here.
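On the %(name)s question: judging from the decorators further down in this diff (`@Substitution(name='rolling')` on Rolling.kurt and `@Substitution(name='expanding')` on Expanding.kurt), the placeholder appears to be filled with the window type rather than with Series or DataFrame. A minimal sketch of that %-style substitution using plain Python string formatting (the constant name `KURT_SUMMARY` is made up for illustration):

```python
# First line of the shared kurtosis docstring from this diff;
# %(name)s is a %-style placeholder filled in per method.
KURT_SUMMARY = "Calculate unbiased %(name)s kurtosis."

# The diff decorates Rolling.kurt with name='rolling'
# and Expanding.kurt with name='expanding'.
for name in ("rolling", "expanding"):
    print(KURT_SUMMARY % {"name": name})
```

This prints "Calculate unbiased rolling kurtosis." and "Calculate unbiased expanding kurtosis.", so a Returns line built from %(name)s would say "rolling" or "expanding" rather than naming the return type.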


+
+    See Also
+    --------
+    scipy.stats.kurtosis
+    pandas.DataFrame.kurtosis
+    pandas.Series.kurtosis
Member

I think it could make sense to add .skew too, as the two are statistically related and usually appear together in the literature. Technically speaking, the mean and the variance could also belong here, as they are the first and second moments of a probability density (while skewness and kurtosis are the standardized third and fourth).

Also, it's probably worth adding pandas.Series.rolling. I'm not sure whether it belongs in "See Also" or somewhere else, but I think it would be very useful for users to have a link from these functions to https://pandas.pydata.org/pandas-docs/stable/computation.html#window-functions
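To make the moment relationship concrete, a stdlib-only sketch (the helper name `moments` is hypothetical) that computes the mean, the population variance, and the standardized third and fourth central moments, i.e. the plain (biased) skewness and Fisher excess kurtosis, for the sample used in the docstring example:

```python
def moments(xs):
    """Return (mean, variance, skewness, excess kurtosis) of a sample.

    Hypothetical helper: uses population (biased) central moments, so the
    kurtosis is NOT the bias-corrected value the docstring describes.
    """
    n = len(xs)
    mean = sum(xs) / n                          # 1st moment
    m2 = sum((x - mean) ** 2 for x in xs) / n   # 2nd central moment (variance)
    m3 = sum((x - mean) ** 3 for x in xs) / n   # 3rd central moment
    m4 = sum((x - mean) ** 4 for x in xs) / n   # 4th central moment
    skew = m3 / m2 ** 1.5                       # standardized 3rd moment
    kurt = m4 / m2 ** 2 - 3.0                   # standardized 4th, Fisher (normal == 0)
    return mean, m2, skew, kurt


print(moments([1, 2, 3, 4, 5]))
```

For [1, 2, 3, 4, 5] this gives a mean of 3.0, a variance of 2.0, a skewness of 0.0 and a biased excess kurtosis of about -1.3; the bias correction is what moves the last value to the -1.2 shown in the docstring example.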

Member Author

Yeah, I was thinking the same thing about pandas.Series.rolling and pandas.Series.expanding. Right now the module contains _doc_template, which provides a generic See Also section, but my first attempt to override it didn't work out.

I'll look a little closer to see if there's a scalable way to have all of the functions here reference pandas.Series.rolling and the Series / DataFrame methods at a minimum, while allowing additional "See Also" entries to be specified per method.
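A rough sketch of how stacked docstring decorators compose, which is what this diff changes by dropping `@Appender(_doc_template)` from Rolling.kurt and Expanding.kurt. The `appender` and `substitution` functions below are hypothetical, simplified stand-ins written for illustration (they are not pandas' own `Appender`/`Substitution`), and `DOC_TEMPLATE` is a placeholder for the module's `_doc_template`. Decorators apply bottom-up, so the method-specific text is appended first, any shared template after it, and the %(name)s substitution runs last:

```python
def appender(addendum):
    """Hypothetical decorator: append `addendum` to the function's docstring."""
    def decorate(func):
        func.__doc__ = (func.__doc__ or "") + addendum
        return func
    return decorate


def substitution(**params):
    """Hypothetical decorator: fill %(key)s placeholders in the docstring."""
    def decorate(func):
        if func.__doc__:
            func.__doc__ = func.__doc__ % params
        return func
    return decorate


# Placeholder texts; the real pandas strings are much longer.
SHARED_KURT = "Calculate unbiased %(name)s kurtosis.\n"
DOC_TEMPLATE = "See Also\n--------\n(generic %(name)s entries)\n"


@substitution(name="rolling")   # applied last
@appender(DOC_TEMPLATE)         # applied second
@appender(SHARED_KURT)          # applied first (closest to the function)
def kurt_with_template():
    pass


@substitution(name="rolling")
@appender(SHARED_KURT)          # method-specific text only
def kurt_without_template():
    pass


print(kurt_with_template.__doc__)
print(kurt_without_template.__doc__)
```

With the template decorator in place, its generic See Also is tacked on after whatever the method-specific docstring contains; without it, a per-method See Also section inside the shared docstring is the only one that appears.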


+
+    Notes
+    -----
+    A minimum of 4 periods is required for the rolling calculation
Member

Nice. Did you find that the documentation about the "Notes" section made sense for this case?

Member Author

I added this from familiarity with the NumPy standard, not necessarily from the docs that you built (although they were very helpful in many other respects, so kudos).
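The 4-period minimum stated in the Notes lines up with the bias-corrected estimator itself: the correction divides by (n - 2)(n - 3), so it is undefined for fewer than four observations. A stdlib-only sketch, assuming the standard bias-corrected sample excess kurtosis G2 = ((n^2 - 1) m4 / m2^2 - 3 (n - 1)^2) / ((n - 2)(n - 3)), where m2 and m4 are the biased central moments. The helper name `unbiased_kurtosis` is hypothetical and this is not pandas' rolling implementation, but it reproduces the -1.2 that scipy.stats.kurtosis(arr, bias=False) reports in the docstring example:

```python
import math


def unbiased_kurtosis(xs):
    """Bias-corrected sample excess kurtosis (Fisher: normal == 0).

    Hypothetical helper for illustration. Returns NaN for fewer than
    4 values, because the correction divides by (n - 2) * (n - 3).
    """
    n = len(xs)
    if n < 4:
        return math.nan
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # biased 2nd central moment
    m4 = sum((x - mean) ** 4 for x in xs) / n  # biased 4th central moment
    if m2 == 0:
        return math.nan  # constant data: this sketch treats kurtosis as undefined
    return ((n * n - 1) * m4 / m2 ** 2 - 3 * (n - 1) ** 2) / ((n - 2) * (n - 3))


print(unbiased_kurtosis([1, 2, 3, 4, 5]))  # close to -1.2
print(unbiased_kurtosis([1, 2, 3]))        # nan: too few values
```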


+
+    Examples
+    --------
+    >>> arr = [1, 2, 3, 4, 5]
+    >>> import scipy.stats
+    >>> scipy.stats.kurtosis(arr, bias=False)
+    -1.2000000000000004
+
+    >>> df = pd.DataFrame(arr)
Member

Maybe a Series would be more natural, since you don't have a column name?

+    >>> df.rolling(5).kurt()
+         0
+    0  NaN
+    1  NaN
+    2  NaN
+    3  NaN
+    4 -1.2
Member

It's probably obvious enough, but maybe using 6 values instead of 5 would better convey that after the 4th NaN all the values will be filled?
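Following the suggestion to use six values, a stdlib-only sketch of the window semantics, with a plain list standing in for the suggested Series. The helpers `unbiased_kurtosis` and `rolling_kurt` are hypothetical; the sketch recomputes each full window from scratch rather than updating incrementally, and it mimics only the default for an integer window, where a result appears once the window holds `window` values. With a window of 5 over six evenly spaced values, the first four positions are NaN and the last two both come out near -1.2:

```python
import math


def unbiased_kurtosis(xs):
    """Bias-corrected sample excess kurtosis; NaN below 4 values (hypothetical helper)."""
    n = len(xs)
    if n < 4:
        return math.nan
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    if m2 == 0:
        return math.nan
    return ((n * n - 1) * m4 / m2 ** 2 - 3 * (n - 1) ** 2) / ((n - 2) * (n - 3))


def rolling_kurt(values, window):
    """Hypothetical sketch: NaN until a full window of values is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(math.nan)  # window not yet full
        else:
            out.append(unbiased_kurtosis(values[i + 1 - window:i + 1]))
    return out


for i, v in enumerate(rolling_kurt([1, 2, 3, 4, 5, 6], window=5)):
    print(i, v)
```

Each full window ([1..5], then [2..6]) is five evenly spaced values, which is why both filled positions share the same kurtosis.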

""")

def kurt(self):
return self._apply('roll_kurt', 'kurt',
check_minp=_require_min_periods(4), **kwargs)
check_minp=_require_min_periods(4))

_shared_docs['quantile'] = dedent("""
%(name)s quantile
@@ -1221,7 +1255,6 @@ def skew(self, **kwargs):
         return super(Rolling, self).skew(**kwargs)
 
     @Substitution(name='rolling')
-    @Appender(_doc_template)
     @Appender(_shared_docs['kurt'])
     def kurt(self, **kwargs):
         return super(Rolling, self).kurt(**kwargs)
@@ -1461,7 +1494,6 @@ def skew(self, **kwargs):
         return super(Expanding, self).skew(**kwargs)
 
     @Substitution(name='expanding')
-    @Appender(_doc_template)
     @Appender(_shared_docs['kurt'])
     def kurt(self, **kwargs):
         return super(Expanding, self).kurt(**kwargs)