DOC: Improve pandas.Series.plot.kde docstring and kwargs rewording for whole file #20041

Merged
merged 2 commits into from
Mar 10, 2018
117 changes: 91 additions & 26 deletions pandas/plotting/_core.py
@@ -2532,7 +2532,8 @@ def line(self, **kwds):
Parameters
----------
`**kwds` : optional
Keyword arguments to pass on to :py:meth:`pandas.Series.plot`.
Additional keyword arguments are documented in
:meth:`pandas.Series.plot`.

Returns
-------
@@ -2556,7 +2557,8 @@ def bar(self, **kwds):
Parameters
----------
`**kwds` : optional
Keyword arguments to pass on to :py:meth:`pandas.Series.plot`.
Additional keyword arguments are documented in
:meth:`pandas.Series.plot`.

Returns
-------
@@ -2571,7 +2573,8 @@ def barh(self, **kwds):
Parameters
----------
`**kwds` : optional
Keyword arguments to pass on to :py:meth:`pandas.Series.plot`.
Additional keyword arguments are documented in
:meth:`pandas.Series.plot`.

Returns
-------
@@ -2586,7 +2589,8 @@ def box(self, **kwds):
Parameters
----------
`**kwds` : optional
Keyword arguments to pass on to :py:meth:`pandas.Series.plot`.
Additional keyword arguments are documented in
:meth:`pandas.Series.plot`.

Returns
-------
@@ -2603,7 +2607,8 @@ def hist(self, bins=10, **kwds):
bins: integer, default 10
Number of histogram bins to be used
`**kwds` : optional
Keyword arguments to pass on to :py:meth:`pandas.Series.plot`.
Additional keyword arguments are documented in
:meth:`pandas.Series.plot`.

Returns
-------
@@ -2613,26 +2618,74 @@ def hist(self, bins=10, **kwds):

def kde(self, bw_method=None, ind=None, **kwds):
"""
Kernel Density Estimate plot
Kernel Density Estimate plot using Gaussian kernels.

@jonas-schulze (Contributor), Mar 10, 2018:

The short summary should start with an infinitive verb. I would change it to

Draw Kernel Density Estimate plot using Gaussian kernels.

Review comment (Member):

I think for plot functions it is fine to just state the plot type. The rules are meant to give direction, but sometimes there is reason to deviate; in this case the question is whether "Draw ..." makes it more informative.

@jonas-schulze (Contributor), Mar 10, 2018:

I agree that adding "Draw" doesn't add much information, but I would still add a prefix, since most (all?) of the docstrings created in the Doc Sprint will be written with these rules in mind. Also, the docstring proposed in PR #20113 for the `hist` function includes "Draw" as well.

On the other hand, "Generate" might be a better fit than "Draw", because if pandas isn't used from within a Jupyter notebook, nothing is drawn immediately.
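To illustrate that point with a minimal sketch (assuming pandas, matplotlib and SciPy are installed, since `plot.kde` relies on SciPy): under a non-interactive backend such as Agg, which is typical for plain scripts, `plot.kde()` still builds the curve and returns a matplotlib `Axes`, but nothing appears on screen until the figure is shown or saved explicitly.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, as in a plain script or CI job
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([1, 2, 2.5, 3, 3.5, 4, 5])

# Builds the KDE curve on an Axes and returns it; with Agg nothing is displayed.
ax = s.plot.kde()
n_curves = len(ax.get_lines())

# Outside a notebook the figure has to be rendered explicitly,
# e.g. plt.show() with an interactive backend, or savefig() as here.
buf = io.BytesIO()
ax.figure.savefig(buf, format="png")
png_bytes = buf.getvalue()
plt.close(ax.figure)
```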


In statistics, kernel density estimation (KDE) is a non-parametric way
to estimate the probability density function (PDF) of a random

Review comment (Contributor):

Ideally, include a reference to a wiki page (the same ones that matplotlib uses?).

variable. This function uses Gaussian kernels and includes automatic
bandwith determination.

Parameters
----------
bw_method: str, scalar or callable, optional
The method used to calculate the estimator bandwidth. This can be
bw_method : str, scalar or callable, optional
The method used to calculate the estimator bandwidth. This can be
'scott', 'silverman', a scalar constant or a callable.
If None (default), 'scott' is used.
See :class:`scipy.stats.gaussian_kde` for more information.
ind : NumPy array or integer, optional
Evaluation points. If None (default), 1000 equally spaced points
are used. If `ind` is a NumPy array, the kde is evaluated at the
points passed. If `ind` is an integer, `ind` number of equally
spaced points are used.
`**kwds` : optional
Keyword arguments to pass on to :py:meth:`pandas.Series.plot`.
Evaluation points for the estimated PDF. If None (default),

Review comment (Contributor):

Can you spell out PDF?

Review comment (Contributor):

Both KDE and PDF are spelled out in the short summary above the Parameters section, so I wouldn't repeat them here. @dukebody, what do you think?

Review comment (Contributor):

nvm I see it.

1000 equally spaced points are used. If `ind` is a NumPy array, the
kde is evaluated at the points passed. If `ind` is an integer,
`ind` number of equally spaced points are used.
kwds : optional

Review comment (Contributor):

This should be

**kwds : optional

Additional keyword arguments are documented in
:meth:`pandas.Series.plot`.

Returns
-------
axes : matplotlib.AxesSubplot or np.array of them

See also

Review comment (Contributor):

It would be nice to have an additional reference:

:meth:`pandas.DataFrame.plot.kde` : Generate a KDE plot for a DataFrame

Review comment (Contributor):

See also -> See Also

--------
scipy.stats.gaussian_kde : Representation of a kernel-density
estimate using Gaussian kernels. This is the function used
internally to estimate the PDF.

Examples
--------
Given a Series of points randomly sampled from an unknown
distribution, estimate this distribution using KDE with automatic

Review comment (Contributor):

I would rather say "estimate its distribution" instead of "estimate this distribution".

bandwidth determination and plot the results, evaluating them at
1000 equally spaced points (default):

.. plot::
:context: close-figs

>>> s = pd.Series([1, 2, 2.5, 3, 3.5, 4, 5])
>>> ax = s.plot.kde()


An scalar fixed bandwidth can be specified. Using a too small bandwidth

Review comment:

Great work @dukebody! 👍 This is Shivam from the HK Chapter of the Pandas Doc Sprint.

Just had a minor suggestion for simplifying this paragraph:

        A scalar bandwidth can be specified. Using a small bandwidth value can
        lead to overfitting, while using a large bandwidth value can result 
        in underfitting:

@jonas-schulze (Contributor), Mar 10, 2018:

Hi, I'm Jonas from the Barcelona Pandas Doc Sprint.

I agree, @shivam6294's version sounds a little better. Either way, you have to change "An" to "A", since "scalar" doesn't start with a vowel sound.

can lead to overfitting, while a too large bandwidth can result in
underfitting:

.. plot::
:context: close-figs

>>> ax = s.plot.kde(bw_method=0.3)

.. plot::
:context: close-figs

>>> ax = s.plot.kde(bw_method=3)

Finally, the `ind` parameter determines the evaluation points for the
plot of the estimated PDF:

.. plot::
:context: close-figs

>>> ax = s.plot.kde(ind=[1, 2, 3, 4, 5])
"""
return self(kind='kde', bw_method=bw_method, ind=ind, **kwds)
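The docstring says the estimate uses Gaussian kernels with automatic bandwidth determination through `scipy.stats.gaussian_kde`. As a rough sketch of what that means in one dimension: `f(x) = sum_i phi((x - x_i) / h) / (n * h)`, where `phi` is the standard normal density and the bandwidth `h` is the sample standard deviation times a factor. Scott's rule gives a factor of `n ** (-1/5)` in one dimension, and a scalar `bw_method` is used directly as the factor. The helper `gaussian_kde_1d` below is hypothetical, written only for illustration (it is not part of pandas or SciPy); comparing it against `gaussian_kde` is one way to check this reading of SciPy's behaviour.

```python
import numpy as np
from scipy.stats import gaussian_kde


def gaussian_kde_1d(data, points, factor):
    """Hypothetical helper: 1-D Gaussian KDE evaluated at `points`.

    The kernel width h is the sample standard deviation (ddof=1)
    scaled by `factor`, mirroring how gaussian_kde scales its covariance.
    """
    data = np.asarray(data, dtype=float)
    points = np.asarray(points, dtype=float)
    h = data.std(ddof=1) * factor
    # Standardized distance from every evaluation point to every sample.
    z = (points[:, None] - data[None, :]) / h
    # Standard normal density for each (point, sample) pair.
    phi = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Average the kernels and divide by h so the estimate integrates to 1.
    return phi.sum(axis=1) / (len(data) * h)


data = np.array([1, 2, 2.5, 3, 3.5, 4, 5])
points = np.linspace(-1.0, 7.0, 200)

scott = len(data) ** (-1 / 5)  # Scott's rule factor in one dimension
manual_scott = gaussian_kde_1d(data, points, scott)
scipy_scott = gaussian_kde(data, bw_method="scott")(points)

# A scalar bw_method is used as the bandwidth factor itself.
manual_small = gaussian_kde_1d(data, points, 0.3)
scipy_small = gaussian_kde(data, bw_method=0.3)(points)
```

A small factor such as 0.3 yields narrow kernels and a bumpier curve (the overfitting case in the examples), while a large factor such as 3 smooths the estimate out (underfitting).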

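The documented `ind` behaviour can be checked on the returned `Axes`: with `None` the KDE is evaluated at 1000 points, an integer sets how many equally spaced points are used, and an array-like is used as-is. A minimal sketch, assuming pandas, matplotlib and SciPy are installed; it only inspects the x-data of the drawn line and makes no claim about the exact range pandas picks for the integer and default cases.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 2.5, 3, 3.5, 4, 5])

# Integer ind: the KDE is evaluated at that many equally spaced points.
ax_int = s.plot.kde(ind=50)
x_int = ax_int.get_lines()[0].get_xdata()
plt.close(ax_int.figure)

# Array-like ind: the KDE is evaluated exactly at the given points.
ax_arr = s.plot.kde(ind=[1, 2, 3, 4, 5])
x_arr = ax_arr.get_lines()[0].get_xdata()
plt.close(ax_arr.figure)

# Default (ind=None): the docstring promises 1000 evaluation points.
ax_def = s.plot.kde()
n_default = len(ax_def.get_lines()[0].get_xdata())
plt.close(ax_def.figure)

spacing = np.diff(x_int)
```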
@@ -2645,7 +2698,8 @@ def area(self, **kwds):
Parameters
----------
`**kwds` : optional
Keyword arguments to pass on to :py:meth:`pandas.Series.plot`.
Additional keyword arguments are documented in
:meth:`pandas.Series.plot`.

Returns
-------
@@ -2660,7 +2714,8 @@ def pie(self, **kwds):
Parameters
----------
`**kwds` : optional
Keyword arguments to pass on to :py:meth:`pandas.Series.plot`.
Additional keyword arguments are documented in
:meth:`pandas.Series.plot`.

Returns
-------
@@ -2711,7 +2766,8 @@ def line(self, x=None, y=None, **kwds):
x, y : label or position, optional
Coordinates for each point.
`**kwds` : optional
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`.
Additional keyword arguments are documented in
:meth:`pandas.DataFrame.plot`.

Returns
-------
@@ -2728,7 +2784,8 @@ def bar(self, x=None, y=None, **kwds):
x, y : label or position, optional
Coordinates for each point.
`**kwds` : optional
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`.
Additional keyword arguments are documented in
:meth:`pandas.DataFrame.plot`.

Returns
-------
@@ -2745,7 +2802,8 @@ def barh(self, x=None, y=None, **kwds):
x, y : label or position, optional
Coordinates for each point.
`**kwds` : optional
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`.
Additional keyword arguments are documented in
:meth:`pandas.DataFrame.plot`.

Returns
-------
@@ -2762,7 +2820,8 @@ def box(self, by=None, **kwds):
by : string or sequence
Column in the DataFrame to group by.
`**kwds` : optional
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`.
Additional keyword arguments are documented in
:meth:`pandas.DataFrame.plot`.

Returns
-------
@@ -2781,7 +2840,8 @@ def hist(self, by=None, bins=10, **kwds):
bins: integer, default 10
Number of histogram bins to be used
`**kwds` : optional
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`.
Additional keyword arguments are documented in
:meth:`pandas.DataFrame.plot`.

Returns
-------
@@ -2806,7 +2866,8 @@ def kde(self, bw_method=None, ind=None, **kwds):
points passed. If `ind` is an integer, `ind` number of equally
spaced points are used.
`**kwds` : optional
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`.
Additional keyword arguments are documented in
:meth:`pandas.DataFrame.plot`.

Returns
-------
@@ -2825,7 +2886,8 @@ def area(self, x=None, y=None, **kwds):
x, y : label or position, optional
Coordinates for each point.
`**kwds` : optional
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`.
Additional keyword arguments are documented in
:meth:`pandas.DataFrame.plot`.

Returns
-------
@@ -2842,7 +2904,8 @@ def pie(self, y=None, **kwds):
y : label or position, optional
Column to plot.
`**kwds` : optional
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`.
Additional keyword arguments are documented in
:meth:`pandas.DataFrame.plot`.

Returns
-------
@@ -2863,7 +2926,8 @@ def scatter(self, x, y, s=None, c=None, **kwds):
c : label or position, optional
Color of each point.
`**kwds` : optional
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`.
Additional keyword arguments are documented in
:meth:`pandas.DataFrame.plot`.

Returns
-------
@@ -2888,7 +2952,8 @@ def hexbin(self, x, y, C=None, reduce_C_function=None, gridsize=None,
gridsize : int, optional
Number of bins.
`**kwds` : optional
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`.
Additional keyword arguments are documented in
:meth:`pandas.DataFrame.plot`.

Returns
-------
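The reworded `**kwds` entries point readers to `pandas.Series.plot` and `pandas.DataFrame.plot` for the remaining keyword arguments. A minimal sketch of what that forwarding looks like in practice, assuming pandas, matplotlib and SciPy are installed: `bw_method` is a `kde()` parameter, `title` is handled by `DataFrame.plot`, and `linewidth` is not a pandas option at all but is passed further down to matplotlib's line drawing.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 2.5, 3, 3.5, 4, 5],
    "y": [4, 8, 5, 7, 6, 5, 9],
})

# One KDE curve per column; extra keywords travel through **kwds.
ax = df.plot.kde(bw_method=0.5, title="KDE per column", linewidth=2)

widths = [line.get_linewidth() for line in ax.get_lines()]
title = ax.get_title()
labels = [text.get_text() for text in ax.get_legend().get_texts()]
plt.close(ax.figure)
```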