DOC: Improve pandas.DataFrame.plot.density docstring
This docstring is quite similar to the one of
[PR20041](pandas-dev#20041).
Unfortunately, I was not able to compute a kernel estimate of a
two-dimensional random variable. Hence, the example is more of an
analysis of some independent data series.
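For reference, a two-dimensional kernel estimate can be computed with `scipy.stats.gaussian_kde` directly, since it accepts a dataset of shape `(n_dims, n_points)`. The following is only a minimal sketch, not part of this commit: it reuses the `x` and `y` values from the docstring example and assumes they can be treated as paired observations of one two-dimensional variable. The grid bounds and the names `kde2d`, `gx`, `gy` and `density` are illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Sample values taken from the docstring example; treating them as
# paired (x, y) observations is an assumption made for this sketch.
x = np.array([1, 2, 2.5, 3, 3.5, 4, 5])
y = np.array([4, 4, 4.5, 5, 5.5, 6, 6])

# gaussian_kde expects one row per dimension: shape (n_dims, n_points).
kde2d = gaussian_kde(np.vstack([x, y]))

# Evaluate the joint density on a regular grid (bounds chosen by hand).
gx, gy = np.meshgrid(np.linspace(0, 6, 50), np.linspace(3, 7, 50))
grid_points = np.vstack([gx.ravel(), gy.ravel()])
density = kde2d(grid_points).reshape(gx.shape)
```

The resulting `density` array could be drawn with a contour plot, for example `matplotlib.pyplot.contourf(gx, gy, density)`. `DataFrame.plot.kde` itself estimates each column separately.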
jonas-schulze committed Mar 10, 2018
1 parent 4131149 commit 8e3e3ac
Showing 1 changed file with 60 additions and 9 deletions.
69 changes: 60 additions & 9 deletions pandas/plotting/_core.py
@@ -2791,26 +2791,77 @@ def hist(self, by=None, bins=10, **kwds):

def kde(self, bw_method=None, ind=None, **kwds):
"""
Kernel Density Estimate plot
Generate Kernel Density Estimate plot using Gaussian kernels.
In statistics, kernel density estimation (KDE) is a non-parametric way
to estimate the probability density function (PDF) of a random
variable. This function uses Gaussian kernels and includes automatic
bandwidth determination.
Parameters
----------
bw_method: str, scalar or callable, optional
The method used to calculate the estimator bandwidth. This can be
bw_method : str, scalar or callable, optional
The method used to calculate the estimator bandwidth. This can be
'scott', 'silverman', a scalar constant or a callable.
If None (default), 'scott' is used.
See :class:`scipy.stats.gaussian_kde` for more information.
ind : NumPy array or integer, optional
Evaluation points. If None (default), 1000 equally spaced points
are used. If `ind` is a NumPy array, the kde is evaluated at the
points passed. If `ind` is an integer, `ind` number of equally
spaced points are used.
`**kwds` : optional
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`.
Evaluation points for the estimated PDF. If None (default),
1000 equally spaced points are used. If `ind` is a NumPy array, the
kde is evaluated at the points passed. If `ind` is an integer,
`ind` number of equally spaced points are used.
**kwds : optional
Additional keyword arguments are documented in
:meth:`pandas.Series.plot`.
Returns
-------
axes : matplotlib.AxesSubplot or np.array of them
See Also
--------
scipy.stats.gaussian_kde : Representation of a kernel-density
estimate using Gaussian kernels. This is the function used
internally to estimate the PDF.
:meth:`pandas.Series.plot.kde` : Generate a KDE plot for a Series
Examples
--------
Given several Series of points randomly sampled from unknown
distributions, estimate their distribution using KDE with automatic
bandwidth determination and plot the results, evaluating them at
1000 equally spaced points (default):
.. plot::
:context: close-figs
>>> df = pd.DataFrame({
... 'x': [1, 2, 2.5, 3, 3.5, 4, 5],
... 'y': [4, 4, 4.5, 5, 5.5, 6, 6],
... })
>>> ax = df.plot.kde()
A scalar bandwidth can be specified. Using a small bandwidth value can
lead to overfitting, while using a large bandwidth value can result
in underfitting:
.. plot::
:context: close-figs
>>> ax = df.plot.kde(bw_method=0.3)
.. plot::
:context: close-figs
>>> ax = df.plot.kde(bw_method=3)
Finally, the `ind` parameter determines the evaluation points for the
plot of the estimated PDF:
.. plot::
:context: close-figs
>>> ax = df.plot.kde(ind=[1, 2, 3, 4, 5, 6])
"""
return self(kind='kde', bw_method=bw_method, ind=ind, **kwds)
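
The behaviour described for `bw_method` and `ind` can be approximated outside of pandas as follows. This is a rough sketch under stated assumptions, not the pandas implementation: `kde_curve` is a hypothetical helper, and padding the default evaluation range by half the sample range on each side is an assumption about how the equally spaced points are chosen.

```python
import numpy as np
from scipy.stats import gaussian_kde


def kde_curve(values, bw_method=None, ind=None):
    """Hypothetical helper: return (evaluation points, estimated density)."""
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]  # ignore missing values

    # bw_method=None lets scipy fall back to Scott's rule.
    kde = gaussian_kde(values, bw_method=bw_method)

    if ind is None or isinstance(ind, (int, np.integer)):
        # Default: 1000 equally spaced points; an integer sets the count.
        n_points = 1000 if ind is None else int(ind)
        span = values.max() - values.min()
        # Assumption: pad the range by half the sample range on each side.
        ind = np.linspace(values.min() - 0.5 * span,
                          values.max() + 0.5 * span,
                          n_points)
    else:
        # An array-like is used as the evaluation points directly.
        ind = np.asarray(ind, dtype=float)

    return ind, kde(ind)


x = [1, 2, 2.5, 3, 3.5, 4, 5]
grid, density = kde_curve(x)                            # defaults
_, narrow = kde_curve(x, bw_method=0.3)                 # small bandwidth
_, wide = kde_curve(x, bw_method=3)                     # large bandwidth
points, at_points = kde_curve(x, ind=[1, 2, 3, 4, 5, 6])
```

Plotting `narrow` and `wide` against `grid` shows the effect of the scalar bandwidth that the docstring examples illustrate with `bw_method=0.3` and `bw_method=3`.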
