33 changes: 18 additions & 15 deletions pandas/core/generic.py
@@ -8348,7 +8348,7 @@ def abs(self):

def describe(self, percentiles=None, include=None, exclude=None):
"""
Generates descriptive statistics that summarize the central tendency,
Generate descriptive statistics that summarize the central tendency,
dispersion and shape of a dataset's distribution, excluding
``NaN`` values.

@@ -8392,7 +8392,18 @@ def describe(self, percentiles=None, include=None, exclude=None):

Returns
-------
summary: Series/DataFrame of summary statistics
Series or DataFrame
Summary statistics of the Series or DataFrame provided.

See Also
--------
DataFrame.count: Count number of non-NA/null observations.
DataFrame.max: Maximum of the values in the object.
DataFrame.min: Minimum of the values in the object.
DataFrame.mean: Mean of the values.
DataFrame.std: Standard deviation of the observations.
DataFrame.select_dtypes: Subset of a DataFrame including/excluding
columns based on their dtype.

Notes
-----
@@ -8436,6 +8447,7 @@ def describe(self, percentiles=None, include=None, exclude=None):
50% 2.0
75% 2.5
max 3.0
dtype: float64

Describing a categorical ``Series``.

@@ -8466,9 +8478,9 @@ def describe(self, percentiles=None, include=None, exclude=None):
Describing a ``DataFrame``. By default only numeric fields
are returned.

>>> df = pd.DataFrame({ 'object': ['a', 'b', 'c'],
... 'numeric': [1, 2, 3],
... 'categorical': pd.Categorical(['d','e','f'])
>>> df = pd.DataFrame({'categorical': pd.Categorical(['d','e','f']),
... 'numeric': [1, 2, 3],
... 'object': ['a', 'b', 'c']
... })
>>> df.describe()
numeric
@@ -8554,7 +8566,7 @@ def describe(self, percentiles=None, include=None, exclude=None):
Excluding object columns from a ``DataFrame`` description.

>>> df.describe(exclude=[np.object])
categorical numeric
categorical numeric
Contributor

When running the validation script, I occasionally get this failure:

Line 210, in pandas.DataFrame.describe
Failed example:
    df.describe(exclude=[np.number])
Expected:
           categorical object
    count            3      3
    unique           3      3
    top              f      c
    freq             1      1
Got:
           categorical object
    count            3      3
    unique           3      3
    top              f      a
    freq             1      1

Did you see this at all? This is likely an issue in the method itself, not the docstring.

Contributor Author

Yeah, I do see this error, but it's flaky.

Contributor

To be clear, it's probably some kind of non-stable sorting inside the describe method, and nothing wrong with the docstring. It may be best to just include the docstring, and open a new issue.
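
A minimal sketch of why ties could make `top` ambiguous, assuming (this is not confirmed anywhere in the thread) that `describe` takes `top` from the head of a frequency count. Every value in the object column occurs exactly once, so any of them is an equally valid `top`. The variable names are illustrative only.

```python
import pandas as pd

# Object column from the docstring example: every value occurs exactly once.
s = pd.Series(["a", "b", "c"], dtype=object)

counts = s.value_counts()
# All values share the maximum frequency, so each is an equally valid "top".
tied = set(counts[counts == counts.max()].index)

summary = s.describe()
print("tied candidates:", sorted(tied))
print("reported top:", summary["top"], "freq:", summary["freq"])
```

If the reported `top` is always one of the tied candidates but not always the same one, the docstring's expected output cannot be stable until the tie-break is made deterministic.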

Contributor

The strange thing is that just doing

pd.DataFrame({"A": pd.Categorical(['d', 'e', 'f']), "B": ['a', 'b', 'c'], 'C': [1, 2, 3]}).describe(exclude=['number'])

seems deterministic.
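
One way to probe this is to repeat the call within a single process and collect the `top` values reported for the object column. This is only a sketch: a single result here does not rule out variation across interpreter sessions or pandas versions. The helper name `describe_tops` is hypothetical; the column names follow the snippet above.

```python
import pandas as pd


def describe_tops(n_runs=20):
    """Collect the `top` reported for column B over repeated calls."""
    tops = set()
    for _ in range(n_runs):
        df = pd.DataFrame({
            "A": pd.Categorical(["d", "e", "f"]),
            "B": pd.Series(["a", "b", "c"], dtype=object),
            "C": [1, 2, 3],
        })
        result = df.describe(exclude=["number"])
        tops.add(result.loc["top", "B"])
    return tops


tops = describe_tops()
# More than one element here would mean the tie-break varies between calls.
print(tops)
```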

count 3 3.0
unique 3 NaN
top f NaN
@@ -8566,15 +8578,6 @@ def describe(self, percentiles=None, include=None, exclude=None):
50% NaN 2.0
75% NaN 2.5
max NaN 3.0

See Also
--------
DataFrame.count
DataFrame.max
DataFrame.min
DataFrame.mean
DataFrame.std
DataFrame.select_dtypes
"""
if self.ndim >= 3:
msg = "describe is not implemented on Panel objects."