Deprecate level keyword for dataframe and series aggregations #40869

Merged: 6 commits, Apr 13, 2021
Changes from 4 commits
1 change: 1 addition & 0 deletions doc/source/user_guide/advanced.rst
@@ -492,6 +492,7 @@ Using the parameter ``level`` in the :meth:`~DataFrame.reindex` and
values across a level. For instance:

.. ipython:: python
:okwarning:
Review comment (Contributor): I would just remove this example.

Review comment (Contributor): You could instead show a groupby example of this (but then I would put it in the groupby section); you can leave a pointer here if you'd like.

Review comment (Member Author): If that's OK, I would keep it there; there are level examples above and below this snippet.

Review comment (Member Author): But I changed it to groupby.
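
As a rough illustration of the "changed to groupby" resolution in this thread, summing across a level of the ``midx`` from this example can be written with groupby. This is a sketch, not the snippet committed to advanced.rst; the ``value`` column and its data are made up for the illustration.

```python
import pandas as pd

# Same MultiIndex as in the advanced.rst example under review.
midx = pd.MultiIndex(
    levels=[["zero", "one"], ["x", "y"]], codes=[[1, 1, 0, 0], [1, 0, 1, 0]]
)

# Hypothetical data; the "value" column is an assumption for this sketch.
df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0]}, index=midx)

# groupby-based replacement for the deprecated df.sum(level=0):
# rows sharing the same label in level 0 are summed together.
summed = df.groupby(level=0).sum()
print(summed)
```

Grouping by ``level=0`` collapses the second index level, so the result is indexed only by the labels of the first level.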


midx = pd.MultiIndex(
levels=[["zero", "one"], ["x", "y"]], codes=[[1, 1, 0, 0], [1, 0, 1, 0]]
1 change: 1 addition & 0 deletions doc/source/user_guide/categorical.rst
@@ -625,6 +625,7 @@ even if some categories are not present in the data:
``DataFrame`` methods like :meth:`DataFrame.sum` also show "unused" categories.

.. ipython:: python
:okwarning:
Review comment (Contributor): Can you change this so it's a regular group op?

Review comment (Member Author): Done.


columns = pd.Categorical(
["One", "One", "Two"], categories=["One", "Two", "Three"], ordered=True
1 change: 1 addition & 0 deletions doc/source/user_guide/groupby.rst
@@ -325,6 +325,7 @@ directly. Additionally, the resulting index will be named according to the
chosen level:

.. ipython:: python
Review comment (Contributor): Point the example from advanced.rst here (and make this a proper groupby example).

Review comment (Member Author): Removed, since the example above is the groupby case.

:okwarning:

s.sum(level="second")

1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.15.2.rst
@@ -154,6 +154,7 @@ Other enhancements:
- ``Series.all`` and ``Series.any`` now support the ``level`` and ``skipna`` parameters (:issue:`8302`):

.. ipython:: python
:okwarning:

s = pd.Series([False, True, False], index=[0, 0, 1])
s.any(level=0)
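
For comparison, the same reduction can be expressed with groupby, which is the replacement the deprecation message recommends. This is a sketch for readers of the diff and is not part of the changed file; the variable name ``any_by_label`` is only illustrative.

```python
import pandas as pd

# Same Series as in the whatsnew snippet above.
s = pd.Series([False, True, False], index=[0, 0, 1])

# groupby-based equivalent of the deprecated s.any(level=0):
# one boolean per distinct index label.
any_by_label = s.groupby(level=0).any()
print(any_by_label)
```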
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.3.0.rst
@@ -563,6 +563,7 @@ Deprecations
- Deprecated allowing partial failure in :meth:`Series.transform` and :meth:`DataFrame.transform` when ``func`` is list-like or dict-like and raises anything but ``TypeError``; ``func`` raising anything but a ``TypeError`` will raise in a future version (:issue:`40211`)
- Deprecated support for ``np.ma.mrecords.MaskedRecords`` in the :class:`DataFrame` constructor, pass ``{name: data[name] for name in data.dtype.names}`` instead (:issue:`40363`)
- Deprecated the use of ``**kwargs`` in :class:`.ExcelWriter`; use the keyword argument ``engine_kwargs`` instead (:issue:`40430`)
- Deprecated the ``level`` keyword for :class:`DataFrame` and :class:`Series` aggregations; use groupby instead (:issue:`39983`)

.. ---------------------------------------------------------------------------

7 changes: 7 additions & 0 deletions pandas/core/frame.py
@@ -9479,6 +9479,13 @@ def count(
"""
axis = self._get_axis_number(axis)
if level is not None:
warnings.warn(
"Using the level keyword in DataFrame and Series aggregations is "
"deprecated and will be removed in a future version. Use groupby "
"instead.",
FutureWarning,
stacklevel=2,
)
return self._count_level(level, axis=axis, numeric_only=numeric_only)

if numeric_only:
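
The ``stacklevel`` values in this PR differ between call sites (2, 3 and 4), presumably because the number of internal frames between the user's call and ``warnings.warn`` differs. The sketch below uses only the standard library to show the general pattern; ``_agg_with_level`` and ``public_sum`` are hypothetical names, not pandas functions.

```python
import warnings


def _agg_with_level(level=None):
    # Hypothetical internal helper: the warning is raised two frames
    # below the user's call, so stacklevel=3 points at the user's line
    # (1 = this function, 2 = public_sum, 3 = caller of public_sum).
    if level is not None:
        warnings.warn(
            "Using the level keyword is deprecated. Use groupby instead.",
            FutureWarning,
            stacklevel=3,
        )
    return "aggregated"


def public_sum(level=None):
    # Hypothetical public entry point that delegates to the helper.
    return _agg_with_level(level=level)


# Record warnings instead of printing them so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = public_sum(level=0)

print(caught[0].category.__name__, caught[0].message)
```

With a ``stacklevel`` that is too small, the reported location would be inside the library rather than at the user's call, which makes the deprecation harder to act on.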
35 changes: 35 additions & 0 deletions pandas/core/generic.py
@@ -10260,6 +10260,13 @@ def _logical_func(
):
nv.validate_logical_func((), kwargs, fname=name)
if level is not None:
warnings.warn(
"Using the level keyword in DataFrame and Series aggregations is "
"deprecated and will be removed in a future version. Use groupby "
"instead.",
FutureWarning,
stacklevel=4,
)
if bool_only is not None:
raise NotImplementedError(
"Option bool_only is not implemented with option level."
@@ -10351,6 +10358,13 @@ def _stat_function_ddof(
if axis is None:
axis = self._stat_axis_number
if level is not None:
warnings.warn(
"Using the level keyword in DataFrame and Series aggregations is "
"deprecated and will be removed in a future version. Use groupby "
"instead.",
FutureWarning,
stacklevel=4,
)
return self._agg_by_level(
name, axis=axis, level=level, skipna=skipna, ddof=ddof
)
@@ -10399,6 +10413,13 @@ def _stat_function(
if axis is None:
axis = self._stat_axis_number
if level is not None:
warnings.warn(
"Using the level keyword in DataFrame and Series aggregations is "
"deprecated and will be removed in a future version. Use groupby "
"instead.",
FutureWarning,
stacklevel=4,
)
return self._agg_by_level(
name, axis=axis, level=level, skipna=skipna, numeric_only=numeric_only
)
@@ -10461,6 +10482,13 @@ def _min_count_stat_function(
if axis is None:
axis = self._stat_axis_number
if level is not None:
warnings.warn(
"Using the level keyword in DataFrame and Series aggregations is "
"deprecated and will be removed in a future version. Use groupby "
"instead.",
FutureWarning,
stacklevel=4,
)
return self._agg_by_level(
name,
axis=axis,
@@ -10538,6 +10566,13 @@ def mad(self, axis=None, skipna=None, level=None):
if axis is None:
axis = self._stat_axis_number
if level is not None:
warnings.warn(
"Using the level keyword in DataFrame and Series aggregations is "
"deprecated and will be removed in a future version. Use groupby "
"instead.",
FutureWarning,
stacklevel=3,
)
return self._agg_by_level("mad", axis=axis, level=level, skipna=skipna)

data = self._get_numeric_data()
12 changes: 10 additions & 2 deletions pandas/core/series.py
@@ -1894,8 +1894,16 @@ def count(self, level=None):
"""
if level is None:
return notna(self._values).sum()
elif not isinstance(self.index, MultiIndex):
raise ValueError("Series.count level is only valid with a MultiIndex")
else:
warnings.warn(
"Using the level keyword in DataFrame and Series aggregations is "
"deprecated and will be removed in a future version. Use groupby "
"instead.",
FutureWarning,
stacklevel=2,
)
if not isinstance(self.index, MultiIndex):
raise ValueError("Series.count level is only valid with a MultiIndex")

index = self.index
assert isinstance(index, MultiIndex) # for mypy
36 changes: 24 additions & 12 deletions pandas/tests/frame/methods/test_count.py
@@ -16,17 +16,22 @@ def test_count_multiindex(self, multiindex_dataframe_random_data):
frame = frame.copy()
frame.index.names = ["a", "b"]

result = frame.count(level="b")
expected = frame.count(level=1)
with tm.assert_produces_warning(FutureWarning):
Review comment (Contributor): Can you move these to a new test_count_with_level_deprecated?

Review comment (Member Author): Yep, done.

result = frame.count(level="b")
with tm.assert_produces_warning(FutureWarning):
expected = frame.count(level=1)
tm.assert_frame_equal(result, expected, check_names=False)

result = frame.count(level="a")
expected = frame.count(level=0)
with tm.assert_produces_warning(FutureWarning):
result = frame.count(level="a")
with tm.assert_produces_warning(FutureWarning):
expected = frame.count(level=0)
tm.assert_frame_equal(result, expected, check_names=False)

msg = "Level x not found"
with pytest.raises(KeyError, match=msg):
frame.count(level="x")
with tm.assert_produces_warning(FutureWarning):
frame.count(level="x")

def test_count(self):
# corner case
@@ -64,12 +69,14 @@ def test_count_level_corner(self, multiindex_dataframe_random_data):
frame = multiindex_dataframe_random_data

ser = frame["A"][:0]
result = ser.count(level=0)
with tm.assert_produces_warning(FutureWarning):
result = ser.count(level=0)
expected = Series(0, index=ser.index.levels[0], name="A")
tm.assert_series_equal(result, expected)

df = frame[:0]
result = df.count(level=0)
with tm.assert_produces_warning(FutureWarning):
result = df.count(level=0)
expected = (
DataFrame(
index=ser.index.levels[0].set_names(["first"]), columns=df.columns
@@ -90,7 +97,8 @@ def test_count_index_with_nan(self):
)

# count on row labels
res = df.set_index(["Person", "Single"]).count(level="Person")
with tm.assert_produces_warning(FutureWarning):
res = df.set_index(["Person", "Single"]).count(level="Person")
expected = DataFrame(
index=Index(["John", "Myla"], name="Person"),
columns=Index(["Age"]),
@@ -99,7 +107,8 @@ def test_count_index_with_nan(self):
tm.assert_frame_equal(res, expected)

# count on column labels
res = df.set_index(["Person", "Single"]).T.count(level="Person", axis=1)
with tm.assert_produces_warning(FutureWarning):
res = df.set_index(["Person", "Single"]).T.count(level="Person", axis=1)
expected = DataFrame(
columns=Index(["John", "Myla"], name="Person"),
index=Index(["Age"]),
@@ -118,7 +127,8 @@ def test_count_level(
def _check_counts(frame, axis=0):
index = frame._get_axis(axis)
for i in range(index.nlevels):
result = frame.count(axis=axis, level=i)
with tm.assert_produces_warning(FutureWarning):
result = frame.count(axis=axis, level=i)
expected = frame.groupby(axis=axis, level=i).count()
expected = expected.reindex_like(result).astype("i8")
tm.assert_frame_equal(result, expected)
@@ -136,8 +146,10 @@ def _check_counts(frame, axis=0):
# can't call with level on regular DataFrame
df = tm.makeTimeDataFrame()
with pytest.raises(TypeError, match="hierarchical"):
df.count(level=0)
with tm.assert_produces_warning(FutureWarning):
df.count(level=0)

frame["D"] = "foo"
result = frame.count(level=0, numeric_only=True)
with tm.assert_produces_warning(FutureWarning):
result = frame.count(level=0, numeric_only=True)
tm.assert_index_equal(result.columns, Index(list("ABC"), name="exp"))
43 changes: 38 additions & 5 deletions pandas/tests/frame/test_reductions.py
@@ -580,7 +580,8 @@ def test_kurt(self):
df = DataFrame(np.random.randn(6, 3), index=index)

kurt = df.kurt()
kurt2 = df.kurt(level=0).xs("bar")
with tm.assert_produces_warning(FutureWarning):
kurt2 = df.kurt(level=0).xs("bar")
tm.assert_series_equal(kurt, kurt2, check_names=False)
assert kurt.name is None
assert kurt2.name == "bar"
@@ -1240,7 +1241,8 @@ def test_any_all_level_axis_none_raises(self, method):
)
xpr = "Must specify 'axis' when aggregating by level."
with pytest.raises(ValueError, match=xpr):
getattr(df, method)(axis=None, level="out")
with tm.assert_produces_warning(FutureWarning):
getattr(df, method)(axis=None, level="out")

# ---------------------------------------------------------------------
# Unsorted
@@ -1365,11 +1367,13 @@ def test_frame_any_all_with_level(self):
],
)

result = df.any(level=0)
with tm.assert_produces_warning(FutureWarning):
result = df.any(level=0)
ex = DataFrame({"data": [False, True]}, index=["one", "two"])
tm.assert_frame_equal(result, ex)

result = df.all(level=0)
with tm.assert_produces_warning(FutureWarning):
result = df.all(level=0)
ex = DataFrame({"data": [False, False]}, index=["one", "two"])
tm.assert_frame_equal(result, ex)

@@ -1390,6 +1394,34 @@ def test_frame_any_with_timedelta(self):
expected = Series(data=[False, True])
tm.assert_series_equal(result, expected)

@pytest.mark.parametrize(
"func",
[
"any",
"all",
"count",
"sum",
"prod",
"max",
"min",
"mean",
"median",
"skew",
"kurt",
"sem",
"var",
"std",
"mad",
],
)
def test_reductions_deprecation_level_argument(self, frame_or_series, func):
# GH#39983
obj = frame_or_series(
[1, 2, 3], index=MultiIndex.from_arrays([[1, 2, 3], [4, 5, 6]])
)
with tm.assert_produces_warning(FutureWarning):
getattr(obj, func)(level=0)


class TestNuisanceColumns:
@pytest.mark.parametrize("method", ["any", "all"])
@@ -1556,7 +1588,8 @@ def test_groupy_regular_arithmetic_equivalent(meth):
)
expected = df.copy()

result = getattr(df, meth)(level=0)
with tm.assert_produces_warning(FutureWarning):
result = getattr(df, meth)(level=0)
tm.assert_frame_equal(result, expected)

result = getattr(df.groupby(level=0), meth)(numeric_only=False)
3 changes: 2 additions & 1 deletion pandas/tests/frame/test_subclass.py
@@ -599,7 +599,8 @@ def test_subclassed_count(self):
list(zip(list("WWXX"), list("yzyz"))), names=["www", "yyy"]
),
)
result = df.count(level=1)
with tm.assert_produces_warning(FutureWarning):
result = df.count(level=1)
assert isinstance(result, tm.SubclassedDataFrame)

df = tm.SubclassedDataFrame()
1 change: 0 additions & 1 deletion pandas/tests/generic/test_finalize.py
@@ -46,7 +46,6 @@
pytest.param(
(pd.Series, ([0],), operator.methodcaller("to_frame")), marks=pytest.mark.xfail
),
(pd.Series, (0, mi), operator.methodcaller("count", level="A")),
(pd.Series, ([0, 0],), operator.methodcaller("drop_duplicates")),
(pd.Series, ([0, 0],), operator.methodcaller("duplicated")),
(pd.Series, ([0, 0],), operator.methodcaller("round")),
6 changes: 4 additions & 2 deletions pandas/tests/groupby/test_allowlist.py
@@ -208,14 +208,16 @@ def test_regression_allowlist_methods(raw_frame, op, level, axis, skipna, sort):
if op in AGG_FUNCTIONS_WITH_SKIPNA:
grouped = frame.groupby(level=level, axis=axis, sort=sort)
result = getattr(grouped, op)(skipna=skipna)
expected = getattr(frame, op)(level=level, axis=axis, skipna=skipna)
with tm.assert_produces_warning(FutureWarning):
expected = getattr(frame, op)(level=level, axis=axis, skipna=skipna)
if sort:
expected = expected.sort_index(axis=axis, level=level)
tm.assert_frame_equal(result, expected)
else:
grouped = frame.groupby(level=level, axis=axis, sort=sort)
result = getattr(grouped, op)()
expected = getattr(frame, op)(level=level, axis=axis)
with tm.assert_produces_warning(FutureWarning):
expected = getattr(frame, op)(level=level, axis=axis)
if sort:
expected = expected.sort_index(axis=axis, level=level)
tm.assert_frame_equal(result, expected)
3 changes: 2 additions & 1 deletion pandas/tests/groupby/test_groupby.py
@@ -980,7 +980,8 @@ def test_groupby_complex():
result = a.groupby(level=0).sum()
tm.assert_series_equal(result, expected)

result = a.sum(level=0)
with tm.assert_produces_warning(FutureWarning):
result = a.sum(level=0)
tm.assert_series_equal(result, expected)

