Backport PR #55620 on branch 2.1.x (BUG: Groupby not keeping object dtype when infer string is set) (#55629)

Backport PR #55620: BUG: Groupby not keeping object dtype when infer string is set

Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com>
meeseeksmachine and phofl committed Oct 22, 2023
1 parent 3c1fe5c commit c26935c
Showing 3 changed files with 16 additions and 4 deletions.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.1.2.rst
@@ -23,6 +23,7 @@ Fixed regressions

 Bug fixes
 ~~~~~~~~~
+- Fixed bug in :class:`.DataFrameGroupBy` reductions not preserving object dtype when ``infer_string`` is set (:issue:`55620`)
 - Fixed bug in :meth:`Categorical.equals` if other has arrow backed string dtype (:issue:`55364`)
 - Fixed bug in :meth:`DataFrame.__setitem__` not inferring string dtype for zero-dimensional array with ``infer_string=True`` (:issue:`55366`)
 - Fixed bug in :meth:`DataFrame.idxmin` and :meth:`DataFrame.idxmax` raising for arrow dtypes (:issue:`55368`)
3 changes: 1 addition & 2 deletions pandas/core/groupby/groupby.py
@@ -1855,8 +1855,7 @@ def _agg_py_fallback(
             ser = Series(values, copy=False)
         else:
             # We only get here with values.dtype == object
-            # TODO: special case not needed with ArrayManager
-            df = DataFrame(values.T)
+            df = DataFrame(values.T, dtype=values.dtype)
             # bc we split object blocks in grouped_reduce, we have only 1 col
             # otherwise we'd have to worry about block-splitting GH#39329
             assert df.shape[1] == 1
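
The one-line change above forwards the input array's dtype to the `DataFrame` constructor, so the object values are not re-inferred. A minimal sketch of that idea outside pandas internals, assuming a 2D object array shaped like a single-row block; `build_single_column_frame` is a hypothetical helper name, not part of pandas, and the sample values are made up for illustration:

```python
import numpy as np
import pandas as pd


def build_single_column_frame(values: np.ndarray) -> pd.DataFrame:
    """Hypothetical helper mirroring the patched line.

    Passing dtype=values.dtype asks the constructor to keep the
    array's dtype instead of inferring one from the contents.
    """
    return pd.DataFrame(values.T, dtype=values.dtype)


# A (1, 3) object array, standing in for one split object block.
values = np.array([["e", "f", "g"]], dtype=object)

df = build_single_column_frame(values)
print(df.dtypes)
```

Going by the whatsnew entry, the case this guards against is `infer_string` being set, where leaving `dtype` out allows the constructor to pick a string dtype for data like this.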
16 changes: 14 additions & 2 deletions pandas/tests/groupby/test_groupby.py
@@ -5,6 +5,7 @@
 import numpy as np
 import pytest

+from pandas.compat import pa_version_under7p0
 from pandas.errors import (
     PerformanceWarning,
     SpecificationError,
@@ -2513,13 +2514,24 @@ def test_groupby_column_index_name_lost(func):
     tm.assert_index_equal(result, expected)


-def test_groupby_duplicate_columns():
+@pytest.mark.parametrize(
+    "infer_string",
+    [
+        False,
+        pytest.param(
+            True,
+            marks=pytest.mark.skipif(pa_version_under7p0, reason="arrow not installed"),
+        ),
+    ],
+)
+def test_groupby_duplicate_columns(infer_string):
     # GH: 31735
     df = DataFrame(
         {"A": ["f", "e", "g", "h"], "B": ["a", "b", "c", "d"], "C": [1, 2, 3, 4]}
     ).astype(object)
     df.columns = ["A", "B", "B"]
-    result = df.groupby([0, 0, 0, 0]).min()
+    with pd.option_context("future.infer_string", infer_string):
+        result = df.groupby([0, 0, 0, 0]).min()
     expected = DataFrame(
         [["e", "a", 1]], index=np.array([0]), columns=["A", "B", "B"], dtype=object
     )