Backport PR #51895 on branch 2.0.x (BUG: Fix getitem dtype preservation with multiindexes) (#53121)

* BUG: Fix getitem dtype preservation with multiindexes (#51895)

* BUG/TST fix dtype preservation with multiindex

* lint

* Update pandas/tests/indexing/multiindex/test_multiindex.py

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>

* cleanups

* switch to iloc, reindex fails in some cases

* suggestions from code review

* address code review comments

Co-Authored-By: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

---------

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

(cherry picked from commit 194b6bb)

* Add whatsnew

---------

Co-authored-by: Matt Richards <45483497+m-richards@users.noreply.github.com>
phofl and m-richards committed May 8, 2023
1 parent 291acfb commit 5962f0e
Showing 3 changed files with 23 additions and 12 deletions.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.0.2.rst
@@ -30,6 +30,7 @@ Bug fixes
 - Bug in :meth:`DataFrame.convert_dtypes` ignores ``convert_*`` keywords when set to False ``dtype_backend="pyarrow"`` (:issue:`52872`)
 - Bug in :meth:`Series.describe` treating pyarrow-backed timestamps and timedeltas as categorical data (:issue:`53001`)
 - Bug in :meth:`pd.array` raising for ``NumPy`` array and ``pa.large_string`` or ``pa.large_binary`` (:issue:`52590`)
+- Bug in :meth:`DataFrame.__getitem__` not preserving dtypes for :class:`MultiIndex` partial keys (:issue:`51895`)
 -

 .. ---------------------------------------------------------------------------
14 changes: 2 additions & 12 deletions pandas/core/frame.py
@@ -3816,18 +3816,8 @@ def _getitem_multilevel(self, key):
         if isinstance(loc, (slice, np.ndarray)):
             new_columns = self.columns[loc]
             result_columns = maybe_droplevels(new_columns, key)
-            if self._is_mixed_type:
-                result = self.reindex(columns=new_columns)
-                result.columns = result_columns
-            else:
-                new_values = self._values[:, loc]
-                result = self._constructor(
-                    new_values, index=self.index, columns=result_columns, copy=False
-                )
-                if using_copy_on_write() and isinstance(loc, slice):
-                    result._mgr.add_references(self._mgr)  # type: ignore[arg-type]
-
-                result = result.__finalize__(self)
+            result = self.iloc[:, loc]
+            result.columns = result_columns

             # If there is only one column being returned, and its name is
             # either an empty string, or a tuple with an empty string as its
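The effect of the `frame.py` change can be sketched with public API only. This is an illustration under stated assumptions, not the internal code path: `df.to_numpy()` stands in for the private `self._values`, `MultiIndex.droplevel` stands in for the internal `maybe_droplevels` helper, and the names `via_values` and `via_iloc` are hypothetical. The point it tries to show is that rebuilding a frame from a 2D NumPy array cannot carry an extension dtype such as `category`, while positional selection with `iloc` keeps each column's array as it is.

```python
import pandas as pd

# A one-column frame with MultiIndex columns, cast to a categorical dtype
# (same setup as the GH51261 case in the accompanying test).
columns = pd.MultiIndex.from_tuples([("A", "B")], names=["lvl1", "lvl2"])
df = pd.DataFrame(["value"], columns=columns).astype("category")

# Partial key lookup on the first level, then drop that level from the labels.
loc = columns.get_loc("A")
result_columns = columns[loc].droplevel("lvl1")  # stand-in for maybe_droplevels

# Approximation of the removed branch: slice a 2D NumPy array and rebuild
# the frame. The categorical dtype does not survive the trip through NumPy.
via_values = pd.DataFrame(
    df.to_numpy()[:, loc], index=df.index, columns=result_columns
)

# Approximation of the new code: select positionally, then relabel.
via_iloc = df.iloc[:, loc]
via_iloc.columns = result_columns

print(via_values.dtypes)
print(via_iloc.dtypes)
```

With the fix in place, `df["A"]["B"].dtype` is expected to behave like the `via_iloc` result rather than the `via_values` one.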
20 changes: 20 additions & 0 deletions pandas/tests/indexing/multiindex/test_multiindex.py
@@ -6,12 +6,14 @@

 import pandas as pd
 from pandas import (
+    CategoricalDtype,
     DataFrame,
     Index,
     MultiIndex,
     Series,
 )
 import pandas._testing as tm
+from pandas.core.arrays.boolean import BooleanDtype


 class TestMultiIndexBasic:
@@ -206,3 +208,21 @@ def test_multiindex_with_na_missing_key(self):
         )
         with pytest.raises(KeyError, match="missing_key"):
             df[[("missing_key",)]]
+
+    def test_multiindex_dtype_preservation(self):
+        # GH51261
+        columns = MultiIndex.from_tuples([("A", "B")], names=["lvl1", "lvl2"])
+        df = DataFrame(["value"], columns=columns).astype("category")
+        df_no_multiindex = df["A"]
+        assert isinstance(df_no_multiindex["B"].dtype, CategoricalDtype)
+
+        # geopandas 1763 analogue
+        df = DataFrame(
+            [[1, 0], [0, 1]],
+            columns=[
+                ["foo", "foo"],
+                ["location", "location"],
+                ["x", "y"],
+            ],
+        ).assign(bools=Series([True, False], dtype="boolean"))
+        assert isinstance(df["bools"].dtype, BooleanDtype)
