BUG: Regression in DataFrame.__setitem__ with DataFrame, Categorical level of MultiIndex in pandas 3.x #62518

@TomAugspurger

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

meta = pd.DataFrame(columns=pd.MultiIndex.from_arrays([['a', 'a', 'z', 'z'], pd.Categorical([1, 2, 1, 2])]), dtype=object)
meta['z'] = meta['z'].astype('int64')

Issue Description

The above script runs fine with pandas==2.3.3:

❯ uv run --python=3.13 --with pandas python debug.py  # no errors

But with the pandas nightly build, it raises an error:

❯ uv run --python=3.13 --extra-index-url=https://pypi.anaconda.org/scientific-python-nightly-wheels/simple --prerelease=allow --with "pandas>=3.0.0.dev0" python debug.py

with the following traceback:

Traceback (most recent call last):
  File "/Users/toaugspurger/gh/debug.py", line 4, in <module>
    meta['z'] = meta['z'].astype('int64')
    ~~~~^^^^^
  File "/Users/toaugspurger/.cache/uv/archive-v0/UTaM244H3ORq6tmIZVRZk/lib/python3.13/site-packages/pandas/core/frame.py", line 4322, in __setitem__
    self._set_item_frame_value(key, value)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
  File "/Users/toaugspurger/.cache/uv/archive-v0/UTaM244H3ORq6tmIZVRZk/lib/python3.13/site-packages/pandas/core/frame.py", line 4457, in _set_item_frame_value
    and not cols_droplevel.any()
            ~~~~~~~~~~~~~~~~~~^^
  File "/Users/toaugspurger/.cache/uv/archive-v0/UTaM244H3ORq6tmIZVRZk/lib/python3.13/site-packages/pandas/core/indexes/base.py", line 7285, in any
    return vals._reduce("any")
           ~~~~~~~~~~~~^^^^^^^
  File "/Users/toaugspurger/.cache/uv/archive-v0/UTaM244H3ORq6tmIZVRZk/lib/python3.13/site-packages/pandas/core/arrays/categorical.py", line 2425, in _reduce
    result = super()._reduce(name, skipna=skipna, keepdims=keepdims, **kwargs)
  File "/Users/toaugspurger/.cache/uv/archive-v0/UTaM244H3ORq6tmIZVRZk/lib/python3.13/site-packages/pandas/core/arrays/base.py", line 2225, in _reduce
    raise TypeError(
    ...<2 lines>...
    )
TypeError: 'Categorical' with dtype category does not support operation 'any'
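
To narrow this down, here is a minimal sketch of the failing reduction in isolation. It assumes that `cols_droplevel` in the traceback is the column index with its first level dropped (the name suggests this, but I have not confirmed it against the pandas source). The helper `any_raises` is hypothetical and only exists for this illustration.

```python
import pandas as pd

# Same column layout as the reproducer: a plain level plus a Categorical level.
columns = pd.MultiIndex.from_arrays(
    [["a", "a", "z", "z"], pd.Categorical([1, 2, 1, 2])]
)

# Assumption: this mirrors what `cols_droplevel` holds in the traceback.
inner = columns.droplevel(0)
print(type(inner).__name__, inner.dtype)


def any_raises(index):
    """Hypothetical helper: report whether `index.any()` raises TypeError."""
    try:
        index.any()
    except TypeError as exc:
        print(f"TypeError: {exc}")
        return True
    return False


raised = any_raises(inner)
print("any() raised:", raised)
```

If `raised` is true on a given pandas version, the `any` reduction on the categorical level is unsupported on its own, and the regression would come from `__setitem__` newly reaching that call rather than from a change in `Categorical` itself.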

Expected Behavior

Ehh, I don't know; this is some odd code. In https://github.com/dask/dask/blob/3fb02e610cf2baf3ee570888e5e49159ee60c0e5/dask/dataframe/dask_expr/_reductions.py#L694-L701, dask is attempting to create an empty DataFrame with certain dtypes. I can try to update that to work around this if needed.
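
For reference, one possible shape for such a workaround is sketched below. It is not what dask does today, and I have not run it against the nightly build; it only avoids assigning a DataFrame through `__setitem__`, which is the code path in the traceback. The `dtypes` mapping and the integer placeholder keys are assumptions made for this illustration.

```python
import pandas as pd

columns = pd.MultiIndex.from_arrays(
    [["a", "a", "z", "z"], pd.Categorical([1, 2, 1, 2])]
)

# Assumed target dtypes, keyed by the top-level column label.
dtypes = {"a": object, "z": "int64"}

# Build each empty column with its final dtype up front, using positional
# placeholder keys, so no DataFrame is ever assigned via __setitem__.
data = {
    position: pd.Series(dtype=dtypes[top])
    for position, (top, _) in enumerate(columns)
}
meta = pd.DataFrame(data)

# Swap the placeholder labels for the real MultiIndex in one step.
meta.columns = columns

print(meta.dtypes)
```

The trade-off is that the dtypes must be known when the frame is built, instead of being patched in afterwards with `astype`.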

Installed Versions

3.0.0.dev0+2463.gd8159471b1

Labels

Bug, MultiIndex, Regression (functionality that used to work in a prior pandas version)
