BUG: Assigning Categorical to parts of column hangs indefinitely #52927

MarcoGorelli · 2023-04-26T11:01:51Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

df = pd.DataFrame({"a": [1, 1, 1, 1, 1], "b": ["a", "a", "a", "a", "a"]})
df.loc[1:2, "a"] = pd.Categorical([2, 2], categories=[1, 2])

Issue Description

It hangs forever, or at least longer than I'm patient enough to wait.

Expected Behavior

Based on pandas/doc/source/user_guide/categorical.rst (line 779 in 272b735):

    Assigning a ``Categorical`` to parts of a column of other types will use the values:

the expected result is:

In [8]: df
Out[8]: 
   a  b
0  1  a
1  2  a
2  2  a
3  1  a
4  1  a
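A possible workaround, not proposed anywhere in this thread, is to convert the Categorical to a plain NumPy array before assigning, so that only its values reach the setitem path. The sketch below assumes the categories share a single dtype that the target column can hold; it is an illustration rather than a fix for the underlying bug.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 1, 1, 1], "b": ["a", "a", "a", "a", "a"]})
cat = pd.Categorical([2, 2], categories=[1, 2])

# np.asarray materialises the categorical's values (here, integers),
# so the assignment sees an ordinary array instead of a Categorical.
df.loc[1:2, "a"] = np.asarray(cat)

print(df)
```

With this conversion, column "a" should end up holding the values shown in the expected output above.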

Installed Versions

INSTALLED VERSIONS

commit : 2e218d1
python : 3.8.16.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.102.1-microsoft-standard-WSL2
Version : #1 SMP Wed Mar 2 00:30:59 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.5.3
numpy : 1.24.2
pytz : 2022.6
dateutil : 2.8.2
setuptools : 65.5.1
pip : 22.3.1
Cython : 0.29.33
pytest : 7.2.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.6.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.6.2
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.10.1
snappy : None
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : 2023.1.0
xlrd : None
xlwt : None
zstandard : None
tzdata : 2022.7
None


phofl · 2023-04-26T11:59:59Z

I am getting a RecursionError at least

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/internals/blocks.py", line 1047, in setitem
    casted = np_can_hold_element(values.dtype, value)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/dtypes/cast.py", line 1739, in np_can_hold_element
    raise LossySetitemError
pandas.errors.LossySetitemError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/patrick/Library/Application Support/JetBrains/PyCharm2023.1/scratches/scratch.py", line 473, in <module>
    df.loc[1:2, "a"] = pd.Categorical([2, 2], categories=[1, 2])
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/indexing.py", line 834, in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/indexing.py", line 1813, in _setitem_with_indexer
    self._setitem_with_indexer_split_path(indexer, value, name)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/indexing.py", line 1857, in _setitem_with_indexer_split_path
    self._setitem_single_column(ilocs[0], value, pi)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/indexing.py", line 2012, in _setitem_single_column
    self.obj._mgr.column_setitem(loc, plane_indexer, value)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/internals/managers.py", line 1378, in column_setitem
    new_mgr = col_mgr.setitem((idx,), value)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/internals/managers.py", line 399, in setitem
    return self.apply("setitem", indexer=indexer, value=value)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/internals/managers.py", line 357, in apply
    applied = getattr(b, f)(**kwargs)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/internals/blocks.py", line 1051, in setitem
    return nb.setitem(indexer, value)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/internals/blocks.py", line 1051, in setitem
    return nb.setitem(indexer, value)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/internals/blocks.py", line 1051, in setitem
    return nb.setitem(indexer, value)
  [Previous line repeated 980 more times]
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/internals/blocks.py", line 1050, in setitem
    nb = self.coerce_to_target_dtype(value)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/internals/blocks.py", line 425, in coerce_to_target_dtype
    return self.astype(new_dtype, copy=False)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/internals/blocks.py", line 513, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/dtypes/astype.py", line 239, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/dtypes/astype.py", line 174, in astype_array
    if is_dtype_equal(values.dtype, dtype):
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/dtypes/common.py", line 609, in is_dtype_equal
    source = _get_dtype(source)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/dtypes/common.py", line 1417, in _get_dtype
    if isinstance(arr_or_dtype, np.dtype):
RecursionError: maximum recursion depth exceeded while calling a Python object

jbrockmendel · 2023-04-27T21:05:27Z

Haven't looked at the underlying issue, but RecursionErrors in this chunk of code have come up a few times. Might be worth adding a check in coerce_to_target_dtype:

if self.dtype == new_dtype:
    raise AssertionError("Something has gone wrong. Please report a bug")
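To illustrate why such a check turns the hang into an immediate error, here is a toy model of the retry loop visible in the traceback. `ToyBlock`, `ToyCategorical`, and their methods are hypothetical stand-ins written for this sketch; they are not pandas internals, and the dtype rules are deliberately simplified.

```python
class LossySetitemError(Exception):
    """Toy stand-in for pandas.errors.LossySetitemError."""


class ToyCategorical:
    """Hypothetical placeholder for a categorical value with integer categories."""

    def __init__(self, values):
        self.values = list(values)


class ToyBlock:
    """Heavily simplified model of a block; not actual pandas code."""

    def __init__(self, values, dtype):
        self.values = list(values)
        self.dtype = dtype

    def _check_can_hold(self, value):
        # Object blocks accept anything; int64 blocks accept only plain ints.
        if self.dtype == "object":
            return
        if self.dtype == "int64" and isinstance(value, int):
            return
        raise LossySetitemError

    def _find_result_dtype(self, value):
        # Models the problematic case: for a categorical with integer
        # categories, the "common" dtype equals the block's current dtype.
        if isinstance(value, ToyCategorical):
            return "int64"
        return "object"

    def coerce_to_target_dtype(self, value):
        new_dtype = self._find_result_dtype(value)
        if self.dtype == new_dtype:
            # The guard proposed above: without it, setitem below would
            # keep retrying on an identical block until RecursionError.
            raise AssertionError("Something has gone wrong. Please report a bug")
        return ToyBlock(self.values, new_dtype)

    def setitem(self, index, value):
        try:
            self._check_can_hold(value)
        except LossySetitemError:
            nb = self.coerce_to_target_dtype(value)
            return nb.setitem(index, value)
        self.values[index] = value
        return self


# A legitimate upcast still works: int64 -> object.
upcast = ToyBlock([1, 1, 1], "int64").setitem(1, "x")

# The no-progress case now fails fast instead of recursing.
try:
    ToyBlock([1, 1, 1], "int64").setitem(1, ToyCategorical([2]))
    outcome = "no error"
except AssertionError as err:
    outcome = f"AssertionError: {err}"

print(upcast.dtype, upcast.values)
print(outcome)
```

The point of the guard is that a coercion which does not change the dtype cannot make the retry succeed, so raising there is cheaper and more informative than exhausting the stack.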

Raphencoder · 2023-06-27T09:17:31Z

@jbrockmendel I see the same behavior, with both the RecursionError and the pandas.errors.LossySetitemError. This bug is really annoying because the traceback doesn't point to the problematic line of code that needs to be changed.
Are there any recommendations or tips for debugging this kind of behavior?

jbrockmendel · 2023-07-02T16:34:13Z

@Raphencoder short-term I'd suggest patching Block.coerce_to_target_dtype as suggested above. Longer-term this is something we'll have to fix.
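On the debugging question, one generic option (independent of the patch suggested above) is to catch the RecursionError and pick out the deepest traceback frame that lies outside the library. The sketch below uses only the standard-library `traceback` module; `last_frame_outside` is a hypothetical helper name, and it assumes library files can be recognised by a `/pandas/` path component, which would misfire for a script stored in a directory of that name.

```python
import traceback


def last_frame_outside(exc, package="pandas"):
    """Return the deepest traceback frame whose file is not inside `package`.

    Hypothetical helper for illustration; not part of pandas.
    """
    marker = f"/{package}/"
    frames = traceback.extract_tb(exc.__traceback__)
    outside = [
        f for f in frames if marker not in f.filename.replace("\\", "/")
    ]
    return outside[-1] if outside else None


def demo():
    # Stand-in for a call that recurses without bound.
    def recurse(n):
        return recurse(n + 1)

    try:
        recurse(0)
    except RecursionError as exc:
        return last_frame_outside(exc)


frame = demo()
print(frame.filename, frame.lineno, frame.name)
```

In real use, the `try` block would wrap the suspect assignment, and the reported file and line number would identify the calling line in user code.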

MarcoGorelli added the Bug, Needs Triage, and Categorical labels and removed the Needs Triage label on Apr 26, 2023

MarcoGorelli mentioned this issue Apr 26, 2023

PDEP6 implementation pt 1: block.setitem, block.putmask #50626

Merged

jbrockmendel mentioned this issue Sep 19, 2023

BUG: RecursionError in loc.setitem #55201

Merged

5 tasks

mroeschke closed this as completed in #55201 Sep 19, 2023

LucaMarconato mentioned this issue Oct 28, 2023

DataFrameView in pandas 2.1.2 changed behavior, breaking AnnData in several ways scverse/anndata#1210

Closed

3 tasks

rob-sil mentioned this issue Dec 13, 2023

BUG: Infinite recursion with outer merge on categorical key column if left and right dtypes are mismatching non-nullable numeric kinds #56376

Open

3 tasks
