BUG: Assigning Categorical to parts of column hangs indefinitely #52927

Closed
3 tasks done
MarcoGorelli opened this issue Apr 26, 2023 · 4 comments · Fixed by #55201
Labels
Bug Categorical Categorical Data Type

Comments

@MarcoGorelli
Member

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
df = pd.DataFrame({"a": [1, 1, 1, 1, 1], "b": ["a", "a", "a", "a", "a"]})
df.loc[1:2, "a"] = pd.Categorical([2, 2], categories=[1, 2])

Issue Description

It hangs forever. Or at least, longer than I'm patient enough to wait

Expected Behavior

Based on this line from the pandas docs:

Assigning a ``Categorical`` to parts of a column of other types will use the values:

In [8]: df
Out[8]: 
   a  b
0  1  a
1  2  a
2  2  a
3  1  a
4  1  a
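
Until this is fixed, one possible workaround is to hand `.loc` the plain values rather than the Categorical itself. This was not proposed in the thread, so treat it as an assumption: a minimal sketch that converts the Categorical with `np.asarray` first, so the setitem path only ever sees an integer array.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 1, 1, 1], "b": ["a", "a", "a", "a", "a"]})
cat = pd.Categorical([2, 2], categories=[1, 2])

# Assumption: converting to a NumPy array of the category values up front
# sidesteps the Categorical-specific setitem path that hangs.
df.loc[1:2, "a"] = np.asarray(cat)

print(df)
```

This only makes sense when you want the values and do not need the column to become categorical.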

Installed Versions

INSTALLED VERSIONS

commit : 2e218d1
python : 3.8.16.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.102.1-microsoft-standard-WSL2
Version : #1 SMP Wed Mar 2 00:30:59 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.5.3
numpy : 1.24.2
pytz : 2022.6
dateutil : 2.8.2
setuptools : 65.5.1
pip : 22.3.1
Cython : 0.29.33
pytest : 7.2.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.6.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.6.2
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.10.1
snappy : None
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : 2023.1.0
xlrd : None
xlwt : None
zstandard : None
tzdata : 2022.7
None

MarcoGorelli added the Bug, Needs Triage, and Categorical labels and removed the Needs Triage label on Apr 26, 2023
@phofl
Member

phofl commented Apr 26, 2023

I am getting a RecursionError at least

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/internals/blocks.py", line 1047, in setitem
    casted = np_can_hold_element(values.dtype, value)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/dtypes/cast.py", line 1739, in np_can_hold_element
    raise LossySetitemError
pandas.errors.LossySetitemError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/patrick/Library/Application Support/JetBrains/PyCharm2023.1/scratches/scratch.py", line 473, in <module>
    df.loc[1:2, "a"] = pd.Categorical([2, 2], categories=[1, 2])
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/indexing.py", line 834, in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/indexing.py", line 1813, in _setitem_with_indexer
    self._setitem_with_indexer_split_path(indexer, value, name)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/indexing.py", line 1857, in _setitem_with_indexer_split_path
    self._setitem_single_column(ilocs[0], value, pi)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/indexing.py", line 2012, in _setitem_single_column
    self.obj._mgr.column_setitem(loc, plane_indexer, value)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/internals/managers.py", line 1378, in column_setitem
    new_mgr = col_mgr.setitem((idx,), value)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/internals/managers.py", line 399, in setitem
    return self.apply("setitem", indexer=indexer, value=value)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/internals/managers.py", line 357, in apply
    applied = getattr(b, f)(**kwargs)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/internals/blocks.py", line 1051, in setitem
    return nb.setitem(indexer, value)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/internals/blocks.py", line 1051, in setitem
    return nb.setitem(indexer, value)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/internals/blocks.py", line 1051, in setitem
    return nb.setitem(indexer, value)
  [Previous line repeated 980 more times]
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/internals/blocks.py", line 1050, in setitem
    nb = self.coerce_to_target_dtype(value)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/internals/blocks.py", line 425, in coerce_to_target_dtype
    return self.astype(new_dtype, copy=False)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/internals/blocks.py", line 513, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/dtypes/astype.py", line 239, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/dtypes/astype.py", line 174, in astype_array
    if is_dtype_equal(values.dtype, dtype):
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/dtypes/common.py", line 609, in is_dtype_equal
    source = _get_dtype(source)
  File "/Users/patrick/PycharmProjects/pandas/pandas/core/dtypes/common.py", line 1417, in _get_dtype
    if isinstance(arr_or_dtype, np.dtype):
RecursionError: maximum recursion depth exceeded while calling a Python object

@jbrockmendel
Member

Haven't looked at the underlying issue, but RecursionErrors in this chunk of code have come up a few times. Might be worth adding a check in coerce_to_target_dtype:

if self.dtype == new_dtype:
    raise AssertionError("Something has gone wrong. Please report a bug")
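
To illustrate why a check like this helps, here is a toy model of the retry loop visible in the traceback above (setitem fails, the block is coerced, setitem is retried on the new block). It is not pandas code: `ToyBlock`, `can_hold`, `find_result_dtype` and the `guarded` flag are all made up for the sketch, and the "bug" is simulated by a coercion that never changes the dtype.

```python
class LossySetitemError(Exception):
    """Stand-in for pandas.errors.LossySetitemError."""


class ToyBlock:
    """Toy model of a block; hypothetical, not the pandas implementation."""

    def __init__(self, values, dtype, guarded=False):
        self.values = list(values)
        self.dtype = dtype
        self.guarded = guarded

    def can_hold(self, value):
        # Pretend an "int64" block can only hold plain Python ints.
        if self.dtype == "int64" and not isinstance(value, int):
            raise LossySetitemError
        return value

    def find_result_dtype(self, value):
        # Deliberately buggy: always returns the current dtype, so
        # coercion makes no progress.
        return self.dtype

    def coerce_to_target_dtype(self, value):
        new_dtype = self.find_result_dtype(value)
        # The suggested guard: coercing to the same dtype cannot help,
        # so fail immediately instead of letting setitem retry forever.
        if self.guarded and new_dtype == self.dtype:
            raise AssertionError("Something has gone wrong. Please report a bug")
        return ToyBlock(self.values, new_dtype, self.guarded)

    def setitem(self, index, value):
        try:
            casted = self.can_hold(value)
        except LossySetitemError:
            # Same shape as the traceback: coerce, then retry on the new block.
            nb = self.coerce_to_target_dtype(value)
            return nb.setitem(index, value)
        self.values[index] = casted
        return self


def outcome(guarded):
    """Return the name of the exception raised by a lossy setitem."""
    try:
        ToyBlock([1, 1, 1], "int64", guarded=guarded).setitem(1, "x")
    except (RecursionError, AssertionError) as err:
        return type(err).__name__
    return "ok"


print(outcome(guarded=False))
print(outcome(guarded=True))
```

Without the guard the toy recurses until the interpreter gives up; with it, the failure surfaces on the first coercion, right next to the code that made no progress.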

@Raphencoder

Raphencoder commented Jun 27, 2023

@jbrockmendel I see the same behavior, with both the RecursionError and the pandas.errors.LossySetitemError. This bug is really annoying because the traceback doesn't point to the problematic line of code that needs to be changed.
Is there any recommendation or tip for debugging this kind of behavior?

@jbrockmendel
Member

@Raphencoder Short-term, I'd suggest patching Block.coerce_to_target_dtype as suggested here. Longer-term, this is something we'll have to fix.
