BUG: Arithmetic operators in 1.5 throw RecursionError when dataframes have a mix of str and string typed levels #49769

Closed
3 tasks done
dk1010101 opened this issue Nov 18, 2022 · 2 comments · Fixed by #49776
Labels
Bug NA - MaskedArrays Related to pd.NA and nullable extension arrays Numeric Operations Arithmetic, Comparison, and Logical operations Regression Functionality that used to work in a prior pandas version

Comments

@dk1010101

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
from calendar import month_abbr
import numpy as np

break_stuff = True

pd.show_versions()

midx = pd.MultiIndex.from_tuples([(yr, mo) for yr in ['2010', '2021'] for mo in [month_abbr[i] for i in range(1, 13)]], names=['MI1', 'MI2'])

if break_stuff:
    midx2 = midx.set_levels([midx.levels[0].astype('string'), midx.levels[1]])  # change outer level dtype to string
else:
    midx2 = midx

idx = pd.Index(['a', 'b', 'c', 'd'], name='Category')
data = np.random.rand(len(idx), len(midx))
x = pd.DataFrame(data, index=idx, columns=midx)
y = pd.DataFrame(data, index=idx, columns=midx2) 

z = y - x  # boom!

Issue Description

If two identically structured DataFrames differ only in that some column labels are of type str (object dtype) in one and of type string in the other, any arithmetic operation using operators (confirmed with +, -, /, *) results in infinite recursion: RecursionError: maximum recursion depth exceeded while calling a Python object. This did not happen with 1.4.X.

The recursion happens in frame_arith_method_with_reindex.py line 366, from what I can see.
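
To make the str versus string difference visible, here is a minimal sketch that builds a smaller pair of indexes than the example above (the two-month `from_product` index is an assumption made for brevity) and prints the dtype of each level. The labels compare equal as plain Python tuples; only the dtype of the outer level differs.

```python
import pandas as pd

# Smaller stand-in for the MultiIndex in the report (assumed shape, same idea).
midx = pd.MultiIndex.from_product(
    [["2010", "2021"], ["Jan", "Feb"]], names=["MI1", "MI2"]
)
# Cast only the outer level to the "string" extension dtype, as in the report.
midx2 = midx.set_levels([midx.levels[0].astype("string"), midx.levels[1]])

# Show the dtype of every level for both indexes.
for name, index in [("midx", midx), ("midx2", midx2)]:
    print(name, [str(level.dtype) for level in index.levels])

# The labels are identical as Python tuples ...
same_labels = list(midx) == list(midx2)
# ... but the outer-level dtypes are not the same.
same_outer_dtype = midx2.levels[0].dtype == midx.levels[0].dtype
print("same labels:", same_labels, "| same outer dtype:", same_outer_dtype)
```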

Expected Behavior

Expected behaviour is that the operators work; in the above example, y - x should return a DataFrame of all zeros.
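
Until a fix is available, one possible workaround is to give both frames the same column level dtypes before doing arithmetic. The sketch below is an assumption, not something confirmed against 1.5.x here: `columns_with_object_levels` is a hypothetical helper, and it should put `y` on the same footing as the `break_stuff = False` path of the example.

```python
import numpy as np
import pandas as pd


def columns_with_object_levels(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: copy df and cast every column level to object dtype."""
    new_levels = [level.astype(object) for level in df.columns.levels]
    out = df.copy()
    out.columns = df.columns.set_levels(new_levels)
    return out


# Reduced version of the setup in the report (three months instead of twelve).
midx = pd.MultiIndex.from_product(
    [["2010", "2021"], ["Jan", "Feb", "Mar"]], names=["MI1", "MI2"]
)
midx2 = midx.set_levels([midx.levels[0].astype("string"), midx.levels[1]])

idx = pd.Index(["a", "b", "c", "d"], name="Category")
data = np.random.default_rng(0).random((len(idx), len(midx)))
x = pd.DataFrame(data, index=idx, columns=midx)
y = pd.DataFrame(data, index=idx, columns=midx2)

# Normalise both sides so the column indexes share level dtypes, then subtract.
z = columns_with_object_levels(y) - columns_with_object_levels(x)
print(z)
```

Casting to object rather than to string is an arbitrary choice; what matters is that both frames end up with matching level dtypes.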

Installed Versions

INSTALLED VERSIONS

commit : 91111fd
python : 3.9.13.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19045
machine : AMD64
processor : Intel64 Family 6 Model 85 Stepping 7, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : English_United Kingdom.1252

pandas : 1.5.1
numpy : 1.23.4
pytz : 2022.6
dateutil : 2.8.2
setuptools : 58.1.0
pip : 22.3.1
Cython : None
pytest : 7.2.0
hypothesis : None
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.5.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : 1.3.5
brotli : None
fastparquet : 0.8.3
fsspec : 2022.11.0
gcsfs : None
matplotlib : None
numba : None
numexpr : 2.8.4
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 10.0.0
pyreadstat : None
pyxlsb : 1.0.10
s3fs : None
scipy : 1.9.1
snappy : None
sqlalchemy : None
tables : 3.7.0
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
zstandard : None
tzdata : None

@dk1010101 dk1010101 added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 18, 2022
@MarcoGorelli
Member

MarcoGorelli commented Nov 18, 2022

Thanks @dk1010101 for the report

git bisect shows:

443f2b1 is the first bad commit

BUG: Multiindex.equals not commutative for ea dtype (#46047)

https://www.kaggle.com/code/marcogorelli/pandas-regression-example?scriptVersionId=111378614
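
Since the bisected commit touches MultiIndex.equals, a small probe like the following can show how the two column indexes compare on a given pandas version. It is only a sketch: the reduced index is an assumption, the result may differ between versions, and it does not by itself establish where the recursion comes from.

```python
import pandas as pd

# Reduced pair of indexes: same labels, outer level object vs "string" dtype.
midx = pd.MultiIndex.from_product(
    [["2010", "2021"], ["Jan", "Feb"]], names=["MI1", "MI2"]
)
midx2 = midx.set_levels([midx.levels[0].astype("string"), midx.levels[1]])

# Compare in both directions, since the commit title is about commutativity.
forward = bool(midx.equals(midx2))
backward = bool(midx2.equals(midx))
print("pandas", pd.__version__)
print("midx.equals(midx2):", forward)
print("midx2.equals(midx):", backward)
```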

@MarcoGorelli
Member

cc @phofl

@phofl phofl self-assigned this Nov 18, 2022
@phofl phofl added this to the 1.5.3 milestone Nov 18, 2022
@phofl phofl added Regression Functionality that used to work in a prior pandas version Numeric Operations Arithmetic, Comparison, and Logical operations NA - MaskedArrays Related to pd.NA and nullable extension arrays and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 18, 2022
@phofl phofl modified the milestones: 1.5.3, 1.5.2 Nov 18, 2022