BUG: pd.Index.Intersection fails with multiple data frames #58818

Closed
2 of 3 tasks
Ge0rges opened this issue May 24, 2024 · 3 comments
Labels
Bug · Closing Candidate (may be closeable, needs more eyeballs) · Index (related to the Index class or subclasses) · setops (union, intersection, difference, symmetric_difference)

Comments

@Ge0rges

Ge0rges commented May 24, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

df1 = pd.DataFrame({"name": ["1", "3", "c"], "value": [1, 2, 3]})
df2 = pd.DataFrame({"name": ["b", "c", "d"], "value": [4, 5, 6]})
df3 = pd.DataFrame({"name": ["c", "d", "e"], "value": [7, 8, 9]})

df1.set_index("name", inplace=True)
df2.set_index("name", inplace=True)
df3.set_index("name", inplace=True)

common_index = df1.index.intersection([df1.index, df2.index, df3.index])


### Issue Description

`Index.intersection` unexpectedly raises a `TypeError` when passed a list of indexes, with this traceback:

Traceback (most recent call last):
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1534, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/GeorgesKanaan/Documents/Development/Methylation/willis_dmr_analysis.py", line 158, in <module>
common_index = df1.index.intersection([df1.index, df2.index, df3.index])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/GeorgesKanaan/micromamba/envs/jupyter/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 3532, in intersection
result = self._intersection(other, sort=sort)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/GeorgesKanaan/micromamba/envs/jupyter/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 3561, in _intersection
res_values = self._intersection_via_get_indexer(other, sort=sort)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/GeorgesKanaan/micromamba/envs/jupyter/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 3582, in _intersection_via_get_indexer
right_unique = other.unique()
^^^^^^^^^^^^^^
File "/Users/GeorgesKanaan/micromamba/envs/jupyter/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 3065, in unique
if self.is_unique:
^^^^^^^^^^^^^^
File "properties.pyx", line 36, in pandas._libs.properties.CachedProperty.get
File "/Users/GeorgesKanaan/micromamba/envs/jupyter/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 2346, in is_unique
return self._engine.is_unique
^^^^^^^^^^^^^^^^^^^^^^
File "index.pyx", line 266, in pandas._libs.index.IndexEngine.is_unique.get
File "index.pyx", line 271, in pandas._libs.index.IndexEngine._do_unique_check
File "index.pyx", line 333, in pandas._libs.index.IndexEngine._ensure_mapping_populated
File "pandas/_libs/hashtable_class_helper.pxi", line 7115, in pandas._libs.hashtable.PyObjectHashTable.map_locations
TypeError: unhashable type: 'Index'


### Expected Behavior

I expected to get an Index of the labels common to all of the indexes passed.

### Installed Versions

<details>

INSTALLED VERSIONS
------------------
commit                : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python                : 3.12.3.final.0
python-bits           : 64
OS                    : Darwin
OS-release            : 23.5.0
Version               : Darwin Kernel Version 23.5.0: Wed May  1 20:16:51 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T8103
machine               : arm64
processor             : arm
byteorder             : little
LC_ALL                : None
LANG                  : en_US.UTF-8
LOCALE                : en_US.UTF-8

pandas                : 2.2.2
numpy                 : 1.26.4
pytz                  : 2023.3.post1
dateutil              : 2.8.2
setuptools            : 68.2.2
pip                   : 23.3.1
Cython                : 3.0.10
pytest                : None
hypothesis            : None
sphinx                : None
blosc                 : None
feather               : None
xlsxwriter            : None
lxml.etree            : None
html5lib              : None
pymysql               : None
psycopg2              : None
jinja2                : 3.1.2
IPython               : 8.17.2
pandas_datareader     : None
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : 4.12.2
bottleneck            : None
dataframe-api-compat  : None
fastparquet           : None
fsspec                : None
gcsfs                 : None
matplotlib            : 3.8.4
numba                 : None
numexpr               : None
odfpy                 : None
openpyxl              : 3.1.2
pandas_gbq            : None
pyarrow               : None
pyreadstat            : None
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : 1.13.0
sqlalchemy            : None
tables                : None
tabulate              : None
xarray                : None
xlrd                  : None
zstandard             : None
tzdata                : 2023.3
qtpy                  : None
pyqt5                 : None
</details>
@Ge0rges Ge0rges added the Bug and Needs Triage (issue that has not been reviewed by a pandas team member) labels May 24, 2024
@Aloqeely
Member

Thanks for the report! `Index.intersection` compares only two Index objects at a time, so you can't call it directly on three or more indexes. You can, however, chain the method (call it again on each result) to find the intersection of more than two indexes.

common_index = df1.index.intersection(df2.index).intersection(df3.index)
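
For an arbitrary number of indexes, the same chaining can be expressed as a fold. The following is a minimal sketch, not part of pandas: `intersect_all` is a hypothetical helper name, and it assumes every element is already a `pd.Index`.

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame({"name": ["1", "3", "c"], "value": [1, 2, 3]}).set_index("name")
df2 = pd.DataFrame({"name": ["b", "c", "d"], "value": [4, 5, 6]}).set_index("name")
df3 = pd.DataFrame({"name": ["c", "d", "e"], "value": [7, 8, 9]}).set_index("name")


def intersect_all(indexes):
    """Hypothetical helper: pairwise-intersect a non-empty sequence of Index objects."""
    indexes = list(indexes)
    if not indexes:
        raise ValueError("need at least one Index to intersect")
    # Equivalent to indexes[0].intersection(indexes[1]).intersection(indexes[2])...
    return reduce(lambda left, right: left.intersection(right), indexes)


common_index = intersect_all([df1.index, df2.index, df3.index])
print(list(common_index))
```

With the frames from the report, only the label `"c"` appears in all three indexes.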

@Aloqeely Aloqeely added the Index (related to the Index class or subclasses), Closing Candidate (may be closeable, needs more eyeballs), and setops (union, intersection, difference, symmetric_difference) labels and removed the Needs Triage (issue that has not been reviewed by a pandas team member) label May 24, 2024
@tiago-firmino

take

@Ge0rges
Author

Ge0rges commented May 24, 2024

Thank you for the clarification. I misunderstood the docs where the input is described as array-like; my mistake.
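
To illustrate the distinction, here is a small sketch of what an array-like argument means in this context, as the thread describes it: a single collection of labels, not a collection of Index objects. The variable names are illustrative only.

```python
import pandas as pd

idx = pd.Index(["b", "c", "d"], name="name")

# The argument is one array-like of labels; it is treated as a single "other" index.
common = idx.intersection(["c", "d", "e"])
print(sorted(common))
```

Here the list `["c", "d", "e"]` plays the role of the second index, so the result holds the labels `"c"` and `"d"`.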

@Ge0rges Ge0rges closed this as completed May 24, 2024
@tiago-firmino tiago-firmino removed their assignment May 24, 2024