Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Getting "Out of bounds on buffer access" error when .loc indexing a non-unique index with Int64 dtype #57027

Closed
3 tasks done
dalejung opened this issue Jan 23, 2024 · 0 comments · Fixed by #57061
Closed
3 tasks done
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves Regression Functionality that used to work in a prior pandas version
Milestone

Comments

@dalejung
Copy link
Contributor

dalejung commented Jan 23, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

ids = list(range(11))
index = pd.Index(ids * 1000)  # create a non-unique index
df = pd.DataFrame({'val': row_keys * 2}, index=index)
df.index = df.index.astype('Int64')
df.loc[ids]  # Errors

df.loc[ids[:5]]  # does not error since resulting DataFrame is 5k rows.

Issue Description

Indexing a DataFrame with an Int64 index will error with IndexError: Out of bounds on buffer access (axis 0) if the resulting dataframe has more than 10k values.

This does not error when the index is a regular int64 dtype. This does not occur if the index is unique.

Expected Behavior

Return the indexed dataframe. This works on 2.1.4.

Installed Versions

INSTALLED VERSIONS

commit : 5bd2c2e
python : 3.11.3.final.0
python-bits : 64
OS : Linux
OS-release : 6.7.0-arch3-1
Version : #1 SMP PREEMPT_DYNAMIC Sat, 13 Jan 2024 14:37:14 +0000
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 3.0.0.dev0+167.g499662fa3d.dirty
numpy : 1.24.4
pytz : 2023.3
dateutil : 2.8.2
setuptools : 65.3.0
pip : 22.2.2
Cython : 3.0.8
pytest : 7.4.0
hypothesis : 6.47.1
sphinx : 6.2.1
blosc : None
feather : None
xlsxwriter : 3.1.0
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : 1.0.3
psycopg2 : 2.9.6
jinja2 : 3.1.2
IPython : 8.18.0.dev
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.2
bottleneck : 1.3.7
dataframe-api-compat : None
fastparquet : 2023.4.0
fsspec : 2023.5.0
gcsfs : 2023.5.0
matplotlib : 3.7.1
numba : 0.57.0
numexpr : 2.8.4
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 15.0.0.dev408+g0b5ef14f3
pyreadstat : 1.2.1
python-calamine : None
pyxlsb : 1.0.10
s3fs : 2023.5.0
scipy : 1.10.1
sqlalchemy : 2.0.14
tables : 3.8.0
tabulate : 0.9.0
xarray : 2023.5.0
xlrd : 2.0.1
zstandard : 0.21.0
tzdata : 2023.3
qtpy : 2.4.0
pyqt5 : None

@dalejung dalejung added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 23, 2024
@lukemanley lukemanley added Indexing Related to indexing on series/frames, not to indexes themselves Regression Functionality that used to work in a prior pandas version and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 25, 2024
@lukemanley lukemanley added this to the 2.2.1 milestone Jan 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants