BUG: get_indexer methods return int64 instead of intp arrays #36359

Closed
alexhlim opened this issue Sep 14, 2020 · 1 comment · Fixed by #36431
Labels
Bug, Indexing

Comments

@alexhlim
Member

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

>>> import pandas as pd
>>> ax1 = pd.Index([1, 2, 3])
>>> ax2 = pd.Index([1, 1, 2])
>>> ans1 = ax1.get_indexer([1])
>>> ans2 = ax2.get_indexer_non_unique([1])
>>> print(ans1, ans1.dtype)
[0] int64
>>> print(ans2[0], ans2[0].dtype, ans2[1], ans2[1].dtype)
[0 1] int64 [] int64

Problem description

Found in #35498. Looking at the implementations of get_indexer and get_indexer_non_unique in pandas/_libs/index.pyx, I noticed that the returned arrays always have dtype int64. Since these methods return arrays of positions, I believe np.intp is a more appropriate dtype: it is sized to match the platform's C ssize_t, so it is guaranteed to be large enough to represent every valid position in an array. On 64-bit platforms np.intp and int64 are the same type, so the mismatch only shows up on 32-bit builds, where np.intp is int32.
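A minimal sketch of what np.intp means in practice, using only public NumPy and pandas calls: it reports the width of np.intp on the current build and casts a get_indexer result to it. The astype(np.intp, copy=False) normalization is an illustrative caller-side workaround, not the fix that was merged for this issue.

```python
import numpy as np
import pandas as pd

# np.intp matches the C ssize_t of the running build:
# 8 bytes on 64-bit platforms, 4 bytes on 32-bit ones.
print(np.dtype(np.intp).itemsize * 8, "bit indexing type")

idx = pd.Index([1, 2, 3])
indexer = idx.get_indexer([1, 3, 5])  # -1 marks a label that is not found

# Cast to the platform indexing dtype; copy=False skips the copy
# when the array already has that dtype (e.g. int64 on 64-bit builds).
indexer = indexer.astype(np.intp, copy=False)
print(indexer.tolist(), indexer.dtype)
```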

Expected Output

[0] intp
[0 1] intp [] intp
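The expected output can be phrased as a small assertion-based check. This is a sketch that reuses the indexes from the code sample, not a test taken from the fix; it only discriminates on 32-bit builds, because on 64-bit platforms np.intp and int64 are the same dtype and the assertions pass either way.

```python
import numpy as np
import pandas as pd

ax1 = pd.Index([1, 2, 3])   # unique index: get_indexer applies
ax2 = pd.Index([1, 1, 2])   # non-unique index: needs get_indexer_non_unique

ans1 = ax1.get_indexer([1])
indexer, missing = ax2.get_indexer_non_unique([1])

# Every positional array returned should use the platform indexing dtype.
for arr in (ans1, indexer, missing):
    assert arr.dtype == np.intp, arr.dtype

print(ans1.tolist(), indexer.tolist(), missing.tolist())
```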

Output of pd.show_versions()

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : 7b14cf6b0b9dbcddce7b9bb22a81c73bdebc1be8
python           : 3.7.7.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.19.76-linuxkit
Version          : #1 SMP Tue May 26 11:42:35 UTC 2020
machine          : x86_64
processor        : 
byteorder        : little
LC_ALL           : C.UTF-8
LANG             : C.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.1.0rc0+406.g7b14cf6b0
numpy            : 1.18.5
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.1.1
setuptools       : 45.2.0.post20200210
Cython           : 0.29.21
pytest           : 5.4.3
hypothesis       : 5.20.2
sphinx           : 3.1.1
blosc            : None
feather          : None
xlsxwriter       : 1.2.9
lxml.etree       : 4.4.1
html5lib         : 1.1
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.16.1
pandas_datareader: None
bs4              : 4.9.1
bottleneck       : 1.3.2
fsspec           : 0.7.4
fastparquet      : 0.4.1
gcsfs            : 0.6.2
matplotlib       : 3.2.1
numexpr          : 2.7.1
odfpy            : None
openpyxl         : 3.0.4
pandas_gbq       : None
pyarrow          : 0.16.0
pytables         : None
pyxlsb           : None
s3fs             : 0.4.2
scipy            : 1.5.1
sqlalchemy       : 1.3.18
tables           : 3.6.1
tabulate         : 0.8.7
xarray           : 0.16.0
xlrd             : 1.2.0
xlwt             : 1.3.0
numba            : 0.50.1
@alexhlim added the Bug and Needs Triage labels Sep 14, 2020
@alexhlim
Member Author

take

@jbrockmendel added the Indexing label and removed the Needs Triage label Sep 14, 2020
@jreback added this to the 1.2 milestone Sep 19, 2020