String methods get() does not insert NaNs on negative index out of range #17704

Closed
louise-davies opened this issue Sep 28, 2017 · 2 comments · Fixed by #17741
louise-davies commented Sep 28, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd
df1 = pd.DataFrame([["1_2_3_4_5"], ["6_7_8_9_10"], ["11_12"]], columns=["test"])

# access using positive index
df1["test"].str.split("_").str[2]
# output:
# 0        3
# 1        8
# 2      NaN

# access using negative index
df1["test"].str.split("_").str[-3]
# output:
# IndexError: list index out of range

Problem description

Negative indexing works when every item in the Series is long enough for it. With a positive index that is out of range for some item, the result is NaN for that item rather than an error. I would expect the behaviour to be unified, so that a negative index also yields NaN for any item in the Series that is too short.

Expected Output

df1["test"].str.split("_").str[-3]
# output:
# 0        3
# 1        8
# 2      NaN

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-132-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 36.2.7
Cython: None
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: 1.1.14
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

sinhrks added the Bug and Strings labels on Sep 29, 2017

sinhrks commented Sep 29, 2017

Thanks for the report. Yes, we should fix it to check the negative boundary as well. A PR is welcome.

f = lambda x: x[i] if len(x) > i else np.nan
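
As a rough sketch of what checking both boundaries could look like (an illustration only, not necessarily the change that was eventually merged): the helper name `safe_get` is hypothetical, and `math.nan` stands in for `np.nan` so the snippet runs without NumPy or pandas.

```python
import math


def safe_get(x, i, missing=math.nan):
    """Return x[i], or `missing` when i is out of range on either side."""
    # Valid indices for a sequence of length n run from -n to n - 1.
    if -len(x) <= i < len(x):
        return x[i]
    return missing


# The split lists from the example in the report.
rows = [s.split("_") for s in ["1_2_3_4_5", "6_7_8_9_10", "11_12"]]

positive = [safe_get(r, 2) for r in rows]   # third element from the left
negative = [safe_get(r, -3) for r in rows]  # third element from the right
print(positive)
print(negative)
```

With the lower bound checked too, the short item `["11", "12"]` yields the missing value for both `2` and `-3` instead of raising `IndexError`.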

jreback added this to the Next Major Release milestone on Oct 1, 2017

bobhaffner commented

I can submit a PR for this

jreback changed the milestone from Next Major Release to 0.21.0 on Oct 2, 2017