Unexpected behavior of the str attribute which is working with Series of list #25240

fran6w · 2019-02-09T15:15:29Z

Code Sample, a copy-pastable example if possible

import pandas as pd
s1 = pd.Series(['AA/aa', 'BB/bb', 'CC/cc'])
s2 = s1.str.split('/')
s2.str[0]

Result:
0 AA
1 BB
2 CC
dtype: object

Problem description

In this example, the second 'str' accessor is applied to a Series of lists, not to a Series of strings.
The [] operator then works fine on each list and retrieves its first element...

Since this behavior works but is unexpected, one may wonder whether it is safe to write code like this (instead of using apply + lambda, for instance). It also works with a Series of dicts, and probably with any object implementing the [] operator.
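To make the comparison concrete, here is a minimal sketch of the accessor form next to the apply + lambda form mentioned above. It assumes a pandas version in which `.str` is still accepted on an object-dtype Series of lists, and it builds the lists directly rather than through `split`; the variable names `via_str` and `via_apply` are only illustrative.

```python
import pandas as pd

# Series whose elements are lists, built directly (no split step needed).
s2 = pd.Series([['AA', 'aa'], ['BB', 'bb'], ['CC', 'cc']])

# Accessor form from the report: indexes each element with [0].
via_str = s2.str[0]

# Explicit form: the same element-wise indexing without relying on .str.
via_apply = s2.apply(lambda x: x[0])

print(via_str.tolist())
print(via_apply.tolist())
```

Both forms should give the same first elements here; the explicit form simply does not depend on the accessor tolerating non-string values.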

Expected Output

Warning or error?
Or a word in the documentation?

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.4
pytest: 3.3.2
pip: 19.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.2
scipy: 1.0.0
pyarrow: None
xarray: 0.10.9
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.5.12
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.2.3
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


gfyoung · 2019-02-09T23:00:06Z

Hmm...I'm not sure why the str accessor is even available on s2, since the values are lists, not strings.

That looks like a bug to me.

cc @jreback

fran6w · 2019-02-10T11:51:59Z

In fact, I have looked at the file core/strings.py, lines 1729+. Try:

import pandas as pd
help(pd.Series.str.get)

The str_get() function is documented: it extracts the element at the specified position from each component. The docstring gives examples with strings, lists, tuples, and dicts.

In my opinion, the str accessor indeed has broader uses than the main pandas documentation explains, e.g., https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.html
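As a rough illustration of the str.get behavior described above, the sketch below calls it on strings, tuples, lists, and dicts. It assumes a reasonably recent pandas (dict lookup by key and NaN for out-of-range positions may not hold on older releases); the sample data and variable names are made up for the example.

```python
import pandas as pd

# Mixed containers: position 1 is taken from each element.
mixed = pd.Series(['abc', (1, 2, 3), ['x', 'y', 'z']])
second = mixed.str.get(1)

# Dicts: the argument is used as a key rather than a position.
records = pd.Series([{'name': 'Hello'}, {'name': 'Goodbye'}])
names = records.str.get('name')

# Out-of-range positions yield a missing value instead of raising.
short = pd.Series([['a'], ['b', 'c']])
padded = short.str.get(1)

print(second.tolist())
print(names.tolist())
print(padded.tolist())
```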

gfyoung · 2019-02-10T19:27:59Z

@fran6w : Ah, I see. In this case, I think we just need to document this much more thoroughly at the link you provided above. In light of how str_get behaves, I'm now less inclined to believe your example is a bug; it looks like expected behavior.

cc @jreback

fran6w · 2019-02-11T08:24:07Z

Indeed. Updating the documentation would be great.

BTW, if you take this example below, the global behavior of the str accessor remains strange. For instance, the contains(regex=False) method works fine for Series of list (or dict).

import pandas as pd
s1 = pd.Series(['AA/aa', 'BB/bb', 'CC/cc'])
s2 = s1.str.split('/')
s2.str.contains('AA', regex=False)

Result:
0 True
1 False
2 False
dtype: bool

In fact, the str accessor works fine whenever the "string" function called after "str" has an implementation that is compatible with the actual type of the objects in the Series... In the case of contains(regex=False), the code branch uses a lambda (f = lambda x: pat in x), which happens to work with lists and dicts as well.

IMHO, those are working side effects...
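The side effect can be seen without pandas at all. The sketch below only reuses the lambda quoted above and applies it to plain Python objects; it does not reproduce pandas internals, which may differ between versions.

```python
pat = 'AA'

# Same shape as the lambda quoted from the regex=False branch.
f = lambda x: pat in x

on_str = f('AA/aa')              # str: substring test
on_list = f(['AA', 'aa'])        # list: element membership test
on_list_partial = f(['AAA', 'aa'])  # list: no element equals 'AA'
on_dict = f({'AA': 1})           # dict: key membership test

print(on_str, on_list, on_list_partial, on_dict)
```

So on a list the check is "is this exact element present", not "does any element contain this substring", which is why the result can differ from what contains suggests.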

gfyoung added the Strings, Index, Indexing, and Bug labels and removed the Index label Feb 9, 2019

gfyoung added the Docs label Feb 10, 2019

mroeschke removed the Bug label May 3, 2020

jbrockmendel added the Nested Data label and removed the Indexing label Jun 12, 2021
