.str._validate should infer for Series, not raise for all-na Index #23163

Closed
h-vetinari opened this issue Oct 15, 2018 · 0 comments

Comments

@h-vetinari
Contributor

commented Oct 15, 2018

While working on #22725, a couple of workarounds were necessary to correctly raise on wrong data types masquerading as object dtype, e.g.

>>> pd.Series([1, 2, 3], dtype=object).str.cat([1, 2, 3])

However, the .str accessor itself should already raise in __init__, or more precisely in the internal _validate method (this is closely related to #23011), i.e. instead of

>>> pd.Series([1,2,3], dtype=object).str
<pandas.core.strings.StringMethods object at 0x000002A4C70AE198>

it should be

>>> pd.Series([1,2,3], dtype=object).str
AttributeError

Interestingly, Index already infers correctly in .str._validate:

>>> pd.Index([1,2,3], dtype=object).str
AttributeError: Can only use .str accessor with string values (i.e. inferred_type is 'string', 'unicode' or 'mixed')

However, there is another nit that I want to fix at the same time as the inference for Series: an all-NA object Index (or Series) should not raise the AttributeError. Currently it does, because all-NA gets inferred as float. There are legitimate cases where a selection of string data may be all-NA (through alignment, for example), and if the dtype is object then this shouldn't fail.
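The intended behaviour can be illustrated with a minimal pure-Python sketch. This is not pandas' actual _validate code: the helpers is_missing and validate_for_str are hypothetical names, missing values are simplified to None and float NaN, and the check is stricter than the real accessor (which also accepts 'mixed'). It only shows the rule being proposed: infer from the non-missing values, and let an all-NA object container pass.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (a simplification of pandas' NA handling)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def validate_for_str(values):
    """Hypothetical stand-in for the proposed check; not pandas' implementation.

    Missing values are dropped before inference. If every remaining value is a
    string, the data is accepted. An all-missing (or empty) input is accepted
    too, because nothing contradicts the assumption that it holds strings.
    """
    present = [v for v in values if not is_missing(v)]
    if all(isinstance(v, str) for v in present):
        return "string" if present else "empty"
    raise AttributeError("Can only use .str accessor with string values")
```

Under these assumptions, validate_for_str(["a", None]) and validate_for_str([float("nan"), None]) both pass, while validate_for_str([1, 2, 3]) raises AttributeError, which matches the behaviour requested above for both Series and Index.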

Edit: xref #9343 #13877
