Allow dtype promotion in Series[ExtensionArray].__setitem__? #24020

Closed
TomAugspurger opened this issue Nov 30, 2018 · 5 comments
Labels
Closing Candidate May be closeable, needs more eyeballs Dtype Conversions Unexpected or buggy dtype conversions Enhancement ExtensionArray Extending pandas with custom dtypes or arrays. Needs Discussion Requires discussion from core team before further action

Comments

@TomAugspurger
Contributor

Pandas typically allows for __setitem__ to change an object's dtype.

In [5]: a = pd.Series([1, 2])

In [6]: a[0] = 'a'

In [7]: a
Out[7]:
0    a
1    2
dtype: object

This typically won't work for ExtensionArrays:

In [8]: b = pd.Series([1, 2], dtype='Int64')

In [9]: b[0] = 'a'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-a40f9ead6ccf> in <module>
----> 1 b[0] = 'a'

...

TypeError: <U1 cannot be converted to an IntegerDtype

So, two questions:

  1. Do we want to allow this kind of type promotion for EA-backed series?
  2. If so, how do we design it? Presumably something like ExtensionArray._can_hold_item(item: Union[scalar, Sequence[scalar]]) -> bool. Or maybe the return value could be True for "I can hold this", False for "No, raise an exception", and something else (a dtype?) for "Yes, but astype me to something else first".
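
The three-way return value floated in question 2 can be sketched with a toy class. Nothing here is pandas API: ToyIntArray, ToyObjectArray, and setitem_with_promotion are hypothetical names made up for illustration, and _can_hold_item is only the hook proposed above, not an existing ExtensionArray method. The sketch handles scalars only and assumes that strings are the one case worth promoting.

```python
from typing import Any, Union


class ToyObjectArray:
    """Hypothetical fallback container that accepts any Python object."""

    dtype = "object"

    def __init__(self, values):
        self._data = list(values)


class ToyIntArray:
    """Hypothetical stand-in for an integer-backed extension array."""

    dtype = "Int64"

    def __init__(self, values):
        self._data = list(values)

    def _can_hold_item(self, item: Any) -> Union[bool, str]:
        """Return True (store as-is), False (raise), or a dtype name (astype first)."""
        if item is None or (isinstance(item, int) and not isinstance(item, bool)):
            return True
        if isinstance(item, str):
            # Assumption for this sketch: strings trigger promotion to object.
            return "object"
        return False


def setitem_with_promotion(arr, key, value):
    """Set arr[key] = value, promoting to ToyObjectArray if the array asks for it."""
    verdict = arr._can_hold_item(value)
    if verdict is True:
        arr._data[key] = value
        return arr
    if verdict is False:
        raise TypeError(f"{value!r} cannot be stored in {arr.dtype}")
    # Any other verdict is read as "convert to this dtype first"; the toy
    # only knows about "object", so it copies into ToyObjectArray.
    promoted = ToyObjectArray(arr._data)
    promoted._data[key] = value
    return promoted
```

The caller has to use the returned array rather than assume the original was mutated in place, which is the same bookkeeping a Series would need when swapping out its backing array after promotion.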

I need to work around this for DatetimeTZArray, since we do allow setting a value with a different timezone (upcasting to object).

In [8]: ser = pd.Series(pd.date_range("2000", periods=4, tz='UTC'))

In [9]: ser.dtype
Out[9]: datetime64[ns, UTC]

In [10]: ser[0] = pd.Timestamp('2000', tz='US/Central')

In [11]: ser.dtype
Out[11]: dtype('O')
@TomAugspurger TomAugspurger added Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. labels Nov 30, 2018
@TomAugspurger TomAugspurger changed the title Allow dtype promotion in Series[ExtensionArray].__setitem__ Allow dtype promotion in Series[ExtensionArray].__setitem__? Nov 30, 2018
@h-vetinari
Contributor

This seems like it might fit the method pandas.core.dtypes.cast.maybe_promote, which does effectively that. It's quite broken and needs a major overhaul anyway - I've started by writing a docstring and tests for it in #23982

@mroeschke
Member

Similar issue encountered in #26041 (comment) with loc[:, col] instead of __setitem__ (probably hits the same path)

I'm +0.25 on coercing to object. It'd be great if the assignment "just works", but object dtype seems to surprise people.

@jbrockmendel
Member

xref #39584

@jbrockmendel
Member

@MarcoGorelli is this closable as "no"?

@jbrockmendel jbrockmendel added the Closing Candidate May be closeable, needs more eyeballs label Jul 27, 2023
@MarcoGorelli
Member

I'd say so, yes. Since PDEP-6 we've intentionally gone in the opposite direction.

closing for now then
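
For readers landing here after the close: with silent promotion off the table, one way to get the object-dtype result from the first comment is to cast explicitly before assigning. This is a minimal sketch that assumes a pandas version with the nullable Int64 dtype; the variable names are illustrative.

```python
import pandas as pd

b = pd.Series([1, 2], dtype="Int64")

# Opt in to object dtype explicitly instead of relying on __setitem__ to promote.
b_obj = b.astype(object)
b_obj[0] = "a"

print(b_obj.dtype)  # object
```

The original Int64 series is left untouched, so the dtype change is visible at the call site rather than happening as a side effect of the assignment.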
