ExtensionArray Ops discussion #19577
Split from #19520
How to handle things like
Neither the Python 2 nor the Python 3 object defaults are appropriate, so I think we should either make these methods abstract or provide a default implementation for some or all of them.
We should be able to do a default implementation for the simple case of binop(extension_array, extension_array) by
@jorisvandenbossche raises the point that for some extension arrays, casting to an object ndarray can be expensive. It'd be unfortunate to do the casting if we're just going to raise a
How much coercion do we want to attempt? I would vote to stay simple and only attempt a comparison if
Otherwise we return NotImplemented. Subclasses can of course choose more or less aggressive coercion rules.
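To make the "stay simple, otherwise return NotImplemented" idea concrete, here is a minimal sketch. The exact condition for attempting a comparison is not shown above, so the rule used here (the other operand is an instance of the same class) is only an assumption, and `ToyComparableArray` is a hypothetical stand-in rather than the real ExtensionArray interface. It returns a plain list where a real implementation would presumably return a boolean ndarray.

```python
import operator


class ToyComparableArray:
    """Hypothetical stand-in for an extension array (not the pandas API)."""

    def __init__(self, values):
        self._data = list(values)

    def __len__(self):
        return len(self._data)

    def _compare(self, other, op):
        # Assumed rule: only attempt the comparison against the same array type.
        if not isinstance(other, type(self)):
            return NotImplemented
        if len(other) != len(self):
            raise ValueError("Lengths must match to compare")
        # Element-by-element comparison; no cast to an object ndarray is needed.
        return [op(a, b) for a, b in zip(self._data, other._data)]

    def __eq__(self, other):
        return self._compare(other, operator.eq)

    def __lt__(self, other):
        return self._compare(other, operator.lt)


a = ToyComparableArray([1, 2, 3])
b = ToyComparableArray([1, 5, 0])
eq_result = a == b  # [True, False, False]
lt_result = a < b   # [False, True, False]
```

Because the type check runs first, an unsupported operand never triggers any conversion work: Python gets `NotImplemented` back, tries the reflected method, and raises `TypeError` itself for ordering comparisons. A subclass wanting more aggressive coercion would only need to override `_compare`.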
When boxed in a
Feb 7, 2018
Migrated from #19520:
I've been planning on moving the Index/Series arithmetic/comparison ops into Array subclasses, so that
Most of the comparison ops are ready to make that jump; I can prioritize it if getting that done quickly will help you out.
One caveat I'm concerned about is that the existing implementations in the Index subclasses have gotten tangled up with a bunch of unrelated Index machinery. Ideally I'd like these operations to go into self-contained mixin classes that rely on constructor methods, but are independent of slicing/concat/reindex/dropna/... mentioned above.
@TomAugspurger two quick namespace-gameplan questions.
Assuming the arith/comparison methods currently in DatetimeIndexOpsMixin/DTI/TDI/PI get moved into analogous array classes, do you envision these getting a) mixed into the appropriate Index/Block subclasses or b) accessed via composition? If the latter, what name (
Because the datetimelike methods wrap some of the base Index methods, some of those will need to move up too. Where do you envision something like BaseArray living?
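For reference, the two options in the first question could look roughly like this. All class and attribute names here (`ToyOpsMixin`, `MixedInIndex`, `ComposedIndex`, `_array`) are placeholders invented for the sketch, not proposals for the actual names.

```python
class ToyOpsMixin:
    """Hypothetical ops mixin: relies only on `_values` and the constructor."""

    def __add__(self, other):
        # Rebuild the result through the constructor, nothing else.
        return type(self)([v + other for v in self._values])


# (a) Mix the ops directly into the container class.
class MixedInIndex(ToyOpsMixin):
    def __init__(self, values):
        self._values = list(values)


# (b) Composition: the container holds an array object and delegates to it.
class ToyArray(ToyOpsMixin):
    def __init__(self, values):
        self._values = list(values)


class ComposedIndex:
    def __init__(self, values):
        self._array = ToyArray(values)  # placeholder attribute name

    def __add__(self, other):
        return ComposedIndex((self._array + other)._values)


mixed = MixedInIndex([1, 2]) + 1
composed = ComposedIndex([1, 2]) + 1
```

With (a) the ops live on the Index itself, so they stay coupled to its class hierarchy. With (b) the array class can be used and tested on its own, at the cost of writing delegation methods on each container.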
@TomAugspurger So now I've hit upon this issue of how to deal with the binary operators, and the question is what should these operators return (in the case of the arithmetic operators). Consider the
So if the
So there are 3 options as I see it:
I think I'd favor (3), but I could live with (2). I'd prefer to not do (1), as that is a lot of effort that it seems people would have to repeat.
I think initially we should go for option 1, and make sure that pandas actually dispatches to it. But I don't fully understand your option 3. Can you clarify a bit more?
I don't think option 2 is actually an option (IMO it is also independent of deciding where the actual operation is implemented) because, as you mention, the result of an operation does not necessarily need to be of the same type. An additional example: subtraction of datetimes gives a timedelta. That is definitely a case we need to handle.
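A small illustration of the datetime case, using only the standard library. `ToyDatetimeArray` and `ToyTimedeltaArray` are hypothetical names for this sketch; the point is just that the operator has to be free to return a different array type than its operands.

```python
from datetime import datetime, timedelta


class ToyTimedeltaArray:
    """Hypothetical container for timedelta values."""

    def __init__(self, values):
        self._data = list(values)


class ToyDatetimeArray:
    """Hypothetical container for datetime values."""

    def __init__(self, values):
        self._data = list(values)

    def __sub__(self, other):
        if not isinstance(other, ToyDatetimeArray):
            return NotImplemented
        # datetime - datetime yields timedelta, so the result is a
        # different array type than either operand.
        return ToyTimedeltaArray(a - b for a, b in zip(self._data, other._data))


ends = ToyDatetimeArray([datetime(2018, 2, 9), datetime(2018, 2, 8)])
starts = ToyDatetimeArray([datetime(2018, 2, 7), datetime(2018, 2, 7)])
diff = ends - starts
```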
@jorisvandenbossche Yes, I see now that option 2 isn't viable.
My idea on option 3 is something like this. The subclass implements a
The default behavior is to just do an element-by-element operation. If the class of the underlying dtype has implemented the operator, then it gets called automatically, and all is well.
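Roughly, the element-by-element default could be sketched like this. The helper name `make_elementwise_op` and the class `ToyDecimalArray` are made up for illustration, and `Decimal` stands in for "the class of the underlying dtype". The sketch returns a plain list because, as discussed above, the result does not have to be the same array type.

```python
import operator
from decimal import Decimal


def make_elementwise_op(op):
    """Build a binary-operator method that applies `op` element by element."""

    def method(self, other):
        if isinstance(other, ToyDecimalArray):
            if len(other) != len(self):
                raise ValueError("Lengths must match")
            pairs = zip(self._data, other._data)
        else:
            # Anything else is treated as a scalar and paired with every element.
            pairs = ((a, other) for a in self._data)
        # The scalar type's own operator does the real work here.
        return [op(a, b) for a, b in pairs]

    method.__name__ = "__{}__".format(op.__name__)
    return method


class ToyDecimalArray:
    """Hypothetical array whose elements are decimal.Decimal objects."""

    def __init__(self, values):
        self._data = list(values)

    def __len__(self):
        return len(self._data)

    __add__ = make_elementwise_op(operator.add)
    __mul__ = make_elementwise_op(operator.mul)


left = ToyDecimalArray([Decimal("1.5"), Decimal("2")])
right = ToyDecimalArray([Decimal("0.5"), Decimal("1")])
summed = left + right
doubled = left * 2
```

Since `Decimal` already implements `+` and `*`, the array gets working operators without writing any per-operator logic. A subclass that has a faster vectorized path could still override individual methods.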