ENH: Make ExtensionArray a Protocol #57633

Open
1 of 3 tasks
WillAyd opened this issue Feb 26, 2024 · 7 comments
Labels
Enhancement ExtensionArray Extending pandas with custom dtypes or arrays.

Comments

@WillAyd
Member

WillAyd commented Feb 26, 2024

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

The current pandas ExtensionArray is a standard Python class, and throughout our code base we do things like isinstance(obj, ExtensionArray) to determine at runtime if an object is an instance of the ExtensionArray.

While this works for classes implemented purely in Python, which can inherit from a Python class, it does not work with extension classes implemented in Cython, pybind11, nanobind, etc. See https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#subclassing for documentation of this limitation in Cython.

As such, unless you implement your extension purely in Python it will not work correctly as an ExtensionArray

Feature Description

PEP 544 describes the runtime_checkable decorator that in theory can solve this issue without any major changes to our code base (ignoring any performance implications for now)
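
As a rough illustration of the idea, the sketch below uses only the standard library. `ArrayLike`, `CompiledArray`, and `NotAnArray` are hypothetical names invented for this example, and the three methods are a tiny stand-in for the real ExtensionArray interface, not a proposal for what the protocol should contain.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ArrayLike(Protocol):
    # Hypothetical, heavily reduced stand-in for the ExtensionArray interface.
    def copy(self): ...
    def isna(self): ...
    def take(self, indices): ...


class CompiledArray:
    """Stands in for a type defined in an extension module.

    It shares no base class with ArrayLike; it only provides the methods.
    """

    def __init__(self, values):
        self._values = list(values)

    def copy(self):
        return type(self)(self._values)

    def isna(self):
        return [v is None for v in self._values]

    def take(self, indices):
        return type(self)([self._values[i] for i in indices])


class NotAnArray:
    # Provides copy() but neither isna() nor take().
    def copy(self):
        return self


arr = CompiledArray([1, None, 3])

# Structural check: passes because all protocol members are present.
is_array = isinstance(arr, ArrayLike)

# Fails because some protocol members are missing.
is_not_array = isinstance(NotAnArray(), ArrayLike)
```

Note that a runtime_checkable isinstance check only verifies that the members exist; it does not validate signatures or return types.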

Alternative Solutions

Not sure there are any - I may be wrong but I do not think extension types in Python can inherit from Python types

Additional Context

No response

WillAyd added the Enhancement, Needs Triage, and ExtensionArray labels and removed the Needs Triage label on Feb 26, 2024
@jbrockmendel
Member

> Not sure there are any - I may be wrong but I do not think extension types in Python can inherit from Python types

Can't you do this?

cdef class ActualImplementation:
    # [most of the implementation]
    pass

class MyEA(ActualImplementation, ExtensionArray):
    pass

That's basically what we do with NDArrayBacked.

@WillAyd
Member Author

WillAyd commented Feb 26, 2024

That works until you try to call a method of the extension class. MyEA().copy() will return an instance of ActualImplementation, not of MyEA.

@WillAyd
Member Author

WillAyd commented Feb 26, 2024

Ah, I take that back. OK cool, I'll have to look more into what Cython is doing to preserve the MyEA type. I was not getting this with nanobind, so it must be a Cython feature:

import numpy as np

from pandas.api.extensions import ExtensionArray
from pandas._libs.arrays import NDArrayBacked


class MyEA(NDArrayBacked, ExtensionArray):
    ...

arr = MyEA(np.arange(3), np.int64)
assert type(arr) == type(arr.copy())
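
For comparison, here is a pure-Python sketch of the two behaviors discussed above: a base class whose copy() hard-codes its own class loses the subclass type, while one that constructs via type(self) keeps it. All class names are hypothetical, and this only illustrates the general idea; it is not a description of what Cython, nanobind, or NDArrayBacked actually do internally.

```python
class HardCodedBase:
    def __init__(self, values):
        self.values = list(values)

    def copy(self):
        # Always builds the base class, so subclasses lose their type.
        return HardCodedBase(self.values)


class TypePreservingBase:
    def __init__(self, values):
        self.values = list(values)

    def copy(self):
        # Builds whatever class the instance actually is.
        return type(self)(self.values)


class InterfaceMarker:
    """Hypothetical stand-in for the ExtensionArray base class."""


class LossyEA(HardCodedBase, InterfaceMarker):
    pass


class StableEA(TypePreservingBase, InterfaceMarker):
    pass


lossy_copy = LossyEA([1, 2, 3]).copy()
stable_copy = StableEA([1, 2, 3]).copy()

# The first copy has dropped back to the base class; the second has not.
lossy_is_marker = isinstance(lossy_copy, InterfaceMarker)
stable_is_marker = isinstance(stable_copy, InterfaceMarker)
```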

@jbrockmendel
Member

jbrockmendel commented Feb 26, 2024 via email

@twoertwein
Member

> PEP 544 describes the runtime_checkable decorator that in theory can solve this issue without any major changes to our code base (ignoring any performance implications for now)

I think isinstance checks on a protocol are more expensive than on concrete classes: comparing all symbols (protocol) vs just checking __mro__ (concrete class)
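
A minimal way to measure that difference with the standard library is sketched below. The class and protocol names are hypothetical, the two-method protocol is far smaller than the real ExtensionArray interface, and absolute timings will vary by machine and Python version, so treat this as a starting point for a benchmark and not as a result.

```python
import timeit
from typing import Protocol, runtime_checkable


class ConcreteBase:
    def copy(self):
        return self

    def isna(self):
        return False


class Impl(ConcreteBase):
    pass


@runtime_checkable
class ArrayProtocol(Protocol):
    def copy(self): ...
    def isna(self): ...


obj = Impl()
n = 20_000

# Nominal check against a concrete base class.
nominal_seconds = timeit.timeit(lambda: isinstance(obj, ConcreteBase), number=n)

# Structural check against a runtime_checkable protocol.
protocol_seconds = timeit.timeit(lambda: isinstance(obj, ArrayProtocol), number=n)

print(f"concrete class: {nominal_seconds / n * 1e9:.0f} ns per check")
print(f"protocol:       {protocol_seconds / n * 1e9:.0f} ns per check")
```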

@WillAyd
Member Author

WillAyd commented Feb 26, 2024

Yea there is going to be some performance overhead, I think especially before Python 3.12. How much that matters I don't know - I am under the impression we aren't doing these checks in a tight loop but if you have ideas on what to benchmark happy to profile

@jbrockmendel
Member

We aren’t doing the checks in a tight loop, but we are doing them everywhere.

3 participants