Direct delegation of Series methods to ExtensionArrays #21305

Open
xhochy opened this issue Jun 3, 2018 · 4 comments
Labels
Closing Candidate May be closeable, needs more eyeballs Enhancement ExtensionArray Extending pandas with custom dtypes or arrays.

Comments

@xhochy
Contributor

xhochy commented Jun 3, 2018

While implementing non-NumPy-backed ExtensionArrays, I quite often find that it is simpler to write a complete re-implementation of a method defined on pd.Series than to use the current implementation, which delegates only part of the work. It would probably make sense to introduce some sort of delegation mechanism. Either we continue the delegation like in

def unique(self):
or we could add a really general interface like NumPy's __array_ufunc__: https://docs.scipy.org/doc/numpy/reference/arrays.classes.html#numpy.class.__array_ufunc__

My current use case comes from #21296 and pd.Series.argsort, but I expect there will be many more cases like this as I continue to implement the ExtensionArray interface for Arrow arrays.
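
To make the first option concrete, here is a minimal sketch of method-level delegation with a generic fallback. It is not pandas code: `ToyArray`, `ToySeries`, and `_delegate` are hypothetical names invented for illustration, and the array is backed by a plain Python list.

```python
class ToyArray:
    """Hypothetical stand-in for an ExtensionArray backed by a plain list."""

    def __init__(self, data):
        self._data = list(data)

    def __iter__(self):
        return iter(self._data)

    def unique(self):
        # Array-native implementation: order-preserving de-duplication.
        return ToyArray(dict.fromkeys(self._data))

    def to_list(self):
        return list(self._data)


class ToySeries:
    """Hypothetical container that prefers the array's own methods."""

    def __init__(self, values):
        self._values = values

    def _delegate(self, name, fallback, *args, **kwargs):
        # Use the array's implementation when it defines one; otherwise
        # fall back to a generic implementation that only needs iteration.
        method = getattr(self._values, name, None)
        if callable(method):
            return method(*args, **kwargs)
        return fallback(*args, **kwargs)

    def unique(self):
        return self._delegate("unique", self._generic_unique)

    def nunique(self):
        return self._delegate("nunique", self._generic_nunique)

    def _generic_unique(self):
        return ToyArray(dict.fromkeys(self._values))

    def _generic_nunique(self):
        return len(set(self._values))


s = ToySeries(ToyArray([3, 1, 3, 2, 1]))
unique_values = s.unique().to_list()  # delegated to ToyArray.unique
distinct_count = s.nunique()          # ToyArray has no nunique: generic path
```

The point of the fallback is that an array author only has to implement the methods where a native version is actually simpler or faster.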

@jorisvandenbossche jorisvandenbossche added ExtensionArray Extending pandas with custom dtypes or arrays. API Design labels Jun 4, 2018
@jorisvandenbossche
Member

Quick comment: for the argsort case, I think this could be solved by changing np.argsort(values, ..) to values.argsort(..) in the Series.argsort implementation? (If this is blocking you, a fix is certainly welcome.)

But indeed, we should discuss this more in general.
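
The difference between the two call styles can be sketched without NumPy or pandas. Everything below is a hypothetical stand-in: `ToyArrowLikeArray` pretends that converting to a plain list is expensive, `generic_argsort` mimics the convert-then-sort style of calling a free function on the values, and the two `series_argsort_*` functions represent the implementation before and after the suggested change.

```python
class ToyArrowLikeArray:
    """Hypothetical array that is expensive to materialise as a Python list."""

    def __init__(self, data):
        self._data = list(data)
        self.materialised = 0  # counts conversions to a plain list

    def to_list(self):
        self.materialised += 1
        return list(self._data)

    def argsort(self):
        # Array-native sort on the internal storage, no conversion needed.
        return sorted(range(len(self._data)), key=self._data.__getitem__)


def generic_argsort(values):
    # Convert-then-sort, in the spirit of calling a free function on values.
    plain = values.to_list()
    return sorted(range(len(plain)), key=plain.__getitem__)


def series_argsort_before(values):
    # Always goes through the generic path, forcing a conversion.
    return generic_argsort(values)


def series_argsort_after(values):
    # Calls the method on the array, so the array decides how to sort.
    return values.argsort()


arr = ToyArrowLikeArray([30, 10, 20])
before_result = series_argsort_before(arr)
conversions_after_before = arr.materialised
after_result = series_argsort_after(arr)
conversions_after_after = arr.materialised
```

Both paths return the same indices; only the first one forces the array to materialise its data.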

@TomAugspurger
Contributor

In the abstract, I'm also interested in this. The set of methods that we currently dispatch to is pretty ad hoc (essentially enough to get df.groupby('extension_array').mean() working) :)

@jbrockmendel
Member

Two thoughts here:

  1. Since the OP, we've added many private EA methods that we dispatch to under the hood (EA._where, EA._putmask, EA._quantile). We could address many of these cases by leaning heavily on that pattern.
  2. Implementing something like __pandas_ufunc__ or __pandas_priority__ might be helpful, e.g. for #38946 (REGR: comparison op with dask data structure fails).
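
As a rough illustration of the priority idea in the second point, the sketch below has a container defer a comparison to any operand that advertises a higher priority. The attribute name `__toy_priority__` and both classes are hypothetical and chosen for this example only; this is not a description of how pandas or dask implement it.

```python
class ToySeries:
    __toy_priority__ = 1000  # hypothetical attribute name

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Defer to operands that advertise a higher priority, so Python
        # falls back to the other operand's reflected implementation.
        if getattr(other, "__toy_priority__", 0) > self.__toy_priority__:
            return NotImplemented
        return [v == other for v in self.values]


class ToyLazyCollection:
    """Hypothetical lazy structure that wants to handle the comparison."""

    __toy_priority__ = 2000

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Record the operation instead of evaluating it eagerly.
        return ("lazy_eq", self.name, type(other).__name__)


s = ToySeries([1, 2, 3])
lazy = ToyLazyCollection("d")

scalar_result = s == 2    # handled by ToySeries
lazy_result = s == lazy   # ToySeries defers, ToyLazyCollection handles it
```

Returning `NotImplemented` is the standard Python protocol for handing a binary operation to the other operand, so the priority check only decides when to use it.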

@jbrockmendel
Member

I don't think there's any appetite for adding an __array_ufunc__-like mechanism, but we are definitely moving in the direction of more methods being defined on the EAs and being directly delegated to.

@jbrockmendel jbrockmendel added the Closing Candidate May be closeable, needs more eyeballs label Jul 27, 2023

5 participants