Feature request: pandas extension API #514

Closed
ericmjl opened this issue Dec 13, 2018 · 3 comments
Labels
cuDF (Python): Affects Python cuDF API.
feature request: New feature or request

ericmjl commented Dec 13, 2018

Feature request

pandas has an extension API that lets users extend it very flexibly. As an example, I developed pyjanitor, which uses pandas-flavor under the hood to register functions as if they were native methods of a pandas dataframe. I'm hoping to see the pandas extension API implemented in cuDF, so that custom functions can be attached to cuDF dataframes in the same way.
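
For readers unfamiliar with the mechanism, here is a minimal sketch of registering a custom accessor through pandas' own `pandas.api.extensions.register_dataframe_accessor`. The accessor name `stats`, the class `StatsAccessor`, and the method `column_ranges` are hypothetical names chosen for illustration; they are not part of pandas, pyjanitor, or pandas-flavor.

```python
import pandas as pd


# "stats" and StatsAccessor are hypothetical names used only for this sketch.
@pd.api.extensions.register_dataframe_accessor("stats")
class StatsAccessor:
    def __init__(self, pandas_obj):
        # pandas passes the DataFrame the accessor is attached to.
        self._obj = pandas_obj

    def column_ranges(self):
        """Return max - min for every numeric column."""
        numeric = self._obj.select_dtypes("number")
        return numeric.max() - numeric.min()


df = pd.DataFrame({"a": [1, 4, 9], "b": [2.0, 2.5, 3.0], "label": list("xyz")})

# The custom method is now reachable as if it were native to the DataFrame.
ranges = df.stats.column_ranges()
print(ranges)
```

After registration, every DataFrame in the process exposes `df.stats`, which is the behaviour this request asks cuDF to mirror.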

Having met with some of the Boston NVIDIA folks (Raghav, Rory, and Jennifer visiting from NYC), I wanted to put this up on the cuDF issue tracker for the record. No rush on this, totally understand that there are other priorities.

kkraus14 added the feature request and pandas labels on Dec 13, 2018
mike-wendt added this to Needs triage in Feature Planning on Dec 14, 2018
randerzander moved this from Needs prioritizing to Backlog in Feature Planning on Dec 20, 2018
kkraus14 added the cuDF (Python) label on Jul 3, 2019
Contributor

vyasr commented Jul 13, 2022

Half of this request was implemented in #6302, which resolved #6216, a request for a subset of this issue: accessors. We now support accessors. Due to the differences between CPU and GPU execution, I don't know if there's any way that we'll ever be able to support ExtensionArray in the generic way that pandas does. Even if we could, it would likely be painfully slow. I'm inclined to close this in favor of focusing on more specific requests, such as our somewhat recent additions of first-class list/struct dtypes that can be represented and operated on efficiently on the GPU.

@shwina what do you think?
@ericmjl did the addition of accessors cover the primary use case that you were hoping to address? (Also hi Eric!)
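
As a rough illustration of the accessor support mentioned above, the sketch below registers an accessor through `cudf.api.extensions.register_dataframe_accessor`, which follows the same pattern as the pandas decorator. The accessor name `geo`, the class `PointAccessor`, and the method `squared_norm` are hypothetical. The fallback to pandas is an assumption added only so the sketch runs on machines without a GPU or a cuDF install; it is not something cuDF does itself.

```python
try:
    # cuDF path: needs a CUDA-capable GPU and a working cuDF install.
    import cudf as xdf
    from cudf.api.extensions import register_dataframe_accessor
except Exception:
    # Assumption for this sketch only: fall back to pandas on CPU-only machines.
    import pandas as xdf
    from pandas.api.extensions import register_dataframe_accessor


# "geo" and PointAccessor are hypothetical names used only for illustration.
@register_dataframe_accessor("geo")
class PointAccessor:
    def __init__(self, obj):
        # The dataframe the accessor is attached to.
        self._obj = obj

    def squared_norm(self):
        """Column-wise arithmetic only, so the work stays on the device under cuDF."""
        return self._obj["x"] ** 2 + self._obj["y"] ** 2


df = xdf.DataFrame({"x": [3.0, 0.0], "y": [4.0, 2.0]})
norms = df.geo.squared_norm()
total = float(norms.sum())
print(total)
```

The accessor body uses only whole-column operations, which is the kind of extension that maps well onto the GPU, in contrast to the element-wise primitives a generic ExtensionArray would rely on.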

Contributor

vyasr commented Jul 21, 2022

I'm going to close this since I think that we have realistically accomplished what we can here. Unlike pandas, we have no way to implement all our operations in terms of a few primitives the way that pandas ExtensionArray does. If we find specific dtypes that would be useful, we can always add direct support for them, but unfortunately I don't see the ExtensionArray concept as being feasible to implement in cuDF.

vyasr closed this as completed on Jul 21, 2022
Feature Planning automation moved this from Backlog to Closed Jul 21, 2022
Author

ericmjl commented Jul 22, 2022

@vyasr thanks for helping shepherd the closure of this issue! Yes, I think accessors are sufficient for what I was thinking.


3 participants