Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Configurable output type in Python API #637

Closed
4 tasks
dantegd opened this issue May 28, 2019 · 2 comments · Fixed by #1635
Closed
4 tasks

[FEA] Configurable output type in Python API #637

dantegd opened this issue May 28, 2019 · 2 comments · Fixed by #1635
Labels
0 - Backlog In queue waiting for assignment Cython / Python Cython or Python issue tests Unit testing for project

Comments

@dantegd
Copy link
Member

dantegd commented May 28, 2019

Is your feature request related to a problem? Please describe.
The output type of the Python API is fixed, depending on the algorithm, to be either cuDF series, dataframes or numba device arrays (usually cuDF objects).

Describe the solution you'd like
There are two main improvements to be had:

  1. Allow the output to be set by the user to be either cuDF, Numpy, Numba or CuPy (the formats supported for input after [REVIEW] Cuda Array Interface input and data input code cleanup #612)
  2. Perhaps make the default be a numba device array to not incur in the cost of building a DataFrame always? This is for sure open for discussion.

Tasks

  • Create output utility function to remove repeated code
  • Add tests for output function
  • Update existing code to use the function
  • Update docstrings
@dantegd dantegd added 0 - Backlog In queue waiting for assignment tests Unit testing for project Cython / Python Cython or Python issue labels May 28, 2019
@JohnZed
Copy link
Contributor

JohnZed commented Jul 15, 2019

This is a tricky one, but super valuable so thanks for bringing it up. The current approach (a mix of cudf types, cuda arrays, occasional numpy) is not very clear.

I suspect that always using the same output format (numba device arrays) would provide the most consistency for our APIs, at the cost of some usability for those who are using this as a drop-in replacement for sklearn. It also seems like the laziest approach from a compute perspective (nice!).

If we move to that approach (getting everything perfectly consistent) and then want to add a wrapper later to export to different types, I think we'll still be taking a step in the right direction now. It wouldn't be too hard to eventually convert to a property like

@property
def coef_(self):
   return self.to_preferred_format(self.__coef)

where the preferred format is determined by a param in the constructor.

I think if we support multiple internal formats, that will eventually complicate our own functions like score, predict, feature importance, etc.?

@dantegd
Copy link
Member Author

dantegd commented Aug 19, 2019

Definitely can be considered part of #1001 due to the current inconsistent behavior

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0 - Backlog In queue waiting for assignment Cython / Python Cython or Python issue tests Unit testing for project
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants