[FEA] Configurable output type in Python API #637

dantegd · 2019-05-28T16:06:27Z

Is your feature request related to a problem? Please describe.
The output type of the Python API is fixed per algorithm: either cuDF Series, cuDF DataFrames, or numba device arrays (usually cuDF objects).

Describe the solution you'd like
There are two main improvements to be had:

Allow the user to set the output type to cuDF, NumPy, Numba or CuPy (the formats supported for input after [REVIEW] Cuda Array Interface input and data input code cleanup #612).
Perhaps make the default a numba device array, to avoid always incurring the cost of building a DataFrame? This is definitely open for discussion.

Tasks

Create output utility function to remove repeated code
Add tests for output function
Update existing code to use the function
Update docstrings

JohnZed · 2019-07-15T18:37:05Z

This is a tricky one, but super valuable so thanks for bringing it up. The current approach (a mix of cudf types, cuda arrays, occasional numpy) is not very clear.

I suspect that always using the same output format (numba device arrays) would provide the most consistency for our APIs, at the cost of some usability for those who are using this as a drop-in replacement for sklearn. It also seems like the laziest approach from a compute perspective (nice!).

If we move to that approach (getting everything perfectly consistent) and then want to add a wrapper later to export to different types, I think we'll still be taking a step in the right direction now. It wouldn't be too hard to eventually convert to a property like

@property
def coef_(self):
    return self.to_preferred_format(self.__coef)

where the preferred format is determined by a param in the constructor.

Wouldn't supporting multiple internal formats eventually complicate our own functions like score, predict, feature importance, etc.?
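A minimal sketch of the property-based approach described above, assuming a hypothetical `output_type` constructor parameter and a toy 1-D least-squares model standing in for a real cuML estimator. Results are stored in one internal format; `coef_` and `predict` convert only at the public boundary through `to_preferred_format`, while `score` works on the internal data, so supporting several output types does not leak into internal methods. The stdlib converters are placeholders for real cuDF/CuPy/Numba/NumPy conversions.

```python
"""Sketch: one internal format, conversion only at the public API boundary."""
from array import array
from typing import Sequence

# Hypothetical stand-ins for real converters (cuDF, CuPy, Numba, NumPy).
_CONVERTERS = {
    "list": list,
    "tuple": tuple,
    "array": lambda data: array("d", data),
}


class Base:
    def __init__(self, output_type: str = "list"):
        if output_type not in _CONVERTERS:
            raise ValueError(f"Unsupported output_type {output_type!r}")
        # The preferred format is chosen once, in the constructor.
        self.output_type = output_type

    def to_preferred_format(self, data):
        return _CONVERTERS[self.output_type](data)


class LinearModelSketch(Base):
    """Toy 1-D least-squares model y ~ w * x + b (illustration only)."""

    def fit(self, x: Sequence[float], y: Sequence[float]) -> "LinearModelSketch":
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        w = sxy / sxx
        # Internal state always uses a single format (here, plain tuples/floats).
        self.__coef = (w,)
        self.__intercept = my - w * mx
        return self

    def _predict_internal(self, x):
        w = self.__coef[0]
        return [w * xi + self.__intercept for xi in x]

    @property
    def coef_(self):
        return self.to_preferred_format(self.__coef)

    def predict(self, x):
        return self.to_preferred_format(self._predict_internal(x))

    def score(self, x, y):
        # R^2 computed on the internal format, never on converted outputs.
        pred = self._predict_internal(x)
        my = sum(y) / len(y)
        ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
        ss_tot = sum((yi - my) ** 2 for yi in y)
        return 1.0 - ss_res / ss_tot
```

With `LinearModelSketch(output_type="tuple").fit([0, 1, 2, 3], [1, 3, 5, 7])`, `coef_` is `(2.0,)` and `predict([4])` is `(9.0,)`; switching the constructor argument to `"list"` changes only what callers receive, not how `score` or `predict` compute.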

dantegd · 2019-08-19T13:45:59Z

This can definitely be considered part of #1001, given the current inconsistent behavior.

dantegd added the labels "0 - Backlog", "tests" and "Cython / Python" on May 28, 2019

dantegd mentioned this issue Jun 21, 2019: [BUG/TASK] Make the memory structures in the Python layer consistent #749 (Closed)

tfeher mentioned this issue Aug 26, 2019: Support Vector Machine #912 (Merged)

dantegd mentioned this issue Dec 28, 2019: [FEA] Consistent output conversions for all models #1521 (Closed)

dantegd mentioned this issue Feb 5, 2020: [REVIEW] cuML Array shim and configurable output added to base and cluster methods #1635 (Merged)

cjnolet closed this as completed in #1635 on Feb 24, 2020
