[FEA] Allow DeviceNDArray as input data #369

Closed
quasiben opened this issue Mar 18, 2019 · 2 comments

Comments

@quasiben
Contributor

commented Mar 18, 2019

cuml fit methods currently accept only cudf and NumPy array objects as input. Constructing a cudf DataFrame with many columns can be slow. Instead, we can move data to the device with rmm:

gpu_data = rmm.to_device(np_data)

gpu_data is now an object of type numba.cuda.cudadrv.devicearray.DeviceNDArray, which is what backs a cudf DataFrame. However, most fit methods only handle cudf.DataFrame and np.ndarray inputs.

Allowing methods in cuml to work directly with the output of rmm.to_device would speed things along and, I believe, make for a nicer UX compared with:

record_data = (('fea%d'%i, np_data[:,i]) for i in range(np_data.shape[1]))
gdf = cudf.DataFrame(record_data)
@cjnolet

Collaborator

commented Mar 26, 2019

This is a great idea. I’ve been wondering why we have not supported this already. It’s low-hanging fruit.

@dantegd dantegd added this to Issue-Needs prioritizing in v0.7 Release via automation Mar 26, 2019

@dantegd dantegd removed this from Needs prioritizing in Feature Planning Mar 26, 2019

@dantegd

Member

commented Mar 26, 2019

We hadn't put time into this before because no use case pushed us to do it (i.e., most GPU workflows so far have gone from cuDF into cuML). Now we have good reasons to push for this in the next version.

That said, properly using rmm.to_device as @quasiben describes will depend on cuML finishing the adoption of #247 in all the algorithms. Beyond that, this addition requires very few changes to the Cython classes.
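
For illustration only, here is a rough sketch of the kind of input check a fit method could use to accept device arrays. It is not taken from cuML or from #247: `classify_input`, `FakeDeviceArray`, and `FakeHostArray` are hypothetical names, and the stand-in classes exist only so the sketch runs without a GPU. It assumes the device array exposes `__cuda_array_interface__` (the protocol Numba device arrays implement) and that host arrays expose NumPy's `__array_interface__`. The cudf.DataFrame path is left out.

```python
# Hypothetical sketch: none of these names come from cuML itself.


class FakeDeviceArray:
    """Stand-in for a GPU array such as Numba's DeviceNDArray.

    It only mimics the `__cuda_array_interface__` attribute so that the
    sketch can run on a machine without a GPU.
    """

    def __init__(self, shape, ptr=0):
        self.__cuda_array_interface__ = {
            "shape": shape,
            "typestr": "<f4",
            "data": (ptr, False),  # (device pointer, read-only flag)
            "version": 2,
        }


class FakeHostArray:
    """Stand-in for a host array such as numpy.ndarray."""

    def __init__(self, shape, ptr=0):
        self.__array_interface__ = {
            "shape": shape,
            "typestr": "<f4",
            "data": (ptr, False),
            "version": 3,
        }


def classify_input(X):
    """Return which conversion path a fit method would take for X."""
    if hasattr(X, "__cuda_array_interface__"):
        # Data is already on the device: the pointer could be used directly.
        return "device"
    if hasattr(X, "__array_interface__"):
        # Host data: would need a host-to-device copy before fitting.
        return "host"
    raise TypeError("unsupported input type: %s" % type(X).__name__)


print(classify_input(FakeDeviceArray((4, 2))))
print(classify_input(FakeHostArray((4, 2))))
```

Checking for the protocol attribute rather than a concrete class means anything that exposes the same interface would take the device path, not only the output of rmm.to_device.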

@dantegd dantegd moved this from Issue-Needs prioritizing to Issue-P0 in v0.7 Release Apr 12, 2019

@dantegd dantegd added 1 - On Deck and removed 0 - Backlog labels Apr 12, 2019

@cjnolet cjnolet moved this from Issue-P0 to Issue-P2 in v0.7 Release Apr 16, 2019

@cjnolet cjnolet added this to Issue-Needs prioritizing in v0.8 Release via automation May 1, 2019

@cjnolet cjnolet removed this from Issue-P2 in v0.7 Release May 1, 2019

@JohnZed JohnZed moved this from Issue-Needs prioritizing to Issue-P2 in v0.8 Release May 14, 2019

@dantegd dantegd closed this in #612 Jun 11, 2019

v0.8 Release automation moved this from Issue-P2 to Done Jun 11, 2019

3 participants