
Enable training single GPU cuML models using Dask DataFrames and Series #4300

Merged: 18 commits merged into rapidsai:branch-21.12 on Nov 15, 2021


ChrisJar (Contributor) commented Oct 21, 2021

This PR makes it possible to train single-GPU cuML models on Dask DataFrames and Series by converting the Dask data structures to their cuDF counterparts before training. This makes it possible to use Dask-SQL with cuML models.

Tests have been added for logistic regression; tests for more models are in progress.

Depends on #4317
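A minimal sketch of the conversion described above, assuming only that a Dask collection exposes a `compute()` method. The helper name `materialize_if_lazy` is hypothetical and is not cuML's actual code; its duck-typed check stands in for the `isinstance(X, (dask_cudf.core.Series, dask_cudf.core.DataFrame))` check that appears in the PR's diff of `input_utils.py`.

```python
from typing import Any


def materialize_if_lazy(X: Any) -> Any:
    """Hypothetical helper (not part of cuML): evaluate a lazy collection.

    If ``X`` exposes a callable ``compute`` method, as Dask DataFrames and
    Series do, call it and return the in-memory result. For a dask_cudf
    collection that result is the matching cudf DataFrame or Series.
    Any other input is returned unchanged.
    """
    compute = getattr(X, "compute", None)
    if callable(compute):
        # Gathers every partition into a single object, so the whole
        # dataset must fit in the memory of one GPU.
        return compute()
    return X


# Assumed usage with a single-GPU estimator (requires a GPU, cuml and
# dask_cudf; `ddf`, `features` and `label` are hypothetical names):
#
#   from cuml.linear_model import LogisticRegression
#   model = LogisticRegression()
#   model.fit(materialize_if_lazy(ddf[features]),
#             materialize_if_lazy(ddf[label]))
```

The design trade-off is that a single-GPU estimator only ever sees one in-memory cuDF object: this keeps the model code unchanged, but it gives up the multi-GPU scaling that cuML's dedicated Dask estimators provide.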

github-actions bot added the Cython / Python label Oct 21, 2021
caryr35 added this to PR-WIP in v21.12 Release via automation Oct 21, 2021
Three resolved (now outdated) review threads on python/cuml/common/input_utils.py
v21.12 Release automation moved this from PR-WIP to PR-Needs review Oct 21, 2021
dantegd added the 4 - Waiting on Author label (waiting for author to respond to review) Oct 21, 2021
ChrisJar marked this pull request as ready for review November 2, 2021 15:53
ChrisJar requested a review from a team as a code owner November 2, 2021 15:53
dantegd (Member) commented Nov 4, 2021

rerun tests

dantegd (Member) commented Nov 9, 2021

rerun tests

dantegd (Member) left a review

Just two small comments remaining

@@ -556,8 +563,12 @@ def convert_dtype(X,
        if the conversion would lose information.
    """

    if isinstance(X, (dask_cudf.core.Series, dask_cudf.core.DataFrame)):
        # TODO: Warn, but not when using dask_sql
dantegd (Member): Can you open a GitHub issue to track this?

Resolved (now outdated) review thread on python/cuml/common/input_utils.py
dantegd added the Experimental label (used to denote experimental features) Nov 13, 2021
dantegd added the feature request and non-breaking labels Nov 13, 2021
codecov-commenter commented:

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.12@4d3410a).
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-21.12    #4300   +/-   ##
===============================================
  Coverage                ?   86.01%           
===============================================
  Files                   ?      231           
  Lines                   ?    18771           
  Branches                ?        0           
===============================================
  Hits                    ?    16146           
  Misses                  ?     2625           
  Partials                ?        0           
Flag        Coverage Δ
dask        47.03% <0.00%> (?)
non-dask    78.70% <0.00%> (?)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4d3410a...cd15f1a.

dantegd (Member) commented Nov 15, 2021

@gpucibot merge

v21.12 Release automation moved this from PR-Needs review to PR-Reviewer approved Nov 15, 2021
rapids-bot merged commit 9b015f8 into rapidsai:branch-21.12 Nov 15, 2021
v21.12 Release automation moved this from PR-Reviewer approved to Done Nov 15, 2021
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
…es (rapidsai#4300)


Authors:
  - https://github.com/ChrisJar
  - Sarah Yurick (https://github.com/sarahyurick)
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4300
Labels: 4 - Waiting on Author, Cython / Python, Experimental, feature request, non-breaking

4 participants