
MNMG Logistic Regression (dask-glm wrapper) #3512

Merged: 70 commits into rapidsai:branch-22.06 on Apr 26, 2022

Conversation

daxiongshu (Contributor)

In this PR, I'll wrap dask-glm models so that they accept dask_cudf DataFrames and behave like other cuml.dask models. dask-glm provides three estimators: LogisticRegression, LinearRegression, and PoissonRegression. MNMG LogisticRegression was requested by @beckernick. @JohnZed for visibility. Thank you all.
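
For reference, a minimal usage sketch of what the wrapped estimator might look like from the user's side. The import path `cuml.dask.linear_model.LogisticRegression` is assumed from the module path touched by this PR, and the constructor/`fit`/`predict` signatures are assumptions, not something this PR confirms:

```python
# Hypothetical usage sketch of the dask-glm wrapper; the import path and
# method names are assumptions based on the module added in this PR.
import cudf
import dask_cudf
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

from cuml.dask.linear_model import LogisticRegression  # assumed export

if __name__ == "__main__":
    # One Dask worker per visible GPU.
    cluster = LocalCUDACluster()
    client = Client(cluster)

    # Tiny toy dataset, partitioned across workers as dask_cudf objects.
    X = cudf.DataFrame({"f0": [0.0, 1.0, 2.0, 3.0],
                        "f1": [1.0, 0.0, 1.0, 0.0]})
    y = cudf.Series([0, 0, 1, 1])
    X_dask = dask_cudf.from_cudf(X, npartitions=2)
    y_dask = dask_cudf.from_cudf(y, npartitions=2)

    # The wrapper delegates the actual solve to dask-glm under the hood.
    model = LogisticRegression()
    model.fit(X_dask, y_dask)
    preds = model.predict(X_dask).compute()
    print(preds)

    client.close()
    cluster.close()
```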

@daxiongshu daxiongshu requested a review from a team as a code owner February 18, 2021 05:11
@github-actions github-actions bot added the Cython / Python Cython or Python issue label Feb 18, 2021
@JohnZed JohnZed changed the title [WIP] MNMG Logistic Regression [WIP] MNMG Logistic Regression (dask-glm wrapper) Feb 25, 2021
@daxiongshu daxiongshu requested a review from a team as a code owner March 19, 2021 20:18
@github-actions github-actions bot added the conda conda issue label Mar 19, 2021
@codecov-io

Codecov Report

Merging #3512 (2fa2d54) into branch-0.19 (f5d86b9) will increase coverage by 7.96%.
The diff coverage is 0.00%.


@@               Coverage Diff               @@
##           branch-0.19    #3512      +/-   ##
===============================================
+ Coverage        72.91%   80.87%   +7.96%     
===============================================
  Files              214      229      +15     
  Lines            16856    17777     +921     
===============================================
+ Hits             12290    14378    +2088     
+ Misses            4566     3399    -1167     
Flag       Coverage   Δ
dask       45.20%     <0.00%> (?)
non-dask   73.16%     <0.00%> (+0.25%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...thon/cuml/dask/linear_model/logistic_regression.py 0.00% <0.00%> (ø)
python/cuml/common/numba_utils.py 0.00% <0.00%> (ø)
python/cuml/neighbors/__init__.py 100.00% <0.00%> (ø)
python/cuml/model_selection/__init__.py 100.00% <0.00%> (ø)
python/cuml/internals/global_settings.py 100.00% <0.00%> (ø)
python/cuml/neighbors/nearest_neighbors_mg.pyx 98.51% <0.00%> (ø)
python/cuml/decomposition/base_mg.pyx 100.00% <0.00%> (ø)
python/cuml/linear_model/base_mg.pyx 100.00% <0.00%> (ø)
python/cuml/cluster/dbscan_mg.pyx 100.00% <0.00%> (ø)
python/cuml/neighbors/kneighbors_classifier_mg.pyx 100.00% <0.00%> (ø)
... and 78 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@daxiongshu daxiongshu added non-breaking Non-breaking change feature request New feature or request labels Mar 21, 2021
@github-actions github-actions bot added the gpuCI gpuCI issue label Mar 21, 2021
CHANGELOG.md: review thread (outdated, resolved)
daxiongshu (Contributor, Author)

While I'm fixing the CI errors, I'd like to raise some issues with this PR. Because it does not hold an internal model, it lacks several capabilities of the other cuml.dask linear models:

  1. No get_combined_model() or _set_internal_model().
  2. No multi-GPU training with single-GPU inference.
  3. No single-GPU training with multi-GPU inference.
  4. No pickling of the model.

Functionalities 2) and 4) could be added fairly easily, but they would look different from the other cuml.dask linear models (see the sketch at the end of this comment for the pattern those estimators follow).

Currently the wrapper subclasses cuml.dask.common.base.BaseEstimator only trivially; the only thing it uses from BaseEstimator is the check that the client is valid.

This implementation also exposes different hyperparameters from the other cuml.dask linear models:

  • No normalize hyperparameter; dask-glm always normalizes inputs internally.
  • No delayed hyperparameter.

So I have two questions:

  • Are these missing functionalities required by customers? @beckernick
  • Is it necessary to subclass cuml.dask.common.base.BaseEstimator? Given the differences, @JohnZed suggests putting it into a separate namespace, say cuml wrappers.

Thank you all. Please let me know.
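
For context, here is a sketch of the pattern the existing cuml.dask estimators follow for points 2) and 4) above: multi-GPU fit, then get_combined_model() to obtain a single-GPU model that can be pickled and used for local inference. It uses cuml.dask.linear_model.LinearRegression purely as an illustration and is not part of this PR; treat the exact calls as an assumption about how those estimators are typically used.

```python
# Illustrative only: the multi-GPU-train / single-GPU-infer pattern that
# existing cuml.dask estimators support and this dask-glm wrapper lacks.
import pickle

import cudf
import dask_cudf
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

from cuml.dask.linear_model import LinearRegression

if __name__ == "__main__":
    cluster = LocalCUDACluster()
    client = Client(cluster)

    X = cudf.DataFrame({"f0": [0.0, 1.0, 2.0, 3.0]})
    y = cudf.Series([0.0, 1.0, 2.0, 3.0])
    X_dask = dask_cudf.from_cudf(X, npartitions=2)
    y_dask = dask_cudf.from_cudf(y, npartitions=2)

    # Multi-GPU training on the distributed data.
    mnmg_model = LinearRegression()
    mnmg_model.fit(X_dask, y_dask)

    # Collapse to a local, single-GPU cuml model for inference and pickling.
    local_model = mnmg_model.get_combined_model()
    local_preds = local_model.predict(X)
    blob = pickle.dumps(local_model)

    client.close()
    cluster.close()
```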

beckernick (Member)

Does this implementation implicitly support a GPU-based LogisticRegression.decision_function via Dask dispatching?

Due to the differences, @JohnZed suggests that we put it into another namespace say cuml wrappers.

This sounds like a great idea!

@daxiongshu daxiongshu changed the base branch from branch-21.10 to branch-22.06 April 8, 2022 00:10
dantegd (Member) commented Apr 26, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 6ca7254 into rapidsai:branch-22.06 Apr 26, 2022
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023

Authors:
  - Jiwei Liu (https://github.com/daxiongshu)
  - Nick Becker (https://github.com/beckernick)
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#3512
Labels: 0 - Blocked · conda · Cython / Python · feature request · gpuCI · non-breaking