
RF: python api behaviour refactor #4207

Merged


venkywonka
Contributor

@venkywonka venkywonka commented Sep 14, 2021

This PR ⬇️

  • Fixes #4193 ("[BUG] Random forest is not compatible with dask-ml GridsearchCV") and #4194 ("Am i using randomForest Classifier with gridsearch wrong & is xgboost supported?"), both of which relate to API incompatibility with dask-ml `GridSearchCV`.
  • Changes the behaviour of cuml RF in the following cases:
    • When `n_bins` exceeds the number of rows in the training sample (a not-so-uncommon case), the estimator no longer throws an error and exits; it prints a warning and sets `n_bins` to the number of training samples.
    • When `.predict()` is called on `float64` data, the estimator no longer throws an error asking the user to explicitly specify `predict_model="CPU"` and rerun; it displays a warning and implicitly falls back from the default GPU-based prediction to CPU-based prediction.
  • Adds tests that capture the warnings described above.
  • The estimators now accept both numbers and strings for the `split_criterion` parameter, in parity with scikit-learn's API, which takes the criterion as a string.
  • The `split_algo` and `use_experimental_backend` parameters, deprecated in previous releases, have now been removed entirely from the estimator classes, including their documentation and deprecation warnings (for both single-GPU and Dask RF).
  • The `num_classes` parameter of the `predict` and `score` methods has likewise been removed.
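The two "warn and fall back instead of raising" behaviours above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not cuml's implementation: `resolve_n_bins` and `resolve_predict_model` are hypothetical helpers, only the tail of the `n_bins` warning ("Changing `n_bins` to number of training samples.") comes from this PR's test, and the rest of both messages is invented for the example.

```python
import warnings


def resolve_n_bins(n_bins: int, n_rows: int) -> int:
    """Hypothetical helper: clamp n_bins to the training row count with a warning."""
    if n_bins > n_rows:
        # Warn and continue instead of raising (message prefix is illustrative).
        warnings.warn(
            "`n_bins` is greater than the number of samples used for training. "
            "Changing `n_bins` to number of training samples."
        )
        return n_rows
    return n_bins


def resolve_predict_model(predict_model: str, dtype: str) -> str:
    """Hypothetical helper: fall back from GPU to CPU prediction for float64 input.

    `dtype` is passed as a string such as "float32" to keep the sketch dependency-free.
    """
    if predict_model == "GPU" and dtype == "float64":
        # Warn and switch to CPU instead of asking the user to rerun.
        warnings.warn(
            "GPU-based prediction does not accept float64 input in this sketch; "
            "falling back to CPU-based prediction."
        )
        return "CPU"
    return predict_model
```

Returning the resolved value (rather than mutating estimator state inside the helper) keeps each decision easy to test in isolation.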
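Accepting either a number or a string for `split_criterion` amounts to normalizing both forms to one internal representation. A minimal sketch, assuming a hypothetical `normalize_split_criterion` helper and an illustrative name-to-code table `CRITERION_CODES`; the actual codes cuml uses should be taken from its documentation, not from this example.

```python
from typing import Union

# Illustrative mapping only (an assumption); see the cuml RandomForest docs
# for the authoritative criterion names and codes.
CRITERION_CODES = {"gini": 0, "entropy": 1, "mse": 2}


def normalize_split_criterion(criterion: Union[int, str]) -> int:
    """Accept a numeric code or an sklearn-style string and return the code."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(criterion, bool):
        raise TypeError("split_criterion must be an int or a str, not bool")
    if isinstance(criterion, int):
        if criterion not in CRITERION_CODES.values():
            raise ValueError(f"Unknown split_criterion code: {criterion}")
        return criterion
    if isinstance(criterion, str):
        try:
            return CRITERION_CODES[criterion.lower()]
        except KeyError:
            raise ValueError(
                f"Unknown split_criterion name: {criterion!r}"
            ) from None
    raise TypeError(
        f"split_criterion must be an int or a str, got {type(criterion).__name__}"
    )
```

Normalizing once, at parameter validation time, lets the rest of the estimator stay unaware of which form the user passed.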

@venkywonka venkywonka requested a review from a team as a code owner September 14, 2021 11:12
@github-actions github-actions bot added the Cython / Python label Sep 14, 2021
@venkywonka venkywonka added the 2 - In Progress, breaking, Dask / cuml.dask, doc, and improvement labels and removed the Cython / Python label Sep 14, 2021
@github-actions github-actions bot added the Cython / Python label Sep 14, 2021
@venkywonka venkywonka removed the doc, Cython / Python, and Dask / cuml.dask labels Sep 14, 2021
@github-actions github-actions bot added the Cython / Python label Sep 14, 2021
@caryr35 caryr35 added this to PR-WIP in v21.10 Release via automation Sep 14, 2021
@caryr35 caryr35 moved this from PR-WIP to PR-Needs review in v21.10 Release Sep 14, 2021
@venkywonka venkywonka added the 3 - Ready for Review label and removed the 2 - In Progress label Sep 14, 2021
@venkywonka
Contributor Author

rerun tests

@dantegd
Member

dantegd commented Sep 18, 2021

@venkywonka I just reproduced the CI issue locally on a plain branch-21.10, so on Monday we'll work on unblocking CI.

@venkywonka
Contributor Author

that's great @dantegd, thank you 🙏

@dantegd
Member

dantegd commented Sep 19, 2021

rerun tests

@dantegd
Member

dantegd commented Sep 19, 2021

The latest libcumlprims package should solve all issues

Member

@dantegd dantegd left a comment


Pre-approving; I just had one comment, though I could deal with it in #4196 after merging this.

"the number of samples used for training. "
"Changing `n_bins` to number of training samples."
in str(w[-1].message))
print(str(w[-1].message))
Member

Suggested change
print(str(w[-1].message))

I don't think it is necessary to print the message, maybe only if it is wrong?
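Along the lines of this suggestion, a test can assert on the recorded warning and surface its text only when the check fails, by passing the message as the `assert` message instead of calling `print`. A minimal sketch, assuming a hypothetical `fit_hypothetical` stand-in for the estimator's `fit` (the real test exercises cuml RF, which is not reproduced here):

```python
import warnings


def fit_hypothetical(n_bins: int, n_rows: int) -> int:
    """Hypothetical stand-in for fit(); returns the n_bins actually used."""
    if n_bins > n_rows:
        warnings.warn(
            "`n_bins` is greater than the number of samples used for training. "
            "Changing `n_bins` to number of training samples."
        )
        return n_rows
    return n_bins


def test_n_bins_warning() -> None:
    with warnings.catch_warnings(record=True) as w:
        # Record every warning, even if an identical one was emitted before.
        warnings.simplefilter("always")
        used = fit_hypothetical(n_bins=128, n_rows=50)
    assert used == 50
    assert len(w) >= 1, "expected an n_bins warning"
    msg = str(w[-1].message)
    # msg is displayed only if this assertion fails; no print() is needed.
    assert "Changing `n_bins` to number of training samples." in msg, msg
```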

Contributor Author

oh yeah, that's on me; will get rid of it, dante

v21.10 Release automation moved this from PR-Needs review to PR-Reviewer approved Sep 20, 2021
@dantegd
Member

dantegd commented Sep 20, 2021

@gpucibot merge

@dantegd
Member

dantegd commented Sep 21, 2021

rerun tests

@codecov-commenter

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.10@36b3746).
The diff coverage is n/a.


@@               Coverage Diff               @@
##             branch-21.10    #4207   +/-   ##
===============================================
  Coverage                ?   86.07%           
===============================================
  Files                   ?      231           
  Lines                   ?    18633           
  Branches                ?        0           
===============================================
  Hits                    ?    16039           
  Misses                  ?     2594           
  Partials                ?        0           
Flag        Coverage Δ
dask        47.05% <0.00%> (?)
non-dask    78.74% <0.00%> (?)

Flags with carried forward coverage won't be shown.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 36b3746...0b4e7f0.

@rapids-bot rapids-bot bot merged commit b375320 into rapidsai:branch-21.10 Sep 21, 2021
v21.10 Release automation moved this from PR-Reviewer approved to Done Sep 21, 2021
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023

Authors:
  - Venkat (https://github.com/venkywonka)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Rory Mitchell (https://github.com/RAMitchell)

URL: rapidsai#4207
Labels
3 - Ready for Review, breaking, Cython / Python, improvement
4 participants