RF: python api behaviour refactor #4207

venkywonka · 2021-09-14T11:12:03Z

This PR ⬇️

- Fixes "[BUG] Random forest is not compatible with dask-ml GridsearchCV" #4193 and "Am i using randomForest Classifier with gridsearch wrong & is xgboost supported?" #4194, both of which relate to API incompatibility with dask-ml GridSearchCV.
- Changes the behaviour of cuml RF in the following cases:
  - In the not-so-uncommon case when n_bins > number of rows in the training sample, the estimator now prints a warning and sets n_bins to the number of training samples, instead of throwing an error and exiting.
  - When .predict() is called with float64 data, a warning is displayed and prediction implicitly falls back from the default GPU-based prediction to CPU-based prediction, instead of throwing an error asking the user to explicitly specify predict_model="CPU" and rerun.
- Adds corresponding tests that capture the warnings above.
- The estimators now accept both numbers and strings for the split_criterion parameter, bringing them into parity with sklearn's API, which takes strings as criterion.
- The split_algo and use_experimental_backend parameters of the estimator class have now been completely removed, along with their documentation and warnings, after being deprecated in previous releases (for both single-gpu and dask RF).
- The num_classes parameter of the predict and score methods has similarly been removed.
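
The three behaviour changes described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in this PR: the helper names `resolve_n_bins`, `resolve_predict_model`, and `resolve_split_criterion` are hypothetical, the warning texts are made up, and the name-to-code table for `split_criterion` is an assumption about how strings might map to numeric values.

```python
import warnings

# Assumed name-to-code table, for illustration only; the values cuML
# actually accepts may differ.
_CRITERION_CODES = {"gini": 0, "entropy": 1, "mse": 2}


def resolve_n_bins(n_bins, n_rows):
    """Warn and clamp n_bins to the number of training rows instead of raising."""
    if n_bins > n_rows:
        warnings.warn(
            f"n_bins ({n_bins}) exceeds the number of training samples "
            f"({n_rows}); using n_bins={n_rows} instead.",
            UserWarning,
        )
        return n_rows
    return n_bins


def resolve_predict_model(dtype_name, predict_model="GPU"):
    """Warn and fall back to CPU prediction when float64 data meets the GPU default."""
    if predict_model == "GPU" and dtype_name == "float64":
        warnings.warn(
            "GPU prediction was requested for float64 data; "
            "falling back to predict_model='CPU'.",
            UserWarning,
        )
        return "CPU"
    return predict_model


def resolve_split_criterion(criterion):
    """Accept either a numeric code or a case-insensitive string name."""
    if isinstance(criterion, str):
        key = criterion.lower()
        if key not in _CRITERION_CODES:
            raise ValueError(f"Unknown split_criterion: {criterion!r}")
        return _CRITERION_CODES[key]
    if criterion in _CRITERION_CODES.values():
        return int(criterion)
    raise ValueError(f"Unknown split_criterion: {criterion!r}")
```

Warning rather than raising is what lets a grid search keep going when a single parameter combination is out of range for the current fold, which appears to be the motivation behind the linked issues.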

venkywonka · 2021-09-18T05:59:12Z

rerun tests

dantegd · 2021-09-18T17:26:40Z

@venkywonka I just reproduced the issue of CI in plain branch-21.10 locally, so on Monday we'll work on unblocking CI

venkywonka · 2021-09-18T19:12:44Z

that's great @dantegd, thank you 🙏

dantegd · 2021-09-19T17:07:30Z

rerun tests

dantegd · 2021-09-19T17:07:51Z

The latest libcumlprims package should solve all issues

dantegd

Pre-approving, just had one comment, though I could deal with it in #4196 after merging this

dantegd · 2021-09-20T15:35:01Z

python/cuml/test/test_random_forest.py

+               "the number of samples used for training. "
+               "Changing `n_bins` to number of training samples."
+               in str(w[-1].message))
+        print(str(w[-1].message))


Suggested change

print(str(w[-1].message))

I don't think it is necessary to print the message, maybe only if it is wrong?

oh yea, that's on me will get rid of it, dante
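
One way to follow the suggestion above is to drop the unconditional `print` and let the assertion itself report the captured messages, so they only show up when the check fails. The sketch below uses the standard `warnings.catch_warnings(record=True)` pattern; `fit_with_clamped_bins` is a hypothetical stand-in for the estimator's `fit`, and the first half of its warning text is an assumption, since only the tail of the real message appears in the diff.

```python
import warnings


def fit_with_clamped_bins(n_bins, n_rows):
    """Hypothetical stand-in for fit(): warns and clamps n_bins."""
    if n_bins > n_rows:
        # Only the final sentence is taken from the diff; the rest is assumed.
        warnings.warn(
            "n_bins is greater than the number of samples used for training. "
            "Changing `n_bins` to number of training samples."
        )
        n_bins = n_rows
    return n_bins


def check_n_bins_warning():
    """Capture warnings and assert on the text without printing it."""
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        used_bins = fit_with_clamped_bins(n_bins=32, n_rows=10)

    messages = [str(item.message) for item in w]
    expected = "Changing `n_bins` to number of training samples."
    # The captured messages are attached to the assertion, so they are
    # displayed only when the expected warning is missing.
    assert any(expected in m for m in messages), messages
    return used_bins
```

Checking every captured message, rather than only `w[-1]`, also keeps the test from breaking if an unrelated warning happens to be emitted last.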

dantegd · 2021-09-20T19:39:28Z

@gpucibot merge

dantegd · 2021-09-21T03:35:25Z

rerun tests

codecov-commenter · 2021-09-21T06:07:23Z

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.10@36b3746).
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-21.10    #4207   +/-   ##
===============================================
  Coverage                ?   86.07%           
===============================================
  Files                   ?      231           
  Lines                   ?    18633           
  Branches                ?        0           
===============================================
  Hits                    ?    16039           
  Misses                  ?     2594           
  Partials                ?        0

Flag	Coverage Δ
dask	`47.05% <0.00%> (?)`
non-dask	`78.74% <0.00%> (?)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 36b3746...0b4e7f0.

This PR ⬇️

* fixes rapidsai#4193 and fixes rapidsai#4194, which relate to API incompatibility with dask-ml GridSearchCV
* changes the behaviour of cuml RF in the following cases:
  * In the not-so-uncommon case when `n_bins` > number of rows in the training sample, the estimator now prints a warning and sets `n_bins` to the number of training samples, instead of throwing an error and exiting.
  * When `.predict()` is called with `float64` data, a warning is displayed and prediction implicitly falls back from the default GPU-based prediction to CPU-based prediction, instead of throwing an error asking the user to explicitly specify `predict_model="CPU"` and rerun.
* adds corresponding tests that capture the warnings above
* the estimators now accept both numbers and strings for the `split_criterion` parameter, bringing them into parity with sklearn's API, which takes strings as criterion.
* the `split_algo` and `use_experimental_backend` parameters of the estimator class have now been completely removed, along with their documentation and warnings, after being deprecated in previous releases (for both single-gpu and dask RF).
* the `num_classes` parameter of the predict and score methods has similarly been removed

Authors:
- Venkat (https://github.com/venkywonka)

Approvers:
- Dante Gama Dessavre (https://github.com/dantegd)
- Rory Mitchell (https://github.com/RAMitchell)

URL: rapidsai#4207

python api behaviour refactor

207fa3a

venkywonka requested a review from a team as a code owner September 14, 2021 11:12

github-actions bot added the Cython / Python label Sep 14, 2021

flake8 fix

baaa425

github-actions bot added the Cython / Python label Sep 14, 2021

fix a failing test

0ae902d

venkywonka removed the doc, Cython / Python, and Dask / cuml.dask labels Sep 14, 2021

github-actions bot added the Cython / Python label Sep 14, 2021

caryr35 added this to PR-WIP in v21.10 Release via automation Sep 14, 2021

caryr35 moved this from PR-WIP to PR-Needs review in v21.10 Release Sep 14, 2021

venkywonka added the 3 - Ready for Review label and removed the 2 - In Progress label Sep 14, 2021

dantegd approved these changes Sep 20, 2021

v21.10 Release automation moved this from PR-Needs review to PR-Reviewer approved Sep 20, 2021

prune print

0b4e7f0

RAMitchell approved these changes Sep 21, 2021

rapids-bot bot merged commit b375320 into rapidsai:branch-21.10 Sep 21, 2021

v21.10 Release automation moved this from PR-Reviewer approved to Done Sep 21, 2021
