Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RF: Make experimental-backend default for regression tasks and deprecate old-backend. #3872

Merged

Conversation

venkywonka
Copy link
Contributor

@venkywonka venkywonka commented May 18, 2021

Copy link
Member

@teju85 teju85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nitpicks. Overall LGTM.

cpp/src/decisiontree/decisiontree_impl.cuh Outdated Show resolved Hide resolved
python/cuml/ensemble/randomforestclassifier.pyx Outdated Show resolved Hide resolved
python/cuml/ensemble/randomforestregressor.pyx Outdated Show resolved Hide resolved
@teju85
Copy link
Member

teju85 commented May 18, 2021

This should only be merged after #3845 !

@teju85 teju85 added breaking Breaking change improvement Improvement / enhancement to an existing function labels May 18, 2021
@dantegd dantegd added the 0 - Blocked Cannot progress due to external reasons label May 18, 2021
@teju85 teju85 added the 5 - Merge After Dependencies Depends on another PR: do not merge out of order label May 18, 2021
Co-authored-by: Thejaswi. N. S <rao.thejaswi@gmail.com>
Copy link
Contributor

@RAMitchell RAMitchell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should get removed:

CUML_LOG_WARN("Using experimental backend for growing trees\n");

@venkywonka
Copy link
Contributor Author

This should get removed:

CUML_LOG_WARN("Using experimental backend for growing trees\n");

I thought so too, @RAMitchell ,but wanted confirmation since that output was being tested; will remove it 👍

@github-actions github-actions bot added CUDA/C++ Cython / Python Cython or Python issue labels May 19, 2021
@dantegd dantegd marked this pull request as ready for review May 27, 2021 16:17
@dantegd dantegd requested review from a team as code owners May 27, 2021 16:17
@hcho3 hcho3 self-requested a review May 27, 2021 16:19
Copy link
Member

@teju85 teju85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM.

@venkywonka
Copy link
Contributor Author

@venkywonka The dask issues in CI seem to have cleared up. @but now the fails seem relevant to the PR:

=========================== short test summary info ============================
FAILED cuml/test/test_api.py::test_fit_function[RandomForestClassifier] - Val...
FAILED cuml/test/test_pickle.py::test_small_rf[10-20-100-RandomForestClassifier-float32]
FAILED cuml/test/test_random_forest.py::test_rf_regression[special_reg0-1-log2-True-100-float32-1.0]
FAILED cuml/test/test_random_forest.py::test_rf_regression[special_reg0-1-sqrt-True-100-float32-1.0] 

yes @dantegd , some unforeseen changes because of changing default n_bins=128 ... and others that are still mysterious.... diggin it

seems like it was a combination of 2, one of them due to the quoted reason, second is detailed in the issue #3910 . My latest commit closes #3910

@dantegd
Copy link
Member

dantegd commented May 28, 2021

rerun tests

@dantegd dantegd added this to PR-WIP in v21.06 Release via automation May 28, 2021
@dantegd
Copy link
Member

dantegd commented May 29, 2021

rerun tests

@dantegd
Copy link
Member

dantegd commented May 29, 2021

It seems like all the test jobs timed out in the same test (Jenkins UI doesn’t show it):

cuml/test/test_random_forest.py::test_rf_regression[special_reg0-1-log2-True-100-float32-1.0] 
Build timed out (after 40 minutes). Marking the build as failed.

@venkywonka
Copy link
Contributor Author

It seems like all the test jobs timed out in the same test (Jenkins UI doesn’t show it):

cuml/test/test_random_forest.py::test_rf_regression[special_reg0-1-log2-True-100-float32-1.0] 
Build timed out (after 40 minutes). Marking the build as failed.

I added more tests for the experimental backend in this PR (here), which exposed an unusual problem: seems like setting n_bins=100 hangs the regressor (but n_bins=128 did not)! (Not only in this PR but also branch-21.06 build!).
Will create an issue shortly and dig in deeper.
The scope of solving the aforementioned issue is beyond this PR (as the issue has nothing to do with this PR, this PR just happened to expose it).
Hence, for now, have made n_bins=128 in the tests so it does not hang...

@dantegd
Copy link
Member

dantegd commented Jun 1, 2021

@venkywonka do we have a list of values that could make the tests hang?

@venkywonka
Copy link
Contributor Author

venkywonka commented Jun 1, 2021

@venkywonka do we have a list of values that could make the tests hang?

Thanks to @vinaydes , we discovered the bug, it was a regression in regression (again) (sorry)
It is detailed in issue #3919 and has to do with some buggy code that's on me!
TLDR; for all values of n_bins > 64 and n_bins % 64 != 0 this will hang due to deadlock caused due to this line.

I have patched the fix in this PR #3921 and that should resolve this deadly bug

v21.06 Release automation moved this from PR-WIP to PR-Needs review Jun 1, 2021
@@ -217,10 +221,14 @@ def test_rf_classification(small_clf, datatype, split_algo,
(1, 'sqrt', False, 100),
(1, 1.0, True, 17),
(1, 1.0, True, 32),
(0, 1.0, True, 16),
(1, 1.0, True, 11),
(0, 'auto', True, 128),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic news on solving the bug @venkywonka (and @vinaydes) I think we should just add some test cases like n_bins 100 back to have better test coverage and we’re good to merge

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep will add them right away!

@dantegd dantegd added 4 - Waiting on Author Waiting for author to respond to review and removed 0 - Blocked Cannot progress due to external reasons 5 - Merge After Dependencies Depends on another PR: do not merge out of order labels Jun 1, 2021
@@ -341,7 +366,7 @@ def test_rf_classification_float64(small_clf, datatype, convert_dtype):
fil_preds = np.reshape(fil_preds, np.shape(cu_preds))

fil_acc = accuracy_score(y_test, fil_preds)
assert fil_acc >= (cu_acc - 0.02)
assert fil_acc >= (cu_acc - 0.07)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be add a comment here as reminder to restore the old tolerance value later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure will do

v21.06 Release automation moved this from PR-Needs review to PR-Reviewer approved Jun 2, 2021
@dantegd
Copy link
Member

dantegd commented Jun 2, 2021

@gpucibot merge

@codecov-commenter
Copy link

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.06@b30d527). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.06    #3872   +/-   ##
===============================================
  Coverage                ?   77.30%           
===============================================
  Files                   ?      215           
  Lines                   ?    17042           
  Branches                ?        0           
===============================================
  Hits                    ?    13174           
  Misses                  ?     3868           
  Partials                ?        0           
Flag Coverage Δ
non-dask 77.30% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b30d527...20af360. Read the comment docs.

@rapids-bot rapids-bot bot merged commit fc23461 into rapidsai:branch-21.06 Jun 2, 2021
v21.06 Release automation moved this from PR-Reviewer approved to Done Jun 2, 2021
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
…ate old-backend. (rapidsai#3872)

* This PR follows rapidsai#3845 and resolves rapidsai#3520
* Makes new-backend default for regression tasks. Now, for both classification and regression tasks, experimental-backend (new-backend) is better than old 😀
* Adds deprecation warning when using old-backend in the C++ DecisionTree layer so that the warning reflects for the decision trees C++ API too.
* Sets default `n_bins` to 128 
* Some docs update
* Some python tests update

Authors:
  - Venkat (https://github.com/venkywonka)
  - Rory Mitchell (https://github.com/RAMitchell)

Approvers:
  - Rory Mitchell (https://github.com/RAMitchell)
  - Thejaswi. N. S (https://github.com/teju85)
  - Philip Hyunsu Cho (https://github.com/hcho3)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#3872
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4 - Waiting on Author Waiting for author to respond to review breaking Breaking change CUDA/C++ Cython / Python Cython or Python issue improvement Improvement / enhancement to an existing function
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

[FEA] Promote experimental RF backend to default for regression as well
8 participants