
Transforms RandomForest estimators non-consecutive labels to consecutive labels where appropriate #4780

Merged
merged 13 commits into rapidsai:branch-22.10 on Sep 29, 2022

Conversation

Contributor

@VamsiTallam95 VamsiTallam95 commented Jun 17, 2022

This PR closes #4478 by transforming non-consecutive labels outside of [0, n) to consecutive labels inside [0, n), similar to what Scikit-learn does under the hood.

Closes #691
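
For reference, here is a minimal sketch (not the code in this PR) of the kind of remapping described above, assuming CuPy is available; the array values are only illustrative:

```python
import cupy as cp

# Map arbitrary class labels (e.g. [2, 5, 5, 9]) onto consecutive
# integers in [0, n_classes), as the PR description outlines.
y = cp.asarray([2, 5, 5, 9])
classes, y_monotonic = cp.unique(y, return_inverse=True)
# classes     -> [2, 5, 9]
# y_monotonic -> [0, 1, 1, 2]
# Predictions in [0, n_classes) can be mapped back with classes[preds].
```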

@VamsiTallam95 VamsiTallam95 requested a review from a team as a code owner June 17, 2022 02:32
@github-actions github-actions bot added the Cython / Python Cython or Python issue label Jun 17, 2022
@VamsiTallam95 VamsiTallam95 marked this pull request as draft June 17, 2022 02:33
raise ValueError("The labels need "
"to be consecutive values from "
"0 to the number of unique label values")
self.classes_unorder = cp.unique(y_m).tolist()
Member

We should be reusing existing primitives where at all possible and using the make_monotonic primitive to do this. That allows us to optimize this specific operation once and have it benefit all uses.
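
For illustration, the existing prim could be called directly instead of re-deriving the mapping here. This is a hedged sketch that assumes the prim is importable from cuml.prims.label and returns the remapped labels plus the original classes, matching the call shape used later in this PR:

```python
from cuml.prims.label import make_monotonic

# Remap labels to consecutive values in [0, n_classes) using the
# existing primitive rather than reimplementing the transformation.
y_m, classes = make_monotonic(y_m)
```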

@beckernick beckernick added bug Something isn't working non-breaking Non-breaking change labels Jun 21, 2022
@cjnolet cjnolet added this to PR-WIP in v22.08 Release via automation Jun 22, 2022
@VamsiTallam95 VamsiTallam95 marked this pull request as ready for review June 27, 2022 15:07
Member

@dantegd dantegd left a comment

Looks good, just had a couple of comments

Comment on lines 313 to 316
@pytest.mark.parametrize("datatype", [np.float32, np.float64])
@pytest.mark.parametrize("max_features", [1.0, "auto", "log2", "sqrt"])
@pytest.mark.parametrize("b", [0, 5, -5, 10])
@pytest.mark.parametrize("a", [1, 2, 3])
Member

I don't think we need this full matrix of tests for the monotonic case; one combination for each datatype would be enough?

Contributor Author

Sure, I will fix one value for a, b and max_features.
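
One possible shape for that reduction (the test name and the pinned values here are placeholders, not the PR's actual test):

```python
import numpy as np
import pytest


@pytest.mark.parametrize("datatype", [np.float32, np.float64])
def test_rf_nonconsecutive_labels(datatype):
    # Keep the datatype sweep; pin the remaining parameters to a single
    # combination instead of the full cartesian product.
    a, b, max_features = 2, 5, 1.0
    ...
```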

Comment on lines 289 to 291
y_m, _ = make_monotonic(y_m)
break

Member

I wonder if the logic for the loop might be better placed in make_monotonic, perhaps with a parameter like make_monotonic(array, check_already_monotonic=True), so that every use of the prim is cleaner. What do you think?
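
Under that suggestion the call site could collapse to something like the sketch below; note that check_already_monotonic is the hypothetical parameter proposed in this comment, not an existing argument of make_monotonic:

```python
# Hypothetical API: the prim skips the remap when the labels are
# already consecutive integers starting at 0.
y_m, _ = make_monotonic(y_m, check_already_monotonic=True)
```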

Member

I'd vote we make that change in RAFT, plumb raft::label::make_monotonic to Python (rapidsai/raft#640), and then make a follow up PR to use that in cuML and remove this prim.

Context: #4478 (comment)

What do you think? With that said, we could just do both :)

Contributor Author

I did not modify the loop because I wanted to make the fewest possible changes to the code. However, we can use the check_labels primitive to see whether the labels are already monotonic. I can make that change for a cleaner implementation.
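
A rough sketch of that check (the check_labels call signature is assumed here, not verified against the prim's actual API):

```python
import cupy as cp

from cuml.prims.label import check_labels, make_monotonic

# Only remap when the labels are not already 0..n_classes-1.
n_classes = int(cp.unique(y_m).shape[0])
if not check_labels(y_m, cp.arange(n_classes, dtype=y_m.dtype)):
    y_m, _ = make_monotonic(y_m)
```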

Member

@beckernick I am on board with that idea.

v22.08 Release automation moved this from PR-WIP to PR-Needs review Jun 29, 2022
@ayushdg
Member

ayushdg commented Jun 29, 2022

rerun tests

3 similar comments
@VamsiTallam95
Contributor Author

rerun tests

@VamsiTallam95
Contributor Author

rerun tests

@VamsiTallam95
Contributor Author

rerun tests

@codecov-commenter

Codecov Report

Merging #4780 (8cd8d3a) into branch-22.08 (b26fe7e) will increase coverage by 0.00%.
The diff coverage is n/a.

@@              Coverage Diff              @@
##           branch-22.08    #4780   +/-   ##
=============================================
  Coverage         77.62%   77.62%           
=============================================
  Files               180      180           
  Lines             11382    11384    +2     
=============================================
+ Hits               8835     8837    +2     
  Misses             2547     2547           
Flag Coverage Δ
dask 45.52% <ø> (+<0.01%) ⬆️
non-dask 67.26% <ø> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
python/cuml/metrics/__init__.py 100.00% <0.00%> (ø)
python/cuml/metrics/cluster/__init__.py 100.00% <0.00%> (ø)
python/cuml/thirdparty_adapters/adapters.py 91.48% <0.00%> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@beckernick
Member

Is this ready for another round of reviews?

@VamsiTallam95
Contributor Author

It's ready!

@caryr35 caryr35 added this to PR-WIP in v22.10 Release via automation Aug 9, 2022
@caryr35 caryr35 moved this from PR-WIP to PR-Needs review in v22.10 Release Aug 9, 2022
@caryr35 caryr35 removed this from PR-Needs review in v22.08 Release Aug 9, 2022
Contributor

@lowener lowener left a comment

LGTM

@dantegd dantegd changed the base branch from branch-22.08 to branch-22.10 August 31, 2022 17:49
@lowener
Contributor

lowener commented Aug 31, 2022

rerun tests

1 similar comment
@lowener
Contributor

lowener commented Sep 1, 2022

rerun tests

@beckernick beckernick changed the title Transforms cuML estimators non-consecutive labels to consecutive labels where appropriate Transforms RandomForest estimators non-consecutive labels to consecutive labels where appropriate Sep 8, 2022
@beckernick
Member

Changing the title before merging, as this PR only applies this change to random forest models.

@beckernick
Member

This will also close #691

v22.10 Release automation moved this from PR-Needs review to PR-Reviewer approved Sep 29, 2022
@dantegd
Member

dantegd commented Sep 29, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 96da84c into rapidsai:branch-22.10 Sep 29, 2022
v22.10 Release automation moved this from PR-Reviewer approved to Done Sep 29, 2022
jakirkham pushed a commit to jakirkham/cuml that referenced this pull request Feb 27, 2023
…ive labels where appropriate (rapidsai#4780)

This PR closes rapidsai#4478 by transforming non-consecutive labels outside of [0,n) to consecutive labels inside [0,n) similar to what Scikit-learn does under the hood.

Closes rapidsai#691

Authors:
  - https://github.com/VamsiTallam95

Approvers:
  - Micka (https://github.com/lowener)
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#4780
Labels: bug (Something isn't working), Cython / Python (Cython or Python issue), non-breaking (Non-breaking change)
Projects: no open projects
7 participants