Enable probability output from RF binary classifier (alternative implementaton) #3869
Conversation
Apart from my comments, the PR looks good to me.
int max_class_idx = 0;
int max_count = 0;
int total_count = 0;
for (int i = 0; i < input.nclasses; ++i) {
This loop was executed collaboratively by threads in the block. Now it is executed redundantly by all the threads in the block. Any particular reasons for that?
Two reasons:
- I was following Refactor to extract random forest objectives #3854, which also redundantly computes the sum over the classes.
- This PR requires the computation of `total_count`, which is the sum of all elements of `shist`. If the loop were to run collaboratively, I'd need to define an extra data structure to perform the reduction for `total_count`.
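For context, the single-pass logic under discussion can be sketched on the host as follows (a simplified illustration with assumed names, not the PR's actual device code):

```cpp
#include <utility>
#include <vector>

// Hedged sketch: one sequential pass over the per-class histogram yields
// both the majority class and total_count. If the loop ran collaboratively
// across threads, total_count would need a separate block-wide reduction.
std::pair<int, double> leaf_prediction_with_prob(const std::vector<int>& shist)
{
  int max_class_idx = 0;
  int max_count     = 0;
  int total_count   = 0;
  for (int i = 0; i < static_cast<int>(shist.size()); ++i) {
    total_count += shist[i];
    if (shist[i] > max_count) {
      max_count     = shist[i];
      max_class_idx = i;
    }
  }
  // Probability of the predicted class: its count over all samples at the leaf.
  double prob = total_count > 0 ? static_cast<double>(max_count) / total_count : 0.0;
  return {max_class_idx, prob};
}
```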
`total_count` can be calculated with a call to `cub::BlockReduce`. Assuming `nclasses` is small, this is a small penalty. We could change it in the future if that assumption breaks.
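One way that suggestion could look (a hedged sketch, not the PR's actual kernel; the kernel shape and `TPB` block size are assumptions):

```cuda
#include <cub/cub.cuh>

template <int TPB>
__global__ void leaf_prediction_kernel(const int* shist, int nclasses)
{
  typedef cub::BlockReduce<int, TPB> BlockReduce;
  __shared__ typename BlockReduce::TempStorage temp_storage;

  // Each thread accumulates a strided partial sum over the histogram bins.
  int thread_sum = 0;
  for (int i = threadIdx.x; i < nclasses; i += TPB) {
    thread_sum += shist[i];
  }
  // Block-wide reduction; the result is valid only in thread 0.
  int total_count = BlockReduce(temp_storage).Sum(thread_sum);
  if (threadIdx.x == 0) {
    // ... use total_count to normalize the majority count into a probability
  }
}
```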
In that case, can the whole loop be moved inside the `if (tid == 0)` block on line 91?
@vinaydes Thanks. What do you think of #3854, where the summing is performed redundantly in all threads? For example:
cuml/cpp/src/decisiontree/batched-levelalgo/metrics.cuh
Lines 112 to 123 in bf5007c
static DI LabelT LeafPrediction(BinT* shist, int nclasses) {
  int class_idx = 0;
  int count = 0;
  for (int i = 0; i < nclasses; i++) {
    auto current_count = shist[i].x;
    if (current_count > count) {
      class_idx = i;
      count = current_count;
    }
  }
  return class_idx;
}
If I understand correctly, in #3854, `LeafPrediction()` is called at kernels.cuh#L141, which is already inside the `tid == 0` block. I could be wrong if it is called somewhere else too.
@vinaydes Got it. In that case, can I move this loop inside the `tid == 0` block?
  info.prediction = pred;
  info.colid = Leaf;
- info.quesval = DataT(0);  // don't care for leaf nodes
+ info.quesval = aux;
Is this reuse necessary? I understand `quesval` is unused in leaf nodes, but it may be better to use a separate member variable for this.
I wanted to avoid introducing a new member variable, since we want to introduce a new data structure to store the probability distribution for multi-class classifiers.
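The trade-off being discussed can be illustrated with a small sketch (struct layout and names are assumptions for illustration, not cuML's actual types):

```cpp
// Hedged sketch with hypothetical names: the node record's question-value
// field is meaningful only for internal (split) nodes, so a leaf can reuse
// it to carry the predicted-class probability without widening the struct.
constexpr int Leaf = -1;  // assumed sentinel marking a leaf node

struct NodeInfo {
  int colid;          // split feature for internal nodes; Leaf sentinel otherwise
  double quesval;     // split threshold; reused at leaves for the probability
  double prediction;  // predicted label at a leaf
};

NodeInfo make_leaf(double pred, double prob)
{
  NodeInfo info;
  info.prediction = pred;
  info.colid      = Leaf;
  info.quesval    = prob;  // reuse the unused field instead of adding a member
  return info;
}
```

A separate member, or the multi-class probability structure mentioned above, would avoid the overloaded field at the cost of a larger node record.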
Codecov Report
@@ Coverage Diff @@
## branch-21.06 #3869 +/- ##
===============================================
Coverage ? 85.41%
===============================================
Files ? 227
Lines ? 17315
Branches ? 0
===============================================
Hits ? 14790
Misses ? 2525
Partials ? 0
Flags with carried forward coverage won't be shown.
rerun tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM. @vinaydes, was wondering if you have any further comments, or does this look good to merge?
Good to merge 👍
@gpucibot merge
…ive implementaton) (rapidsai#3869)"

This reverts commit 92484fb.
…ementaton) (rapidsai#3869)

Alternative implementation of rapidsai#3862 that does not depend on rapidsai#3854
Closes rapidsai#3764
Closes rapidsai#2518

Authors:
- Philip Hyunsu Cho (https://github.com/hcho3)

Approvers:
- Dante Gama Dessavre (https://github.com/dantegd)
- Vinay Deshpande (https://github.com/vinaydes)

URL: rapidsai#3869
Reverts rapidsai#3869, as it was shown to reduce the test accuracy in some cases. Closes rapidsai#3910

Authors:
- Philip Hyunsu Cho (https://github.com/hcho3)

Approvers:
- Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#3933
Alternative implementation of #3862 that does not depend on #3854
Closes #3764
Closes #2518