
[REVIEW] RF: code re-organization to enhance build parallelism #4299

Merged: 8 commits merged into rapidsai:branch-22.02 on Dec 6, 2021

Conversation

@venkywonka (Contributor) commented Oct 21, 2021

This PR splits the decision tree kernels into separate translation units (TUs) and explicitly instantiates their templates.
This helps in two ways:

  1. Refactoring the top-level RF/DT code no longer requires recompiling the kernels.
  2. Because the kernels live in separate TUs that are compiled independently and then linked, the build can exploit parallelism (a 4x improvement in rebuild times after touching kernel definitions); see the sketch after this list.
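
As a rough illustration of the pattern (a minimal sketch with hypothetical file and function names, not the actual cuML sources): the template definition lives in an implementation header that only the per-instantiation .cu files include, while every other TU sees just the declaration.

```cpp
// compute_split.cuh -- declaration only. Callers include this header,
// so touching the kernel body never forces them to recompile.
template <typename DataT, typename LabelT>
void launchComputeSplit(const DataT* input, const LabelT* labels, int n_rows);

// compute_split_impl.cuh -- the expensive template definition.
template <typename DataT, typename LabelT>
__global__ void computeSplitKernel(const DataT* input, const LabelT* labels, int n_rows)
{
  /* ... split computation ... */
}

template <typename DataT, typename LabelT>
void launchComputeSplit(const DataT* input, const LabelT* labels, int n_rows)
{
  computeSplitKernel<DataT, LabelT><<<256, 128>>>(input, labels, n_rows);
}

// compute_split_float_int.cu -- one TU per instantiation; nvcc compiles
// each of these in parallel under `make -j` / PARALLEL_LEVEL.
#include "compute_split_impl.cuh"
template void launchComputeSplit<float, int>(const float*, const int*, int);

// compute_split_double_int.cu
#include "compute_split_impl.cuh"
template void launchComputeSplit<double, int>(const double*, const int*, int);
```

Since each explicit instantiation is an ordinary linker symbol, callers link against whichever instantiations exist without ever seeing the kernel body.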

Comparison of rebuild times, measured with `time ./build.sh libcuml -v -n PARALLEL_LEVEL=20` after touching the RF kernels:
(Note: `--ccache` makes no difference here, since after touching the RF kernels the affected objects are new to ccache and not in its hashed index.)

This PR:

real    0m20.054s
user    2m28.436s
sys     0m14.241s

branch-21.12:

real    1m21.197s
user    2m5.751s
sys     0m6.050s

Some other changes include renaming and reorganizing files, pruning headers, and general code cleanup.

Things to do:

- [x] split DT Kernels
- [x] benchmark for regressions

@venkywonka requested review from a team as code owners October 21, 2021 09:37
@venkywonka changed the title from "RF: file re-organization to enhance build parallelism" to "[WIP] RF: file re-organization to enhance build parallelism" Oct 21, 2021
@venkywonka added the improvement (Improvement / enhancement to an existing function) label Oct 21, 2021
@teju85 added the Build or Dep (Issues related to building the code or dependencies), CUDA / C++ (CUDA issue), and non-breaking (Non-breaking change) labels Oct 21, 2021
@caryr35 added this to PR-WIP in v21.12 Release via automation Oct 21, 2021
@caryr35 moved this from PR-WIP to PR-Needs review in v21.12 Release Oct 21, 2021
@venkywonka changed the title from "[WIP] RF: file re-organization to enhance build parallelism" to "[WIP] RF: code re-organization to enhance build parallelism" Oct 22, 2021
@RAMitchell (Contributor) left a comment

I don't like adding a bunch of files, but weighed against long compile times it seems like the lesser of two evils.

I think this should go ahead and we can look at other options later.

@dantegd removed this from PR-Needs review in v21.12 Release Nov 18, 2021
@dantegd added this to PR-WIP in v22.02 Release via automation Nov 18, 2021
@venkywonka (Contributor, Author) commented Nov 24, 2021

No regressions found on GBM-Bench. Algorithmic correctness is unchanged: the splits generated at every iteration and the resulting output trees are identical.
One thing to note: due to the code reorganization, the generated code for computeSplitKernel uses fewer registers than before, which increases occupancy on the GPU, but this does not translate into a significant performance boost (or regression). One way to inspect per-kernel register usage is sketched below.
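
For reference, register and shared-memory usage per kernel can be checked by passing `-Xptxas -v` to nvcc (a generic nvcc flag; the file name and architecture below are illustrative, not from this PR):

```sh
# Ask ptxas to print resource usage for each kernel it compiles.
# Output looks roughly like:
#   ptxas info : Used 40 registers, 352 bytes cmem[0]
nvcc -arch=sm_70 -Xptxas -v -c compute_split_float_int.cu
```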

The numbers below are end-to-end times averaged over 100 runs:
[benchmark comparison chart]

@venkywonka changed the base branch from branch-21.12 to branch-22.02 November 24, 2021 14:38
@venkywonka changed the title from "[WIP] RF: code re-organization to enhance build parallelism" to "[REVIEW] RF: code re-organization to enhance build parallelism" Nov 26, 2021
@venkywonka (Contributor, Author) commented:

rerun tests

@codecov-commenter commented:

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.02@ed0e58c).
The diff coverage is n/a.


@@               Coverage Diff               @@
##             branch-22.02    #4299   +/-   ##
===============================================
  Coverage                ?   85.73%           
===============================================
  Files                   ?      236           
  Lines                   ?    19314           
  Branches                ?        0           
===============================================
  Hits                    ?    16558           
  Misses                  ?     2756           
  Partials                ?        0           
| Flag | Coverage Δ |
|------|------------|
| dask | 46.52% <0.00%> (?) |
| non-dask | 78.62% <0.00%> (?) |

Flags with carried-forward coverage won't be shown.


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ed0e58c...f20989f.

v22.02 Release automation moved this from PR-WIP to PR-Reviewer approved Dec 6, 2021
@dantegd (Member) commented Dec 6, 2021

@gpucibot merge

@rapids-bot (bot) merged commit 4215577 into rapidsai:branch-22.02 Dec 6, 2021
v22.02 Release automation moved this from PR-Reviewer approved to Done Dec 6, 2021
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023