[REVIEW] RF: code re-organization to enhance build parallelism #4299
Conversation
Force-pushed from e4c501d to ef07c07
I don't like adding a bunch of files, but against long compile times it seems like the lesser of two evils.
I think this should go ahead, and we can look at other options later.
No regressions found on GBM-Bench. The algorithmic correctness, the splits generated every iteration, and the output tree are the same.

rerun tests
Force-pushed from f8f8784 to f20989f
Codecov Report

```
@@            Coverage Diff            @@
##        branch-22.02   #4299   +/- ##
========================================
  Coverage          ?   85.73%
========================================
  Files             ?      236
  Lines             ?    19314
  Branches          ?        0
========================================
  Hits              ?    16558
  Misses            ?     2756
  Partials          ?        0
========================================
```

Flags with carried forward coverage won't be shown.
@gpucibot merge
This PR separates the decision tree kernels into separate translation units (TUs) and explicitly instantiates their templates. This is helpful in two ways:

1. Refactoring top-level RF/DT code no longer requires recompiling the kernels.
2. Since the kernels live in separate TUs and are linked together, the build can exploit parallelism (about a 4x improvement in rebuild times after touching kernel definitions).

Rebuild-time comparison, running `time ./build.sh libcuml -v -n PARALLEL_LEVEL=20` after touching the RF kernels (note: using `--ccache` doesn't matter here, since after touching the RF kernels the affected objects are new and not part of ccache's hashed index):

<details><summary>This PR</summary>

```
real    0m20.054s
user    2m28.436s
sys     0m14.241s
```

</details>

<details><summary>branch-21.12</summary>

```
real    1m21.197s
user    2m5.751s
sys     0m6.050s
```

</details>

Other changes include renaming and reorganizing files, pruning headers, and cleaning up some code.

Things to do:

- [x] split DT kernels
- [x] benchmark for regressions

Authors:

- Venkat (https://github.com/venkywonka)

Approvers:

- Rory Mitchell (https://github.com/RAMitchell)
- Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4299