[REVIEW] Move template param to member var to improve compile of hash/groupby.cu #6835
Error 137 is "out of memory"
Codecov Report
@@ Coverage Diff @@
## branch-0.17 #6835 +/- ##
============================================
Coverage 81.94% 81.94%
============================================
Files 96 96
Lines 16164 16164
============================================
Hits 13246 13246
Misses 2918 2918
@nvdbaranec is out this week, so removing his review request.
rerun tests
The compile time/size of cpp/src/groupby/hash/groupby.cu is one of the top offenders for building libcudf.
Current top 5 slowest compiles:
The two sort.cu files may be improved in a later PR. The drop_duplicates.cu is being addressed in #6822.
The simple change here is to the compute_single_pass_aggs functor defined here:
cudf/cpp/src/groupby/hash/groupby_kernels.cuh
Lines 65 to 66 in 591bead
The skip_rows_with_nulls template parameter is set to avoid calling (and inlining) cudf::bit_is_set(). This function is minimal compared to the cudf::detail::aggregate_row function, which must be inlined twice (once per value of the flag) to accommodate this template parameter. Simply changing the parameter to a member variable means we still do not incur an extra call to cudf::bit_is_set() when appropriate, but we also generate half as much device code for this specific function. The cudf::detail::aggregate_row code is quite significant.
This change reduces the compile time for hash/groupby.cu from 16 minutes to 9 minutes. This moves it out of the top 5 (for now). It also reduces the size of libcudf_base.so by ~5MB.
There are no functional changes to any logic. The gbenchmark/GROUPBY_BENCH shows no change in performance.
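For readers unfamiliar with the pattern, here is a minimal host-only C++ sketch of the idea, not the actual libcudf code. Everything in it is a hypothetical stand-in: the `rows` struct, the simplified `bit_is_set` and `aggregate_row` helpers, and the functor names `aggs_templated` and `aggs_member` are assumptions made for illustration. The templated functor is instantiated once per flag value, so the body of `aggregate_row` is inlined into two separate `operator()` definitions; the member-variable version has a single `operator()` and relies on short-circuit evaluation to skip the bitmask lookup when the flag is false.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a validity-bitmask lookup such as cudf::bit_is_set().
inline bool bit_is_set(const std::vector<std::uint32_t>& mask, std::size_t i) {
  return ((mask[i / 32] >> (i % 32)) & 1u) != 0;
}

// Hypothetical stand-in for a large per-row routine such as
// cudf::detail::aggregate_row; here it only accumulates a sum per key.
inline void aggregate_row(std::vector<long>& sums, int key, int value) {
  sums[static_cast<std::size_t>(key)] += value;
}

// Toy input: one key and one value per row, plus a validity bitmask.
struct rows {
  std::vector<int> keys;
  std::vector<int> values;
  std::vector<std::uint32_t> valid_mask;  // bit i set => row i is valid
};

// Before: the flag is a template parameter, so operator() (and everything
// inlined into it) is generated once for <true> and once for <false>.
template <bool skip_rows_with_nulls>
struct aggs_templated {
  const rows& in;
  std::vector<long>& sums;
  void operator()(std::size_t i) const {
    if (!skip_rows_with_nulls || bit_is_set(in.valid_mask, i)) {
      aggregate_row(sums, in.keys[i], in.values[i]);
    }
  }
};

// After: the flag is a runtime member, so there is only one operator().
// Short-circuit evaluation still skips bit_is_set() when the flag is false.
struct aggs_member {
  const rows& in;
  std::vector<long>& sums;
  bool skip_rows_with_nulls;
  void operator()(std::size_t i) const {
    if (!skip_rows_with_nulls || bit_is_set(in.valid_mask, i)) {
      aggregate_row(sums, in.keys[i], in.values[i]);
    }
  }
};

// Driver for the templated version: the caller must dispatch on the flag,
// which is what forces both instantiations to exist.
std::vector<long> sum_templated(const rows& in, std::size_t num_keys, bool skip) {
  std::vector<long> sums(num_keys, 0);
  for (std::size_t i = 0; i < in.keys.size(); ++i) {
    if (skip) {
      aggs_templated<true>{in, sums}(i);
    } else {
      aggs_templated<false>{in, sums}(i);
    }
  }
  return sums;
}

// Driver for the member-variable version: no dispatch, one code path.
std::vector<long> sum_member(const rows& in, std::size_t num_keys, bool skip) {
  std::vector<long> sums(num_keys, 0);
  aggs_member f{in, sums, skip};
  for (std::size_t i = 0; i < in.keys.size(); ++i) { f(i); }
  return sums;
}
```

Both drivers should produce identical results for either flag value, which mirrors the statement that the change is not functional. The trade-off is one extra runtime branch per row in exchange for a single copy of the expensive inlined code.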