Upgrade thrust version to 1.15 #9912
Conversation
Compile times between 1.12 and 1.15 are comparable. The libcudf library size with 1.15 is < 1 MB smaller compared to 1.12.

Runtime performance results:
```
Benchmark                                                                                                              Time      CPU   Time Old   Time New   CPU Old   CPU New
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
COMPILED_BINARYOP/BM_compiled_binaryop<double, double, double, cudf::binary_operator::MOD>/100000/manual_time        +0.0519  +0.0268  10     10     27     28
COMPILED_BINARYOP/BM_compiled_binaryop<int32_t, int64_t, double, cudf::binary_operator::PMOD>/10000/manual_time      +0.0851  +0.0295  9      10     28     29
COMPILED_BINARYOP/BM_compiled_binaryop<int, int, int, cudf::binary_operator::SHIFT_LEFT>/10000/manual_time           -0.0971  -0.0218  6      5      25     24
COMPILED_BINARYOP/BM_compiled_binaryop<double, int8_t, bool, cudf::binary_operator::LOGICAL_AND>/10000/manual_time   -0.0717  -0.0196  7      6      26     25
COMPILED_BINARYOP/BM_compiled_binaryop<decimal32, decimal32, decimal32, cudf::binary_operator::NULL_MAX>/10000/manual_time  +0.0748  +0.0191  7      8      26     27
ReductionDictionary/float_max/10000/manual_time                                                                       -0.0714  -0.0512  31498  29249  50665  48071
ReductionScan/int16_nulls/100000/manual_time                                                                          -0.1073  -0.0562  17598  15710  35995  33974
Concatenate/BM_concatenate<int64_t, true>/4096/64/manual_time                                                         -0.1211  -0.0844  0      0      0      0
OrcWrite/floats_file_output/31/0/32/1/0/manual_time                                                                   +0.0227  +0.0576  84     86     78     83
CopyIfElse/int16/4096/manual_time                                                                                     +0.0648  +0.0262  0      0      0      0
CopyIfElse/float64/262144/manual_time                                                                                 +0.0609  +0.0192  0      0      0      0
CopyIfElse/int16_no_nulls/4096/manual_time                                                                            +0.1355  +0.0219  0      0      0      0
CopyIfElse/int16_no_nulls/32768/manual_time                                                                           +0.1582  +0.0395  0      0      0      0
CopyIfElse/uint32_no_nulls/4096/manual_time                                                                           +0.1300  +0.0213  0      0      0      0
CopyIfElse/uint32_no_nulls/32768/manual_time                                                                          +0.0500  +0.0155  0      0      0      0
CopyIfElse/float64_no_nulls/4096/manual_time                                                                          +0.0583  +0.0083  0      0      0      0
TypeDispatcher/fp64_bandwidth_host/1/1024/1/manual_time                                                               +0.1010  +0.0236  3291   3623   22245  22771
TypeDispatcher/fp64_bandwidth_host/2/1024/1/manual_time                                                               +0.1157  +0.0309  4888   5453   23798  24533
TypeDispatcher/fp64_bandwidth_host/4/1024/1/manual_time                                                               +0.0901  +0.0324  8053   8779   26780  27647
TypeDispatcher/fp64_bandwidth_host/1/2048/1/manual_time                                                               +0.1191  +0.0246  3335   3732   22378  22928
TypeDispatcher/fp64_bandwidth_host/2/2048/1/manual_time                                                               +0.1349  +0.0321  4890   5550   23839  24604
TypeDispatcher/fp64_bandwidth_host/4/2048/1/manual_time                                                               +0.0622  +0.0177  8820   9369   27717  28207
TypeDispatcher/fp64_bandwidth_host/1/4096/1/manual_time                                                               +0.0951  +0.0229  3387   3709   22375  22888
TypeDispatcher/fp64_bandwidth_host/2/4096/1/manual_time                                                               +0.0507  +0.0125  5509   5788   24491  24796
TypeDispatcher/fp64_bandwidth_device/4/1024/1/manual_time                                                             +0.0601  +0.0266  10775  11422  29432  30215
TypeDispatcher/fp64_bandwidth_device/1/2048/1/manual_time                                                             +0.0618  +0.0290  9258   9831   27967  28778
TypeDispatcher/fp64_bandwidth_device/2/2048/1/manual_time                                                             +0.0516  +0.0218  9773   10277  28453  29073
TypeDispatcher/fp64_bandwidth_device/1/4096/1/manual_time                                                             +0.0653  +0.0259  9265   9869   27956  28681
TypeDispatcher/fp64_bandwidth_no/1/1024/1/manual_time                                                                 +0.0868  +0.0275  3441   3739   22367  22983
TypeDispatcher/fp64_bandwidth_no/2/1024/1/manual_time                                                                 +0.1635  +0.0396  3690   4293   22609  23503
TypeDispatcher/fp64_bandwidth_no/4/1024/1/manual_time                                                                 +0.1144  +0.0316  4523   5040   23445  24186
TypeDispatcher/fp64_bandwidth_no/1/2048/1/manual_time                                                                 +0.1469  +0.0413  3497   4011   22382  23306
TypeDispatcher/fp64_bandwidth_no/2/2048/1/manual_time                                                                 +0.0550  +0.0216  4097   4323   23012  23509
TypeDispatcher/fp64_bandwidth_no/2/4096/1/manual_time                                                                 +0.0510  +0.0151  4418   4643   23447  23800
TypeDispatcher/fp64_bandwidth_no/8/4096/1/manual_time                                                                 +0.1093  +0.0235  6787   7528   26000  26612
TypeDispatcher/fp64_bandwidth_no/4/8192/1/manual_time                                                                 +0.0608  +0.0071  5435   5766   24657  24832
SetNullmask/SetNullMaskKernel/1048576/manual_time                                                                     -0.0949  -0.0022  3800   3440   22750  22699
StringExtract/four/32768/32/manual_time                                                                               +0.0578  +0.0570  1      1      1      1
StringExtract/eight/4096/32/manual_time                                                                               -0.0973  -0.0969  5      4      5      4
StringExtract/eight/4096/128/manual_time                                                                              -0.0882  -0.0879  5      5      5      5
StringExtract/eight/4096/512/manual_time                                                                              -0.0780  -0.0777  5      5      5      5
StringExtract/eight/4096/2048/manual_time                                                                             -0.0703  -0.0701  5      5      6      5
StringExtract/eight/4096/8192/manual_time                                                                             -0.0718  -0.0716  7      6      7      6
UrlDecode/BM_url_decode<10>/10000000/100/manual_time                                                                  +0.0576  +0.0577  37     39     37     39
UrlDecode/BM_url_decode<50>/100000000/10/manual_time                                                                  +0.0599  +0.0599  140    148    140    148
Shift/shift_half_nullable_out/1048576/manual_time                                                                     -0.0813  -0.0745  0      0      0      0
Sort<false>/unstable_no_nulls/1024/8/manual_time                                                                      -0.2900  -0.2832  1      1      1      1
Sort<false>/unstable_no_nulls/4096/8/manual_time                                                                      -0.3819  -0.3739  1      1      1      1
Sort<false>/unstable_no_nulls/32768/8/manual_time                                                                     -0.2767  -0.2714  1      1      1      1
Sort<false>/unstable_no_nulls/262144/8/manual_time                                                                    -0.1866  -0.1840  1      1      1      1
Sort<true>/stable_no_nulls/1024/8/manual_time                                                                         -0.2920  -0.2852  1      1      1      1
Sort<true>/stable_no_nulls/4096/8/manual_time                                                                         -0.3818  -0.3737  1      1      1      1
Sort<true>/stable_no_nulls/32768/8/manual_time                                                                        -0.2777  -0.2722  1      1      1      1
Sort<true>/stable_no_nulls/262144/8/manual_time                                                                       -0.1886  -0.1859  1      1      1      1
Sort<false>/unstable/1024/8/manual_time                                                                               -0.2835  -0.2778  1      1      1      1
Sort<false>/unstable/4096/8/manual_time                                                                               -0.3425  -0.3365  1      1      1      1
Sort<false>/unstable/32768/8/manual_time                                                                              -0.2179  -0.2149  1      1      1      1
Sort<false>/unstable/262144/8/manual_time                                                                             -0.0930  -0.0922  2      2      2      2
Sort<true>/stable/1024/8/manual_time                                                                                  -0.2874  -0.2817  1      1      1      1
Sort<true>/stable/4096/8/manual_time                                                                                  -0.3490  -0.3429  1      1      1      1
Sort<true>/stable/32768/8/manual_time                                                                                 -0.2232  -0.2202  1      1      1      1
Sort<true>/stable/262144/8/manual_time                                                                                -0.0964  -0.0955  2      2      2      2
Search/Table/4/1000000/manual_time                                                                                    +0.0796  +0.0560  4      4      4      4
Gather/double_coalesce_x/8192/1/manual_time                                                                           +0.0751  +0.0182  8503   9141   27316  27812
Gather/double_coalesce_x/16384/1/manual_time                                                                          +0.0879  +0.0246  8641   9400   27247  27916
Gather/double_coalesce_x/8192/2/manual_time                                                                           +0.0635  +0.0286  16209  17238  34700  35692
Gather/double_coalesce_x/1024/4/manual_time                                                                           +0.0517  +0.0308  30234  31797  49033  50543
Gather/double_coalesce_x/4096/4/manual_time                                                                           +0.0568  +0.0327  30768  32517  49409  51026
Gather/double_coalesce_x/2048/8/manual_time                                                                           +0.0533  +0.0376  59436  62601  78176  81117
Gather/double_coalesce_o/16384/1/manual_time                                                                          +0.0525  +0.0138  8921   9389   27447  27824
ReplaceClamp/float_no_nulls/10000/manual_time                                                                         +0.0502  +0.0309  41663  43753  60740  62616
AST/BM_ast_transform<int32_t, TreeType::IMBALANCED_LEFT, true, true>/100000000/1/manual_time                          +0.0581  +0.0579  8      9      8      9
```
Here are the significant changes to performance that the benchmark found:
I am going to re-run the benchmarks.
Codecov Report
```
@@              Coverage Diff               @@
##           branch-22.02    #9912    +/-   ##
================================================
- Coverage         10.49%   10.40%   -0.09%
================================================
  Files               119      119
  Lines             20305    20503     +198
================================================
+ Hits               2130     2134       +4
- Misses            18175    18369     +194
```
Continue to review full report at Codecov.
After re-running the benchmarks:
I agree that your repeated benchmarks look fine. Two questions: 1) Are the negative standard deviations just typos? 2) Are any of the speedups you originally reported meaningful, or are those also just statistical noise?
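For context on how repeated runs like these can be gathered and summarized, here is a minimal standalone sketch using Google Benchmark's repetition support. This is an assumption on my part: the `manual_time` suffixes above suggest these suites use Google Benchmark, and the sort payload below is purely illustrative, not a cudf benchmark.

```cpp
// Illustrative standalone benchmark (not from cudf): sort a vector of random
// integers. Copying inside the loop ensures each iteration sorts unsorted input.
#include <benchmark/benchmark.h>

#include <algorithm>
#include <random>
#include <vector>

static void BM_sort_random_ints(benchmark::State& state)
{
  std::vector<int> data(state.range(0));
  std::mt19937 gen{42};
  std::uniform_int_distribution<int> dist;
  for (auto& v : data) { v = dist(gen); }

  for (auto _ : state) {
    auto copy = data;  // fresh unsorted input every iteration
    std::sort(copy.begin(), copy.end());
    benchmark::DoNotOptimize(copy.data());
  }
}
BENCHMARK(BM_sort_random_ints)->Arg(1 << 20);

BENCHMARK_MAIN();
```

Running the binary with `--benchmark_repetitions=10 --benchmark_report_aggregates_only=true` reports mean, median, and stddev across the ten runs, which is one way to judge whether an old-vs-new delta is larger than run-to-run noise.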
After running the sort benchmark for ten iterations we see consistent improvements in the following:
@gpucibot merge
PR #10489 updated from Thrust 1.15 to Thrust 1.16. However, this appears to be causing conflicts with other repositories -- [cuSpatial](rapidsai/cuspatial#511 (comment)) and cuGraph have reported issues where their builds are finding Thrust 1.16 from libcudf instead of Thrust 1.15, which is [currently pinned by rapids-cmake](https://github.com/rapidsai/rapids-cmake/blob/06a657281cdd83781e49afcdbb39abc491eeab17/rapids-cmake/cpm/versions.json#L26). This PR is intended to unblock local builds and CI builds for other RAPIDS packages until we are able to identify the root cause (which may be due to CMake include path orderings in rapids-cmake).

Last time Thrust was updated, [rapids-cmake was updated](rapidsai/rapids-cmake#138) one day before [libcudf was updated](#9912). That may explain why we didn't notice this problem with the 1.15 update.

The plan I currently have in mind is:

1. Merge this PR to roll back libcudf to Thrust 1.15 (and revert the patch for Thrust 1.16 [10577](#10577)). This will hopefully unblock CI for cuGraph and cuSpatial.
2. Try to work out whatever issues with CMake / include paths may exist.
3. Prepare all rapids-cmake repos for Thrust 1.16 compatibility. I've [done this for RMM already](rapidsai/rmm#1011), and I am working on [PR 4675](rapidsai/cuml#4675) for cuML now. I am planning to make the same fixes for `#include`s in cuCollections, raft, cuSpatial, and cuGraph so they will be compatible with Thrust 1.16 (see the sketch at the end of this comment).
4. Try to upgrade libcudf to Thrust 1.16 again (and re-apply the updated patch). If (2) has been resolved, I hope we won't see any issues in other RAPIDS libraries.
5. Upgrade rapids-cmake to Thrust 1.16.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Mark Harris (https://github.com/harrism)

URL: #10586
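As a rough illustration of the `#include` fixes mentioned in step 3 (this is my reading rather than something spelled out above: Thrust 1.16 pulls in fewer headers transitively, so code that previously picked up algorithms or functors indirectly needs to include them explicitly), a minimal hypothetical sketch:

```cpp
// Hypothetical example of the kind of include fix assumed above: the algorithm
// and functor headers are included explicitly instead of being relied on
// transitively via other Thrust headers. Builds with nvcc.
#include <thrust/device_vector.h>
#include <thrust/functional.h>  // thrust::plus, included explicitly
#include <thrust/transform.h>   // thrust::transform, included explicitly

int main()
{
  thrust::device_vector<int> a(10, 1);
  thrust::device_vector<int> b(10, 2);
  thrust::device_vector<int> c(10);

  // c[i] = a[i] + b[i]
  thrust::transform(a.begin(), a.end(), b.begin(), c.begin(), thrust::plus<int>());
  return 0;
}
```

The behavior is unchanged; only the include list becomes explicit, so the code keeps compiling regardless of what other Thrust headers happen to pull in.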