[FEA] Replace use of `rmm::device_vector` with `rmm::device_uvector` where possible #7287
Comments
Volunteered to replace the |
Volunteering to replace the |
This PR refactors strings column factories to eliminate the use of `device_vector` and `std::vector` parameters, and to facilitate more use of `device_uvector` in calls to the factories. This is a small part of #7287. Multiple versions of `make_strings_columns` take `device_vector` parameters. This PR expands the use of iterator and `device_span` versions to enable switching to `device_uvector` as described in #7287. It also adds new `make_device_uvector_async/sync` utility functions. This will help facilitate safe CUDA stream usage. Authors: - Mark Harris (@harrism) Approvers: - AJ Schmidt (@ajschmidt8) - Jake Hemstad (@jrhemstad) - David (@davidwendt) - Christopher Harris (@cwharris) URL: #7397
Reference #7287 This PR changes `cpp/src/text/*` source files to use `rmm::device_uvector` instead of `rmm::device_vector`. This allows keeping the memory operations on the provided kernel stream. Authors: - David (@davidwendt) Approvers: - Paul Taylor (@trxcllnt) - Mark Harris (@harrism) URL: #7512
This started as a change converting some `device_vector` usages in `cpp/src/strings` source files to use `device_uvector` instead. The `cpp/src/strings/copying/copying.cu` source has the implementation for `cudf::strings::detail::copy_slice()` and used a `device_vector` to handle a `step` parameter. I can no longer find this parameter being used. I believe it was a holdover from porting nvstrings. So this PR mainly includes changes for removing this unneeded parameter, which also removes the need for the `device_vector` or temporary memory in this function. It also includes changes to `attributes.cu` to use `device_uvector`. ~~I'm marking this as non-breaking change since it is a change to a `detail` API and did not seem to be used anywhere in this repo.~~ Reference #7287 Authors: - David (@davidwendt) Approvers: - AJ Schmidt (@ajschmidt8) - Mike Wilson (@hyperbolic2346) - Jake Hemstad (@jrhemstad) URL: #7525
Working on replacing `device_vector` and `std::vector` with `device_uvector` and spans in concatenate.
…/writer and cudf::io::column_buffer (#7614) Issue #7287 Replaces `device_vector` with `device_uvector` and `device_span`. Because `device_uvector` does not have a default constructor, some additional changes were required for `device_uvector` data members. Performance impact: this change makes a measurable difference in reader benchmarks. Most string column cases are sped up around **5%**, with other cases having a measurable, but less consistent improvement. Authors: - Vukasin Milovanovic (@vuule) Approvers: - Ram (Ramakrishna Prabhu) (@rgsl888prabhu) - Kumar Aatish (@kaatish) URL: #7614
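The remark that `device_uvector` has no default constructor has a practical consequence for classes that hold one as a data member. The sketch below uses hypothetical stand-in types (`stream_handle`, `stream_ordered_buffer`, `reader_state`), not the real rmm or cudf classes, and assumes only that the member needs a size and a stream at construction. It illustrates why the owning class then has to receive a stream in its own constructor:

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a CUDA stream handle (assumption, not an rmm type).
struct stream_handle {
  int id;
};

// Hypothetical stand-in for a stream-ordered container. Size and stream are
// both required, so the type has no default constructor.
class stream_ordered_buffer {
 public:
  stream_ordered_buffer(std::size_t size, stream_handle stream) : size_{size}, stream_{stream} {}
  std::size_t size() const noexcept { return size_; }

 private:
  std::size_t size_;
  stream_handle stream_;
};

// A class with such a member must initialize it in its constructor's
// initializer list, which means the stream has to be passed in.
class reader_state {
 public:
  explicit reader_state(stream_handle stream) : offsets_{0, stream} {}
  std::size_t num_offsets() const noexcept { return offsets_.size(); }

 private:
  stream_ordered_buffer offsets_;
};

static_assert(!std::is_default_constructible<stream_ordered_buffer>::value,
              "the buffer needs a size and a stream");
static_assert(!std::is_default_constructible<reader_state>::value,
              "the owner inherits the requirement");
```

A previously default-constructed member therefore turns into an explicit constructor argument somewhere up the call chain, which is one plausible reading of the "additional changes" mentioned above.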
Contributes to #7287 This PR replaces `std::vector` with `host_span` in public and detail `cudf::concatenate` functions, and replaces `rmm::device_vector` with `rmm::device_uvector` in the concatenate implementations. It also strengthens the SFINAE restrictions on `cudf::host_span` and `cudf::device_span` so that they cannot be constructed from containers unless the container's value_type is the same as the span's value_type. This PR also - [x] Updates cython. - [x] benchmarks before and after Authors: - Mark Harris (@harrism) Approvers: - Jake Hemstad (@jrhemstad) - Vukasin Milovanovic (@vuule) - Ashwin Srinath (@shwina) URL: #7621
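The SFINAE restriction described above can be illustrated with a small stdlib-only sketch. `simple_span` is a hypothetical class written for this example and is not the `cudf::host_span` implementation. It only shows the general technique of disabling a converting constructor unless the container's `value_type` matches the span's element type:

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical span type for illustration only; not cudf::host_span.
template <typename T>
class simple_span {
 public:
  simple_span(T* data, std::size_t size) noexcept : data_{data}, size_{size} {}

  // Mutable container: enabled only when its value_type is exactly T.
  template <typename Container,
            typename std::enable_if<
              !std::is_const<Container>::value &&
                std::is_same<typename Container::value_type, T>::value,
              int>::type = 0>
  simple_span(Container& c) : data_{c.data()}, size_{c.size()}
  {
  }

  // Const container: enabled only for spans of const T whose element type
  // matches the container's value_type once the const is removed.
  template <typename Container,
            typename std::enable_if<
              std::is_const<T>::value &&
                std::is_same<typename Container::value_type,
                             typename std::remove_const<T>::type>::value,
              int>::type = 0>
  simple_span(Container const& c) : data_{c.data()}, size_{c.size()}
  {
  }

  T* data() const noexcept { return data_; }
  std::size_t size() const noexcept { return size_; }

 private:
  T* data_;
  std::size_t size_;
};

// Matching element types are accepted.
static_assert(std::is_constructible<simple_span<int>, std::vector<int>&>::value, "");
static_assert(std::is_constructible<simple_span<int const>, std::vector<int> const&>::value, "");
// Mismatched element types, or dropping const, are rejected at overload resolution.
static_assert(!std::is_constructible<simple_span<int>, std::vector<long>&>::value, "");
static_assert(!std::is_constructible<simple_span<int>, std::vector<int> const&>::value, "");
```

Because the check happens in the constructor's signature, a mismatch is reported as "no matching constructor" at the call site and never compiles into a span that reinterprets the container's storage.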
addresses part of #7287
- replaced `std::vector const&` with `host_span<const>`
- replaced `rmm::device_[u]vector const&` with `device_span<const>`
- replaced `rmm::device_vector` with `rmm::device_uvector`

Authors: - Karthikeyan (@karthikeyann) Approvers: - Jake Hemstad (@jrhemstad) - Nghia Truong (@ttnghia) - Mark Harris (@harrism) URL: #7698
Added a checklist. I'm working on gather.
@vuule did you check off ORC? Can you link the PR that completes it?
Removes `device_vector` in favour of either `device_uvector` or `device_buffer` as appropriate in parquet reader and writer. Contributes to #7287 Depends on #7758 Authors: - Devavret Makkar (https://github.com/devavret) - Mark Harris (https://github.com/harrism) Approvers: - Mark Harris (https://github.com/harrism) - Vukasin Milovanovic (https://github.com/vuule) - MithunR (https://github.com/mythrocks) - Mike Wilson (https://github.com/hyperbolic2346) URL: #7853
…/writer (#7805) Issue #7287 Replaces `device_vector` with `device_uvector` and `device_span`. Removed the `device_vector` data members. Performance impact: - Writer: None - Reader: ~up to 10% slower, will look into this.~ None Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Devavret Makkar (https://github.com/devavret) - Kumar Aatish (https://github.com/kaatish) - Mark Harris (https://github.com/harrism) URL: #7805
Converts `cudf::rank` to use `device_uvector` instead of `device_vector`. Also adds a benchmark for `cudf::rank`. Contributes to #7287

Performance is at most 1.7% slower and up to 12% faster.

```
Comparing /home/mharris/rapids/cudf/cpp/build/rank_before.json to /home/mharris/rapids/cudf/cpp/build/rank_after.json
Benchmark                                   Time       CPU   Time Old   Time New    CPU Old    CPU New
------------------------------------------------------------------------------------------------------
Rank/no_nulls/1024/manual_time           +0.0170   +0.0140          0          0          0          0
Rank/no_nulls/4096/manual_time           +0.0092   +0.0076          0          0          0          0
Rank/no_nulls/32768/manual_time          +0.0034   +0.0042          0          0          0          0
Rank/no_nulls/262144/manual_time         -0.0542   -0.0574          0          0          0          0
Rank/no_nulls/2097152/manual_time        -0.0493   -0.0509          1          1          1          1
Rank/no_nulls/16777216/manual_time       -0.1217   -0.1289          7          6          7          6
Rank/no_nulls/67108864/manual_time       +0.0007   +0.0005         24         24         24         24
Rank/nulls/1024/manual_time              -0.0392   -0.0370          0          0          0          0
Rank/nulls/4096/manual_time              -0.0173   -0.0160          0          0          0          0
Rank/nulls/32768/manual_time             +0.0021   +0.0027          0          0          0          0
Rank/nulls/262144/manual_time            -0.0027   -0.0025          0          0          0          0
Rank/nulls/2097152/manual_time           +0.0002   -0.0003          7          7          7          7
Rank/nulls/16777216/manual_time          -0.0409   -0.0334         59         56         58         56
Rank/nulls/67108864/manual_time          -0.0003   -0.0004        232        232        232        232
```

Authors:
- Mark Harris (https://github.com/harrism)

Approvers:
- Karthikeyan (https://github.com/karthikeyann)
- https://github.com/nvdbaranec
- Keith Kraus (https://github.com/kkraus14)

URL: #8029
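For readers unfamiliar with these comparison tables, the following sketch shows how the `Time` and `CPU` columns can be read. It assumes they are relative changes computed as (new - old) / old, which is how Google Benchmark's `compare.py` tool is understood to report them. Under that reading, `-0.1217` means about 12% less time and `+0.0170` about 1.7% more, which agrees with the summary in the PR description. The function name `relative_change` and the timings in the checks are hypothetical, chosen only for illustration:

```cpp
// Relative change between two timings, assuming the (new - old) / old
// convention. Negative values mean the new run is faster.
constexpr double relative_change(double old_time, double new_time)
{
  return (new_time - old_time) / old_time;
}

// Hypothetical timings, not taken from the tables in this thread.
static_assert(relative_change(8.0, 7.0) == -0.125, "12.5% faster");
static_assert(relative_change(4.0, 5.0) == 0.25, "25% slower");
static_assert(relative_change(10.0, 10.0) == 0.0, "unchanged");
```

The `Old` and `New` columns in the tables are rounded to whole time units, so the relative changes generally cannot be recomputed from them exactly. The relative columns are the ones to read.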
…8043) Converts `cudf::merge` to use `device_uvector`. Contributes to #7287

Performance is improved:

```
Comparing /home/mharris/rapids/cudf/cpp/build/merge_before.json to /home/mharris/rapids/cudf/cpp/build/merge_after.json
Benchmark                                 Time       CPU   Time Old   Time New    CPU Old    CPU New
----------------------------------------------------------------------------------------------------
Merge/pow2tables/2/manual_time         -0.0831   -0.0791          0          0          0          0
Merge/pow2tables/4/manual_time         -0.0924   -0.0930          1          1          1          1
Merge/pow2tables/8/manual_time         -0.0813   -0.0808          4          3          4          3
Merge/pow2tables/16/manual_time        -0.1172   -0.1170          9          8          9          8
Merge/pow2tables/32/manual_time        -0.0790   -0.0790         19         18         20         18
Merge/pow2tables/64/manual_time        -0.0757   -0.0757         46         43         46         43
Merge/pow2tables/128/manual_time       -0.0730   -0.0730        111        103        111        103
```

Authors:
- Mark Harris (https://github.com/harrism)

Approvers:
- Karthikeyan (https://github.com/karthikeyann)
- Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)

URL: #8043
Converts internals of cudf::quantiles to use device_uvector instead of device_vector. Also adds a quantiles benchmark. Contributes to #7287 Performance is better (sometimes significantly, when the number of quantiles requested is larger). ``` Comparing /home/mharris/rapids/cudf/cpp/build/quantiles_before.json to /home/mharris/rapids/cudf/cpp/build/quantiles_after.json Benchmark Time CPU Time Old Time New CPU Old CPU New ------------------------------------------------------------------------------------------------------------------------------------------- Quantiles/no_nulls/65536/1/1/manual_time -0.0234 -0.0185 0 0 0 0 Quantiles/no_nulls/262144/1/1/manual_time -0.0317 -0.0282 0 0 0 0 Quantiles/no_nulls/1048576/1/1/manual_time -0.0234 -0.0209 0 0 0 0 Quantiles/no_nulls/4194304/1/1/manual_time -0.0080 -0.0080 1 1 1 1 Quantiles/no_nulls/16777216/1/1/manual_time +0.0079 +0.0083 3 3 3 3 Quantiles/no_nulls/67108864/1/1/manual_time +0.0023 +0.0025 10 10 10 10 Quantiles/no_nulls/65536/4/1/manual_time -0.1138 -0.1115 1 1 1 1 Quantiles/no_nulls/262144/4/1/manual_time -0.0933 -0.0917 1 1 1 1 Quantiles/no_nulls/1048576/4/1/manual_time -0.0067 -0.0083 8 8 8 8 Quantiles/no_nulls/4194304/4/1/manual_time +0.0045 +0.0061 54 54 54 54 Quantiles/no_nulls/16777216/4/1/manual_time +0.0044 +0.0044 292 294 292 294 Quantiles/no_nulls/67108864/4/1/manual_time +0.0011 +0.0010 1506 1508 1506 1507 Quantiles/no_nulls/65536/8/1/manual_time -0.1127 -0.1099 1 1 1 1 Quantiles/no_nulls/262144/8/1/manual_time -0.0843 -0.0829 1 1 1 1 Quantiles/no_nulls/1048576/8/1/manual_time -0.0083 -0.0084 9 9 9 9 Quantiles/no_nulls/4194304/8/1/manual_time -0.0015 +0.0012 54 54 54 54 Quantiles/no_nulls/16777216/8/1/manual_time -0.0023 -0.0022 293 293 293 293 Quantiles/no_nulls/67108864/8/1/manual_time +0.0061 +0.0060 1515 1525 1515 1525 Quantiles/no_nulls/65536/1/4/manual_time -0.0231 -0.0183 0 0 0 0 Quantiles/no_nulls/262144/1/4/manual_time -0.0231 -0.0196 0 0 0 0 Quantiles/no_nulls/1048576/1/4/manual_time 
-0.0127 -0.0105 0 0 0 0 Quantiles/no_nulls/4194304/1/4/manual_time -0.0033 -0.0038 1 1 1 1 Quantiles/no_nulls/16777216/1/4/manual_time +0.0026 +0.0027 3 3 3 3 Quantiles/no_nulls/67108864/1/4/manual_time -0.0016 -0.0016 10 10 10 10 Quantiles/no_nulls/65536/4/4/manual_time -0.1157 -0.1127 1 1 1 1 Quantiles/no_nulls/262144/4/4/manual_time -0.0958 -0.0937 1 1 1 1 Quantiles/no_nulls/1048576/4/4/manual_time -0.0188 -0.0140 9 8 9 8 Quantiles/no_nulls/4194304/4/4/manual_time -0.0156 -0.0143 55 54 55 54 Quantiles/no_nulls/16777216/4/4/manual_time -0.0040 -0.0013 293 292 293 292 Quantiles/no_nulls/67108864/4/4/manual_time -0.0452 -0.0452 1583 1511 1583 1511 Quantiles/no_nulls/65536/8/4/manual_time -0.1081 -0.1061 1 1 1 1 Quantiles/no_nulls/262144/8/4/manual_time -0.0810 -0.0797 1 1 1 1 Quantiles/no_nulls/1048576/8/4/manual_time -0.0119 -0.0087 9 9 9 9 Quantiles/no_nulls/4194304/8/4/manual_time -0.0097 -0.0053 55 54 54 54 Quantiles/no_nulls/16777216/8/4/manual_time -0.0013 -0.0013 293 293 293 293 Quantiles/no_nulls/67108864/8/4/manual_time -0.0462 -0.0461 1595 1521 1595 1521 Quantiles/no_nulls/65536/1/12/manual_time -0.5136 -0.5258 0 0 0 0 Quantiles/no_nulls/262144/1/12/manual_time -0.8295 -0.8370 1 0 1 0 Quantiles/no_nulls/1048576/1/12/manual_time -0.6905 -0.7111 1 0 1 0 Quantiles/no_nulls/4194304/1/12/manual_time -0.5660 -0.6006 2 1 2 1 Quantiles/no_nulls/16777216/1/12/manual_time -0.0123 -0.0140 3 3 3 3 Quantiles/no_nulls/67108864/1/12/manual_time -0.2079 -0.2183 13 10 13 10 Quantiles/no_nulls/65536/4/12/manual_time -0.1085 -0.1061 1 1 1 1 Quantiles/no_nulls/262144/4/12/manual_time -0.0799 -0.0922 1 1 1 1 Quantiles/no_nulls/1048576/4/12/manual_time -0.0145 -0.0105 9 8 8 8 Quantiles/no_nulls/4194304/4/12/manual_time -0.0120 -0.0119 54 54 54 54 Quantiles/no_nulls/16777216/4/12/manual_time +0.0019 +0.0020 292 293 292 293 Quantiles/no_nulls/67108864/4/12/manual_time +0.0042 +0.0043 1506 1513 1506 1513 Quantiles/no_nulls/65536/8/12/manual_time -0.1718 -0.1722 1 1 1 1 
Quantiles/no_nulls/262144/8/12/manual_time -0.2191 -0.2248 2 1 2 1 Quantiles/no_nulls/1048576/8/12/manual_time -0.0065 -0.0072 9 9 9 9 Quantiles/no_nulls/4194304/8/12/manual_time -0.1978 -0.1996 67 54 67 54 Quantiles/no_nulls/16777216/8/12/manual_time -0.0019 -0.0019 293 293 293 293 Quantiles/no_nulls/67108864/8/12/manual_time -0.1957 -0.1956 1892 1522 1891 1521 Quantiles/nulls/65536/1/1/manual_time -0.4734 -0.4950 1 0 1 0 Quantiles/nulls/262144/1/1/manual_time -0.2358 -0.2536 1 0 1 0 Quantiles/nulls/1048576/1/1/manual_time -0.0002 -0.0002 2 2 2 2 Quantiles/nulls/4194304/1/1/manual_time -0.2323 -0.2392 17 13 17 13 Quantiles/nulls/16777216/1/1/manual_time -0.0085 -0.0084 54 53 54 53 Quantiles/nulls/67108864/1/1/manual_time +0.0001 +0.0001 217 217 217 217 Quantiles/nulls/65536/4/1/manual_time -0.2901 -0.2992 2 1 2 1 Quantiles/nulls/262144/4/1/manual_time -0.4554 -0.4771 3 2 3 2 Quantiles/nulls/1048576/4/1/manual_time -0.1957 -0.2122 12 10 12 10 Quantiles/nulls/4194304/4/1/manual_time -0.1326 -0.1367 73 63 73 63 Quantiles/nulls/16777216/4/1/manual_time -0.1951 -0.1953 447 360 447 359 Quantiles/nulls/67108864/4/1/manual_time -0.0034 -0.0034 2015 2009 2015 2008 Quantiles/nulls/65536/8/1/manual_time -0.1251 -0.1259 2 2 2 2 Quantiles/nulls/262144/8/1/manual_time -0.0505 -0.0499 2 2 2 2 Quantiles/nulls/1048576/8/1/manual_time -0.3018 -0.3064 15 11 15 11 Quantiles/nulls/4194304/8/1/manual_time -0.0492 -0.0470 67 64 67 64 Quantiles/nulls/16777216/8/1/manual_time +0.0008 +0.0008 359 359 359 359 Quantiles/nulls/67108864/8/1/manual_time +0.0315 +0.0315 2004 2067 2004 2067 Quantiles/nulls/65536/1/4/manual_time -0.0476 -0.0499 0 0 0 0 Quantiles/nulls/262144/1/4/manual_time -0.0050 -0.0042 0 0 0 0 Quantiles/nulls/1048576/1/4/manual_time +0.0006 +0.0006 2 2 2 2 Quantiles/nulls/4194304/1/4/manual_time -0.0061 -0.0060 13 13 13 13 Quantiles/nulls/16777216/1/4/manual_time +0.0073 +0.0072 53 53 53 53 Quantiles/nulls/67108864/1/4/manual_time +0.0012 +0.0013 217 217 217 217 
Quantiles/nulls/65536/4/4/manual_time -0.1419 -0.1431 1 1 1 1 Quantiles/nulls/262144/4/4/manual_time -0.3466 -0.3736 3 2 3 2 Quantiles/nulls/1048576/4/4/manual_time -0.2328 -0.2406 13 10 13 10 Quantiles/nulls/4194304/4/4/manual_time -0.0345 -0.0365 65 63 65 63 Quantiles/nulls/16777216/4/4/manual_time +0.0016 +0.0016 359 359 359 359 Quantiles/nulls/67108864/4/4/manual_time -0.0309 -0.0309 2066 2002 2065 2002 Quantiles/nulls/65536/8/4/manual_time -0.0733 -0.0724 2 2 2 2 Quantiles/nulls/262144/8/4/manual_time -0.2988 -0.3072 3 2 3 2 Quantiles/nulls/1048576/8/4/manual_time -0.0028 -0.0017 11 11 11 11 Quantiles/nulls/4194304/8/4/manual_time -0.0454 -0.0457 67 64 67 64 Quantiles/nulls/16777216/8/4/manual_time -0.1597 -0.1599 427 359 427 359 Quantiles/nulls/67108864/8/4/manual_time +0.0006 +0.0006 2004 2005 2004 2005 Quantiles/nulls/65536/1/12/manual_time -0.0331 -0.0346 0 0 0 0 Quantiles/nulls/262144/1/12/manual_time -0.0084 -0.0077 0 0 0 0 Quantiles/nulls/1048576/1/12/manual_time +0.0019 +0.0019 2 2 2 2 Quantiles/nulls/4194304/1/12/manual_time -0.0053 -0.0052 13 13 13 13 Quantiles/nulls/16777216/1/12/manual_time +0.0050 +0.0051 53 53 53 53 Quantiles/nulls/67108864/1/12/manual_time +0.0007 +0.0008 217 217 217 217 Quantiles/nulls/65536/4/12/manual_time -0.0030 -0.0034 1 1 1 1 Quantiles/nulls/262144/4/12/manual_time +0.0099 -0.0138 2 2 2 2 Quantiles/nulls/1048576/4/12/manual_time +0.0127 +0.0084 10 10 10 10 Quantiles/nulls/4194304/4/12/manual_time +0.0011 -0.0013 63 63 63 63 Quantiles/nulls/16777216/4/12/manual_time +0.0013 +0.0013 359 359 359 359 Quantiles/nulls/67108864/4/12/manual_time +0.0001 +0.0000 2001 2001 2001 2001 Quantiles/nulls/65536/8/12/manual_time -0.0004 -0.0001 2 2 2 2 Quantiles/nulls/262144/8/12/manual_time +0.0038 +0.0040 2 2 2 2 Quantiles/nulls/1048576/8/12/manual_time +0.0070 +0.0084 11 11 11 11 Quantiles/nulls/4194304/8/12/manual_time -0.0082 -0.0100 64 64 64 63 Quantiles/nulls/16777216/8/12/manual_time +0.0038 +0.0038 359 360 359 360 
Quantiles/nulls/67108864/8/12/manual_time +0.0010 +0.0010 2003 2005 2003 2005 ``` Authors: - Mark Harris (https://github.com/harrism) Approvers: - Robert Maynard (https://github.com/robertmaynard) - Nghia Truong (https://github.com/ttnghia) URL: #8076
…8042) Converts `cudf::repeat` to use device_uvector as much as possible, and converts to using indexalator for computing offsets from repeat counts. Also adds a benchmark for `cudf::repeat`. Contributes to #7287 Performance is improved: ``` Comparing /home/mharris/rapids/cudf/cpp/build/repeat_before.json to /home/mharris/rapids/cudf/cpp/build/repeat_after.json Benchmark Time CPU Time Old Time New CPU Old CPU New -------------------------------------------------------------------------------------------------------------------------------------------- Repeat/double_nulls/1024/1/manual_time -0.1681 -0.1361 0 0 0 0 Repeat/double_nulls/4096/1/manual_time -0.1663 -0.1329 0 0 0 0 Repeat/double_nulls/32768/1/manual_time -0.1702 -0.1372 0 0 0 0 Repeat/double_nulls/262144/1/manual_time -0.1347 -0.1159 0 0 0 0 Repeat/double_nulls/2097152/1/manual_time -0.1053 -0.1010 0 0 0 0 Repeat/double_nulls/16777216/1/manual_time -0.0931 -0.0926 3 3 3 3 Repeat/double_nulls/67108864/1/manual_time -0.0869 -0.0868 12 11 12 11 Repeat/double_nulls/1024/8/manual_time -0.0539 -0.0462 0 0 0 0 Repeat/double_nulls/4096/8/manual_time -0.0951 -0.0839 0 0 0 0 Repeat/double_nulls/32768/8/manual_time -0.0905 -0.0821 0 0 0 0 Repeat/double_nulls/262144/8/manual_time -0.0499 -0.0453 0 0 0 0 Repeat/double_nulls/2097152/8/manual_time -0.0322 -0.0325 1 1 1 1 Repeat/double_nulls/16777216/8/manual_time -0.0267 -0.0272 10 9 10 9 Repeat/double_nulls/67108864/8/manual_time -0.0276 -0.0267 39 38 39 38 Repeat/double_no_nulls/1024/1/manual_time -0.2183 -0.1590 0 0 0 0 Repeat/double_no_nulls/4096/1/manual_time -0.2308 -0.1711 0 0 0 0 Repeat/double_no_nulls/32768/1/manual_time -0.2374 -0.1820 0 0 0 0 Repeat/double_no_nulls/262144/1/manual_time -0.1886 -0.1571 0 0 0 0 Repeat/double_no_nulls/2097152/1/manual_time -0.1266 -0.1200 0 0 0 0 Repeat/double_no_nulls/16777216/1/manual_time -0.1045 -0.1039 3 2 3 2 Repeat/double_no_nulls/67108864/1/manual_time -0.0945 -0.0944 11 10 11 10 Repeat/double_no_nulls/1024/8/manual_time 
-0.0595 -0.0490 0 0 0 0 Repeat/double_no_nulls/4096/8/manual_time -0.1201 -0.1028 0 0 0 0 Repeat/double_no_nulls/32768/8/manual_time -0.1085 -0.0968 0 0 0 0 Repeat/double_no_nulls/262144/8/manual_time -0.0670 -0.0635 0 0 0 0 Repeat/double_no_nulls/2097152/8/manual_time -0.0405 -0.0400 1 1 1 1 Repeat/double_no_nulls/16777216/8/manual_time -0.0352 -0.0354 8 8 8 8 Repeat/double_no_nulls/67108864/8/manual_time -0.0295 -0.0295 32 31 32 31 ``` Authors: - Mark Harris (https://github.com/harrism) Approvers: - Nghia Truong (https://github.com/ttnghia) - Robert Maynard (https://github.com/robertmaynard) - Christopher Harris (https://github.com/cwharris) URL: #8042
…e_uvector (#8074) Converts internals of cudf::partition/hash_partition, hashing, and nested_loop_join to use device_uvector instead of device_vector. Changes `cudf::hash` API to take a `host_span` instead of a `std::vector`, hence marked as a breaking change. Contributes to #7287 Performance is improved in hash_partition for smaller sizes, and is similar for larger sizes. ``` (rapids) rapids@compose:~/cudf/cpp/build/release$ _deps/benchmark-src/tools/compare.py benchmarks ~/cudf/cpp/build/hashing_before.json ~/cudf/cpp/build/hashing_after.json Comparing /home/mharris/rapids/cudf/cpp/build/hashing_before.json to /home/mharris/rapids/cudf/cpp/build/hashing_after.json Benchmark Time CPU Time Old Time New CPU Old CPU New -------------------------------------------------------------------------------------------------------------------------------------------------- Hashing/hash_partition/131072/1/64/manual_time -0.2817 -0.2422 0 0 0 0 Hashing/hash_partition/262144/1/64/manual_time -0.2561 -0.2208 0 0 0 0 Hashing/hash_partition/524288/1/64/manual_time -0.2269 -0.1992 0 0 0 0 Hashing/hash_partition/1048576/1/64/manual_time -0.1864 -0.1683 0 0 0 0 Hashing/hash_partition/2097152/1/64/manual_time -0.1394 -0.1323 0 0 0 0 Hashing/hash_partition/131072/1/128/manual_time -0.2732 -0.2340 0 0 0 0 Hashing/hash_partition/262144/1/128/manual_time -0.2561 -0.2218 0 0 0 0 Hashing/hash_partition/524288/1/128/manual_time -0.2223 -0.1946 0 0 0 0 Hashing/hash_partition/1048576/1/128/manual_time -0.1847 -0.1664 0 0 0 0 Hashing/hash_partition/2097152/1/128/manual_time -0.1345 -0.1260 0 0 0 0 Hashing/hash_partition/131072/1/256/manual_time -0.2626 -0.2280 0 0 0 0 Hashing/hash_partition/262144/1/256/manual_time -0.2545 -0.2206 0 0 0 0 Hashing/hash_partition/524288/1/256/manual_time -0.2178 -0.1918 0 0 0 0 Hashing/hash_partition/1048576/1/256/manual_time -0.1739 -0.1577 0 0 0 0 Hashing/hash_partition/2097152/1/256/manual_time -0.1389 -0.1316 0 0 0 0 
Hashing/hash_partition/131072/1/512/manual_time -0.2492 -0.2171 0 0 0 0 Hashing/hash_partition/262144/1/512/manual_time -0.2383 -0.2060 0 0 0 0 Hashing/hash_partition/524288/1/512/manual_time -0.1971 -0.1758 0 0 0 0 Hashing/hash_partition/1048576/1/512/manual_time -0.1679 -0.1529 0 0 0 0 Hashing/hash_partition/2097152/1/512/manual_time -0.1426 -0.1394 0 0 0 0 Hashing/hash_partition/131072/1/1024/manual_time -0.2246 -0.1952 0 0 0 0 Hashing/hash_partition/262144/1/1024/manual_time -0.2153 -0.1880 0 0 0 0 Hashing/hash_partition/524288/1/1024/manual_time -0.1851 -0.1664 0 0 0 0 Hashing/hash_partition/1048576/1/1024/manual_time -0.1456 -0.1350 0 0 0 0 Hashing/hash_partition/2097152/1/1024/manual_time -0.1190 -0.1141 0 0 0 0 Hashing/hash_partition/131072/16/64/manual_time -0.0639 -0.0638 0 0 0 0 Hashing/hash_partition/262144/16/64/manual_time -0.0462 -0.0450 0 0 0 0 Hashing/hash_partition/524288/16/64/manual_time -0.0354 -0.0368 1 1 1 1 Hashing/hash_partition/1048576/16/64/manual_time -0.0152 -0.0182 1 1 1 1 Hashing/hash_partition/2097152/16/64/manual_time -0.0142 -0.0151 2 2 2 2 Hashing/hash_partition/131072/16/128/manual_time -0.0618 -0.0604 0 0 0 0 Hashing/hash_partition/262144/16/128/manual_time -0.0472 -0.0463 0 0 0 0 Hashing/hash_partition/524288/16/128/manual_time -0.0383 -0.0383 1 1 1 1 Hashing/hash_partition/1048576/16/128/manual_time -0.0172 -0.0173 1 1 1 1 Hashing/hash_partition/2097152/16/128/manual_time -0.0034 -0.0037 2 2 2 2 Hashing/hash_partition/131072/16/256/manual_time -0.0542 -0.0539 0 0 0 0 Hashing/hash_partition/262144/16/256/manual_time -0.0472 -0.0463 0 0 0 0 Hashing/hash_partition/524288/16/256/manual_time -0.0382 -0.0391 1 1 1 1 Hashing/hash_partition/1048576/16/256/manual_time -0.0304 -0.0306 1 1 1 1 Hashing/hash_partition/2097152/16/256/manual_time -0.0178 -0.0187 2 2 2 2 Hashing/hash_partition/131072/16/512/manual_time +0.0108 +0.0198 0 0 0 0 Hashing/hash_partition/262144/16/512/manual_time -0.0391 -0.0391 1 0 1 1 
Hashing/hash_partition/524288/16/512/manual_time -0.0310 -0.0311 1 1 1 1 Hashing/hash_partition/1048576/16/512/manual_time -0.0273 -0.0372 1 1 1 1 Hashing/hash_partition/2097152/16/512/manual_time -0.0043 -0.0050 2 2 2 2 Hashing/hash_partition/131072/16/1024/manual_time -0.0426 -0.0435 1 1 1 1 Hashing/hash_partition/262144/16/1024/manual_time -0.0318 -0.0319 1 1 1 1 Hashing/hash_partition/524288/16/1024/manual_time -0.0267 -0.0273 1 1 1 1 Hashing/hash_partition/1048576/16/1024/manual_time -0.0169 -0.0179 2 2 2 2 Hashing/hash_partition/2097152/16/1024/manual_time -0.0120 -0.0127 3 3 3 3 Hashing/hash_partition/131072/256/64/manual_time +0.0106 +0.0098 4 4 4 4 Hashing/hash_partition/262144/256/64/manual_time +0.0084 +0.0073 6 6 6 6 Hashing/hash_partition/524288/256/64/manual_time +0.0054 +0.0055 9 9 9 9 Hashing/hash_partition/1048576/256/64/manual_time +0.0160 +0.0154 17 17 17 17 Hashing/hash_partition/2097152/256/64/manual_time -0.0066 -0.0067 33 32 33 32 Hashing/hash_partition/131072/256/128/manual_time +0.0121 +0.0118 4 4 4 4 Hashing/hash_partition/262144/256/128/manual_time +0.0071 +0.0068 6 6 6 6 Hashing/hash_partition/524288/256/128/manual_time +0.0042 +0.0041 9 9 9 9 Hashing/hash_partition/1048576/256/128/manual_time +0.0039 +0.0039 17 17 17 17 Hashing/hash_partition/2097152/256/128/manual_time +0.0074 +0.0073 33 33 33 33 Hashing/hash_partition/131072/256/256/manual_time +0.0276 +0.0264 5 5 5 5 Hashing/hash_partition/262144/256/256/manual_time +0.0072 +0.0068 6 6 6 6 Hashing/hash_partition/524288/256/256/manual_time +0.0022 +0.0020 9 9 9 9 Hashing/hash_partition/1048576/256/256/manual_time -0.0035 -0.0036 17 17 17 17 Hashing/hash_partition/2097152/256/256/manual_time -0.0042 -0.0042 34 34 34 34 Hashing/hash_partition/131072/256/512/manual_time -0.0526 -0.0588 6 6 6 6 Hashing/hash_partition/262144/256/512/manual_time +0.0065 +0.0059 7 7 7 7 Hashing/hash_partition/524288/256/512/manual_time +0.0058 +0.0054 11 11 11 11 
Hashing/hash_partition/1048576/256/512/manual_time -0.0030 -0.0030 20 20 20 20 Hashing/hash_partition/2097152/256/512/manual_time +0.0042 +0.0041 37 37 37 37 Hashing/hash_partition/131072/256/1024/manual_time +0.0075 +0.0066 7 8 7 8 Hashing/hash_partition/262144/256/1024/manual_time +0.0033 +0.0025 9 9 9 9 Hashing/hash_partition/524288/256/1024/manual_time +0.0025 +0.0019 14 14 14 14 Hashing/hash_partition/2097152/256/1024/manual_time +0.0046 +0.0045 46 46 46 46 ``` Authors: - Mark Harris (https://github.com/harrism) Approvers: - Vukasin Milovanovic (https://github.com/vuule) - Nghia Truong (https://github.com/ttnghia) URL: #8074
Converts `unordered_multiset` to use `device_uvector` instead of `device_vector`. Also adds a `cudf::contains` benchmark to SEARCH_BENCHMARK. Contributes to #7287

Performance of `cudf::contains`, the only user of this class, is significantly improved:

```
(rapids) rapids@compose:~/cudf/cpp/build/release$ _deps/benchmark-src/tools/compare.py benchmarks ~/cudf/cpp/build/contains_before.json ~/cudf/cpp/build/contains_after.json
Comparing /home/mharris/rapids/cudf/cpp/build/contains_before.json to /home/mharris/rapids/cudf/cpp/build/contains_after.json
Benchmark                                                 Time       CPU   Time Old   Time New    CPU Old    CPU New
--------------------------------------------------------------------------------------------------------------------
Search/ColumnContains_AllValid/1024/manual_time        -0.2608   -0.2074          0          0          0          0
Search/ColumnContains_AllValid/4096/manual_time        -0.9039   -0.8854          1          0          1          0
Search/ColumnContains_AllValid/32768/manual_time       -0.3135   -0.2648          0          0          0          0
Search/ColumnContains_AllValid/262144/manual_time      -0.7520   -0.7421          0          0          0          0
Search/ColumnContains_AllValid/2097152/manual_time     -0.2323   -0.2516          4          3          4          3
Search/ColumnContains_AllValid/16777216/manual_time    -0.1821   -0.1856         40         32         40         32
Search/ColumnContains_AllValid/67108864/manual_time    -0.1368   -0.1377         80         69         81         69
Search/ColumnContains_Nulls/1024/manual_time           -0.2451   -0.1925          0          0          0          0
Search/ColumnContains_Nulls/4096/manual_time           -0.2166   -0.1702          0          0          0          0
Search/ColumnContains_Nulls/32768/manual_time          -0.1798   -0.1450          0          0          0          0
Search/ColumnContains_Nulls/262144/manual_time         -0.1208   -0.1009          0          0          0          0
Search/ColumnContains_Nulls/2097152/manual_time        -0.2312   -0.2696          3          2          3          2
Search/ColumnContains_Nulls/16777216/manual_time       -0.2898   -0.2896         25         17         25         17
Search/ColumnContains_Nulls/67108864/manual_time       -0.0884   -0.0891         68         62         68         62
```

Authors:
- Mark Harris (https://github.com/harrism)

Approvers:
- Mike Wilson (https://github.com/hyperbolic2346)
- Nghia Truong (https://github.com/ttnghia)
- Devavret Makkar (https://github.com/devavret)

URL: #8091
Converts grouped_rolling to use device_uvector instead of device_vector. Contributes to #7287 I have not yet added a benchmark for grouped_rolling because I don't understand it well enough to do so. Authors: - Mark Harris (https://github.com/harrism) Approvers: - David Wendt (https://github.com/davidwendt) - Robert Maynard (https://github.com/robertmaynard) - Dillon Cullinan (https://github.com/dillon-cullinan) URL: #8106
Converts remaining uses of `device_vector` in groupby to `device_uvector`. Contributes to #7287.

Performance on what groupby benchmarks there are is not affected much.

```
(rapids) rapids@compose:~/cudf/cpp/build/release$ _deps/benchmark-src/tools/compare.py benchmarks ~/cudf/cpp/build/groupby_before.json ~/cudf/cpp/build/groupby_after.json
Comparing /home/mharris/rapids/cudf/cpp/build/groupby_before.json to /home/mharris/rapids/cudf/cpp/build/groupby_after.json
Benchmark                                          Time       CPU   Time Old   Time New    CPU Old    CPU New
-------------------------------------------------------------------------------------------------------------
Groupby/Basic/10000/manual_time                 -0.0528   -0.0452          0          0          0          0
Groupby/Basic/10000000/manual_time              +0.0089   +0.0088          3          3          3          3
Groupby/PreSorted/10000000/manual_time          -0.0004   -0.0004          8          8          8          8
Groupby/PreSortedNth/1000000/manual_time        -0.0045   -0.0044          0          0          0          0
Groupby/PreSortedNth/10000000/manual_time       +0.0007   +0.0008          0          0          0          0
Groupby/PreSortedNth/100000000/manual_time      -0.0023   -0.0023          4          4          4          4
Groupby/Shift/1000000/manual_time               +0.0024   +0.0028          0          0          0          0
Groupby/Shift/10000000/manual_time              -0.0048   -0.0058          3          3          3          3
Groupby/Shift/100000000/manual_time             +0.0007   +0.0007         37         37         37         37
```

Authors:
- Mark Harris (https://github.com/harrism)

Approvers:
- Robert Maynard (https://github.com/robertmaynard)
- Nghia Truong (https://github.com/ttnghia)
- Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)

URL: #8148
Issue #7287 Replaces `device_vector` with `device_uvector`. Additional changes were needed to provide the stream parameter at construction time. Reduced the mutable state of the JSON reader. Other changes: moved the trie implementation to the correct location and fixed its naming and namespace. Because of the changes to the trie, CSV and JSON are potentially impacted.

Measured impact:
- JSON - no impact
- CSV - 5-10% faster
- Parquet - no impact

Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Mark Harris (https://github.com/harrism) - MithunR (https://github.com/mythrocks) - https://github.com/nvdbaranec URL: #8151
Converts all remaining tests to use device_uvector instead of device_vector. Contributes to #7287 Also converts a lot of `std::vector` in tests to `thrust::host_vector` to avoid problems with `vector<bool>`. Adds a new utility `cudf::detail::make_host_vector_async` (and sync version) which creates a `thrust::host_vector<T>` from a `device_span`. Also makes it possible to create a `host_span` from a `std::string`. Authors: - Mark Harris (https://github.com/harrism) Approvers: - Robert Maynard (https://github.com/robertmaynard) - Nghia Truong (https://github.com/ttnghia) URL: #8205
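The PR text does not spell out which `vector<bool>` problems it refers to. A likely candidate is that `std::vector<bool>` is a packed specialization that hands out proxy references and does not expose a `bool*` to contiguous elements, so it cannot back a span or a raw host-to-device copy. The stdlib-only sketch below demonstrates that property. The trait `exposes_element_pointer` is a hypothetical helper written for this example:

```cpp
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical trait: true when C::data() returns a pointer to C::value_type,
// which is what a span or a raw memory copy needs.
template <typename C, typename = void>
struct exposes_element_pointer : std::false_type {};

template <typename C>
struct exposes_element_pointer<
  C,
  typename std::enable_if<std::is_same<decltype(std::declval<C&>().data()),
                                       typename C::value_type*>::value>::type>
  : std::true_type {};

// An ordinary vector hands out real references and a real element pointer.
static_assert(std::is_same<std::vector<int>::reference, int&>::value, "");
static_assert(exposes_element_pointer<std::vector<int>>::value, "");

// std::vector<bool> hands out a proxy object and no bool* to its storage.
static_assert(!std::is_same<std::vector<bool>::reference, bool&>::value, "");
static_assert(!exposes_element_pointer<std::vector<bool>>::value, "");
```

A container that stores one `bool` object per element, such as `thrust::host_vector<bool>`, does not have this limitation, which fits the switch described above.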
Converts all remaining benchmark code to use device_uvector instead of device_vector. Contributes to #7287 Authors: - Mark Harris (https://github.com/harrism) Approvers: - David Wendt (https://github.com/davidwendt) - Jake Hemstad (https://github.com/jrhemstad) URL: #8208
The only remaining use of Closing. |
Is your feature request related to a problem? Please describe.

`rmm::device_vector` is an alias for `thrust::device_vector` that uses the default memory resource for allocation. Being an alias, we are limited to the function signatures of whatever `thrust::device_vector` provides. The problem is that `thrust::device_vector` does not take stream arguments for operations that involve allocations/copies/writes. This leads to problems like #2631.

Describe the solution you'd like

Ideally we'd have a `rmm::device_vector` that takes stream arguments for every stream ordered operation. However, that would be a significant effort to design and implement such a class.

We may be able to just get away with using `rmm::device_uvector` instead, as it is its own type with explicit stream arguments for all stream ordered operations. My suspicion is that we can replace (almost) all uses of `rmm::device_vector` with `rmm::device_uvector` in libcudf (perhaps with some additional convenience stuff added to `rmm::device_uvector`).

This issue will track the work to do the conversion of `device_vector` to `device_uvector` wherever possible. Here is a current list of modules that contain `device_vector` today (30 March 2021, branch-0.19):

Describe alternatives you've considered

The alternative is designing and implementing a full-blown `rmm::device_vector` class, but that's a lot more work.