[FEA] Replace use of rmm::device_vector with rmm::device_uvector where possible #7287

Closed
26 tasks done
jrhemstad opened this issue Feb 3, 2021 · 7 comments
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code.

Comments

@jrhemstad
Contributor

jrhemstad commented Feb 3, 2021

Is your feature request related to a problem? Please describe.

rmm::device_vector is an alias for thrust::device_vector that uses the default memory resource for allocation. Being an alias, we are limited to the function signatures of whatever thrust::device_vector provides. The problem is that thrust::device_vector does not take stream arguments for operations that involve allocations/copies/writes. This leads to problems like #2631.

Describe the solution you'd like

Ideally we'd have an rmm::device_vector that takes stream arguments for every stream-ordered operation. However, designing and implementing such a class would be a significant effort.

We may be able to just get away with using rmm::device_uvector instead, as it is its own type with explicit stream arguments for all stream ordered operations. My suspicion is that we can replace (almost) all uses of rmm::device_vector with rmm::device_uvector in libcudf (perhaps with some additional convenience stuff added to rmm::device_uvector).
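To make the interface difference concrete, here is a minimal host-only sketch of what "explicit stream arguments for all stream ordered operations" means. It is not RMM code: `toy_stream`, `op_log`, `toy_uvector` and `run_on_stream` are hypothetical names invented for illustration, and the "stream" is just an integer id that gets logged. The only point is that every allocating or writing operation names the stream it is issued on, so nothing silently falls back to a default stream.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a CUDA stream; only its id matters here.
struct toy_stream {
  int id;
};

// Records which stream each operation was issued on.
struct op_log {
  std::vector<std::string> entries;
  void record(std::string const& op, toy_stream s)
  {
    entries.push_back(op + "@stream" + std::to_string(s.id));
  }
};

// Toy container in the style of a stream-ordered vector: every operation that
// would allocate or write takes the stream explicitly.
template <typename T>
class toy_uvector {
 public:
  toy_uvector(std::size_t size, toy_stream s, op_log& log) : data_(size), log_(&log)
  {
    log_->record("allocate", s);
  }

  void resize(std::size_t new_size, toy_stream s)
  {
    data_.resize(new_size);
    log_->record("resize", s);
  }

  void set_element(std::size_t i, T const& value, toy_stream s)
  {
    assert(i < data_.size());
    data_[i] = value;
    log_->record("set_element", s);
  }

  std::size_t size() const { return data_.size(); }

 private:
  std::vector<T> data_;
  op_log* log_;
};

// Runs a few operations on one stream and returns the recorded log.
inline std::vector<std::string> run_on_stream(int stream_id)
{
  op_log log;
  toy_stream s{stream_id};
  toy_uvector<int> v(4, s, log);
  v.resize(8, s);
  v.set_element(7, 42, s);
  return log.entries;
}
```

In this sketch the caller cannot construct, resize, or write without choosing a stream, which is the property the alias over thrust::device_vector cannot offer.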

This issue will track the work to do the conversion of device_vector to device_uvector wherever possible. Here is a current list of modules that contain device_vector today (30 March 2021, branch-0.19):

Describe alternatives you've considered
The alternative is designing and implementing a full-blown, stream-ordered rmm::device_vector class, but that's a lot more work.

@jrhemstad jrhemstad added feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. labels Feb 3, 2021
@jrhemstad jrhemstad added this to Issue-Needs prioritizing in v0.19 Release via automation Feb 3, 2021
@harrism harrism moved this from Issue-Needs prioritizing to Issue-P0 in v0.19 Release Feb 3, 2021
@vuule
Contributor

vuule commented Feb 25, 2021

Volunteered to replace the device_vector in column_buffer. Will do this after #7397 is merged.

@davidwendt
Contributor

Volunteering to replace the device_vector in the cpp/src/strings/ and cpp/src/text/ source files (total about 20 files)

rapids-bot bot pushed a commit that referenced this issue Mar 4, 2021
This PR refactors strings column factories to eliminate the use of `device_vector` and `std::vector` parameters, and to facilitate more use of `device_uvector` in calls to the factories. This is a small part of #7287. Multiple versions of `make_strings_columns` take `device_vector` parameters. This PR expands the use of iterator and `device_span` versions to enable switching to `device_uvector` as described in #7287. It also adds new `make_device_uvector_async/sync` utility functions.

This will help facilitate safe CUDA stream usage.

Authors:
  - Mark Harris (@harrism)

Approvers:
  - AJ Schmidt (@ajschmidt8)
  - Jake Hemstad (@jrhemstad)
  - David (@davidwendt)
  - Christopher Harris (@cwharris)

URL: #7397
rapids-bot bot pushed a commit that referenced this issue Mar 11, 2021
Reference #7287 
This PR changes `cpp/src/text/*` source files to use `rmm::device_uvector` instead of `rmm::device_vector`. This allows keeping the memory operations on the provided kernel stream.

Authors:
  - David (@davidwendt)

Approvers:
  - Paul Taylor (@trxcllnt)
  - Mark Harris (@harrism)

URL: #7512
rapids-bot bot pushed a commit that referenced this issue Mar 11, 2021
This started as a change converting some `device_vector` usages in `cpp/src/strings` source files to use `device_uvector` instead. The `cpp/src/strings/copying/copying.cu` source has the implementation for `cudf::strings::detail::copy_slice()` and used a `device_vector` to handle a `step` parameter. I can no longer find this parameter being used. I believe it was a holdover from porting nvstrings. So this PR mainly removes this unneeded parameter, which also removes the need for the `device_vector` or temporary memory in this function.
It also includes changes to `attributes.cu` to use `device_uvector`.

~~I'm marking this as non-breaking change since it is a change to a `detail` API and did not seem to be used anywhere in this repo.~~

Reference #7287

Authors:
  - David (@davidwendt)

Approvers:
  - AJ Schmidt (@ajschmidt8)
  - Mike Wilson (@hyperbolic2346)
  - Jake Hemstad (@jrhemstad)

URL: #7525
@harrism
Member

harrism commented Mar 16, 2021

Working on replacing device_vector and std::vector with uvector and spans in concatenate.

@karthikeyann
Contributor

Volunteering to replace the device_vector in groupby. Will do this after #7387 is merged.
Sort-based groupby is already done in #7523.

rapids-bot bot pushed a commit that referenced this issue Mar 20, 2021
…/writer and cudf::io::column_buffer (#7614)

Issue #7287

Replaces `device_vector` with `device_uvector` and `device_span`. Because `device_uvector` does not have a default constructor, some additional changes were required for `device_uvector` data members.
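The PR text does not say which changes were needed, but the general C++ issue is easy to reproduce: a class that holds a member whose type has no default constructor cannot itself be default constructed unless the member is initialized explicitly or its construction is deferred. The sketch below shows two common options using a hypothetical `sized_buffer` stand-in (not `device_uvector`); `eager_reader` and `lazy_reader` are likewise invented names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a buffer type with no default constructor:
// a size must always be supplied.
class sized_buffer {
 public:
  explicit sized_buffer(std::size_t size) : data_(size) {}
  std::size_t size() const { return data_.size(); }

 private:
  std::vector<int> data_;
};

// Option 1: initialize the member in the constructor's initializer list.
struct eager_reader {
  explicit eager_reader(std::size_t n) : buffer(n) {}
  sized_buffer buffer;
};

// Option 2: defer construction until the size is known.
struct lazy_reader {
  std::optional<sized_buffer> buffer;  // empty until allocate() is called

  void allocate(std::size_t n)
  {
    assert(!buffer.has_value());  // allocate at most once in this sketch
    buffer.emplace(n);
  }
};

static_assert(!std::is_default_constructible<sized_buffer>::value,
              "sized_buffer requires a size");
static_assert(!std::is_default_constructible<eager_reader>::value,
              "eager_reader inherits the restriction");
static_assert(std::is_default_constructible<lazy_reader>::value,
              "lazy_reader can exist before its buffer does");
```

Which option fits depends on whether the size is known when the owning object is created; this is a design sketch, not a description of what the PR did.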

Performance impact: this change makes a measurable difference in reader benchmarks. Most string column cases are sped up around **5%**, with other cases having a measurable, but less consistent improvement.

Authors:
  - Vukasin Milovanovic (@vuule)

Approvers:
  - Ram (Ramakrishna Prabhu) (@rgsl888prabhu)
  - Kumar Aatish (@kaatish)

URL: #7614
rapids-bot bot pushed a commit that referenced this issue Mar 25, 2021
Contributes to #7287

This PR replaces `std::vector` with `host_span` in public and detail `cudf::concatenate` functions, and replaces `rmm::device_vector` with `rmm::device_uvector` in the concatenate implementations.

It also strengthens the SFINAE restrictions on `cudf::host_span` and `cudf::device_span` so that they cannot be constructed from containers unless the container's value_type is the same as the span's value_type.
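As a rough illustration of that kind of restriction, here is a simplified, host-only span whose converting constructor is removed from overload resolution unless the container's `value_type` matches the span's element type. This is a sketch under stated assumptions, not the cudf implementation: `toy_span` and `sum` are hypothetical names, and the real spans may express the constraint differently.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical, simplified read-only span; not the cudf implementation.
template <typename T>
class toy_span {
 public:
  // Participates in overload resolution only when the container's value_type
  // is the same as the span's element type (ignoring const on T).
  template <typename Container,
            typename = std::enable_if_t<
              std::is_same<typename Container::value_type, std::remove_const_t<T>>::value>>
  toy_span(Container const& c) : data_(c.data()), size_(c.size())
  {
  }

  T const& operator[](std::size_t i) const
  {
    assert(i < size_);
    return data_[i];
  }

  std::size_t size() const { return size_; }

 private:
  T const* data_;
  std::size_t size_;
};

// Matching value_type: conversion is allowed.
static_assert(std::is_constructible<toy_span<int>, std::vector<int> const&>::value,
              "vector<int> converts to toy_span<int>");
// Mismatched value_type: the constructor is disabled, so this is rejected.
static_assert(!std::is_constructible<toy_span<int>, std::vector<float> const&>::value,
              "vector<float> must not convert to toy_span<int>");

// Example consumer taking a span parameter instead of a vector reference.
inline int sum(toy_span<int> values)
{
  int total = 0;
  for (std::size_t i = 0; i < values.size(); ++i) { total += values[i]; }
  return total;
}
```

With the constraint in place, passing a `std::vector<float>` to `sum` fails at compile time instead of silently reinterpreting the elements.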

This PR also
 - [x] Updates cython.
 - [x] benchmarks before and after

Authors:
  - Mark Harris (@harrism)

Approvers:
  - Jake Hemstad (@jrhemstad)
  - Vukasin Milovanovic (@vuule)
  - Ashwin Srinath (@shwina)

URL: #7621
rapids-bot bot pushed a commit that referenced this issue Mar 26, 2021
addresses part of #7287
- replaced `std::vector const&` with `host_span<const>`
- replaced `rmm::device_[u]vector const&` with `device_span<const>`
- replaced `rmm::device_vector` with `rmm::device_uvector`

Authors:
  - Karthikeyan (@karthikeyann)

Approvers:
  - Jake Hemstad (@jrhemstad)
  - Nghia Truong (@ttnghia)
  - Mark Harris (@harrism)

URL: #7698
@harrism
Member

harrism commented Mar 30, 2021

Added a checklist. I'm working on gather.

@harrism
Member

harrism commented Apr 1, 2021

@vuule did you check off ORC? Can you link the PR that completes it?

rapids-bot bot pushed a commit that referenced this issue Apr 17, 2021
Removes `device_vector` in favour of either `device_uvector` or `device_buffer` as appropriate in parquet reader and writer.

Contributes to #7287 
Depends on #7758

Authors:
  - Devavret Makkar (https://github.com/devavret)
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Vukasin Milovanovic (https://github.com/vuule)
  - MithunR (https://github.com/mythrocks)
  - Mike Wilson (https://github.com/hyperbolic2346)

URL: #7853
@kkraus14 kkraus14 removed this from Issue-P0 in v0.19 Release Apr 22, 2021
@kkraus14 kkraus14 added this to Issue-Needs prioritizing in v21.06 Release via automation Apr 22, 2021
rapids-bot bot pushed a commit that referenced this issue Apr 22, 2021
…/writer (#7805)

Issue #7287

Replaces `device_vector` with `device_uvector` and `device_span`. Removed the `device_vector` data members.

Performance impact:

- Writer: None
- Reader: ~up to 10% slower, will look into this.~ None

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Devavret Makkar (https://github.com/devavret)
  - Kumar Aatish (https://github.com/kaatish)
  - Mark Harris (https://github.com/harrism)

URL: #7805
rapids-bot bot pushed a commit that referenced this issue Apr 26, 2021
Converts `cudf::rank` to use `device_uvector` instead of `device_vector`. 
Also adds a benchmark for `cudf::rank`.

Contributes to #7287 

Performance is at most 1.7% slower and up to 12% faster.

```
Comparing /home/mharris/rapids/cudf/cpp/build/rank_before.json to /home/mharris/rapids/cudf/cpp/build/rank_after.json
Benchmark                                            Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------
Rank/no_nulls/1024/manual_time                    +0.0170         +0.0140             0             0             0             0
Rank/no_nulls/4096/manual_time                    +0.0092         +0.0076             0             0             0             0
Rank/no_nulls/32768/manual_time                   +0.0034         +0.0042             0             0             0             0
Rank/no_nulls/262144/manual_time                  -0.0542         -0.0574             0             0             0             0
Rank/no_nulls/2097152/manual_time                 -0.0493         -0.0509             1             1             1             1
Rank/no_nulls/16777216/manual_time                -0.1217         -0.1289             7             6             7             6
Rank/no_nulls/67108864/manual_time                +0.0007         +0.0005            24            24            24            24
Rank/nulls/1024/manual_time                       -0.0392         -0.0370             0             0             0             0
Rank/nulls/4096/manual_time                       -0.0173         -0.0160             0             0             0             0
Rank/nulls/32768/manual_time                      +0.0021         +0.0027             0             0             0             0
Rank/nulls/262144/manual_time                     -0.0027         -0.0025             0             0             0             0
Rank/nulls/2097152/manual_time                    +0.0002         -0.0003             7             7             7             7
Rank/nulls/16777216/manual_time                   -0.0409         -0.0334            59            56            58            56
Rank/nulls/67108864/manual_time                   -0.0003         -0.0004           232           232           232           232
```
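For reading the table above: the Time and CPU columns look like relative differences between the old and new runs, which is how the "1.7% slower" (+0.0170) and "12% faster" (-0.1217) figures line up with the rows. The sketch below assumes the usual definition, (new - old) / |old|, which is an assumption about the comparison tool rather than something stated in this thread. The function names are hypothetical, and the sample timings in the usage note are made up, since the Time Old and Time New columns in the table are rounded too coarsely to recompute the published values.

```cpp
#include <cassert>
#include <cmath>

// Relative change between an old and a new timing.
// Negative means the new run is faster; positive means it is slower.
// Assumes old_time is non-zero.
inline double relative_change(double old_time, double new_time)
{
  assert(old_time != 0.0);
  return (new_time - old_time) / std::fabs(old_time);
}

// Same quantity expressed as a percentage.
inline double percent_change(double old_time, double new_time)
{
  return 100.0 * relative_change(old_time, new_time);
}
```

For example, with hypothetical timings of 100 before and 88 after, `relative_change(100.0, 88.0)` is about -0.12, i.e. roughly 12% faster.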

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Karthikeyan (https://github.com/karthikeyann)
  - https://github.com/nvdbaranec
  - Keith Kraus (https://github.com/kkraus14)

URL: #8029
rapids-bot bot pushed a commit that referenced this issue Apr 27, 2021
…8043)

Converts `cudf::merge` to use `device_uvector`.

Contributes to #7287

Performance is improved: 

```
Comparing /home/mharris/rapids/cudf/cpp/build/merge_before.json to /home/mharris/rapids/cudf/cpp/build/merge_after.json
Benchmark                                          Time             CPU      Time Old      Time New       CPU Old       CPU New
-------------------------------------------------------------------------------------------------------------------------------
Merge/pow2tables/2/manual_time                  -0.0831         -0.0791             0             0             0             0
Merge/pow2tables/4/manual_time                  -0.0924         -0.0930             1             1             1             1
Merge/pow2tables/8/manual_time                  -0.0813         -0.0808             4             3             4             3
Merge/pow2tables/16/manual_time                 -0.1172         -0.1170             9             8             9             8
Merge/pow2tables/32/manual_time                 -0.0790         -0.0790            19            18            20            18
Merge/pow2tables/64/manual_time                 -0.0757         -0.0757            46            43            46            43
Merge/pow2tables/128/manual_time                -0.0730         -0.0730           111           103           111           103
```

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Karthikeyan (https://github.com/karthikeyann)
  - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)

URL: #8043
rapids-bot bot pushed a commit that referenced this issue Apr 27, 2021
Converts internals of cudf::quantiles to use device_uvector instead of device_vector. Also adds a quantiles benchmark.

Contributes to #7287

Performance is better (sometimes significantly, when the number of quantiles requested is larger).

```
Comparing /home/mharris/rapids/cudf/cpp/build/quantiles_before.json to /home/mharris/rapids/cudf/cpp/build/quantiles_after.json
Benchmark                                                      Time             CPU      Time Old      Time New       CPU Old       CPU New
-------------------------------------------------------------------------------------------------------------------------------------------
Quantiles/no_nulls/65536/1/1/manual_time                    -0.0234         -0.0185             0             0             0             0
Quantiles/no_nulls/262144/1/1/manual_time                   -0.0317         -0.0282             0             0             0             0
Quantiles/no_nulls/1048576/1/1/manual_time                  -0.0234         -0.0209             0             0             0             0
Quantiles/no_nulls/4194304/1/1/manual_time                  -0.0080         -0.0080             1             1             1             1
Quantiles/no_nulls/16777216/1/1/manual_time                 +0.0079         +0.0083             3             3             3             3
Quantiles/no_nulls/67108864/1/1/manual_time                 +0.0023         +0.0025            10            10            10            10
Quantiles/no_nulls/65536/4/1/manual_time                    -0.1138         -0.1115             1             1             1             1
Quantiles/no_nulls/262144/4/1/manual_time                   -0.0933         -0.0917             1             1             1             1
Quantiles/no_nulls/1048576/4/1/manual_time                  -0.0067         -0.0083             8             8             8             8
Quantiles/no_nulls/4194304/4/1/manual_time                  +0.0045         +0.0061            54            54            54            54
Quantiles/no_nulls/16777216/4/1/manual_time                 +0.0044         +0.0044           292           294           292           294
Quantiles/no_nulls/67108864/4/1/manual_time                 +0.0011         +0.0010          1506          1508          1506          1507
Quantiles/no_nulls/65536/8/1/manual_time                    -0.1127         -0.1099             1             1             1             1
Quantiles/no_nulls/262144/8/1/manual_time                   -0.0843         -0.0829             1             1             1             1
Quantiles/no_nulls/1048576/8/1/manual_time                  -0.0083         -0.0084             9             9             9             9
Quantiles/no_nulls/4194304/8/1/manual_time                  -0.0015         +0.0012            54            54            54            54
Quantiles/no_nulls/16777216/8/1/manual_time                 -0.0023         -0.0022           293           293           293           293
Quantiles/no_nulls/67108864/8/1/manual_time                 +0.0061         +0.0060          1515          1525          1515          1525
Quantiles/no_nulls/65536/1/4/manual_time                    -0.0231         -0.0183             0             0             0             0
Quantiles/no_nulls/262144/1/4/manual_time                   -0.0231         -0.0196             0             0             0             0
Quantiles/no_nulls/1048576/1/4/manual_time                  -0.0127         -0.0105             0             0             0             0
Quantiles/no_nulls/4194304/1/4/manual_time                  -0.0033         -0.0038             1             1             1             1
Quantiles/no_nulls/16777216/1/4/manual_time                 +0.0026         +0.0027             3             3             3             3
Quantiles/no_nulls/67108864/1/4/manual_time                 -0.0016         -0.0016            10            10            10            10
Quantiles/no_nulls/65536/4/4/manual_time                    -0.1157         -0.1127             1             1             1             1
Quantiles/no_nulls/262144/4/4/manual_time                   -0.0958         -0.0937             1             1             1             1
Quantiles/no_nulls/1048576/4/4/manual_time                  -0.0188         -0.0140             9             8             9             8
Quantiles/no_nulls/4194304/4/4/manual_time                  -0.0156         -0.0143            55            54            55            54
Quantiles/no_nulls/16777216/4/4/manual_time                 -0.0040         -0.0013           293           292           293           292
Quantiles/no_nulls/67108864/4/4/manual_time                 -0.0452         -0.0452          1583          1511          1583          1511
Quantiles/no_nulls/65536/8/4/manual_time                    -0.1081         -0.1061             1             1             1             1
Quantiles/no_nulls/262144/8/4/manual_time                   -0.0810         -0.0797             1             1             1             1
Quantiles/no_nulls/1048576/8/4/manual_time                  -0.0119         -0.0087             9             9             9             9
Quantiles/no_nulls/4194304/8/4/manual_time                  -0.0097         -0.0053            55            54            54            54
Quantiles/no_nulls/16777216/8/4/manual_time                 -0.0013         -0.0013           293           293           293           293
Quantiles/no_nulls/67108864/8/4/manual_time                 -0.0462         -0.0461          1595          1521          1595          1521
Quantiles/no_nulls/65536/1/12/manual_time                   -0.5136         -0.5258             0             0             0             0
Quantiles/no_nulls/262144/1/12/manual_time                  -0.8295         -0.8370             1             0             1             0
Quantiles/no_nulls/1048576/1/12/manual_time                 -0.6905         -0.7111             1             0             1             0
Quantiles/no_nulls/4194304/1/12/manual_time                 -0.5660         -0.6006             2             1             2             1
Quantiles/no_nulls/16777216/1/12/manual_time                -0.0123         -0.0140             3             3             3             3
Quantiles/no_nulls/67108864/1/12/manual_time                -0.2079         -0.2183            13            10            13            10
Quantiles/no_nulls/65536/4/12/manual_time                   -0.1085         -0.1061             1             1             1             1
Quantiles/no_nulls/262144/4/12/manual_time                  -0.0799         -0.0922             1             1             1             1
Quantiles/no_nulls/1048576/4/12/manual_time                 -0.0145         -0.0105             9             8             8             8
Quantiles/no_nulls/4194304/4/12/manual_time                 -0.0120         -0.0119            54            54            54            54
Quantiles/no_nulls/16777216/4/12/manual_time                +0.0019         +0.0020           292           293           292           293
Quantiles/no_nulls/67108864/4/12/manual_time                +0.0042         +0.0043          1506          1513          1506          1513
Quantiles/no_nulls/65536/8/12/manual_time                   -0.1718         -0.1722             1             1             1             1
Quantiles/no_nulls/262144/8/12/manual_time                  -0.2191         -0.2248             2             1             2             1
Quantiles/no_nulls/1048576/8/12/manual_time                 -0.0065         -0.0072             9             9             9             9
Quantiles/no_nulls/4194304/8/12/manual_time                 -0.1978         -0.1996            67            54            67            54
Quantiles/no_nulls/16777216/8/12/manual_time                -0.0019         -0.0019           293           293           293           293
Quantiles/no_nulls/67108864/8/12/manual_time                -0.1957         -0.1956          1892          1522          1891          1521
Quantiles/nulls/65536/1/1/manual_time                       -0.4734         -0.4950             1             0             1             0
Quantiles/nulls/262144/1/1/manual_time                      -0.2358         -0.2536             1             0             1             0
Quantiles/nulls/1048576/1/1/manual_time                     -0.0002         -0.0002             2             2             2             2
Quantiles/nulls/4194304/1/1/manual_time                     -0.2323         -0.2392            17            13            17            13
Quantiles/nulls/16777216/1/1/manual_time                    -0.0085         -0.0084            54            53            54            53
Quantiles/nulls/67108864/1/1/manual_time                    +0.0001         +0.0001           217           217           217           217
Quantiles/nulls/65536/4/1/manual_time                       -0.2901         -0.2992             2             1             2             1
Quantiles/nulls/262144/4/1/manual_time                      -0.4554         -0.4771             3             2             3             2
Quantiles/nulls/1048576/4/1/manual_time                     -0.1957         -0.2122            12            10            12            10
Quantiles/nulls/4194304/4/1/manual_time                     -0.1326         -0.1367            73            63            73            63
Quantiles/nulls/16777216/4/1/manual_time                    -0.1951         -0.1953           447           360           447           359
Quantiles/nulls/67108864/4/1/manual_time                    -0.0034         -0.0034          2015          2009          2015          2008
Quantiles/nulls/65536/8/1/manual_time                       -0.1251         -0.1259             2             2             2             2
Quantiles/nulls/262144/8/1/manual_time                      -0.0505         -0.0499             2             2             2             2
Quantiles/nulls/1048576/8/1/manual_time                     -0.3018         -0.3064            15            11            15            11
Quantiles/nulls/4194304/8/1/manual_time                     -0.0492         -0.0470            67            64            67            64
Quantiles/nulls/16777216/8/1/manual_time                    +0.0008         +0.0008           359           359           359           359
Quantiles/nulls/67108864/8/1/manual_time                    +0.0315         +0.0315          2004          2067          2004          2067
Quantiles/nulls/65536/1/4/manual_time                       -0.0476         -0.0499             0             0             0             0
Quantiles/nulls/262144/1/4/manual_time                      -0.0050         -0.0042             0             0             0             0
Quantiles/nulls/1048576/1/4/manual_time                     +0.0006         +0.0006             2             2             2             2
Quantiles/nulls/4194304/1/4/manual_time                     -0.0061         -0.0060            13            13            13            13
Quantiles/nulls/16777216/1/4/manual_time                    +0.0073         +0.0072            53            53            53            53
Quantiles/nulls/67108864/1/4/manual_time                    +0.0012         +0.0013           217           217           217           217
Quantiles/nulls/65536/4/4/manual_time                       -0.1419         -0.1431             1             1             1             1
Quantiles/nulls/262144/4/4/manual_time                      -0.3466         -0.3736             3             2             3             2
Quantiles/nulls/1048576/4/4/manual_time                     -0.2328         -0.2406            13            10            13            10
Quantiles/nulls/4194304/4/4/manual_time                     -0.0345         -0.0365            65            63            65            63
Quantiles/nulls/16777216/4/4/manual_time                    +0.0016         +0.0016           359           359           359           359
Quantiles/nulls/67108864/4/4/manual_time                    -0.0309         -0.0309          2066          2002          2065          2002
Quantiles/nulls/65536/8/4/manual_time                       -0.0733         -0.0724             2             2             2             2
Quantiles/nulls/262144/8/4/manual_time                      -0.2988         -0.3072             3             2             3             2
Quantiles/nulls/1048576/8/4/manual_time                     -0.0028         -0.0017            11            11            11            11
Quantiles/nulls/4194304/8/4/manual_time                     -0.0454         -0.0457            67            64            67            64
Quantiles/nulls/16777216/8/4/manual_time                    -0.1597         -0.1599           427           359           427           359
Quantiles/nulls/67108864/8/4/manual_time                    +0.0006         +0.0006          2004          2005          2004          2005
Quantiles/nulls/65536/1/12/manual_time                      -0.0331         -0.0346             0             0             0             0
Quantiles/nulls/262144/1/12/manual_time                     -0.0084         -0.0077             0             0             0             0
Quantiles/nulls/1048576/1/12/manual_time                    +0.0019         +0.0019             2             2             2             2
Quantiles/nulls/4194304/1/12/manual_time                    -0.0053         -0.0052            13            13            13            13
Quantiles/nulls/16777216/1/12/manual_time                   +0.0050         +0.0051            53            53            53            53
Quantiles/nulls/67108864/1/12/manual_time                   +0.0007         +0.0008           217           217           217           217
Quantiles/nulls/65536/4/12/manual_time                      -0.0030         -0.0034             1             1             1             1
Quantiles/nulls/262144/4/12/manual_time                     +0.0099         -0.0138             2             2             2             2
Quantiles/nulls/1048576/4/12/manual_time                    +0.0127         +0.0084            10            10            10            10
Quantiles/nulls/4194304/4/12/manual_time                    +0.0011         -0.0013            63            63            63            63
Quantiles/nulls/16777216/4/12/manual_time                   +0.0013         +0.0013           359           359           359           359
Quantiles/nulls/67108864/4/12/manual_time                   +0.0001         +0.0000          2001          2001          2001          2001
Quantiles/nulls/65536/8/12/manual_time                      -0.0004         -0.0001             2             2             2             2
Quantiles/nulls/262144/8/12/manual_time                     +0.0038         +0.0040             2             2             2             2
Quantiles/nulls/1048576/8/12/manual_time                    +0.0070         +0.0084            11            11            11            11
Quantiles/nulls/4194304/8/12/manual_time                    -0.0082         -0.0100            64            64            64            63
Quantiles/nulls/16777216/8/12/manual_time                   +0.0038         +0.0038           359           360           359           360
Quantiles/nulls/67108864/8/12/manual_time                   +0.0010         +0.0010          2003          2005          2003          2005
```

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Nghia Truong (https://github.com/ttnghia)

URL: #8076
rapids-bot bot pushed a commit that referenced this issue Apr 28, 2021
…8042)

Converts `cudf::repeat` to use device_uvector as much as possible, and converts to using indexalator for computing offsets from repeat counts. 

Also adds a benchmark for `cudf::repeat`.

Contributes to #7287 


Performance is improved:

```
Comparing /home/mharris/rapids/cudf/cpp/build/repeat_before.json to /home/mharris/rapids/cudf/cpp/build/repeat_after.json
Benchmark                                                       Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------------------------------
Repeat/double_nulls/1024/1/manual_time                       -0.1681         -0.1361             0             0             0             0
Repeat/double_nulls/4096/1/manual_time                       -0.1663         -0.1329             0             0             0             0
Repeat/double_nulls/32768/1/manual_time                      -0.1702         -0.1372             0             0             0             0
Repeat/double_nulls/262144/1/manual_time                     -0.1347         -0.1159             0             0             0             0
Repeat/double_nulls/2097152/1/manual_time                    -0.1053         -0.1010             0             0             0             0
Repeat/double_nulls/16777216/1/manual_time                   -0.0931         -0.0926             3             3             3             3
Repeat/double_nulls/67108864/1/manual_time                   -0.0869         -0.0868            12            11            12            11
Repeat/double_nulls/1024/8/manual_time                       -0.0539         -0.0462             0             0             0             0
Repeat/double_nulls/4096/8/manual_time                       -0.0951         -0.0839             0             0             0             0
Repeat/double_nulls/32768/8/manual_time                      -0.0905         -0.0821             0             0             0             0
Repeat/double_nulls/262144/8/manual_time                     -0.0499         -0.0453             0             0             0             0
Repeat/double_nulls/2097152/8/manual_time                    -0.0322         -0.0325             1             1             1             1
Repeat/double_nulls/16777216/8/manual_time                   -0.0267         -0.0272            10             9            10             9
Repeat/double_nulls/67108864/8/manual_time                   -0.0276         -0.0267            39            38            39            38
Repeat/double_no_nulls/1024/1/manual_time                    -0.2183         -0.1590             0             0             0             0
Repeat/double_no_nulls/4096/1/manual_time                    -0.2308         -0.1711             0             0             0             0
Repeat/double_no_nulls/32768/1/manual_time                   -0.2374         -0.1820             0             0             0             0
Repeat/double_no_nulls/262144/1/manual_time                  -0.1886         -0.1571             0             0             0             0
Repeat/double_no_nulls/2097152/1/manual_time                 -0.1266         -0.1200             0             0             0             0
Repeat/double_no_nulls/16777216/1/manual_time                -0.1045         -0.1039             3             2             3             2
Repeat/double_no_nulls/67108864/1/manual_time                -0.0945         -0.0944            11            10            11            10
Repeat/double_no_nulls/1024/8/manual_time                    -0.0595         -0.0490             0             0             0             0
Repeat/double_no_nulls/4096/8/manual_time                    -0.1201         -0.1028             0             0             0             0
Repeat/double_no_nulls/32768/8/manual_time                   -0.1085         -0.0968             0             0             0             0
Repeat/double_no_nulls/262144/8/manual_time                  -0.0670         -0.0635             0             0             0             0
Repeat/double_no_nulls/2097152/8/manual_time                 -0.0405         -0.0400             1             1             1             1
Repeat/double_no_nulls/16777216/8/manual_time                -0.0352         -0.0354             8             8             8             8
Repeat/double_no_nulls/67108864/8/manual_time                -0.0295         -0.0295            32            31            32            31
```
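
A note on reading these tables: Google Benchmark's `compare.py` appears to report the Time and CPU columns as a relative change, (new - old) / |old|, so negative values mean the new run is faster. A minimal sketch of that calculation, where `relative_change` is an illustrative name and not part of any tool:

```cpp
#include <cassert>
#include <cmath>

// Relative change between two timings; negative means `new_time` is faster.
// Assumes old_time is non-zero.
double relative_change(double old_time, double new_time)
{
  assert(old_time != 0.0);
  return (new_time - old_time) / std::fabs(old_time);
}
```

The Old/New columns are rounded for display, so recomputing from them only approximates the reported figure: `relative_change(12, 11)` is about -0.083, while the table reports -0.0869 for `Repeat/double_nulls/67108864/1`.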

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Robert Maynard (https://github.com/robertmaynard)
  - Christopher Harris (https://github.com/cwharris)

URL: #8042
rapids-bot bot pushed a commit that referenced this issue Apr 29, 2021
…e_uvector (#8074)

Converts internals of cudf::partition/hash_partition, hashing, and nested_loop_join to use device_uvector instead of device_vector.

Changes `cudf::hash` API to take a `host_span` instead of a `std::vector`, hence marked as a breaking change.
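
To illustrate what a span parameter buys over a `std::vector` parameter, here is a minimal sketch using a hypothetical stand-in type. `u32_span` and `combine_seeds` are invented for this example and are not the `cudf::host_span` or `cudf::hash` definitions. The function takes a non-owning pointer-and-size view, so callers can pass any contiguous storage without copying it into a vector first.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a host span: a non-owning pointer + size view.
struct u32_span {
  std::uint32_t const* ptr{nullptr};
  std::size_t len{0};

  u32_span() = default;
  u32_span(std::uint32_t const* p, std::size_t n) : ptr{p}, len{n}
  {
    assert(p != nullptr || n == 0);
  }
  // Implicit conversions from common contiguous containers.
  u32_span(std::vector<std::uint32_t> const& v) : ptr{v.data()}, len{v.size()} {}
  template <std::size_t N>
  u32_span(std::array<std::uint32_t, N> const& a) : ptr{a.data()}, len{N}
  {
  }

  std::uint32_t const* begin() const { return ptr; }
  std::uint32_t const* end() const { return ptr + len; }
  std::size_t size() const { return len; }
};

// Illustrative consumer: XORs the seeds together without caring which
// container owns them.
std::uint32_t combine_seeds(u32_span seeds)
{
  std::uint32_t result = 0;
  for (auto s : seeds) { result ^= s; }
  return result;
}
```

A `std::vector<std::uint32_t>`, a `std::array<std::uint32_t, N>`, or a raw pointer plus length can all be passed to `combine_seeds` directly.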

Contributes to #7287

Performance is improved in hash_partition for smaller sizes, and is similar for larger sizes.

```
(rapids) rapids@compose:~/cudf/cpp/build/release$ _deps/benchmark-src/tools/compare.py benchmarks ~/cudf/cpp/build/hashing_before.json ~/cudf/cpp/build/hashing_after.json
Comparing /home/mharris/rapids/cudf/cpp/build/hashing_before.json to /home/mharris/rapids/cudf/cpp/build/hashing_after.json
Benchmark                                                             Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------------------------------------
Hashing/hash_partition/131072/1/64/manual_time                     -0.2817         -0.2422             0             0             0             0
Hashing/hash_partition/262144/1/64/manual_time                     -0.2561         -0.2208             0             0             0             0
Hashing/hash_partition/524288/1/64/manual_time                     -0.2269         -0.1992             0             0             0             0
Hashing/hash_partition/1048576/1/64/manual_time                    -0.1864         -0.1683             0             0             0             0
Hashing/hash_partition/2097152/1/64/manual_time                    -0.1394         -0.1323             0             0             0             0
Hashing/hash_partition/131072/1/128/manual_time                    -0.2732         -0.2340             0             0             0             0
Hashing/hash_partition/262144/1/128/manual_time                    -0.2561         -0.2218             0             0             0             0
Hashing/hash_partition/524288/1/128/manual_time                    -0.2223         -0.1946             0             0             0             0
Hashing/hash_partition/1048576/1/128/manual_time                   -0.1847         -0.1664             0             0             0             0
Hashing/hash_partition/2097152/1/128/manual_time                   -0.1345         -0.1260             0             0             0             0
Hashing/hash_partition/131072/1/256/manual_time                    -0.2626         -0.2280             0             0             0             0
Hashing/hash_partition/262144/1/256/manual_time                    -0.2545         -0.2206             0             0             0             0
Hashing/hash_partition/524288/1/256/manual_time                    -0.2178         -0.1918             0             0             0             0
Hashing/hash_partition/1048576/1/256/manual_time                   -0.1739         -0.1577             0             0             0             0
Hashing/hash_partition/2097152/1/256/manual_time                   -0.1389         -0.1316             0             0             0             0
Hashing/hash_partition/131072/1/512/manual_time                    -0.2492         -0.2171             0             0             0             0
Hashing/hash_partition/262144/1/512/manual_time                    -0.2383         -0.2060             0             0             0             0
Hashing/hash_partition/524288/1/512/manual_time                    -0.1971         -0.1758             0             0             0             0
Hashing/hash_partition/1048576/1/512/manual_time                   -0.1679         -0.1529             0             0             0             0
Hashing/hash_partition/2097152/1/512/manual_time                   -0.1426         -0.1394             0             0             0             0
Hashing/hash_partition/131072/1/1024/manual_time                   -0.2246         -0.1952             0             0             0             0
Hashing/hash_partition/262144/1/1024/manual_time                   -0.2153         -0.1880             0             0             0             0
Hashing/hash_partition/524288/1/1024/manual_time                   -0.1851         -0.1664             0             0             0             0
Hashing/hash_partition/1048576/1/1024/manual_time                  -0.1456         -0.1350             0             0             0             0
Hashing/hash_partition/2097152/1/1024/manual_time                  -0.1190         -0.1141             0             0             0             0
Hashing/hash_partition/131072/16/64/manual_time                    -0.0639         -0.0638             0             0             0             0
Hashing/hash_partition/262144/16/64/manual_time                    -0.0462         -0.0450             0             0             0             0
Hashing/hash_partition/524288/16/64/manual_time                    -0.0354         -0.0368             1             1             1             1
Hashing/hash_partition/1048576/16/64/manual_time                   -0.0152         -0.0182             1             1             1             1
Hashing/hash_partition/2097152/16/64/manual_time                   -0.0142         -0.0151             2             2             2             2
Hashing/hash_partition/131072/16/128/manual_time                   -0.0618         -0.0604             0             0             0             0
Hashing/hash_partition/262144/16/128/manual_time                   -0.0472         -0.0463             0             0             0             0
Hashing/hash_partition/524288/16/128/manual_time                   -0.0383         -0.0383             1             1             1             1
Hashing/hash_partition/1048576/16/128/manual_time                  -0.0172         -0.0173             1             1             1             1
Hashing/hash_partition/2097152/16/128/manual_time                  -0.0034         -0.0037             2             2             2             2
Hashing/hash_partition/131072/16/256/manual_time                   -0.0542         -0.0539             0             0             0             0
Hashing/hash_partition/262144/16/256/manual_time                   -0.0472         -0.0463             0             0             0             0
Hashing/hash_partition/524288/16/256/manual_time                   -0.0382         -0.0391             1             1             1             1
Hashing/hash_partition/1048576/16/256/manual_time                  -0.0304         -0.0306             1             1             1             1
Hashing/hash_partition/2097152/16/256/manual_time                  -0.0178         -0.0187             2             2             2             2
Hashing/hash_partition/131072/16/512/manual_time                   +0.0108         +0.0198             0             0             0             0
Hashing/hash_partition/262144/16/512/manual_time                   -0.0391         -0.0391             1             0             1             1
Hashing/hash_partition/524288/16/512/manual_time                   -0.0310         -0.0311             1             1             1             1
Hashing/hash_partition/1048576/16/512/manual_time                  -0.0273         -0.0372             1             1             1             1
Hashing/hash_partition/2097152/16/512/manual_time                  -0.0043         -0.0050             2             2             2             2
Hashing/hash_partition/131072/16/1024/manual_time                  -0.0426         -0.0435             1             1             1             1
Hashing/hash_partition/262144/16/1024/manual_time                  -0.0318         -0.0319             1             1             1             1
Hashing/hash_partition/524288/16/1024/manual_time                  -0.0267         -0.0273             1             1             1             1
Hashing/hash_partition/1048576/16/1024/manual_time                 -0.0169         -0.0179             2             2             2             2
Hashing/hash_partition/2097152/16/1024/manual_time                 -0.0120         -0.0127             3             3             3             3
Hashing/hash_partition/131072/256/64/manual_time                   +0.0106         +0.0098             4             4             4             4
Hashing/hash_partition/262144/256/64/manual_time                   +0.0084         +0.0073             6             6             6             6
Hashing/hash_partition/524288/256/64/manual_time                   +0.0054         +0.0055             9             9             9             9
Hashing/hash_partition/1048576/256/64/manual_time                  +0.0160         +0.0154            17            17            17            17
Hashing/hash_partition/2097152/256/64/manual_time                  -0.0066         -0.0067            33            32            33            32
Hashing/hash_partition/131072/256/128/manual_time                  +0.0121         +0.0118             4             4             4             4
Hashing/hash_partition/262144/256/128/manual_time                  +0.0071         +0.0068             6             6             6             6
Hashing/hash_partition/524288/256/128/manual_time                  +0.0042         +0.0041             9             9             9             9
Hashing/hash_partition/1048576/256/128/manual_time                 +0.0039         +0.0039            17            17            17            17
Hashing/hash_partition/2097152/256/128/manual_time                 +0.0074         +0.0073            33            33            33            33
Hashing/hash_partition/131072/256/256/manual_time                  +0.0276         +0.0264             5             5             5             5
Hashing/hash_partition/262144/256/256/manual_time                  +0.0072         +0.0068             6             6             6             6
Hashing/hash_partition/524288/256/256/manual_time                  +0.0022         +0.0020             9             9             9             9
Hashing/hash_partition/1048576/256/256/manual_time                 -0.0035         -0.0036            17            17            17            17
Hashing/hash_partition/2097152/256/256/manual_time                 -0.0042         -0.0042            34            34            34            34
Hashing/hash_partition/131072/256/512/manual_time                  -0.0526         -0.0588             6             6             6             6
Hashing/hash_partition/262144/256/512/manual_time                  +0.0065         +0.0059             7             7             7             7
Hashing/hash_partition/524288/256/512/manual_time                  +0.0058         +0.0054            11            11            11            11
Hashing/hash_partition/1048576/256/512/manual_time                 -0.0030         -0.0030            20            20            20            20
Hashing/hash_partition/2097152/256/512/manual_time                 +0.0042         +0.0041            37            37            37            37
Hashing/hash_partition/131072/256/1024/manual_time                 +0.0075         +0.0066             7             8             7             8
Hashing/hash_partition/262144/256/1024/manual_time                 +0.0033         +0.0025             9             9             9             9
Hashing/hash_partition/524288/256/1024/manual_time                 +0.0025         +0.0019            14            14            14            14
Hashing/hash_partition/2097152/256/1024/manual_time                +0.0046         +0.0045            46            46            46            46
```

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Nghia Truong (https://github.com/ttnghia)

URL: #8074
rapids-bot bot pushed a commit that referenced this issue Apr 30, 2021
Converts `unordered_multiset` to use device_uvector instead of device_vector. Also adds a cudf::contains benchmark to SEARCH_BENCHMARK.

Contributes to #7287

Performance of `cudf::contains`, the only user of this class, is significantly improved:

```
(rapids) rapids@compose:~/cudf/cpp/build/release$ _deps/benchmark-src/tools/compare.py benchmarks ~/cudf/cpp/build/contains_before.json ~/cudf/cpp/build/contains_after.json 
Comparing /home/mharris/rapids/cudf/cpp/build/contains_before.json to /home/mharris/rapids/cudf/cpp/build/contains_after.json
Benchmark                                                             Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------------------------------------
Search/ColumnContains_AllValid/1024/manual_time                    -0.2608         -0.2074             0             0             0             0
Search/ColumnContains_AllValid/4096/manual_time                    -0.9039         -0.8854             1             0             1             0
Search/ColumnContains_AllValid/32768/manual_time                   -0.3135         -0.2648             0             0             0             0
Search/ColumnContains_AllValid/262144/manual_time                  -0.7520         -0.7421             0             0             0             0
Search/ColumnContains_AllValid/2097152/manual_time                 -0.2323         -0.2516             4             3             4             3
Search/ColumnContains_AllValid/16777216/manual_time                -0.1821         -0.1856            40            32            40            32
Search/ColumnContains_AllValid/67108864/manual_time                -0.1368         -0.1377            80            69            81            69
Search/ColumnContains_Nulls/1024/manual_time                       -0.2451         -0.1925             0             0             0             0
Search/ColumnContains_Nulls/4096/manual_time                       -0.2166         -0.1702             0             0             0             0
Search/ColumnContains_Nulls/32768/manual_time                      -0.1798         -0.1450             0             0             0             0
Search/ColumnContains_Nulls/262144/manual_time                     -0.1208         -0.1009             0             0             0             0
Search/ColumnContains_Nulls/2097152/manual_time                    -0.2312         -0.2696             3             2             3             2
Search/ColumnContains_Nulls/16777216/manual_time                   -0.2898         -0.2896            25            17            25            17
Search/ColumnContains_Nulls/67108864/manual_time                   -0.0884         -0.0891            68            62            68            62
```

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Mike Wilson (https://github.com/hyperbolic2346)
  - Nghia Truong (https://github.com/ttnghia)
  - Devavret Makkar (https://github.com/devavret)

URL: #8091
rapids-bot bot pushed a commit that referenced this issue May 5, 2021
Converts grouped_rolling to use device_uvector instead of device_vector.

Contributes to #7287

I have not yet added a benchmark for grouped_rolling because I don't understand it well enough to do so.

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Robert Maynard (https://github.com/robertmaynard)
  - Dillon Cullinan (https://github.com/dillon-cullinan)

URL: #8106
rapids-bot bot pushed a commit that referenced this issue May 5, 2021
Converts remaining uses of device_vector in groupby to device_uvector.

Contributes to #7287.

Performance on the existing groupby benchmarks is largely unaffected.

```
(rapids) rapids@compose:~/cudf/cpp/build/release$ _deps/benchmark-src/tools/compare.py benchmarks ~/cudf/cpp/build/groupby_before.json ~/cudf/cpp/build/groupby_after.json 
Comparing /home/mharris/rapids/cudf/cpp/build/groupby_before.json to /home/mharris/rapids/cudf/cpp/build/groupby_after.json
Benchmark                                                    Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------
Groupby/Basic/10000/manual_time                           -0.0528         -0.0452             0             0             0             0
Groupby/Basic/10000000/manual_time                        +0.0089         +0.0088             3             3             3             3
Groupby/PreSorted/10000000/manual_time                    -0.0004         -0.0004             8             8             8             8
Groupby/PreSortedNth/1000000/manual_time                  -0.0045         -0.0044             0             0             0             0
Groupby/PreSortedNth/10000000/manual_time                 +0.0007         +0.0008             0             0             0             0
Groupby/PreSortedNth/100000000/manual_time                -0.0023         -0.0023             4             4             4             4
Groupby/Shift/1000000/manual_time                         +0.0024         +0.0028             0             0             0             0
Groupby/Shift/10000000/manual_time                        -0.0048         -0.0058             3             3             3             3
Groupby/Shift/100000000/manual_time                       +0.0007         +0.0007            37            37            37            37
```

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Nghia Truong (https://github.com/ttnghia)
  - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)

URL: #8148
rapids-bot bot pushed a commit that referenced this issue May 7, 2021
Issue #7287

Replaces `device_vector` with `device_uvector`. Additional changes were needed to provide the stream parameter at construction time. Reduced the mutable state of the JSON reader.

Other changes: moved the trie implementation to the correct location and fixed its naming and namespace.

Because of changes to the trie, CSV and JSON are potentially impacted. 
Measured impact: 

- JSON - no impact
- CSV - 5-10% faster
- Parquet - no impact

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - MithunR (https://github.com/mythrocks)
  - https://github.com/nvdbaranec

URL: #8151
rapids-bot bot pushed a commit that referenced this issue May 12, 2021
Converts all remaining tests to use device_uvector instead of device_vector. 

Contributes to #7287

Also converts a lot of `std::vector` in tests to `thrust::host_vector` to avoid problems with `vector<bool>`. 
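
The `vector<bool>` problem can be shown with the standard library alone. `std::vector<bool>` is a packed specialization whose element access returns a proxy object, so it cannot supply the contiguous `bool` storage that a raw copy needs. The commit switches to `thrust::host_vector`; the sketch below avoids Thrust so it compiles anywhere, and `raw_copy` and `copy_bools` are hypothetical helpers standing in for a host-to-device copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <type_traits>
#include <vector>

// Element access on std::vector<bool> yields a proxy class, not bool&.
static_assert(!std::is_same<std::vector<bool>::reference, bool&>::value,
              "vector<bool>::reference is a proxy type");
static_assert(std::is_same<std::vector<int>::reference, int&>::value,
              "other element types yield real references");

// Hypothetical stand-in for a host-to-device copy: it needs contiguous bytes.
std::vector<unsigned char> raw_copy(void const* src, std::size_t bytes)
{
  assert(src != nullptr || bytes == 0);
  std::vector<unsigned char> dst(bytes);
  if (bytes > 0) { std::memcpy(dst.data(), src, bytes); }
  return dst;
}

// One workaround with only the standard library: move the values into
// storage that really is an array of bool before copying.
std::vector<unsigned char> copy_bools(std::vector<bool> const& packed)
{
  std::unique_ptr<bool[]> contiguous(new bool[packed.size()]);
  for (std::size_t i = 0; i < packed.size(); ++i) { contiguous[i] = packed[i]; }
  return raw_copy(contiguous.get(), packed.size() * sizeof(bool));
}
```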

Adds a new utility `cudf::detail::make_host_vector_async` (and sync version) which creates a `thrust::host_vector<T>` from a `device_span`. Also makes it possible to create a `host_span` from a `std::string`.

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Nghia Truong (https://github.com/ttnghia)

URL: #8205
rapids-bot bot pushed a commit that referenced this issue May 13, 2021
Converts all remaining benchmark code to use device_uvector instead of device_vector.

Contributes to #7287

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Jake Hemstad (https://github.com/jrhemstad)

URL: #8208
@harrism
Member

harrism commented May 13, 2021

The only remaining use of device_vector in libcudf is in span.hpp and its tests.

Closing.

@harrism harrism closed this as completed May 13, 2021
v21.06 Release automation moved this from Issue-Needs prioritizing to Done May 13, 2021