Convert unordered_multiset to use device_uvector #8091

Merged (5 commits) on Apr 30, 2021

@harrism harrism commented Apr 28, 2021

Converts unordered_multiset to use device_uvector instead of device_vector. Also adds a cudf::contains benchmark to SEARCH_BENCHMARK.
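
The difference between the two containers can be illustrated with a host-side analogy. thrust::device_vector initializes its elements on construction, whereas rmm::device_uvector allocates storage without initializing it and ties the allocation to a CUDA stream. The sketch below mirrors only the initialization difference, using the C++ standard library so that it compiles without CUDA. The names make_initialized, make_uninitialized, and fill_iota are hypothetical, not cudf or RMM APIs, and whether the skipped initialization accounts for the speedup in this PR is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Eager container: like thrust::device_vector, std::vector value-initializes
// every element on construction, so the buffer is written once before use.
std::vector<int> make_initialized(std::size_t n) { return std::vector<int>(n); }

// Uninitialized buffer: like rmm::device_uvector, storage is allocated but the
// elements are left indeterminate; the caller must write before reading.
std::unique_ptr<int[]> make_uninitialized(std::size_t n)
{
  return std::unique_ptr<int[]>(new int[n]);  // default-initialized: no zero fill
}

// Stands in for whatever kernel populates the buffer after allocation.
void fill_iota(int* data, std::size_t n)
{
  assert(data != nullptr || n == 0);
  for (std::size_t i = 0; i < n; ++i) { data[i] = static_cast<int>(i); }
}
```

Any buffer that previously relied on the implicit zero fill has to be zeroed explicitly after such a conversion.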

Contributes to #7287

Performance of cudf::contains, the only user of this class, is significantly improved, with manual time reduced by roughly 9% to 90% depending on input size:

(rapids) rapids@compose:~/cudf/cpp/build/release$ _deps/benchmark-src/tools/compare.py benchmarks ~/cudf/cpp/build/contains_before.json ~/cudf/cpp/build/contains_after.json 
Comparing /home/mharris/rapids/cudf/cpp/build/contains_before.json to /home/mharris/rapids/cudf/cpp/build/contains_after.json
Benchmark                                                             Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------------------------------------
Search/ColumnContains_AllValid/1024/manual_time                    -0.2608         -0.2074             0             0             0             0
Search/ColumnContains_AllValid/4096/manual_time                    -0.9039         -0.8854             1             0             1             0
Search/ColumnContains_AllValid/32768/manual_time                   -0.3135         -0.2648             0             0             0             0
Search/ColumnContains_AllValid/262144/manual_time                  -0.7520         -0.7421             0             0             0             0
Search/ColumnContains_AllValid/2097152/manual_time                 -0.2323         -0.2516             4             3             4             3
Search/ColumnContains_AllValid/16777216/manual_time                -0.1821         -0.1856            40            32            40            32
Search/ColumnContains_AllValid/67108864/manual_time                -0.1368         -0.1377            80            69            81            69
Search/ColumnContains_Nulls/1024/manual_time                       -0.2451         -0.1925             0             0             0             0
Search/ColumnContains_Nulls/4096/manual_time                       -0.2166         -0.1702             0             0             0             0
Search/ColumnContains_Nulls/32768/manual_time                      -0.1798         -0.1450             0             0             0             0
Search/ColumnContains_Nulls/262144/manual_time                     -0.1208         -0.1009             0             0             0             0
Search/ColumnContains_Nulls/2097152/manual_time                    -0.2312         -0.2696             3             2             3             2
Search/ColumnContains_Nulls/16777216/manual_time                   -0.2898         -0.2896            25            17            25            17
Search/ColumnContains_Nulls/67108864/manual_time                   -0.0884         -0.0891            68            62            68            62
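
The Time and CPU columns in the compare.py output are relative changes, not absolute times. Below is a minimal sketch of that arithmetic, assuming the change is computed as (new - old) / |old|. The function names are hypothetical and are not part of Google Benchmark. The Old and New columns above are rounded to whole units, so recomputing from them only approximates the reported figure (40 to 32 gives -0.20, against the reported -0.1821).

```cpp
#include <cassert>
#include <cmath>

// Relative change between two timings; negative means the new run is faster.
// Assumes old_time is nonzero (hypothetical helper, not a Google Benchmark API).
double relative_change(double old_time, double new_time)
{
  assert(old_time != 0.0);
  return (new_time - old_time) / std::fabs(old_time);
}

// Converts a relative change into a speedup factor (old_time / new_time).
// A change of -0.5 means the new run takes half the time, i.e. a 2x speedup.
double speedup_from_change(double change)
{
  assert(change > -1.0);
  return 1.0 / (1.0 + change);
}
```

Under this reading, the -0.9039 entry for the 4096-row AllValid case corresponds to roughly a 10x speedup, and the -0.0884 entry for the largest Nulls case to about 1.1x.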

@harrism harrism requested a review from a team as a code owner April 28, 2021 04:23
@harrism harrism self-assigned this Apr 28, 2021
@github-actions github-actions bot added the libcudf label Apr 28, 2021
@harrism harrism added the improvement and non-breaking labels and removed the libcudf label Apr 28, 2021
@harrism harrism added the 3 - Ready for Review and libcudf labels Apr 28, 2021

@devavret devavret left a comment

Everything else is fine.


BENCHMARK_DEFINE_F(Search, ColumnContains_AllValid)(::benchmark::State& state)
{
BM_column(state, false);
Contributor

Did you mean BM_contains?

Member Author

Hah, perhaps I'm not benchmarking what I intended to!

Member Author

Fixed. Now that I'm benchmarking the right thing, performance is dramatically improved!

codecov bot commented Apr 28, 2021

Codecov Report

Merging #8091 (dc98d4c) into branch-0.20 (51336df) will decrease coverage by 0.01%.
The diff coverage is 86.24%.

@@               Coverage Diff               @@
##           branch-0.20    #8091      +/-   ##
===============================================
- Coverage        82.88%   82.87%   -0.02%     
===============================================
  Files              103      103              
  Lines            17668    17835     +167     
===============================================
+ Hits             14645    14781     +136     
- Misses            3023     3054      +31     
Impacted Files                                Coverage Δ
python/cudf/cudf/core/column/__init__.py      100.00% <ø> (ø)
python/cudf/cudf/io/orc.py                    86.89% <ø> (ø)
python/cudf/cudf/utils/cudautils.py           57.75% <25.00%> (ø)
python/cudf/cudf/utils/dtypes.py              81.87% <41.66%> (-1.57%) ⬇️
python/cudf/cudf/core/column/lists.py         86.98% <66.66%> (-0.43%) ⬇️
python/cudf/cudf/core/column/struct.py        94.73% <66.66%> (-1.56%) ⬇️
python/cudf/cudf/core/tools/datetimes.py      80.42% <75.29%> (-4.11%) ⬇️
python/cudf/cudf/core/groupby/groupby.py      91.55% <76.92%> (+0.11%) ⬆️
python/dask_cudf/dask_cudf/backends.py        89.51% <80.00%> (-0.08%) ⬇️
python/cudf/cudf/core/column/decimal.py       90.29% <83.33%> (-2.64%) ⬇️
... and 31 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 31af285...dc98d4c. Read the comment docs.

@harrism harrism requested a review from devavret April 28, 2021 23:13
@github-actions github-actions bot added the CMake label Apr 28, 2021

@hyperbolic2346 hyperbolic2346 left a comment

Happy to hear performance improved. Looks good to me.

harrism commented Apr 30, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit b368ebd into rapidsai:branch-0.20 Apr 30, 2021
Labels
3 - Ready for Review (Ready for review by team), CMake (CMake build issue), improvement (Improvement / enhancement to an existing function), libcudf (Affects libcudf (C++/CUDA) code), non-breaking (Non-breaking change)
4 participants