Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use warp per string for long strings in cudf::strings::contains() #10739

Merged
merged 5 commits into from
May 2, 2022

Conversation

davidwendt
Copy link
Contributor

Improves the performance on cudf::strings::contains() for long strings. This executes a warp per string to match a target over sections of a single string in parallel. The benchmark showed this to be faster than the current implementation only for longer strings (greater than 64 bytes). It also proved somewhat faster and more consistent than a pure character-parallel approach.

This change may also help improve the performance of the regex contains_re() function in the future.

@davidwendt davidwendt added 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. Performance Performance related issue strings strings issues (C++ and Python) improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 26, 2022
@davidwendt davidwendt self-assigned this Apr 26, 2022
@davidwendt davidwendt added this to PR-WIP in v22.06 Release via automation Apr 26, 2022
@codecov
Copy link

codecov bot commented Apr 27, 2022

Codecov Report

Merging #10739 (1a63aad) into branch-22.06 (dc1435b) will increase coverage by 0.02%.
The diff coverage is n/a.

❗ Current head 1a63aad differs from pull request most recent head 669eea1. Consider uploading reports for the commit 669eea1 to get more accurate results

@@               Coverage Diff                @@
##           branch-22.06   #10739      +/-   ##
================================================
+ Coverage         86.37%   86.39%   +0.02%     
================================================
  Files               142      142              
  Lines             22306    22306              
================================================
+ Hits              19266    19272       +6     
+ Misses             3040     3034       -6     
Impacted Files Coverage Δ
python/cudf/cudf/core/column/string.py 89.21% <0.00%> (+0.12%) ⬆️
python/cudf/cudf/core/groupby/groupby.py 91.68% <0.00%> (+0.23%) ⬆️
python/cudf/cudf/core/tools/datetimes.py 84.49% <0.00%> (+0.30%) ⬆️
python/cudf/cudf/core/column/lists.py 92.91% <0.00%> (+1.25%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dc1435b...669eea1. Read the comment docs.

@davidwendt davidwendt added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Apr 27, 2022
@davidwendt davidwendt moved this from PR-WIP to PR-Needs review in v22.06 Release Apr 27, 2022
@davidwendt davidwendt marked this pull request as ready for review April 27, 2022 15:28
@davidwendt davidwendt requested a review from a team as a code owner April 27, 2022 15:28
@davidwendt
Copy link
Contributor Author

Improvements to benchmarks ran locally on my dev machine:

rows/width        String    Warp     Speedup
4096/256         0.116 ms  0.050 ms   2.32x
4096/512         0.188 ms  0.056 ms   3.36x
4096/1024        0.348 ms  0.069 ms   5.04x
4096/2048        0.692 ms  0.095 ms   7.28x
4096/4096         1.25 ms  0.146 ms   8.56x
4096/8192         3.13 ms  0.238 ms  13.15x
32768/256        0.121 ms  0.071 ms   1.70x
32768/512        0.203 ms  0.098 ms   2.07x
32768/1024       0.470 ms  0.144 ms   3.26x
32768/2048        1.06 ms  0.240 ms   4.42x
32768/4096        2.00 ms  0.457 ms   4.38x
32768/8192        3.89 ms   1.09 ms   3.57x
262144/256        1.13 ms  0.238 ms   4.75x
262144/512        3.47 ms  0.402 ms   8.63x
262144/1024       8.49 ms  0.727 ms  11.68x
262144/2048       27.1 ms   1.39 ms  19.50x
262144/4096       57.3 ms   3.00 ms  19.10x
2097152/256       4.21 ms   1.61 ms   2.61x
2097152/512       16.1 ms   2.92 ms   5.51x

The rows is the size of the strings column and width is the average number of bytes per string. The String column is the benchmark time for contains() with the current implementation (thread per string). The Warp column is the implementation in this PR with a warp per string. The benchmark times not included here are where the values did not change -- the smaller row/width combinations.

Copy link
Contributor

@hyperbolic2346 hyperbolic2346 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice wins!

v22.06 Release automation moved this from PR-Needs review to PR-Reviewer approved Apr 28, 2022
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work. The speedups look good and the implementation/tests are clean.

@davidwendt
Copy link
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 6128e0d into rapidsai:branch-22.06 May 2, 2022
v22.06 Release automation moved this from PR-Reviewer approved to Done May 2, 2022
@davidwendt davidwendt deleted the warp-strings-contains branch May 2, 2022 12:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Performance Performance related issue strings strings issues (C++ and Python)
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

3 participants