Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvement in cudf::strings::join_strings for long strings #13283

Merged
merged 15 commits into from
May 15, 2023

Conversation

davidwendt
Copy link
Contributor

@davidwendt davidwendt commented May 3, 2023

Description

Improves performance for longer strings with cudf::strings::join_strings() API.

The new implementation is inspired by recent changes in json/writer_impl.cu which uses the make_strings_children factor that accepts an iterator of string pointers to gather the results into a single column of data but does lose some performance for smallish-strings (less than 32 bytes). So the original code is refactored to work on smaller strings.
Also, a benchmark implementation is added in this PR to test against various sized input columns.
For longer strings, the performance improvement is up to 100x for very long strings (>512 bytes).

Reference #13048

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt davidwendt added 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. strings strings issues (C++ and Python) improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels May 3, 2023
@davidwendt davidwendt self-assigned this May 3, 2023
@github-actions github-actions bot added the CMake CMake build issue label May 3, 2023
@davidwendt
Copy link
Contributor Author

Some performance results from nvbench

|  row_width  |  num_rows  |   Ref Time |   Cmp Time |           Diff |   %Diff |
|-------------|------------|------------|------------|----------------|---------|
|     32      |    4096    |  69.168 us |  73.764 us |       4.596 us |   6.64% |
|     64      |    4096    |  75.721 us |  80.595 us |       4.874 us |   6.44% |
|     128     |    4096    | 100.534 us |  97.495 us |      -3.040 us |  -3.02% |
|     256     |    4096    | 168.141 us | 134.340 us |     -33.801 us | -20.10% |
|     512     |    4096    | 305.502 us |  73.091 us |    -232.411 us | -76.08% |
|    1024     |    4096    | 604.477 us |  76.163 us |    -528.314 us | -87.40% |
|     32      |   32768    |  72.283 us |  78.273 us |       5.989 us |   8.29% |
|     64      |   32768    |  94.454 us |  99.622 us |       5.169 us |   5.47% |
|     128     |   32768    | 144.520 us | 158.625 us |      14.105 us |   9.76% |
|     256     |   32768    | 263.790 us | 276.760 us |      12.970 us |   4.92% |
|     512     |   32768    | 479.498 us | 111.470 us |    -368.028 us | -76.75% |
|    1024     |   32768    | 918.149 us | 131.217 us |    -786.932 us | -85.71% |
|     32      |   262144   | 136.116 us | 143.007 us |       6.891 us |   5.06% |
|     64      |   262144   | 291.781 us | 299.467 us |       7.686 us |   2.63% |
|     128     |   262144   |   1.312 ms | 658.147 us |    -654.296 us | -49.85% |
|     256     |   262144   |   8.639 ms |   4.060 ms |   -4578.518 us | -53.00% |
|     512     |   262144   |  24.817 ms | 380.971 us |  -24435.926 us | -98.46% |
|    1024     |   262144   |  56.648 ms | 554.740 us |  -56093.582 us | -99.02% |
|     32      |  2097152   | 647.612 us | 676.242 us |      28.630 us |   4.42% |
|     64      |  2097152   |   1.810 ms |   1.843 ms |      32.039 us |   1.77% |
|     128     |  2097152   |  12.738 ms |   4.410 ms |   -8327.769 us | -65.38% |
|     256     |  2097152   |  75.984 ms |  26.904 ms |  -49080.225 us | -64.59% |
|     512     |  2097152   | 216.893 ms |   2.296 ms | -214597.167 us | -98.94% |
|     32      |  16777216  |   4.691 ms |   4.867 ms |     175.819 us |   3.75% |
|     64      |  16777216  |  13.816 ms |  13.999 ms |     183.007 us |   1.32% |

Smaller strings seem slightly slower but appears to be mostly noise since the code there did not change much. Also, all the differences are in microseconds.

@davidwendt davidwendt added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels May 10, 2023
@davidwendt davidwendt marked this pull request as ready for review May 10, 2023 21:19
@davidwendt davidwendt requested a review from a team as a code owner May 10, 2023 21:19
Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some non-blocking nits otherwise LGTM

cpp/benchmarks/string/join_strings.cpp Outdated Show resolved Hide resolved
cpp/benchmarks/string/join_strings.cpp Outdated Show resolved Hide resolved
cpp/src/strings/combine/join.cu Outdated Show resolved Hide resolved
@davidwendt
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 4c456cb into rapidsai:branch-23.06 May 15, 2023
54 checks passed
@davidwendt davidwendt deleted the join-strings-perf branch May 15, 2023 23:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team CMake CMake build issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change strings strings issues (C++ and Python)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants