Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bitset: Improve performance of count function #309

Merged
merged 1 commit into from
Jun 10, 2022
Merged

Conversation

stotko
Copy link
Owner

@stotko stotko commented Jun 10, 2022

The count() function of the bitset container used two distinct reductions for computing the number of set bits. The first one considers all bits in the first n - 1 block whereas the second one works on each of the remaining bits in the n-th block, i.e. the last block. Since the reduction is memory-bound anyways, the second reduction introduces a large overhead while the first one still has room for additional compute operations. Merge both reductions into a single one that computes a bit mask for each block on the fly before counting the bits per block. This leads to a decent performance improvement for count() and all related functions.

Before:

-------------------------------------------------------------------------
Benchmark                               Time             CPU   Iterations
-------------------------------------------------------------------------
stdgpu_bitset_count/1000            0.050 ms        0.051 ms        13822
stdgpu_bitset_count/1000000         0.056 ms        0.056 ms        12476
stdgpu_bitset_count/1000000000      0.733 ms        0.733 ms          956
stdgpu_bitset_all/1000              0.050 ms        0.050 ms        13879
stdgpu_bitset_all/1000000           0.056 ms        0.056 ms        12538
stdgpu_bitset_all/1000000000        0.736 ms        0.734 ms          954
stdgpu_bitset_any/1000              0.050 ms        0.050 ms        13841
stdgpu_bitset_any/1000000           0.056 ms        0.056 ms        12165
stdgpu_bitset_any/1000000000        0.734 ms        0.734 ms          954
stdgpu_bitset_none/1000             0.050 ms        0.051 ms        13811
stdgpu_bitset_none/1000000          0.056 ms        0.056 ms        12428
stdgpu_bitset_none/1000000000       0.735 ms        0.734 ms          954

After:

-------------------------------------------------------------------------
Benchmark                               Time             CPU   Iterations
-------------------------------------------------------------------------
stdgpu_bitset_count/1000            0.026 ms        0.026 ms        26891
stdgpu_bitset_count/1000000         0.031 ms        0.031 ms        22745
stdgpu_bitset_count/1000000000      0.580 ms        0.580 ms         1208
stdgpu_bitset_all/1000              0.026 ms        0.026 ms        26856
stdgpu_bitset_all/1000000           0.031 ms        0.031 ms        22795
stdgpu_bitset_all/1000000000        0.580 ms        0.580 ms         1207
stdgpu_bitset_any/1000              0.026 ms        0.026 ms        26940
stdgpu_bitset_any/1000000           0.031 ms        0.031 ms        22814
stdgpu_bitset_any/1000000000        0.580 ms        0.580 ms         1207
stdgpu_bitset_none/1000             0.026 ms        0.026 ms        26945
stdgpu_bitset_none/1000000          0.032 ms        0.032 ms        22725
stdgpu_bitset_none/1000000000       0.581 ms        0.580 ms         1185

@stotko stotko added this to the 2.0.0 milestone Jun 10, 2022
@codecov
Copy link

codecov bot commented Jun 10, 2022

Codecov Report

Merging #309 (b2f655f) into master (2e13b2e) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #309      +/-   ##
==========================================
- Coverage   97.47%   97.47%   -0.01%     
==========================================
  Files          32       32              
  Lines        2300     2294       -6     
==========================================
- Hits         2242     2236       -6     
  Misses         58       58              
Impacted Files Coverage Δ
src/stdgpu/impl/bitset_detail.cuh 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2e13b2e...b2f655f. Read the comment docs.

@stotko stotko merged commit 4994ccf into master Jun 10, 2022
@stotko stotko deleted the bitset_count branch June 10, 2022 13:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant