This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Improved performance of comparison with SIMD feature flag (2x-3.5x) #305

Merged
merged 3 commits into from
Aug 20, 2021

@jorgecarleitao jorgecarleitao commented Aug 20, 2021

This PR adds explicit SIMD implementations to comparison compute.

By operating over groups of 8 elements and applying the relevant conversions with the native packed_simd2 APIs, we gain 2-3.5x in performance when the simd feature is enabled. The non-SIMD implementation shows no change in performance; it was only rewritten in a more abstract fashion so that the SIMD version can be plugged in.
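To illustrate the "groups of 8" idea, here is a minimal sketch in plain Rust. It is not the code from this PR and does not use packed_simd2: the function name `eq_chunked` is hypothetical, and the inner loop over 8 lanes stands in for what a SIMD compare plus a mask-to-bits conversion would do in a single step. It assumes the Arrow convention of least-significant-bit-first bitmaps.

```rust
/// Hypothetical helper (not part of the PR): compares `lhs` and `rhs`
/// element-wise for equality and packs the results into bytes,
/// least-significant bit first.
fn eq_chunked(lhs: &[f32], rhs: &[f32]) -> Vec<u8> {
    assert_eq!(lhs.len(), rhs.len());
    let mut out = Vec::with_capacity((lhs.len() + 7) / 8);

    // Split both inputs into full groups of 8 plus a shorter remainder.
    let lhs_chunks = lhs.chunks_exact(8);
    let rhs_chunks = rhs.chunks_exact(8);
    let lhs_rem = lhs_chunks.remainder();
    let rhs_rem = rhs_chunks.remainder();

    // Each full group of 8 comparisons yields exactly one output byte.
    // A SIMD version would replace this inner loop with one vector compare.
    for (l, r) in lhs_chunks.zip(rhs_chunks) {
        let mut byte = 0u8;
        for i in 0..8 {
            byte |= ((l[i] == r[i]) as u8) << i;
        }
        out.push(byte);
    }

    // The remainder (fewer than 8 elements) fills the low bits of a last byte.
    if !lhs_rem.is_empty() {
        let mut byte = 0u8;
        for (i, (a, b)) in lhs_rem.iter().zip(rhs_rem).enumerate() {
            byte |= ((a == b) as u8) << i;
        }
        out.push(byte);
    }
    out
}
```

Writing the scalar path over fixed-size groups like this is what lets a SIMD path be swapped in for the per-group work without changing the surrounding loop.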

cargo bench --no-default-features --features benchmarks,compute --bench comparison_kernels
cargo bench --no-default-features --features benchmarks,compute,simd --bench comparison_kernels
eq Float32              time:   [20.319 us 20.394 us 20.492 us]                        
                        change: [-70.496% -70.317% -70.165%] (p = 0.00 < 0.05)

eq scalar Float32       time:   [16.452 us 16.512 us 16.585 us]                               
                        change: [-54.292% -53.941% -53.602%] (p = 0.00 < 0.05)

lt Float32              time:   [19.904 us 19.972 us 20.063 us]                        
                        change: [-67.563% -67.329% -67.084%] (p = 0.00 < 0.05)

lt scalar Float32       time:   [16.410 us 16.467 us 16.541 us]                               
                        change: [-54.324% -54.088% -53.814%] (p = 0.00 < 0.05)
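
For reference, the reported percentage changes can be converted to speedup factors. The sketch below assumes each "change" value is measured relative to the non-SIMD run of the first command; `speedup_from_change` is a hypothetical helper name.

```rust
/// Hypothetical helper: converts a relative change in running time,
/// in percent (negative means faster), into a speedup factor.
fn speedup_from_change(change_percent: f64) -> f64 {
    // new_time = old_time * (1 + change / 100), so speedup = old / new.
    1.0 / (1.0 + change_percent / 100.0)
}
```

Under that assumption, the midpoint estimates above work out to roughly 3.4x (eq), 2.2x (eq scalar), 3.1x (lt) and 2.2x (lt scalar), which matches the 2x-3.5x range in the title.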

@jorgecarleitao jorgecarleitao added the enhancement An improvement to an existing feature label Aug 20, 2021
@jorgecarleitao jorgecarleitao changed the title Added explicit SIMD implementation to comparison compute Added explicit SIMD implementation to comparison compute (2x-3.5x) Aug 20, 2021
@jorgecarleitao
Owner Author

There is an extra optimization available in using the more natural SIMD representation of the native types, but it requires a bit more work so that MutableBitmap can be safely constructed from u8-u64 chunks.
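
As a rough illustration of what building a bitmap from wider chunks could involve, here is a sketch with a hypothetical `SketchBitmap` type. It is not arrow2's MutableBitmap and makes no claim about how that type does or will work; it only shows the bookkeeping (masking unused bits, tracking the bit length) that makes such a constructor safe. It assumes least-significant-bit-first bit order and, to stay short, only supports appends at byte-aligned positions.

```rust
/// Hypothetical stand-in for a growable bitmap (not arrow2's MutableBitmap).
struct SketchBitmap {
    bytes: Vec<u8>,
    len: usize, // length in bits
}

impl SketchBitmap {
    fn new() -> Self {
        Self { bytes: Vec::new(), len: 0 }
    }

    /// Appends the lowest `bits` bits of `mask`, e.g. the result of
    /// comparing up to 64 lanes at once.
    fn extend_from_u64(&mut self, mask: u64, bits: usize) {
        assert!(bits <= 64);
        // Simplification for this sketch: only byte-aligned appends.
        assert_eq!(self.len % 8, 0, "sketch only supports byte-aligned appends");
        // Zero the bits above `bits` so stale data never leaks into the bitmap.
        let masked = if bits == 64 { mask } else { mask & ((1u64 << bits) - 1) };
        let n_bytes = (bits + 7) / 8;
        self.bytes.extend_from_slice(&masked.to_le_bytes()[..n_bytes]);
        self.len += bits;
    }

    /// Returns the bit at position `i`.
    fn get(&self, i: usize) -> bool {
        assert!(i < self.len);
        (self.bytes[i / 8] >> (i % 8)) & 1 == 1
    }
}
```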

@jorgecarleitao jorgecarleitao changed the title Added explicit SIMD implementation to comparison compute (2x-3.5x) Improved performance of comparison with SIMD feature flag (2x-3.5x) Aug 20, 2021

codecov bot commented Aug 20, 2021

Codecov Report

Merging #305 (afaff93) into main (bbe607c) will increase coverage by 0.01%.
The diff coverage is 94.64%.

@@            Coverage Diff             @@
##             main     #305      +/-   ##
==========================================
+ Coverage   80.52%   80.53%   +0.01%     
==========================================
  Files         323      324       +1     
  Lines       21135    21151      +16     
==========================================
+ Hits        17019    17034      +15     
- Misses       4116     4117       +1     
| Impacted Files | Coverage Δ |
|---|---|
| src/compute/comparison/mod.rs | 93.61% <ø> (ø) |
| src/compute/nullif.rs | 0.00% <0.00%> (ø) |
| src/compute/comparison/simd/mod.rs | 95.83% <95.83%> (ø) |
| src/compute/comparison/primitive.rs | 95.09% <100.00%> (-0.23%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data