Faster `take` with null values (2-3x) #633

jorgecarleitao · 2021-11-25T06:16:22Z

This was achieved by splitting the "take" of values and validity as two separate loops. It seems to vectorize better.

# native = skylake-avx512
RUSTFLAGS="-C target-cpu=native" cargo bench --features compute,benchmarks \
    --bench take_kernels -- "take .* values nulls 2\^20"

take i32 values nulls 2^20                                                                            
                        time:   [6.2657 ms 6.2879 ms 6.3151 ms]
                        change: [-57.959% -57.507% -57.123%] (p = 0.00 < 0.05)
take bool values nulls 2^20                                                                             
                        time:   [4.4977 ms 4.5077 ms 4.5197 ms]
                        change: [-69.940% -69.411% -68.964%] (p = 0.00 < 0.05)
take str values nulls 2^20                                                                            
                        time:   [25.706 ms 25.878 ms 26.063 ms]
                        change: [-20.546% -19.827% -19.031%] (p = 0.00 < 0.05)

This was a side observation of DataEngineeringLabs/simd-benches#1 which showed that

pub fn naive_take(values: &[f32], indices: &[usize]) -> Vec<f32> {
    indices.iter().map(|i| values[*i]).collect()
}

is compiled to the same instructions as using portable_simd's gather, suggesting that the compiler is quite smart in compiling naive_take and we can thus leverage that.

ritchie46 · 2021-11-25T06:24:30Z

Intersting. I also had a similar result with splitting the bound check in two loops.

Maybe that makes sense here as well. Als it is now checked in the validity and in the values.

codecov · 2021-11-25T06:26:17Z

Codecov Report

Merging #633 (865ad04) into main (e9a6c3e) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #633      +/-   ##
==========================================
- Coverage   69.92%   69.91%   -0.02%     
==========================================
  Files         300      300              
  Lines       16634    16627       -7     
==========================================
- Hits        11632    11625       -7     
  Misses       5002     5002

Impacted Files	Coverage Δ
src/compute/take/boolean.rs	`90.00% <100.00%> (-0.39%)`	⬇️
src/compute/take/generic_binary.rs	`97.70% <100.00%> (-0.06%)`	⬇️
src/compute/take/primitive.rs	`89.58% <100.00%> (-0.62%)`	⬇️
src/compute/arithmetics/time.rs	`42.30% <0.00%> (-1.93%)`	⬇️
src/bitmap/utils/slice_iterator.rs	`92.53% <0.00%> (+1.49%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e9a6c3e...865ad04. Read the comment docs.

Dandandan · 2021-11-25T07:34:19Z

Cool results 😎

jorgecarleitao added the enhancement An improvement to an existing feature label Nov 25, 2021

jorgecarleitao force-pushed the take_faster branch from ae0f4c3 to 9d5a1c9 Compare November 25, 2021 06:16

jorgecarleitao added 3 commits November 25, 2021 06:36

Added more benches for take.

87f423b

Faster take with null values.

a0c16e3

Migrated other kernels.

865ad04

jorgecarleitao force-pushed the take_faster branch from 9d5a1c9 to 865ad04 Compare November 25, 2021 06:48

jorgecarleitao changed the title ~~Faster take with null values (2x)~~ Faster take with null values (2-3x) Nov 25, 2021

jorgecarleitao changed the title ~~Faster take with null values (2-3x)~~ Faster take with null values (2-3x) Nov 25, 2021

jorgecarleitao merged commit c3222aa into main Nov 25, 2021

jorgecarleitao deleted the take_faster branch November 25, 2021 16:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Faster `take` with null values (2-3x) #633

Faster `take` with null values (2-3x) #633

jorgecarleitao commented Nov 25, 2021 •

edited

Loading

ritchie46 commented Nov 25, 2021

codecov bot commented Nov 25, 2021 •

edited

Loading

Dandandan commented Nov 25, 2021

Faster take with null values (2-3x) #633

Faster take with null values (2-3x) #633

Conversation

jorgecarleitao commented Nov 25, 2021 • edited Loading

ritchie46 commented Nov 25, 2021

codecov bot commented Nov 25, 2021 • edited Loading

Codecov Report

Dandandan commented Nov 25, 2021

Faster `take` with null values (2-3x) #633

Faster `take` with null values (2-3x) #633

jorgecarleitao commented Nov 25, 2021 •

edited

Loading

codecov bot commented Nov 25, 2021 •

edited

Loading