Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Faster take with null values (2-3x) #633

Merged
merged 3 commits into from
Nov 25, 2021
Merged

Faster take with null values (2-3x) #633

merged 3 commits into from
Nov 25, 2021

Conversation

jorgecarleitao
Copy link
Owner

@jorgecarleitao jorgecarleitao commented Nov 25, 2021

This was achieved by splitting the "take" of values and validity as two separate loops. It seems to vectorize better.

# native = skylake-avx512
RUSTFLAGS="-C target-cpu=native" cargo bench --features compute,benchmarks \
    --bench take_kernels -- "take .* values nulls 2\^20"
take i32 values nulls 2^20                                                                            
                        time:   [6.2657 ms 6.2879 ms 6.3151 ms]
                        change: [-57.959% -57.507% -57.123%] (p = 0.00 < 0.05)
take bool values nulls 2^20                                                                             
                        time:   [4.4977 ms 4.5077 ms 4.5197 ms]
                        change: [-69.940% -69.411% -68.964%] (p = 0.00 < 0.05)
take str values nulls 2^20                                                                            
                        time:   [25.706 ms 25.878 ms 26.063 ms]
                        change: [-20.546% -19.827% -19.031%] (p = 0.00 < 0.05)

This was a side observation of DataEngineeringLabs/simd-benches#1 which showed that

pub fn naive_take(values: &[f32], indices: &[usize]) -> Vec<f32> {
    indices.iter().map(|i| values[*i]).collect()
}

is compiled to the same instructions as using portable_simd's gather, suggesting that the compiler is quite smart in compiling naive_take and we can thus leverage that.

@jorgecarleitao jorgecarleitao added the enhancement An improvement to an existing feature label Nov 25, 2021
@ritchie46
Copy link
Collaborator

Intersting. I also had a similar result with splitting the bound check in two loops.

Maybe that makes sense here as well. Als it is now checked in the validity and in the values.

@codecov
Copy link

codecov bot commented Nov 25, 2021

Codecov Report

Merging #633 (865ad04) into main (e9a6c3e) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #633      +/-   ##
==========================================
- Coverage   69.92%   69.91%   -0.02%     
==========================================
  Files         300      300              
  Lines       16634    16627       -7     
==========================================
- Hits        11632    11625       -7     
  Misses       5002     5002              
Impacted Files Coverage Δ
src/compute/take/boolean.rs 90.00% <100.00%> (-0.39%) ⬇️
src/compute/take/generic_binary.rs 97.70% <100.00%> (-0.06%) ⬇️
src/compute/take/primitive.rs 89.58% <100.00%> (-0.62%) ⬇️
src/compute/arithmetics/time.rs 42.30% <0.00%> (-1.93%) ⬇️
src/bitmap/utils/slice_iterator.rs 92.53% <0.00%> (+1.49%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e9a6c3e...865ad04. Read the comment docs.

@jorgecarleitao jorgecarleitao changed the title Faster take with null values (2x) Faster take with null values (2-3x) Nov 25, 2021
@jorgecarleitao jorgecarleitao changed the title Faster take with null values (2-3x) Faster take with null values (2-3x) Nov 25, 2021
@Dandandan
Copy link
Collaborator

Cool results 😎

@jorgecarleitao jorgecarleitao merged commit c3222aa into main Nov 25, 2021
@jorgecarleitao jorgecarleitao deleted the take_faster branch November 25, 2021 16:32
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement An improvement to an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants