This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Improved performance of sum aggregation via aligned loads (-10%) #445

Merged: 5 commits merged on Sep 29, 2021.


ritchie46 (Collaborator)

This adds aligned load instructions to the SIMD sum aggregation of arrays without null values. The implementation works with any alignment: it does not require an aligned allocator, and it also works on sliced arrays.
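A minimal sketch of the general idea, assuming plain stable Rust rather than the crate's own SIMD types: `slice::align_to` splits any `&[f32]`, including a slice taken at an arbitrary offset, into an unaligned head, a middle of 64-byte-aligned blocks, and an unaligned tail. Only the middle is accumulated lane-wise; head and tail fall back to scalar loops. `AlignedLanes`, `sum_aligned`, and the 16-lane width are hypothetical names and choices for illustration, not the code in this PR.

```rust
/// Hypothetical 64-byte-aligned block of 16 `f32` lanes (16 * 4 bytes = 64,
/// so the struct has no padding).
#[repr(C, align(64))]
struct AlignedLanes([f32; 16]);

/// Sums `values` by splitting them into an unaligned head, an aligned middle
/// of `AlignedLanes` blocks, and an unaligned tail.
fn sum_aligned(values: &[f32]) -> f32 {
    // SAFETY: `AlignedLanes` is `repr(C)` over `[f32; 16]` with no padding,
    // so any 16 consecutive, suitably aligned `f32`s form a valid value.
    let (head, middle, tail) = unsafe { values.align_to::<AlignedLanes>() };

    // Unaligned prefix: scalar loop.
    let head_sum: f32 = head.iter().sum();

    // Aligned middle: lane-wise accumulation over aligned blocks, which gives
    // the compiler the alignment information needed for aligned vector loads.
    let mut acc = [0.0f32; 16];
    for block in middle {
        for (a, v) in acc.iter_mut().zip(block.0.iter()) {
            *a += *v;
        }
    }
    let middle_sum: f32 = acc.iter().sum();

    // Unaligned suffix: scalar loop.
    let tail_sum: f32 = tail.iter().sum();

    head_sum + middle_sum + tail_sum
}
```

For example, `sum_aligned(&data[3..])` works on a slice that starts mid-allocation. Because `align_to` is allowed to return everything in the head or tail, correctness never depends on the allocator's alignment; only performance does. The lane-wise order also changes floating-point rounding relative to a sequential sum, and whether aligned vector instructions are actually emitted depends on the target and optimisation level.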

The benchmark results are mixed for small data sizes, but they seem to improve with more data; I want to run it on larger inputs later.

Gnuplot not found, using plotters backend
sum 2^10 f32            time:   [101.38 ns 101.41 ns 101.45 ns]                         
                        change: [+1.5061% +1.5401% +1.5774%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe

sum 2^12 f32            time:   [284.72 ns 285.05 ns 285.44 ns]                         
                        change: [+0.4929% +0.6963% +0.9300%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

sum 2^14 f32            time:   [1.0246 us 1.0246 us 1.0247 us]                          
                        change: [-4.1159% -3.6418% -3.1975%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

sum 2^16 f32            time:   [4.0764 us 4.0773 us 4.0783 us]                          
                        change: [+1.4638% +1.6504% +1.7873%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe

sum 2^18 f32            time:   [19.596 us 19.600 us 19.606 us]                          
                        change: [+0.2075% +0.2434% +0.2801%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe

sum 2^20 f32            time:   [67.890 us 67.909 us 67.931 us]                         
                        change: [-6.5180% -6.4803% -6.4365%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  11 (11.00%) high mild
  1 (1.00%) high severe


ritchie46 force-pushed the aligned_load branch 3 times, most recently from a236c47 to de512f2, on September 24, 2021 07:54
codecov bot commented Sep 24, 2021

Codecov Report

Merging #445 (5f85f94) into main (194a95d) will increase coverage by 0.13%.
The diff coverage is 75.00%.


@@            Coverage Diff             @@
##             main     #445      +/-   ##
==========================================
+ Coverage   79.89%   80.02%   +0.13%     
==========================================
  Files         371      371              
  Lines       22776    22841      +65     
==========================================
+ Hits        18197    18279      +82     
+ Misses       4579     4562      -17     

Impacted Files                      Coverage Δ
src/types/simd/packed.rs            0.00% <ø> (ø)
src/types/simd/native.rs            90.90% <66.66%> (-3.83%) ⬇️
src/compute/aggregate/sum.rs        66.07% <80.00%> (-0.60%) ⬇️
src/bitmap/utils/slice_iterator.rs  92.53% <0.00%> (-1.50%) ⬇️
src/io/avro/read/schema.rs          41.37% <0.00%> (-1.15%) ⬇️
src/ffi/mod.rs                      100.00% <0.00%> (ø)
src/array/ffi.rs                    68.42% <0.00%> (ø)
src/ffi/array.rs                    68.75% <0.00%> (ø)
tests/it/bitmap/mod.rs              100.00% <0.00%> (ø)
tests/it/bitmap/utils/mod.rs        100.00% <0.00%> (ø)
... and 13 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 194a95d...5f85f94.

jorgecarleitao (Owner) left a comment

Interesting. Left some ideas.

src/compute/aggregate/sum.rs (outdated, resolved)
src/types/simd/packed.rs (outdated, resolved)
src/compute/aggregate/sum.rs (outdated, resolved)
jorgecarleitao added the enhancement label (An improvement to an existing feature) on Sep 26, 2021

ritchie46 (Collaborator, Author) commented Sep 26, 2021

An update on the benchmarks:

Gnuplot not found, using plotters backend
sum 2^10 f64            time:   [151.72 ns 152.40 ns 153.21 ns]                         
                        change: [-3.5650% -3.1521% -2.7679%] (p = 0.00 < 0.05)
                        Performance has improved.

sum 2^10 i64            time:   [113.88 ns 113.91 ns 113.94 ns]                         
                        change: [+4.6575% +4.7503% +4.8275%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

sum 2^12 f64            time:   [526.64 ns 527.46 ns 528.31 ns]                          
                        change: [-2.1437% -1.9416% -1.7502%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

sum 2^12 i64            time:   [348.69 ns 348.94 ns 349.20 ns]                         
                        change: [-22.641% -21.575% -20.855%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

sum 2^14 f64            time:   [1.9927 us 1.9929 us 1.9930 us]                          
                        change: [-3.1503% -2.7050% -2.3456%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

sum 2^14 i64            time:   [1.7309 us 1.7317 us 1.7325 us]                          
                        change: [-8.9944% -8.9284% -8.8592%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

sum 2^16 f64            time:   [8.3187 us 8.3202 us 8.3222 us]                          
                        change: [-5.6838% -5.3678% -5.0664%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  6 (6.00%) low severe
  3 (3.00%) high mild
  6 (6.00%) high severe

sum 2^16 i64            time:   [9.2400 us 9.2499 us 9.2608 us]                          
                        change: [-7.9995% -7.8217% -7.6153%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  5 (5.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  6 (6.00%) high severe

sum 2^18 f64            time:   [35.474 us 35.487 us 35.504 us]                          
                        change: [-9.2037% -9.1425% -9.0746%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe

sum 2^18 i64            time:   [35.166 us 35.174 us 35.182 us]                          
                        change: [-10.221% -10.140% -10.061%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  7 (7.00%) high mild
  1 (1.00%) high severe

sum 2^20 f64            time:   [188.31 us 188.55 us 188.80 us]                         
                        change: [-2.2460% -1.7823% -1.2743%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  3 (3.00%) high severe

sum 2^20 i64            time:   [207.85 us 209.90 us 212.52 us]                         
                        change: [-8.0797% -6.7231% -5.2968%] (p = 0.00 < 0.05)
                        Performance has improved.

This looks like a fairly consistent ~5-10% improvement, with sum 2^10 i64 as the exception. I still have to find a common trait abstraction over the packed_simd and native implementations.
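One possible shape for such a shared trait, sketched under assumptions: the trait `LaneSum`, its methods, and the `NativeF64x8` backend are hypothetical, not arrow2's actual `types::simd` API. Each backend names a fixed-width lane type, so the sum kernel is written once against the trait; a packed_simd vector type could implement the same trait. Only a plain-array "native" backend is shown so the example compiles on stable Rust.

```rust
use std::ops::Add;

/// Hypothetical trait unifying SIMD backends for summation.
trait LaneSum: Copy {
    /// Scalar element type, e.g. `f64`.
    type Native: Copy + Add<Output = Self::Native>;
    /// Number of lanes per vector.
    const LANES: usize;
    /// Vector with all lanes set to zero.
    fn zero() -> Self;
    /// Builds a vector from exactly `LANES` values.
    fn from_chunk(chunk: &[Self::Native]) -> Self;
    /// Lane-wise addition.
    fn lane_add(self, other: Self) -> Self;
    /// Horizontal sum of all lanes.
    fn reduce(self) -> Self::Native;
}

/// "Native" backend: a plain array the compiler may auto-vectorise.
#[derive(Clone, Copy)]
struct NativeF64x8([f64; 8]);

impl LaneSum for NativeF64x8 {
    type Native = f64;
    const LANES: usize = 8;

    fn zero() -> Self {
        NativeF64x8([0.0; 8])
    }

    fn from_chunk(chunk: &[f64]) -> Self {
        let mut lanes = [0.0; 8];
        lanes.copy_from_slice(chunk); // panics unless chunk.len() == 8
        NativeF64x8(lanes)
    }

    fn lane_add(self, other: Self) -> Self {
        let mut out = self.0;
        for (o, v) in out.iter_mut().zip(other.0.iter()) {
            *o += *v;
        }
        NativeF64x8(out)
    }

    fn reduce(self) -> f64 {
        self.0.iter().sum()
    }
}

/// Backend-agnostic sum kernel: full chunks go through the lane type,
/// the remainder is folded with scalar additions.
fn sum_with<B: LaneSum>(values: &[B::Native]) -> B::Native {
    let chunks = values.chunks_exact(B::LANES);
    let remainder = chunks.remainder();
    let acc = chunks.fold(B::zero(), |acc, c| acc.lane_add(B::from_chunk(c)));
    remainder.iter().fold(acc.reduce(), |s, &v| s + v)
}
```

Calling `sum_with::<NativeF64x8>(&values)` selects the backend at compile time; swapping in another implementor of the trait would not require touching the kernel.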

ritchie46 changed the title from "WIP: Aligned load Sum aggregation" to "Aligned load Sum aggregation" on Sep 27, 2021
jorgecarleitao changed the title from "Aligned load Sum aggregation" to "Improved performance of sum aggregation via aligned loads" on Sep 28, 2021

jorgecarleitao (Owner) left a comment

Looks good. Left a small comment that imo further simplifies the code.

src/compute/aggregate/sum.rs (outdated, resolved)
jorgecarleitao changed the title from "Improved performance of sum aggregation via aligned loads" to "Improved performance of sum aggregation via aligned loads (-10%)" on Sep 28, 2021
src/compute/aggregate/sum.rs (outdated, resolved)
src/compute/aggregate/sum.rs (outdated, resolved)
jorgecarleitao merged commit 94fd267 into jorgecarleitao:main on Sep 29, 2021