Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Simplified min_max_string and min_max_binary #1004

Merged

Conversation

HaoYang670
Copy link
Contributor

@HaoYang670 HaoYang670 commented May 22, 2022

Signed-off-by: remzi 13716567376yh@gmail.com

Changes in the PR

  1. Replace fold with reduce
  2. use macro to abstract the common expression
  3. Add test to cover the no null cases.
  4. Add benchmark for min utf8
  5. Less code is better
  6. Functional style

No obvious performance impact

min 2^10 utf8           time:   [33.766 ns 33.786 ns 33.810 ns]                           
                        change: [+0.0167% +0.2385% +0.4298%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

min null 2^10 utf8      time:   [33.352 ns 33.370 ns 33.391 ns]                                
                        change: [-4.7561% -4.6289% -4.5317%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

Signed-off-by: remzi <13716567376yh@gmail.com>
@codecov
Copy link

codecov bot commented May 22, 2022

Codecov Report

Merging #1004 (c2332f1) into main (a10db3a) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #1004      +/-   ##
==========================================
+ Coverage   71.42%   71.45%   +0.02%     
==========================================
  Files         356      356              
  Lines       19784    19752      -32     
==========================================
- Hits        14131    14113      -18     
+ Misses       5653     5639      -14     
Impacted Files Coverage Δ
src/compute/aggregate/min_max.rs 74.24% <100.00%> (+3.51%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a10db3a...c2332f1. Read the comment docs.

Signed-off-by: remzi <13716567376yh@gmail.com>
Signed-off-by: remzi <13716567376yh@gmail.com>
@HaoYang670 HaoYang670 marked this pull request as draft May 22, 2022 22:56
Signed-off-by: remzi <13716567376yh@gmail.com>
Signed-off-by: remzi <13716567376yh@gmail.com>
@HaoYang670 HaoYang670 changed the title Rewrite min_max_string Rewrite min_max_string and min_max_binary May 23, 2022
@@ -42,6 +42,18 @@ fn add_benchmark(c: &mut Criterion) {
c.bench_function(&format!("min null 2^{} f32", log2_size), |b| {
b.iter(|| bench_min(&arr_a))
});

let arr_a = create_string_array::<i32>(1, size, 0.0, 0);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I set the length of each string to be 1 to make the time of comparing each element to be determined.

@HaoYang670 HaoYang670 marked this pull request as ready for review May 23, 2022 01:37
@jorgecarleitao jorgecarleitao added the enhancement An improvement to an existing feature label May 23, 2022
@jorgecarleitao jorgecarleitao merged commit a810b03 into jorgecarleitao:main May 23, 2022
@jorgecarleitao jorgecarleitao changed the title Rewrite min_max_string and min_max_binary Simplified min_max_string and min_max_binary May 23, 2022
@jorgecarleitao
Copy link
Owner

Thanks a lot for this PR.

I went through this and also ran the benches on main vs this PR with different string lengths and array lengths and the performance is equivalent on my machine.

Given that this is less code and more tested, we ship it :)

@HaoYang670 HaoYang670 deleted the refactor_min_max_binary branch May 23, 2022 05:18
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement An improvement to an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants