Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Added support for limited sort #218

Merged
merged 5 commits into from Jul 24, 2021
Merged

Added support for limited sort #218

merged 5 commits into from Jul 24, 2021

Conversation

jorgecarleitao
Copy link
Owner

@jorgecarleitao jorgecarleitao commented Jul 23, 2021

Closes #215

This:

  • adds support to sort with a limit.
  • adds bench to sort utf8 with nulls
  • improves performance of (unlimited) sort of utf8 for small arrays by 10-15%
  • makes all sorts unstable (most were already)

This change is backward incompatible: all public functions in compute::sort now expect an Option<usize> denoting the limit of the sort. The migration is to add a new parameter None to them.

Bench results (utf8 with 10% of nulls)

sort utf8 null 2^10     time:   [130.62 us 130.75 us 130.86 us]                                
                        change: [-14.887% -14.549% -14.269%] (p = 0.00 < 0.05)
sort utf8 null 2^12     time:   [651.78 us 652.41 us 653.03 us]                                
                        change: [-12.832% -12.593% -12.372%] (p = 0.00 < 0.05)
                        Performance has improved.
sort utf8 null 2^14     time:   [3.0201 ms 3.0244 ms 3.0286 ms]                                 
                        change: [-11.194% -11.032% -10.871%] (p = 0.00 < 0.05)

@codecov
Copy link

codecov bot commented Jul 23, 2021

Codecov Report

Merging #218 (25e5ce0) into main (b730ba7) will increase coverage by 0.04%.
The diff coverage is 77.14%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #218      +/-   ##
==========================================
+ Coverage   76.82%   76.86%   +0.04%     
==========================================
  Files         226      229       +3     
  Lines       19417    19520     +103     
==========================================
+ Hits        14917    15005      +88     
- Misses       4500     4515      +15     
Impacted Files Coverage Δ
src/compute/sort/mod.rs 53.38% <36.17%> (-9.30%) ⬇️
src/compute/sort/common.rs 92.40% <92.40%> (ø)
src/compute/sort/primitive/sort.rs 94.80% <92.59%> (-5.20%) ⬇️
src/compute/sort/lex_sort.rs 82.14% <93.33%> (+1.56%) ⬆️
src/compute/sort/boolean.rs 94.44% <94.44%> (ø)
src/buffer/mutable.rs 91.76% <100.00%> (+1.23%) ⬆️
src/compute/merge_sort/mod.rs 93.56% <100.00%> (ø)
src/compute/sort/primitive/indices.rs 100.00% <100.00%> (+4.47%) ⬆️
src/compute/sort/utf8.rs 100.00% <100.00%> (ø)
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b730ba7...25e5ce0. Read the comment docs.

@jorgecarleitao jorgecarleitao added backwards-incompatible enhancement An improvement to an existing feature labels Jul 23, 2021
@jorgecarleitao jorgecarleitao merged commit a6e8b69 into main Jul 24, 2021
@jorgecarleitao jorgecarleitao deleted the sort_limit branch July 24, 2021 12:32
@sundy-li
Copy link
Collaborator

Great job, and I think it will be better to add limit option to merge sort.

https://github.com/datafuselabs/datafuse/blob/master/common/datavalues/src/columns/common.rs#L225-L329

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
backwards-incompatible enhancement An improvement to an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Introduce limit option to sort
2 participants