-
-
Notifications
You must be signed in to change notification settings - Fork 673
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster range #1954
Faster range #1954
Conversation
fulmicoton
commented
Mar 22, 2023
•
edited
Loading
edited
cfeb4c1
to
a11275c
Compare
Codecov Report
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more @@ Coverage Diff @@
## main #1954 +/- ##
==========================================
- Coverage 94.52% 94.52% -0.01%
==========================================
Files 311 317 +6
Lines 57241 57775 +534
==========================================
+ Hits 54109 54610 +501
- Misses 3132 3165 +33
... and 21 files with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
645a842
to
0362a70
Compare
0362a70
to
7b82a90
Compare
This PR does several changes - ip compact space now uses u32 - the bitunpacker now gets a get_batch function - we push down range filtering, removing GCD / shift in the bitpacking codec. - we rely on AVX2 routine to do the filtering.
7b82a90
to
f9f5750
Compare
bitpacker/src/bitpacker.rs
Outdated
// #Panics | ||
// | ||
// This methods panics if `num_bits` is > 32. | ||
pub fn get_batch_u32s(&self, start_idx: u32, data: &[u8], output: &mut [u32]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that should be pub, or is that just for the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I thought it was a very useful API... Getting a batch of u32s values sounds useful.
// spread over the full u128 space. | ||
// | ||
// This change will force the algorithm to degenerate into dictionary encoding. | ||
// if amplitude_bits > 32 && cost >= saved_bits { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the change is commented, and that's not how it should work
we could alter the condition, so we always add_blanks
when reaching 32 bits value space.
let reached_u32_value_space = amplitude_new_bits <= 32 && amplitude_bits > 32;
if !reached_u32_value_space && cost >= saved_bits {
continue;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! I messed up. In test_worst_case_scenario
I never managed to end up with amplitude_bits > 32
so I ended up just putting the assert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_worst_case_scenario
tests the wrong thing, it calls add_blanks
unconditionally, but the algorithm could decide that the metadata overhead is too high and not add the blanks to reduce the value space. The logic that needs to be tested in test_worst_case_scenario
is get_compact_space
. 10_000 blanks may not be enough
The cost per blank is hardcoded currently const COST_PER_BLANK_IN_BITS: usize = 36;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oooohh you are absolutely right!
columnar/src/column_values/u128_based/compact_space/build_compact_space.rs
Outdated
Show resolved
Hide resolved
@@ -228,4 +243,11 @@ mod tests { | |||
assert_eq!(blanks.pop().unwrap().blank_size(), 101); | |||
assert_eq!(blanks.pop().unwrap().blank_size(), 11); | |||
} | |||
|
|||
#[test] | |||
fn test_worst_case_scenario() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This time, the unit test does fail without the extra if condition in the test.
// #Panics | ||
// | ||
// This methods panics if `num_bits` is > 32. | ||
fn get_batch_u32s(&self, start_idx: u32, data: &[u8], output: &mut [u32]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made it private upon your suggestion. We can make it public if it because useful.