add dense codec #1711
PSeitz commented Dec 8, 2022
9c4b9ad to 3c83264
3c83264 to 2c2f5c3
/// # Panics
///
/// May panic if any `idx` is greater than the column length.
pub fn translate_codec_idx_to_original_idx<'a>(
This method will probably end up working on a `&mut Vec<u32>` eventually.
Codecov Report
@@ Coverage Diff @@
## main #1711 +/- ##
==========================================
+ Coverage 94.07% 94.10% +0.03%
==========================================
Files 259 261 +2
Lines 49637 49925 +288
==========================================
+ Hits 46694 46980 +286
- Misses 2943 2945 +2
const SERIALIZED_BLOCK_SIZE: usize = BLOCK_BITVEC_SIZE + BLOCK_OFFSET_SIZE;

fn count_ones(block: u64, pos_in_block: u32) -> u32 {
    if pos_in_block == 63 {
Unfortunately, I didn't manage to find plain Rust code that makes the compiler emit this instruction:
pub fn count_ones(block: u64, pos_in_block: u32) -> u32 {
unsafe { core::arch::x86_64::_bzhi_u64(block, pos_in_block) }.count_ones()
}
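For comparison, here is a minimal portable sketch of a branch-free variant that needs no `unsafe` and no target-specific intrinsic. It assumes the function is meant to count the set bits at positions `0..=pos_in_block` (inclusive), which is only inferred from the `pos_in_block == 63` special case in the diff. The name `count_ones_inclusive` is hypothetical and not part of the PR, and no claim is made about which instructions the compiler emits for it.

```rust
/// Counts the set bits of `block` at positions `0..=pos_in_block`.
///
/// Assumption: the upper bound is inclusive (inferred from the `== 63`
/// special case in the PR, not confirmed by it).
fn count_ones_inclusive(block: u64, pos_in_block: u32) -> u32 {
    debug_assert!(pos_in_block < 64);
    // Shifting an all-ones word right by `63 - pos_in_block` leaves exactly
    // `pos_in_block + 1` low bits set. The shift amount is at most 63, so
    // the `pos_in_block == 63` case needs no separate branch.
    let mask = u64::MAX >> (63 - pos_in_block);
    (block & mask).count_ones()
}
```

With this mask, `pos_in_block == 63` keeps the whole word and `pos_in_block == 0` keeps only the lowest bit.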
1de7853 to 9bb6822
improve benchmark
9bb6822 to 789d29c
///
/// The last offset number is equal to the number of values in the index.
fn find_block(dense_idx: u32, mut block_pos: u32, data: &[u8]) -> u32 {
    loop {
Nitpick: I tend to prefer a for-loop.
loop {
for i in block_pos.. {
}
panic!("...
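The suggested shape can be sketched as follows. This is a simplified illustration, not the PR's code: it works on an already-decoded slice of block offsets rather than on the serialized `data: &[u8]`, and it assumes `offsets[i]` holds the number of values stored before block `i`, with one trailing entry equal to the total number of values. The name `find_block_in_offsets` and that layout are assumptions made for the example.

```rust
/// Linear search for the block containing `dense_idx`, starting at `block_pos`.
///
/// Assumed layout (hypothetical): `offsets[i]` is the number of values before
/// block `i`, and the last entry is the total number of values.
fn find_block_in_offsets(dense_idx: u32, block_pos: u32, offsets: &[u32]) -> u32 {
    // Each window is (start offset of block i, start offset of block i + 1).
    for (i, pair) in offsets.windows(2).enumerate().skip(block_pos as usize) {
        if dense_idx < pair[1] {
            return i as u32;
        }
    }
    // Falling out of the loop means the precondition was violated.
    panic!("dense_idx {} is not smaller than the number of values", dense_idx);
}
```

Compared with a bare `loop`, the bounded iteration makes the termination condition visible, and the out-of-bounds case ends in one explicit `panic!` after the loop.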
/// # Correctness
/// dense_idx needs to be smaller than the number of values in the index
///
/// The last offset number is equal to the number of values in the index.
Question:
Do we want to use a faster algorithm? Should we add a TODO in there?
I extended the benchmark to see what happens if we do a full scan but with very large steps, so that the time is mostly spent in the linear search. Linear search seems to be fine, as it is a minor contributor to the cost.
test null_index::dense::bench::bench_dense_codec_translate_dense_to_orig_90percent_filled_full_scan ... bench: 25,282,406 ns/iter (+/- 13,598,775)
test null_index::dense::bench::bench_dense_codec_translate_dense_to_orig_90percent_filled_random_stride_100 ... bench: 1,083,598 ns/iter (+/- 322,955)
test null_index::dense::bench::bench_dense_codec_translate_dense_to_orig_90percent_filled_random_stride_50_000 ... bench: 15,187 ns/iter (+/- 4,774)
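For reference, the "faster algorithm" raised in the question could be a binary search over the block offsets. The benchmark numbers above suggest it is not needed here, so this is only a sketch of the alternative. It assumes a decoded, non-decreasing `offsets` slice where `offsets[i]` is the number of values before block `i` and the last entry is the total number of values. The function name and that layout are hypothetical and not taken from the PR.

```rust
/// Binary search for the block containing `dense_idx`.
///
/// Assumed layout (hypothetical): `offsets` is non-decreasing, `offsets[i]`
/// is the number of values before block `i`, and the last entry is the
/// total number of values. Returns `None` if `dense_idx` is out of range.
fn find_block_binary(dense_idx: u32, offsets: &[u32]) -> Option<u32> {
    // `upper_bounds[i]` is the exclusive end offset of block `i`.
    let upper_bounds = offsets.get(1..)?;
    // Index of the first block whose end offset exceeds `dense_idx`.
    let block = upper_bounds.partition_point(|&end| end <= dense_idx);
    if block < upper_bounds.len() {
        Some(block as u32)
    } else {
        None
    }
}
```

Unlike the linear variant, this does not take a starting `block_pos`, so it gives up the benefit of resuming from the previous block when lookups arrive in increasing order.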
mod null_index_footer;

mod column;
mod gcd;
mod serialize;

pub use null_index::*;
pub use null_index::*;
It's actually nicer not to expose the symbols here. The module is perfectly fine on its own.
I added it to temporarily get rid of the annoying unused warnings.
I think you can place `#[allow(dead_code)]` on the `mod null_index` declaration instead :)
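A minimal sketch of that suggestion, using an inline module so the example is self-contained. In the PR the attribute would sit on the existing `mod null_index;` declaration. The `placeholder` function is hypothetical and only stands in for not-yet-used items.

```rust
// The attribute silences unused-item warnings for everything inside the
// module, without re-exporting any of its symbols.
#[allow(dead_code)]
mod null_index {
    // Hypothetical item standing in for code that is not wired up yet.
    pub fn placeholder() -> u32 {
        0
    }
}
```

This keeps the parent namespace clean, and the attribute can be dropped once the module's items are actually used.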
Co-authored-by: Paul Masurel <paul@quickwit.io>
cd370d9 to f53a996
f53a996 to 976128a