Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add dense codec #1711

Merged
merged 6 commits into from
Dec 9, 2022
Merged

add dense codec #1711

merged 6 commits into from
Dec 9, 2022

Conversation

PSeitz
Copy link
Contributor

@PSeitz PSeitz commented Dec 8, 2022

100k elements
test null_index::dense::bench::bench_dense_codec_translate_dense_to_orig ... bench:   1,191,989 ns/iter (+/- 48,555)
test null_index::dense::bench::bench_dense_codec_translate_orig_to_dense ... bench:     286,443 ns/iter (+/- 2,509)

@PSeitz PSeitz force-pushed the sparse_dense_index branch 6 times, most recently from 9c4b9ad to 3c83264 Compare December 8, 2022 04:26
/// # Panics
///
/// May panic if any `idx` is greater than the column length.
pub fn translate_codec_idx_to_original_idx<'a>(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method will eventually work on a &mut Vec<u32> probably

@codecov-commenter
Copy link

codecov-commenter commented Dec 8, 2022

Codecov Report

Merging #1711 (789d29c) into main (96c93a6) will increase coverage by 0.03%.
The diff coverage is 99.30%.

@@            Coverage Diff             @@
##             main    #1711      +/-   ##
==========================================
+ Coverage   94.07%   94.10%   +0.03%     
==========================================
  Files         259      261       +2     
  Lines       49637    49925     +288     
==========================================
+ Hits        46694    46980     +286     
- Misses       2943     2945       +2     
Impacted Files Coverage Δ
fastfield_codecs/src/lib.rs 98.89% <ø> (ø)
fastfield_codecs/src/null_index/dense.rs 99.29% <99.29%> (ø)
fastfield_codecs/src/null_index/mod.rs 100.00% <100.00%> (ø)
src/postings/stacker/expull.rs 98.57% <0.00%> (-0.48%) ⬇️
src/schema/schema.rs 98.91% <0.00%> (+0.13%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

const SERIALIZED_BLOCK_SIZE: usize = BLOCK_BITVEC_SIZE + BLOCK_OFFSET_SIZE;

fn count_ones(block: u64, pos_in_block: u32) -> u32 {
if pos_in_block == 63 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I didn't manage to find rust code to spew this...

pub fn count_ones(block: u64, pos_in_block: u32) -> u32 {
    unsafe { core::arch::x86_64::_bzhi_u64(block, pos_in_block) }.count_ones()
}

improve benchmark
///
/// The last offset number is equal to the number of values in the index.
fn find_block(dense_idx: u32, mut block_pos: u32, data: &[u8]) -> u32 {
loop {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick:
I tend to prefer a for-loop.

Suggested change
loop {
for i in block_pos.. {
}
panic!("...

/// # Correctness
/// dense_idx needs to be smaller than the number of values in the index
///
/// The last offset number is equal to the number of values in the index.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question:

Do we want to use a faster algorithm? Should we add a TODO in there?

Copy link
Contributor Author

@PSeitz PSeitz Dec 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did extend the benchmark to see what happens if we do a full scan, but have very large steps, so that time is mostly spent in the linear search. Linear search seems to be fine, as it is a minor contributor to cost.

test null_index::dense::bench::bench_dense_codec_translate_dense_to_orig_90percent_filled_full_scan              ... bench:  25,282,406 ns/iter (+/- 13,598,775)
test null_index::dense::bench::bench_dense_codec_translate_dense_to_orig_90percent_filled_random_stride_100          ... bench:   1,083,598 ns/iter (+/- 322,955)
test null_index::dense::bench::bench_dense_codec_translate_dense_to_orig_90percent_filled_random_stride_50_000 ... bench:      15,187 ns/iter (+/- 4,774)

mod null_index_footer;

mod column;
mod gcd;
mod serialize;

pub use null_index::*;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub use null_index::*;

That's actually nicer to not expose symbol here. The module is perfectly nice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it to get temporarily rid of the annoying unused warnings

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can place #[allow(dead_code)] on the mod null_index declaration instead :)

PSeitz and others added 2 commits December 9, 2022 08:01
Co-authored-by: Paul Masurel <paul@quickwit.io>
Co-authored-by: Paul Masurel <paul@quickwit.io>
@PSeitz PSeitz merged commit a05a003 into main Dec 9, 2022
@PSeitz PSeitz deleted the sparse_dense_index branch December 9, 2022 07:48
@fulmicoton fulmicoton restored the sparse_dense_index branch December 12, 2022 03:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants