Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BitPacker4x::num_bits_sorted SIGSEGV on empty arrays #23

Closed
Kerollmops opened this issue Apr 30, 2020 · 7 comments
Closed

BitPacker4x::num_bits_sorted SIGSEGV on empty arrays #23

Kerollmops opened this issue Apr 30, 2020 · 7 comments
Labels
bug Something isn't working

Comments

@Kerollmops
Copy link

Kerollmops commented Apr 30, 2020

When called with an empty array the num_bits_sorted triggers a segfault, this is not specified anywhere that the array must not be empty (or do I missed it?).

fn num_bits_sorted(&self, initial: u32, decompressed: &[u32]) -> u8 {
     unsafe { scalar::UnsafeBitPackerImpl::num_bits_sorted(initial, decompressed) }
}

If you want to easily reproduce the bug just create an empty vec![] and give it to BitPacker4x::num_bits_sorted.

@fulmicoton fulmicoton added the bug Something isn't working label Apr 30, 2020
@fulmicoton
Copy link
Contributor

Good point! Let's fix this.

@fulmicoton
Copy link
Contributor

@Kerollmops also, if you intend to use bitpacking for meili, make sure to bitpack on blocks with a limit size. (typically 128 ints.)

@Kerollmops
Copy link
Author

@fulmicoton, what do you mean by a block with a limit size?

@fulmicoton
Copy link
Contributor

What I meant is : you do not want to compute the bit size for your entire postlist and stick to it. The reason is that the compression rate will then be determined by your largest delta. If your posting list has millions of elements you will probably end up having on outlier ruin the entire compression.

@Kerollmops
Copy link
Author

Kerollmops commented Apr 30, 2020

Ok so the num_bits_sorted kind of silently ignore the rest of the slice, right?
So what I have to do is to store the computed num_bits for each block along with the block?

It makes perfect sense to me, thank you!

@fulmicoton
Copy link
Contributor

Yes. I'll update doc and add an assert.

Your first block required numbits=10 and the second one numbits=15.
The second block was ignored, so only 10 bits were used. The first delta for which it was insufficient happened to be exactly 2^10=1024 but that was a coincidence.

@Kerollmops
Copy link
Author

Kerollmops commented Apr 30, 2020

I have computed and added the num_bits values in front of all the numbers blocks.
Thank you! This is fixed now!

(Whooops, the SIGSEGV is not fixed, I reopened)

@Kerollmops Kerollmops reopened this Apr 30, 2020
fulmicoton added a commit that referenced this issue Apr 30, 2020
The block in arguments need to be of len BLOCK_LEN

Closes #23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants