Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

index: Use roaring bitmaps for content posting lists #10

Closed
wants to merge 3 commits into from

Conversation

keegancsmith
Copy link
Member

No description provided.

Change-Id: Ib557078f05714a14b8ee36682db81c8fad4f929a
Tests still don't pass

Change-Id: Ic68811b64aa43324fda9ed9186a2154639f64d8b
Change-Id: I445385bc75a02a3b5d8aabde608d133d95a4fc01
@keegancsmith
Copy link
Member Author

@sourcegraph/core-services so this might not be better. I believe loading a roaring bitset requires scanning over the whole []buf representing it. This may actually be way more IO than the current delta encoding approach, which lazily delta decodes. So in the case of something with a lot of results, you will do way less IO. But in the case of something with few results, you may end up doing the same amount of IO.

So roaring bitsets may be inappropriate for posting lists to byte offsets. There is likely something better though. From a quick look at the roaring bitsets implementation, we can probably get away with something based on roaring bitsets which is optimized for streaming them, or we can look into some research.

@tsenart
Copy link
Contributor

tsenart commented Jun 28, 2019

I believe loading a roaring bitset requires scanning over the whole []buf representing it.

From: https://github.com/RoaringBitmap/RoaringFormatSpec#general-layout

The layout is designed so that random access to the data is possible without materializing the bitmap in memory while being storage efficient.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants