Skip to content
This repository has been archived by the owner on Feb 21, 2024. It is now read-only.

problems on import(roaring) #36

Closed
tgruben opened this issue Jan 8, 2016 · 5 comments · Fixed by #38
Closed

problems on import(roaring) #36

tgruben opened this issue Jan 8, 2016 · 5 comments · Fixed by #38

Comments

@tgruben
Copy link
Member

tgruben commented Jan 8, 2016

Something not right on the roaring bitmap. I've got it narrowed down to the WriteTo method of the array.

The error reported is

2016/01/08 10:09:30 import error: db=3, frame=b.n, slice=59, bits=5434

I'm thinking there might be an overflow somewhere, but i can't find it.
1237.txt

@tgruben
Copy link
Member Author

tgruben commented Jan 8, 2016

i'm wondering if the roaring implementation needs to replace the 16bit containers with 32 bit containers. The idea behind those containers is to have half the bits for low and half for high. So the container should be 32 bits instead of 64 bits.

Thought?

@benbjohnson
Copy link
Contributor

I pushed up a PR to actually return the error message: 🤦
#37

Can you rerun the import and send the error?
As far as the containers, if they have 32-bits then the array size or else they become really inefficient. Instead of 4,096 elements it'd probably need to go up to 67,108,864 (4K x 16K). If we left it at 4K then we'd need to allocate a huge array for the bitmap container to hold all the elements.
Do you have a failing test case for the issue?
Ben

On Friday, January 8, 2016 10:43 AM, tgruben <notifications@github.com> wrote:

i'm wondering if the roaring implementation needs to replace the 16bit containers with 32 bit containers. The idea behind those containers is to have half the bits for low and half for high. So the container should be 32 bits instead of 64 bits.Thought?—
Reply to this email directly or view it on GitHub.

@tgruben
Copy link
Member Author

tgruben commented Jan 9, 2016

You right on the inefficiency. Looking at the roaring code it seems to chop up the 32 bit number into 16bit parts..a high and a low. Perhaps we split up into 4 16 bit chunks. HighHigh, HighLow, LowHigh, LowLow and do the search. Never needing the 64 bit ops..ie search64.

And I would search in this order(for it would be just the same performance as roaring for values< 2^32
LowHigh,LowLow,HighLow,HighHigh

How does that grab you?

Here is the error...
2016/01/09 08:18:06 import error: db=3, frame=b.n, slice=%!s(uint64=59), bits=5434, err=open storage: unmarshal storage: file=/Users/tgruben/.pilosa/3/b.n/59, err=checksum mismatch: exp=5c922157, got=9dc388c3
2

The error is in importing the attached file above in slice 59, i'm trying to get it narrowed down better than that, but I haven't found the culprit. Below is the profile ids of the failed "import"

pids.txt

@benbjohnson
Copy link
Contributor

It seems like it would add a lot of complexity by adding 4 levels instead of just 2. Unless we had a lot of containers it doesn't seem like it would help much.

@benbjohnson
Copy link
Contributor

@tgruben I found and fixed the issue: #38

tgruben pushed a commit to tgruben/pilosa that referenced this issue Sep 4, 2019
seebs pushed a commit to seebs/pilosa that referenced this issue Nov 22, 2019
…keys

Add column keys support to IncludeColumn
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants