Conversation
roaring/roaring.go
Outdated
} else if va > vb { | ||
j++ | ||
} else { | ||
s1, s2 := a.array, b.array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a de facto convention for roaring variable names, and there's a new convention being introduced here, because of the swap. Whatever that is, can we settle on something and stick with it, in any other swap logic that gets introduced?
I might have used n1
instead of l1
, and maybe c1, c2
- c
for container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, i'll fix that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on consideration, it's better to just use the existing convention, and just let the swap be a swap of a and b. we already have these implicitly in a number of places; an intersection count between "a" and "b" can be implemented by passing them to a concrete intersection count function in either order, so it's not breaking anything to just say "without loss of generality, let a be the shorter array".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks good, but needs to be rebased on master (or have master merged in, or whatever... we're generally not too picky about our git history)
The circumstances under which bitmaps are converted between types are not 100% nailed down, and the IntersectionCount benchmark was actually using a bitmap for the "run" data set as well as for the "bitmap" data set. Fix that by using Optimize() explicitly. Also, add a second RLE set so we can compare the difference between "one run for the entire set" and "several runs". Also add array/array comparisons. We use two different lengths of arrays, because performance turns out to vary between "first array longer" and "second array longer". Also added a benchmark for getBenchData itself, since it's at least one possible use case for "creating a lot of containers".
If the total number of things returned was small enough to make an array, intersectBitmapRun converted to an array. This seems possibly-premature; future processing might well prefer a bitmap. We know everything gets optimized before being written out, let's not convert without a specific reason. But also, let's use an array no matter which container is small enough to prove that we can do so safely. Fixes FeatureBaseDB#854.
The net effect of this is to not recompute "the current value of the first array" on every loop, pretty much. However, the swap to make sure the inner loop is on the longer array seems to be significant for performance. On my system, this moves runtime from ~29us per op to ~17us per op.
confirm that the number of runs comes out as expected, and add a couple of numbers out of order to verify that the 17-18-19 set gets coalesced into one run even if we add 17 and 19 before 18.
Roaring likes to call things "a" and "b", not "1" and "2", and use "n" for length, not "l", etcetera. Adopt these conventions to make code more readable. Also drop the 'vb' value since it isn't expensive to compute and the compiler can figure out that it can reuse the value.
A transient bug introduced in intersectionCountArrayArray was not caught by the tests, because it would only manifest when two containers of different lengths were being compared. Also improve the testing for intersectArrayArray, even though that code hasn't been changed.
I am aware that I don't actually ever use the length of a after this line of code, but if I don't correctly update it, any future change that needs that length will break mysteriously. We humbly ask gometalinter to consider the reply of counsel in _Arkell v. Pressdram_ (1971).
Overview
Some benchmark updates, minor bug fixes, and an easy low-hanging performance fix.
(Note: No actual documentation changes, but I have updated all the documentation related to what I'm changing.)
Fixes #854
Updates #735
Pull request checklist
Code review checklist
This is the checklist that the reviewer will follow while reviewing your pull request. You do not need to do anything with this checklist, but be aware of what the reviewer will be looking for.