Skip to content
This repository has been archived by the owner on Feb 21, 2024. It is now read-only.

Seebs/bench #1741

Merged
merged 7 commits into from Nov 16, 2018
Merged

Seebs/bench #1741

merged 7 commits into from Nov 16, 2018

Conversation

seebs
Copy link
Contributor

@seebs seebs commented Nov 13, 2018

Overview

Some benchmark updates, minor bug fixes, and an easy low-hanging performance fix.

(Note: No actual documentation changes, but I have updated all the documentation related to what I'm changing.)

Fixes #854
Updates #735

Pull request checklist

Code review checklist

This is the checklist that the reviewer will follow while reviewing your pull request. You do not need to do anything with this checklist, but be aware of what the reviewer will be looking for.

  • Ensure that any changes to external docs have been included in this pull request.
  • If the changes require that minor/major versions need to be updated, tag the PR appropriately.
  • Ensure the new code is properly commented and follows Idiomatic Go.
  • Check that tests have been written and that they cover the new functionality.
  • Run tests and ensure they pass.
  • Build and run the code, performing any applicable integration testing.

@seebs seebs requested a review from tgruben November 13, 2018 18:51
} else if va > vb {
j++
} else {
s1, s2 := a.array, b.array
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a de facto convention for roaring variable names, and there's a new convention being introduced here, because of the swap. Whatever that is, can we settle on something and stick with it, in any other swap logic that gets introduced?

I might have used n1 instead of l1, and maybe c1, c2 - c for container.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, i'll fix that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on consideration, it's better to just use the existing convention, and just let the swap be a swap of a and b. we already have these implicitly in a number of places; an intersection count between "a" and "b" can be implemented by passing them to a concrete intersection count function in either order, so it's not breaking anything to just say "without loss of generality, let a be the shorter array".

jaffee
jaffee previously approved these changes Nov 16, 2018
Copy link
Member

@jaffee jaffee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good, but needs to be rebased on master (or have master merged in, or whatever... we're generally not too picky about our git history)

The circumstances under which bitmaps are converted between
types are not 100% nailed down, and the IntersectionCount
benchmark was actually using a bitmap for the "run" data set
as well as for the "bitmap" data set. Fix that by using
Optimize() explicitly. Also, add a second RLE set so we
can compare the difference between "one run for the entire
set" and "several runs".

Also add array/array comparisons. We use two different
lengths of arrays, because performance turns out to vary
between "first array longer" and "second array longer".

Also added a benchmark for getBenchData itself, since it's
at least one possible use case for "creating a lot of
containers".
If the total number of things returned was small enough to
make an array, intersectBitmapRun converted to an array. This
seems possibly-premature; future processing might well prefer
a bitmap. We know everything gets optimized before being
written out, let's not convert without a specific reason. But
also, let's use an array no matter which container is small
enough to prove that we can do so safely.

Fixes FeatureBaseDB#854.
The net effect of this is to not recompute "the current
value of the first array" on every loop, pretty much.
However, the swap to make sure the inner loop is on the
longer array seems to be significant for performance.
On my system, this moves runtime from ~29us per op
to ~17us per op.
confirm that the number of runs comes out as expected,
and add a couple of numbers out of order to verify that
the 17-18-19 set gets coalesced into one run even
if we add 17 and 19 before 18.
Roaring likes to call things "a" and "b", not "1" and "2",
and use "n" for length, not "l", etcetera. Adopt these
conventions to make code more readable.

Also drop the 'vb' value since it isn't expensive to
compute and the compiler can figure out that it can
reuse the value.
A transient bug introduced in intersectionCountArrayArray was
not caught by the tests, because it would only manifest when
two containers of different lengths were being compared. Also
improve the testing for intersectArrayArray, even though that
code hasn't been changed.
I am aware that I don't actually ever use the length of a
after this line of code, but if I don't correctly update it,
any future change that needs that length will break
mysteriously. We humbly ask gometalinter to consider
the reply of counsel in _Arkell v. Pressdram_ (1971).
@seebs seebs merged commit 0ed5b86 into FeatureBaseDB:master Nov 16, 2018
@seebs seebs deleted the seebs/bench branch November 17, 2018 00:56
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

review intersectBitmapRun() logic, specifically the return container type.
3 participants