tune for performance #80

glycerine · 2016-12-16T02:17:49Z

often times in the run container operations it appeared faster to convert to bitmaps and the do the operations there. Meaure and see if any of the remaining operations could benefit from this kind of tuning.

glycerine · 2016-12-17T03:32:26Z

...or vice-versa. I ended using bitmapContainer operations by default for most things in the runContainer binary operations, since they sped up the high cardinality test cases so much. For low cardinality however the non-bitmapContainer ops may be a win. Requires measurement.

glycerine · 2016-12-19T21:54:31Z

@lemire are there existing benchmarks (in the Java version?) that should be ported to Go; or do the Go benchmarks already suffice?

I'd like to do at least one pass of perf tuning in order to catch glaring issues while the code is still fresh in mind. To that end, I'd like to have benchmarks that include the runContainer16 as a part of their testing. If nothing is available I can generate random ones easily, but I think it would be nice to compare speed directly against the Java or C Roaring implementations if possible.

glycerine · 2016-12-19T21:56:13Z

here is what I get from the java repo, if someone could explain the columns, this might be a good place to start....

jaten@jatens-MacBook-Pro ~/go/src/github.com/RoaringBitmap/RoaringBitmap/simplebenchmark (master) $ ./run.sh
# bitspervalue nanotimefor2by2and nanotimefor2by2or nanotimeforwideor nanotimeforcontains (first normal then buffer)
census-income.zip                       2.59         2398999   3087540    663947     51744         2616805   3242794    801903     61818
census-income_srt.zip                   0.60          892023   2101384    493592     51848         1199480   2715677    669780     57988
census1881.zip                          15.06          87353    923013   1331235     32423          113971   1187069   1497454     31657
census1881_srt.zip                      2.07           83891    366097    951445     42705           71974    387080    839619     33649
dimension_003.zip                       0.59          535187   1272743   1130322    314439          611348   1807932   1350725    404351
dimension_008.zip                       0.35          451754    640828    493947     94197          518238   1268994    626942    275252
dimension_033.zip                       0.14          123837    435861    375743     35644          190126    578976    389863     38769
uscensus2000.zip                        35.91          31364    148260    935618     28363           32830    225320   1152845     25543
weather_sept_85.zip                     5.38         9536055  12942823   3206015    116243        10009470  13687968   3265626    123652
weather_sept_85_srt.zip                 0.34         1121712   2513016   1011577     49968         1263084   2760889   1142754     52280
wikileaks-noquotes.zip                  5.70          277787    859430    340619     52893          333683    985212    389047     57702
wikileaks-noquotes_srt.zip              1.48          126781    565407    300072     45610          134155    483344    228947     42447
jaten@jatens-MacBook-Pro ~/go/src/github.com/RoaringBitmap/RoaringBitmap/simplebenchmark (master) $

lemire · 2016-12-19T21:58:39Z

@glycerine We have Go wrappers around C code, so that's a good reference point... that is, Go's roaring should aim to be at least as good as gocroaring...

https://github.com/RoaringBitmap/gocroaring

lemire · 2016-12-19T22:00:00Z

I have a shallow benchmark...
https://github.com/lemire/gobitmapbenchmark

lemire · 2016-12-19T22:03:10Z

@glycerine

The easiest reference point might be this C/C++ benchmark...

https://github.com/RoaringBitmap/CBitmapCompetition

The README.md file there goes into the details of what each and every column means. This is not Go code, of course... it is C and C++... but the idea is the same.

glycerine · 2016-12-19T22:04:42Z

great! thanks @lemire

lemire · 2016-12-19T22:07:34Z

@glycerine

As for your specific question:

bitspervalue: bits of storage per value stored on average... should be self-explanatory... should be better when using run compression for obvious reasons

nanotimefor2by2and : take bitmaps two-by-two, compute intersection, report the time

nanotimefor2by2or : same with union

nanotimeforwideor : compute the "wide" (complete) union of all bitmaps

nanotimeforcontains : we compute the universe size (all integers in [0,N)) then for each bitmap we check whether the values N/4, N/2 and 3N/4 are present in teach bitmap... this stresses random access...

glycerine · 2016-12-19T22:14:19Z

looking at the java test data, it is about 120MB; I'm going to make a separate repo for it so as to not duplicate too much in the Go implementation.

lemire · 2016-12-19T22:22:03Z

@glycerine

Actually, yes, I think it is nice to have it as a secondary repository.

glycerine · 2016-12-19T22:40:55Z

isolated and duplicated here: https://github.com/RoaringBitmap/real-roaring-datasets

lemire · 2018-04-02T19:58:06Z

The benchmark against the C version of Roaring has been extended considerably: see https://github.com/lemire/gobitmapbenchmark

glycerine mentioned this issue Dec 16, 2016

RunContainer32 implements a run-length #70

Merged

lemire added the performance label Mar 17, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tune for performance #80

tune for performance #80

glycerine commented Dec 16, 2016

glycerine commented Dec 17, 2016

glycerine commented Dec 19, 2016

glycerine commented Dec 19, 2016

lemire commented Dec 19, 2016 •

edited

lemire commented Dec 19, 2016

lemire commented Dec 19, 2016

glycerine commented Dec 19, 2016

lemire commented Dec 19, 2016

glycerine commented Dec 19, 2016

lemire commented Dec 19, 2016

glycerine commented Dec 19, 2016

lemire commented Apr 2, 2018

tune for performance #80

tune for performance #80

Comments

glycerine commented Dec 16, 2016

glycerine commented Dec 17, 2016

glycerine commented Dec 19, 2016

glycerine commented Dec 19, 2016

lemire commented Dec 19, 2016 • edited

lemire commented Dec 19, 2016

lemire commented Dec 19, 2016

glycerine commented Dec 19, 2016

lemire commented Dec 19, 2016

glycerine commented Dec 19, 2016

lemire commented Dec 19, 2016

glycerine commented Dec 19, 2016

lemire commented Apr 2, 2018

lemire commented Dec 19, 2016 •

edited