For data size <2M entities
- Efficient storage data in memory
- Efficient search
- Prepare infrastructure for performance analyze
This repo is comparing bitmap
and map
as a cache.
map
is provided by Golang built-in librarybitmap
's are provided by https://github.com/RoaringBitmap/roaring. Few other bitmap repositories used occasionally, but excluded from results described in this section
Some criterias to consider: decrease memory requirements? improve latency? Increase cost of development and support for that? See below
- Read Status, you might get some issues running non-essential benchmarks
- Read Design
- Take into consideration: very few search options are supported, so benchmarks are limited
- Read Benchmark Aggregation - it may not be up-to-date, but will give you the common understanding
of the
bitmap
benefits Data sizing
: only ~500k
prices with ~8
conditions each. The results for your data could be different.Fair competition
: there are different strategies to improve bothbitmap
andmap
and some of them were applied.High throughput
vsLow Latency
- low-latency is expected hereUsage
: this repository is comparing search capabilities, leaving update or inserts behind, which could be important use case. Not only timing and memory usage (gc) is important here, but also you mind end up with x2 memory requirements (e.g. creating two caches at the same time) or will be forced to add READ lock for concurrency update\read. Map as well doesn't allow concurrent writes, so at least search compare oranges with orangesRequests
: with bitmaps client were able to send on1281%
more requests. e.g.2 244 242
vs162 471
Latency
: at the same time for99% percentile
got better timing on>900%
, e.g.911µs
vs9.534ms
with even better results for all requests before 99% percentileMemory
:bitmap
cache occupies less memory? This is only half of the story. First, you should beware why. bitmap and map index part has minimum difference (2MB vs 4MB
) for such data sizing. In fact, benefits comes from ability not to store all fields of original entities. If storing original entities is MUST - you may not gain any memory benefits. bitmap takes less for used structures -90MB vs 180MB total
. However, proper GC configuration is MUST, and minimum possible memory limit will be higher.GOMEMLIMIT
: You may need to set memory limit as800MB
for100MB
of the actual cache to get decent performance for99%
percentile. See bitmap 2gb vs 800mb. If degradation further is fine (we are still discussing timings<5ms
) see bitmap 2gb vs 500mb. No difference for map 2gb vs 500mbCPU
:bitmap
is way faster (1 order of magnitude,1000%
more requests), but it .GoGC
: Bitmap comes with the cost of increased memory allocation, which means increased GC pressure. Value1000
looks good. For some background see bitmap. gogc 1000 vs 100Cost
:bitmap
development and support is way harder than regularmap
. It is about supporting low-level data types. In contrastMap
is like business as usual - you will be able to spend more time on you actual business cases, then on dealing with performance. More over, you can check issues of differentbitmap libraries
and found all sort of issues hard-to-spot, like race conditions. Add your own changes on top. Good testing, even for concurrent READ scenarios is must.Language
:go
is used here.java
maybe a good alternative, over the years people built different efficient map implementations, and not only that, java's 32-bit support is free (which means less memory, faster).Http Server
: when it comes to bitmaps, you can achieve micro-seconds latency and here every improvement matters. For example, bitmap. http server fiber vs default fiber got40%
better timings (319µs
improvement for 97% percentile). Map has fewer requests, and timing from the start way higher, so, it is1-3%
(135µs
for 97% percentile) mapMicro Second Tooling
: Not many tools support microsecond measurements,wrk
is among those few.Docker
: has fewer calls1 457 261
compare to non-docker option2 244 242
, but similar timing. It is not yet clear why
Work in progress. There are still a lot todo to get the 'right' results. Unit tests, different data cases, more business cases, challenges with benchmarking and env configuration
- See Status for details
- build:
make build
- run
- map:
make run-map32-fiber
- bitmap:
make run-roaring32-fiber
- map:
- Search 1 price. Bitmap
curl -POST http://localhost:8091/v1/search/bitmap/prices \
-H "Content-Type: application/json" \
-d @benchmark/500k-large-groups/sample/search-price-request.json
- Search 1 price. Map
curl -POST http://localhost:8091/v1/search/map/prices \
-H "Content-Type: application/json" \
-d @benchmark/500k-large-groups/sample/search-price-request.json
- Search 5 prices. Bitmap
curl -H "Content-Type: application/json" -POST http://localhost:8091/v5/search/bitmap/bulk/prices \
-d @benchmark/500k-large-groups/sample/search-price-bulk-request-5-nd.json
- Search 10000 prices. Bitmap
time curl -H "Content-Type: application/json" -o /dev/null -POST http://localhost:8091/v5/search/bitmap/bulk/prices \
-d @benchmark/500k-large-groups/sample/search-price-bulk-request-10000-nd.json
- Low-level:
make bench
(go tests) - Memory:
make bench-memory-bitmap
See bitmap memorymake bench-memory-map
. See map memory
- High-level
- Wrk new approach:
make run-wrk-experiments
(in 1 click). See reports - Wrk:
make wrk
- Wrk2:
make wrk2
- Wrk new approach:
- Docker (beta)
- Build:
make build docker
- Run roaring32:
make docker-run-roaring32-fiber
- Run map32:
make docker-run-map32-fiber
- Run roaring32:
- Benchmarks
- Wrk:
make wrk -e IN_DOCKER=true
- Wrk2:
make wrk2 -e IN_DOCKER=true
- Wrk:
- Build: