A compressed bitmap class in C++.
|doxyconfig.txt||Minor fixes to the documentation.|
|example.cpp||Preparing new version|
|example2.cpp||Solve typo that would prevent a clean compile.|
|makefile||Preparing new release|
Compressed bitset in C++ Daniel Lemire == What is this? == The class EWAHBoolArray is a compressed bitset data structure. It also provides a basic BoolArray class which can serve as a traditional bitmap. This library is used by database and information retrieval engines such as Hustle -- A column oriented, embarrassingly distributed relational event database (see https://github.com/chango/hustle). == Licensing == Apache License 2.0. (Other licenses are possible.) Update (May 20th, 2013): though by default I use the Apache License 2.0 (which is compatible with GPL 3.0), you can *also* consider this library licensed under GPL 2.0. == Limitations == Because of the compression type being used, you must set the bits in increasing order (no random access while creating the bitmaps) when using EWAHBoolArray. == Dependencies == None. (Will work under MacOS, Windows or Linux.) Compilers tested: clang++, g++, Intel compiler, Microsoft Visual Studio == Usage == make ./unit make example ./example == Example == Please see example.cpp. For an example with tabular data, please see example2.cpp. == Ruby wrapper == Josh Ferguson wrote a wrapper for Ruby. The implementation is packaged and installable as a ruby gem. You can install it by typing: gem install ewah-bitset == Further reading == Please see Daniel Lemire, Owen Kaser, Kamel Aouiche, Sorting improves word-aligned bitmap indexes. Data & Knowledge Engineering 69 (1), pages 3-28, 2010. http://arxiv.org/abs/0901.3751 == Warning == Run your own unit tests. Report bugs. == Persistent storage == If you save the bitmap to disk using "write" and then try to read it again with "read" on a different machine, it *may* crash. The file format is specific to the machine you are using. E.g., using two 64-bit Windows PC might work, but if you send the same data to a 32-bit Linux box it may fail. You can get better persistence with the readBuffer and writeBuffer methods. They will be consistent across machines except for bit endianess: you may need to manually check bit endianess. Moreover, these methods require you to save some metadata (i.e., buffersize and sizeinbits) yourself in some portable way. For example: For saving in a persistent way: size_t sb = mybitmap.sizeInBits(); // save sb somewhere safe, your responsability! size_t bs = mybitmap.bufferSize(); // save bs somewhere safe, your responsability! mybitmap.writeBuffer(out); // this writes the internal buffer of the bitmap For loading: mybitmap.readBuffer(in,bs); // recover the saved internal buffer, you are responsable for endianess mybitmap.setSizeInBits(sb); // you have to do this to get a proper bitmap size