Greplin Bloom Filter
An (optionally persistent) counting Bloom Filter implementation in Java.
This is a very early stage project. It works for our needs. We haven't verified it works beyond that. Issue reports and patches are very much appreciated!
For example, some obvious improvements include
Support for a variable number of count bits (including 1)
More efficient thread safety
More hash functions to choose between
More efficient disk persistence
git clone https://github.com/Greplin/greplin-bloom-filter.git cd greplin-bloom-filter mvn install
This is a counting bloom filter that uses 4-bits per bucket. Once the count exceeds 15 items in a bucket, decrements are no longer possible.
Instead of using N distinct hashes, we use two linear combinations of two runs of a repeated murmur hash per Kirch and Mitzenmacher.
// create a new bloom filter with the desired set of properties final int expectedNumberOfItems = 10000; final double desiredFalsePositiveRate = 0.00001; BloomFilter bf = BloomFilter.createOptimal("/tmp/bloom.dat", expectedNumberOfItems, desiredFalsePositiveRate, true); // test out the bloom filter bf.add("Hello World".getBytes()); System.out.println(bf.contains("Hello World".getBytes())); System.out.println(bf.contains("Foo Bar".getBytes())); // persist it to disk (note that it is only persisted to disk when you call flush) bf.flush() // try removing an item bf.remove("Hello World".getBytes()); System.out.println(bf.contains("Hello World".getBytes())); // close the bloom filter (which also persists any unflushed changes) bf.close();