A Bloom Filter Implementation in Go
This implementation includes:
- Standard Bloom Filter
- Counting Bloom Filter Standard Bloom Filter, except that it supports deletions, by keeping 8-bit counters in each slot. These counters are likely to overflow if we add too many elements.
- Scalable Bloom Filter This is a Bloom Filter, which can scale. Standard Bloom Filters will get too full at some time. If you keep adding elements any further, the False Positive Rate will go up. Scalable Bloom Filters get rid of this problem, by creating a new bloom filter to insert new elements. The newer bloom filters grow exponentially in size, and hence it will take exponentially more elements to become full. This is similar to how vectors grow.
You can create a new standard bloom filter like this:
bf := NewBloomFilter(numHashFuncs, bloomFilterSize int)Where,
numHashFuncsis the number of hash functions to use in the filter.
bloomFilterSizeis the size of the bloom filter.
Adding a new element is as simple as:
elementByteSliceis a representation of the element in the byte slice format.
Similarily, you can check if the element is present in the Bloom Filter by doing this:
The Check method is absolutely correct when it returns
false. However, when it returns
true, it might be wrong with a probability
The scalable bloom filter is exactly the same, except it is created as follows:
sbf := NewScalableBloomFilter(numHashFuncs, firstBFSize, maxBloomFilters, growthFactor int, targetFPR float64)
firstBFSizeis the size of the first Bloom Filter which will be created.
maxBloomFiltersis the upper limit on the number of Bloom Filters to create
growthFactoris the rate at which the Bloom Filter size grows.
targetFPRis the maximum false positive rate allowed for each of the constituent bloom filters, after which a new Bloom Filter would be created and used
The counting bloom filter is similar, except it is created as follows:
cbf := NewCountingBloomFilter(numHashFuncs, cbfSize int)
The counting bloom filter also supports deletion:
I have written a couple of simple tests, which you can see in
bloomfilter_test.go, and run the tests by doing
- Scalable Bloom Filters - Paulo Sérgio Almeida, Carlos Baquero, Nuno Preguiça, David Hutchison
- Less Hashing, Same Performance: Building a Better Bloom Filter - Adam Kirsch and Michael Mitzenmacher
- Analysis for the false positive rate in Scalable Bloom Filters
Thanks to Aditya for suggesting the idea and pointers.