False positives #31

dholms · 2020-08-11T01:03:03Z

I'm getting a much higher false positive rate than I would expect from a bloom filter of the size that I'm using

I'm using a 1024-bit bloomfilter with 16 hashes and 20 elements in each filter.

I'm running a test which adds 20 elements to a filter, checking before adding each that the item isn't already in the filter.

After running the test 500 times, there are ~4 collisions.

Given a bloom filter with those parameters, there should only be about a 1/1.3 billion chance of collision (https://hur.st/bloomfilter/?n=20&p=&m=1024&k=16)

Here's the short script:

    let collisions = 0
    for(let i=0; i< 500; i++) {
      const filter = new BloomFilter(1024, 16)
      const dict = {}
      for(let j=0; j< 20; j++) {
        const str = Math.floor(Math.random() * 1000000000).toString(16)
        if(filter.test(str) && dict[str] !== true){
          console.log("COLLISION: ", str)
          collisions++
        }
        filter.add(str)
        dict[str] = true
      }
      console.log(i)
    }
    console.log('done: ', collisions)

jasondavies · 2020-08-11T12:25:36Z

Interesting. The only way I was able to fix this was by using k independent hash functions, instead of the current version which uses 2 independent hash functions a and b internally, and combines them to generate k hash functions using:

h_i(d) = a(d) + i * b(d)

As far as I know this is common practice, and the idea is mentioned in Building a Better Bloom Filter. I'm guessing that this optimisation has some effect on the false positive rate as the k hash functions are not completely independent.

jasondavies · 2020-08-11T13:03:04Z

Perhaps @Gopiandcode would be interested in this. He is the author of Bloom filters debunked: Dispelling 30 Years of bad math with Coq!

nponeccop · 2023-03-24T13:33:20Z

I'd suggest you try the benchmarking using a CSRNG from crypto instead of Math.random(). With such complex bit patterns, the randomness provided by simple RNGs is often not enough.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

False positives #31

False positives #31

dholms commented Aug 11, 2020

jasondavies commented Aug 11, 2020

jasondavies commented Aug 11, 2020

nponeccop commented Mar 24, 2023

False positives #31

False positives #31

Comments

dholms commented Aug 11, 2020

jasondavies commented Aug 11, 2020

jasondavies commented Aug 11, 2020

nponeccop commented Mar 24, 2023