-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
False positives #31
Comments
Interesting. The only way I was able to fix this was by using k independent hash functions, instead of the current version which uses 2 independent hash functions
As far as I know this is common practice, and the idea is mentioned in Building a Better Bloom Filter. I'm guessing that this optimisation has some effect on the false positive rate as the k hash functions are not completely independent. |
Perhaps @Gopiandcode would be interested in this. He is the author of Bloom filters debunked: Dispelling 30 Years of bad math with Coq! |
I'd suggest you try the benchmarking using a CSRNG from |
I'm getting a much higher false positive rate than I would expect from a bloom filter of the size that I'm using
I'm using a 1024-bit bloomfilter with 16 hashes and 20 elements in each filter.
I'm running a test which adds 20 elements to a filter, checking before adding each that the item isn't already in the filter.
After running the test 500 times, there are ~4 collisions.
Given a bloom filter with those parameters, there should only be about a 1/1.3 billion chance of collision (https://hur.st/bloomfilter/?n=20&p=&m=1024&k=16)
Here's the short script:
The text was updated successfully, but these errors were encountered: