-
Notifications
You must be signed in to change notification settings - Fork 288
Description
Inconsistency Between Redis Documentation and Referenced Paper Regarding Cuckoo Filter Load / Fill Rates
Page: https://redis.io/docs/latest/develop/data-types/probabilistic/cuckoo-filter/
I noticed an inconsistency in the documentation for cuckoo filters when comparing the Redis documentation with the research paper linked on the same page.
The linked paper:
Cuckoo Filter: Practically Better Than Bloom
(https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf) states the following:
"With k = 2 hash functions, the load factor α is 50% when the bucket size b = 1 (i.e., the hash table is directly mapped), but increases to 84%, 95% or 98% respectively using bucket size b = 2, 4 or 8."
However, the Redis documentation under “Choosing the Bucket Size”
(https://redis.io/docs/latest/develop/data-types/probabilistic/cuckoo-filter/#choosing-the-bucket-size-bucketsize) states:
"When bucket size of 1 is used the fill rate is 55% and false positive error rate is 2/256 ≈ 0.78% which is the minimal false positive rate you can achieve. Larger buckets increase the error rate linearly but improve the fill rate of the filter. For example, a bucket size of 3 yields a 2.34% error rate and an 80% fill rate. Bucket size of 4 yields a 3.12% error rate and a 95% fill rate."
Question
Could you clarify which values are correct, or whether the Redis implementation deviates intentionally from the results in the paper (e.g., due to differences in assumptions, parameterization, fingerprint size, or implementation-specific adjustments)?