Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

false_positive_probability not holding for large bloomfilter #7

Open
rhettlunn opened this issue Dec 7, 2020 · 1 comment
Open

Comments

@rhettlunn
Copy link

I was expecting the False Positive rate to stay below the level the bloomfilter was initialized with until the bloomfilter reaches capacity, but instead it vastly exceeds it.

Create a bloomfilter with 400 million capacity, FPP of 0.001, put 100 million things into the bloomfilter, then test 1 million things that weren't put into the bloomfilter.

I expected to find 1,000 or less false positives, but instead I find 23,242.

iex(1)> b = Blex.new(400000000, 0.001)
%Blex{
  a: #Reference<0.3778163223.2071855107.97792>,
  b: 30,
  hash_fn: #Function<4.114810444/2 in Blex.get_hash_fn/1>,
  hash_id: 202,
  k: 10,
  m: 1073741824
}
iex(2)> 1..100000000 |> Enum.map(fn x -> Blex.put(b, x) end)
[:ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok,
 :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok,
 :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok, :ok,
 :ok, :ok, ...]
iex(3)> 101000001..102000000 |> Enum.map(fn x -> Blex.member?(b, x) end) |> Enum.filter(fn x -> x end) |> Enum.count
23242

I get the exact same false positive count when I initialize the bloomfilter with a capacity of 1.6 billion.

While experimenting I also tried creating a bloomfilter with capacity of 3.2 billion, but that resulted in a very different error that I'll open a separate issue for.

@wantedru
Copy link

I have the same issue. I use in production bloomfilter created with b = Blex.new(300000000, 0.0001) and containing 250 million things. Just tested 1 million things and received 56784 false positives.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants