Skip to content
This repository has been archived by the owner on Oct 29, 2023. It is now read-only.

mceachen/bloomer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

31 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Bloomer: Bloom filters with elastic

Gem Version Build Status

Bloom filters are great for quickly checking to see if a given string has been seen before--in constant time, and using a fixed amount of RAM, as long as you know the expected number of elements up front. If you add more than capacity elements to the filter, accuracy for include? will drop below false_positive_probability.

Scalable Bloom Filters maintain a maximal false_positive_probability by using additional RAM as needed.

Bloomer is a Bloom Filter. Bloomer::Scalable is a Scalable Bloom Filter.

Keep in mind that false positives with Bloom filters are expected, with a specified probability rate. False negatives, however, are not. In other words,

  • if include? returns false, that string has certainly not been added
  • if include? returns true, it might mean that string was added (depending on the false_positive_probability parameter provided to the constructor).

This implementation is unique in that Bloomer

  • supports scalable bloom filters (SBF)
  • uses triple hash chains (see the paper)
  • can marshal state quickly
  • has rigorous tests
  • is pure ruby
  • does not require EM or Redis or something else unrelated to simply implementing a bloom filter

Usage

expected_size = 10_000
false_positive_probability = 0.01
b = Bloomer.new(expected_size, false_positive_probability)
b.add "cat"
b.include? "cat"
#=> true
bf.include? "dog"
#=> false

Scalable Bloom filters uses the same API:

b = Bloomer::Scalable.new
b.add "boom"
b.include? "boom"
#=> true
bf.include? "badda"
#=> false

Serialization can be done using MessagePack:

Notice, you'll need to require bloomer/msgpackable to enable serialization.

require 'bloomer/msgpackable'
b = Bloomer.new(10)
b.add("a")
s = b.to_msgpack
new_b = Bloomer.from_msgpack(s)
new_b.include? "a"
#=> true

The original class will be preserved regardless of calling Bloomer.from_msgpack(s) or Bloomer::Scalable.from_msgpack(s):

require 'bloomer/msgpackable'
b = Bloomer::Scalable.new
b.add("a")
s = b.to_msgpack
new_b = Bloomer.from_msgpack(s)
new_b.class == Bloomer::Scalable
#=> true

Changelog

1.0.0

  • Using msgpack for more secure deserialization. Marshal.load still works but is not recommended

0.0.5

  • Switched from rspec to minitest

0.0.4

  • Fixed gem packaging

0.0.3

  • Added support for scalable bloom filters (SBF)

0.0.2

  • Switch to triple-hash chaining (simpler, faster, and better false-positive rate)

0.0.1

  • Bloom, there it is.

About

A scalable ruby Bloom filter

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages