Skip to content

instaclick/php-bloom-filter

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

37 Commits
 
 
 
 
 
 
 
 

Repository files navigation

###Blog http://www.xuetech.com/search/label/Bloom%20Filter

###Summary This is a proper bloom filter implementation in PHP. I could not find an space/time efficient open-source PHP one. It uses a binary string to store the bit vector and manipulates based on byte indexes of the string.

###Performance On a 3.8GHz Sandy Bridge system, the single threaded insert/lookup throughput is 100K/150K elements / second respectively with k = 6.

###Math Bits

  • m vector bits

  • k hash functions

  • n elements

  • p probability of false positive

  • (1-(1-1/m)^(mln(2)))^(mln(2)/n)=p

  • k = m*ln(2)/n;

###Notes and Limitations

  • PHP Limitations

    • Strings are limited to byte addressing of signed 32 bit integers. The maximum string is only 2GB - 1B (2^31-1 Bytes).
    • The bit vector only supports powers of 2 bits in this implementation. Thus the largest vector size is 1GB.
    • Workaround with multiple strings could allow for implementations greater than 1GB.
    • Minor edits are required to support PHP 5.3 due to the use of array dereferencing features of PHP 5.4.
    • PHP lacks calloc or malloc so str_repeat is used to allocate the bit array.
    • PHP cannot directly use the output of str_repeat and primitive assignment will require double the memory of the vector size.
  • MurmurHash is a hot topic in this realm. It isn't implemented here though due to module dependency for fast implementation.

About

clean and efficient bloom filter using native datatypes in pure PHP

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • PHP 100.0%