Skip to content

preytaren/fastbloom

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

34 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Fastbloom

A lightweight but fast Bloomfilter written in Python(2.7).

Introduction

Based on mmap and MurMur, Spooky hash functions as the base hashes. Use double hashing to reduce the number of hashes to two.

Requirements

Install boost and boost-python

  • Ubuntu

sudo apt-get install libboost-all-dev

  • Centos

sudo yum install boost-devel

  • OSX

brew install boost boost-python

Install pyhash

sudo pip install pyhash

Install fastbloom

sudo pip install fastbloom

Examples

>>> from fastbloom import BloomFilter
>>> filter_ = BloomFilter(10000, 0.001) # set size of input 1000, error rate 0.1%
>>> filter_.add('www.google.com')
>>> 'www.google.com' in filter_
True
>>> 'www.github.com' in filter_
False

Benchmark

When input_size set to 1000000, accepted_error_rate set to 0.01%

  • Memory consumption: 4 MB
  • Add operation: 7322.42 ops
  • Check operation: 24788.79 ops
  • Actual fault positive rate: 0.0000

You can just run the benchmark.py to see the actual benchmarks

Todo

  • add a faster bloomfilter
  • add save and restore functions(as files)
  • write a scalable bloomfilter

About

A simple but fast bloomfilter written in Python

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages