Probabilistic Data Structures for Realtime Analytics (PyData 2013)


More and more applications now deal with massive data that needs to be processed in realtime. Computing platforms like Storm ease the development of realtime analytics applications, but they also increase the need for efficient algorithms that can run in a single pass over the data stream. In this talk, I'll give a brief overview of some interesting probabilistic data structures that can be used in this context: Bloom filter, Temporal Bloom filter, Count-Min Sketch and HyperLogLog.
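
To give a flavour of what such a structure looks like, here is a minimal Bloom filter sketch in pure Python. It is not the code from the talk or the notebook: the class name, the `capacity` and `error_rate` parameters, and the choice of deriving bit positions from a SHA-256 digest by double hashing are all assumptions made for illustration. The sizing uses the standard Bloom filter formulas for the number of bits and hash functions.

```python
import hashlib
import math


class BloomFilter:
    """Approximate set membership: no false negatives, tunable false positives.

    Illustrative sketch only; names and hashing scheme are assumptions.
    """

    def __init__(self, capacity, error_rate=0.01):
        # Standard sizing for n expected items and target false-positive rate p:
        #   m = -n * ln(p) / (ln 2)^2 bits,  k = (m / n) * ln 2 hash functions
        self.num_bits = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from two 64-bit halves of one digest (double hashing).
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        # Set every bit selected for this item; a single pass, no item is stored.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly seen", False means "definitely not seen".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Hypothetical usage: deduplicate URLs seen on a stream.
seen = BloomFilter(capacity=1000, error_rate=0.01)
for url in ["/a", "/b", "/a"]:
    if url not in seen:
        seen.add(url)
```

The memory used depends only on `capacity` and `error_rate`, not on the size of the items, which is what makes this kind of structure attractive for single-pass stream processing.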