Python Implementation of Super and Hyper Log Log Sketches
This branch is 36 commits ahead of goncalvesnelson:master.
hyperloglog encode unicode to utf8 before submitting to sha1 May 8, 2018
.gitignore First Commit Apr 15, 2012
.travis.yml Fix test setup Jul 4, 2016
LICENSE Add LICENSE file May 4, 2013
README.rst Added some instructions Dec 10, 2013
setup.py Bump version May 8, 2018

README.rst

Python implementation of the HyperLogLog and Sliding HyperLogLog cardinality counter algorithms.

Installation:

Use pip install hyperloglog to install from PyPI.

Usage:

import hyperloglog
hll = hyperloglog.HyperLogLog(0.01)  # accept 1% counting error
hll.add("hello")
print(len(hll))  # 1
hll.add("hello")
print(len(hll))  # 1, as a repeated item is not counted again
hll.add("hello again")
print(len(hll))  # 2
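To illustrate why adding the same item twice does not change the count, here is a minimal toy HyperLogLog sketch. It is not this package's implementation: the class name ToyHLL, the 64-bit truncation of the SHA-1 digest, and the default of 2**10 registers are all assumptions made for the example, and it omits the HLL++ bias correction. Each item is hashed, the low bits pick a register, and the register keeps the largest "rank" (position of the first 1-bit) seen so far, so a repeated item can never raise any register.

```python
import hashlib
import math


class ToyHLL:
    """Toy HyperLogLog for illustration only (hypothetical, not the library's class)."""

    def __init__(self, p=10):
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        # Standard alpha_m approximation, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value):
        digest = hashlib.sha1(value.encode("utf8")).digest()
        h = int.from_bytes(digest[:8], "big")   # assumption: use 64 bits of the hash
        idx = h & (self.m - 1)                  # low p bits select a register
        rest = h >> self.p                      # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


toy = ToyHLL()
toy.add("hello")
first = toy.count()
toy.add("hello")            # same hash, same register, same rank
assert toy.count() == first
print(round(first))         # 1
```

Because the sketch stores only per-register maxima, its memory use depends on the number of registers, not on the number of items added.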

If we add a further 1000 random strings (giving a total of 1002 strings), the count should land roughly within 1% of the true value. In this run it reports 1007, which is within +/- 10.02 of the true value.

# add 1000 random 30-char strings to hll
import random
import string
for _ in range(1000):
    hll.add("".join(random.choice(string.ascii_letters) for _ in range(30)))
print(len(hll))  # 1007 in the original run; the exact value varies with the random strings
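The error rate passed to the constructor determines how many registers the sketch needs. As a rough sketch based on the standard HyperLogLog result that the relative standard error is about 1.04 / sqrt(m) for m registers (reference 1), the snippet below computes the smallest power-of-two register count that meets a target error. The helper name registers_for_error is hypothetical, and whether this package sizes its registers in exactly this way is an assumption.

```python
import math


def registers_for_error(error_rate):
    """Return (p, m): the smallest m = 2**p with 1.04 / sqrt(m) <= error_rate.

    Hypothetical helper for illustration; not part of the hyperloglog package.
    """
    p = int(math.ceil(math.log2((1.04 / error_rate) ** 2)))
    return p, 1 << p


p, m = registers_for_error(0.01)
print(p, m)                  # 14 16384
print(1.04 / math.sqrt(m))   # 0.008125
```

Note that the 1% figure is a standard error, not a hard bound: individual estimates can occasionally fall outside it.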

Changes:

  • Added Sliding window HLL version
  • Added bias correction from HLL++

References:

  1. http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf
  2. http://hal.archives-ouvertes.fr/docs/00/46/53/13/PDF/sliding_HyperLogLog.pdf
  3. http://research.google.com/pubs/pub40671.html