Skip to content
master
Go to file
Code
This branch is 36 commits ahead of goncalvesnelson:master.

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 
 
 
 
 

README.rst

Python implementation of the Hyper LogLog and Sliding Hyper LogLog cardinality counter algorithms.

Installation:

Use pip install hyperloglog to install from PyPI.

Usage:

import hyperloglog
hll = hyperloglog.HyperLogLog(0.01)  # accept 1% counting error
hll.add("hello")
print len(hll)  # 1
hll.add("hello")
print len(hll)  # 1 as items aren't added more than once
hll.add("hello again")
print len(hll)  # 2

If we add a further 1000 random strings (giving a total of 1002 strings) we'll have a count roughly within 1% of the true value, in this case it counts 1007 (within +/- 10.2 of the true value)

# add 1000 random 30 char strings to hll
import random
import string
[hll.add("".join([string.ascii_letters[random.randint(0, len(string.ascii_letters)-1)] for n in range(30)])) for m in range(1000)]
print len(hll)  # 1007

Changes:

  • Added Sliding window HLL version
  • Added bias correction from HLL++

References:

  1. http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf
  2. http://hal.archives-ouvertes.fr/docs/00/46/53/13/PDF/sliding_HyperLogLog.pdf
  3. http://research.google.com/pubs/pub40671.html

About

Python Implementation of Super and Hyper Log Log Sketches

Resources

License

Releases

No releases published

Languages

You can’t perform that action at this time.