Implements the Karnin-Lang-Liberty (KLL) algorithm in python
Python Gnuplot
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.


This code implements the simplest algorithm described in the paper Optimal Quantile Approximation in Streams (FOCS 2016) by Zohar Karnin, Kevin Lang and myself (Edo Liberty).

  • It is distributed free and with no warranty of any kind.
  • It should NOT be used for any commercial purposes.
  • It is written as an academic tool for readers of the paper to reproduce our results.
  • It is not optimized in any way.


Main class in

$ python -h

usage: [-h] [-k K] [-t {string,int,float}]
	optional arguments:
      -h, --help            show this help message and exit
      -k K                  controls the number of elements in the sketch which is
                            at most 3k+log2(n). n is the length of the stream.
      -t {string,int,float}
                            defines the type of stream items, default="string"

For convenience, the class StreamMaker is also included.

$ python -h

usage: [-h] [-n N] [-o {sorted,zoomin,zoomout,sqrt,random}]
optional arguments:
    -h, --help            show this help message and exit
	-n N                  the number of generated elements
	-o {sorted,zoomin,zoomout,sqrt,random}
                   		  the order of the streamed integers.


Try the following:

python -n 1000 -o zoomin | python -k 32 -t int

Or, if you want to save the files and look at the input-output do this:

python -n 1000 -o zoomin > numbers.txt
cat numbers.txt | python -k 32 -t int > cdf.csv

You should get an approximate CDF of the input stream. The file cdf.csv should look something like this:

0.016000	7
0.032000	23
0.048000	39
0.064000	55
0.080000	72
0.936000	923
0.952000	940
0.968000	956
0.984000	973
1.000000	989 

You can also try something (admittedly odd) like this:

man grep | python

This will give approximate quantiles of the lines in the grep man page where the order of lines is lexicographic.


This requires having gnuplot installed. On macs you can install like this:

brew install gnuplot

Assume you created a cdf.csv file like this:

python -n 1000 -o random > numbers.txt
cat numbers.txt | python -k 32 -t int > cdf.csv

You can than plot the approximate quantiles the sketch holds. A single dot is plotted for each item stored by the sketch.

cat cdf.csv | gnuplot > plot.eps; open plot.eps

You can then increase k and see that the plot aligns better along a straight diagonal line. Since the streams created by the stream maker are consecutive numbers 1,...,n a straight diagonal is the optimal result.

cat numbers.txt | python -k 16 -t int | gnuplot > plot16.eps; open plot16.eps
cat numbers.txt | python -k 32 -t int | gnuplot > plot32.eps; open plot32.eps
cat numbers.txt | python -k 64 -t int | gnuplot > plot64.eps; open plot64.eps

Keep in mind that larger values of k also increase the size of the sketch.