A utility script to create a distribution and obtain percentile values from a file containing a series of <value,freq> pairs.
Reads each line from stdin, where each line is in one of two formats:
1. <to-be-ignored> <value>
2. <value> <freq>
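
As a rough illustration of how the two formats differ, here is a minimal sketch of folding either kind of line into a value => frequency table. It is not taken from distribution.rb: the helper name `build_freq_table` and its `value_only` and `sep` parameters are hypothetical, and it assumes that in format 1 each line counts as one occurrence of its value.

```ruby
# Hypothetical helper for illustration only; not part of distribution.rb.
# Builds a value => frequency table from an enumerable of input lines.
def build_freq_table(lines, value_only: false, sep: /\s+/)
  table = Hash.new(0)
  lines.each do |line|
    fields = line.strip.split(sep)
    next if fields.size < 2 # skip blank or malformed lines

    if value_only
      # Format 1: <to-be-ignored> <value>; assume each line counts once.
      table[Float(fields[1])] += 1
    else
      # Format 2: <value> <freq>
      table[Float(fields[0])] += Integer(fields[1])
    end
  end
  table
end
```

Under these assumptions, `build_freq_table($stdin.each_line, value_only: true)` would handle format 1 and the default call would handle format 2.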

This script writes the distribution and various statistics for a group of such lines to stdout as comma-separated lines. It has been tested on hundreds of millions of lines (which, at the time of writing, took a few minutes on my laptop).
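
For readers unfamiliar with computing percentiles from <value,freq> pairs, the sketch below shows one common approach, the nearest-rank method, applied to a value => frequency hash. This is an assumption for illustration: the script itself may use a different percentile definition, and the function name `percentile` is hypothetical.

```ruby
# Hypothetical nearest-rank percentile over a value => frequency hash.
# Illustrative only; distribution.rb may define percentiles differently.
def percentile(table, pct)
  total = table.values.sum
  return nil if total.zero?

  # Nearest rank: the smallest rank whose cumulative share is >= pct.
  rank = [(pct / 100.0 * total).ceil, 1].max
  cumulative = 0
  table.sort.each do |value, freq|
    cumulative += freq
    return value if cumulative >= rank
  end
end
```

Because only the distinct values are sorted, the cost depends on the number of distinct values rather than the number of input lines.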

Example invocation:

#out_edges.txt: a file containing lines in format 1 above, where the first column is ignored. We use the '-v'
#option for that format. Also, if you want the separator to be a tab, type Ctrl-v followed by the <tab> key in the shell.
distribution.rb -v -t'   ' -p percentiles.txt < out_edges.txt