A utility script to create a distribution and obtain percentile values from a file containing a series of <value,freq> pairs.

Reads each line from stdin, where each line is in one of two formats:
1. <to-be-ignored> <value>
2. <value> <freq>
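
As an illustration of the two formats, here is a minimal parsing sketch. It is not taken from distribution.rb: the helper name `parse_line` and its `value_only` and `sep` parameters are hypothetical, and it assumes that a format 1 line counts as a single occurrence of its value (frequency 1).

```ruby
# Hypothetical helper, not part of distribution.rb.
# Returns [value, freq] for one input line.
#   value_only: true  -> format 1: "<to-be-ignored> <value>"
#                        (assumption: each such line has frequency 1)
#   value_only: false -> format 2: "<value> <freq>"
def parse_line(line, value_only: false, sep: nil)
  # sep = nil makes String#split break on runs of whitespace.
  fields = line.chomp.split(sep)
  raise ArgumentError, "expected 2 fields: #{line.inspect}" unless fields.size == 2

  if value_only
    [Float(fields[1]), 1]
  else
    [Float(fields[0]), Integer(fields[1])]
  end
end
```

Passing `sep: "\t"` would correspond to a tab-separated file; with the default, any whitespace separates the two columns.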

This script outputs the distribution and various statistics for a group of such lines on stdout, as a set of comma-separated lines. It has been tested on hundreds of millions of lines, which at the time of this writing takes a few minutes on my laptop.
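
To make the idea of percentiles over <value,freq> pairs concrete, here is a small sketch that uses the nearest-rank method. This is an assumption chosen for illustration; distribution.rb may compute percentiles differently. The helper name `percentile` and the sample data are hypothetical.

```ruby
# Hypothetical helper: nearest-rank percentile over a {value => freq} table.
# Not taken from distribution.rb; the script's own method may differ.
def percentile(freqs, p)
  raise ArgumentError, "p must be in (0, 100]" unless p > 0 && p <= 100
  total = freqs.values.sum
  raise ArgumentError, "empty distribution" if total.zero?

  # 1-based rank of the smallest value covering p percent of all observations.
  rank = (p * total / 100.0).ceil
  seen = 0
  freqs.sort.each do |value, freq|
    seen += freq
    return value if seen >= rank
  end
end

# Build the frequency table from made-up <value, freq> pairs.
freqs = Hash.new(0)
[[1, 2], [2, 5], [10, 3]].each { |value, freq| freqs[value] += freq }

# Print one comma-separated "percentile,value" line per requested percentile.
[50, 90, 100].each { |p| puts "#{p},#{percentile(freqs, p)}" }
```

Because only the frequency table is kept in memory, not the individual observations, this approach needs space proportional to the number of distinct values rather than the number of input lines.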

Example invocation:

# out_edges.txt: a file containing lines of format 1 above, where the first column is ignored.
# The '-v' option selects that format. To use a tab as the separator, type Ctrl-v followed by
# the <tab> key in the shell.
distribution.rb -v -t'   ' -p percentiles.txt < out_edges.txt