
formatted README with textile

1 parent de11e68 commit d65c83ccc3d00f76790e717e72bd250b0453b729 @josephruscio committed Sep 13, 2009
Showing with 60 additions and 19 deletions.
  1. +60 −19 README → README.textile
@@ -1,13 +1,20 @@
+h1. Aggregate
+
+By Joseph Ruscio
+
Aggregate is an intuitive ruby implementation of a statistics aggregator
including both default and configurable histogram support. It does this
without recording/storing any of the actual sample values, making it
suitable for tracking statistics across millions/billions of samples
without any impact on performance or memory footprint. Originally
-inspired by the Aggregate support in SystemTap (http://sourceware.org/systemtap/)
+inspired by the Aggregate support in "SystemTap.":http://sourceware.org/systemtap
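To illustrate how statistics can be tracked without storing samples, here is a minimal sketch that keeps only running totals. The class name @RunningStats@ and its internals are assumptions made for illustration and are not taken from the Aggregate source. It also assumes a population (not sample) standard deviation.

```ruby
# Hypothetical illustration; not the gem's actual implementation.
class RunningStats
  attr_reader :count, :min, :max

  def initialize
    @count = 0
    @sum = 0.0
    @sum_sq = 0.0
    @min = nil
    @max = nil
  end

  # Fold one sample into the running totals; the sample itself is not kept.
  def <<(value)
    @count += 1
    @sum += value
    @sum_sq += value * value
    @min = value if @min.nil? || value < @min
    @max = value if @max.nil? || value > @max
    self
  end

  def mean
    @count.zero? ? 0.0 : @sum / @count
  end

  # Population standard deviation derived from the running sums
  # (assumption: the gem may use a different estimator).
  def std_dev
    return 0.0 if @count.zero?
    variance = (@sum_sq / @count) - mean**2
    Math.sqrt([variance, 0.0].max)
  end
end

stats = RunningStats.new
[2, 4, 4, 4, 5, 5, 7, 9].each { |v| stats << v }
puts stats.mean
puts stats.std_dev
```

Memory use stays constant regardless of how many samples are added, since only a handful of numbers are retained. The sum-of-squares form can lose precision when the variance is tiny relative to the mean, so treat this as a sketch rather than a numerically robust implementation.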
+
+h2. Getting Started
Aggregates are easy to instantiate, populate with sample data, and examine
statistics:
+<pre><code>
#After instantiation use the << operator to add a sample to the aggregate:
stats = Aggregate.new
@@ -30,14 +37,21 @@ stats.min
# The standard deviation
stats.std_dev
+</code></pre>
+
+h2. Histograms
Perhaps more importantly than the basic aggregate statistics detailed above
-Aggregate also maintains a histogram of samples. Good explanation of why
-its important: http://37signals.com/svn/posts/1836-the-problem-with-averages
+Aggregate also maintains a histogram of samples. For anything other than
+normally distributed data, averages are insufficient at best and often downright misleading.
+37Signals recently posted a terse but effective "explanation":http://37signals.com/svn/posts/1836-the-problem-with-averages of the importance of histograms.
+Aggregate maintains its histogram internally as a set of "buckets".
+Each bucket represents a range of possible sample values. The set of all buckets
+represents the range of "normal" sample values.
-The histogram is maintained as a set of "buckets". Each bucket represents a
-range of possible sample values. The set of all buckets represents the range
-of "normal" sample values. By default this is a binary histogram, where
+h3. Binary Histograms
+
+Without any configuration, an Aggregate instance maintains a binary histogram, where
each bucket represents a range twice as large as the preceding bucket i.e.
[1,1], [2,3], [4,5,6,7], [8,9,10,11,12,13,14,15]. The default binary histogram
provides for 128 buckets, theoretically covering the range [1, (2^127) - 1]
@@ -50,53 +64,74 @@ fall into some bucket. After using binary histograms to determine
the coarse-grained characteristics of your sample space you can
configure a linear histogram to examine it in closer detail.
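The doubling bucket layout of a binary histogram can be sketched as follows. The helper names @binary_bucket_index@ and @binary_bucket_range@ are hypothetical, and the sketch uses @Integer#bit_length@ (available in Ruby 2.1 and later) instead of a floating-point logarithm. This is an assumption about one way to map values to buckets, not a description of how Aggregate does it internally.

```ruby
# Hypothetical helpers; the gem's internal method names may differ.

# Index of the binary bucket holding a positive integer value:
# bucket 0 is [1,1], bucket 1 is [2,3], bucket 2 is [4,7], and so on.
def binary_bucket_index(value)
  unless value.is_a?(Integer) && value >= 1
    raise ArgumentError, "value must be a positive integer"
  end
  value.bit_length - 1
end

# Inclusive [low, high] range covered by the bucket at the given index.
def binary_bucket_range(index)
  low = 1 << index
  [low, (low << 1) - 1]
end

[1, 2, 3, 4, 7, 8, 15].each do |v|
  i = binary_bucket_index(v)
  low, high = binary_bucket_range(i)
  puts "#{v} falls in bucket #{i} covering [#{low}, #{high}]"
end
```

Working purely with integers sidesteps any rounding concerns at bucket boundaries.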
+h3. Linear Histograms
+
Linear histograms are specified with the three values low, high, and width.
Low and high specify a range [low, high) of values included in the
histogram (all others are outliers). Width specifies the number of
values represented by each bucket and therefore the number of
buckets i.e. granularity of the histogram. The histogram range
(high - low) must be a multiple of width:
+<pre><code>
#Want to track aggregate stats on response times in ms
response_stats = Aggregate.new(0, 2000, 50)
+</code></pre>
The example above creates a linear histogram that tracks the
response times from 0 ms to 2000 ms in buckets of width 50 ms. Hopefully
-most of your samples fall in the first couple buckets! Any values added to the
-aggregate that fall outside of the histogram range are recorded as outliers:
+most of your samples fall in the first couple buckets!
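As a rough sketch of the arithmetic behind a linear histogram, the hypothetical helper below maps a sample to a bucket index for a given low, high, and width. Samples outside [low, high) are reported as outliers. The method name @linear_bucket@ and its symbol return values are assumptions for illustration and are not part of the Aggregate API.

```ruby
# Hypothetical helper; names and return values are illustrative only.
def linear_bucket(value, low, high, width)
  unless ((high - low) % width).zero?
    raise ArgumentError, "(high - low) must be a multiple of width"
  end
  return :outlier_low if value < low
  return :outlier_high if value >= high

  # Bucket i covers the half-open range [low + i*width, low + (i+1)*width).
  ((value - low) / width).floor
end

# Response times in ms, bucketed as in Aggregate.new(0, 2000, 50)
[0, 49, 50, 1999, 2000, -1].each do |ms|
  puts "#{ms} -> #{linear_bucket(ms, 0, 2000, 50).inspect}"
end
```

With low 0, high 2000, and width 50 this yields 40 buckets, indexed 0 through 39.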
+
+h3. Histogram Outliers
+An Aggregate records any samples that fall outside the histogram range as
+outliers:
+
+<pre><code>
# Number of samples that fall below the normal range
stats.outliers_low
# Number of samples that fall above the normal range
stats.outliers_high
+</code></pre>
+
+h3. Histogram Iterators
Once a histogram is populated Aggregate provides iterator support for
examining the contents of buckets. The iterators provide both the
number of samples in the bucket, as well as its range:
+<pre><code>
#Examine every bucket
@stats.each do |bucket, count|
end
#Examine only buckets containing samples
@stats.each_nonzero do |bucket, count|
end
+</code></pre>
-Finally Aggregate contains sophisticated pretty-printing support that for
-any given number of columns >= 80 (defaults to 80) and sample distribution
-properly sets a marker weight based on the samples per bucket and aligns all
-output. Empty buckets are skipped to conserve screen space.
+h3. Histogram Bar Chart
+Finally Aggregate contains sophisticated pretty-printing support to generate
+ASCII bar charts. For any given number of columns >= 80 (defaults to 80) and
+sample distribution the <code>to_s</code> method properly sets a marker weight based on the
+samples per bucket and aligns all output. Empty buckets are skipped to conserve
+screen space.
+
+<pre><code>
# Generate and display an 80 column histogram
puts stats.to_s
# Generate and display a 120 column histogram
puts stats.to_s(120)
+</code></pre>
-The following code populates both a binary and linear histogram with the same
-set of 65536 values generated by rand to produce two histograms:
+This code example populates both a binary and linear histogram with the same
+set of 65536 values generated by <code>rand</code> to produce the
+two histograms that follow it:
+<pre><code>
require 'rubygems'
require 'aggregate'
@@ -112,9 +147,11 @@ end
puts binary_aggregate.to_s
puts linear_aggregate.to_s
+</code></pre>
+
+h4. Binary Histogram
-** OUTPUT **
-** Binary Histogram**
+<pre><code>
value |------------------------------------------------------------------| count
1 | | 3
2 | | 1
@@ -134,8 +171,11 @@ value |------------------------------------------------------------------| count
32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| 32961
~
Total |------------------------------------------------------------------| 65535
+</code></pre>
-** Linear (0, 65536, 4096) Histogram **
+h4. Linear (0, 65536, 4096) Histogram
+
+<pre><code>
value |------------------------------------------------------------------| count
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4094
4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| 4202
@@ -154,11 +194,12 @@ value |------------------------------------------------------------------| count
57344 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4135
61440 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4144
Total |------------------------------------------------------------------| 65532
-
+</code></pre>
We can see from these histograms that Ruby's rand function does a relatively good
job of distributing returned values in the requested range.
-** NOTES **
+h2. NOTES
+
Ruby doesn't have a log2 function built into Math, so we approximate with
log(x)/log(2). Theoretically log( 2^n - 1 )/ log(2) == n-1. Unfortunately due
to precision limitations, once n reaches a certain size (somewhere > 32)
