formatted README with textile

commit d65c83ccc3d00f76790e717e72bd250b0453b729 1 parent de11e68
@josephruscio authored
Showing with 60 additions and 19 deletions.
  1. +60 −19 README → README.textile
79 README → README.textile
@@ -1,13 +1,20 @@
+h1. Aggregate
+
+By Joseph Ruscio
+
Aggregate is an intuitive ruby implementation of a statistics aggregator
including both default and configurable histogram support. It does this
without recording/storing any of the actual sample values, making it
suitable for tracking statistics across millions/billions of samples
without any impact on performance or memory footprint. Originally
-inspired by the Aggregate support in SystemTap (http://sourceware.org/systemtap/)
+inspired by the Aggregate support in "SystemTap.":http://sourceware.org/systemtap
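
To illustrate how statistics can be tracked without storing samples, here is a minimal sketch that keeps only a count, a running sum, a running sum of squares, and the extremes. The class name @RunningStats@ and its internals are hypothetical and are not taken from the gem's source; the population form of the standard deviation is also an assumption.

```ruby
# Hypothetical illustration only; not the Aggregate gem's actual implementation.
class RunningStats
  attr_reader :count, :min, :max

  def initialize
    @count = 0
    @sum = 0.0
    @sum_sq = 0.0
    @min = nil
    @max = nil
  end

  # Fold one sample into the running totals; the sample itself is not kept.
  def <<(sample)
    @count += 1
    @sum += sample
    @sum_sq += sample * sample
    @min = sample if @min.nil? || sample < @min
    @max = sample if @max.nil? || sample > @max
    self
  end

  def mean
    @sum / @count
  end

  # Population standard deviation derived from the running totals (assumed form).
  def std_dev
    variance = (@sum_sq / @count) - mean**2
    Math.sqrt([variance, 0.0].max) # clamp tiny negative values caused by rounding
  end
end

stats = RunningStats.new
[2, 4, 4, 4, 5, 5, 7, 9].each { |sample| stats << sample }
puts stats.mean
puts stats.std_dev
```

Memory use stays constant no matter how many samples are added, which is the property the paragraph above describes.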
+
+h2. Getting Started
Aggregates are easy to instantiate, populate with sample data, and examine
statistics:
+<pre><code>
#After instantiation use the << operator to add a sample to the aggregate:
stats = Aggregate.new
@@ -30,14 +37,21 @@ stats.min
# The standard deviation
stats.std_dev
+</code></pre>
+
+h2. Histograms
Perhaps more importantly than the basic aggregate statistics detailed above
-Aggregate also maintains a histogram of samples. Good explanation of why
-its important: http://37signals.com/svn/posts/1836-the-problem-with-averages
+Aggregate also maintains a histogram of samples. For anything other than
+normally distributed data, averages are insufficient at best and often downright misleading.
+37Signals recently posted a terse but effective "explanation":http://37signals.com/svn/posts/1836-the-problem-with-averages of the importance of histograms.
+Aggregate maintains its histogram internally as a set of "buckets".
+Each bucket represents a range of possible sample values. The set of all buckets
+represents the range of "normal" sample values.
-The histogram is maintained as a set of "buckets". Each bucket represents a
-range of possible sample values. The set of all buckets represents the range
-of "normal" sample values. By default this is a binary histogram, where
+h3. Binary Histograms
+
+Without any configuration, an Aggregate instance maintains a binary histogram, where
each bucket represents a range twice as large as the preceding bucket i.e.
[1,1], [2,3], [4,5,6,7], [8,9,10,11,12,13,14,15]. The default binary histogram
provides for 128 buckets, theoretically covering the range [1, (2^127) - 1]
@@ -50,6 +64,8 @@ fall into some bucket. After using binary histograms to determine
the coarse-grained characteristics of your sample space you can
configure a linear histogram to examine it in closer detail.
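
As a rough sketch of the binary bucketing described above, the following maps a positive integer to the bucket whose range doubles at each step. The helper name @binary_bucket_range@ is hypothetical, and the use of @Integer#bit_length@ (Ruby 2.1+) is an assumption made for this illustration, not a description of how the gem computes its buckets.

```ruby
# Hypothetical helper, not part of the Aggregate gem.
# Returns [index, low, high] for the binary bucket that contains value,
# where bucket 0 is [1,1], bucket 1 is [2,3], bucket 2 is [4,7], and so on.
def binary_bucket_range(value)
  unless value.is_a?(Integer) && value >= 1
    raise ArgumentError, "value must be a positive integer"
  end

  index = value.bit_length - 1   # floor(log2(value)) using integer math
  low = 1 << index               # smallest value in the bucket
  high = (1 << (index + 1)) - 1  # largest value in the bucket
  [index, low, high]
end

p binary_bucket_range(1)
p binary_bucket_range(3)
p binary_bucket_range(12)
```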
+h3. Linear Histograms
+
Linear histograms are specified with the three values low, high, and width.
Low and high specify a range [low, high) of values included in the
histogram (all others are outliers). Width specifies the number of
@@ -57,24 +73,35 @@ values represented by each bucket and therefore the number of
buckets i.e. granularity of the histogram. The histogram range
(high - low) must be a multiple of width:
+<pre><code>
#Want to track aggregate stats on response times in ms
response_stats = Aggregate.new(0, 2000, 50)
+</code></pre>
The example above creates a linear histogram that tracks the
response times from 0 ms to 2000 ms in buckets of width 50 ms. Hopefully
-most of your samples fall in the first couple buckets! Any values added to the
-aggregate that fall outside of the histogram range are recorded as outliers:
+most of your samples fall in the first couple buckets!
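
The mapping from a sample to a linear bucket can be sketched as follows. The helper @linear_bucket@ and its return values are hypothetical, chosen only to illustrate the [low, high) range and the width rule stated above; the gem's internals may differ.

```ruby
# Hypothetical helper, not part of the Aggregate gem.
# Returns [index, bucket_low, bucket_high) for an in-range value, or a symbol
# naming the kind of outlier when the value falls outside [low, high).
def linear_bucket(value, low, high, width)
  unless ((high - low) % width).zero?
    raise ArgumentError, "range (high - low) must be a multiple of width"
  end

  return :outlier_low if value < low
  return :outlier_high if value >= high # high itself is excluded

  index = ((value - low) / width).floor
  bucket_low = low + index * width
  [index, bucket_low, bucket_low + width]
end

p linear_bucket(120, 0, 2000, 50)
p linear_bucket(2000, 0, 2000, 50)
```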
+
+h3. Histogram Outliers
+An Aggregate records any samples that fall outside the histogram range as
+outliers:
+
+<pre><code>
# Number of samples that fall below the normal range
stats.outliers_low
# Number of samples that fall above the normal range
stats.outliers_high
+</code></pre>
+
+h3. Histogram Iterators
Once a histogram is populated Aggregate provides iterator support for
examining the contents of buckets. The iterators provide both the
number of samples in the bucket, as well as its range:
+<pre><code>
#Examine every bucket
@stats.each do |bucket, count|
end
@@ -82,21 +109,29 @@ end
#Examine only buckets containing samples
@stats.each_nonzero do |bucket, count|
end
+</code></pre>
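
For readers who want to see what such iterators might look like, here is a small self-contained sketch. The class @TinyLinearHistogram@ is hypothetical, and yielding the lower bound of each bucket as the first block argument is an assumption; the gem may represent a bucket differently.

```ruby
# Hypothetical illustration only; not the Aggregate gem's actual implementation.
class TinyLinearHistogram
  def initialize(low, high, width)
    @low = low
    @width = width
    @counts = Array.new((high - low) / width, 0)
  end

  # Count the sample if it lands in a bucket; out-of-range samples are ignored here.
  def <<(value)
    index = (value - @low) / @width
    @counts[index] += 1 if index >= 0 && index < @counts.size
    self
  end

  # Yield every bucket's lower bound and its sample count.
  def each
    @counts.each_with_index do |count, index|
      yield(@low + index * @width, count)
    end
  end

  # Yield only the buckets that contain at least one sample.
  def each_nonzero
    each { |bucket, count| yield(bucket, count) if count > 0 }
  end
end

hist = TinyLinearHistogram.new(0, 200, 50)
[10, 20, 160].each { |value| hist << value }

nonzero = []
hist.each_nonzero { |bucket, count| nonzero << [bucket, count] }
p nonzero
```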
-Finally Aggregate contains sophisticated pretty-printing support that for
-any given number of columns >= 80 (defaults to 80) and sample distribution
-properly sets a marker weight based on the samples per bucket and aligns all
-output. Empty buckets are skipped to conserve screen space.
+h3. Histogram Bar Chart
+Finally Aggregate contains sophisticated pretty-printing support to generate
+ASCII bar charts. For any given number of columns >= 80 (defaults to 80) and
+sample distribution the <code>to_s</code> method properly sets a marker weight based on the
+samples per bucket and aligns all output. Empty buckets are skipped to conserve
+screen space.
+
+<pre><code>
# Generate and display an 80 column histogram
puts stats.to_s
# Generate and display a 120 column histogram
puts stats.to_s(120)
+</code></pre>
-The following code populates both a binary and linear histogram with the same
-set of 65536 values generated by rand to produce two histograms:
+This code example populates both a binary and linear histogram with the same
+set of 65536 values generated by <code>rand</code> to produce the
+two histograms that follow it:
+<pre><code>
require 'rubygems'
require 'aggregate'
@@ -112,9 +147,11 @@ end
puts binary_aggregate.to_s
puts linear_aggregate.to_s
+</code></pre>
+
+h4. Binary Histogram
-** OUTPUT **
-** Binary Histogram**
+<pre><code>
value |------------------------------------------------------------------| count
1 | | 3
2 | | 1
@@ -134,8 +171,11 @@ value |------------------------------------------------------------------| count
32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| 32961
~
Total |------------------------------------------------------------------| 65535
+</code></pre>
-** Linear (0, 65536, 4096) Histogram **
+h4. Linear (0, 65536, 4096) Histogram
+
+<pre><code>
value |------------------------------------------------------------------| count
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4094
4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| 4202
@@ -154,11 +194,12 @@ value |------------------------------------------------------------------| count
57344 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4135
61440 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4144
Total |------------------------------------------------------------------| 65532
-
+</code></pre>
We can see from these histograms that Ruby's rand function does a relatively good
job of distributing returned values in the requested range.
-** NOTES **
+h2. NOTES
+
Ruby doesn't have a log2 function built into Math, so we approximate with
log(x)/log(2). Theoretically log( 2^n - 1 )/ log(2) == n-1. Unfortunately due
to precision limitations, once n reaches a certain size (somewhere > 32)
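
The precision concern in this note can be explored with a short experiment. This is a sketch under stated assumptions: the helper names are hypothetical, @Integer#bit_length@ requires Ruby 2.1 or later, and the exponents at which the two approaches disagree depend on the platform's floating point behaviour, so none are listed here.

```ruby
# Hypothetical helpers for illustration; not taken from the Aggregate gem.

# Floating point approximation: floor(log(x) / log(2)).
def float_log2_floor(x)
  (Math.log(x) / Math.log(2)).floor
end

# Exact integer alternative: the position of the highest set bit.
def exact_log2_floor(x)
  x.bit_length - 1
end

# Collect every exponent n where the approximation disagrees with the exact
# result for x = 2^n - 1, whose floor(log2) is n - 1.
mismatches = (1..64).reject do |n|
  x = 2**n - 1
  float_log2_floor(x) == exact_log2_floor(x)
end

puts "exponents where the float approximation is off: #{mismatches.inspect}"
```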