Aggregate is a ruby implementation of a statistics aggregator including histogram support



By Joseph Ruscio

Aggregate is an intuitive Ruby implementation of a statistics aggregator
including both default and configurable histogram support. It does this
without recording or storing any of the actual sample values, making it
suitable for tracking statistics across millions or billions of samples
without a meaningful impact on performance or memory footprint. It was
originally inspired by the Aggregate support in SystemTap.

Getting Started

Aggregates are easy to instantiate, populate with sample data, and then
inspect for common aggregate statistics:

# After instantiation, use the << operator to add a sample to the aggregate:
stats = Aggregate.new

loop do
  # Take some action that generates a sample measurement
  stats << sample
end
# The number of samples
stats.count

# The average
stats.mean

# Max sample value
stats.max

# Min sample value
stats.min

# The standard deviation
stats.std_dev

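To illustrate how statistics like these can be maintained without storing any samples, here is a minimal sketch that keeps only a count, a running sum, a running sum of squares, and the extremes. RunningStats is a hypothetical class written for this illustration. It is not the gem's code, and the gem's std_dev may use a different formula (this sketch assumes the sample standard deviation, dividing by n - 1).

```ruby
# Hypothetical illustration only; not taken from the Aggregate gem.
class RunningStats
  attr_reader :count, :min, :max

  def initialize
    @count = 0
    @sum = 0.0
    @sum_sq = 0.0
    @min = nil
    @max = nil
  end

  # Fold one sample into the running totals; the sample itself is discarded.
  def <<(sample)
    @count += 1
    @sum += sample
    @sum_sq += sample * sample
    @min = sample if @min.nil? || sample < @min
    @max = sample if @max.nil? || sample > @max
    self
  end

  def mean
    @count.zero? ? 0.0 : @sum / @count
  end

  # Sample standard deviation (divides by n - 1); an assumption of this sketch.
  def std_dev
    return 0.0 if @count < 2
    variance = (@sum_sq - @sum * @sum / @count) / (@count - 1)
    Math.sqrt([variance, 0.0].max)
  end
end

stats = RunningStats.new
[2, 4, 4, 4, 5, 5, 7, 9].each { |sample| stats << sample }
puts stats.count
puts stats.mean
puts stats.std_dev
```

Memory use stays constant no matter how many samples are added. The sum-of-squares form is simple but can lose precision when the mean is large relative to the spread.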

Perhaps more important than the basic aggregate statistics detailed above,
Aggregate also maintains a histogram of samples. For anything other than
normally distributed data, these basic statistics are insufficient at best and
often downright misleading. 37Signals recently posted a terse but effective
explanation of the importance of histograms.
Aggregate maintains its histogram internally as a set of “buckets”.
Each bucket represents a range of possible sample values. The set of all buckets
represents the range of “normal” sample values.

Binary Histograms

Without any configuration, an Aggregate instance maintains a binary histogram, where
each bucket represents a range twice as large as the preceding bucket, i.e.
[1,1], [2,3], [4,7], [8,15]. The default binary histogram
provides 128 buckets, theoretically covering the range [1, (2^127) - 1].
(See NOTES below for a discussion of the effects in practice of insufficient
precision.)

Binary histograms are useful when we have little idea about what the
sample distribution may look like as almost any positive value will
fall into some bucket. After using binary histograms to determine
the coarse-grained characteristics of your sample space you can
configure a linear histogram to examine it in closer detail.
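
The doubling scheme described above can be sketched as follows. Both helpers are hypothetical names introduced for this illustration, and they assume positive Integer samples. The gem itself may compute the bucket differently (see the discussion of log2 later in this document).

```ruby
# Hypothetical helpers; not part of the Aggregate gem.

# Index of the binary bucket that holds a positive Integer.
# Bucket i covers the inclusive range [2^i, 2^(i+1) - 1].
def binary_bucket_index(value)
  unless value.is_a?(Integer) && value > 0
    raise ArgumentError, "value must be a positive Integer"
  end
  value.bit_length - 1
end

# Inclusive range of values covered by bucket `index`.
def binary_bucket_range(index)
  (2**index)..(2**(index + 1) - 1)
end

[1, 2, 3, 4, 7, 8, 15].each do |value|
  index = binary_bucket_index(value)
  puts "#{value} falls in bucket #{index}, covering #{binary_bucket_range(index)}"
end
```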

Linear Histograms

Linear histograms are specified with the three values low, high, and width.
Low and high specify a range [low, high) of values included in the
histogram (all others are outliers). Width specifies the number of
values represented by each bucket and therefore the number of
buckets, i.e. the granularity of the histogram. The histogram range
(high - low) must be a multiple of width:

# Want to track aggregate stats on response times in ms
response_stats = Aggregate.new(0, 2000, 50)

The example above creates a linear histogram that tracks the
response times from 0 ms to 2000 ms in buckets of width 50 ms. Hopefully
most of your samples fall in the first couple buckets!
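
As a rough sketch of the bookkeeping this implies, the following maps each sample to a bucket with integer arithmetic and counts anything outside [low, high) separately. LinearHistogram is a hypothetical class written for this illustration; it is not the gem's implementation.

```ruby
# Hypothetical illustration only; not taken from the Aggregate gem.
class LinearHistogram
  attr_reader :buckets, :outliers_low, :outliers_high

  def initialize(low, high, width)
    unless width > 0 && ((high - low) % width).zero?
      raise ArgumentError, "(high - low) must be a multiple of width"
    end
    @low = low
    @high = high
    @width = width
    @buckets = Array.new((high - low) / width, 0)
    @outliers_low = 0
    @outliers_high = 0
  end

  # Count the sample in its bucket, or as an outlier if it is out of range.
  def <<(sample)
    if sample < @low
      @outliers_low += 1
    elsif sample >= @high
      @outliers_high += 1
    else
      @buckets[((sample - @low) / @width).floor] += 1
    end
    self
  end
end

hist = LinearHistogram.new(0, 2000, 50)
[12, 49, 50, 1999, 2000, -5].each { |ms| hist << ms }
puts hist.buckets.size
puts hist.outliers_low
puts hist.outliers_high
```

With low = 0, high = 2000 and width = 50 this yields 40 buckets; a sample of exactly 2000 is counted as a high outlier because the range excludes high.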

Histogram Outliers

An Aggregate records any samples that fall outside the histogram range as outliers:

# Number of samples that fall below the normal range
stats.outliers_low

# Number of samples that fall above the normal range
stats.outliers_high

Histogram Iterators

Once a histogram is populated Aggregate provides iterator support for
examining the contents of buckets. The iterators provide both the
number of samples in the bucket, as well as its range:

# Examine every bucket
@stats.each do |bucket, count|
end

# Examine only buckets containing samples
@stats.each_nonzero do |bucket, count|
end
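
For readers curious how such a pair of iterators might be built, here is a self-contained sketch. BucketCounts is a hypothetical stand-in for a populated linear histogram, and it assumes that the bucket value yielded to the block is the lower bound of the bucket's range; the gem may represent buckets differently.

```ruby
# Hypothetical stand-in for a populated histogram; not the gem's code.
class BucketCounts
  def initialize(low, width, counts)
    @low = low
    @width = width
    @counts = counts
  end

  # Yields the lower bound and sample count of every bucket, empty or not.
  def each
    return enum_for(:each) unless block_given?
    @counts.each_with_index do |count, index|
      yield(@low + index * @width, count)
    end
  end

  # Same as each, but skips buckets that contain no samples.
  def each_nonzero
    return enum_for(:each_nonzero) unless block_given?
    each { |bucket, count| yield(bucket, count) if count > 0 }
  end
end

hist = BucketCounts.new(0, 50, [2, 0, 1, 0])
hist.each_nonzero do |bucket, count|
  puts "[#{bucket}, #{bucket + 50}): #{count}"
end
```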

Histogram Bar Chart

Finally, Aggregate contains pretty-printing support to generate
ASCII bar charts. For any given number of columns >= 80 (defaults to 80) and
sample distribution, the to_s method sets a marker weight based on the
samples per bucket and aligns all output. Empty buckets are skipped to conserve
screen space.

# Generate and display an 80 column histogram
puts stats.to_s

# Generate and display a 120 column histogram
puts stats.to_s(120)
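
One plausible way to derive a marker weight is to scale every count against the fullest bucket so that it fills the whole bar. The sketch below is an assumption about how that could work, not the gem's to_s implementation; ascii_bars is a hypothetical helper, and the bar width of 66 is simply what remains of 80 columns after a 5-character value column, a 5-character count column, and separators.

```ruby
# Hypothetical helper; not the Aggregate gem's to_s.
# counts maps each bucket's starting value to its Integer sample count.
def ascii_bars(counts, bar_width = 66, marker = "@")
  max = counts.values.max.to_i
  return [] if max.zero?

  counts.reject { |_bucket, count| count.zero? }.map do |bucket, count|
    # Integer arithmetic: the fullest bucket gets exactly bar_width markers.
    bar = marker * (count * bar_width / max)
    format("%5d |%-#{bar_width}s| %5d", bucket, bar, count)
  end
end

puts ascii_bars({ 1 => 3, 512 => 523, 1024 => 0, 32768 => 32961 })
```

Empty buckets are dropped before formatting, and buckets whose count is small relative to the maximum render with no markers at all.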

This code example populates both a binary and linear histogram with the same
set of 65536 values generated by rand to produce the
two histograms that follow it:

require 'rubygems'
require 'aggregate'

# Create an Aggregate instance
binary_aggregate = Aggregate.new
linear_aggregate = Aggregate.new(0, 65536, 4096)

65536.times do
  x = rand(65536)
  binary_aggregate << x
  linear_aggregate << x
end

puts binary_aggregate.to_s
puts linear_aggregate.to_s

Binary Histogram

value |------------------------------------------------------------------| count
    1 |                                                                  |     3
    2 |                                                                  |     1
    4 |                                                                  |     5
    8 |                                                                  |     9
   16 |                                                                  |    15
   32 |                                                                  |    29
   64 |                                                                  |    62
  128 |                                                                  |   115
  256 |                                                                  |   267
  512 |@                                                                 |   523
 1024 |@                                                                 |   970
 2048 |@@@                                                               |  1987
 4096 |@@@@@@@@                                                          |  4075
 8192 |@@@@@@@@@@@@@@@@                                                  |  8108
16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                                  | 16405
32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| 32961
Total |------------------------------------------------------------------| 65535

Linear (0, 65536, 4096) Histogram

value |------------------------------------------------------------------| count
    0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4094
 4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|  4202
 8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4118
12288 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   |  4059
16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    |  3999
20480 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4083
24576 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4134
28672 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |  4143
32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |  4152
36864 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   |  4033
40960 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   |  4064
45056 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   |  4012
49152 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   |  4070
53248 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4090
57344 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4135
61440 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |  4144
Total |------------------------------------------------------------------| 65532

We can see from these histograms that Ruby’s rand function does a relatively good
job of distributing returned values in the requested range.


Here’s an example of a handy timing benchmark
implemented with aggregate.


NOTES

Ruby 1.8 doesn’t have a log2 function built into Math, so we approximate with
log(x)/log(2). Theoretically floor(log(2^n - 1)/log(2)) == n-1. Unfortunately, due
to precision limitations, once n reaches a certain size (somewhere > 32)
this starts to return n. The larger the value of n, the more numbers, i.e.
(2^n - 2), (2^n - 3), etc., fall prey to this error. We could probably look into
using something like BigDecimal, but for the current purposes of the binary
histogram, i.e. a simple coarse-grained view, the current implementation is
sufficient.

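The precision issue can be seen directly in a short experiment. The two helpers below are hypothetical names for this illustration: log2_float mirrors the log(x)/log(2) approximation described above, while log2_exact uses Integer#bit_length (available in Ruby 2.1 and later), which is one possible exact alternative for positive Integers and is not what the gem is stated to use.

```ruby
# Hypothetical helpers for illustration; not taken from the Aggregate gem.

# Floating-point approximation of floor(log2(x)).
def log2_float(x)
  (Math.log(x) / Math.log(2)).floor
end

# Exact floor(log2(x)) for positive Integers.
def log2_exact(x)
  x.bit_length - 1
end

n = 64
value = 2**n - 1

# A Float has a 53-bit significand, so 2^64 - 1 rounds up to 2^64 when
# converted, and the "- 1" is lost before the logarithm is even taken.
puts value.to_f == 2.0**n
puts log2_exact(value)
puts log2_float(value)
```

Because the conversion to Float already discards the low-order bits, no amount of care in the division can recover the exact bucket for values this close to a power of two.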