A cross between Awk, a spreadsheet, and a relational database. A command line 'language' for statistical analysis.
Latest commit 19a9398 Feb 27, 2009 pivotal dunno
Failed to load latest commit information.
test temporary Feb 27, 2009
README marketing material Jan 19, 2009
ss dunno Feb 27, 2009


# Introducing SS

I do a lot of "forensics" at Twitter. That is, I analyze data to debug latency and throughput bottlenecks in the system. One of the most basic operations I perform is to watch the average, median, and standard deviation of various data.

SS is a tool to assist interactively watching (columnar) data in real-time. SS is a real-time interactive tool

Suppose you have input of the format:

1   2   3
4   5   6
7   8   9

To select specific columns, you can:

    % ss -c1,2,3 < input

Which is equivalent to `awk '{print $1, $2, $3} < input`.

Ranges are supported like:

    % ss -c1..3 < input
As well as negative indices:

    % ss -c-1 < input
Which in this case is equivalent to `ss -c3 < input`

You can select particular rows:

    % ss -r1,2 < input

Which will output only those rows 1 and 2.

You can compute aggregation functions, such as:

    % ss -c'stddev(1)' < input
The thing to bear in mind is you watch these aggregations change in real time. This is a real-time interactive tool.
Aggregation functions include avg, max, min, sum, count, stddev, and q (don't ask). Median is not yet supported, I'm still searching the internets for a dynamic linear-time algorithm. If I can't find one I'll do the O(n log n) one.

Features not yet supported but soon to be added include:

* Summarize all the data:

output all aggregation data (mean, median, stddev, etc.) after reaching EOF

* Selection:

add "where clauses"

    % ss -w'$1 == 5' < input
Which will filter out all rows where the first column is not equal to `5`.

* Arbitrary projections:

    % ss -c'$1 + $2'

This would output the sum of columns 1 and 2.