A cross between Awk, a spreadsheet, and a relational database. A command line 'language' for statistical analysis.


# Introducing SS

I do a lot of "forensics" at Twitter. That is, I analyze data to debug latency and throughput bottlenecks in the system. One of the most basic operations I perform is to watch the average, median, and standard deviation of various data.

SS is a tool for interactively watching columnar data in real time.

Suppose you have input of the format:

    1   2   3
    4   5   6
    7   8   9

To select specific columns, run:

    % ss -c1,2,3 < input

This is equivalent to `awk '{print $1, $2, $3}' < input`.

Ranges are supported like:

    % ss -c1..3 < input

As well as negative indices:

    % ss -c-1 < input

In this case, that is equivalent to `ss -c3 < input`.
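To make the column syntax concrete, here is a minimal Python sketch of how a spec such as `1,2,3`, `1..3`, or `-1` could be expanded into field positions. It is not SS's actual implementation: `parse_columns` and `select_columns` are hypothetical names, and the sketch assumes that columns are 1-based (as with awk's `$1`), that ranges are inclusive, and that a negative index counts back from the last column.

```python
def parse_columns(spec, width):
    """Expand a spec like '1,2,3', '1..3', or '-1' into 0-based indices.

    Assumptions (not confirmed by SS itself): columns are 1-based,
    ranges include both endpoints, and -1 means the last column.
    """
    indices = []
    for part in spec.split(","):
        if ".." in part:
            start, end = (int(p) for p in part.split(".."))
            indices.extend(range(start - 1, end))
        else:
            n = int(part)
            # Positive indices count from the left, negative from the right.
            indices.append(n - 1 if n > 0 else width + n)
    return indices


def select_columns(line, spec):
    """Split a whitespace-separated row and keep the requested fields."""
    fields = line.split()
    return [fields[i] for i in parse_columns(spec, len(fields))]
```

With the sample input above, `select_columns("7   8   9", "-1")` picks the same field as `select_columns("7   8   9", "3")`.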

You can select particular rows:

    % ss -r1,2 < input

This outputs only rows 1 and 2.

You can compute aggregation functions, such as:

    % ss -c'stddev(1)' < input

Bear in mind that you watch these aggregations change in real time as input arrives.

Aggregation functions include avg, max, min, sum, count, stddev, and q (don't ask). Median is not yet supported; I'm still searching the internets for a dynamic linear-time algorithm. If I can't find one, I'll do the O(n log n) one.
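As an illustration of how such aggregations can be kept up to date row by row, the sketch below uses Welford's online algorithm for the mean and standard deviation, and a two-heap structure for the median. Neither is claimed to be what SS does internally: the class names are hypothetical, the standard deviation is assumed to be the sample form (dividing by n - 1), and the two-heap median costs O(log n) per value, so it is an incremental O(n log n) approach rather than the linear-time algorithm mentioned above.

```python
import heapq
import math


class RunningStats:
    """Welford's online algorithm: one pass, constant memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    def stddev(self):
        # Assumption: sample standard deviation (divides by n - 1).
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))


class RunningMedian:
    """Two-heap running median: O(log n) per value, O(1) to read."""

    def __init__(self):
        self._low = []   # max-heap (stored negated) holding the smaller half
        self._high = []  # min-heap holding the larger half

    def add(self, x):
        heapq.heappush(self._low, -x)
        # Move the largest of the low half up, then rebalance the sizes
        # so that the low half is never smaller than the high half.
        heapq.heappush(self._high, -heapq.heappop(self._low))
        if len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self):
        if len(self._low) > len(self._high):
            return -self._low[0]
        return (-self._low[0] + self._high[0]) / 2
```

Calling `add` once per incoming row and reading `mean`, `stddev()`, or `median()` afterwards gives the current value at every step, which is the behavior a real-time display needs.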

Features not yet supported but soon to be added include:

* Summarize all the data:

Output all aggregation data (mean, median, stddev, etc.) after reaching EOF.

* Selection:

Add "where clauses":

    % ss -w'$1 == 5' < input

This would filter out all rows where the first column is not equal to `5`.

* Arbitrary projections:

    % ss -c'$1 + $2' < input

This would output the sum of columns 1 and 2.
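Since where clauses and projections are planned rather than implemented, the following Python sketch only illustrates one way the `$N` expressions could be evaluated. Everything here is an assumption: `compile_expr` and `run` are hypothetical names, every field is assumed to be numeric, and each `$N` is rewritten into a 1-based field lookup before the expression is evaluated.

```python
import re


def compile_expr(expr):
    """Turn an expression such as '$1 + $2' into a function of one row.

    Each $N is rewritten to a lookup of the N-th (1-based) field.
    eval() is tolerable here only because the expression comes from the
    user's own command line; never feed it untrusted input.
    """
    source = re.sub(r"\$(\d+)", lambda m: f"row[{int(m.group(1)) - 1}]", expr)
    code = compile(source, "<expr>", "eval")
    return lambda row: eval(code, {"__builtins__": {}}, {"row": row})


def run(lines, where=None, project=None):
    """Yield each row that passes `where`, transformed by `project`."""
    keep = compile_expr(where) if where else None
    proj = compile_expr(project) if project else None
    for line in lines:
        # Assumption: every field parses as a number.
        row = [float(field) for field in line.split()]
        if keep and not keep(row):
            continue
        yield proj(row) if proj else row
```

For the sample input, `run(lines, project="$1 + $2")` yields one sum per row, and `run(lines, where="$1 == 5")` yields nothing, because no row has `5` in its first column.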