Google BigQuery + Github Archive

Google BigQuery is a web service for interactive analysis of massive datasets of up to billions of rows.

The GitHub activity stream is automatically uploaded to the BigQuery service to enable interactive analysis.

Sample Queries

/* count the number of events by type */
SELECT type, count(type) as total
    FROM github.events
    GROUP BY type
    ORDER BY total desc;
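To make the first query's semantics concrete, here is a rough local equivalent in Python. It is only a sketch and assumes each event is a flat dict whose keys mirror the table's columns. The sample rows are made up for illustration.

```python
from collections import Counter

def count_events_by_type(events):
    """Local equivalent of:
    SELECT type, count(type) AS total FROM github.events
    GROUP BY type ORDER BY total DESC
    """
    # Counter tallies rows per distinct type (the GROUP BY + count)
    totals = Counter(event["type"] for event in events)
    # most_common() orders by count, descending (the ORDER BY total DESC)
    return totals.most_common()

# Made-up sample rows, for illustration only
events = [
    {"type": "PushEvent", "repository_name": "rails"},
    {"type": "WatchEvent", "repository_name": "rails"},
    {"type": "PushEvent", "repository_name": "node"},
]
print(count_events_by_type(events))  # [('PushEvent', 2), ('WatchEvent', 1)]
```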

/* find the most watched repositories */
SELECT repository_name, count(repository_name) as new_watchers
    FROM github.events
    WHERE type = "WatchEvent"
    GROUP BY repository_name
    ORDER BY new_watchers desc;
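The second query first filters rows and then groups them. A minimal Python sketch of the same logic follows, again assuming flat dicts keyed by column name with made-up sample data. The `limit` parameter is a hypothetical extra that behaves like appending `LIMIT n` to the query.

```python
from collections import Counter

def most_watched(events, limit=None):
    """Local equivalent of:
    SELECT repository_name, count(repository_name) AS new_watchers
    FROM github.events WHERE type = "WatchEvent"
    GROUP BY repository_name ORDER BY new_watchers DESC
    `limit` is a hypothetical extra, like appending LIMIT n.
    """
    watchers = Counter(
        event["repository_name"]
        for event in events
        if event["type"] == "WatchEvent"  # the WHERE clause
    )
    # most_common(None) returns every repository, highest count first
    return watchers.most_common(limit)

# Made-up sample rows, for illustration only
events = [
    {"type": "WatchEvent", "repository_name": "rails"},
    {"type": "WatchEvent", "repository_name": "node"},
    {"type": "WatchEvent", "repository_name": "rails"},
    {"type": "PushEvent", "repository_name": "node"},
]
print(most_watched(events))     # [('rails', 2), ('node', 1)]
print(most_watched(events, 1))  # [('rails', 2)]
```

One difference is worth noting. `Counter.most_common` keeps tied repositories in first-seen order, while the SQL query leaves the order of ties unspecified.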

For the full schema of fields available to select, order, and group by, see schema.js.

Manually loading the data

If you want to load the archive data into your own BigQuery project:

$> wget http://data.githubarchive.org/2012-03-11-15.json.gz
$> ruby transform.rb -i 2012-03-11-15.json.gz
$> python bq.py --apilog true load github.events 2012-03-11-15.json.gz-out.csv.gz schema.js
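The `transform.rb` step converts the raw JSON archive into the gzipped CSV that `bq.py` loads. Its source is not shown here, so the following is only a minimal Python sketch of that kind of conversion, not the script's actual behavior. It makes several assumptions:

* The input holds one JSON event per line.
* Nested objects are flattened by joining keys with `_`, so `repository.name` becomes `repository_name` (the column used in the sample queries).
* `COLUMNS` is a hypothetical stand-in for the column list defined in schema.js.

The output name follows the `<input>-out.csv.gz` pattern used in the commands above.

```python
import csv
import gzip
import json

# Hypothetical column list; the real one is defined by schema.js
COLUMNS = ["type", "repository_name", "created_at"]

def flatten(obj, prefix=""):
    """Flatten nested dicts, joining keys with "_" (repository.name -> repository_name)."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "_"))
        else:
            flat[name] = value
    return flat

def transform(in_path, columns=COLUMNS):
    """Convert a gzipped, newline-delimited JSON event file to a gzipped CSV.

    No header row is written, so the column order must match the load schema.
    """
    out_path = in_path + "-out.csv.gz"
    with gzip.open(in_path, "rt", encoding="utf-8") as src, \
         gzip.open(out_path, "wt", encoding="utf-8", newline="") as dst:
        writer = csv.writer(dst)
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            row = flatten(json.loads(line))
            # Missing fields (and None values) become empty CSV cells
            writer.writerow([row.get(col, "") for col in columns])
    return out_path
```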