Open-source developers all over the world are working on millions of projects: writing code & documentation, fixing & submitting bugs, and so forth. GitHub Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis.
GitHub provides 18 event types, which range from new commits and fork events, to opening new tickets, commenting, and adding members to a project. The activity is aggregated in hourly archives, which you can access with any HTTP client:
|Activity for March 11, 2012 at 3PM PST||
|Activity for March 11, 2012||
|Activity for March 2012||
Note: timeline data is available starting March 11, 2012.
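The hourly naming scheme can be sketched in Ruby as follows. This is only an illustration: the helper names `archive_url` and `archive_urls_for_day` are hypothetical, and the URL layout (date, then an hour from 0 to 23 without zero padding) is an assumption inferred from the single example URL on this page.

```ruby
require 'date'

# Base URL taken from the example on this page.
BASE_URL = 'http://data.githubarchive.org'

# Build the URL of one hourly archive.
# Assumption: the hour (0..23) is appended without zero padding.
def archive_url(date, hour)
  "#{BASE_URL}/#{date.strftime('%Y-%m-%d')}-#{hour}.json.gz"
end

# Build the URLs of all 24 hourly archives for a single day.
def archive_urls_for_day(date)
  (0..23).map { |hour| archive_url(date, hour) }
end

urls = archive_urls_for_day(Date.new(2012, 3, 11))
puts urls.first
puts urls.length
```

Each resulting URL can then be fetched with whichever HTTP client you prefer.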
Each archive contains a stream of JSON-encoded GitHub events, which you can process in any language. Ruby example:
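    require 'open-uri'
    require 'zlib'
    require 'yajl'

    # Download one hourly archive (URI.open is required on Ruby 3.0+).
    gz = URI.open('http://data.githubarchive.org/2012-03-11-12.json.gz')
    # Decompress the whole archive into a string.
    js = Zlib::GzipReader.new(gz).read

    # The block is called once for each JSON event in the stream.
    Yajl::Parser.parse(js) do |event|
      print event
    end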
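Once the events are parsed, a typical first step is to tally them by type. The sketch below uses only the Ruby standard library and an in-memory sample instead of a real download. It assumes one JSON object per line, which may not match the layout of every archive; the helper `count_event_types` and the sample events are hypothetical.

```ruby
require 'json'
require 'zlib'
require 'stringio'

# Count events by their "type" field.
# Assumption: the gzipped stream holds one JSON object per line.
def count_event_types(gz_io)
  counts = Hash.new(0)
  Zlib::GzipReader.new(gz_io).each_line do |line|
    next if line.strip.empty?
    event = JSON.parse(line)
    counts[event['type']] += 1
  end
  counts
end

# Hypothetical sample data standing in for a downloaded archive.
sample = [
  { 'type' => 'PushEvent' },
  { 'type' => 'ForkEvent' },
  { 'type' => 'PushEvent' }
].map(&:to_json).join("\n")

gz = StringIO.new(Zlib.gzip(sample))
p count_event_types(gz)
```

Reading line by line keeps memory use low, since only one event is decoded at a time.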
(MIT License) - Copyright (c) 2012 Ilya Grigorik