The Apache SVN commits on Github

This repository contains the commits (aka 'revisions') extracted from the Apache SVN repository. It is meant to be used as a scientific reference dataset for research and education in mining software repositories and empirical software engineering.

The goals of the repository are:

  1. to provide a stable dataset with a long-lived URL and SHA1-checksummed files
  2. to make commit messages searchable through GitHub's search infrastructure (e.g. for npe)
  3. to save the Apache foundation's bandwidth

Each commit comes with the author name, the commit message, the date, and the list of changed files (added, modified, deleted).


On Oct 5, 2015, the commits were extracted using svn log -v . This resulted in 1,706,767 commits. They were then transformed into JSON files so that GitHub can index them. Since GitHub indexes at most 500,000 files per repository, the JSON files are grouped by five.
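The grouping arithmetic can be checked quickly. The sketch below assumes that "grouped by five" means five commits per JSON file, which the text does not spell out; the function name is illustrative only.

```python
TOTAL_COMMITS = 1_706_767   # number of commits reported above
INDEX_LIMIT = 500_000       # per-repository indexing limit stated above
COMMITS_PER_FILE = 5        # assumption: "grouped by five" = five commits per file


def json_file_count(total, per_file):
    """Return how many files are needed to hold `total` commits."""
    # Integer ceiling division, avoids floating-point rounding.
    return -(-total // per_file)


if __name__ == "__main__":
    files_needed = json_file_count(TOTAL_COMMITS, COMMITS_PER_FILE)
    print(files_needed, "files; within limit:", files_needed <= INDEX_LIMIT)
```

With one commit per file the count would exceed the limit, which is why some grouping is needed.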


The dataset is composed of two files:

  • apache-1-1000000-svn-log.xml.bz2 contains the first million commits (SHA1: d156619cf2176cdea91eba8a20192cb6565f93ce)
  • apache-1000001-1706767-svn-log.xml.bz2 contains the rest up to revision r1706767 (SHA1: 8a83bbd4c112b60cea37961323cef16e11d002d0)
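After downloading, the two files can be checked against the SHA1 values listed above. This is a minimal sketch using Python's standard hashlib; the helper names `sha1_of_file` and `verify` are made up for this example, and the files are assumed to sit in the current directory.

```python
import hashlib

# Checksums copied from the file list above.
EXPECTED_SHA1 = {
    "apache-1-1000000-svn-log.xml.bz2": "d156619cf2176cdea91eba8a20192cb6565f93ce",
    "apache-1000001-1706767-svn-log.xml.bz2": "8a83bbd4c112b60cea37961323cef16e11d002d0",
}


def sha1_of_file(path, chunk_size=1 << 20):
    """Compute the SHA1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Return True if the file at `path` has the expected SHA1 digest."""
    return sha1_of_file(path) == expected.lower()
```

For example, `verify(name, EXPECTED_SHA1[name])` should return True for each of the two file names once the download is complete.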

Data format

The apache-*-svn-log.xml files are XML files containing self-explanatory <logentry> elements. In the examples folder, there is a test file apache-1-1000-svn-log.xml containing the first 1000 commits.

<msg>added my details.</msg>
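The XML logs are large, so it may help to stream them rather than load them whole. The sketch below assumes the usual `svn log -v --xml` layout (a `revision` attribute on `<logentry>`, with `<author>`, `<date>`, `<paths>/<path action="...">` and `<msg>` children); the function names are illustrative and not part of this repository.

```python
import bz2
import xml.etree.ElementTree as ET


def iter_log_entries(source):
    """Yield one dict per <logentry> found in `source` (a path or binary file object)."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "logentry":
            continue
        yield {
            "revision": elem.get("revision"),
            "author": elem.findtext("author"),
            "date": elem.findtext("date"),
            "msg": elem.findtext("msg") or "",
            # Each changed path is kept as an (action, path) pair, e.g. ("M", "/x/y").
            "paths": [(p.get("action"), p.text) for p in elem.iterfind("paths/path")],
        }
        # Drop the children of the entry once it has been consumed.
        elem.clear()


def iter_compressed_log(path):
    """Stream entries directly from one of the .xml.bz2 files."""
    with bz2.open(path, "rb") as handle:
        yield from iter_log_entries(handle)
```

A typical use would be counting commits per author with `collections.Counter(e["author"] for e in iter_compressed_log(path))`.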

The JSON files contain the same information:

{
  "author": "senaka",
  "date": "2010-09-22T14:48:18.695729Z",
  "msg": [
   "added my details."
  ],
  "revision_id": "1000001"
}

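Besides GitHub search, the JSON records can be filtered locally. The sketch below assumes each record has the shape shown above, with `msg` as a list of message lines, and that a JSON file holds either one record or a list of records; that file layout is an assumption, and the helper names are hypothetical.

```python
import json


def message_text(record):
    """Join the message lines of a record into a single string."""
    msg = record.get("msg", [])
    return "\n".join(msg) if isinstance(msg, list) else str(msg)


def matching_records(records, keyword):
    """Return the records whose commit message contains `keyword` (case-insensitive)."""
    needle = keyword.lower()
    return [r for r in records if needle in message_text(r).lower()]


def load_records(path):
    """Load a JSON file; assumed to hold one record or a list of records."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    return data if isinstance(data, list) else [data]
```

For instance, `matching_records(load_records(path), "npe")` would list the commits in one file whose message mentions npe.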
To see the diff of a particular revision, say 1590251, simply visit

To check out a particular revision, say 1000001, first identify the related project by looking for the project name in the changed paths:

$ svn diff --summarize -r <commit_id - 1>:<commit_id>
# example
$ svn diff --summarize -r 100000:100001

This shows that the revision belongs to the httpd project. Then check out the revision:

$ svn checkout -r <commit_id><project-name>/trunk/
$ svn checkout -r 100001
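The two steps above can be scripted. The sketch below only builds the `svn diff --summarize` argument list and guesses the project from a changed path; the repository URL is left as a parameter because it is not given here, and treating the first path component as the project name is an assumption about the repository layout. Both function names are hypothetical.

```python
def diff_summary_command(revision, repo_url):
    """Build the svn command that summarizes what changed in `revision`."""
    if revision < 1:
        raise ValueError("revision must be >= 1")
    # The range <revision - 1>:<revision> isolates a single commit.
    return ["svn", "diff", "--summarize", "-r", f"{revision - 1}:{revision}", repo_url]


def project_of(changed_path):
    """Guess the project as the first component of a changed path (assumption)."""
    parts = [part for part in changed_path.split("/") if part]
    return parts[0] if parts else None
```

The returned list can be passed to `subprocess.run` once `repo_url` is set to the repository root.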


Pull requests are welcome, in particular scripts for the scripts folder that compute interesting things.


1.7 million commits of the main Apache SVN repository. Searchable thanks to GitHub.


