Collected by Patrick Wagstrom
This is a Neo4j database containing most of the data surrounding the Tinkerpop stack of projects on GitHub. The data were collected over a three-day period in May 2012 using the GitMiner project. More information about how to use the data is available on the GitMiner project page.
I once wrote a blog entry about this data. See that entry for more information.

Usage rights
It's just data. However, it was produced under the auspices of a joint study agreement between IBM and the University of Nebraska-Lincoln, so the data are licensed under the terms of that agreement, which requires that all output be licensed under the Apache License v2.0. I'm not certain what that means for data, but those are the rules.
If you're an academic and publish a paper using this data, a citation is highly appreciated (co-authorship is even more appreciated; I can help you make sense of the data). If you're looking for a citation, use something like this:
Wagstrom, Patrick. "Tinkerpop GitMiner Dataset", available from, May 2012.
Also, shoot me an email so I can keep track of publications that use this data.