Copyright (c) 2012 IBM Corporation
This is primarily a research tool that can connect to github.com/languages and save a list of the top projects per language. It is developed under the auspices of a joint study agreement between IBM Research and the University of Nebraska-Lincoln.
The primary developer of this software is Patrick Wagstrom.
License
This tool is licensed under the terms of the Apache Software License v2.0.
This tool uses Maven to build and manage dependencies. Use the following command to download the dependencies and compile the software:
mvn clean compile package
The defaults for the script are pretty sane. You can just type:
If you would rather specify a configuration file use:
./github.sh -c [CONFIGURATION]
for the names of properties. The default values for those properties can be found in
Properties that are not set in the configuration file will use default values.
In general you can get by with a simple configuration file like this:
Everything else should magically work.
The schema is currently found in the project source code. See the top of
for the declarations of all of the tables.
I Don't Want to Run this Scraper
That's great to hear. You probably shouldn't. I've been running this scraper for a much longer time and can just provide you the data I've been collecting as a Derby database. Unfortunately, there were times when GitHub changed how it works and I missed updates for a few months at a time. But hey, it's better than having to run the scraper again. Feel free to email me and I can get you a copy of this data.
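Once you have a copy of the Derby database, a short JDBC program can list the tables it contains without needing the scraper source. The following is only a minimal sketch and is not part of this tool: the class name `ListDerbyTables`, the helper `derbyUrl`, and the database path passed on the command line are hypothetical, and it assumes the embedded Derby driver (`derby.jar`) is on the classpath.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ListDerbyTables {
    // Builds an embedded-Derby JDBC URL for a database directory.
    // The path is whatever location you unpacked the database copy to (assumption).
    public static String derbyUrl(String dbPath) {
        return "jdbc:derby:" + dbPath;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length != 1) {
            System.err.println("usage: java ListDerbyTables <path-to-derby-db>");
            return;
        }
        // Requires the embedded Derby driver on the classpath (assumption).
        try (Connection conn = DriverManager.getConnection(derbyUrl(args[0]))) {
            DatabaseMetaData meta = conn.getMetaData();
            // Ask the driver for user tables only, in any schema.
            try (ResultSet rs = meta.getTables(null, null, "%", new String[] {"TABLE"})) {
                while (rs.next()) {
                    System.out.println(rs.getString("TABLE_SCHEM") + "." + rs.getString("TABLE_NAME"));
                }
            }
        }
    }
}
```

This uses only standard JDBC metadata calls, so it makes no assumptions about the table names themselves; it simply prints whatever tables the database reports.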
If you have found a bug, please file an issue on GitHub. If you are then able to patch the bug yourself, please create a pull request on GitHub. If you have other contributions you think might be helpful, feel free to create a pull request on GitHub, although it might be helpful to contact me with your idea first.