R on GitHub
This project uses GitHub's code and repository APIs to collect information about all GitHub repositories that use R.
Getting the data
Getting all repo data is a two step process:
1a-search.r runs searches to find all R language repos. Searches are done by month to overcome the current search API limits, and are cached in the
1b-repos.r takes each repo found by the search and creates a list with 5 components:
info: general information about the repository. http://developer.github.com/v3/repos/#get
lng: languages used in the repo. http://developer.github.com/v3/repos/#list-languages
dir: a directory listing of all files and directories in the repo root. http://developer.github.com/v3/repos/contents/#get-contents
desc: if a DESCRIPTION file is found, the result of parsing that file with read.dcf and converting it into a list
tags: any tags used by the repo. http://developer.github.com/v3/repos/#list-tags
The data on each repo is cached in
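The month-by-month searching in step 1a can be sketched as follows. This is a minimal illustration, not the code in 1a-search.r: the helper name month_queries is hypothetical, and the use of the created: date-range qualifier is an assumption about how the searches might be split.

```r
# Hypothetical helper: build one search query string per calendar month.
# Assumes searches are split with GitHub's `created:` date-range qualifier.
month_queries <- function(from, n_months, language = "R") {
  # n_months + 1 month starts, so every window has a known successor
  bounds <- seq(as.Date(from), by = "month", length.out = n_months + 1)
  starts <- bounds[-length(bounds)]
  ends <- bounds[-1] - 1  # the day before the next month starts
  paste0("language:", language,
         " created:", format(starts), "..", format(ends))
}

queries <- month_queries("2012-01-01", 3)
print(queries)
```

Each string covers exactly one month, so a single search never has to return more results than one month of new repos.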
To update repos for the current month, run source("1b-repos.r"). You'll need to
set your GitHub user name and password in environment variables, including
GITHUB_PASS. All requests are throttled to stay within GitHub's rate limits,
so downloading all repo info from scratch will take a number of hours.
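A rough sketch of reading a credential from the environment and spacing out requests is shown below. The helper names github_pass and throttle are hypothetical, GITHUB_PASS is assumed to hold the password, and the 5000 requests per hour figure is an assumed limit used only for the arithmetic; the project's own throttling may work differently.

```r
# Hypothetical helper: read the password from the environment and fail early
# if it is missing. Assumes GITHUB_PASS holds the password.
github_pass <- function() {
  pass <- Sys.getenv("GITHUB_PASS", unset = NA)
  if (is.na(pass) || !nzchar(pass)) {
    stop("Set the GITHUB_PASS environment variable first")
  }
  pass
}

# Hypothetical throttle: wrap a function so that successive calls are at
# least `min_gap` seconds apart.
throttle <- function(f, min_gap) {
  last_call <- NULL
  function(...) {
    if (!is.null(last_call)) {
      waited <- as.numeric(difftime(Sys.time(), last_call, units = "secs"))
      if (waited < min_gap) Sys.sleep(min_gap - waited)
    }
    last_call <<- Sys.time()
    f(...)
  }
}

# With an assumed limit of 5000 requests per hour, calls need to be
# 3600 / 5000 = 0.72 seconds apart.
gap_for_limit <- 3600 / 5000

# Demonstration with a short gap and a stand-in for a real API request.
slow_double <- throttle(function(x) x * 2, min_gap = 0.2)
started <- Sys.time()
results <- c(slow_double(1), slow_double(2))
elapsed <- as.numeric(difftime(Sys.time(), started, units = "secs"))
```

At that spacing, a few tens of thousands of requests already add up to several hours, which is consistent with the download time mentioned above.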
Exploring the data
If you just want to use the already cached data, see
2-packages.r for example exploratory analyses.
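When exploring package metadata, the desc component of each cached repo is a natural starting point. A minimal sketch of how a DESCRIPTION file becomes such a list with read.dcf is shown below. The DESCRIPTION contents are invented for illustration, and the code is not taken from 2-packages.r.

```r
# Invented DESCRIPTION contents, for illustration only.
description_text <- c(
  "Package: examplepkg",
  "Version: 0.1.0",
  "Imports: httr, plyr"
)

con <- textConnection(description_text)
parsed <- read.dcf(con)  # character matrix: one row per record, one column per field
close(con)

# Convert the single record into a named list, one element per field.
desc <- as.list(parsed[1, ])

# Split a dependency field into individual package names.
imports <- strsplit(desc$Imports, ",\\s*")[[1]]
print(imports)
```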