This repo includes a set of helper functions that wrap the GitHub API v3, and a Disco job that crawls GitHub.

TODO:
- Use a better sampling strategy (e.g., snowball sampling).
- Use OAuth to increase the rate limit.
- Detect when the rate limit is exceeded and sleep until it resets.
- Prune the final output to keep only the file extensions we care about.
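For the sampling item, here is a minimal sketch of breadth-first snowball sampling. It assumes the crawl can be modeled as a graph of users (or repositories) where each node's neighbors can be fetched. `snowball_sample` and `get_neighbors` are hypothetical names, not part of this repo; in practice `get_neighbors` might wrap one of the API helpers, for example a call to the v3 `/users/{user}/followers` endpoint.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, List


def snowball_sample(
    seeds: Iterable[Hashable],
    get_neighbors: Callable[[Hashable], Iterable[Hashable]],
    max_nodes: int,
) -> List[Hashable]:
    """Hypothetical helper: start from the seed nodes and repeatedly add
    neighbors of already-sampled nodes (breadth-first) until max_nodes
    nodes have been collected or the frontier is exhausted."""
    sampled: List[Hashable] = []
    seen = set()
    frontier = deque()

    # Queue each distinct seed once.
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            frontier.append(seed)

    while frontier and len(sampled) < max_nodes:
        node = frontier.popleft()
        sampled.append(node)
        # get_neighbors is only called for sampled nodes, so the number of
        # neighbor lookups (e.g. API calls) never exceeds max_nodes.
        for neighbor in get_neighbors(node):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append(neighbor)

    return sampled
```

For example, with `graph = {"alice": ["bob", "carol"], "bob": ["dave"], "carol": ["alice", "erin"], "dave": [], "erin": []}`, calling `snowball_sample(["alice"], lambda u: graph.get(u, []), 4)` returns `["alice", "bob", "carol", "dave"]`. Expanding in waves keeps the sample connected to the seeds, which is the point of snowball sampling compared with picking users at random.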
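For the OAuth and rate-limit items, a minimal sketch using only the standard library. It assumes an OAuth token is sent as `Authorization: token <TOKEN>` (GitHub documents this header, and authenticated requests get a much higher hourly limit than anonymous ones) and that the `X-RateLimit-Remaining` and `X-RateLimit-Reset` (UTC epoch seconds) response headers are used to decide how long to sleep. The function names are hypothetical and not part of this repo's helpers.

```python
import time
import urllib.request
from typing import Callable, Mapping

API_ROOT = "https://api.github.com"


def build_request(path: str, token: str = "") -> urllib.request.Request:
    """Hypothetical helper: build a GitHub API v3 request, attaching the
    OAuth token (if any) so the higher authenticated rate limit applies."""
    headers = {"Accept": "application/vnd.github.v3+json"}
    if token:
        headers["Authorization"] = "token " + token
    return urllib.request.Request(API_ROOT + path, headers=headers)


def seconds_until_reset(headers: Mapping[str, str], now: float) -> float:
    """Return how long to sleep: 0 while requests remain, otherwise the time
    until X-RateLimit-Reset plus a 1-second margin for clock skew."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset = int(headers.get("X-RateLimit-Reset", "0"))
    return max(0.0, reset - now) + 1.0


def wait_if_rate_limited(
    headers: Mapping[str, str],
    now: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Sleep until the rate limit resets if it has been exhausted.
    Returns the number of seconds slept (0.0 if no wait was needed)."""
    delay = seconds_until_reset(headers, now())
    if delay > 0:
        sleep(delay)
    return delay
```

A crawler could call `wait_if_rate_limited(resp.headers)` after each `urllib.request.urlopen(build_request(...))`. When the limit is already exceeded, `urlopen` raises `urllib.error.HTTPError`, whose `.headers` can be passed to the same function before retrying. Injecting `now` and `sleep` keeps the logic testable without real waiting.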
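For the pruning item, a minimal sketch assuming the crawler's final output is a list of file paths. `WANTED_EXTENSIONS` and `prune_by_extension` are hypothetical placeholders; the real set of extensions is whatever the crawl cares about.

```python
import os
from typing import AbstractSet, Iterable, List

# Hypothetical set of extensions to keep; replace with the real ones.
WANTED_EXTENSIONS = frozenset({".py", ".js", ".java"})


def prune_by_extension(
    paths: Iterable[str], wanted: AbstractSet[str] = WANTED_EXTENSIONS
) -> List[str]:
    """Keep only paths whose extension, compared case-insensitively,
    is in `wanted`. Paths without an extension are dropped."""
    return [p for p in paths if os.path.splitext(p)[1].lower() in wanted]
```

Note that `os.path.splitext` only looks at the last suffix, so `archive.tar.gz` is treated as `.gz`. In a Disco job this filter could run as a final step over the reduce output, so unwanted files never reach the results.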