Scripts ignore GitHub API's pagination/rate limits, contain half of the work #1

Open
paravoid opened this issue Dec 13, 2014 · 2 comments

Comments

@paravoid

The scripts are very naive: trying to consume a RESTful API with bash/curl isn't exactly a sound idea IMHO :)

For starters, greek-commiters.sh curls for followers/location, but ignores the fact that the results are rate limited and paginated. So, for example, followers>5 + Athens alone has 187 results (and "Athens, Greece" 144), but the first API call (without a ?page= argument) returns only 30, and the final users.txt has only 90 people. You're essentially missing half of your target set…
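A minimal sketch of the pagination fix in Python, assuming GitHub's REST user-search endpoint (`GET /search/users`) with its `q`, `page` and `per_page` parameters; `fetch_page` and `search_all_users` are hypothetical helpers, not part of this repo's scripts. It keeps requesting pages until a short page arrives or the reported `total_count` is reached. The search API only serves the first 1,000 results of any query, so the loop also stops there; a broader query would have to be split (e.g. by follower ranges).

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, List

SEARCH_URL = "https://api.github.com/search/users"
SEARCH_RESULT_CAP = 1000  # GitHub's search API serves at most 1,000 results per query


def fetch_page(query: str, page: int, per_page: int) -> Dict:
    """Fetch one page of user-search results (unauthenticated, no pacing)."""
    params = urllib.parse.urlencode({"q": query, "page": page, "per_page": per_page})
    req = urllib.request.Request(
        f"{SEARCH_URL}?{params}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def search_all_users(
    query: str,
    fetch: Callable[[str, int, int], Dict] = fetch_page,
    per_page: int = 100,  # the API default is 30, the maximum 100
) -> List[str]:
    """Collect logins from every results page, not just the first one."""
    logins: List[str] = []
    page = 1
    while True:
        data = fetch(query, page, per_page)
        items = data.get("items", [])
        logins.extend(item["login"] for item in items)
        reachable = min(data.get("total_count", 0), SEARCH_RESULT_CAP)
        # A short page, or having collected everything reachable, means we're done.
        if len(items) < per_page or len(logins) >= reachable:
            break
        page += 1
    return logins
```

Calling `search_all_users("location:Athens followers:>5")` would hit the live API; between calls it still needs the rate-limit pacing discussed further down. The `fetch` parameter exists so the paging logic can be exercised without the network.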

Moreover, just searching for "Athens" isn't enough: there's also an Athens in the state of Georgia in the United States, and your result set includes users who have e.g. "Athens, GA" in their profile (for example, user "yegle"). The complete result set will have other permutations of that ("Athens, Georgia") as well as inconclusive results ("Athens"). Maybe you should only search for "Athens, Greece".
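An alternative to searching only for "Athens, Greece" is to search broadly and post-filter each user's profile location. A minimal sketch, assuming the `location` string has been fetched separately per user (search results don't include it; `GET /users/{username}` does). `classify_athens_location` and its marker lists are hypothetical heuristics, not an exhaustive rule set; "ambiguous" entries (a bare "Athens") would still need manual review.

```python
import re
from typing import Optional

# Hypothetical heuristics: substrings pointing to Greece, and to Athens, GA, USA.
GREECE_MARKERS = ("greece", "hellas", "ελλάδα", "ελλαδα")
US_MARKERS = re.compile(r"\b(ga|georgia|usa|united states)\b")


def classify_athens_location(location: Optional[str]) -> str:
    """Label a profile location returned by an "Athens" search.

    Returns "greece", "us", "ambiguous" (e.g. a bare "Athens"), or "unknown".
    """
    if not location:
        return "unknown"
    text = location.casefold()  # case-insensitive matching, Greek included
    if any(marker in text for marker in GREECE_MARKERS):
        return "greece"
    if US_MARKERS.search(text):
        return "us"
    return "ambiguous"
```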

In general, I'd suggest something slightly more sophisticated, in a higher level language.

@tgkarounos
Member

You are welcome to improve this script, if you have the time.

http://www.eellak.gr/, http://mycontent.ellak.gr/,
http://www.creativecommons.gr/, http://mathe.ellak.gr/, http://ma.ellak.gr/

dspinellis added a commit that referenced this issue Dec 13, 2014
@dspinellis
Contributor

Regarding rate limiting, the update scripts pause sufficiently between requests (an API key can also be included). The greek-committers.sh script is notional: it has already been executed, and its results are included in the repo. You're probably right about the pagination and Athens, GA, though. We'd welcome a pull request with corrected data/script.
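Instead of a fixed pause, the delay could be derived from GitHub's rate-limit response headers, `X-RateLimit-Remaining` and `X-RateLimit-Reset` (the latter a Unix timestamp). A minimal sketch; `seconds_to_wait` is a hypothetical helper. Sending an API key raises the limits but does not remove them, so the same check still applies.

```python
import time
from typing import Optional


def seconds_to_wait(headers, now: Optional[float] = None) -> float:
    """How long to sleep before the next request, from GitHub's rate-limit headers.

    `headers` is anything with a .get() method; urllib's response headers look
    names up case-insensitively, a plain dict needs the exact spelling below.
    """
    if now is None:
        now = time.time()
    remaining = headers.get("X-RateLimit-Remaining")
    reset = headers.get("X-RateLimit-Reset")  # Unix epoch seconds
    if remaining is None or reset is None:
        return 0.0  # no rate-limit information to act on
    if int(remaining) > 0:
        return 0.0  # budget left in the current window
    # Budget exhausted: wait until the window resets, plus 1 s of slack.
    return max(0.0, float(reset) - now + 1.0)
```

After each response, something like `time.sleep(seconds_to_wait(resp.headers))` would pace the next call only when the budget is actually spent.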
