Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect rate limit, and fix warnings, etc. #21

Merged
merged 5 commits into from
May 12, 2023

Conversation

bockthom
Copy link
Contributor

@bockthom bockthom commented May 2, 2023

As GitHub has recently adjusted their rate limit for its GraphQL API (see https://docs.github.com/en/graphql/overview/resource-limitations), BoDeGHa runs into errors when the number of requests per hour exceeds the rate limit per hour. With this PR, I provide a fix in which BoDeGHa waits for more than one hour and retries sending the request afterwards. For more details, please see the commit message of the corresponding commit (04fd12a).

In addition, I fixed a few minor issues and pandas warnings that blow up the logs (i.e., deprecated append calls for DataFrames).

Perform certain checks only if the corresponding variables are true anyway.

Signed-off-by: Thomas Bock <bockthom@cs.uni-saarland.de>
If the user-agent header is not set, the request is denied on some machines.

Signed-off-by: Thomas Bock <bockthom@cs.uni-saarland.de>
Recently, GitHub has added a rate limit for its GraphQL API:
https://docs.github.com/en/graphql/overview/resource-limitations

According to their description, BoDeGHa will use 100 * 100 * 2 = 2000 queries
(last 100 PRs * 100 comments + last 100 issues * 100 comments) for one
'download_comments' call. As 100 queries end up in a rate limit score of 1,
BoDeGHa's call ends up in a rate limit score of 200 per 'download_comments'
call. According to GitHub's current limit, this would allow less then 25
'download_comments' calls within one hour.

To be fail save (and account for counting the requests wrongly), limit BoDeGHa
to less than 20 'download_comments' calls per hour and wait more than one hour
afterwards before the next call is started. Also if the error occurs earlier
than expected, wait one hour until the next try is started.

The limits set in this commit are a little bit lower than what GitHub's
documentation allows. However, this is done on purpose, as lots of test runs
showed that calculating the score is not fully reliable and stopping earlier is
really benefitial to make sure that BoDeGHa runs without errors.

Signed-off-by: Thomas Bock <bockthom@cs.uni-saarland.de>
Replace deprecated 'append' of DataFrames by 'concat'.

Signed-off-by: Thomas Bock <bockthom@cs.uni-saarland.de>
@AlexandreDecan
Copy link
Collaborator

Lgtm. Perhaps a message could be displayed in case of "sleep" so that users know why it takes so long :-)

@mehdigolzadeh I let you decide on this pr

@AlexandreDecan AlexandreDecan added the enhancement New feature or request label May 4, 2023
Signed-off-by: Thomas Bock <bockthom@cs.uni-saarland.de>
@bockthom
Copy link
Contributor Author

Perhaps a message could be displayed in case of "sleep" so that users know why it takes so long :-)

Good idea! I've added some messages before and after the sleep.

@AlexandreDecan
Copy link
Collaborator

Thanks! I emailed Mehdi Golzadeh to warn him about this PR.

There's one thing that I think can be improved (not necessarily as part of this PR). I'm not at all familiar with the GraphQL API, but for the REST API, the response's headers include some information about the rate limit (see https://docs.github.com/en/rest/overview/resources-in-the-rest-api?apiVersion=2022-11-28#rate-limit-headers)
In particular, it shows the x-ratelimit-remaining which can be used, when a response is received, to check whether the next call has chance to be achieved, but also a x-ratelimit-reset indicating when the current rate limit expires. The value is expressed in UTC epoch seconds, so either this can be compared with the local UTC epoch seconds from the system, or with the Date header provided in GitHub's response, converted to UTC epoch seconds. That way, the call to sleep in the code can be fine-tuned to avoid having to wait an excessive amount of time.

Another potential improvement (again, not necessarily as part of this PR) could be to support multiple API keys and to round-robin on them when one has its limit exceeded (but, while convenient, I do not really like this idea because it somehow "cheats" with api rate limitations).

Copy link
Owner

@mehdigolzadeh mehdigolzadeh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mehdigolzadeh
Copy link
Owner

Thanks, @bockthom, for your contribution. I am going to merge this PR. @AlexandreDecan, good suggestion; I know GraphQL also returns such info. I will try to further improve the code to only wait until limit_reset as soon as I find some free time.

@mehdigolzadeh mehdigolzadeh merged commit b86b574 into mehdigolzadeh:master May 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants