Respect rate limit, and fix warnings, etc. #21
Perform certain checks only if the corresponding variables are true anyway. Signed-off-by: Thomas Bock <bockthom@cs.uni-saarland.de>
If the user-agent header is not set, the request is denied on some machines. Signed-off-by: Thomas Bock <bockthom@cs.uni-saarland.de>
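A minimal sketch of what setting the header explicitly could look like, using only the standard library. The function name, the default user-agent string, and the token placeholder are hypothetical and not taken from BoDeGHa's code; the request is only built here, not sent.

```python
import json
import urllib.request

# Public GitHub GraphQL endpoint.
GRAPHQL_URL = "https://api.github.com/graphql"


def build_graphql_request(query, token, user_agent="bodegha-example"):
    """Build (but do not send) a GraphQL POST request with an explicit User-Agent.

    `user_agent` is a hypothetical default; any non-empty string identifying
    the client should do.
    """
    body = json.dumps({"query": query}).encode("utf-8")
    headers = {
        "Authorization": f"bearer {token}",
        "Content-Type": "application/json",
        "User-Agent": user_agent,  # without this, some machines get the request denied
    }
    return urllib.request.Request(GRAPHQL_URL, data=body, headers=headers, method="POST")


request = build_graphql_request("query { viewer { login } }", token="<YOUR_TOKEN>")
```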
Recently, GitHub has added a rate limit for its GraphQL API: https://docs.github.com/en/graphql/overview/resource-limitations According to their description, BoDeGHa will use 100 * 100 * 2 = 20000 queries (last 100 PRs * 100 comments + last 100 issues * 100 comments) for one 'download_comments' call. As 100 queries end up in a rate limit score of 1, BoDeGHa's call ends up in a rate limit score of 200 per 'download_comments' call. According to GitHub's current limit, this would allow fewer than 25 'download_comments' calls within one hour. To be fail-safe (and to account for counting the requests wrongly), limit BoDeGHa to fewer than 20 'download_comments' calls per hour and wait more than one hour afterwards before the next call is started. Also, if the error occurs earlier than expected, wait one hour until the next try is started. The limits set in this commit are a little bit lower than what GitHub's documentation allows. However, this is done on purpose, as lots of test runs showed that calculating the score is not fully reliable and that stopping earlier is really beneficial to make sure that BoDeGHa runs without errors. Signed-off-by: Thomas Bock <bockthom@cs.uni-saarland.de>
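The arithmetic and the throttling idea from this commit message can be sketched as follows. Note that 100 * 100 * 2 evaluates to 20,000 requested nodes, which at 100 nodes per point gives the score of 200 stated above. The hourly limit of 5,000 points is an assumption taken from GitHub's documentation rather than from this PR, and the class name, the 19-call cap, and the 3,700-second pause are hypothetical values chosen only to illustrate "fewer than 20 calls, then wait more than one hour"; the actual commit may differ.

```python
import time

NODES_PER_CALL = 100 * 100 * 2           # 100 PRs * 100 comments + 100 issues * 100 comments
POINTS_PER_CALL = NODES_PER_CALL // 100  # 100 nodes cost 1 point, so 200 points per call
HOURLY_POINT_LIMIT = 5000                # assumed GitHub GraphQL limit per hour


class CallThrottle:
    """Pause for a fixed time once a maximum number of calls has been made."""

    def __init__(self, max_calls=19, pause_seconds=3700, sleep=time.sleep):
        self.max_calls = max_calls          # hypothetical: stay below 20 calls per hour
        self.pause_seconds = pause_seconds  # hypothetical: a bit more than one hour
        self.sleep = sleep                  # injectable so the sketch can be tested quickly
        self.calls_made = 0

    def before_call(self):
        """Call this right before each 'download_comments' call."""
        if self.calls_made >= self.max_calls:
            print(f"Rate limit budget used up, sleeping for {self.pause_seconds} seconds ...")
            self.sleep(self.pause_seconds)
            print("Resuming.")
            self.calls_made = 0
        self.calls_made += 1


print(POINTS_PER_CALL, HOURLY_POINT_LIMIT // POINTS_PER_CALL)
```

Injecting the sleep function keeps the counting logic separate from the actual waiting, which makes it easy to check the behaviour without pausing for an hour.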
Replace deprecated 'append' of DataFrames by 'concat'. Signed-off-by: Thomas Bock <bockthom@cs.uni-saarland.de>
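For illustration, the replacement typically looks like the sketch below. The column names and rows are made up for the example and are not taken from BoDeGHa's code.

```python
import pandas as pd

# Hypothetical example data; BoDeGHa's real columns may differ.
comments = pd.DataFrame({"author": ["alice"], "body": ["first comment"]})
new_rows = pd.DataFrame({"author": ["bob"], "body": ["second comment"]})

# Formerly (deprecated): comments = comments.append(new_rows, ignore_index=True)
comments = pd.concat([comments, new_rows], ignore_index=True)
```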
LGTM. Perhaps a message could be displayed in case of "sleep" so that users know why it takes so long :-) @mehdigolzadeh I let you decide on this PR.
Signed-off-by: Thomas Bock <bockthom@cs.uni-saarland.de>
Good idea! I've added some messages before and after the sleep.
Thanks! I emailed Mehdi Golzadeh to warn him about this PR. There's one thing that I think can be improved (not necessarily as part of this PR). I'm not at all familiar with the GraphQL API, but for the REST API, the response's headers include some information about the rate limit (see https://docs.github.com/en/rest/overview/resources-in-the-rest-api?apiVersion=2022-11-28#rate-limit-headers). Another potential improvement (again, not necessarily as part of this PR) could be to support multiple API keys and to round-robin on them when one has its limit exceeded (but, while convenient, I do not really like this idea because it somehow "cheats" the API rate limits).
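As a rough sketch of the header-based idea for the REST API: the `x-ratelimit-remaining` and `x-ratelimit-reset` headers are the ones described on the linked documentation page, while the function name and the way the headers are passed in (a plain dict) are assumptions made for this example only.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request, based on REST rate limit headers.

    `headers` is assumed to be a dict-like mapping of response header names to strings.
    Returns 0.0 if requests are still available or the header is missing.
    """
    lowered = {key.lower(): value for key, value in headers.items()}
    remaining = int(lowered.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0.0
    # x-ratelimit-reset is given in UTC epoch seconds.
    reset_at = int(lowered["x-ratelimit-reset"])
    now = time.time() if now is None else now
    return max(0.0, float(reset_at - now))


wait = seconds_until_reset(
    {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1000"}, now=940
)
```

Waiting only for the returned number of seconds would avoid sleeping for a full hour when the limit resets sooner.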
LGTM
Thanks, @bockthom, for your contribution. I am going to merge this PR. @AlexandreDecan, good suggestion; I know GraphQL also returns such info. I will try to further improve the code to only wait until limit_reset as soon as I find some free time.
As GitHub has recently adjusted their rate limit for its GraphQL API (see https://docs.github.com/en/graphql/overview/resource-limitations), BoDeGHa runs into errors when the number of requests per hour exceeds the rate limit per hour. With this PR, I provide a fix in which BoDeGHa waits for more than one hour and retries sending the request afterwards. For more details, please see the commit message of the corresponding commit (04fd12a).
In addition, I fixed a few minor issues and pandas warnings that blow up the logs (i.e., deprecated 'append' calls for DataFrames).