Where is the best layer to add throttling to avoid being marked for abuse? #431
Comments
To be honest, every time I write a big API-dependent job, I do this: `githubDelay = 720 * time.Millisecond // 5000 QPH = 83.3 QPM = 1.38 QPS = 720ms/query` and then call `time.Sleep(githubDelay)` before each request. I know this doesn't really help much... but it is just another data point.
Shortly after I wrote the comment above, I switched to using a time.Ticker:

```go
const githubRate = 720 * time.Millisecond
...
ticker := time.NewTicker(githubRate)
...
<-ticker.C // Don't hammer GitHub API
// call to GitHub API
```

which allows my tool to run for potentially hours and not abuse the rate limit.
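For reference, a minimal self-contained version of that loop might look like the sketch below. This is only an illustration of the pacing pattern, not code from go-github: it assumes a go-github version whose methods take a `context.Context`, and the owner/repository names are made up.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/google/go-github/github"
)

// githubRate paces requests at one per 720ms, which keeps an authenticated
// client under the 5000 requests/hour limit (5000 QPH ≈ 1.39 QPS ≈ 720ms/query).
const githubRate = 720 * time.Millisecond

func main() {
	client := github.NewClient(nil) // unauthenticated here; wire in an OAuth2 client as needed

	ticker := time.NewTicker(githubRate)
	defer ticker.Stop()

	repos := []string{"go-github", "go-querystring"} // illustrative list of repositories to fetch
	for _, name := range repos {
		<-ticker.C // wait for the next slot; don't hammer the GitHub API
		repo, _, err := client.Repositories.Get(context.Background(), "google", name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(repo.GetFullName(), repo.GetStargazersCount())
	}
}
```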
It's similar to RateLimitError, but it's a different type of rate limit error, documented at https://developer.github.com/v3/#abuse-rate-limits. Parse and include the Retry-After header value, if present; it is documented at https://developer.github.com/guides/best-practices-for-integrators/#dealing-with-abuse-rate-limits. According to GitHub support, its value is an integer number of seconds. Helps google#431.
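Once such an abuse-specific error type exists, callers can branch on it and honor `Retry-After`. A hedged sketch of what that handling might look like, assuming go-github's later `AbuseRateLimitError` type with a `RetryAfter *time.Duration` field; `doWithRetry` is a hypothetical helper, not part of the library:

```go
import (
	"time"

	"github.com/google/go-github/github"
)

// doWithRetry calls fn once and, if GitHub reports an abuse rate limit,
// sleeps for the Retry-After interval (or a fallback) and tries again.
func doWithRetry(fn func() error) error {
	err := fn()
	if abuseErr, ok := err.(*github.AbuseRateLimitError); ok {
		wait := 10 * time.Second // illustrative fallback if no Retry-After header came back
		if abuseErr.RetryAfter != nil {
			wait = *abuseErr.RetryAfter
		}
		time.Sleep(wait)
		return fn()
	}
	return err
}
```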
We do this because the GitHub API triggers abuse warnings if a single user makes lots of concurrent requests. By having a shared ticker, we ensure that GitHub API requests don't happen in parallel. Moreover, we also ensure that there's a global rate limit of 1 request to the GitHub API per 720 milliseconds. We're following the approach shown here: google/go-github#431. This means we'll stay within the rate limit of 5000 requests per hour for authenticated users.

```
5000 QPH = 83.3 QPM = 1.38 QPS = 720ms/query
```
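One way to implement that shared pacing outside the application code is an `http.RoundTripper` wrapper handed to `github.NewClient`. The sketch below only illustrates the idea described above; the package and type names are hypothetical, and this is not the transport from any particular repository.

```go
package ghthrottle

import (
	"net/http"
	"sync"
	"time"
)

// tickerTransport serializes requests (they never run in parallel) and starts
// them at most once per tick, so concurrent callers sharing one client stay
// within a global pace of one request per 720ms.
type tickerTransport struct {
	base   http.RoundTripper
	ticker *time.Ticker
	mu     sync.Mutex // serializes requests across goroutines
}

// NewTransport wraps base (or http.DefaultTransport if nil) with the shared pace.
func NewTransport(base http.RoundTripper) http.RoundTripper {
	if base == nil {
		base = http.DefaultTransport
	}
	return &tickerTransport{base: base, ticker: time.NewTicker(720 * time.Millisecond)}
}

func (t *tickerTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	<-t.ticker.C // wait for the next 720ms slot before sending
	return t.base.RoundTrip(req)
}
```

A client could then be built with something like `github.NewClient(&http.Client{Transport: ghthrottle.NewTransport(nil)})`, passing an authenticated transport as `base` if OAuth2 is needed.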
Here's how we implemented the linked best practices in our repository: I'd be more than happy to raise a PR if there's any interest in bringing these transports upstream along with tests. I don't expect everyone will want/need to use them, but it would IMO be nice to keep it upstream.
Closing issue for lack of activity/interest. Feel free to reopen.
The GitHub API has some guidelines to follow when accessing the API to avoid hitting the abuse rate limit:
https://developer.github.com/guides/best-practices-for-integrators/#dealing-with-abuse-rate-limits
An application should follow those rules. Some of them can be enforced in a general/automated way via throttling, I think, and so they can be factored out into a library (rather than having to be done manually inside the application layer). Specifically:
- If we can assume a single `*github.Client` maps to a single user or client ID, then requests can be made serially by using a mutex or semaphore or similar.
- Again, if we can assume a single `*github.Client` maps to a single user or client ID, then we can keep track of the last `POST`, `PATCH`, `PUT`, or `DELETE` request time, and sleep if necessary to make the next one happen at least one second later (see the sketch after this list).
- We can detect the abuse rate limit response (it's documented here), read the number of seconds specified in the `Retry-After` response header, and not allow making network requests until that time has passed (probably returning a `*RateLimitError` error, or an abuse-specific version thereof, right away for all requests).
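For illustration, a rough sketch of how the first two rules could be enforced around a single client. The `throttle` type and its methods are hypothetical helpers, not part of go-github:

```go
import (
	"net/http"
	"sync"
	"time"
)

// throttle enforces two of the guidelines for a single user/client ID:
// requests are made serially, and mutating requests are spaced at least
// one second apart.
type throttle struct {
	mu        sync.Mutex // held for the duration of each request, so requests never overlap
	lastWrite time.Time  // time of the last POST/PATCH/PUT/DELETE request
}

// wait blocks until the request may start; the caller must call done()
// once the request has finished.
func (t *throttle) wait(method string) {
	t.mu.Lock()
	switch method {
	case http.MethodPost, http.MethodPatch, http.MethodPut, http.MethodDelete:
		if d := time.Second - time.Since(t.lastWrite); d > 0 {
			time.Sleep(d) // keep at least one second between content-creating requests
		}
		t.lastWrite = time.Now()
	}
}

func (t *throttle) done() { t.mu.Unlock() }
```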
I see two places where this can be done:

- In the `*github.Client` itself, probably as a configurable option.
- In a higher-level wrapper around go-github.

So far, go-github has been a relatively "low-level" API client, meaning it helps you out with type safety and provides structured output, but aside from that, it maps roughly 1:1 to making HTTP requests to the GitHub API.
It has taken some steps to be a smarter/higher-level client where it could be done transparently, without blocking. For example, after #277, the `RateLimitError` type was added, and after #347, go-github is smart enough to track the rate limit reset time and not make network calls when the rate limit is known to still be exceeded, returning `RateLimitError` right away.

So far it has also been a "return-right-away" API, where none of the calls are artificially throttled on the client side; they just return an error if there's a rate limit problem. So it seems adding throttling to the go-github client itself may be out of place, and would belong in a higher-level wrapper around go-github.
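To illustrate that "return-right-away" behavior from the caller's side, a hedged sketch of the workaround applications currently apply themselves; it assumes the `RateLimitError` fields from #277/#347 and a go-github version whose methods take a `context.Context`:

```go
import (
	"context"
	"time"

	"github.com/google/go-github/github"
)

// listWithWait is a caller-side workaround: if go-github reports that the
// core rate limit is exhausted, wait until the reset time and try once more.
func listWithWait(ctx context.Context, client *github.Client, user string) ([]*github.Repository, error) {
	repos, _, err := client.Repositories.List(ctx, user, nil)
	if rateErr, ok := err.(*github.RateLimitError); ok {
		time.Sleep(time.Until(rateErr.Rate.Reset.Time)) // sleep until the limit resets
		repos, _, err = client.Repositories.List(ctx, user, nil)
	}
	return repos, err
}
```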
On the other hand, if done as an option that can be controlled, perhaps this is the best place to do these things, since it means more people can use the GitHub API correctly with less work on their side, by default. The user of the GitHub client could make calls concurrently, but they would block and get serialized by the client so as to follow the GitHub API guidelines. It would effectively shift some of the throttling that would otherwise be done on the GitHub server side to the go-github library client side.
@willnorris, @gmlewis, what are your thoughts on this? Where is the best place for this functionality?
Related issues: #152, #153, #277, #347, #304.