Exceeded GitHub API rate limit #410

Closed
Tracked by #407
liammulh opened this issue May 25, 2023 · 7 comments

Comments

@liammulh
Member

liammulh commented May 25, 2023

EDIT: Rosetta has been in maintenance mode since May 24th at ~5pm mountain time.

Last night, I noticed Rosetta was erroring.

info: getting english string keys and values
error: API rate limit exceeded for user ID [... more info here, but not including it because this repo is public]

Apparently, we have exceeded the GitHub API rate limit.

This GitHub doc says:

User access token requests are limited to 5,000 requests per hour and per authenticated user. All requests from OAuth applications authorized by a user or a personal access token owned by the user, and requests authenticated with any of the user's authentication credentials, share the same quota of 5,000 requests per hour for that user.

Later in that doc, it provides a way to check your rate limit:

curl -i https://api.github.com/users/octocat
x-ratelimit-limit: 60
x-ratelimit-remaining: 56
x-ratelimit-used: 4
x-ratelimit-reset: 1372700873

When I ran this command for the GitHub user @phet-dev, who I suspect owns the personal access token we use in Rosetta, I saw:

curl -i https://api.github.com/users/phet-dev
x-ratelimit-limit: 60
x-ratelimit-remaining: 55
x-ratelimit-used: 5
x-ratelimit-reset: 1685038604

Am I misreading that doc? I thought we should have 5,000 requests per hour, assuming @phet-dev is an authenticated user with a personal access token. But apparently we only have 60, which I assume is the default.
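For comparing these numbers across runs, here is a minimal sketch that pulls the x-ratelimit-* values out of the text printed by curl -i. The function name parse_rate_limit_headers is hypothetical and is not part of Rosetta; the sample text is the output shown above for @phet-dev.

```python
def parse_rate_limit_headers(raw_response):
    """Return the numeric x-ratelimit-* headers found in `curl -i` output."""
    prefix = "x-ratelimit-"
    limits = {}
    for line in raw_response.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip().lower()
        value = value.strip()
        # Skip lines that are not headers, and headers with non-numeric values.
        if sep and name.startswith(prefix) and value.isdigit():
            limits[name[len(prefix):]] = int(value)
    return limits


sample = """\
x-ratelimit-limit: 60
x-ratelimit-remaining: 55
x-ratelimit-used: 5
x-ratelimit-reset: 1685038604
"""

limits = parse_rate_limit_headers(sample)
print(limits)
```

A limit of 60 here is the quickest sign that the request did not carry any credentials.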

The error I saw provided a user ID. Maybe I can use it to look up the user who owns the personal access token.

@liammulh
Member Author

From this Stack Overflow answer (https://stackoverflow.com/a/30579888/15027348), I learned we can look up a username from a user ID by going to https://api.github.com/user/:id

https://api.github.com/user/34923065
{
  "login": "phet-dev",
  ...
}

Why is @phet-dev's rate limit 60? Is it not authenticated?

@liammulh
Member Author

Here's another wrinkle: GitHub has two types of tokens. There are classic tokens (if I had to guess I would say @phet-dev's personal access token is classic) and then there are new "fine-grained" tokens.

@liammulh
Member Author

liammulh commented May 25, 2023

Why is @phet-dev's rate limit 60? Is it not authenticated?

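It's because we weren't providing the PAT in the curl request, so the request was unauthenticated and got the default limit of 60 requests per hour. When we provide the PAT in the curl request, the limit is 5000 requests per hour.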

@liammulh
Member Author

liammulh commented May 26, 2023

@jbphet and I figured this out. We discovered that @phet-dev's GitHub personal access token (hereafter referred to as "the PAT") is specifically for Rosetta. It is not used for anything other than Rosetta. We also discovered that the PAT has a rate limit of 5000 requests per hour. We figured this out by entering the following command:

curl --request GET \
--url "https://api.github.com/users/phet-dev" \
--header "Authorization: Bearer <PAT goes here>" \
--header "X-GitHub-Api-Version: 2022-11-28" \
-i

The -i option is important. According to the curl man page, -i (or --include) does the following:

Include the HTTP response headers in the output. The HTTP response headers can include things like server name, cookies, date of the document, HTTP version and more...

One of the headers in the response is x-ratelimit-reset, which gives an epoch timestamp. It doesn't seem like it is reset at the start of a new hour, so we think it's a "rolling window" hour.
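To make the x-ratelimit-reset value easier to read, here is a small sketch that converts the epoch timestamp to a UTC datetime and computes how long remains until the reset. The helper names are hypothetical; the sample value 1685038604 is the one from the unauthenticated curl output earlier in this issue.

```python
from datetime import datetime, timezone


def reset_time_utc(epoch_seconds):
    """Convert an x-ratelimit-reset value (seconds since the epoch) to UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def seconds_until_reset(epoch_seconds, now):
    """Seconds from `now` (a timezone-aware datetime) until the reset, never negative."""
    return max(0.0, (reset_time_utc(epoch_seconds) - now).total_seconds())


reset = reset_time_utc(1685038604)
print(reset.isoformat())  # 2023-05-25T18:16:44+00:00

# Ten minutes before the reset there are 600 seconds left.
ten_minutes_before = datetime(2023, 5, 25, 18, 6, 44, tzinfo=timezone.utc)
print(seconds_until_reset(1685038604, ten_minutes_before))  # 600.0
```

Comparing the reset time from two requests made a few minutes apart is one way to check whether the window is fixed or moves.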

It would seem Rosetta is somehow exhausting all 5000 of its requests in an hour. @jbphet and I were skeptical that it could be making that many requests without some sort of bug. However, upon further investigation, we discovered it is possible for Rosetta to exhaust its 5000 GitHub API requests in one hour:

  • Suppose a sim has sim-specific strings, and common strings from scenery-phet, joist, and sun. (This is pretty typical.)
  • The translation report for that sim needs:
    • sim-specific English strings file,
    • sim-specific translated strings file,
    • common English strings file for scenery-phet,
    • common translated strings for scenery-phet,
    • common English strings file for joist,
    • common translated strings for joist,
    • common English strings file for sun, and
    • common translated strings for sun.

Thus, for a typical PhET sim, Rosetta needs to perform 8 GitHub API requests to create a translation report object with total translated strings and total strings. If there are 100 sims, a full translation report for one locale requires approximately 800 GitHub API requests. We are limited to 5000 GitHub API requests per hour, so we can only generate the translation report for 6 different locales per hour (5000 / 800 = 6.25). Of course, once the translation report has been generated, it is cached in Rosetta's memory.
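The estimate above can be written out as a short calculation. This is only a sketch of the arithmetic, assuming every sim uses the same three common repos and using the round number of 100 sims from the text; the function and constant names are hypothetical.

```python
HOURLY_LIMIT = 5000  # GitHub API requests per hour for the PAT
SIM_COUNT = 100      # round number used in the estimate above
COMMON_REPOS = ["scenery-phet", "joist", "sun"]


def requests_per_sim(common_repos):
    """One English file and one translated file for the sim and for each common repo."""
    return 2 * (1 + len(common_repos))


per_sim = requests_per_sim(COMMON_REPOS)
per_locale = per_sim * SIM_COUNT
locales_per_hour = HOURLY_LIMIT // per_locale

print(per_sim, per_locale, locales_per_hour)  # 8 800 6
```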

The proper solution to this problem is to use MongoDB to store the translated strings. However, tomorrow is my last day of work at PhET, and it is unlikely anyone else will have the time to do this. @jbphet and I decided to do the following:

When Rosetta gets an API request for a translation report, it will check how many requests it has left in the hour. If that number is lower than, say, 900, then we send a 429 status code to the client-side code saying "sorry, we've reached GitHub's API rate limit". The client-side code will then put up some sort of banner saying "sorry, the translation stats aren't available right now, but you can still translate" and we will put "--" or something where the stats would usually go.

@liammulh
Member Author

When Rosetta gets an API request for a translation report, it will check how many requests it has left in the hour. If that number is lower than, say, 900, then we send a 429 status code to the client-side code saying "sorry, we've reached GitHub's API rate limit". The client-side code will then put up some sort of banner saying "sorry, the translation stats aren't available right now, but you can still translate" and we will put "--" or something where the stats would usually go.

The implementation ended up being slightly different from this: we set up an API endpoint that checks how many requests we have left, and if that number is below 900, the client-side code is told not to display translation stats.
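The decision itself is a single comparison. Here is a minimal sketch of it in Python, for illustration only: Rosetta's actual code is not shown in this issue, the names are hypothetical, and treating exactly 900 remaining requests as "show stats" is an assumption based on the wording "below 900".

```python
MIN_REMAINING_REQUESTS = 900  # threshold described above


def should_show_stats(remaining_requests, threshold=MIN_REMAINING_REQUESTS):
    """Return True when enough GitHub API requests remain to build translation stats."""
    # Assumption: "below 900" hides the stats, so exactly 900 still shows them.
    return remaining_requests >= threshold


print(should_show_stats(4999))  # True
print(should_show_stats(795))   # False
```

Keeping the threshold above the roughly 800 requests needed for one locale's report means a report that has already started should be able to finish before the limit is hit.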

@liammulh
Member Author

This is now deployed.

@jbphet
Contributor

jbphet commented Jun 16, 2023

We had a meeting this week to talk about priorities in Rosetta, and during that meeting we brought up the interface and noticed that the throttling message was showing. I thought I'd take a look at the logs and see if this is occurring a lot. The answer is "not a lot, but it is happening". The logs go back to May 22, 2023, so almost 4 weeks, and it has happened 3 times over that period. I think one of those times - the one on June 13 - was due to my testing related to #412 (comment), so I think that one can be ignored. The longest one, on June 14 starting at 13:12:39, lasted around 20 minutes, which doesn't seem to be too bad.

Bottom line: So far this seems to be working reasonably well and isn't causing too many problems.

Here is the raw data. This was generated by getting the Rosetta log and using the command grep -B 2 "should show stats: false" rlog.txt. The threshold for the number of remaining requests is currently set at 900.

Jun 13 11:36:09 phet-server2.int.colorado.edu rosetta[1677018]: info: remaining github requests: 795
Jun 13 11:36:09 phet-server2.int.colorado.edu rosetta[1677018]: info: should show stats: false
--
Jun 13 12:14:12 phet-server2.int.colorado.edu rosetta[1677018]: info: remaining github requests: 370
Jun 13 12:14:12 phet-server2.int.colorado.edu rosetta[1677018]: info: should show stats: false
--
Jun 13 12:16:13 phet-server2.int.colorado.edu rosetta[1677018]: info: remaining github requests: 359
Jun 13 12:16:13 phet-server2.int.colorado.edu rosetta[1677018]: info: should show stats: false
--
Jun 13 12:16:35 phet-server2.int.colorado.edu rosetta[1677018]: info: remaining github requests: 358
Jun 13 12:16:35 phet-server2.int.colorado.edu rosetta[1677018]: info: should show stats: false
--
Jun 13 12:18:54 phet-server2.int.colorado.edu rosetta[1677018]: info: remaining github requests: 351
Jun 13 12:18:54 phet-server2.int.colorado.edu rosetta[1677018]: info: should show stats: false
--
Jun 13 12:20:33 phet-server2.int.colorado.edu rosetta[1677018]: info: remaining github requests: 344
Jun 13 12:20:33 phet-server2.int.colorado.edu rosetta[1677018]: info: should show stats: false
--
Jun 13 12:21:35 phet-server2.int.colorado.edu rosetta[1677018]: info: remaining github requests: 333
Jun 13 12:21:35 phet-server2.int.colorado.edu rosetta[1677018]: info: should show stats: false
--
Jun 13 12:21:56 phet-server2.int.colorado.edu rosetta[1677018]: info: remaining github requests: 332
Jun 13 12:21:56 phet-server2.int.colorado.edu rosetta[1677018]: info: should show stats: false
--
Jun 13 12:23:38 phet-server2.int.colorado.edu rosetta[1677018]: info: remaining github requests: 325
Jun 13 12:23:38 phet-server2.int.colorado.edu rosetta[1677018]: info: should show stats: false
--
Jun 13 12:24:58 phet-server2.int.colorado.edu rosetta[1677018]: info: remaining github requests: 314
Jun 13 12:24:58 phet-server2.int.colorado.edu rosetta[1677018]: info: should show stats: false
--
Jun 13 12:26:16 phet-server2.int.colorado.edu rosetta[1677018]: info: remaining github requests: 307
Jun 13 12:26:16 phet-server2.int.colorado.edu rosetta[1677018]: info: should show stats: false
--
Jun 13 12:26:37 phet-server2.int.colorado.edu rosetta[1677018]: info: remaining github requests: 298
Jun 13 12:26:37 phet-server2.int.colorado.edu rosetta[1677018]: info: should show stats: false
--
Jun 13 12:26:59 phet-server2.int.colorado.edu rosetta[1677018]: info: remaining github requests: 287
Jun 13 12:26:59 phet-server2.int.colorado.edu rosetta[1677018]: info: should show stats: false
--
Jun 13 12:27:15 phet-server2.int.colorado.edu rosetta[1677018]: info: remaining github requests: 276
Jun 13 12:27:15 phet-server2.int.colorado.edu rosetta[1677018]: info: should show stats: false
--
Jun 13 12:27:30 phet-server2.int.colorado.edu rosetta[1677018]: info: remaining github requests: 267
Jun 13 12:27:30 phet-server2.int.colorado.edu rosetta[1677018]: info: should show stats: false
--
Jun 13 12:27:52 phet-server2.int.colorado.edu rosetta[1677018]: info: remaining github requests: 260
Jun 13 12:27:52 phet-server2.int.colorado.edu rosetta[1677018]: info: should show stats: false
--
Jun 13 12:28:28 phet-server2.int.colorado.edu rosetta[1677018]: info: remaining github requests: 249
Jun 13 12:28:28 phet-server2.int.colorado.edu rosetta[1677018]: info: should show stats: false
--
Jun 13 12:28:46 phet-server2.int.colorado.edu rosetta[1677018]: info: remaining github requests: 238
Jun 13 12:28:46 phet-server2.int.colorado.edu rosetta[1677018]: info: should show stats: false
--
Jun 13 12:29:48 phet-server2.int.colorado.edu rosetta[1677018]: info: remaining github requests: 227
Jun 13 12:29:48 phet-server2.int.colorado.edu rosetta[1677018]: info: should show stats: false
--
Jun 13 12:32:51 phet-server2.int.colorado.edu rosetta[1677018]: info: remaining github requests: 194
Jun 13 12:32:51 phet-server2.int.colorado.edu rosetta[1677018]: info: should show stats: false
--
Jun 13 12:33:11 phet-server2.int.colorado.edu rosetta[1677018]: info: remaining github requests: 193
Jun 13 12:33:11 phet-server2.int.colorado.edu rosetta[1677018]: info: should show stats: false
--
Jun 14 13:12:39 phet-server2.int.colorado.edu rosetta[1122667]: info: remaining github requests: 631
Jun 14 13:12:39 phet-server2.int.colorado.edu rosetta[1122667]: info: should show stats: false
--
Jun 14 13:23:29 phet-server2.int.colorado.edu rosetta[1122667]: info: remaining github requests: 401
Jun 14 13:23:29 phet-server2.int.colorado.edu rosetta[1122667]: info: should show stats: false
--
Jun 14 13:26:21 phet-server2.int.colorado.edu rosetta[1122667]: info: remaining github requests: 392
Jun 14 13:26:21 phet-server2.int.colorado.edu rosetta[1122667]: info: should show stats: false
--
Jun 14 13:26:33 phet-server2.int.colorado.edu rosetta[1122667]: info: remaining github requests: 391
Jun 14 13:26:33 phet-server2.int.colorado.edu rosetta[1122667]: info: should show stats: false
--
Jun 14 13:27:55 phet-server2.int.colorado.edu rosetta[1122667]: info: remaining github requests: 371
Jun 14 13:27:55 phet-server2.int.colorado.edu rosetta[1122667]: info: should show stats: false
--
Jun 14 13:30:51 phet-server2.int.colorado.edu rosetta[1122667]: info: remaining github requests: 360
Jun 14 13:30:51 phet-server2.int.colorado.edu rosetta[1122667]: info: should show stats: false
--
Jun 14 14:44:30 phet-server2.int.colorado.edu rosetta[1122667]: info: remaining github requests: 657
Jun 14 14:44:30 phet-server2.int.colorado.edu rosetta[1122667]: info: should show stats: false
--
Jun 14 14:51:30 phet-server2.int.colorado.edu rosetta[1122667]: info: remaining github requests: 482
Jun 14 14:51:30 phet-server2.int.colorado.edu rosetta[1122667]: info: should show stats: false
--
Jun 15 12:23:07 phet-server2.int.colorado.edu rosetta[1122667]: info: remaining github requests: 588
Jun 15 12:23:07 phet-server2.int.colorado.edu rosetta[1122667]: info: should show stats: false
--
Jun 15 12:23:09 phet-server2.int.colorado.edu rosetta[1122667]: info: remaining github requests: 564
Jun 15 12:23:09 phet-server2.int.colorado.edu rosetta[1122667]: info: should show stats: false
--
Jun 15 12:25:19 phet-server2.int.colorado.edu rosetta[1122667]: info: remaining github requests: 24
Jun 15 12:25:19 phet-server2.int.colorado.edu rosetta[1122667]: info: should show stats: false
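For anyone repeating this check later, here is a rough sketch that summarizes grep output like the above by finding the lowest remaining-request count logged on each day. It assumes the log line format shown here; the function names are hypothetical, and the two sample lines are copied from the raw data.

```python
import re

# Matches lines such as:
# "Jun 13 11:36:09 host rosetta[123]: info: remaining github requests: 795"
LINE_PATTERN = re.compile(
    r"^(\w{3} +\d{1,2}) (\d{2}:\d{2}:\d{2}) .*remaining github requests: (\d+)\s*$"
)


def parse_remaining(log_text):
    """Return (day, time, remaining) tuples for every matching log line."""
    entries = []
    for line in log_text.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            day, time, remaining = match.groups()
            entries.append((day, time, int(remaining)))
    return entries


def lowest_per_day(entries):
    """Map each day to the lowest remaining-request count logged on that day."""
    lowest = {}
    for day, _time, remaining in entries:
        lowest[day] = min(remaining, lowest.get(day, remaining))
    return lowest


sample = """\
Jun 13 11:36:09 phet-server2.int.colorado.edu rosetta[1677018]: info: remaining github requests: 795
Jun 13 11:36:09 phet-server2.int.colorado.edu rosetta[1677018]: info: should show stats: false
--
Jun 15 12:25:19 phet-server2.int.colorado.edu rosetta[1122667]: info: remaining github requests: 24
Jun 15 12:25:19 phet-server2.int.colorado.edu rosetta[1122667]: info: should show stats: false
"""

entries = parse_remaining(sample)
print(lowest_per_day(entries))  # {'Jun 13': 795, 'Jun 15': 24}
```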
