x/build/cmd/coordinator: add health check and graph to track hourly remaining GitHub API rate limit #44406

Closed
dmitshur opened this issue Feb 19, 2021 · 10 comments

@dmitshur dmitshur commented Feb 19, 2021

While investigating #44404, I noticed in GopherBot's logs that it was failing to take some actions due to exceeding the GitHub API rate limit quota:

$ kubectl logs -f gopherbot-deployment-6c6d86d5b9-s88c8 | grep "API rate limit"
[...]
2021/02/19 00:19:47 cl2issue: GET https://api.github.com/repos/golang/go/issues/44295/comments?per_page=1000&since=2021-02-19T00%3A14%3A00Z: 403 API rate limit of 5000 still exceeded until 2021-02-19 00:47:55 +0000 UTC, not making remote request. [rate reset in 28m08s]
[...]

When rate limit is exceeded, GopherBot stops being reliable for its users, and regular maintenance tasks do not occur.

This may be related to heavy activity, or perhaps it's caused by a worsening of issue #28320. This issue is to keep an eye on how big of a problem it is and what we need to do about it.

CC @golang/release.

@dmitshur dmitshur added this to the Unreleased milestone Feb 19, 2021
@gopherbot gopherbot added the Builders label Feb 19, 2021
@dmitshur dmitshur changed the title x/build/cmd/gopherbot: GopherBot may exceed rate limit x/build/cmd/gopherbot: may exceed GitHub API rate limit Feb 19, 2021

@dmitshur dmitshur commented Feb 19, 2021

Based on golang/crypto#143 (comment), this issue might be affecting more services than just GopherBot.

@dmitshur dmitshur commented Mar 1, 2021

This is happening today too, affecting the "close cherry pick issues" task.

2021/03/01 20:54:36 close cherry pick issues: GET https://api.github.com/repos/golang/go/issues/44464/comments?per_page=1000&since=2021-02-26T10%3A29%3A53Z: 403 API rate limit of 5000 still exceeded until 2021-03-01 20:56:13 +0000 UTC, not making remote request. [rate reset in 1m43s]
@dmitshur dmitshur added this to Planned in Go Release Team Mar 2, 2021

@dmitshur dmitshur commented Mar 15, 2021

From GerritBot logs today:

2021/03/15 17:47:59 getFullPR(ctx, "golang", "website", 40): b.githubClient.Do: GET https://api.github.com/repos/golang/website/pulls/40: 403 API rate limit of 5000 still exceeded until 2021-03-15 18:05:45 +0000 UTC, not making remote request. [rate reset in 17m45s]
@toothrot toothrot moved this from Planned to In Progress in Go Release Team Mar 16, 2021
@toothrot toothrot assigned toothrot and jeremyfaller and unassigned toothrot Mar 16, 2021

@jeremyfaller jeremyfaller commented Mar 16, 2021

Ticket opened with GitHub; awaiting response.

@jeremyfaller jeremyfaller commented Mar 17, 2021

Poked GitHub. No rate limit increase is in the cards. We'll be held to 5k/hour unless we upgrade to an enterprise account (being related to Google, which has an enterprise account, doesn't seem to help us here). I think we'll need to fix the underlying issues.

edit: We'd get 15k/hour if we upgraded.

@dmitshur dmitshur commented Mar 17, 2021

Thanks for the update.

I'm working on getting a graph of our rate limit usage. Having that should help get a sense of how much/often the rate limit is being exceeded, and how much we need to decrease its usage by.

@dmitshur dmitshur self-assigned this Mar 19, 2021

@gopherbot gopherbot commented Mar 22, 2021

Change https://golang.org/cl/303670 mentions this issue: cmd/coordinator: add health check for GitHub API quota

@gopherbot gopherbot commented Mar 22, 2021

Change https://golang.org/cl/303669 mentions this issue: cmd/coordinator: migrate to OpenCensus for metrics

gopherbot pushed a commit to golang/build that referenced this issue Mar 23, 2021
Replace low-level Stackdriver monitoring API usage with OpenCensus
and a Stackdriver exporter. To benefit local development, expose
metrics at a /metrics endpoint (to be picked up by Prometheus).

This makes it much easier to add new metrics, to test them locally,
and brings our metrics solution in sync with what's currently in
use in x/playground (see CL 302769). It's expected to be preferable
to migrate to OpenTelemetry in the future when a good migration path
becomes available, and both x/build and x/playground can be updated
at that time.

This CL is based on work in CL 229679 and CL 138522.

For golang/go#26779.
For golang/go#44406.
For golang/go#17104.

Co-authored-by: Alexander Rakoczy <alex@golang.org>
Co-authored-by: Emmanuel T Odeke <emmanuel@orijtech.com>
Change-Id: Iad45730feace471db1668e828b7c9775377be8a9
Reviewed-on: https://go-review.googlesource.com/c/build/+/303669
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com>
gopherbot pushed a commit to golang/build that referenced this issue Mar 23, 2021
In recent times, it has been observed that the GitHub API rate limit
quota of 5000 requests per hour is being occasionally exceeded.

It should be very helpful to have a graph that tracks remaining rate
limit over time to better understand the current state and how much
effect future code changes have on improving it.

Also add a health check to coordinator's health section that prints
a warning when the GitHub rate limit is known to be exceeded. This
can help when observing GopherBot or GerritBot problems: we'll be
able to tell if they're likely caused by GitHub rate limit issues
or if the cause must be something else.

For golang/go#44406.

Change-Id: Id75d70129a75292a6d3f9c722636a8b740ca05a1
Reviewed-on: https://go-review.googlesource.com/c/build/+/303670
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
Trust: Dmitri Shuralyov <dmitshur@golang.org>

@dmitshur dmitshur commented Apr 6, 2021

I'm going to retitle this issue to be about adding a health check and metrics to improve visibility into the rate limit problem, and close it since that work is done.

We can file new issues for future improvements to reduce the amount of time the rate limit is exceeded.

@dmitshur dmitshur changed the title x/build/cmd/gopherbot: may exceed GitHub API rate limit x/build/cmd/coordinator: add health check and graph to track hourly remaining GitHub API rate limit Apr 6, 2021
@dmitshur dmitshur closed this Apr 6, 2021
Go Release Team automation moved this from In Progress to Done Apr 6, 2021
@dmitshur dmitshur added NeedsFix and removed NeedsInvestigation labels Apr 6, 2021

@gopherbot gopherbot commented Apr 9, 2021

Change https://golang.org/cl/308790 mentions this issue: cmd/gopherbot: add more deleted issues to deletedIssues map

gopherbot pushed a commit to golang/build that referenced this issue Apr 9, 2021
A good amount of time has passed since the deletedIssues map was last
updated, and the "freeze old issues" task was needlessly making 34 API
calls to freeze issues that are gone. After this change, that task is
making 0 API calls (whenever there aren't existing issues to freeze).

Some gardening tasks were converted to be more general and run on more
issue trackers in CL 233377, so update the deletedIssues map to track
the repo ID in addition to the issue number.

For golang/go#28320.
Updates golang/go#22635.
Updates golang/go#44406.
Updates golang/go#39008.

Change-Id: I3b477bf717f7d97676e9ef950214a3598ec3abd2
Reviewed-on: https://go-review.googlesource.com/c/build/+/308790
Trust: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
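Keying the deleted-issues map by repository as well as by issue number matters because issue numbers are only unique within a repository. The minimal sketch below uses a hypothetical `repoIssue` key type and placeholder entries, not gopherbot's real map. It shows how such a lookup lets a task skip API calls for issues known to be gone.

```go
package main

import "fmt"

// repoIssue identifies an issue by repository and number (hypothetical type;
// gopherbot's actual key type may differ).
type repoIssue struct {
	Repo   string // "owner/name"
	Number int32
}

// deletedIssues lists issues known to no longer exist.
// The entry below is a placeholder, not real data.
var deletedIssues = map[repoIssue]bool{
	{Repo: "example/project", Number: 101}: true,
}

// issuesToFreeze drops deleted issues before any API call is made, so a task
// like "freeze old issues" does not spend quota on them.
func issuesToFreeze(candidates []repoIssue) []repoIssue {
	var out []repoIssue
	for _, c := range candidates {
		if deletedIssues[c] {
			continue // issue is gone; an API call would be wasted
		}
		out = append(out, c)
	}
	return out
}

func main() {
	candidates := []repoIssue{
		{Repo: "example/project", Number: 101}, // deleted: skipped
		{Repo: "example/other", Number: 101},   // same number, different repo: kept
	}
	fmt.Println(issuesToFreeze(candidates))
}
```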