New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
utils/github: use GraphQL PR searching #16886
Conversation
8eae4ea
to
2fe1098
Compare
pull_requests = [] | ||
query += " repo:#{tap_remote_repo} in:title" | ||
query += " state:#{state}" if state.present? | ||
graphql_query = <<~EOS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This probably makes more sense as a constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
frozen_string_literal
already makes this a single memory object across function calls and it is very specific to this method itself so making it global feels weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the best fix here is going to be to avoid using the search API entirely and instead using the pull request list API which isn't so aggressively rate limited: https://docs.github.com/en/rest/pulls/pulls?apiVersion=2022-11-28#list-pull-requests--parameters
Similarly, #15518 would help save the rate limit here too.
Not directly related to PR but on topic of
|
Problem is for Homebrew-core this would require 1,605 requests to get the full list. Even if you limited scope to the last month, that's still 35-40 requests. Ideally this would be cached once, but realistically this doesn't work because we subshell to |
Will check
Is this still a problem now that manual bumps that overlap with autobumps don't happen anymore? |
It is @BrewTestBot that is creating duplicates of its own PRs due to hitting API rate limit and given merge goes during that 1 hour interval, e.g. Homebrew/homebrew-core#166068 If we can totally avoid rate limit, it should no longer be an issue. If we can't avoid it, then keeping git repo slightly closer to current timestamp reduces those PRs (but not entirely as merge can happen just seconds before bump attempt). Alternatively chunking the 1000+ formulae (which may grow to full repo of 6000+ non-deprecated/disabled) into multiple |
yeah hope first is to fix the rate limit problem because it's an issue not just for merged PRs, but PRs that were closed without merging. If it doesn't work then partial solutions like fixing the merged PRs case would make sense |
Fair enough. For at least the case of "are there any open PRs": we can definitely check that first. That seems to be the main duplicate search that's being done in CI?
If we use e.g. |
I do have an implementation of open PR only checking with the PR API, though it doubles the amount of code. Idea was to try this and if there's rate limits still then introduce that for open PR checking, but I can introduce it now if you prefer?
Closed PR checking is definitely important otherwise a failed bump PR will continuously be opened again and again. Maintainers have noticed and mentioned this on multiple occasions. |
I think it's worth it when possible, the search API causes us no end up rate limit problems 😭
Yeh 👍🏻. Longer-term: I think we could benefit with having better state usage here. We could pretty easily have a nightly job that maps all previous PRs to formulae, caches the data somewhere and avoids repeatedly querying all the same data every time. |
2fe1098
to
30dbe62
Compare
Not super convinced only half avoiding the search API is ideal, however to be safe I've added back in the open PR checking using the PR API for Homebrew CI. It's definitely slower than search API by half a second or so but not as bad as it was before where it was multiple seconds under the REST API. I have expanded the PR API case to also work for If you're happy, I'll merge and monitor this. I think basically anything here will be better than the status quo. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, thanks @Bo98!
30dbe62
to
7b0c3d5
Compare
Worth noting here: Octokit does not support GraphQL (though does allow you to do a custom |
Result: https://github.com/Homebrew/homebrew-core/actions/runs/8304989363/job/22731571713 No rate limits hit anymore. Seems like the GraphQL search API has a significantly higher limit than REST search API. Whole workflow is also running 2x faster than usual (#16887) so we're sending requests twice as fast and still not hitting it. |
Good to know 👍🏻
Great work @Bo98 🎉!
Weird. I think on paper they are meant to have the same. Is this definitely a search API and not a search-like interface to a non-search API? Great news either way 🎉 |
If it is then it's not documented. It is the only interface to searching things and uses the same search query syntax. What happens under the hood is for GitHub's server code to decide. In then end if it works, it works. |
The search API is restricted to 30/minute, which is too little for our autobump workflow: https://github.com/Homebrew/homebrew-core/actions/runs/8273488630/job/22637346680
However GraphQL searching seems to be higher - it's unclear exactly how high until we try it out. So let's try use it and see if it improves things.
I've also tried to fold the
fetch_open_pull_requests
CI workaround into this, because it's slow (3 seconds), the cache doesn't work in autobump workflows since it's distinctbrew
processes and will be inferior if the new method works. However I do have yet another implementation offetch_open_pull_requests
that's still significantly faster than the old one that I can add if we find even the GraphQL search API is not good enough.