utils/github: use GraphQL PR searching #16886

Bo98 · 2024-03-14T02:02:31Z

The search API is restricted to 30/minute, which is too little for our autobump workflow: https://github.com/Homebrew/homebrew-core/actions/runs/8273488630/job/22637346680

However GraphQL searching seems to be higher - it's unclear exactly how high until we try it out. So let's try use it and see if it improves things.

I've also tried to fold the fetch_open_pull_requests CI workaround into this, because it's slow (3 seconds), the cache doesn't work in autobump workflows since it's distinct brew processes and will be inferior if the new method works. However I do have yet another implementation of fetch_open_pull_requests that's still significantly faster than the old one that I can add if we find even the GraphQL search API is not good enough.

apainintheneck · 2024-03-14T03:49:44Z

Library/Homebrew/utils/github.rb

-      pull_requests = []
+    query += " repo:#{tap_remote_repo} in:title"
+    query += " state:#{state}" if state.present?
+    graphql_query = <<~EOS


Nit: This probably makes more sense as a constant.

frozen_string_literal already makes this a single memory object across function calls and it is very specific to this method itself so making it global feels weird.

MikeMcQuaid

I think the best fix here is going to be to avoid using the search API entirely and instead using the pull request list API which isn't so aggressively rate limited: https://docs.github.com/en/rest/pulls/pulls?apiVersion=2022-11-28#list-pull-requests--parameters

Similarly, #15518 would help save the rate limit here too.

cho-m · 2024-03-14T13:42:19Z

Not directly related to PR but on topic of brew bump in CI:

Can we have a fast path in CI if new_version == old_version to avoid PR searching? Seems like a lot of PR searching is just informational when we already know PR will not be opened.
Also is it possible to enable auto-updates during a brew bump (maybe when --open-pr subshells where we can unset HOMEBREW_AUTO_UPDATE_CHECKED some times)? Not as important if we can avoid rate limits, but if we still hit some then avoiding repo getting noticeably out-of-date can reduce chances of duplicate PRs.

Bo98 · 2024-03-14T13:49:29Z

I think the best fix here is going to be to avoid using the search API entirely and instead using the pull request list API which isn't so aggressively rate limited: https://docs.github.com/en/rest/pulls/pulls?apiVersion=2022-11-28#list-pull-requests--parameters

Problem is for Homebrew-core this would require 1,605 requests to get the full list. Even if you limited scope to the last month, that's still 35-40 requests.

Ideally this would be cached once, but realistically this doesn't work because we subshell to bump-formula-pr.

Bo98 · 2024-03-14T13:52:23Z

Can we have a fast path in CI if new_version == old_version to avoid PR searching? Seems like a lot of PR searching is just informational when we already know PR will not be opened.

Will check

Also is it possible to enable auto-updates during a brew bump (maybe when --open-pr subshells where we can unset HOMEBREW_AUTO_UPDATE_CHECKED some times)?

Is this still a problem now that manual bumps that overlap with autobumps don't happen anymore?

cho-m · 2024-03-14T14:02:36Z

Is this still a problem now that manual bumps that overlap with autobumps don't happen anymore?

It is @BrewTestBot that is creating duplicates of its own PRs due to hitting API rate limit and given merge goes during that 1 hour interval, e.g. Homebrew/homebrew-core#166068

If we can totally avoid rate limit, it should no longer be an issue. If we can't avoid it, then keeping git repo slightly closer to current timestamp reduces those PRs (but not entirely as merge can happen just seconds before bump attempt).

Alternatively chunking the 1000+ formulae (which may grow to full repo of 6000+ non-deprecated/disabled) into multiple brew bump commands can also work (EDIT: serial as parallel would just make API limit worse).

Bo98 · 2024-03-14T14:06:18Z

yeah hope first is to fix the rate limit problem because it's an issue not just for merged PRs, but PRs that were closed without merging.

If it doesn't work then partial solutions like fixing the merged PRs case would make sense

MikeMcQuaid · 2024-03-14T14:07:40Z

Problem is for Homebrew-core this would require 1,605 requests to get the full list. Even if you limited scope to the last month, that's still 35-40 requests.

Fair enough. For at least the case of "are there any open PRs": we can definitely check that first. That seems to be the main duplicate search that's being done in CI?

Ideally this would be cached once, but realistically this doesn't work because we subshell to bump-formula-pr.

If we use e.g. octokit.rb it gets much easier to cache these results on disk.

Bo98 · 2024-03-14T14:59:03Z

For at least the case of "are there any open PRs": we can definitely check that first.

I do have an implementation of open PR only checking with the PR API, though it doubles the amount of code. Idea was to try this and if there's rate limits still then introduce that for open PR checking, but I can introduce it now if you prefer?

That seems to be the main duplicate search that's being done in CI?

Closed PR checking is definitely important otherwise a failed bump PR will continuously be opened again and again. Maintainers have noticed and mentioned this on multiple occasions.

MikeMcQuaid · 2024-03-14T16:50:46Z

Idea was to try this and if there's rate limits still then introduce that for open PR checking, but I can introduce it now if you prefer?

I think it's worth it when possible, the search API causes us no end up rate limit problems 😭

Closed PR checking is definitely important otherwise a failed bump PR will continuously be opened again and again.

Yeh 👍🏻.

Longer-term: I think we could benefit with having better state usage here. We could pretty easily have a nightly job that maps all previous PRs to formulae, caches the data somewhere and avoids repeatedly querying all the same data every time.

Bo98 · 2024-03-15T17:36:38Z

Not super convinced only half avoiding the search API is ideal, however to be safe I've added back in the open PR checking using the PR API for Homebrew CI. It's definitely slower than search API by half a second or so but not as bad as it was before where it was multiple seconds under the REST API.

I have expanded the PR API case to also work for brew bump whereas previously it only worked in brew-*-pr level.

If you're happy, I'll merge and monitor this. I think basically anything here will be better than the status quo.

MikeMcQuaid

Great work, thanks @Bo98!

Bo98 · 2024-03-16T03:15:41Z

If we use e.g. octokit.rb it gets much easier to cache these results on disk.

Worth noting here: Octokit does not support GraphQL (though does allow you to do a custom client.post "graphql", { query:, variables: }) and most HTTP libraries don't support caching POST requests, so while caching makes sense it might have to be something more custom.

Bo98 · 2024-03-16T04:16:57Z

Result: https://github.com/Homebrew/homebrew-core/actions/runs/8304989363/job/22731571713

No rate limits hit anymore. Seems like the GraphQL search API has a significantly higher limit than REST search API. Whole workflow is also running 2x faster than usual (#16887) so we're sending requests twice as fast and still not hitting it.

MikeMcQuaid · 2024-03-18T12:23:29Z

Worth noting here: Octokit does not support GraphQL (though does allow you to do a custom client.post "graphql", { query:, variables: }) and most HTTP libraries don't support caching POST requests, so while caching makes sense it might have to be something more custom.

Good to know 👍🏻

No rate limits hit anymore.

Great work @Bo98 🎉!

Seems like the GraphQL search API has a significantly higher limit than REST search API.

Weird. I think on paper they are meant to have the same. Is this definitely a search API and not a search-like interface to a non-search API?

Great news either way 🎉

Bo98 · 2024-03-18T20:28:24Z

Is this definitely a search API and not a search-like interface to a non-search API?

If it is then it's not documented. It is the only interface to searching things and uses the same search query syntax. What happens under the hood is for GitHub's server code to decide.

In then end if it works, it works.

Bo98 force-pushed the pr-search-improvements branch 2 times, most recently from 8eae4ea to 2fe1098 Compare March 14, 2024 02:54

carlocab approved these changes Mar 14, 2024

View reviewed changes

apainintheneck reviewed Mar 14, 2024

View reviewed changes

Bo98 mentioned this pull request Mar 14, 2024

dev-cmd/bump: skip Repology checks in CI for livecheckables #16887

Merged

MikeMcQuaid reviewed Mar 14, 2024

View reviewed changes

Bo98 force-pushed the pr-search-improvements branch from 2fe1098 to 30dbe62 Compare March 15, 2024 17:36

MikeMcQuaid approved these changes Mar 15, 2024

View reviewed changes

utils/github: use GraphQL PR searching

7b0c3d5

Bo98 force-pushed the pr-search-improvements branch from 30dbe62 to 7b0c3d5 Compare March 15, 2024 21:27

Bo98 merged commit 3c811fd into master Mar 16, 2024
26 checks passed

Bo98 deleted the pr-search-improvements branch March 16, 2024 03:13

EricFromCanada mentioned this pull request Mar 19, 2024

bump-formula-pr: fix type of detected version #16920

Merged

7 tasks

github-actions bot added the outdated PR was locked due to age label Apr 18, 2024

github-actions bot locked as resolved and limited conversation to collaborators Apr 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

utils/github: use GraphQL PR searching #16886

utils/github: use GraphQL PR searching #16886

Bo98 commented Mar 14, 2024 •

edited

apainintheneck Mar 14, 2024

Bo98 Mar 15, 2024 •

edited

MikeMcQuaid left a comment

cho-m commented Mar 14, 2024 •

edited

Bo98 commented Mar 14, 2024 •

edited

Bo98 commented Mar 14, 2024 •

edited

cho-m commented Mar 14, 2024 •

edited

Bo98 commented Mar 14, 2024

MikeMcQuaid commented Mar 14, 2024

Bo98 commented Mar 14, 2024 •

edited

MikeMcQuaid commented Mar 14, 2024

Bo98 commented Mar 15, 2024 •

edited

MikeMcQuaid left a comment

Bo98 commented Mar 16, 2024 •

edited

Bo98 commented Mar 16, 2024 •

edited

MikeMcQuaid commented Mar 18, 2024

Bo98 commented Mar 18, 2024 •

edited

utils/github: use GraphQL PR searching #16886

utils/github: use GraphQL PR searching #16886

Conversation

Bo98 commented Mar 14, 2024 • edited

apainintheneck Mar 14, 2024

Choose a reason for hiding this comment

Bo98 Mar 15, 2024 • edited

Choose a reason for hiding this comment

MikeMcQuaid left a comment

Choose a reason for hiding this comment

cho-m commented Mar 14, 2024 • edited

Bo98 commented Mar 14, 2024 • edited

Bo98 commented Mar 14, 2024 • edited

cho-m commented Mar 14, 2024 • edited

Bo98 commented Mar 14, 2024

MikeMcQuaid commented Mar 14, 2024

Bo98 commented Mar 14, 2024 • edited

MikeMcQuaid commented Mar 14, 2024

Bo98 commented Mar 15, 2024 • edited

MikeMcQuaid left a comment

Choose a reason for hiding this comment

Bo98 commented Mar 16, 2024 • edited

Bo98 commented Mar 16, 2024 • edited

MikeMcQuaid commented Mar 18, 2024

Bo98 commented Mar 18, 2024 • edited

Bo98 commented Mar 14, 2024 •

edited

Bo98 Mar 15, 2024 •

edited

cho-m commented Mar 14, 2024 •

edited

Bo98 commented Mar 14, 2024 •

edited

Bo98 commented Mar 14, 2024 •

edited

cho-m commented Mar 14, 2024 •

edited

Bo98 commented Mar 14, 2024 •

edited

Bo98 commented Mar 15, 2024 •

edited

Bo98 commented Mar 16, 2024 •

edited

Bo98 commented Mar 16, 2024 •

edited

Bo98 commented Mar 18, 2024 •

edited