Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

utils/github: use GraphQL PR searching #16886

Merged
merged 1 commit into from Mar 16, 2024
Merged

utils/github: use GraphQL PR searching #16886

merged 1 commit into from Mar 16, 2024

Conversation

Bo98
Copy link
Member

@Bo98 Bo98 commented Mar 14, 2024

The search API is restricted to 30/minute, which is too little for our autobump workflow: https://github.com/Homebrew/homebrew-core/actions/runs/8273488630/job/22637346680

However GraphQL searching seems to be higher - it's unclear exactly how high until we try it out. So let's try use it and see if it improves things.

I've also tried to fold the fetch_open_pull_requests CI workaround into this, because it's slow (3 seconds), the cache doesn't work in autobump workflows since it's distinct brew processes and will be inferior if the new method works. However I do have yet another implementation of fetch_open_pull_requests that's still significantly faster than the old one that I can add if we find even the GraphQL search API is not good enough.

@Bo98 Bo98 force-pushed the pr-search-improvements branch 2 times, most recently from 8eae4ea to 2fe1098 Compare March 14, 2024 02:54
pull_requests = []
query += " repo:#{tap_remote_repo} in:title"
query += " state:#{state}" if state.present?
graphql_query = <<~EOS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This probably makes more sense as a constant.

Copy link
Member Author

@Bo98 Bo98 Mar 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

frozen_string_literal already makes this a single memory object across function calls and it is very specific to this method itself so making it global feels weird.

Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the best fix here is going to be to avoid using the search API entirely and instead using the pull request list API which isn't so aggressively rate limited: https://docs.github.com/en/rest/pulls/pulls?apiVersion=2022-11-28#list-pull-requests--parameters

Similarly, #15518 would help save the rate limit here too.

@cho-m
Copy link
Member

cho-m commented Mar 14, 2024

Not directly related to PR but on topic of brew bump in CI:

  • Can we have a fast path in CI if new_version == old_version to avoid PR searching? Seems like a lot of PR searching is just informational when we already know PR will not be opened.
  • Also is it possible to enable auto-updates during a brew bump (maybe when --open-pr subshells where we can unset HOMEBREW_AUTO_UPDATE_CHECKED some times)? Not as important if we can avoid rate limits, but if we still hit some then avoiding repo getting noticeably out-of-date can reduce chances of duplicate PRs.

@Bo98
Copy link
Member Author

Bo98 commented Mar 14, 2024

I think the best fix here is going to be to avoid using the search API entirely and instead using the pull request list API which isn't so aggressively rate limited: https://docs.github.com/en/rest/pulls/pulls?apiVersion=2022-11-28#list-pull-requests--parameters

Problem is for Homebrew-core this would require 1,605 requests to get the full list. Even if you limited scope to the last month, that's still 35-40 requests.

Ideally this would be cached once, but realistically this doesn't work because we subshell to bump-formula-pr.

@Bo98
Copy link
Member Author

Bo98 commented Mar 14, 2024

Can we have a fast path in CI if new_version == old_version to avoid PR searching? Seems like a lot of PR searching is just informational when we already know PR will not be opened.

Will check

Also is it possible to enable auto-updates during a brew bump (maybe when --open-pr subshells where we can unset HOMEBREW_AUTO_UPDATE_CHECKED some times)?

Is this still a problem now that manual bumps that overlap with autobumps don't happen anymore?

@cho-m
Copy link
Member

cho-m commented Mar 14, 2024

Is this still a problem now that manual bumps that overlap with autobumps don't happen anymore?

It is @BrewTestBot that is creating duplicates of its own PRs due to hitting API rate limit and given merge goes during that 1 hour interval, e.g. Homebrew/homebrew-core#166068

If we can totally avoid rate limit, it should no longer be an issue. If we can't avoid it, then keeping git repo slightly closer to current timestamp reduces those PRs (but not entirely as merge can happen just seconds before bump attempt).

Alternatively chunking the 1000+ formulae (which may grow to full repo of 6000+ non-deprecated/disabled) into multiple brew bump commands can also work (EDIT: serial as parallel would just make API limit worse).

@Bo98
Copy link
Member Author

Bo98 commented Mar 14, 2024

yeah hope first is to fix the rate limit problem because it's an issue not just for merged PRs, but PRs that were closed without merging.

If it doesn't work then partial solutions like fixing the merged PRs case would make sense

@MikeMcQuaid
Copy link
Member

Problem is for Homebrew-core this would require 1,605 requests to get the full list. Even if you limited scope to the last month, that's still 35-40 requests.

Fair enough. For at least the case of "are there any open PRs": we can definitely check that first. That seems to be the main duplicate search that's being done in CI?

Ideally this would be cached once, but realistically this doesn't work because we subshell to bump-formula-pr.

If we use e.g. octokit.rb it gets much easier to cache these results on disk.

@Bo98
Copy link
Member Author

Bo98 commented Mar 14, 2024

For at least the case of "are there any open PRs": we can definitely check that first.

I do have an implementation of open PR only checking with the PR API, though it doubles the amount of code. Idea was to try this and if there's rate limits still then introduce that for open PR checking, but I can introduce it now if you prefer?

That seems to be the main duplicate search that's being done in CI?

Closed PR checking is definitely important otherwise a failed bump PR will continuously be opened again and again. Maintainers have noticed and mentioned this on multiple occasions.

@MikeMcQuaid
Copy link
Member

Idea was to try this and if there's rate limits still then introduce that for open PR checking, but I can introduce it now if you prefer?

I think it's worth it when possible, the search API causes us no end up rate limit problems 😭

Closed PR checking is definitely important otherwise a failed bump PR will continuously be opened again and again.

Yeh 👍🏻.

Longer-term: I think we could benefit with having better state usage here. We could pretty easily have a nightly job that maps all previous PRs to formulae, caches the data somewhere and avoids repeatedly querying all the same data every time.

@Bo98
Copy link
Member Author

Bo98 commented Mar 15, 2024

Not super convinced only half avoiding the search API is ideal, however to be safe I've added back in the open PR checking using the PR API for Homebrew CI. It's definitely slower than search API by half a second or so but not as bad as it was before where it was multiple seconds under the REST API.

I have expanded the PR API case to also work for brew bump whereas previously it only worked in brew-*-pr level.

If you're happy, I'll merge and monitor this. I think basically anything here will be better than the status quo.

Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thanks @Bo98!

@Bo98 Bo98 merged commit 3c811fd into master Mar 16, 2024
26 checks passed
@Bo98 Bo98 deleted the pr-search-improvements branch March 16, 2024 03:13
@Bo98
Copy link
Member Author

Bo98 commented Mar 16, 2024

If we use e.g. octokit.rb it gets much easier to cache these results on disk.

Worth noting here: Octokit does not support GraphQL (though does allow you to do a custom client.post "graphql", { query:, variables: }) and most HTTP libraries don't support caching POST requests, so while caching makes sense it might have to be something more custom.

@Bo98
Copy link
Member Author

Bo98 commented Mar 16, 2024

Result: https://github.com/Homebrew/homebrew-core/actions/runs/8304989363/job/22731571713

No rate limits hit anymore. Seems like the GraphQL search API has a significantly higher limit than REST search API. Whole workflow is also running 2x faster than usual (#16887) so we're sending requests twice as fast and still not hitting it.

@MikeMcQuaid
Copy link
Member

Worth noting here: Octokit does not support GraphQL (though does allow you to do a custom client.post "graphql", { query:, variables: }) and most HTTP libraries don't support caching POST requests, so while caching makes sense it might have to be something more custom.

Good to know 👍🏻

No rate limits hit anymore.

Great work @Bo98 🎉!

Seems like the GraphQL search API has a significantly higher limit than REST search API.

Weird. I think on paper they are meant to have the same. Is this definitely a search API and not a search-like interface to a non-search API?

Great news either way 🎉

@Bo98
Copy link
Member Author

Bo98 commented Mar 18, 2024

Is this definitely a search API and not a search-like interface to a non-search API?

If it is then it's not documented. It is the only interface to searching things and uses the same search query syntax. What happens under the hood is for GitHub's server code to decide.

In then end if it works, it works.

@github-actions github-actions bot added the outdated PR was locked due to age label Apr 18, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 18, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated PR was locked due to age
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants