Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of brew readall #16007

Merged
merged 1 commit into from Sep 28, 2023
Merged

Conversation

Bo98
Copy link
Member

@Bo98 Bo98 commented Sep 15, 2023

  • Significantly improve base performance (60s -> 30s for Homebrew/core before the extra checks were added)
  • Add to_hash check to ensure the method used for API generation (and brew info etc) doesn't error. This increases that 30s back up to 47s (after some optimisations) but does mean we could remove the 90 second brew generate-formula-api call in tap_syntax checks as the other elements of that command shouldn't be affected by formula changes.

These changes affect formulae only. I have not touched casks.

@apainintheneck
Copy link
Contributor

If we're going to move the Formula#to_h check, I think that it'd make more sense to have it in brew audit than in brew readall.

@Bo98
Copy link
Member Author

Bo98 commented Sep 15, 2023

We ideally want to to_hash for all OS and arch combos if we want to replace brew generate-formula-api, but brew audit (in our CI at least) only does the running OS and would probably add ~45s (way more than that if we treat it as a generic audit and run all audits on a simulation basis) to brew audit's runtime to start doing that, rather than ~15s to brew readall.

However we could just get rid of brew readall entirely in CI as adding to_hash to brew audit means that we have already done all the formula loading checks brew readall would normally do in order to be able to call to_hash on a loaded formula object on all simulated OS and arch combos. The to_hash is one line of code addition here.

@Bo98
Copy link
Member Author

Bo98 commented Sep 15, 2023

In theory to_hash should only break if there is fundamentally wrong with the formula, such as passing garbage somewhere. Now that Sorbet is enabled, it probably shouldn't ever happen now to be honest but I'm not sure if that's guaranteed. Will need to look back to see what scenario lead to brew generate-formula-api needing to be added.

@apainintheneck
Copy link
Contributor

Sorry, where would we be removing the brew generate-api-* check from in CI: here or in core? To be pedantic I believe we use #to_hash_with_variations when generating the API. This probably gets us most of the way there though and I don't think we've had any problems with generation on either the formula or the cask side recently. The only bugs I remember happening recently were with reading casks.

I still think it's a bit of a weird thing to add to brew readall but it sounds like it will save a minute in CI, is that right?

@MikeMcQuaid
Copy link
Member

However we could just get rid of brew readall entirely in CI as adding to_hash to brew audit means that we have already done all the formula loading checks brew readall would normally do in order to be able to call to_hash on a loaded formula object on all simulated OS and arch combos. The to_hash is one line of code addition here.

This makes sense to me. I was also leaning towards "let's just have a single check that loads all formulae". From that perspective, I agree to_h makes more sense in audit than readall.

I still think it's a bit of a weird thing to add to brew readall but it sounds like it will save a minute in CI, is that right?

Sounds like it.

Sorry, where would we be removing the brew generate-api-* check from in CI: here or in core?

Core, not here.

Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance changes look good. Maybe split out the to_h ones into another PR?

Comment on lines +62 to +65
cache[:valid_formulae][file] = if readall_formula.on_system_blocks_exist?
[bottle_tag, *cache[:valid_formulae][file]]
else
true
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cache[:valid_formulae][file] = if readall_formula.on_system_blocks_exist?
[bottle_tag, *cache[:valid_formulae][file]]
else
true
end
unless readall_formula.on_system_blocks_exist?
cache[:valid_formulae][file] = true
end

I don't think we really need to track the case where a formula uses on system blocks. We only run this method once per os/arch combination anyway.

The main thing we're trying to do here is avoid reading formula without on system blocks multiple times (since they shouldn't change), right?

[2] brew(main)> CoreTap.instance.formula_files.count do |file|
[2] brew(main)*   Formulary.factory(file).on_system_blocks_exist?
[2] brew(main)* end
=> 880
[3] brew(main)> CoreTap.instance.formula_files.count
=> 6781

Isn't it possible to use the manual os and arch functions in formula definitions though like OS.mac?? Or maybe that's only limited to certain stanzas like install and test that aren't relevant for brew readall. To be fair this command was update recently to test against different os/arch combinations but it does change the behavior, right?

Copy link
Member Author

@Bo98 Bo98 Sep 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't actually allow on_system blocks inside install and test. We do however in caveats, which is read by to_hash.

Idea being it would catch something like:

def caveats
  on_linux do
    oopsie
  end

  "test"
end

Basically the hope is that brew generate-formula-api should never fail if brew readall passes, or at least 99.9% of the time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Do you think this sort of optimization would be useful for the cask repo as well at some point?

@apainintheneck
Copy link
Contributor

I've changed my mind. I think adding the Formula#to_hash check here makes sense to me.

@apainintheneck apainintheneck mentioned this pull request Sep 17, 2023
10 tasks
@Bo98 Bo98 changed the title Improve performance of brew readall and add additional checks Improve performance of brew readall Sep 28, 2023
@Bo98
Copy link
Member Author

Bo98 commented Sep 28, 2023

Split this into:

Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks again @Bo98!

@MikeMcQuaid MikeMcQuaid merged commit 6b01bc1 into Homebrew:master Sep 28, 2023
25 checks passed
@Bo98 Bo98 deleted the better-readall branch September 28, 2023 13:49
@github-actions github-actions bot added the outdated PR was locked due to age label Oct 29, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 29, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated PR was locked due to age
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants