Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cask: always return short cask tokens from core cask tap #16867

Merged
merged 2 commits into from Mar 13, 2024

Conversation

apainintheneck
Copy link
Contributor

@apainintheneck apainintheneck commented Mar 10, 2024

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew typecheck with your changes locally?
  • Have you successfully run brew tests with your changes locally?

For some reason Tap#formula_names returns the short name while Tap#cask_tokens returns the full name for some reason. I'd overlooked that when generating the internal JSON. This just gets the short name for casks from the casks themselves.

brew(main):004:0> ENV["HOMEBREW_NO_INSTALL_FROM_API"] = "1"
=> "1"
brew(main):005:0> CoreCaskTap.instance.cask_tokens.sample(5)
=> ["homebrew/cask/ssh-config-editor", "homebrew/cask/deskpad", "homebrew/cask/zoom-for-it-admins", "homebrew/cask/ukelele", "homebrew/cask/baiduinput"]
brew(main):006:0> CoreTap.instance.formula_names.sample(5)
=> ["lilv", "libccd", "chmlib", "mpfrcx", "cgit"]
brew(main):007:0> :deskpad.c.token
=> "deskpad"

Edit: Based on code review we've decided to change the behavior of CoreCaskTap.instance.cask_tokens to always return the short tokens. This matches the behavior when the core cask tap is loaded from the API. The other bug with brew untap is being handled in a separate PR: #16875

@apainintheneck
Copy link
Contributor Author

This is a follow-up to #16798 that I discovered while trying to write the loader code.

@apainintheneck
Copy link
Contributor Author

It's also interesting how the behavior is different when using the API. Tap#cask_tokens returns short names in that case.

brew(main):008:0> ENV.delete("HOMEBREW_NO_INSTALL_FROM_API")
=> "1"
brew(main):009:0> CoreCaskTap.instance.cask_tokens.sample(5)
=> ["mediainfo", "movist-pro", "fluor", "opendnsupdater", "wavebox"]

@ZhongRuoyu
Copy link
Member

Would it be reasonable to tweak CoreCaskTap#cask_tokens instead? Since its result is different with and without API.

@ZhongRuoyu
Copy link
Member

ZhongRuoyu commented Mar 10, 2024

I also noticed that CoreCaskTap.instance.cask_tokens is used in cmd/untap.rb for checking installed casks before untapping, but its doesn't seem to do the intended work as I could still untap the cask tap:

$ export HOMEBREW_NO_INSTALL_FROM_API=1
$ brew developer off
$ brew untap homebrew/cask
Untapping homebrew/cask...
Untapped 4397 casks (6,488 files, 423.9MB).

But with the following change (the same override exists in CoreTap):

diff --git a/Library/Homebrew/tap.rb b/Library/Homebrew/tap.rb
index 8e779c6ec..68058e7f5 100644
--- a/Library/Homebrew/tap.rb
+++ b/Library/Homebrew/tap.rb
@@ -1337,6 +1337,12 @@ class CoreCaskTap < AbstractCoreTap
     Homebrew::API::Cask.all_casks.keys
   end
 
+  # @private
+  sig { params(file: Pathname).returns(String) }
+  def formula_file_to_name(file)
+    file.basename(".rb").to_s
+  end
+
   # @private
   sig { override.returns(T::Hash[String, Pathname]) }
   def cask_files_by_name

(After I have homebrew/cask back again:)

$ brew untap homebrew/cask
Error: Refusing to untap homebrew/cask because it contains the following installed formulae or casks:
[list of casks installed]

@apainintheneck
Copy link
Contributor Author

Yeah, that's probably a better approach especially since the behavior in brew untap is clearly a bug. I'll check the other places it's used like in the tap auditor and descriptions code to make sure nothing else needs to be changed.

@apainintheneck apainintheneck force-pushed the stop-using-full-name-as-key-in-cask-json-v3 branch from 2f1e1cd to 2fdb268 Compare March 10, 2024 02:46
@apainintheneck apainintheneck marked this pull request as draft March 10, 2024 02:49
@apainintheneck apainintheneck marked this pull request as draft March 10, 2024 02:49
This seems to be a bug with how we handle name shortening for the
core cask tap. The core tap always returns short formula names
and returning long names from the core cask tap when not using
the API leads to unexpected behavior.

Specifically this can trick the `brew untap` command into thinking
that there aren't any installed casks in the core cask tap and that
it can be removed even when that is not the case.

One risk here is that the full names were used when caching
descriptions so descriptions could be out of date for people in
the short term though hopefully that's not the end of the world.
@apainintheneck apainintheneck force-pushed the stop-using-full-name-as-key-in-cask-json-v3 branch from 2fdb268 to 6e0e78c Compare March 10, 2024 03:03
It was possible for the cask tokens to be included twice in the
cask_tokens array. This was cancelled out by the fact that we |=
the arrays together but still it was unnecessary work that is best
avoided and makes the code harder to reason about. This is simpler.
@apainintheneck
Copy link
Contributor Author

Now that I look at it a bit more I think that the bug in brew untap is actually unrelated to the cask token being the full name. This wouldn't have worked for other cask taps either.

diff --git a/Library/Homebrew/cmd/untap.rb b/Library/Homebrew/cmd/untap.rb
index 90696f7f55..d087c1cc17 100644
--- a/Library/Homebrew/cmd/untap.rb
+++ b/Library/Homebrew/cmd/untap.rb
@@ -52,7 +52,7 @@ module Homebrew
           next unless installed_cask_tokens.include?(cask_token)
 
           cask = begin
-            Cask::CaskLoader.load("#{tap.name}/#{cask_token}")
+            Cask::CaskLoader.load(cask_token)
           rescue
             # Don't blow up because of a single unavailable cask.
             next

We also should probably warn here at the very least because we've seemingly been swallowing this error for months now.

@apainintheneck
Copy link
Contributor Author

Formulary.factory("#{tap.name}/#{formula_name}")

It looks like it's broken for formula too for everything except the core tap and it seems like it was changed in December 2023.

@Bo98
Copy link
Member

Bo98 commented Mar 10, 2024

Formula.installed_formula_names doesn't return full names for technical reasons, so that patch looks OK but you will need to change the installed check to consider formula_name.split("/").last

@apainintheneck
Copy link
Contributor Author

@Bo98 You beat me to it. I think that brew untapped is broken beyond what I originally thought.

installed_tap_casks = tap.cask_tokens.filter_map do |cask_token|
# initialise lazily in case there's no casks in this tap
installed_cask_tokens ||= Set.new(Cask::Caskroom.tokens)
next unless installed_cask_tokens.include?(cask_token)

installed_tap_formulae = tap.formula_names.filter_map do |formula_name|
# initialise lazily in case there's no formulae in this tap
installed_formula_names ||= Set.new(Formula.installed_formula_names)
next unless installed_formula_names.include?(formula_name)

Both of the open rescue blocks for loading casks and formulae in the loop are unreachable because we're comparing full cask or formula names to Formula.installed_formula_names and Cask::Caskroom.tokens which are just arrays of the short cask and formula names. IMO it basically needs to be entirely rewritten.

Is it fine if that gets handled in a follow-up PR since it doesn't seem to be related to what I'm trying to do here?

@Bo98
Copy link
Member

Bo98 commented Mar 10, 2024

IMO it basically needs to be entirely rewritten.

Not really given we already check the tap part here:

# Can't use Formula#any_version_installed? because it doesn't consider
# taps correctly.
formula if formula.installed_kegs.any? { |keg| keg.tab.tap == tap }

So the logic seems right, just not handling formula_name correctly.

We don't for casks purely because that information is simply not available at all. It is impossible to check which tap a cask was installed from, unless it's a JSON file but that's core-only anyway. So we just assume token shadowing doesn't happen at the moment.

Is it fine if that gets handled in a follow-up PR since it doesn't seem to be related to what I'm trying to do here?

Sure. This PR already improves things rather than regresses.

@apainintheneck
Copy link
Contributor Author

That should still be unreachable for all non-core taps since it will never get past the next unless installed_formula_names.include?(formula_name) check in the loop. installed_formula_names is an array of short names and formula_name will always be a long name. Note that the core tap is never evaluated in this loop to begin with.

I'm also not really sure why we check every formula and cask in each named tap to see if it's installed. It seems like it'd be simpler to check every installed formula and cask to see if it's included in one of the named taps. But I guess that's an implementation detail.

@Bo98
Copy link
Member

Bo98 commented Mar 10, 2024

That should still be unreachable for all non-core taps since it will never get past the next unless installed_formula_names.include?(formula_name) check in the loop. installed_formula_namesis an array of short names and formula_name will always be a long name.

Not if you do

next unless installed_formula_names.include?(formula_name.split("/").last)

In the end, installed_formula_names.include is kinda useless as it does the "real" install check later, but does mean not having to load formulae unnecessarily which is ideal.

@apainintheneck
Copy link
Contributor Author

I'm fine with keeping the current logic intact and making the small changes mentioned here to get it working again. I do want to refactor it into a method or module so that I can add some regression tests though. I'll take a stab at that tomorrow.

@apainintheneck apainintheneck mentioned this pull request Mar 11, 2024
7 tasks
@reitermarkus
Copy link
Member

I'm fine with keeping the current logic intact

To me it makes more sense to eventually make all these methods only return short names/tokens instead of doing .split("/").last almost everywhere we use them.

That said, I think it's fine to fix the logic as-is for now.

@apainintheneck apainintheneck changed the title internal cask json v3: stop using full name as keys in hash cask: cask tokens should always be short form core cask tap Mar 12, 2024
@apainintheneck apainintheneck changed the title cask: cask tokens should always be short form core cask tap cask: always return short cask tokens from core cask tap Mar 12, 2024
@apainintheneck apainintheneck marked this pull request as ready for review March 12, 2024 04:20
@apainintheneck
Copy link
Contributor Author

To me it makes more sense to eventually make all these methods only return short names/tokens instead of doing .split("/").last almost everywhere we use them.

That said, I think it's fine to fix the logic as-is for now.

@reitermarkus That sounds like an interesting refactoring opportunity. Do you want to jump on that or should I create an issue for it?

@Bo98
Copy link
Member

Bo98 commented Mar 12, 2024

I agree with the stance that short tokens make sense, but the problem is that is definitely a backwards incompatible change to a public API and we've already disrupted third party code enough. It's also not a very obvious break as well if they pass it immediately to say Formula#[].

I'm less concerned with cask-only API changes as there's significantly less third-party taps around that, so in the scenario where we have to break one then we usually side on casks.

@apainintheneck
Copy link
Contributor Author

Since the untap command bugs will be handled in a separate PR now, is this fine as is or are their more changes that need to be made?

cask_tokens = Tap.each_with_object([]) do |tap, array|
# We can exclude the core cask tap because `CoreCaskTap#cask_tokens` returns short names by default.
if tap.official? && !tap.core_cask_tap?
tap.cask_tokens.each { |token| array << token.sub(%r{^homebrew/cask.*/}, "") }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it was there before, but what's the point of this sub?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing that we want to only search by cask token for internal taps so that brew search version doesn't return all cask-versions casks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like this is kinda begging for something like Formula#name vs. Formula#full_name

@apainintheneck apainintheneck merged commit 58b84c3 into master Mar 13, 2024
26 checks passed
@apainintheneck apainintheneck deleted the stop-using-full-name-as-key-in-cask-json-v3 branch March 13, 2024 05:51
Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @apainintheneck!

@github-actions github-actions bot added the outdated PR was locked due to age label Apr 13, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated PR was locked due to age
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants