Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only upgrade :latest casks when --greedy and the cask has been updated #13275

Merged
merged 1 commit into from May 18, 2022

Conversation

apainintheneck
Copy link
Contributor

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew typecheck with your changes locally?
  • Have you successfully run brew tests with your changes locally?

This PR is related to issue #11513 where version :latest casks would always be reinstalled when the --greedy or --greed-latest flags were passed to brew upgrade regardless of whether or not the casks had been updated.

Basically the idea is to store a sha256 hash of the original download file every time a version :latest cask is installed. Then, before upgrading casks cask#outdated? is queried which internally compares the new and old sha of version :latest casks to see if they should be updated. As far as I can see, the first download used to check the sha should be cached so it shouldn't need to be downloaded again before installing it.

The sha is stored in a file called LATEST_DOWNLOAD_SHA256 in the .metadata/ cask subdirectory. If there is no previous sha stored, it just automatically assumes the cask should be updated.

Fixes #11513.

Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far!

Would using an ETag, HTTP HEAD and/or on-demand SHA hashing allow simplifying any logic?

Comment on lines 137 to 139
@new_download_sha ||= Installer.new(
self, verify_download_integrity: false
).download(quiet: true).instance_eval { |x| Digest::SHA256.file(x).hexdigest }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@new_download_sha ||= Installer.new(
self, verify_download_integrity: false
).download(quiet: true).instance_eval { |x| Digest::SHA256.file(x).hexdigest }
@new_download_sha ||= Installer.new(
self, verify_download_integrity: false
).download(quiet: true)
.instance_eval { |x| Digest::SHA256.file(x).hexdigest }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll just line up .new, .download and .instance_eval vertically instead.

Comment on lines 167 to 177
if greedy || greedy_latest || (greedy_auto_updates && auto_updates)
if latest_version.latest?
return outdated_download_sha? ? versions : []
end
elsif auto_updates
return []
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if greedy || greedy_latest || (greedy_auto_updates && auto_updates)
if latest_version.latest?
return outdated_download_sha? ? versions : []
end
elsif auto_updates
return []
end
if greedy || greedy_latest || (greedy_auto_updates && auto_updates)
if latest_version.latest?
if outdated_download_sha?
return versions
else
return []
end
end
elsif auto_updates
return []
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this doesn't pass the linter. We could use a guard clause but that seems even uglier than the ternary operator imo. Which do you prefer?

# before
return outdated_download_sha? ? versions : []

# after
return versions if outdated_download_sha?

return []

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after

return versions if outdated_download_sha?
return []

This seems nice!

@apainintheneck
Copy link
Contributor Author

apainintheneck commented May 15, 2022

Would using an ETag, HTTP HEAD and/or on-demand SHA hashing allow simplifying any logic?

Maybe? I'm not super familiar with ETags to be honest but being able to see if files were changed at an address does sound useful. I think the last modified date of the HTTP HEAD command could be used here.

I don't think adding any of those things would simplify the logic per se but it could help reduce unnecessary downloads and shas. As far as I can see, a lot of that work is already being done in the download method where it defaults to the cached download if the remote one hasn't been updated (now that I write this maybe something like ETags or HTTP HEAD are being used internally by the downloader to verify that).

With the current code the following things need to be checked when trying to verify if a :latest version cask is outdated.

  1. No previous sha -> Outdated
    • No download or extra work
  2. Compare previous sha to new sha.
    • They match -> Up to date
    • They don't match -> Outdated
      • The downside of this approach is that the new sha ALWAYS get calculated even if we use the same cached download file.

Would it be possible to just use the ETag assuming it is unique enough to verify that the resource is newer than the last install? The thing I don't know off the top of my head is whether or not that those are used universally enough to be able to count on them. They seem to be optional and there is no agreed upon way to generate them.

I think the next thing I'll do is look at how the downloader decides when to use the cached download. This could in theory be used to decide when casks need updating.

@apainintheneck
Copy link
Contributor Author

Okay, I took a look around and I think that HTTP HEAD could be used to prevent unnecessarily downloading most files but not all. Here are the problems with using it to determine if the file in question is new.

  1. Not all URLs support HEAD requests.
  2. Those that do don't always include the Etag and/or Last-Modified elements needed to check the "freshness" of the file.

So in conclusion, we'd still need to use hashing as a backup way to make sure we aren't reinstalling the same cask when upgrading unless we're fine with a 90% solution (that's just an estimate based on looking at casks in the default tap).

It also would mean adding a method/class for retrieving and checking HEAD requests and all of that download information like the header elements and hash would need to be stored somewhere, generated and updated when installing version :latest casks. I could try and build that by using CurlDownloadStrategy as a guide. I'm honestly not sure how involved it would be though.

@apainintheneck
Copy link
Contributor Author

I might be wrong but it seems like at least the CurlDownloadStrategy checks if it should use the cached download by making another GET request. This means that it will download the same file multiple times which is not ideal.

I believe the download still gets cached though so one solution would be to store that pathname in the cask object instead of the installer. Then, if a cask uses an installer to download a file twice during the same session, it will default to the cached download.

Also, is HTTP HEAD used anywhere in the codebase currently?

@MikeMcQuaid
Copy link
Member

I might be wrong but it seems like at least the CurlDownloadStrategy checks if it should use the cached download by making another GET request. This means that it will download the same file multiple times which is not ideal.

I think it asks curl to resume the existing download with possible possibly making one of these a no-op?

Also, is HTTP HEAD used anywhere in the codebase currently?

Unsure! To be clear: was just spitballing enough and I'm happy for this to be merged as-is when you are 👍🏻

@apainintheneck
Copy link
Contributor Author

Sounds good. I think those were some good ideas and it makes sense to follow up on them. I'll keep poking around for now.

A sha256 hash of the previous download is stored and compared with
new downloads before updating :latest casks. This prevents unnecessary
reinstalls when the cask hasn't been updated.

Move download path to cask from installer to prevent unnecessary
redownloads of casks.
@apainintheneck
Copy link
Contributor Author

Okay, well after looking around I just don't think it's really worth it to pursue the HTTP HEAD angle right now.

Essentially, there are a few reasons for this.

  1. Very few casks (79/4016 in Homebrew-cask by my count) actually have version :latest set so it's seems like premature optimization.
  2. It would probably require changing multiple download strategies to get and store additional last-modified and ETag information during non-upgrade installs which seems like more trouble than it's worth.
  3. Like I mentioned before HTTP HEAD support doesn't seem to be universal so we'd still have to fallback to the sha anyway.

That being said I did change one thing and you'll have to tell me what you think. Essentially, I decided to store the download path in the Cask object instead of the Installer. This means that the download will only happen once during the lifetime of a Cask object. It passes all of the tests but there might be a scenario I'm not thinking of where you'd need to download a Cask multiple times in one go. It looks like this now.

def download(quiet: nil, timeout: nil)
  # Store cask download path in cask to prevent multiple downloads in a row when checking if it's outdated
  @cask.download ||= downloader.fetch(quiet: quiet, verify_download_integrity: @verify_download_integrity,
                                      timeout: timeout)
end

Other than that this PR should be ready.

@MikeMcQuaid
Copy link
Member

Okay, well after looking around I just don't think it's really worth it to pursue the HTTP HEAD angle right now.

Thanks for investigating and the great write-up ❤️

That being said I did change one thing and you'll have to tell me what you think. Essentially, I decided to store the download path in the Cask object instead of the Installer. This means that the download will only happen once during the lifetime of a Cask object. It passes all of the tests but there might be a scenario I'm not thinking of where you'd need to download a Cask multiple times in one go. It looks like this now.

Amazing work!

Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work as usual, thanks @apainintheneck!

@MikeMcQuaid MikeMcQuaid merged commit e279c93 into Homebrew:master May 18, 2022
@apainintheneck apainintheneck deleted the upgrade-latest branch May 29, 2022 02:39
@github-actions github-actions bot added the outdated PR was locked due to age label Jun 29, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated PR was locked due to age
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't run update on casks with :latest when the artifacts haven't changed
2 participants