[vcpkg] RFC: Binarycaching #11204

Merged

@ras0219-msft ras0219-msft commented May 6, 2020

We are currently working on enabling caching and reuse of binaries to accelerate CI and per-project vcpkg instances; this is our current working draft of the specification, ready for public commentary!

A limited form of this spec is already implemented and available in the tool today, via either --binarycaching or set VCPKG_FEATURE_FLAGS=binarycaching, which defaults to using $vcpkg_root/archives.

We would love feedback about whether this feature is useful to you or any additional scenarios you'd like to see covered!


However, we notably do not currently track the compiler used. This is critical for all cross-machine scenarios, as the environment is likely to change incompatibly from machine to machine. We propose hashing the compiler that will be used by CMake. This can be accomplished either by reimplementing CMake's logic or by running a partial project and extracting the results. For performance reasons, we will first prefer heuristics that approximate the CMake logic, with accompanying documentation for users who fall outside those bounds.
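As a rough illustration of what hashing the compiler could look like, the sketch below hashes the bytes of a compiler executable and folds that digest into a package key together with other inputs. This is not vcpkg's implementation: `hash_file`, `package_key`, and the input names are hypothetical, and locating the compiler the way CMake would is deliberately left out.

```python
import hashlib
from pathlib import Path
from typing import Mapping


def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so a large compiler binary is not loaded at once.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package_key(inputs: Mapping[str, str]) -> str:
    """Combine named input hashes into one key (hypothetical scheme).

    Entries are sorted by name so the key does not depend on insertion order.
    """
    digest = hashlib.sha256()
    for name in sorted(inputs):
        digest.update(f"{name} {inputs[name]}\n".encode("utf-8"))
    return digest.hexdigest()
```

With a scheme like this, two machines whose compiler binaries differ produce different keys even when every other input matches, so neither would pick up the other's archives.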

Another aspect of the environment we don't currently track is the CRT version on Linux systems. Currently, we believe this will not cause as many problems in practice (and is thus not suitable for an MVP), since the compiler will (generally) link against the system CRT and should sufficiently reflect any differences. This can also be easily worked around by the user with documentation: the toolchain file can simply have a comment such as "# this uses muslc", which will cause it to hash differently.

Note that stuff like the glibc version may differ on different distributions unrelated to the compiler.

To be sure, the hash is produced from the toolchain file as a plain file, without any structuring, correct? To reuse the same binaries you'd need to have exactly the same toolchain files?
For example, if you want to enable a different library feature - hashes won't clash, correct?

Yup, the whole file is hashed. If there's any difference you won't end up with the same hash for the package.
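To make the whole-file behaviour concrete, here is a minimal sketch, assuming SHA-256 over the file's text; the actual hash function and any normalisation vcpkg applies are not stated in this thread, and `toolchain_hash` is a hypothetical name.

```python
import hashlib


def toolchain_hash(contents: str) -> str:
    """Hash the toolchain file as opaque text, with no parsing of its structure."""
    return hashlib.sha256(contents.encode("utf-8")).hexdigest()


base = "set(CMAKE_SYSTEM_NAME Linux)\n"
# A comment changes no build behaviour, but it does change the bytes hashed.
tagged = base + "# this uses muslc\n"

print(toolchain_hash(base) == toolchain_hash(base))    # identical text, identical hash
print(toolchain_hash(base) == toolchain_hash(tagged))  # differing text, differing hash
```

Because the file is treated as opaque text, even a whitespace-only change yields a different package hash under this assumption.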

@traversaro
Contributor

Thanks a lot for working on this. This is probably out of scope for the initial MVP, but have you considered the possibility of optionally downloading a binary with a different hash than the one that would be built locally? I am thinking of cases like #10522, in which a port does not compile under a given Visual Studio version because of a Visual Studio regression, and a binary compiled with a different version could be downloaded instead.

@ras0219-msft
Contributor Author

ras0219-msft commented May 7, 2020

One core concept of our binarycaching approach is that this is only intended as an accelerant for what could be done locally; it's not intended to enable developers to (for example) not have a compiler toolchain at all.
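The "accelerant" idea can be sketched as a lookup that always falls back to a local build. This is only an illustration under assumed names (`restore_or_build`, a flat directory of archives named by key); it is not how vcpkg is implemented.

```python
import shutil
from pathlib import Path
from typing import Callable


def restore_or_build(key: str, cache_dir: Path, dest: Path,
                     build: Callable[[Path], None]) -> bool:
    """Place the package at `dest`; return True if it came from the cache.

    `build` must always be able to produce the package locally; the cache
    only saves the time of running it.
    """
    archive = cache_dir / key
    if archive.is_file():
        shutil.copyfile(archive, dest)  # cache hit: reuse the stored result
        return True
    build(dest)                         # cache miss: do the real work locally
    cache_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(dest, archive)      # store it for the next run
    return False
```

The key property is that removing the cache never changes what can be built, only how long it takes.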

In the case you mentioned, the workaround would be to have both toolchains available locally and use something like the following in the triplet file [1]:

if(PORT STREQUAL "ace")
  set(VCPKG_VISUAL_STUDIO_PATH "C:\\path\\to\\older\\vs")
endif()

This does have some problems (absolute paths to VS instances aren't portable), but it illustrates the overall direction where we would look for a solution.

[1] https://github.com/Microsoft/vcpkg/blob/master/docs/users/triplets.md#vcpkg_visual_studio_path

@meastp meastp mentioned this pull request May 8, 2020
@BillyONeal
Member

After seeing what the space consumption for our CI looks like, we also probably need to think about an eviction policy, like 'delete items from the cache which are not current hashes and have not been accessed in over $timeframe'.

@bouffa

bouffa commented May 30, 2020

Will there be a way to remove older versions of packages from the binary archive? If I understand correctly, for now, if a port is updated (or a new CMake version is used, etc.), a new binary archive will be built for it, but the older archives remain in the archive directory. Is that right?

@ras0219-msft
Contributor Author

In this initial implementation, we are not building any specific cache invalidation functionality into vcpkg. Depending on what backend you use, there might be an option on the provider's side to perform access-time based garbage collection (for example, Azure DevOps Artifacts has that for NuGet feeds).

We think that for most users, a once-in-a-while delete of ~/.vcpkg/archives will be sufficient; I could see us providing a courtesy function for that. In the future, we may investigate providing a more sophisticated system for the files-based backend based on file modified/access times.
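For the files-based backend, an access-time-based cleanup of the kind discussed in this thread might look like the sketch below. It is not part of vcpkg; `evict_stale` is a hypothetical helper, and it assumes a flat directory in which each archive's file name is its hash.

```python
import time
from pathlib import Path
from typing import Iterable, List, Optional


def evict_stale(cache_dir: Path, current_keys: Iterable[str],
                max_age_seconds: float, now: Optional[float] = None) -> List[Path]:
    """Delete archives that are not current and were last accessed too long ago.

    Returns the paths that were removed.
    """
    now = time.time() if now is None else now
    keep = set(current_keys)
    removed = []
    for entry in cache_dir.iterdir():
        if not entry.is_file() or entry.name in keep:
            continue
        # st_atime may be unreliable on filesystems mounted with noatime/relatime.
        if now - entry.stat().st_atime > max_age_seconds:
            entry.unlink()
            removed.append(entry)
    return removed
```

Archives whose hashes are still current are kept regardless of age, so a rarely rebuilt but still valid package is not evicted.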

Note change to XDG directory structure.
Replace 'upload' config keyword with more flexible read/write/readwrite keywords.
@ras0219-msft ras0219-msft merged commit f38a61d into microsoft:master Jul 14, 2020
BillyONeal added a commit to microsoft/vcpkg-tool that referenced this pull request Dec 8, 2020
Changes to the binary caching spec made as comments over at microsoft/vcpkg#11204 (review)
Labels
category:vcpkg-feature The issue is a new capability of the tool that doesn’t already exist and we haven’t committed
9 participants