Binary Cache: Add write-back support #1406

sadroeck · 2024-05-13T18:19:37Z

Write-back support for Binary Caching

This PR adds a cache write-back phase to package installation for cache entries that were restored rather than newly built. This allows ports to be synced to all binary-cache providers marked as write or readwrite upon a successful installation.

Currently the vcpkg binary-caching layer only populates a write-capable provider if the port was built from scratch, i.e. not restored from a read-capable cache provider. This means that a consumer has no way of guaranteeing that other caches are populated, unless they build the port themselves.

Use-cases

As a developer I have a local cache, e.g. the file system, and a remote cache, e.g. S3. The remote cache is meant to be a fallback in case the local cache isn't populated. Once a port is restored from the remote cache, I'd like the local cache to be populated with the restored entry, so that if the port needs to be restored again later, it can come from the cheaper local cache instead of the remote cache.

Implementation details

Provider identity

The binary caching layer generally works by adding a list of read- and write-capable cache providers. The read providers are typically single-source, while the write providers are multi-source, e.g. multiple local file-system directories belong to one write provider. In order to identify which providers need to be written back to, we need to uniquely identify each provider that qualifies for a write-back. This is done using a unique ProviderId, assigned during the parsing of the binary-cache segment list.
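To illustrate the idea of assigning an identity at parse time, here is a minimal sketch. It is not the code in this PR: `ParsedSegment`, `ProviderConfig`, `Access`, and `assign_provider_ids` are hypothetical names, and `ProviderId` is assumed to be a plain integer.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for illustration; not the actual vcpkg definitions.
using ProviderId = std::size_t;

enum class Access { Read, Write, ReadWrite };

// One segment of the binary-cache configuration, as parsed.
struct ParsedSegment
{
    std::string source; // e.g. a directory or a bucket prefix
    Access access;
};

// The same segment after it has been given a unique identity.
struct ProviderConfig
{
    ProviderId id;
    std::string source;
    Access access;
};

// Assign ids in parse order so that every segment can later be addressed
// individually, even when several segments share one backend type.
std::vector<ProviderConfig> assign_provider_ids(const std::vector<ParsedSegment>& segments)
{
    std::vector<ProviderConfig> result;
    result.reserve(segments.size());
    ProviderId next_id = 0;
    for (const auto& segment : segments)
    {
        result.push_back(ProviderConfig{next_id++, segment.source, segment.access});
    }
    return result;
}
```

With ids handed out this way, two file-system directories that end up in the same write provider still have distinct identities.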

Write-back

The push_success() function is now also called post-install to populate the write-capable providers that were marked during prefetch as missing the port. Each of these providers then marks the cache entry as written back using mark_written_back().
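The flow described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the PR's implementation: `WriteProvider`, `CacheEntryStatus`, `prefetch`, and `write_back` are hypothetical, an in-memory set stands in for a real backend, and only the name `mark_written_back` is taken from the description.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using ProviderId = std::size_t; // assumed to be a plain integer id

// Hypothetical write-capable provider; the set stands in for real storage.
struct WriteProvider
{
    ProviderId id;
    std::set<std::string> stored; // package ABI hashes held by this provider

    bool contains(const std::string& abi) const { return stored.count(abi) != 0; }
    void push(const std::string& abi) { stored.insert(abi); }
};

// Per-entry bookkeeping: which providers were found to lack the entry.
struct CacheEntryStatus
{
    std::set<ProviderId> missing_in;

    void mark_written_back(ProviderId id) { missing_in.erase(id); }
};

// Prefetch: record every write-capable provider that lacks the entry.
CacheEntryStatus prefetch(const std::vector<WriteProvider>& providers, const std::string& abi)
{
    CacheEntryStatus status;
    for (const auto& provider : providers)
    {
        if (!provider.contains(abi)) status.missing_in.insert(provider.id);
    }
    return status;
}

// Post-install: push only to the providers recorded as missing, and mark
// each one as written back. Returns the number of providers pushed to.
std::size_t write_back(std::vector<WriteProvider>& providers, CacheEntryStatus& status, const std::string& abi)
{
    std::size_t pushed = 0;
    for (auto& provider : providers)
    {
        if (status.missing_in.count(provider.id) == 0) continue;
        provider.push(abi);
        status.mark_written_back(provider.id);
        ++pushed;
    }
    return pushed;
}
```

Because each provider is removed from `missing_in` once it has been written, a second call for the same entry pushes nothing.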

Relevant issues/discussions

sadroeck · 2024-05-13T18:31:34Z

@microsoft-github-policy-service agree company="SingleStore"

sadroeck · 2024-05-19T00:41:09Z

Added some fixes related to NuGet prefetches & write-only binary providers.

BillyONeal

I agree this feature is something that we should have.

I think we might need a separate setting to control it, e.g. 'read', 'write', 'writeback'. For example, if someone has a stack of providers in 'furthest away' order, it would be bad to write back to a cache on another continent when a cache hit happens on a local directory. Users probably want some control over that, and silently doing it to them without an opt-in may be a breaking change.

To that end I believe there is a longstanding structural problem we probably need to fix before we can go here. Specifically, the documentation suggests, and the original design intent was, that cache providers would be visited in order. Unfortunately, there is actually an assumed order that sorts whatever the user says into an internally defined provider order, which seems like a bug.

The existing design is excessively complex because it tries to smash backends together, which was done to save the work of making zip files. The change here kinda makes that worse. It looks like the first half of getting rid of this problem was already done in #998

Instead of stacking ProviderId on top, now that there is no longer a reason to merge different users of the same provider type together, I would prefer to see a change that removes the multiple entries per provider entirely and just stores them in the user-provided order. This will reduce complexity, actually match what the documentation and design intent say, and make implementing the write-back behavior you're trying to achieve here much easier.
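As a rough sketch of what "one entry per source, in user-provided order" could look like: the `Provider` type and both functions below are hypothetical, and writing back only to providers listed before the hit is just one possible policy, assumed here because it avoids pushing to more distant caches after a nearby hit.

```cpp
#include <cstddef>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical provider: exactly one entry per user-configured source.
struct Provider
{
    std::string name;
    bool readable;
    bool writable;
    std::set<std::string> stored; // stands in for a real backend
};

// Visit providers in the order the user listed them and return the index
// of the first readable provider that holds the entry.
std::optional<std::size_t> restore_in_order(const std::vector<Provider>& providers, const std::string& abi)
{
    for (std::size_t i = 0; i < providers.size(); ++i)
    {
        if (providers[i].readable && providers[i].stored.count(abi) != 0) return i;
    }
    return std::nullopt;
}

// Assumed policy: after a hit at index `hit`, write back only to writable
// providers listed before it. Returns how many providers gained the entry.
std::size_t write_back_to_earlier(std::vector<Provider>& providers, std::size_t hit, const std::string& abi)
{
    std::size_t pushed = 0;
    for (std::size_t i = 0; i < hit && i < providers.size(); ++i)
    {
        if (providers[i].writable && providers[i].stored.insert(abi).second) ++pushed;
    }
    return pushed;
}
```

With a local directory listed first and a remote cache second, a hit on the remote cache populates the local one, while a hit on the local directory writes nothing.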

We probably should look at trying to land #908 first because autoantwort has been waiting forever ....

BillyONeal · 2024-06-05T19:21:36Z

include/vcpkg/base/util.h

@@ -44,6 +44,19 @@ namespace vcpkg::Util

            return false;
        }
+        template<class Vec, class Filter>
+        std::vector<ElementT<Vec>> filtered_copy(const Vec& container, const Filter&& filter)


Suggested change

-        std::vector<ElementT<Vec>> filtered_copy(const Vec& container, const Filter&& filter)
+        std::vector<ElementT<Vec>> copy_if(const Vec& container, Filter filter)

The standard algorithm this duplicates is copy_if so I think that should be in the name.

Filter functors are traditionally passed by value. (See for example key_equal above)
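For reference, a self-contained sketch of the suggested signature. The `ElementT` alias below is a stand-in that assumes vcpkg's alias names the container's element type, and the `Util` namespace only mirrors the diff context. Note that `const Filter&&` is a const rvalue reference rather than a forwarding reference, so it cannot bind to a named (lvalue) functor; taking the filter by value accepts both lvalues and rvalues.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

namespace Util
{
    // Stand-in for vcpkg's ElementT (assumption: the container's element type).
    template<class Container>
    using ElementT = typename Container::value_type;

    // Return a new vector holding the elements of `container` for which
    // `filter` returns true; the input is left unmodified.
    template<class Vec, class Filter>
    std::vector<ElementT<Vec>> copy_if(const Vec& container, Filter filter)
    {
        std::vector<ElementT<Vec>> result;
        std::copy_if(container.begin(), container.end(), std::back_inserter(result), filter);
        return result;
    }
}
```

A call such as `Util::copy_if(values, is_even)` works whether `is_even` is a named lambda or a temporary.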

BillyONeal · 2024-06-05T19:22:09Z

include/vcpkg/binarycaching.h

@@ -114,44 +134,62 @@ namespace vcpkg
        std::string instantiate_variables(const BinaryPackageReadInfo& info) const;
    };

+    struct GithubActionsInfo


Don't think this should be checked in :)

sadroeck force-pushed the binary-cache-write-back-support branch from 8adc114 to efdf99f Compare May 13, 2024 18:28

sadroeck marked this pull request as draft May 13, 2024 18:38

sadroeck force-pushed the binary-cache-write-back-support branch 8 times, most recently from 7929022 to acb9b24 Compare May 19, 2024 00:05

feat(binary-cache): Add write-back support to binary cache

690198c

sadroeck force-pushed the binary-cache-write-back-support branch from acb9b24 to 690198c Compare May 19, 2024 00:23

sadroeck marked this pull request as ready for review May 19, 2024 00:41

BillyONeal reviewed Jun 7, 2024


WangWeiLin-MV linked an issue Jun 14, 2024 that may be closed by this pull request

[vcpkg-tool] vcpkg should cache binaries downloaded from AWS locally instead of downloading them every time microsoft/vcpkg#38684

Open
