feat: lake: add revDiscovery policy to cache services#14230
Open
marcelolynch wants to merge 9 commits into
This PR lets a cache service opt into SHA-isolated downloads via a per-service `revDiscovery` policy. With `revDiscovery = "head"`, `lake cache get` consults only the current commit's mapping and never walks the Git history, so a build is only served the cache of the exact commit it is on — useful for consuming low-trust caches such as fork artifacts. The default, `revDiscovery = "nearest"`, keeps the existing behavior of backtracking to the nearest cached ancestor. The policy is set per service in the system Lake configuration and threaded through to `cache get`'s revision lookup. For a `head` service, `--max-revs` does not apply and is ignored with a warning (which `--fail-level` can escalate), so the configured policy is always respected; `--rev` still pins an explicit revision and an explicit mappings file bypasses revision discovery entirely. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
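The interaction between the policy and the command-line options described above can be sketched as a small decision function. This is only an illustration in Lean 4: `GetOpts`, `Plan`, and `plan` are hypothetical names, not Lake's API, and the precedence shown when both a mappings file and `--rev` are given is an assumption the commit message does not state.

```lean
/-- Hypothetical stand-in for the per-service policy. -/
inductive RevDiscovery where
  | nearest
  | head

/-- Hypothetical summary of the relevant `lake cache get` inputs. -/
structure GetOpts where
  mappingsFile : Option String := none
  rev : Option String := none
  maxRevs : Option Nat := none

/-- What the lookup would do, in this sketch. -/
inductive Plan where
  | useMappings (file : String)  -- explicit mappings file: no discovery at all
  | pinned (rev : String)        -- `--rev`: one explicit revision
  | headOnly                     -- `head` policy: current commit only
  | walk (maxRevs : Option Nat)  -- `nearest` policy: backtrack through history
  deriving Repr

/-- Returns the plan plus any warnings to report. -/
def plan (policy : RevDiscovery) (opts : GetOpts) : Plan × List String :=
  match opts.mappingsFile, opts.rev, policy with
  -- Assumption: a mappings file takes precedence over `--rev`.
  | some file, _, _ => (.useMappings file, [])
  | none, some rev, _ => (.pinned rev, [])
  | none, none, .head =>
    -- `--max-revs` has no effect under `head`, so it is reported, not applied.
    let warnings :=
      if opts.maxRevs.isSome then ["--max-revs is ignored for a head service"] else []
    (.headOnly, warnings)
  | none, none, .nearest => (.walk opts.maxRevs, [])

#eval plan .head { maxRevs := some 5 }     -- head-only plan with one warning
#eval plan .nearest { maxRevs := some 5 }  -- bounded walk, no warnings
#eval plan .head { rev := some "abc123" }  -- pinned revision, no warnings
```

Returning warnings as data rather than printing them keeps the decision pure, which leaves it to the caller to decide whether a warning should become an error.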
Consolidate the per-service revision-discovery policy onto the `RevDiscovery` type: `cache get`'s revision lookup now lives in a single `RevDiscovery.discover` method that switches on the strategy, rather than `findOutputs` branching on a `headOnly` boolean. Adding a future policy becomes one constructor plus one match arm. Also make `RevDiscovery` single-source — `all` and `toString` are canonical, with `ofString?` and the TOML decode error derived from them — and surface the policy in `lake cache services`, which now annotates a service with its `revDiscovery` when it is not the default `nearest`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
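A minimal Lean 4 sketch of the single-source idea described above, assuming a two-constructor type. The names `all`, `toString`, and `ofString?` come from the commit message; `decodeError` and the exact error wording are hypothetical, and the PR's real definitions may differ.

```lean
/-- Illustrative stand-in for the PR's `RevDiscovery` type. -/
inductive RevDiscovery where
  | nearest
  | head
  deriving Repr, DecidableEq

namespace RevDiscovery

/-- Canonical list of policies; a new policy is added here. -/
def all : List RevDiscovery := [.nearest, .head]

/-- Canonical name of each policy, as written in the configuration. -/
protected def toString : RevDiscovery → String
  | .nearest => "nearest"
  | .head => "head"

instance : ToString RevDiscovery := ⟨RevDiscovery.toString⟩

/-- Derived parser: pick the policy whose canonical name matches `s`. -/
def ofString? (s : String) : Option RevDiscovery :=
  all.find? (·.toString == s)

/-- Derived error text (hypothetical wording) listing every valid value. -/
def decodeError (s : String) : String :=
  let names := ", ".intercalate (all.map RevDiscovery.toString)
  s!"unknown revDiscovery '{s}'; expected one of: {names}"

end RevDiscovery

#eval RevDiscovery.ofString? "head"      -- the `head` policy
#eval RevDiscovery.ofString? "oldest"    -- no match
#eval RevDiscovery.decodeError "oldest"  -- message lists nearest and head
```

Because the parser and the error message are both computed from `all` and `toString`, a new constructor only needs an entry in `all` and an arm in `toString` to be accepted and reported consistently.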
Use a `match` on `RevDiscovery` rather than `matches .nearest`/`else` when annotating the policy in `lake cache services`, reusing the bound value and reading as discrimination rather than a pattern guard. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add an offline, deterministic test for `RevDiscovery.discover`. A `discoverWalk` lakefile script runs the walk over a controlled three-commit history with a stub lookup (no network or storage), so the test can assert that `nearest` walks back to a cached ancestor, `head` consults only `HEAD`, and `--max-revs` bounds the `nearest` walk while `head` ignores it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
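The shape of such a test can be sketched in Lean 4 as follows. This is not the PR's `discoverWalk` script: `candidates`, `discover`, `history`, and `stubLookup` are hypothetical names, revisions are plain strings, and the lookup is a pure stub that treats only the oldest of three commits as cached.

```lean
/-- Illustrative stand-in for the per-service policy. -/
inductive RevDiscovery where
  | nearest
  | head

/-- Revisions a policy is allowed to consult, given history newest first. -/
def candidates (policy : RevDiscovery) (revs : List String)
    (maxRevs : Option Nat) : List String :=
  match policy, maxRevs with
  | .head, _ => revs.take 1          -- only `HEAD`; `maxRevs` is ignored
  | .nearest, some n => revs.take n  -- bounded walk
  | .nearest, none => revs           -- unbounded walk

/-- First consulted revision with a cached mapping, paired with that mapping. -/
def discover (policy : RevDiscovery) (revs : List String)
    (lookup : String → Option String) (maxRevs : Option Nat := none) :
    Option (String × String) :=
  (candidates policy revs maxRevs).findSome? fun rev =>
    (lookup rev).map fun mapping => (rev, mapping)

/-- Three-commit history, `HEAD` first. -/
def history : List String := ["c3", "c2", "c1"]

/-- Stub lookup: only the oldest commit has a cached mapping. -/
def stubLookup (rev : String) : Option String :=
  if rev == "c1" then some "mapping-for-c1" else none

#eval discover .nearest history stubLookup                      -- finds c1's mapping
#eval discover .head history stubLookup                         -- nothing: HEAD is uncached
#eval discover .nearest history stubLookup (maxRevs := some 2)  -- nothing: c1 is out of range
#eval discover .head history stubLookup (maxRevs := some 3)     -- still nothing: bound ignored
```

Passing the lookup in as a function is what keeps the walk testable without network or storage access.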
tydeu
requested changes
Jul 1, 2026
tydeu
left a comment
Member
Looks generally good! I have a few requests regarding the code, and I will also need to evaluate the online test locally afterwards.
These local RFC design notes were pushed by mistake and are not part of the feature. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Mac Malone <tydeu@hatpress.net>
Move revision discovery out of the public Lake API into a private helper in `Lake.CLI.Main`, and revert `lake cache services` to a plain machine-parseable list of names. Update the command help and cache tests to match. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
tydeu
reviewed
Jul 1, 2026
This PR adds a `revDiscovery` setting to remote cache services that controls how `lake cache get` picks a revision's cached mapping: the default `nearest` walks Git history from `HEAD` to the closest cached revision (the existing behavior), while `head` serves only the current commit's mapping, isolating a build's cache to its exact revision.

For a `head` service, `--max-revs` does not apply and is ignored with a warning (escalatable to an error under `--wfail`). Set the policy per service in the system Lake config, e.g. `revDiscovery = "head"` under `[[cache.service]]`, for either an `s3` or `reservoir` service. `lake cache services` now annotates each service with its policy when it is not the default `nearest`.

Closes #14151