Skip to content

feat: lake: add revDiscovery policy to cache services#14230

Open
marcelolynch wants to merge 9 commits into
leanprover:masterfrom
marcelolynch:lake-cache-rev-discovery
Open

feat: lake: add revDiscovery policy to cache services#14230
marcelolynch wants to merge 9 commits into
leanprover:masterfrom
marcelolynch:lake-cache-rev-discovery

Conversation

@marcelolynch

Copy link
Copy Markdown
Contributor

This PR adds a revDiscovery setting to remote cache services that controls how lake cache get picks a revision's cached mapping: the default nearest walks Git history from HEAD to the closest cached revision (the existing behavior), while head serves only the current commit's mapping, isolating a build's cache to its exact revision.

For a head service, --max-revs does not apply and is ignored with a warning (escalatable to an error under --wfail). Set the policy per service in the system Lake config, e.g. revDiscovery = "head" under [[cache.service]], for either an s3 or reservoir service. lake cache services now annotates each service with its policy when it is not the default nearest.

Closes #14151

marcelolynch and others added 4 commits June 29, 2026 12:29
This PR lets a cache service opt into SHA-isolated downloads via a per-service `revDiscovery` policy. With `revDiscovery = "head"`, `lake cache get` consults only the current commit's mapping and never walks the Git history, so a build is only served the cache of the exact commit it is on — useful for consuming low-trust caches such as fork artifacts. The default, `revDiscovery = "nearest"`, keeps the existing behavior of backtracking to the nearest cached ancestor.

The policy is set per service in the system Lake configuration and threaded through to `cache get`'s revision lookup. For a `head` service, `--max-revs` does not apply and is ignored with a warning (which `--fail-level` can escalate), so the configured policy is always respected; `--rev` still pins an explicit revision and an explicit mappings file bypasses revision discovery entirely.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Consolidate the per-service revision-discovery policy onto the `RevDiscovery` type: `cache get`'s revision lookup now lives in a single `RevDiscovery.discover` method that switches on the strategy, rather than `findOutputs` branching on a `headOnly` boolean. Adding a future policy becomes one constructor plus one match arm.

Also make `RevDiscovery` single-source — `all` and `toString` are canonical, with `ofString?` and the TOML decode error derived from them — and surface the policy in `lake cache services`, which now annotates a service with its `revDiscovery` when it is not the default `nearest`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Use a `match` on `RevDiscovery` rather than `matches .nearest`/`else` when annotating the policy in `lake cache services`, reusing the bound value and reading as discrimination rather than a pattern guard.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add an offline, deterministic test for `RevDiscovery.discover`. A `discoverWalk` lakefile script runs the walk over a controlled three-commit history with a stub lookup (no network or storage), so the test can assert that `nearest` walks back to a cached ancestor, `head` consults only `HEAD`, and `--max-revs` bounds the `nearest` walk while `head` ignores it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@marcelolynch marcelolynch requested a review from tydeu as a code owner June 30, 2026 22:41
@github-actions github-actions Bot added the toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN label Jun 30, 2026
@mathlib-lean-pr-testing

mathlib-lean-pr-testing Bot commented Jun 30, 2026

Copy link
Copy Markdown

Mathlib CI status (docs):

  • ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase 9d0f28f22fc60d7a189c7486ed149b8b4d463d43 --onto 84b251f7390c20a0a00a221050ab4a6e6c46a191. You can force Mathlib CI using the force-mathlib-ci label. (2026-06-30 23:19:29)
  • ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase 9d0f28f22fc60d7a189c7486ed149b8b4d463d43 --onto 89d728b53ccfda77971b5381649d2bbe9eae2039. You can force Mathlib CI using the force-mathlib-ci label. (2026-07-01 16:32:49)

@leanprover-bot

leanprover-bot commented Jun 30, 2026

Copy link
Copy Markdown
Collaborator

Reference manual CI status:

  • ❗ Reference manual CI will not be attempted unless your PR branches off the nightly-with-manual branch. Try git rebase 9d0f28f22fc60d7a189c7486ed149b8b4d463d43 --onto 28b99ec01f6275235fedecbac8e382cac682bf80. You can force reference manual CI using the force-manual-ci label. (2026-06-30 23:19:31)
  • ❗ Reference manual CI will not be attempted unless your PR branches off the nightly-with-manual branch. Try git rebase 9d0f28f22fc60d7a189c7486ed149b8b4d463d43 --onto 89d728b53ccfda77971b5381649d2bbe9eae2039. You can force reference manual CI using the force-manual-ci label. (2026-07-01 17:36:41)

@tydeu tydeu left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks generally good! have a few requests regarding the code, and I will also need to evaluate the online test locally afterwards.

Comment thread src/lake/Lake/CLI/Main.lean Outdated
Comment thread src/lake/Lake/Config/Cache.lean Outdated
Comment thread src/lake/Lake/Load/Toml.lean Outdated
Comment thread src/lake/Lake/Config/Cache.lean Outdated
Comment thread src/lake/rfc-cache-read-chain.md Outdated
@tydeu tydeu added the changelog-lake Lake label Jul 1, 2026
marcelolynch and others added 3 commits July 1, 2026 09:58
These local RFC design notes were pushed by mistake and are not part of the feature.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Mac Malone <tydeu@hatpress.net>
Move revision discovery out of the public Lake API into a private helper in `Lake.CLI.Main`, and revert `lake cache services` to a plain machine-parseable list of names. Update the command help and cache tests to match.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread src/lake/Lake/Config/Cache.lean Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog-lake Lake toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RFC: SHA-isolated lake cache get via configurable revision discovery

3 participants