Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - feat: caching build files on Azure #1230

Closed
wants to merge 44 commits into from

Conversation

arthurpaulino
Copy link
Collaborator

@arthurpaulino arthurpaulino commented Dec 27, 2022

This PR implements caching for Mathlib build files. The basic API is:

  • lake exe cache get to download missing build files
  • lake exe cache put to upload files that are missing on the server
  • lake exe cache to print the help menu

This caching method has a few enhancements on top of the solution we had for mathlib in Lean 3:

  1. Source files are content-addressed and build files are referenced by such hashes separately
  2. Minimized download traffic with queries that only pull files that are missing locally
  3. Minimized upload traffic with queries that only push files that are missing on the server

@arthurpaulino arthurpaulino added the awaiting-review The author would like community review of the PR label Dec 27, 2022
Copy link
Member

@gebner gebner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also upload some kind of manifest file for every upload, containing at least the list of referenced cache files and the git revision of mathlib4, so that we can do garbage collection.

Cache/IOUtils.lean Outdated Show resolved Hide resolved
Cache/IOUtils.lean Outdated Show resolved Hide resolved
Cache/IOUtils.lean Outdated Show resolved Hide resolved
Cache/IOUtils.lean Outdated Show resolved Hide resolved
Cache/IOUtils.lean Outdated Show resolved Hide resolved
Cache/Hashing.lean Show resolved Hide resolved
Cache/Requests.lean Outdated Show resolved Hide resolved
.gitignore Outdated Show resolved Hide resolved
Cache/Requests.lean Outdated Show resolved Hide resolved
@gebner gebner added awaiting-author A reviewer has asked the author a question or requested changes and removed awaiting-review The author would like community review of the PR labels Dec 27, 2022
@arthurpaulino
Copy link
Collaborator Author

arthurpaulino commented Dec 27, 2022

We should also upload some kind of manifest file for every upload, containing at least the list of referenced cache files and the git revision of mathlib4, so that we can do garbage collection.

Can we do garbage collection by detecting files that were created a long time ago? I believe anything older than 7 days can be discarded safely (it's easy to replace it anyways)

Cache/IOUtils.lean Outdated Show resolved Hide resolved
Co-authored-by: Gabriel Ebner <gebner@gebner.org>
.gitignore Outdated Show resolved Hide resolved
Cache/Hashing.lean Outdated Show resolved Hide resolved
Cache/Requests.lean Outdated Show resolved Hide resolved
Cache/Requests.lean Outdated Show resolved Hide resolved
@gebner
Copy link
Member

gebner commented Jan 3, 2023

There's still a couple TODOs here, but I think it's best to merge this now and iterate later.

Please add copyright headers (just copy&paste from the Mathlib files).

bors d+

@bors
Copy link

bors bot commented Jan 3, 2023

✌️ arthurpaulino can now approve this pull request. To approve and merge a pull request, simply reply with bors r+. More detailed instructions are available here.

@github-actions github-actions bot added delegated and removed awaiting-review The author would like community review of the PR labels Jan 3, 2023
@arthurpaulino
Copy link
Collaborator Author

bors r+

bors bot pushed a commit that referenced this pull request Jan 3, 2023
This PR implements caching for Mathlib build files. The basic API is:
* `lake exe cache get` to download missing build files
* `lake exe cache put` to upload files that are missing on the server
* `lake exe cache` to print the help menu

This caching method has a few enhancements on top of the solution we had for mathlib in Lean 3:
1. Source files are content-addressed and build files are referenced by such hashes separately
2. Minimized download traffic with queries that only pull files that are missing locally
3. Minimized upload traffic with queries that only push files that are missing on the server
@bors
Copy link

bors bot commented Jan 3, 2023

Pull request successfully merged into master.

Build succeeded:

@bors bors bot changed the title feat: caching build files on Azure [Merged by Bors] - feat: caching build files on Azure Jan 3, 2023
@bors bors bot closed this Jan 3, 2023
@bors bors bot deleted the ap/caching branch January 3, 2023 22:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants