Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: use leantar in lake exe cache #5710

Merged
merged 10 commits into from
Jul 17, 2023
Merged

feat: use leantar in lake exe cache #5710

merged 10 commits into from
Jul 17, 2023

Conversation

digama0
Copy link
Member

@digama0 digama0 commented Jul 4, 2023

This hooks up the new leantar tool from https://github.com/digama0/leangz to lake exe cache. It uses an olean compressor to achieve much smaller file sizes, and faster decompression. See the Zulip discussion.

The new files have a new file extension, 123.ltar instead of 123.tar.gz. This implementation does not multiplex between them, so it will be a hard reset. In fact, the new cache doesn't even know how to clear the old cache files, so you might need to do that manually. Switching between branches will still work during the interim, since the cache will have both kinds of file and use the right ones.


Open in Gitpod

Cache/IO.lean Outdated Show resolved Hide resolved
@semorrison
Copy link
Contributor

All working for me now, and a massive speedup:

# Before #5710
lake clean && lake exe cache get!   # 343s
lake exe cache get                  # 17s

# After #5710
lake clean && lake exe cache get!   # 49s
lake exe cache get                  # 4s

@semorrison semorrison added the blocked-by-other-PR This PR depends on another PR to Mathlib label Jul 4, 2023
@digama0
Copy link
Member Author

digama0 commented Jul 11, 2023

Oh wow, I just had the pleasure of running lake exe cache get on a low bandwidth hotel wifi, and it took 25 minutes. A (projected) 7x decrease to 3.57 minutes would make quite a big difference!

@semorrison semorrison added merge-conflict The PR has a merge conflict with master, and needs manual merging. and removed merge-conflict The PR has a merge conflict with master, and needs manual merging. blocked-by-other-PR This PR depends on another PR to Mathlib labels Jul 11, 2023
@semorrison
Copy link
Contributor

This PR/issue depends on:

@semorrison
Copy link
Contributor

bors merge

@github-actions github-actions bot added ready-to-merge This PR has been sent to bors. and removed awaiting-review labels Jul 13, 2023
bors bot pushed a commit that referenced this pull request Jul 13, 2023
This hooks up the new `leantar` tool from https://github.com/digama0/leangz to `lake exe cache`. It uses an olean compressor to achieve much smaller file sizes, and faster decompression. See the [Zulip discussion](https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/leangz.3A.20olean.20file.20compressor/near/369738648).

The new files have a new file extension, `123.ltar` instead of `123.tar.gz`. This implementation does not multiplex between them, so it will be a hard reset. In fact, the new cache doesn't even know how to clear the old cache files, so you might need to do that manually. Switching between branches will still work during the interim, since the cache will have both kinds of file and use the right ones.
@bors
Copy link

bors bot commented Jul 13, 2023

Pull request successfully merged into master.

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@bors bors bot changed the title feat: use leantar in lake exe cache [Merged by Bors] - feat: use leantar in lake exe cache Jul 13, 2023
@bors bors bot closed this Jul 13, 2023
@bors bors bot deleted the leantar branch July 13, 2023 06:09
semorrison added a commit that referenced this pull request Jul 13, 2023
@semorrison semorrison restored the leantar branch July 13, 2023 07:22
@semorrison semorrison reopened this Jul 13, 2023
@semorrison semorrison changed the title [Merged by Bors] - feat: use leantar in lake exe cache feat: use leantar in lake exe cache Jul 13, 2023
@semorrison semorrison added awaiting-review and removed ready-to-merge This PR has been sent to bors. labels Jul 13, 2023
bors bot pushed a commit that referenced this pull request Jul 13, 2023
This reverts commit 6ed1d66.

Currently broken on gitpod (needs `sudo apt-get install xz-utils`) and on Windows (likely the same issue).



Co-authored-by: Scott Morrison <scott.morrison@gmail.com>
gebner
gebner previously requested changes Jul 13, 2023
Cache/IO.lean Outdated Show resolved Hide resolved
@digama0 digama0 requested a review from gebner July 14, 2023 21:32
s!"https://github.com/digama0/leangz/releases/download/v{LEANTARVERSION}/leantar-v{LEANTARVERSION}-{target}.{ext}",
"-L", "-o", s!"{LEANTARBIN}.{ext}"]
let _ ← runCmd "tar" #["-xf", s!"{LEANTARBIN}.{ext}",
"-C", IO.CACHEDIR.toString, "--strip-components=1"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work on windows (i.e., with zip)?

Copy link
Member Author

@digama0 digama0 Jul 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to work... not sure how universally available tar is on windows but this exact command was tested and works on at least a few machines. (The old cache was also using tar for unpacking on windows and I don't think we have received any complaints about it.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😱

The old cache was also using tar for unpacking on windows and I don't think we have received any complaints about it.

The old cache used tar to unpack tar files. This line here uses tar to unpack zip files. But if it works. 🤷

@gebner
Copy link
Member

gebner commented Jul 15, 2023

I am happy with the static linking issue now. The branch works fine on nixos.

@fpvandoorn complained about windows performance on zulip. Is this still an issue (is it even a regression) and is there something we can do about it?

@digama0
Copy link
Member Author

digama0 commented Jul 15, 2023

We investigated it and it doesn't seem like there is much we can do about it, the bottleneck is the write itself and I think windows is just slow at writing lots of files. I don't think it is a regression compared to the original tar script, but perhaps @fpvandoorn could do a clean test and confirm this.

@fpvandoorn
Copy link
Member

When testing a bit more carefully yesterday, I think now that leantar is still quite a bit quicker than the original tar. I don't have time to test it atm, but I'm happy if this gets merged.

@gebner
Copy link
Member

gebner commented Jul 15, 2023

bors r+

@bors
Copy link

bors bot commented Jul 15, 2023

Already running a review

@github-actions github-actions bot added ready-to-merge This PR has been sent to bors. and removed awaiting-review labels Jul 15, 2023
@semorrison
Copy link
Contributor

bors seems to not be working. I'll try again.

@semorrison
Copy link
Contributor

bors r-

@semorrison
Copy link
Contributor

bors merge

@bors
Copy link

bors bot commented Jul 17, 2023

Already running a review

@semorrison semorrison dismissed gebner’s stale review July 17, 2023 22:04

Seems to be resolved, and I'm dismissing this in case it is interacting with the strange bors behaviour.

@semorrison semorrison added awaiting-review and removed ready-to-merge This PR has been sent to bors. labels Jul 17, 2023
@semorrison
Copy link
Contributor

I think bors is failing here because it has already merged this PR once (my fault, I merged too early and had to revert). I'm just going to squash merge now.

@semorrison semorrison merged commit ede7b29 into master Jul 17, 2023
28 checks passed
@bors bors bot deleted the leantar branch July 17, 2023 22:09
semorrison pushed a commit that referenced this pull request Aug 14, 2023
This hooks up the new `leantar` tool from https://github.com/digama0/leangz to `lake exe cache`. It uses an olean compressor to achieve much smaller file sizes, and faster decompression. See the [Zulip discussion](https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/leangz.3A.20olean.20file.20compressor/near/369738648).

The new files have a new file extension, `123.ltar` instead of `123.tar.gz`. This implementation does not multiplex between them, so it will be a hard reset. In fact, the new cache doesn't even know how to clear the old cache files, so you might need to do that manually. Switching between branches will still work during the interim, since the cache will have both kinds of file and use the right ones.
semorrison added a commit that referenced this pull request Aug 14, 2023
This reverts commit 6ed1d66.

Currently broken on gitpod (needs `sudo apt-get install xz-utils`) and on Windows (likely the same issue).



Co-authored-by: Scott Morrison <scott.morrison@gmail.com>
semorrison pushed a commit that referenced this pull request Aug 14, 2023
* feat: use `leantar` in `lake exe cache`

* run pack in parallel

* bump leantar

* version check

* use hex for hashes

* chore: bump to 0.1.2, add .exe on windows

* fix: file extension on linux

* fix: arch is reported as arm64 sometimes

* fix: statically linked 0.1.3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants